We often need to select a subset of columns from a data frame for our analyses. We can do so using the dplyr package in R. In this blog, we'll look at some common functions for selecting columns. We'll be using the 'iris' dataset, which is built into R.
select() function
Using the select() function, we can keep the columns we want or drop the ones we don't.
#Load dplyr, which provides select() and the %>% pipe
library(dplyr)
#Select columns Species & Sepal.Length from iris dataset
iris %>% select(Species, Sepal.Length)
#Exclude Species column
iris %>% select(-Species)
#Provide range of columns
iris %>% select(Sepal.Length:Petal.Length)
#Exclude group of columns
iris %>% select(-(Sepal.Length:Petal.Length))
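To check which columns each of these calls returns without printing all 150 rows, one option is to pipe the result into names(). This is a minimal sketch, assuming dplyr is installed; the variable names (kept, dropped, ranged, excluded) are only illustrative.

```r
library(dplyr)

# names() lists the surviving columns instead of printing the whole data frame
kept     <- iris %>% select(Species, Sepal.Length) %>% names()
dropped  <- iris %>% select(-Species) %>% names()
ranged   <- iris %>% select(Sepal.Length:Petal.Length) %>% names()
excluded <- iris %>% select(-(Sepal.Length:Petal.Length)) %>% names()

kept      # "Species" "Sepal.Length"
dropped   # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
ranged    # "Sepal.Length" "Sepal.Width" "Petal.Length"
excluded  # "Petal.Width" "Species"
```

Note that select() returns the columns in the order they are listed, so Species comes first in the first result.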
Name-based Selection
We can also select columns by matching a string against their names.
#Return columns beginning with 'S'
iris %>% select(starts_with("S"))
#Return columns ending with 's'
iris %>% select(ends_with("s"))
#Return columns containing string 'Length'
iris %>% select(contains("Length"))
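One detail worth knowing about these helpers is that they ignore case by default, so "S" and "s" match the same columns. The sketch below, which assumes dplyr is installed and uses illustrative variable names (loose, strict), shows how the ignore.case argument changes the result.

```r
library(dplyr)

# Default: matching ignores case, so "s" also matches a capital "S"
loose <- iris %>% select(starts_with("s")) %>% names()

# Strict: only a lowercase "s" counts, and no iris column starts with one
strict <- iris %>% select(starts_with("s", ignore.case = FALSE)) %>% names()

loose   # "Sepal.Length" "Sepal.Width" "Species"
strict  # character(0)
```

The same ignore.case argument is available in ends_with() and contains().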
Content-based Selection
We can also select columns based on their contents, using a predicate function or a custom condition.
#Select only numeric columns
iris %>% select_if(is.numeric)
#Select numeric columns with more than 30 unique values (use ~ to denote that we're writing a custom condition)
iris %>% select_if(~is.numeric(.) & n_distinct(.) > 30)
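Since dplyr 1.0.0, select_if() is marked as superseded in favour of select() combined with where(). It still works, but the same selections can be written as follows. This is a sketch that assumes dplyr 1.0.0 or later; numeric_cols and varied_cols are illustrative names.

```r
library(dplyr)

# where() takes the same predicate or ~ condition that select_if() accepts
numeric_cols <- iris %>% select(where(is.numeric))

# && is used here because each test yields a single TRUE/FALSE per column
varied_cols <- iris %>% select(where(~ is.numeric(.x) && n_distinct(.x) > 30))

names(numeric_cols)
names(varied_cols)
```

Because where() is just another selection helper, it can be mixed with name-based helpers in the same select() call.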
If we want to reuse a condition several times, we can convert it into a function using as_mapper() from the purrr package.
#as_mapper() comes from purrr, not dplyr
library(purrr)
custom_cond <- as_mapper(
  ~is.numeric(.) & n_distinct(.) > 30
)
The resulting function can be called on its own or passed to select_if().
#Returns TRUE/FALSE
custom_cond(LETTERS) #FALSE: LETTERS is a character vector
custom_cond(1:50) #TRUE: numeric with 50 unique values
#Use in select_if() function
iris %>% select_if(custom_cond)
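To see what select_if() does with such a function, we can apply the condition to each column ourselves and compare. This is only a sketch of the idea, not how dplyr implements it internally; it assumes dplyr and purrr are installed, and passes, by_hand and by_select_if are illustrative names.

```r
library(dplyr)
library(purrr)

custom_cond <- as_mapper(~ is.numeric(.) & n_distinct(.) > 30)

# Apply the condition to each column by hand: one TRUE/FALSE per column
passes <- map_lgl(iris, custom_cond)

# Keep the names of the columns flagged TRUE
by_hand <- names(iris)[passes]

# Let select_if() do the same job
by_select_if <- iris %>% select_if(custom_cond) %>% names()

# Both approaches should pick the same columns
identical(by_hand, by_select_if)
```

Printing passes is also a quick way to debug a condition that selects more or fewer columns than expected.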
Thank you for reading 🙂
References: https://itsalocke.com/files/DataManipulationinR.pdf