Filtering rows is a routine task when working with data frames in R. In this blog, we’ll look at common functions from the dplyr package that filter rows in various ways. Our examples use the ‘iris’ dataset, which is built into R.
1. slice()
The slice() function takes a vector of row positions. Positive values keep the listed rows and negative values drop them. A single call must use one or the other, not a mix of both.
library(dplyr)
#Select the first 5 rows
iris %>% slice(1:5)
#Exclude row 3
iris %>% slice(-3)
#Remove the first 50 rows, i.e. one third of the 150 rows [ n() returns the total number of rows ]
iris %>% slice(-(1:floor(n()/3)))
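To check the row arithmetic in the slice() examples, the sketch below counts what each call returns. It assumes only dplyr and the built-in iris data. slice_head() is a shortcut added in dplyr 1.0.0 and is not part of the original examples. It is shown only because it expresses "the first n rows" directly.

```r
library(dplyr)

# Positive positions keep rows: the first 5 rows
top5 <- iris %>% slice(1:5)

# slice_head() (dplyr >= 1.0.0) selects the same rows by count
top5_head <- iris %>% slice_head(n = 5)

# Negative positions drop rows: everything except row 3
no_row3 <- iris %>% slice(-3)

# n() is the total row count (150), so floor(n() / 3) is 50.
# iris is sorted by species, so dropping rows 1:50 removes all setosa rows
rest <- iris %>% slice(-(1:floor(n() / 3)))

nrow(top5)
nrow(no_row3)
nrow(rest)
table(rest$Species)  # setosa: 0, versicolor: 50, virginica: 50
```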
2. filter()
filter() keeps the rows for which a condition evaluates to TRUE. Rows where the condition is FALSE or NA are dropped.
#Filter data with Species == 'virginica'
iris %>% filter(Species == "virginica")
#Filter data with Species == 'virginica' and Sepal.Length >= the mean of Sepal.Length over all species
iris %>% filter(Species == "virginica" & Sepal.Length >= mean(Sepal.Length))
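Two details of filter() are easy to miss. First, conditions separated by commas are combined with AND, exactly like &. Second, a summary such as mean(Sepal.Length) is computed over all 150 rows unless the data are grouped first. The sketch below assumes only dplyr. It compares the two forms, then uses group_by() so that each row is compared with the mean of its own species instead.

```r
library(dplyr)

overall_mean <- mean(iris$Sepal.Length)

# Comma-separated conditions are combined with AND, just like &
with_and   <- iris %>% filter(Species == "virginica" & Sepal.Length >= mean(Sepal.Length))
with_comma <- iris %>% filter(Species == "virginica", Sepal.Length >= mean(Sepal.Length))

# Ungrouped, mean(Sepal.Length) is the mean over all three species
all(with_and$Sepal.Length >= overall_mean)

# Grouping first turns mean() into a per-species mean
above_species_mean <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length >= mean(Sepal.Length)) %>%
  ungroup()
```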
3. filter_all()
filter_all() applies the same condition to every column. It returns the rows where the condition is TRUE for all columns (AND) or for at least one column (OR):
- If the condition must be TRUE for all columns, wrap it in all_vars()
- If the condition must be TRUE for at least one column, wrap it in any_vars()
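#Return rows where at least one column's value exceeds 7.5 (Species is a factor, so comparing it gives NA with a warning; the numeric columns decide the result)
iris %>% filter_all(any_vars(. > 7.5))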
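#Return rows where every numeric column's value is smaller than that column's mean (drop the factor column Species first so that all columns are numeric)
iris %>% select(-Species) %>% filter_all(all_vars(. < mean(.)))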
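To make the OR/AND distinction concrete, the sketch below rebuilds both filters by hand. It uses a logical matrix with one cell per row and column, then checks the row counts the dplyr calls return. It works on the numeric columns only. The matrix approach illustrates the semantics and is not how dplyr implements filter_all().

```r
library(dplyr)

# The four numeric columns only (Species is a factor)
num <- iris %>% select(-Species)
m <- as.matrix(num)

# One TRUE/FALSE per cell for the condition `. > 7.5`
over <- m > 7.5
# any_vars(): keep a row if at least one of its cells is TRUE (OR across columns)
keep_any <- rowSums(over) > 0

# One TRUE/FALSE per cell for `. < mean(.)`, each column against its own mean
under <- sweep(m, 2, colMeans(m), "<")
# all_vars(): keep a row only if every one of its cells is TRUE (AND across columns)
keep_all <- rowSums(under) == ncol(m)

via_any <- num %>% filter_all(any_vars(. > 7.5))
via_all <- num %>% filter_all(all_vars(. < mean(.)))
```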
4. filter_if()
filter_if() first selects the columns for which a predicate function (such as is.numeric) returns TRUE. It then filters the rows using only those columns.
#Return rows where every numeric column's value is smaller than that column's mean
iris %>% filter_if(is.numeric, all_vars(. < mean(.)))
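Current dplyr marks the scoped variants filter_all(), filter_if() and filter_at() as superseded. Since dplyr 1.0.4, the same filters can be written inside filter() with if_all() (AND across columns) or if_any() (OR across columns), choosing the columns with a tidyselect expression such as where(). Below is a sketch of the filter_if() example in that style, assuming dplyr 1.0.4 or later.

```r
library(dplyr)

# Scoped form used in this post
scoped <- iris %>% filter_if(is.numeric, all_vars(. < mean(.)))

# if_all(): every selected column must satisfy the condition (like all_vars())
modern <- iris %>% filter(if_all(where(is.numeric), ~ .x < mean(.x)))

# if_any(): at least one selected column must satisfy it (like any_vars())
modern_any <- iris %>% filter(if_any(where(is.numeric), ~ .x > 7.5))
```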
We can also pass a custom predicate as a formula, using a tilde (~) and the placeholder (.) for the column being checked.
#Among numeric columns with more than 20 distinct values, return rows where at least one of those columns is smaller than its mean
iris %>% filter_if(~ is.numeric(.) & n_distinct(.) > 20, any_vars(. < mean(.)))
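The formula in the last example is shorthand for a predicate function. The sketch below writes it as an ordinary named function. many_values() is a hypothetical name chosen for illustration. The sketch then lists which columns the function selects and checks that it gives the same rows as the formula version. It uses && rather than & because the predicate receives a whole column and should return a single TRUE or FALSE.

```r
library(dplyr)

# Hypothetical helper: TRUE for numeric columns with more than 20 distinct values
many_values <- function(x) is.numeric(x) && n_distinct(x) > 20

# Columns the predicate selects (Species is a factor, so it is skipped)
selected <- names(Filter(many_values, iris))
selected

# Same filter, with the predicate passed by name instead of as a formula
by_name    <- iris %>% filter_if(many_values, any_vars(. < mean(.)))
by_formula <- iris %>% filter_if(~ is.numeric(.) & n_distinct(.) > 20, any_vars(. < mean(.)))
```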
5. filter_at()
filter_at() applies the filter to columns selected by name, using vars() with helpers such as ends_with().
#For columns whose names end with 'Length', return rows where every such column is smaller than its mean
iris %>% filter_at(vars(ends_with("Length")), all_vars(. < mean(.)))
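For filter_at(), it helps to see which columns vars(ends_with("Length")) actually picks and what the resulting condition amounts to. The sketch below assumes only dplyr and the built-in iris data. It spells out the same filter with plain column references joined by &.

```r
library(dplyr)

# Columns matched by ends_with("Length")
iris %>% select(ends_with("Length")) %>% names()
# "Sepal.Length" "Petal.Length"

scoped <- iris %>% filter_at(vars(ends_with("Length")), all_vars(. < mean(.)))

# all_vars() over these two columns is the same as joining two conditions with &
explicit <- iris %>%
  filter(Sepal.Length < mean(Sepal.Length) & Petal.Length < mean(Petal.Length))
```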
Thank you for reading 🙂
References: https://itsalocke.com/files/DataManipulationinR.pdf