I recently realised that dplyr can be used to aggregate and summarise data the same way that aggregate() does. I wrote a post on using the aggregate() function in R back in 2013, and in this post I'll contrast dplyr with aggregate().
I'll use the same ChickWeight data set as per my previous post.
    # dplyr provides group_by(), summarise(), filter(), arrange() and %>%
    library(dplyr)

    ?ChickWeight
    # The ChickWeight data frame has 578 rows and 4 columns from an experiment
    # on the effect of diet on early growth of chicks.
    # ...

    data <- ChickWeight
Finding the mean weight depending on diet:
    aggregate(data$weight, list(diet = data$Diet), mean)
      diet        x
    1    1 102.6455
    2    2 122.6167
    3    3 142.9500
    4    4 135.2627

    # alternatively using a formula
    # the weight is dependent on the diet
    # diet explains the weight response
    aggregate(weight ~ Diet, data = data, mean)
      Diet   weight
    1    1 102.6455
    2    2 122.6167
    3    3 142.9500
    4    4 135.2627

    # dplyr approach
    group_by(data, Diet) %>% summarise(mean = mean(weight))
    # A tibble: 4 x 2
        Diet     mean
      <fctr>    <dbl>
    1      1 102.6455
    2      2 122.6167
    3      3 142.9500
    4      4 135.2627
Aggregating on time.
    aggregate(data$weight, list(time = data$Time), mean)
       time         x
    1     0  41.06000
    2     2  49.22000
    3     4  59.95918
    4     6  74.30612
    5     8  91.24490
    6    10 107.83673
    7    12 129.24490
    8    14 143.81250
    9    16 168.08511
    10   18 190.19149
    11   20 209.71739
    12   21 218.68889

    group_by(data, Time) %>% summarise(mean = mean(weight))
    # A tibble: 12 x 2
        Time      mean
       <dbl>     <dbl>
     1     0  41.06000
     2     2  49.22000
     3     4  59.95918
     4     6  74.30612
     5     8  91.24490
     6    10 107.83673
     7    12 129.24490
     8    14 143.81250
     9    16 168.08511
    10    18 190.19149
    11    20 209.71739
    12    21 218.68889
Aggregating on two variables.
    head(aggregate(data$weight, list(time = data$Time, diet = data$Diet), mean))
      time diet        x
    1    0    1 41.40000
    2    2    1 47.25000
    3    4    1 56.47368
    4    6    1 66.78947
    5    8    1 79.68421
    6   10    1 93.05263

    # alternatively
    head(aggregate(weight ~ Time + Diet, data = data, mean))
      Time Diet   weight
    1    0    1 41.40000
    2    2    1 47.25000
    3    4    1 56.47368
    4    6    1 66.78947
    5    8    1 79.68421
    6   10    1 93.05263

    group_by(data, Diet, Time) %>% summarise(mean = mean(weight))
    Source: local data frame [48 x 3]
    Groups: Diet [?]

         Diet  Time      mean
       <fctr> <dbl>     <dbl>
     1      1     0  41.40000
     2      1     2  47.25000
     3      1     4  56.47368
     4      1     6  66.78947
     5      1     8  79.68421
     6      1    10  93.05263
     7      1    12 108.52632
     8      1    14 123.38889
     9      1    16 144.64706
    10      1    18 158.94118
    # ... with 38 more rows
Aggregating and calculating two summaries.
    aggregate(weight ~ Diet, data = data,
              FUN = function(x) c(mean = mean(x), n = length(x)))
      Diet weight.mean weight.n
    1    1    102.6455 220.0000
    2    2    122.6167 120.0000
    3    3    142.9500 120.0000
    4    4    135.2627 118.0000

    group_by(data, Diet) %>% summarise(mean = mean(weight), n = length(weight))
    # A tibble: 4 x 3
        Diet     mean     n
      <fctr>    <dbl> <int>
    1      1 102.6455   220
    2      2 122.6167   120
    3      3 142.9500   120
    4      4 135.2627   118
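One difference worth knowing about here: when FUN returns more than one value, aggregate() stores the results in a single matrix column, so the printed weight.mean and weight.n are not separate columns of the data frame. The sketch below shows one common way to flatten that result, and uses dplyr's n() helper in place of length(weight). The object names agg, agg_flat and dp are only illustrative.

```r
library(dplyr)

data <- ChickWeight

# aggregate() with a vector-valued FUN puts both summaries in ONE matrix column
agg <- aggregate(weight ~ Diet, data = data,
                 FUN = function(x) c(mean = mean(x), n = length(x)))
ncol(agg)              # Diet plus the matrix column "weight"
is.matrix(agg$weight)  # the two summaries live inside this matrix

# Flatten the matrix column into ordinary columns
agg_flat <- do.call(data.frame, agg)
names(agg_flat)

# dplyr returns ordinary columns directly; n() counts the rows in each group
dp <- data %>%
  group_by(Diet) %>%
  summarise(mean = mean(weight), n = n())
```

After flattening, agg_flat$weight.mean can be used like any other column, which is what the dplyr result gives you without the extra step.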
Aggregating on a data subset.
    aggregate(weight ~ Diet, data = subset(data, Diet != 1), mean)
      Diet   weight
    1    2 122.6167
    2    3 142.9500
    3    4 135.2627

    data %>%
      filter(Diet != 1) %>%
      group_by(Diet) %>%
      summarise(mean = mean(weight))
    # A tibble: 3 x 2
        Diet     mean
      <fctr>    <dbl>
    1      2 122.6167
    2      3 142.9500
    3      4 135.2627
Summary
I prefer the dplyr approach, which allows you to "pipe" or "chain" different functions. Once you learn the dplyr functions a.k.a. verbs, you can easily string together a nice pipeline.
    data %>%
      filter(Diet != 1) %>%
      group_by(Diet) %>%
      summarise(mean = mean(weight)) %>%
      arrange(mean)
    # A tibble: 3 x 2
        Diet     mean
      <fctr>    <dbl>
    1      2 122.6167
    2      4 135.2627
    3      3 142.9500

This work is licensed under a Creative Commons
Attribution 4.0 International License.
I think plyr feels easier than dplyr
What if I want to aggregate a whole dataset?
In your case you only aggregated weight.
Suppose I had other variables like height, BMI, etc.
How would I aggregate them in dplyr?
I guess there might be an easier way if you want to add a lot of variables, but it is possible to add them to the expression, separated by commas. In this case it would make sense to name each mean after its variable. This means that after "weight = mean(weight)" you could add ", height = mean(height)".
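One possibly easier way, assuming dplyr 1.0.0 or later, is across(), which applies the same function to several columns at once. ChickWeight has no height or BMI column, so the sketch below uses Time as a stand-in for a second variable; the object names multi, all_num and base_multi are only illustrative.

```r
library(dplyr)

data <- ChickWeight

# Mean of several named columns per diet (Time stands in for height, BMI, etc.)
multi <- data %>%
  group_by(Diet) %>%
  summarise(across(c(weight, Time), mean))

# Mean of every numeric column per diet, without naming them
all_num <- data %>%
  group_by(Diet) %>%
  summarise(across(where(is.numeric), mean))

# The aggregate() counterpart uses cbind() on the left-hand side of the formula
base_multi <- aggregate(cbind(weight, Time) ~ Diet, data = data, FUN = mean)
```

In each result the summary columns keep the names of the original variables, which matches the suggestion above to name each mean after its variable.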
Can dplyr::summarise() aggregate by FUN = "sum"?
Yes, you can use the sum function.
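As a minimal sketch of that answer on the same ChickWeight data, sum() simply takes the place of mean() inside summarise(); the aggregate() call is included for comparison. The names total and agg_total are only illustrative.

```r
library(dplyr)

data <- ChickWeight

# Total weight per diet with dplyr
total <- data %>%
  group_by(Diet) %>%
  summarise(total = sum(weight))

# The same totals with aggregate()
agg_total <- aggregate(weight ~ Diet, data = data, FUN = sum)
```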