I recently realised that dplyr can aggregate and summarise data the same way that aggregate() does. I wrote a post on the aggregate() function in R back in 2013, and in this post I'll contrast dplyr with aggregate().
I'll use the same ChickWeight data set as in my previous post.
# dplyr is needed for group_by(), summarise(), filter(), arrange() and %>%
library(dplyr)

?ChickWeight
# The ChickWeight data frame has 578 rows and 4 columns from an experiment
# on the effect of diet on early growth of chicks.
# ...
data <- ChickWeight
Finding the mean weight depending on diet:
aggregate(data$weight, list(diet = data$Diet), mean)
diet x
1 1 102.6455
2 2 122.6167
3 3 142.9500
4 4 135.2627
# alternatively using a formula
# the weight is dependent on the diet
# diet explains the weight response
aggregate(weight ~ Diet, data = data, mean)
Diet weight
1 1 102.6455
2 2 122.6167
3 3 142.9500
4 4 135.2627
# dplyr approach
group_by(data, Diet) %>% summarise(mean = mean(weight))
# A tibble: 4 x 2
Diet mean
<fctr> <dbl>
1 1 102.6455
2 2 122.6167
3 3 142.9500
4 4 135.2627
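As a quick sanity check, the two approaches can be compared programmatically rather than by eye. This is a minimal sketch; the object names base_means and dplyr_means are my own and not part of either package.

```r
library(dplyr)

data <- ChickWeight

# Base R: the formula interface returns a plain data frame
base_means <- aggregate(weight ~ Diet, data = data, FUN = mean)

# dplyr: same grouping, returned as a tibble
dplyr_means <- data %>%
  group_by(Diet) %>%
  summarise(weight = mean(weight))

# Compare the numeric columns; all.equal() tolerates floating-point noise
all.equal(base_means$weight, dplyr_means$weight)
```

Both results are ordered by the levels of Diet, so the columns line up row by row.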
Aggregating on time.
aggregate(data$weight, list(time=data$Time), mean)
time x
1 0 41.06000
2 2 49.22000
3 4 59.95918
4 6 74.30612
5 8 91.24490
6 10 107.83673
7 12 129.24490
8 14 143.81250
9 16 168.08511
10 18 190.19149
11 20 209.71739
12 21 218.68889
group_by(data, Time) %>% summarise(mean = mean(weight))
# A tibble: 12 x 2
Time mean
<dbl> <dbl>
1 0 41.06000
2 2 49.22000
3 4 59.95918
4 6 74.30612
5 8 91.24490
6 10 107.83673
7 12 129.24490
8 14 143.81250
9 16 168.08511
10 18 190.19149
11 20 209.71739
12 21 218.68889
Aggregating on two variables.
head(aggregate(data$weight,
               list(time = data$Time, diet = data$Diet),
               mean))
time diet x
1 0 1 41.40000
2 2 1 47.25000
3 4 1 56.47368
4 6 1 66.78947
5 8 1 79.68421
6 10 1 93.05263
# alternatively
head(aggregate(weight ~ Time + Diet, data = data, mean))
Time Diet weight
1 0 1 41.40000
2 2 1 47.25000
3 4 1 56.47368
4 6 1 66.78947
5 8 1 79.68421
6 10 1 93.05263
group_by(data, Diet, Time) %>% summarise(mean = mean(weight))
Source: local data frame [48 x 3]
Groups: Diet [?]
Diet Time mean
<fctr> <dbl> <dbl>
1 1 0 41.40000
2 1 2 47.25000
3 1 4 56.47368
4 1 6 66.78947
5 1 8 79.68421
6 1 10 93.05263
7 1 12 108.52632
8 1 14 123.38889
9 1 16 144.64706
10 1 18 158.94118
# ... with 38 more rows
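Note the "Groups: Diet" line in the output above: after summarising over two grouping variables, dplyr keeps the result grouped by the first one. If a plain, ungrouped result is wanted, one option is the .groups argument of summarise(). The sketch below assumes dplyr 1.0.0 or later, where that argument is available; on older versions, piping into ungroup() should have the same effect. The name by_diet_time is my own.

```r
library(dplyr)

data <- ChickWeight

# Drop all grouping once the summary has been computed
by_diet_time <- data %>%
  group_by(Diet, Time) %>%
  summarise(mean = mean(weight), .groups = "drop")

# No grouping variables remain on the result
group_vars(by_diet_time)
```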
Aggregating and calculating two summaries.
aggregate(weight ~ Diet, data = data, FUN = function(x) c(mean = mean(x), n = length(x)))
Diet weight.mean weight.n
1 1 102.6455 220.0000
2 2 122.6167 120.0000
3 3 142.9500 120.0000
4 4 135.2627 118.0000
group_by(data, Diet) %>% summarise(mean = mean(weight), n = length(weight))
# A tibble: 4 x 3
Diet mean n
<fctr> <dbl> <int>
1 1 102.6455 220
2 2 122.6167 120
3 3 142.9500 120
4 4 135.2627 118
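One difference is easy to miss in the printed output: when FUN returns a vector, aggregate() stores the results as a single matrix column called weight, not as two separate columns. That is also why the counts print as 220.0000 and not as integers. The sketch below shows this and one common way to flatten the result; the names res and flat are my own.

```r
data <- ChickWeight

res <- aggregate(weight ~ Diet, data = data,
                 FUN = function(x) c(mean = mean(x), n = length(x)))

# Only two columns: Diet and a two-column matrix named "weight"
ncol(res)
is.matrix(res$weight)

# Individual summaries are reached through the matrix
res$weight[, "mean"]

# Flatten into ordinary columns: Diet, weight.mean, weight.n
flat <- do.call(data.frame, res)
names(flat)
```

The dplyr version returns separate mean and n columns directly, so no flattening step is needed.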
Aggregating on a data subset.
aggregate(weight ~ Diet, data = subset(data, Diet!=1), mean)
Diet weight
1 2 122.6167
2 3 142.9500
3 4 135.2627
data %>%
  filter(Diet != 1) %>%
  group_by(Diet) %>%
  summarise(mean = mean(weight))
# A tibble: 3 x 2
Diet mean
<fctr> <dbl>
1 2 122.6167
2 3 142.9500
3 4 135.2627
Summary
I prefer the dplyr approach, which allows you to "pipe" or "chain" different functions. Once you learn the dplyr functions a.k.a. verbs, you can easily string together a nice pipeline.
data %>%
  filter(Diet != 1) %>%
  group_by(Diet) %>%
  summarise(mean = mean(weight)) %>%
  arrange(mean)
# A tibble: 3 x 2
Diet mean
<fctr> <dbl>
1 2 122.6167
2 4 135.2627
3 3 142.9500

This work is licensed under a Creative Commons
Attribution 4.0 International License.
I think plyr feels easier than dplyr
What if I want to aggregate a whole data set?
In your case you only aggregated weight.
Suppose I had other variables like height, BMI, etc.
How would I aggregate them in dplyr?
I guess there might be an easier way if you want to add a lot of variables, but it is possible to add them to the expression, separated by commas. In this case it would make sense to name each summary after its variable. That is, after "weight = mean(weight)" you could add ", height = mean(height)".
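For many columns, one possible shortcut is across(), assuming dplyr 1.0.0 or later. ChickWeight has no height or BMI column, so the sketch below uses weight and Time as stand-ins; the name means_by_diet is my own. The base R counterpart with cbind() on the left of the formula is shown for comparison.

```r
library(dplyr)

data <- ChickWeight

# Apply the same summary function to several columns at once
means_by_diet <- data %>%
  group_by(Diet) %>%
  summarise(across(c(weight, Time), mean))

means_by_diet

# Base R counterpart: cbind() the response columns in the formula
aggregate(cbind(weight, Time) ~ Diet, data = data, FUN = mean)
```

Each output column keeps the name of the input column it summarises.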
Can dplyr::summarise aggregate with FUN = "sum"?
Yes, you can use the sum function.
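A minimal sketch of what that could look like, next to the aggregate() version. The column name total and the object names are my own choices.

```r
library(dplyr)

data <- ChickWeight

# dplyr: any function that returns a single value works inside summarise()
totals <- data %>%
  group_by(Diet) %>%
  summarise(total = sum(weight))

# Base R: pass sum as FUN
base_totals <- aggregate(weight ~ Diet, data = data, FUN = sum)

totals
base_totals
```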