Merging multiple data frames

When I searched for “merging multiple data frames”, I got this hit as the top result. The post was a bit too terse for me, so here’s my expansion of merging multiple data frames in R.

First let’s create multiple data frames that we wish to merge together:

set.seed(31)
a <- data.frame(one=sample(1:10), two=runif(10,0,1))
set.seed(21)
b <- data.frame(one=sample(1:10), two=runif(10,0,1))
set.seed(11)
c <- data.frame(one=sample(1:10), two=runif(10,0,1))

head(a)
#  one        two
#1   6 0.32825976
#2   9 0.83023225
#3   4 0.65186753
#4   3 0.01410936
#5  10 0.82100377
#6   7 0.11513744

head(b)
#  one        two
#1   8 0.66754012
#2   3 0.93521022
#3   6 0.05818433
#4   2 0.61861583
#5  10 0.17491846
#6   5 0.03767539

head(c)
#  one       two
#1   3 0.1751129
#2   1 0.4407503
#3   5 0.9071830
#4   9 0.8510419
#5   7 0.7339875
#6   8 0.5736857

We now have three data frames: a, b, and c. I want to merge them based on the first column, which is not in the same order in each data frame.

#if a, b, and c were in the same order
#things would be much easier
cbind(a,b$two,c$two)
#   one        two      b$two     c$two
#1    6 0.32825976 0.66754012 0.1751129
#2    9 0.83023225 0.93521022 0.4407503
#3    4 0.65186753 0.05818433 0.9071830
#4    3 0.01410936 0.61861583 0.8510419
#5   10 0.82100377 0.17491846 0.7339875
#6    7 0.11513744 0.03767539 0.5736857
#7    8 0.10124820 0.52531317 0.4817655
#8    1 0.47392895 0.28218425 0.3306110
#9    2 0.23244771 0.49904520 0.1576602
#10   5 0.10766965 0.63382510 0.4801341

#if I wanted to merge a and b
#I could use the merge() function
merge(a,b, by='one')
#   one      two.x      two.y
#1    1 0.47392895 0.52531317
#2    2 0.23244771 0.61861583
#3    3 0.01410936 0.93521022
#4    4 0.65186753 0.28218425
#5    5 0.10766965 0.03767539
#6    6 0.32825976 0.05818433
#7    7 0.11513744 0.49904520
#8    8 0.10124820 0.66754012
#9    9 0.83023225 0.63382510
#10  10 0.82100377 0.17491846

#using the reshape and reshape2 package
library(reshape)
library(reshape2)
list_df <- list(a,b,c)
merged_df <- merge_all(list_df)
head(merged_df)
#   one       two
#8    1 0.4739290
#17   1 0.5253132
#22   1 0.4407503
#9    2 0.2324477
#14   2 0.6186158
#29   2 0.1576602

#merging done in the order of a, b, c
#check below results with head(merged_df)
a[a$one==1,]
#  one      two
#8   1 0.473929

b[b$one==1,]
#  one       two
#7   1 0.5253132

c[ c$one == 1, ]
#  one       two
#2   1 0.4407503

Now what if I wanted the merged dataset in a data frame like object?

blah <- data.frame(one=merged_df$one, class=rep(c('a','b','c'), 10), two=merged_df$two)
head(blah)
#  one class       two
#1   1     a 0.4739290
#2   1     b 0.5253132
#3   1     c 0.4407503
#4   2     a 0.2324477
#5   2     b 0.6186158
#6   2     c 0.1576602

abc <- dcast(blah, one ~ class)
#Using two as value column: use value.var to override.
abc
#   one          a          b         c
#1    1 0.47392895 0.52531317 0.4407503
#2    2 0.23244771 0.61861583 0.1576602
#3    3 0.01410936 0.93521022 0.1751129
#4    4 0.65186753 0.28218425 0.3306110
#5    5 0.10766965 0.03767539 0.9071830
#6    6 0.32825976 0.05818433 0.4817655
#7    7 0.11513744 0.49904520 0.7339875
#8    8 0.10124820 0.66754012 0.5736857
#9    9 0.83023225 0.63382510 0.8510419
#10  10 0.82100377 0.17491846 0.4801341

#visualise http://davetang.org/muse/2012/04/09/creating-a-matrix-of-scatter-plots-in-r/
pairs(~a+b+c, abc)

merge_df_pairsPaired scatter plots of a, b, and c using the merged data frame.

See also

Tips on merging data frames.

Print Friendly, PDF & Email



Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.
Posted in RTagged

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.