An adjacency list is simply an unordered list that describes connections between vertices. It’s a commonly used input format for graphs. In this post, I use the melt() function from the reshape2 package to create an adjacency list from a correlation matrix. I use the geneData dataset, which consists of real but anonymised microarray expression data, from the Biobase package as an example. Finally, I’ll show some features of the igraph package.

In the first section, I will load the dataset, calculate the correlations, and finally create the adjacency list.

# install if necessary # source("https://bioconductor.org/biocLite.R") # biocLite("Biobase") library(Biobase) data(geneData, package = "Biobase") dim(geneData) [1] 500 26 # correlation of samples my_cor_matrix <- cor(geneData) # symmetrical matrix my_cor_matrix[1:6, 1:6] A B C D E F A 1.0000000 0.9471591 0.9191233 0.9403129 0.9689690 0.9621362 B 0.9471591 1.0000000 0.8888108 0.9108174 0.9703906 0.9425248 C 0.9191233 0.8888108 1.0000000 0.8788107 0.9077245 0.9078971 D 0.9403129 0.9108174 0.8788107 1.0000000 0.9477058 0.9017977 E 0.9689690 0.9703906 0.9077245 0.9477058 1.0000000 0.9531459 F 0.9621362 0.9425248 0.9078971 0.9017977 0.9531459 1.0000000 dim(my_cor_matrix) [1] 26 26 # replace upper triangle of the matrix # with number that can be used for filtering my_cor_matrix[upper.tri(my_cor_matrix)] <- 42 # install if necessary # install.packages('reshape2') library(reshape2) my_cor_df <- melt(my_cor_matrix) head(my_cor_df) Var1 Var2 value 1 A A 1.0000000 2 B A 0.9471591 3 C A 0.9191233 4 D A 0.9403129 5 E A 0.9689690 6 F A 0.9621362 dim(my_cor_df) [1] 676 3 # install if ncessary # install.packages('dplyr') library(dplyr) # filter out the upper matrix values # filter out the self correlations my_cor_df <- filter(my_cor_df, value != 42) %>% filter(Var1 != Var2) dim(my_cor_df) [1] 325 3 head(my_cor_df) Var1 Var2 value 1 B A 0.9471591 2 C A 0.9191233 3 D A 0.9403129 4 E A 0.9689690 5 F A 0.9621362 6 G A 0.9529811 # all of the samples are positively correlated to each other summary(my_cor_df$value) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.6645 0.8945 0.9284 0.9140 0.9522 0.9923 # create adjacency list with correlations > 0.95 my_adj_list <- my_cor_df %>% filter(value > 0.95) names(my_adj_list) <- c('from', 'to', 'weight') class(my_adj_list) [1] "data.frame" dim(my_adj_list) [1] 89 3

Next, I’ll just create some simple graphs using the igraph package.

# install if necessary # install.packages('igraph') library(igraph) # create igraph S3 object net <- graph.data.frame(my_adj_list, directed = FALSE) # store original margins orig_mar <- par()$mar # set new margins to limit whitespace in plot par(mar=rep(.1, 4)) # not much difference in the edge width given the values # but I included it for reference's sake plot(net, layout = layout_components(net), edge.width = E(net)$weight) plot(net, layout = layout_components(net), edge.width = E(net)$weight, vertex.shape="none")

*A simple graph without the node shape*.

The igraph package can detect communities or subgraphs using the Newman-Girvan algorithm.

# community detection based on edge betweenness (Newman-Girvan) ceb <- cluster_edge_betweenness(net) class(ceb) [1] "communities" plot(ceb, net) # community membership for each node membership(ceb) E F G I J K L M O P U V W Y Q S X T N A B D H 1 1 2 1 3 1 1 1 4 1 2 2 1 1 5 5 5 1 1 1 1 5 1 par(mar=orig_mar) dendPlot(ceb, mode="hclust")

*A graph with each node grouped by community membership*.

*A simple dendrogram created using hierarchical clustering*.

Lastly, the igraph package has various functions for showing the properties of your graph.

# the edges of your graph E(net) + 89/89 edges (vertex names): [1] E--A F--A G--A I--A J--A K--A L--A M--A O--A P--A U--A V--A W--A Y--A E--B I--B K--B P--B W--B Y--B M--D Q--D S--D X--D [25] E--F E--I E--K E--L E--M E--P E--T E--W E--X E--Y F--K F--M F--P F--W F--Y G--U G--V I--H K--H L--H P--H T--H W--H I--K [49] I--L I--M I--P I--Q I--T I--W I--Y K--L K--M K--N K--P K--Q K--T K--W K--Y L--M L--N L--P L--Q L--T L--W L--Y M--P M--S [73] M--T M--W M--X M--Y O--U O--Y P--T P--W P--Y Q--S Q--T Q--X S--X W--T X--T U--V W--Y # the vertices of your graph V(net) + 23/23 vertices, named: [1] E F G I J K L M O P U V W Y Q S X T N A B D H # the proportion of present edges from all possible edges in the network edge_density(net, loops = FALSE) [1] 0.3517787 # all nodes fully connected to every other node full_graph <- make_full_graph(23) edge_density(full_graph, loops = FALSE) [1] 1 # degree distribution of the vertices degree(net, mode="all") E F G I J K L M O P U V W Y Q S X T N A B D H 12 7 3 12 1 14 12 13 3 12 4 3 12 11 7 4 6 10 2 14 6 4 6 # cliques length(cliques(net)) [1] 965 largest_cliques(net) [[1]] + 9/23 vertices, named: [1] K I L P W E M Y A largest_cliques(full_graph) [[1]] + 23/23 vertices: [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

If you are interested in the igraph package, I highly recommend this tutorial; you can start reading from “Networks in igraph”, though the introductory material is great too.

This work is licensed under a Creative Commons

Attribution 4.0 International License.