Matrix to adjacency list in R

An adjacency list is simply an unordered list that describes connections between vertices. It's a commonly used input format for graphs. In this post, I use the melt() function from the reshape2 package to create an adjacency list from a correlation matrix. I use the geneData dataset, which consists of real but anonymised microarray expression data, from the Biobase package as an example. Finally, I'll show some features of the igraph package.

In the first section, I will load the dataset, calculate the correlations, and finally create the adjacency list.

# install if necessary
# source("https://bioconductor.org/biocLite.R")
# biocLite("Biobase")

library(Biobase)
data(geneData, package = "Biobase")
dim(geneData)
[1] 500  26

# correlation of samples
my_cor_matrix <- cor(geneData)

# symmetrical matrix
my_cor_matrix[1:6, 1:6]
          A         B         C         D         E         F
A 1.0000000 0.9471591 0.9191233 0.9403129 0.9689690 0.9621362
B 0.9471591 1.0000000 0.8888108 0.9108174 0.9703906 0.9425248
C 0.9191233 0.8888108 1.0000000 0.8788107 0.9077245 0.9078971
D 0.9403129 0.9108174 0.8788107 1.0000000 0.9477058 0.9017977
E 0.9689690 0.9703906 0.9077245 0.9477058 1.0000000 0.9531459
F 0.9621362 0.9425248 0.9078971 0.9017977 0.9531459 1.0000000

dim(my_cor_matrix)
[1] 26 26

# replace upper triangle of the matrix
# with number that can be used for filtering
my_cor_matrix[upper.tri(my_cor_matrix)] <- 42

# install if necessary
# install.packages('reshape2')
library(reshape2)
my_cor_df <- melt(my_cor_matrix)

head(my_cor_df)
  Var1 Var2     value
1    A    A 1.0000000
2    B    A 0.9471591
3    C    A 0.9191233
4    D    A 0.9403129
5    E    A 0.9689690
6    F    A 0.9621362

dim(my_cor_df)
[1] 676   3

# install if ncessary
# install.packages('dplyr')
library(dplyr)

# filter out the upper matrix values
# filter out the self correlations
my_cor_df <- filter(my_cor_df, value != 42) %>% filter(Var1 != Var2)
dim(my_cor_df)
[1] 325   3

head(my_cor_df)
  Var1 Var2     value
1    B    A 0.9471591
2    C    A 0.9191233
3    D    A 0.9403129
4    E    A 0.9689690
5    F    A 0.9621362
6    G    A 0.9529811

# all of the samples are positively correlated to each other
summary(my_cor_df$value)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.6645  0.8945  0.9284  0.9140  0.9522  0.9923

# create adjacency list with correlations > 0.95
my_adj_list <- my_cor_df %>% filter(value > 0.95)
names(my_adj_list) <- c('from', 'to', 'weight')

class(my_adj_list)
[1] "data.frame"

dim(my_adj_list)
[1] 89  3

Next, I'll just create some simple graphs using the igraph package.

# install if necessary
# install.packages('igraph')
library(igraph)

# create igraph S3 object
net <- graph.data.frame(my_adj_list, directed = FALSE)

# store original margins
orig_mar <- par()$mar

# set new margins to limit whitespace in plot
par(mar=rep(.1, 4))

# not much difference in the edge width given the values
# but I included it for reference's sake
plot(net, layout = layout_components(net), edge.width = E(net)$weight)
plot(net, layout = layout_components(net), edge.width = E(net)$weight, vertex.shape="none")

A simple graph.

A simple graph without the node shape.

The igraph package can detect communities or subgraphs using the Newman-Girvan algorithm.

# community detection based on edge betweenness (Newman-Girvan)
ceb <- cluster_edge_betweenness(net)
class(ceb)
[1] "communities"

plot(ceb, net)

# community membership for each node
membership(ceb)
E F G I J K L M O P U V W Y Q S X T N A B D H 
1 1 2 1 3 1 1 1 4 1 2 2 1 1 5 5 5 1 1 1 1 5 1 

par(mar=orig_mar)
dendPlot(ceb, mode="hclust")

A graph with each node grouped by community membership.

A simple dendrogram created using hierarchical clustering.

Lastly, the igraph package has various functions for showing the properties of your graph.

# the edges of your graph
E(net)
+ 89/89 edges (vertex names):
 [1] E--A F--A G--A I--A J--A K--A L--A M--A O--A P--A U--A V--A W--A Y--A E--B I--B K--B P--B W--B Y--B M--D Q--D S--D X--D
[25] E--F E--I E--K E--L E--M E--P E--T E--W E--X E--Y F--K F--M F--P F--W F--Y G--U G--V I--H K--H L--H P--H T--H W--H I--K
[49] I--L I--M I--P I--Q I--T I--W I--Y K--L K--M K--N K--P K--Q K--T K--W K--Y L--M L--N L--P L--Q L--T L--W L--Y M--P M--S
[73] M--T M--W M--X M--Y O--U O--Y P--T P--W P--Y Q--S Q--T Q--X S--X W--T X--T U--V W--Y

# the vertices of your graph
V(net)
+ 23/23 vertices, named:
 [1] E F G I J K L M O P U V W Y Q S X T N A B D H

# the proportion of present edges from all possible edges in the network
edge_density(net, loops = FALSE)
[1] 0.3517787

# all nodes fully connected to every other node
full_graph <- make_full_graph(23)
edge_density(full_graph, loops = FALSE)
[1] 1

# degree distribution of the vertices
degree(net, mode="all")
 E  F  G  I  J  K  L  M  O  P  U  V  W  Y  Q  S  X  T  N  A  B  D  H 
12  7  3 12  1 14 12 13  3 12  4  3 12 11  7  4  6 10  2 14  6  4  6

# cliques
length(cliques(net))
[1] 965

largest_cliques(net)
[[1]]
+ 9/23 vertices, named:
[1] K I L P W E M Y A

largest_cliques(full_graph)
[[1]]
+ 23/23 vertices:
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

If you are interested in the igraph package, I highly recommend this tutorial; you can start reading from "Networks in igraph", though the introductory material is great too.




Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.