My modification of code which is originally available here. Probably easier to understand the original code. I altered the code so that I could use an anonymous 2d array and with strictures, so that I could plug it into my own code.
Comments are included in the code below to assist use.
#!/usr/bin/perl
use strict;
use warnings;
#make up some matrix for demonstration purposes
my $x = [];
#generate 10 row
for (my $i=1;$i<11;++$i){
#generate 2 columns, where column 1 = $i and column 2 = $i
for (my $j=1;$j<3;++$j){
$x->[$i][$j]= $i;
}
}
#end matrix code
my $correlation = correlation($x);
#Since the values in the columns are identical in $x, the correlation will be 1
print "$correlation\n";
#to use this code, remove the matrix code above
#generate an anonymous 2D array where $x->[1] is the row
#$x->[1][1] is the value in row 1 column 1 and $x->[1][2] is the value of row 1 column 2
#once you build the entire array, pass it to the correlation subroutine as above
#my $corrleation = correlation($x)
#if you want to see what's inside $x use the code below
#for (my $i = 1; $i < scalar(@{$x}); ++$i){
# my $line_to_print = '';
# for (my $j = 1; $j < scalar(@{$x->[$i]}); ++$j){
# $line_to_print .= "$x->[$i]->[$j]\t";
# }
# $line_to_print =~ s/\t$//;
# print "$line_to_print\n";
#}
sub mean {
my ($x)=@_;
my $num = scalar(@{$x}) - 1;
my $sum_x = '0';
my $sum_y = '0';
for (my $i = 1; $i < scalar(@{$x}); ++$i){
$sum_x += $x->[$i][1];
$sum_y += $x->[$i][2];
}
my $mu_x = $sum_x / $num;
my $mu_y = $sum_y / $num;
return($mu_x,$mu_y);
}
### ss = sum of squared deviations to the mean
sub ss {
my ($x,$mean_x,$mean_y,$one,$two)=@_;
my $sum = '0';
for (my $i=1;$i<scalar(@{$x});++$i){
$sum += ($x->[$i][$one]-$mean_x)*($x->[$i][$two]-$mean_y);
}
return $sum;
}
sub correlation {
my ($x) = @_;
my ($mean_x,$mean_y) = mean($x);
my $ssxx=ss($x,$mean_x,$mean_y,1,1);
my $ssyy=ss($x,$mean_x,$mean_y,2,2);
my $ssxy=ss($x,$mean_x,$mean_y,1,2);
my $correl=correl($ssxx,$ssyy,$ssxy);
my $xcorrel=sprintf("%.4f",$correl);
return($xcorrel);
}
sub correl {
my($ssxx,$ssyy,$ssxy)=@_;
my $sign=$ssxy/abs($ssxy);
my $correl=$sign*sqrt($ssxy*$ssxy/($ssxx*$ssyy));
return $correl;
}

This work is licensed under a Creative Commons
Attribution 4.0 International License.
for (my $i=1;$i[$i][$one]-$mean_x)*($x->[$i][$two]-$mean_y);
}
…. ?
Thanks Mark. The code wasn’t showing properly before.
No problem.
I also made re-write of the original code because I kept getting correl=1.00 for everything when I tried to do this myself. The code is here:
http://pastebin.com/i5m4qa99
Thanks for sharing 🙂
Can you please tell me about the format of both input files. query and annotaion files
note: I shift off the first element of the arrays because they contained array id’s in my data files.
Can one also get a p-value from this?
Hi Roy,
The code will only return the Pearson correlation coefficient.
Cheers,
Dave
Shouldn’t this:
my $ssxx=ss($x,$mean_x,$mean_y,1,1);
my $ssyy=ss($x,$mean_x,$mean_y,2,2);
be this:
my $ssxx=ss($x,$mean_x,$mean_x,1,1);
my $ssyy=ss($x,$mean_y,$mean_y,2,2);
That’s how it looks to be in your original code.
Rob
Hi Rob,
When you said “your original code”, did you mean the link I reference right at the start of this post (http://homepages.ulb.ac.be/~dgonze/SCRIPTS/PERL/correlation.pl)? If so, the code was written by someone else; I merely just modified it so I get the correlation value by passing an array reference.
If you were referring to that particular original code, then here’s my answer. The original code did not use strictures (i.e. use strict) and thus the ss() function could access the values of $mean[1] and $mean[2] globally. What I did was that I confined the mean values within the correlation() function, thus the ss() function is unable to access the values. Therefore I have to pass the mean values to via ss($x,$mean_x,$mean_y,1,1) and the order of $mean_x and $mean_y would be same each time I call the ss() function.
Cheers,
Dave