Calculating Pearson correlation using Perl

My modification of code which is originally available here. Probably easier to understand the original code. I altered the code so that I could use an anonymous 2d array and with strictures, so that I could plug it into my own code.

Comments are included in the code below to assist use.

#!/usr/bin/perl

use strict;
use warnings;

#make up some matrix for demonstration purposes
my $x = [];
#generate 10 row
for (my $i=1;$i<11;++$i){
   #generate 2 columns, where column 1 = $i and column 2 = $i
   for (my $j=1;$j<3;++$j){
      $x->[$i][$j]= $i;
   }
}
#end matrix code

my $correlation = correlation($x);
#Since the values in the columns are identical in $x, the correlation will be 1
print "$correlation\n";

#to use this code, remove the matrix code above
#generate an anonymous 2D array where $x->[1] is the row
#$x->[1][1] is the value in row 1 column 1 and $x->[1][2] is the value of row 1 column 2
#once you build the entire array, pass it to the correlation subroutine as above
#my $corrleation = correlation($x)

#if you want to see what's inside $x use the code below
#for (my $i = 1; $i < scalar(@{$x}); ++$i){
#   my $line_to_print = '';
#   for (my $j = 1; $j < scalar(@{$x->[$i]}); ++$j){
#      $line_to_print .= "$x->[$i]->[$j]\t";
#   }
#   $line_to_print =~ s/\t$//;
#   print "$line_to_print\n";
#}

sub mean {
   my ($x)=@_;
   my $num = scalar(@{$x}) - 1;
   my $sum_x = '0';
   my $sum_y = '0';
   for (my $i = 1; $i < scalar(@{$x}); ++$i){
      $sum_x += $x->[$i][1];
      $sum_y += $x->[$i][2];
   }
   my $mu_x = $sum_x / $num;
   my $mu_y = $sum_y / $num;
   return($mu_x,$mu_y);
}

### ss = sum of squared deviations to the mean
sub ss {
   my ($x,$mean_x,$mean_y,$one,$two)=@_;
   my $sum = '0';
   for (my $i=1;$i<scalar(@{$x});++$i){
     $sum += ($x->[$i][$one]-$mean_x)*($x->[$i][$two]-$mean_y);
   }
   return $sum;
}

sub correlation {
   my ($x) = @_;
   my ($mean_x,$mean_y) = mean($x);
   my $ssxx=ss($x,$mean_x,$mean_y,1,1);
   my $ssyy=ss($x,$mean_x,$mean_y,2,2);
   my $ssxy=ss($x,$mean_x,$mean_y,1,2);
   my $correl=correl($ssxx,$ssyy,$ssxy);
   my $xcorrel=sprintf("%.4f",$correl);
   return($xcorrel);

}

sub correl {
   my($ssxx,$ssyy,$ssxy)=@_;
   my $sign=$ssxy/abs($ssxy);
   my $correl=$sign*sqrt($ssxy*$ssxy/($ssxx*$ssyy));
   return $correl;
}
Print Friendly, PDF & Email



Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.
10 comments Add yours
  1. Shouldn’t this:

    my $ssxx=ss($x,$mean_x,$mean_y,1,1);
    my $ssyy=ss($x,$mean_x,$mean_y,2,2);

    be this:

    my $ssxx=ss($x,$mean_x,$mean_x,1,1);
    my $ssyy=ss($x,$mean_y,$mean_y,2,2);

    That’s how it looks to be in your original code.

    Rob

    1. Hi Rob,

      When you said “your original code”, did you mean the link I reference right at the start of this post (http://homepages.ulb.ac.be/~dgonze/SCRIPTS/PERL/correlation.pl)? If so, the code was written by someone else; I merely just modified it so I get the correlation value by passing an array reference.

      If you were referring to that particular original code, then here’s my answer. The original code did not use strictures (i.e. use strict) and thus the ss() function could access the values of $mean[1] and $mean[2] globally. What I did was that I confined the mean values within the correlation() function, thus the ss() function is unable to access the values. Therefore I have to pass the mean values to via ss($x,$mean_x,$mean_y,1,1) and the order of $mean_x and $mean_y would be same each time I call the ss() function.

      Cheers,

      Dave

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.