Equivalents in R, Python and Perl

Last update 2018 May 24th

Perl was used by many computational biologists in the early 2000s. Its popularity may have been driven by its involvement with the Human Genome Project. An article titled “How Perl Saved the Human Genome Project” explains why Perl was a good fit for computational biology projects (and also lists some of its disadvantages).

However, the use of Perl in open-source projects has been declining since 2005, whereas Python has been on the rise. One major reason for Python’s popularity is its clean syntax. Here’s a more comprehensive article on its popularity, which has a longer discussion on the syntax of Python.

Another language on the rise is R, which has become very popular in the big data era. R has many packages for performing statistical analyses and creating visualisations, which makes it well suited to data analysis. In addition, the Bioconductor project has made R very popular in computational biology, since many bioinformatics tools are available (often exclusively) as R packages.

If I could go back in time, I would dedicate my time to learning Python instead of Perl. That said, it’s not too late! I initially wrote this guide back in 2012 when I wanted to learn both Python and R at the same time. I now have a much better grasp of R, so I’m updating this guide with a focus on learning more about Python. If you are interested in a web platform for learning Python (and/or R), I would recommend using DataCamp. In addition, I have created a GitHub repository containing Jupyter notebooks that I have written while learning Python.

Finally, I’d just like to point out that the Python equivalents in this post may not be the ideal choice since I am still in the process of learning. As for Perl, there will be several ways of accomplishing the same task, since “There’s more than one way to do it” is the motto of Perl.

Before we begin

Here are the versions of R, Python, and Perl that I used in the examples.

Rscript --version
R scripting front-end version 3.4.4 (2018-03-15)

python --version
Python 3.6.4 :: Anaconda custom (64-bit)

perl --version

This is perl 5, version 24, subversion 2 (v5.24.2) built for darwin-thread-multi-2level
(with 1 registered patch, see perl -V for more detail)
...output trimmed...

For Perl, the examples are written with strictness (use strict) in mind, which requires variables to be declared before use (for example with my) and so ensures that variables are not used out of scope.

R Markdown

I am using R Markdown and knitr to document and execute the R, Python, and Perl code since knitr can execute chunks of code written in Perl and Python. I found this to be a great way of documenting and running the code; I’ll share this document soon on my learning Python repository.

Scripting

Here are the equivalent shebangs for writing an R, Python, or Perl script.

Saved as hello.R.

#!/usr/bin/env Rscript

cat("Hello world!\n")

Saved as hello.py.

#!/usr/bin/env python

print("Hello world!")

Saved as hello.pl.

#!/usr/bin/env perl

print("Hello world!\n")

Data structures

In Perl there are three main data structures: scalars, arrays, and hashes. In Python, there are scalars, arrays and lists, and dictionaries (and many others). In R, there are no scalars, just vectors of length one. There is a Perl hash-like implementation in R.
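As a rough orientation before the sections below, here is a minimal Python sketch of the three kinds of structure mentioned above. The variable names are made up for illustration, and "scalar" is used loosely here for a single value such as an int, float, or string.

```python
# a single value ("scalar" in Perl terms)
count = 42

# an ordered collection (compare a Perl array or an R vector)
names = ['alpha', 'beta', 'gamma']

# key/value pairs (compare a Perl hash)
lookup = {'alpha': 1, 'beta': 2}

# type() reports which built-in type each value has
kinds = [type(x).__name__ for x in (count, names, lookup)]
print(kinds)
```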

Arrays

In R a vector is different from an array; the array structure allows for multidimensional arrays.

a <- array(1:5)
a
[1] 1 2 3 4 5

for(i in a){
  print(i)
}

In Python, you need the array module from the standard library to use arrays, or you can use the popular NumPy library.

from array import array
a = array('l', [1, 2, 3, 4, 5])

for i in a:
   print(i)

# or use numpy
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = a/10

print(type(b))
# <class 'numpy.ndarray'>

print(b)
# [0.1 0.2 0.3 0.4 0.5]
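In everyday Python, the built-in list is probably the closest counterpart to a Perl array and needs no import. The following is a small sketch of the same example using a plain list; since lists do not support element-wise arithmetic, a list comprehension is used for the division.

```python
# a plain list needs no imports
a = [1, 2, 3, 4, 5]

for i in a:
    print(i)

# lists have no element-wise division, so use a list comprehension
b = [x / 10 for x in a]
print(b)

# indexing is zero-based, as in Perl (R is one-based)
first = a[0]
# negative indices count from the end
last = a[-1]
```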

In Perl:

my @a = (1, 2, 3, 4, 5);
for my $i (@a) {
   print "$i\n";
}

Associative arrays

In R use the hash package.

library(hash)
my_hash <- hash(keys = c('colour', 'book', 'movie'),
                values = c('blue', '1984', 'Memento'))

my_hash$movie

my_hash$movie <- 'Inception'
my_hash$movie
del('book', my_hash)

keys(my_hash)
values(my_hash)

my_hash

In Python:

# avoid naming the variable dict, which would shadow the built-in type
my_dict = {'colour': 'blue', 'book': '1984', 'movie': 'Memento'}
print(my_dict['movie'])

my_dict['movie'] = 'Inception'
print(my_dict['movie'])

del my_dict['movie']

print(my_dict.keys())
print(my_dict.values())

print(my_dict)
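A few other dictionary operations are handy when coming from Perl hashes. This is a short sketch using a separate, made-up dictionary called favourites: items() loops over key/value pairs, get() supplies a default for a missing key, and the in operator plays the role of Perl's exists.

```python
favourites = {'colour': 'blue', 'book': '1984', 'movie': 'Memento'}

# items() yields (key, value) pairs
pairs = []
for key, value in favourites.items():
    pairs.append(key + '=' + value)
print(pairs)

# get() returns a default instead of raising KeyError for a missing key
missing = favourites.get('song', 'unknown')

# membership test on the keys, similar to exists $hash{book} in Perl
has_book = 'book' in favourites
```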

In Perl:

my %hash = ('colour' => 'blue', 'book' => '1984', 'movie' => 'Memento');
print("$hash{movie}\n");

$hash{'movie'} = 'Inception';
print("$hash{movie}\n");

delete($hash{'movie'});

print "$_\n" for keys %hash;
print "$_\n" for values %hash;

print(%hash, "\n")

Loops

For loops

As mentioned in the comments by Pablo and Jason (and as I found out later from experience), avoid large loops in R where you can and use apply() or one of its variants, such as sapply(), lapply(), or mapply(). R is a vectorised language, and large for loops are very slow compared to the vectorised equivalent.

# prints 0 to 9
for (i in 0:9){
  print(i)
}

# using an index
index <- c('one', 'two', 'three')
for (i in index){
  print(i)
}

# example from Pablo
sapply(0:9, FUN = function(x) x )

# load the purrr package
library(purrr)

0:9 %>% map(function(x) x) %>% unlist()

In Python (indentation is required as part of the language):

# prints 0 to 9
for num in range(0, 10):
   print(num)

# using a list
my_list = [1, 2, 3, 4, 5, 6]
for i in my_list:
    print(i)
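Python has its own counterparts to applying a function over a range of values, as sapply() does in the R example. The sketch below shows a list comprehension, the built-in map(), and enumerate() for when both the position and the value are needed; the variable names are illustrative only.

```python
# list comprehension: build a new list by applying an expression to each element
squares = [x * x for x in range(10)]
print(squares)

# map() applies a function to each element; list() collects the results
doubled = list(map(lambda x: x * 2, range(10)))
print(doubled)

# enumerate() gives the position and the value together
labels = ['one', 'two', 'three']
indexed = [(i, label) for i, label in enumerate(labels)]
print(indexed)
```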

In Perl:

for (my $i = 0; $i < 10; ++$i){
   print "$i\n";
}

for my $i (0 .. 9) {
   print "$i\n";
}

While loops

In R:

i <- 1
while (i < 10){
  print(i)
  i <- i + 1
}

In Python:

count = 0
while count < 9:
    print(count)
    count += 1

In Perl:

my $i = 0;
while ($i < 10) {
   print ++$i, "\n";
}

Defining functions

In R:

multiply_by_two <- function(n) {
   n*2
}
multiply_by_two(2)
#4

In Python (see http://docs.python.org/release/1.5.1p1/tut/functions.html):

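# The keyword def introduces a function definition
def multiply_by_two(n):
    # return the value (as the R and Perl versions do) rather than printing it
    return n * 2

# print is a function in Python 3
print(multiply_by_two(2))
# 4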

In Perl:

sub multiply_by_two {
   my ($n) = @_;
   #last evaluation is returned by default
   $n*2;
}
print multiply_by_two(2), "\n";

Conditionals

In R:

greater_less_equal_5 <- function(answer){
  if (answer > 5){
    print(1)
  }
  else if (answer < 5){
    print(-1)
  } else {
    print(0)
  }
}
greater_less_equal_5(4)
greater_less_equal_5(5)
greater_less_equal_5(6)

In Python:

def greater_less_equal_5(answer):
    if answer > 5:
        return 1
    elif answer < 5:
        return -1
    else:
        return 0
        
# print is a function in Python 3
print(greater_less_equal_5(4))
print(greater_less_equal_5(5))
print(greater_less_equal_5(6))

In Perl:

sub greater_less_equal_5 {
   # declare with my so the code runs under use strict
   my ($answer) = @_;
   if ($answer > 5){
      return(1)
   } elsif ($answer < 5){
      return(-1)
   } else {
      return(0)
   }
}

print greater_less_equal_5(4), "\n";
print greater_less_equal_5(5), "\n";
print greater_less_equal_5(6), "\n";

Objects

Everything in R is treated as an object. R has three object-oriented (OO) systems: S3, S4 and R5 (Reference Classes). The example below uses S3, where the basic idea is that a constructor function creates a list and then sets its class attribute.

NorthAmerican <- function(eatsBreakfast=TRUE,myFavorite="cereal"){
   me <- list(hasBreakfast = eatsBreakfast,
              favoriteBreakfast = myFavorite
             )
   ## Set the name for the class
   class(me) <- append(class(me),"NorthAmerican")
   return(me)
}

bubba <- NorthAmerican()

In Python, the simplest form of class definition looks like this:

class ClassName:
    <statement-1>
    .
    .
    .
    <statement-N>

class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'
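To show how such a class is used, here is a minimal sketch that repeats the MyClass definition so the snippet runs on its own, then creates an instance and accesses its attribute and method. The names obj, value, and greeting are made up for illustration.

```python
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'

# calling the class creates a new instance
obj = MyClass()

# class attributes are reachable through the instance
value = obj.i

# self is passed automatically when a method is called on an instance
greeting = obj.f()

print(value, greeting)
```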

From perlootut:

An object is a data structure that bundles together data and subroutines which operate on that data. An object’s data is called attributes, and its subroutines are called methods. An object can be thought of as a noun (a person, a web service, a computer). A class defines the behaviour of a category of objects. A class is a name for a category (like “File”), and a class also defines the behaviour of objects in that category; all objects belong to a specific class.

In Perl most objects are hashes and a Perl class is defined in a Perl module file (*.pm) with a package declaration. Here’s an example:

package Shape;

sub new {
    my $class = shift;
    my $self = {
        color  => 'black',
        length => 1,
        width  => 1,
    };
    return bless $self, $class;
}

1;

And a script:

#!/usr/bin/env perl

use strict;
use warnings;
use Shape;

# create a new Shape object
my $shape = Shape->new;

# print the shape object attributes
print join("\t", $shape->{color}, $shape->{length}, $shape->{width}), "\n";

exit(0);
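For comparison, the Perl Shape class and script above could be sketched in Python roughly as follows. This is my own approximation rather than a line-by-line translation: __init__ plays the role of new, instance attributes replace the blessed hash, and the keyword defaults are an assumption added so that Shape() gives the same values as the Perl constructor.

```python
class Shape:
    """A shape with a colour, length and width."""

    def __init__(self, color='black', length=1, width=1):
        # instance attributes, like the keys of the blessed hash in Perl
        self.color = color
        self.length = length
        self.width = width

# create a new Shape object with the default attributes
shape = Shape()

# print the shape object attributes, tab-separated
line = "\t".join(str(v) for v in (shape.color, shape.length, shape.width))
print(line)
```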

Notes

As suggested in the comments, R is not optimised for loops so please see the link in the comments section on how to use the apply() function.

Will be continually updated…




This work is licensed under a Creative Commons Attribution 4.0 International License.
9 comments
  1. Want to give you some better Perl examples. In Perl you actually don’t use the 3 part for loop that much and a for loop over a range should be the following to be equivalent to the other languages:

    #!/usr/bin/perl
    for $i (0 .. 9) {
    print "$i\n";
    }

    In the while loop example merge the two lines of code in the loop:

    #!/usr/bin/perl
    $i = 0;
    while ($i < 10) {
    print ++$i, "\n";
    }

    The array example would be

    #!/usr/bin/perl
    @a = (1,2,3,4,5);
    for $i (@a) {
    print "$i\n";
    }

    1. I wrote the 3 part for loop for people that may be coming from C. But surely I think it is better to list them both. Thank you for the suggestions.

  2. Avoid big loops if you can in R. It is optimised for vectorisation (like Numpy) and really bad at loops. Instead use the various apply statements. R being a functional language is a natural Map/Reduce platform.

          1. Thanks for your blog. I enjoy it very much.
            I also bumped into R while studying data science and I use it quite often now.

            Just to smooth the move from your original basic examples to the advanced material in the post by Neil Saunders (referred to here by Dave) allow me to simplify further the R code that does a for loop printing 0 to 9 to the screen.

            For that let’s use the functional programming style that R is more efficient at.
            The philosophy is to apply a function over a range of values described by a vector. It looks like this using the simple apply function:

            > sapply( X=0:9, FUN=function(x) x )
            [1] 0 1 2 3 4 5 6 7 8 9

            or even like this if you strip the optional parameter labels of sapply:

            > sapply(0:9, function(x) x )
            [1] 0 1 2 3 4 5 6 7 8 9

            Thanks again

            1. Thanks Pablo; glad you found the blog enjoyable 🙂 I now avoid loops whenever possible in R, after learning a bit more about R since writing this post.
