Combinations and permutations in R

Time to get another concept under my belt, combinations and permutations. While I'm at it, I will examine combinations and permutations in R. As you may recall from school, a combination does not take into account the order, whereas a permutation does. Using the example from my favourite website as of late, mathsisfun.com:

  • A fruit salad is a combination of apples, bananas and grapes, since it's the same fruit salad regardless of the order of fruits
  • To open a safe you need the right order of numbers, thus the code is a permutation

As a matter of fact, a permutation is an ordered combination. There are basically two types of permutations, with repetition (or replacement) and without repetition (without replacement).

Permutations with repetition

The number of permutations with repetition (or with replacement) is simply calculated by:

$latex n^r &s=2 $

where n is the number of things to choose from, r number of times.

For example, you have a urn with a red, blue and black ball. If you choose two balls with replacement/repetition, there are $latex 3^2 $ permutations: {red, red}, {red, blue}, {red, black}, {blue, red}, {blue, blue}, {blue, black}, {black, red}, {black, blue}, and {black, black}. In R:

https://gist.github.com/davetang/6485079

A biological example of this are all the possible codon permutations. We have 4 choices (A, C, G, and T) and we are choosing 3 nucleotides: $latex n^r = 4^3 = 64 $ (though some permutations code for the same amino acid). How about the number of 11-mers that are possible: $latex n^r = 4^{11} = 4194304 $. For miRNAs of size 21, there are $latex 4^{21} = 4398046511104 $ possible miRNAs, but since the majority of these 21-mers are probably not biologically relevant or even possible, they most likely don't exist.

Permutations without repetition

Calculating permutations without repetition/replacement, just means that for cases where r > 1, n gets smaller after each pick. For example, if we choose two balls from the urn with the red, blue, and black ball but without repetition/replacement, the first pick has 3 choices and the second pick has 2 choices: {red, blue}, {red, black}, {blue, red}, {blue, black}, {black, red} and {black, blue}. In R:

https://gist.github.com/davetang/6485110

Let's use an example with more choices, say choosing pool balls. If we chose 3 balls without replacement (out of a total of 16), the number of permutations would be:

$latex 16 \times 15 \times 14 = 3360. &s=2 $

How do we reach a formula for this? If we chose all 16 balls, there are this many permutations:

$latex 16 \times 15 \times 14 \times \ldots = 20922789888000. &s=2 $

Since we want $latex 16 \times 15 \times 14 $, we can perform this division:

$latex \frac{16 \times 15 \times 14 \times 13 \times 12 \ldots}{13 \times 12 \times \ldots} = 16 \times 15 \times 14 = 3360 &s=2 $

which is $latex 16! / 13!. $ The formula is: the total number of permutations without replacement for $latex n $ divided by the factorial of the subtraction of the total number of possible choices ($latex n $) and the number of choices we make ($latex r $); note that $latex 0! = 1 $:

$latex \frac{n!}{(n - r)!} &s=2 $

I wrote this formula as a function in R:

https://gist.github.com/davetang/6485552

Combinations without repetition

By now you've probably heard of induced Pluripotent Stem Cells (iPSCs), which are a type of pluripotent stem cell artificially derived from a non-pluripotent cell through the forced expression of four specific transcription factors (TFs). This discovery was made by Yamanaka-sensei and his team. Prior to the discovery, Yamanaka-sensei and his team investigated 24 TFs known to be important in early embryo development and tested different combinations of these TFs. How many combinations are there for a set of 4 TFs?

Since the TFs are used once, the number of combinations are without repetition/replacement and the order does not matter; all four TFs just need to be added. Since combinations are order agnostic, there will be less combinations than permutations under the same scenario. Let's calculate the number of permutations to get an upper bound:

$latex \frac{n!}{(n - r)!} = \frac{24!}{(24 - 4)!} = 255024 &s=2 $

As pointed out above, many of these permutations are the same combination. Actually, for 4 TFs, there are $latex 4 \times 3 \times 2 \times 1 = 24 $ permutations that are the same combination. So we should reduce the number of permutations, by 24, to arrive at the number of combinations: which is $latex 255024 / 24 = 10626. $ In R, these is already a built in function for this called choose():

#calculate the number of combinations without replacement/repetition
choose(n=24,k=4)
[1] 10626

The formula, for the number of combinations without repetition/replacement, would be very similar to working out the number of permutations without repetition/replacement; it is simply the same formula but decreased by the number of size r permutations without replacement/repetition:

$latex \frac{n!}{r! (n - r)!} &s=2 $

where n is the number of things to choose from, choosing r of them. For the above example:

$latex \frac{n!}{r! (n - r)!} = \frac{24!}{4! (24 - 4)!} = \frac{24!}{4!20!} = \frac{24 \times 23 \times 22 \times 21}{24} = 10626. &s=2 $

Combinations with repetition

This is actually quite complicated to figure out intuitively. To get started, let's use the urn with the 3 balls, a red, blue and black ball. If we choose 3 balls with replacement, the possible combinations are:

  • 3 colours: {red, red, red}, {blue, blue, blue}, {black, black, black}
  • 2 red: {red, red, blue}, {red, red, black}
  • 2 blue: {blue, blue, red}, {blue, blue, black}
  • 2 black: {black, black, red}, {black, black, blue}
  • 1 of each: {red, blue, black}
  • scenarios with 1 red or 1 blue or 1 black are redundant, i.e. already listed above

which equals 10 possible combinations. The first thing I thought about was, with repetition/replacement, how many permutations are there for choosing 3 balls in a urn with 3 balls?

$latex n^r = 3^3 = 27. &s=2 $

How many combinations are there without repetition/replacement?

$latex \frac{n!}{r!(n - r)!} = \frac{3!}{3!} = 1. &s=2 $

So the answer is in between 1 and 27 (it is 10) but the question is how do we remove the redundant combinations from the permutations with repetition (the 27 cases)?

On mathsisfun.com, they used a special technique to work this out. Instead of balls in a urn, they considered scoops of ice cream per customer order. Imagine, there are 3 tubs of ice cream in succession, chocolate, vanilla and strawberry and there is a robot that moves from left to right and can also perform scoops. If a customer orders 3 scoops of vanilla, the robot will move once to the right, make 3 scoops and then once to the right to reach the last tub. To represent this in symbols: -> S S S ->, where S = scoop and -> means move right once. If the customer ordered a Neapolitan ice cream, it would be S -> S -> S. So now the question is how many arrangements of scoops and arrows can we have? If we list them all out, where chocolate = c, vanilla = v, and strawberry = s:

  1. c, c, c: S S S -> ->
  2. c, c, v: S S -> S ->
  3. c, c, s: S S -> -> S
  4. v, v, v: -> S S S ->
  5. c, v, v: S -> S S ->
  6. v, v, s: -> S S -> S
  7. s, s, s: -> -> S S S
  8. c, s, s: S -> -> S S
  9. v, s, s: -> S -> S S
  10. c, v, s: S -> S -> S

Notice that we have 3 scoops and 2 arrows each time. Asking for the number of arrangements of scoops and arrows is actually the same as asking for the number of combinations without repetition/replacement for n = 5 and r = 3:

$latex \frac{n!}{r! (n - r)!} = \frac{5!}{3!2!} = \frac{120}{12} = 10. &s=2 $

However, for our original question we had n = 3 and r = 3; we need to make n = 5. It turns out that r + (n - 1) will give us the 5 (when n = 3 and r = 3). So we can substitute r + (n - 1) as n:

$latex \frac{n!}{r! (n - r)!} = \frac{(r + n - 1)!}{r! (r + n - 1 - r)!} = \frac{(r + n - 1)!}{r! (n - 1)!}. &s=2 $

For our example of 3 scoops of ice cream from 3 tubs, the number of combinations with repetition is:

$latex \frac{(3 + 3 - 1)!}{3! (3 - 1)!} = \frac{5!}{3!2!} = \frac{120}{12} = 10. &s=2 $

I wrote the function in R:

https://gist.github.com/davetang/6490917

Now back to the four Yamanaka factors; if each TF is actually a discrete unit, where we can add the same TF one or more times (i.e. with repetition), such as {Sox2, Sox2, Sox2, Sox2}, how many combinations are there? Intuitively this number is > $latex 10626 $ (number of combinations without repetition/replacement):

comb_with_replacement(24,4)
[1] 17550

Conclusions

I'm starting to learn things intuitively and not by rote, especially mathematical concepts. I may forget the formulae for the 4 scenarios above (ordered with repetition, ordered without repetition, order agnostic with repetition and order agnostic without repetition), but I can figure them out again because they make intuitive sense.

I decided to dedicate time to finally lock in the concepts of permutations and combinations in my head because there are so many applications of these concepts in everyday life and in biology (as I've tried to demonstrate). I have also written some functions for calculating combinations and permutations in R, and shown examples of using the gtools package to list out all possible permutations; I wrote the functions to replicate the formulae in R.

A note that Yamanaka-sensei, didn't actually go about checking all the combinations. As far as I'm aware, he used all 24 transcription factors and kept subtracting different TFs, i.e. deductive reasoning, to see which ones were important for the formation of iPSCs.

And lastly, maths is indeed fun!

Further reading

Combinations and permutations on mathsisfun.com

Permutations and combinations (without repetition/replacement) on betterexplained.com

Another explanation of combinations with repetition/replacement.

Print Friendly, PDF & Email



Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.
13 comments Add yours
  1. Hei, thanks for this post. It really help me to pick a list of 5 stocks from a big-list of 50 stocks. (50 p 5) . Can you please help me with the code to get the combinations?

    Regards,
    Jagan

    1. Hi Jagan,

      try adapting this code to your own example:

      my_list <- c('one','two','three','four','five')
      
      combn(my_list, 4, simplify=F)
      

      Cheers,

      Dave

  2. I have a similar but different issue. I have 4 learners { ada, rf, svm, and nnet }.

    So I am trying to average those learners to find the lowest error against my response.

    So,

    I would want to pass to a function these but I think I would have to do a sapply() for each row below.

    c(ada,rf,svm,nnet) is 4!/4!
    c(ada,rf,svm), c(ada, rf, nnet), c(ada, svm, nnet), c(rf, svm, nnet) is cuberoot (4^3)
    c(ada,rf), c(ada,svm), c(ada,nnet), c(rf,svm), c(rf,nnet), c(svm, nnet) is (4!/2!) / 2

    So, what is this? I want all of the unique sets possible out of 4 excluding the sets of 1. Unless I made an error there are 11 of them as described above.

    How do you find these using the formulas above without backing into it?

  3. Hi?I found something interesting about the permutation and combination function, when I try this code below:

    x <- c("id","A","B","C")
    permutations(n=3,r=2,v=x)

    I expected to get the result below:
    [,1] [,2]
    [1,] "id" "A"
    [2,] "id" "B"
    [3,] "A" "B"
    [4,] "A" "C"
    [5,] "B" "A"
    [6,] "B" "C"

    but I actually get was this:
    [,1] [,2]
    [1,] "A" "B"
    [2,] "A" "C"
    [3,] "B" "A"
    [4,] "B" "C"
    [5,] "C" "A"
    [6,] "C" "B"

    It's wired, isn't it? This function automatically ignore the "id", do you know why?

    1. Your vector size is 4 but you set it to 3 (n=3). Use:

      x <- c("id","A","B","C"); permutations(n=4, r=2, v=x)

  4. Hi, thank you so much for the post.

    I did get the combinations but I’m trying to figure out the way to apply those different combinations to a function and get the results from each combination. Is there any way to make R do that by itself instead of me putting all those combinations manually to get each result? Could you please help me with that?

    Thanks!

    Hanna

  5. Hello.
    I want to write an R code to generate all distinct permutations of a list with a repeated characters in an efficient way. For example, let x<-c(1,1,2,2,3,4) then

    library(permute)
    unique(permn(x))

    works, but it is very inefficient.

  6. Hi,
    This is a great post and as a follow up perhaps you could enlighten a doubt I have.

    Would you be able to think of a solution in R to derive a dataframe with all different combinations from 2 large dataframes (100k rows x 2 columns)?

    The data would be geographical coordinates and my idea is to run all these combinations between the points (each row of each data frame is a geo-point) to thereafter calculate distances.

    Appreciate your thoughts.

    Best regards

  7. You have no idea how much you have helped by point me out to this concept and package!
    I was struggling with a code and the answer was simple as a combination! Many thanks!
    Please continue to do this amazing work you do!

    1. I’m very glad that you have found this useful! I haven’t been blogging lately but thank you for the encouraging comment!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.