# Combinations and permutations in R

Time to get another concept under my belt, combinations and permutations. While I’m at it, I will examine combinations and permutations in R. As you may recall from school, a combination does not take into account the order, whereas a permutation does. Using the example from my favourite website as of late, mathsisfun.com:

• A fruit salad is a combination of apples, bananas and grapes, since it’s the same fruit salad regardless of the order of fruits
• To open a safe you need the right order of numbers, thus the code is a permutation

As a matter of fact, a permutation is an ordered combination. There are basically two types of permutations, with repetition (or replacement) and without repetition (without replacement).

### Permutations with repetition

The number of permutations with repetition (or with replacement) is simply calculated by:

$n^r$

where n is the number of things to choose from, r number of times.

For example, you have a urn with a red, blue and black ball. If you choose two balls with replacement/repetition, there are $3^2$ permutations: {red, red}, {red, blue}, {red, black}, {blue, red}, {blue, blue}, {blue, black}, {black, red}, {black, blue}, and {black, black}. In R:

A biological example of this are all the possible codon combinations. We have 4 choices (A, C, G and T) and we are choosing 3 nucleotides: $n^r = 4^3 = 64$. How about the number of 11-mers that are possible: $n^r = 4^{11} = 4194304$. For miRNAs of size 21, there are $4^{21} = 4398046511104$ possible miRNAs, but since the majority of these 21-mers are probably not biologically relevant or even possible, they most likely don’t exist.

### Permutations without repetition

Calculating permutations without repetition/replacement, just means that for cases where r > 1, n gets smaller after each pick. For example, if we choose two balls from the urn with the red, blue and black ball but without repetition/replacement, the first pick has 3 choices and the second pick has 2 choices: {red, blue}, {red, black}, {blue, red}, {blue, black}, {black, red} and {black, blue}. In R:

Let’s use an example with more choices, say choosing pool balls. If we chose 3 balls without replacement (out of a total of 16), the number of permutations would be:

$16 \times 15 \times 14 = 3360.$

How do we reach a formula for this? If we chose all 16 balls, there are this many permutations:

$16 \times 15 \times 14 \times \ldots = 20922789888000.$

Since we want $16 \times 15 \times 14$, we can perform this division:

$\frac{16 \times 15 \times 14 \times 13 \times 12 \ldots}{13 \times 12 \times \ldots} = 16 \times 15 \times 14 = 3360$

which is $16! / 13!.$ The formula is the total number of permutations without replacement divided by the factorial of the subtraction of the total number of possible choices and the number of choices we make (note that $0! = 1$):

$\frac{n!}{(n - r)!}$

where n is the number of things to choose from, choosing r of them. I wrote this formula as a function in R:

### Combinations without repetition

By now you’ve probably heard of induced Pluripotent Stem Cells (iPSCs), which are a type of pluripotent stem cell artificially derived from a non-pluripotent cell through the forced expression of four specific transcription factors (TFs). This discovery was made by Yamanaka-sensei and his team. Prior to the discovery, Yamanaka-sensei and his team investigated 24 TFs known to be important in the early embryo and tested different combinations of these TFs. How many combinations are there for a set of 4 TFs?

Since the TFs are used once, the number of combinations are without repetition/replacement and the order does not matter; all four TFs just need to be added. Since combinations are order agnostic, there will be less combinations than permutations under the same scenario. Let’s calculate the number of permutations anyway:

$\frac{n!}{(n - r)!} = \frac{24!}{(24 - 4)!} = 255024$

As pointed out above, many of these permutations are the same combination. Actually, for 4 TFs, there are $4 \times 3 \times 2 \times 1 = 24$ permutations that are the same combination. So we should reduce the number of permutations, by 24, to arrive at the number of combinations: which is $255024 / 24 = 10626.$ In R, these is already a built in function for this called choose():

#calculate the number of combinations without replacement/repetition
choose(n=24,k=4)
[1] 10626


The formula, for the number of combinations without repetition/replacement, would be very similar to working out the number of permutations without repetition/replacement; it is simply the same formula but decreased by the number of size r permutations without replacement/repetition:

$\frac{n!}{r! (n - r)!}$

where n is the number of things to choose from, choosing r of them. For the above example:

$\frac{n!}{r! (n - r)!} = \frac{24!}{4! (24 - 4)!} = \frac{24!}{4!20!} = \frac{24 \times 23 \times 22 \times 21}{24} = 10626.$

### Combinations with repetition

This is actually quite complicated to figure out intuitively. To get started, let’s use the urn with the 3 balls, a red, blue and black ball. If we choose 3 balls with replacement, the possible combinations are:

• 3 colours: {red, red, red}, {blue, blue, blue}, {black, black, black}
• 2 red: {red, red, blue}, {red, red, black}
• 2 blue: {blue, blue, red}, {blue, blue, black}
• 2 black: {black, black, red}, {black, black, blue}
• 1 of each: {red, blue, black}
• scenarios with 1 red or 1 blue or 1 black are redundant, i.e. already listed above

which equals 10 possible combinations. The first thing I thought about was, with repetition/replacement, how many permutations are there for choosing 3 balls in a urn with 3 balls?

$n^r = 3^3 = 27.$

How many combinations are there without repetition/replacement?

$\frac{n!}{r!(n - r)!} = \frac{3!}{3!} = 1.$

So the answer is in between 1 and 27 (it is 10) but the question is how do we remove the redundant combinations from the permutations with repetition (the 27 cases)?

On mathsisfun.com, they used a special technique to work this out. Instead of balls in a urn, they considered scoops of ice cream per customer order. Imagine, there are 3 tubs of ice cream in succession, chocolate, vanilla and strawberry and there is a robot that moves from left to right and can also perform scoops. If a customer orders 3 scoops of vanilla, the robot will move once to the right, make 3 scoops and then once to the right to reach the last tub. To represent this in symbols: -> S S S ->, where S = scoop and -> means move right once. If the customer ordered a Neapolitan ice cream, it would be S -> S -> S. So now the question is how many arrangements of scoops and arrows can we have? If we list them all out, where chocolate = c, vanilla = v, and strawberry = s:

1. c, c, c: S S S -> ->
2. c, c, v: S S -> S ->
3. c, c, s: S S -> -> S
4. v, v, v: -> S S S ->
5. c, v, v: S -> S S ->
6. v, v, s: -> S S -> S
7. s, s, s: -> -> S S S
8. c, s, s: S -> -> S S
9. v, s, s: -> S -> S S
10. c, v, s: S -> S -> S

Notice that we have 3 scoops and 2 arrows each time. Asking for the number of arrangements of scoops and arrows is actually the same as asking for the number of combinations without repetition/replacement for n = 5 and r = 3:

$\frac{n!}{r! (n - r)!} = \frac{5!}{3!2!} = \frac{120}{12} = 10.$

However, for our original question we had n = 3 and r = 3; we need to make n = 5. It turns out that r + (n – 1) will give us the 5 (when n = 3 and r = 3). So we can substitute r + (n – 1) as n:

$\frac{n!}{r! (n - r)!} = \frac{(r + n - 1)!}{r! (r + n - 1 - r)!} = \frac{(r + n - 1)!}{r! (n - 1)!}.$

For our example of 3 scoops of ice cream from 3 tubs, the number of combinations with repetition is:

$\frac{(3 + 3 - 1)!}{3! (3 - 1)!} = \frac{5!}{3!2!} = \frac{120}{12} = 10.$

I wrote the function in R:

Now back to the four Yamanaka factors; if each TF is actually a discrete unit, where we can add the same TF one or more times (i.e. with repetition), such as {Sox2, Sox2, Sox2, Sox2}, how many combinations are there? Intuitively this number is > $10626$ (number of combinations without repetition/replacement):

comb_with_replacement(24,4)
[1] 17550


### Conclusions

I’m starting to learn things intuitively and not by rote, especially mathematical concepts. I may forget the formulae for the 4 scenarios above (ordered with repetition, ordered without repetition, order agnostic with repetition and order agnostic without repetition), but I can figure them out again because they make intuitive sense.

I decided to dedicate time to finally lock in the concepts of permutations and combinations in my head because there are so many applications of these concepts in everyday life and in biology (as I’ve tried to demonstrate). I have also written some functions for calculating combinations and permutations in R, and shown examples of using the gtools package to list out all possible permutations; I wrote the functions to replicate the formulae in R.

A note that Yamanaka-sensei, didn’t actually go about checking all the combinations. As far as I’m aware, he used all 24 transcription factors and kept subtracting different TFs, i.e. deductive reasoning, to see which ones were important for the formation of iPSCs.

And lastly, maths is indeed fun!

Combinations and permutations on mathsisfun.com

Permutations and combinations (without repetition/replacement) on betterexplained.com

Another explanation of combinations with repetition/replacement.

Pro Tip : Load/access your mathematics and analytics software remotely from anywhere, anytime on any device(PC/android/iOS) with hosted virtual desktop from CloudDesktopOnline.com with top-notch support from Apps4Rent.com.

.
1. Jagan says:

Hei, thanks for this post. It really help me to pick a list of 5 stocks from a big-list of 50 stocks. (50 p 5) . Can you please help me with the code to get the combinations?

Regards,
Jagan

1. Davo says:

Hi Jagan,

my_list <- c('one','two','three','four','five')

combn(my_list, 4, simplify=F)


Cheers,

Dave

2. I have a similar but different issue. I have 4 learners { ada, rf, svm, and nnet }.

So I am trying to average those learners to find the lowest error against my response.

So,

I would want to pass to a function these but I think I would have to do a sapply() for each row below.

So, what is this? I want all of the unique sets possible out of 4 excluding the sets of 1. Unless I made an error there are 11 of them as described above.

How do you find these using the formulas above without backing into it?

1. Thinking about it, I realized what I have….I have 4^2 minus the(4) single sets and the empty set. DUH!

3. Kailasa says:

Hi?I found something interesting about the permutation and combination function, when I try this code below:

x <- c("id","A","B","C")
permutations(n=3,r=2,v=x)

I expected to get the result below:
[,1] [,2]
[1,] "id" "A"
[2,] "id" "B"
[3,] "A" "B"
[4,] "A" "C"
[5,] "B" "A"
[6,] "B" "C"

[,1] [,2]
[1,] "A" "B"
[2,] "A" "C"
[3,] "B" "A"
[4,] "B" "C"
[5,] "C" "A"
[6,] "C" "B"

It's wired, isn't it? This function automatically ignore the "id", do you know why?

1. Davo says:

Your vector size is 4 but you set it to 3 (n=3). Use:

x < - c("id","A","B","C"); permutations(n=4, r=2, v=x)

4. Hanna says:

Hi, thank you so much for the post.

I did get the combinations but I’m trying to figure out the way to apply those different combinations to a function and get the results from each combination. Is there any way to make R do that by itself instead of me putting all those combinations manually to get each result? Could you please help me with that?

Thanks!

Hanna

1. Davo says:

I’m sorry but I don’t understand your question.

5. Ehsan says:

Hello.
I want to write an R code to generate all distinct permutations of a list with a repeated characters in an efficient way. For example, let x<-c(1,1,2,2,3,4) then

library(permute)
unique(permn(x))

works, but it is very inefficient.

6. Joao Costa says:

Hi,
This is a great post and as a follow up perhaps you could enlighten a doubt I have.

Would you be able to think of a solution in R to derive a dataframe with all different combinations from 2 large dataframes (100k rows x 2 columns)?

The data would be geographical coordinates and my idea is to run all these combinations between the points (each row of each data frame is a geo-point) to thereafter calculate distances.