A Markov chain is a mathematical system that undergoes transitions from one state to another on a state space in a stochastic (random) manner. Examples of Markov chains include the board game snakes and ladders, where each state represents the position of a player on the board and a player moves between states (different positions on the board) by rolling a dice. An important property of Markov chains, called the Markov property, is the memoryless property of the stochastic process. Basically what this means is that the transition between states depends only on the current state and not on the states preceding the current state; in terms of the board game, your next position on the board depends only on where you are currently positioned and not on the sequence of moves that got you there. Another way of thinking about it is that the future is independent of the past, given the present.
To illustrate some applications of Markov chains, I will follow the examples from this series of videos explaining Markov chains (which I highly recommend watching):
Modelling a marketing campaign
A company selling orange juice (call it brand A) has 20% of the market share and wishes to increase their market share via a marketing campaign. They estimated that the marketing campaign has the effect that:
- Someone using brand A will stay with brand A with 90% probability
- Someone NOT using brand A will switch to brand A with 70% probability
In this particular example, we have two states: 1) someone using brand A (state A) and 2) someone not using brand A (state A'). We can represent this as a transition diagram (using the diagram package in R):
# install if necessary install.packages('diagram') # load library library(diagram) P <- matrix(c(0.9, 0.7, 0.1, 0.3), byrow=T, nrow=2) plotmat(P, # transition matrix name = c('A', "A'"), # names of the states pos = c(2), # position of the states box.lwd=1, # outline of state cex.txt=1, # size of probabilities box.prop=1, # size of box box.type = 'circle', self.cex = 0.6, # size of self probability lwd = 1, # outline of probabilities box.cex=2 # size of text in box )
Transition diagram showing the states and probabilities of switching states.
The transition diagram can also be represented as a transition probability matrix (P):
The initial state distribution matrix called , is the market share of brand A prior the marketing campaign. Since brand A had a 20% market share, 80% of the market share does not belong to brand A:
After the first marketing campaign, the company will be at the first state matrix, . The first state matrix can be calculated by multiplying the initial state distribution matrix, to the transition probability matrix P:
To perform matrix multiplications in R use the syntax %*%:
# transition matrix P P <- matrix(c(0.9, 0.1, 0.7, 0.3), byrow=T, nrow=2) P # [,1] [,2] # [1,] 0.9 0.1 # [2,] 0.7 0.3 # initial state matrix state_0 <- matrix(c(0.2, 0.8), nrow=1) state_0 # [,1] [,2] # [1,] 0.2 0.8 # initial state multiplied by transition matrix state_0 %*% P # [,1] [,2] # [1,] 0.74 0.26
After the first marketing campaign, the market share of brand A is now 74% (first state). What happens if we have another round of marketing (second state) and another and so on?
# transition matrix P P <- matrix(c(0.9, 0.1, 0.7, 0.3), byrow=T, nrow=2) # different states state_0 <- matrix(c(0.2, 0.8), nrow=1) state_1 <- state_0 %*% P state_2 <- state_1 %*% P state_3 <- state_2 %*% P state_4 <- state_3 %*% P state_5 <- state_4 %*% P state_6 <- state_5 %*% P # bind the states states <- rbind(state_0, state_1, state_2, state_3, state_4, state_5, state_6) states # [,1] [,2] # [1,] 0.2000000 0.8000000 # [2,] 0.7400000 0.2600000 # [3,] 0.8480000 0.1520000 # [4,] 0.8696000 0.1304000 # [5,] 0.8739200 0.1260800 # [6,] 0.8747840 0.1252160 # [7,] 0.8749568 0.1250432
The market share of brand A is starting to plateau around 87.5%, i.e., it is approaching its steady state.
Finding the steady state
The steady state represents the state at which the system is at an equilibrium; for our previous example, this means that the market share remains the same despite the marketing campaign:
Given a transition probability matrix, one can find the steady state by solving the equation. Using the marketing campaign example above:
For two states:
From the matrix multiplications we can come up with a system of equations:
Since the two states represent the market share of brand A and not brand A, they should sum to 1:
We can solve for b by substituting a into :
Substitute a back:
As we approximated before, the steady state of the transition matrix P is:
Stationary matrices and regular Markov chains
A) Does every Markov chain have a unique stationary (steady) matrix?
B) If a Markov chain has an unique stationary matrix, will the successive state matrices always approach this stationary matrix?
The answer to both questions is NO except for regular Markov chains. A Markov chain is regular if its transition matrix is regular. A transition matrix, P, is regular if some power of P has only positive entries. And in case you were wondering, a positive number is a number that is greater than zero and thus zero is NOT a positive number.
Below is a regular transition matrix, because all the values are positive (the first power of P):
Is the transition matrix below regular?
Let's take some powers of it:
The transition matrix above alternates between two matrices, which both do not contain only positive entries, thus it is not a regular transition matrix.
Sometimes we need to calculate the second power of a matrix to reveal that it is a regular transition matrix:
If the transition matrix is regular, given any initial-state matrix , the state matrices will approach the stationary matrix S.
Insurance statistics
Using one of the examples from this lecture: Imagine that 23% of drivers involved in an accident are also involved in an accident in the following year. 11% of drivers not involved in an accident are involved in an accident the following year. So if we had state A as accident and A' as no accident:
library(diagram) P <- matrix(c(0.23, 0.11, 0.77, 0.89), byrow=T, nrow=2) plotmat(P, #transition matrix name = c('A', "A'"), # names of the states pos = c(2), # position of the states box.lwd=1, # outline of state cex.txt=1, # size of probabilities box.prop=1, # size of box box.type = 'circle', self.cex = 0.6, # size of self probability lwd = 1, # outline of probabilities box.cex=2 # size of text in box )
Now if 5% of all drivers had an accident one year, what is the probability that a driver, picked at random, has an accident in the following year?
Our initial state, and our transition matrix, P:
So therefore:
The probability of an accident in the following year is 11.6%. What about the long run behaviour? What percentage of drivers will have an accident in a given year?
Since our transition matrix has all positive values, this is a regular Markov chain and we can solve the equation:
Solve these system of equations:
Substitute:
So in the long run, 12.5% of the drivers will have an accident.
Conclusions
This post is a lead up to hidden Markov models (HMMs), which I am currently reading about in the book Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids and learning about from material I can find online.
This work is licensed under a Creative Commons
Attribution 4.0 International License.
This is a solid post. Thanks for sharing. As you probably already have learned, HMM is hugely important in Information Retrieval(IR) as well.
Thanks Heath. A HMM post is still in the works 😉