**Markov Chain**implementation coming in Solr 7.2. We'll first cover how Markov Chains work and then show how they are supported through the Streaming Expression statistical library.

**Markov Chain**

A Markov Chain uses probabilities to model the

**state transitions**of a process. A simple example taken from the Markov Chain wiki page will help illustrate this.

In this example we'll be modeling the state transitions of the stock market. In our model the stock market can have three possible states:

**Bull****Bear****Stagnant**

There are two important characteristics of a Markov Process:

1) The process can only be in

**one state**at a time.
2) The probability of transitioning between states is based only on the current state. This is known as "memorylessness".

Below is a state diagram of the probabilities of transferring between the different states.

In the diagram above, the lines show the probabilities of transitioning between the different states. For example there is .075 probability of transitioning from a

**Bull**market to a**Bear**market. There is .025 probability of transitioning from a**Bull**market to a**Stagnant**market. There is a .9 probability of a**Bull**market transitioning to another**Bull**market.
The state transition probabilities in this example can be captured in a 3x3 matrix called a

**transition matrix**. The transition matrix for this example is:**Bull Bear Stagnant**

**Bull**| .9 .075 .025 |

**Bear**| .15 .8 .05 |

**Stagnant**| .25 .25 .5 |

Notice each state has a row in the matrix. The values in the columns hold the transition probabilities for each state.

For example row 0, column 0 is the probability of the

**Bull**market transitioning to another**Bull**market. Row 1, column 0 is the probability of the**Bear**market transitioning to a**Bull**market.
A Markov Chain uses the transition matrix to model and simulate the transitions in the process. A code example will make this more clear.

**Working with Matrices**

In Solr 7.2 support for matrices has been added to the Streaming Expression statistical function library. Below is the expression for creating the example transition matrix:

let(a=array(.9, .075, .025),

b=array(.15, .8, .05),

c=array(.25, .25, .5),

d=matrix(a, b, c))

In the expression above the rows of the matrix are created as numeric arrays and set to variables

**,***a***and***b***. Then the arrays are passed to the***c***matrix**function to instantiate the matrix.
If we send this expression to Solr's /stream handler it responds with:

{
"result-set": {
"docs": [
{
"d": [
[
0.9,
0.075,
0.025
],
[
0.15,
0.8,
0.05
],
[
0.25,
0.25,
0.5
]
]
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}

**Markov Chain Simulations**

Once the transition matrix is created its very easy to create a Markov Chain and simulate the process. Here is a sample expression:

let(a=array(.9, .075, .025),

b=array(.15, .8, .05),

c=array(.25, .25, .5),

d=matrix(a, b, c),

e=markovChain(d),

f=sample(e, 5))

In the expression above the transition matrix is created and then passed as a parameter to

**the****markovChain**function. The markovChain function returns a Markov Chain for the specific transition matrix.
The Markov Chain can then be sampled using the

**function. In the example above 5 samples are taken from the Markov Chain. The samples represent the state that the process is in. This transition matrix has three states so each sample will either be***sample***(Bull),***0***(Bear) or***1***(Stagnant).***2*
Each time the Markov Chain is sampled it returns the next state of the process based on the transition probabilities of its current state.

If we send this expression to the /stream handler it may respond with:

{
"result-set": {
"docs": [
{
"f": [
0,
0,
0,
2,
2
]
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}

Notice that 5 samples were returned 0, 0, 0, 2, 2. This corresponds to three consecutive

**Bull**states followed by two**Stagnant**states.**Finding the Long Term Average of the States**

By increasing the number of samples we can determine how much time the process will spend in each state over the long term. Here is an example expression:

let(a=array(.9, .075, .025),

b=array(.15, .8, .05),

c=array(.25, .25, .5),

d=matrix(a, b, c),

e=sample(markovChain(d), 200000),

f=freqTable(e))

Notice that now instead of 5 samples we are taking 200,000 samples. Then we are creating a frequency table from the simulation array using the

**function. This will tell us the percentage of time spent in each state.***freqTable*
If we send this expression to the /stream handler it responds with the breakdown of the frequency table:

{
"result-set": {
"docs": [
{
"f": [
{
"pct": 0.62636,
"count": 125272,
"cumFreq": 125272,
"cumPct": 0.62636,
"value": 0
},
{
"pct": 0.310705,
"count": 62141,
"cumFreq": 187413,
"cumPct": 0.937065,
"value": 1
},
{
"pct": 0.062935,
"count": 12587,
"cumFreq": 200000,
"cumPct": 1,
"value": 2
}
]
},
{
"EOF": true,
"RESPONSE_TIME": 56
}
]
}
}

Notice in the response above that there are three tuples returned in the frequency table, one for each state (Bull, Bear, Stagnant).

The

*value*field in each tuple is the numeric mapping of the state (0=Bull, 1=Bear, 2=Stagnant).
The

**field in each tuple is the percentage of time the value appears in the sample set. We can see that the process is in a Bull market 62.6% of the time, Bear market 31% and Stagnant 6.3% of the time.***pct*