This article was published as a part of the Data Science Blogathon.
Analytics can be broadly segmented into 3 buckets by nature — Descriptive (telling us what happened) Predictive (telling us what is most likely to happen), Prescriptive (recommending us on actions to take for an outcome). Here in this article, I touch base with one component of Predictive analytics, Markov Chains.
Traditionally, Predictive analytics or modeling estimates the probability of an outcome based on the history of data that is available and try to understand the underlying path. However, that is not the case when it comes to Markov Chains, it is a method under Predictive modelling which is considered fast and most important basis the estimates of the probability of an outcome or event on the present situation.
Here I share an overview of Markov Chain and common concepts around it purely from an academic perspective. But before starting with Markov Chain, here is a brief introduction to what is a Stochastic Process.
A Stochastic Process is a family of random variables ordered in time that describes evolution through time of some physical process, i.e. collection of random variables {X(t), t ∈ T} is a Stochastic Process such that for each t ∈ T, X(t) is a random variable.
Index ‘t’ or otherwise known as Indexing Parameter could be time, distance, length, etc. and X(t) is the state of the process at ‘t’. T is a parametric space
Markov Chains are devised referring to the memoryless property of Stochastic Process which is the Conditional Probability Distribution of future states of any process depends only and only on the present state of those processes. Which are then used upon by Data Scientists to define predictions.
One-dimensional Stochastic process can be classified into 4 types of process.
State Space is the set of all possible values that random variable X(t) can assume, state space is discrete it contains finite no. of points otherwise continuous.
The set of possible values of the indexing parameter is called Parameter space, which can either be discrete or continuous.
If Xn = j, then the process is said to be in state ‘j’ at a time ’n’ or as an effect of the nth transition. Therefore, the above equation may be interpreted as stating that for a Markov Chain that the conditional distribution of any future state Xn given the past states Xo, X1, Xn-2 and present state Xn-1 is independent of past states and depends only on the present state and time elapsed.
The value pij from the equation is the conditional probability that the process when initially in state ‘i’ will be in state ‘j’ in the next transition and this probability is known as One-Step Transition probability.
However, it may be noted transition probability may or may not be independent of ’n’ and is called homogenous in the case or stationary transition probability. If it is dependent on ’n’ then non-homogeneous.
In a Markov chain with ‘k’ states, there would be k2 probabilities. The One-Step Transition probability in matrix form is known as the Transition Probability Matrix(tpm). Below is the tpm ‘P’ of Markov Chain with non-negative elements and whose order = no. of states (unit row sum).
Think of a gambling game and consider a gambler who at each play of the game either wins $1 with probability ‘p’ or loses $1 with probability ‘q’. Suppose that gambler quits playing either when he goes broke (‘0’) or achieves a fortune of $N.
The case can be explained mathematically using transition probabilities and the concept of the Markov Chain. Where let’s say state space of the Markov Chain is integer i = 0, ±1, ±2, … is said to be a Random Walk Model if for some number 0<p<1
In an oversimplified version, we may think of the above concept of Random Walk as an individual walking on a straight line who at each point of time walks one step to the right with probability ‘p’ or to the left with probability ‘q’. The gambling example is a finite state random walk, with two absorbing barriers ‘0’ and ’N’, therefore if Xn denoted the gambler’s fortune at the nth game, then {Xn, n≥1] is a Markov Chain with the following tpm
It is of great aid in visualizing a Markov Chain and is a also useful to study properties like irreducibility of the chain.
If state ‘j’ is accessible from state ‘i’ (denoted as i → j). Vertices ‘j’ and ‘i’ are joined by a directed arc towards ‘j’. A diagram such that its arc weight is positive and the sum of the arc weights are unity is called a Stochastic Graph. The transition graph of a Markov Chain is a Stochastic Graph.
State ‘3’ is absorbing state of this Markov Chain with three classes (0 ← → 1, 2,3). Absorbing state is which once reached in a Markov Chain, cannot be left. For state ‘i’ when Pi,i=1, where P be the transition matrix of Markov chain {Xo, X1, …}
There are variety of descriptions of usually a specific state or the entire Markov Chain that may allow for further understanding on the behavior of the chain.
(I) Communication States– if lets say states ‘i’ and ‘j’ are accessible from each other, then they form communication states. Denoted by i ← → j. Relation of communication satisfies the following
(II) Periodicity– a state ‘i’ with period d(i)=1 is said to be a periodic state and ‘i’ is said to be aperiodic if d(i)>1 when
Periodicity is a class property, i.e. distinctive states belonging to the same class have the same period. For instance, if state ‘i’ has period ‘d’ and state ‘i’, ‘j’ communicate then state ‘j’ also has a period ‘d’.
(III) Recurring and Transient State– if the random variable Tjj be the time at which the particle returns to state ‘j’ for the first time time where Tjj = 1 and if the particle stays in ‘j’ for a time unit, then state ‘j’ is recurrent if P[Tjj < ∞]=1 and transient if P[Tjj <∞] < 1
Markov Chain can be used to solve many scenarios varying from Biology to predicting the weather to studying the stock market and solving to Economics. If a sequence of events can be made into fit the Markov Chain assumption can be estimated using the concept of Markov Chain.
Lorem ipsum dolor sit amet, consectetur adipiscing elit,