*This is the first in a multi-part examination of the theory and implementation of Markov probability models in JavaScript. In this post, we’ll examine the simplest type of Markov Model, the Observable Markov Chain.*

Ideas have a funny way of spreading. Sometimes a concept that has become ubiquitous in a particular field of study can take years to be widely understood in another field that could have equal use for it.

Variations of Markov Models have found many uses in computer programming recently, from stock trading algorithms to speech recognition systems. However, there are few solid explanations of how to implement them in client-side JavaScript. I suspect this is because JavaScript in web browsers was historically too slow, and lacked the low-level access to audio/video input, for many of the common uses of these probability models to be practical. Fortunately, with massive cross-browser performance improvements, and new multimedia APIs such as WebRTC in the pipeline, these limitations are becoming problems of the past.

We’ll begin this series with the simplest Markov Model data structure, which all others are built upon: the Observable Markov Model (OMM). Also known as a Markov Chain, an OMM estimates the probability of a future series of events by assuming that each event in the series depends only on the event immediately before it. If a process behaves this way, we can say it possesses the Markov Property.

To understand why this is useful, let’s consider a practical example relating to client-side web development. Say we have a bandwidth-heavy web application. It would be nice to lazy load content so the initial wait time is manageable. However, we need to balance this with a responsive user interface. When a user attempts to access a certain functionality, they shouldn’t have to wait an inordinate amount of time for it to load for the first time.

One solution is to try and predict what functionality the user will access next. This is where an OMM comes into play. If we have an accurate model of the likelihood that a user will transition from the current state to a new state, we will have much better odds of correctly guessing what data our app will need to load ahead of time.

Let’s assume that QA has been running usability tests on our app, and has compiled statistics showcasing the chance of transitioning from one app state to another. The following is a chart based on their findings:

| Current state ↓ / Next state → | 2 Megabyte Animated GIF | 500 Photos | Video |
|---|---|---|---|
| 2 Megabyte Animated GIF | .2 | .65 | .15 |
| 500 Photos | .1 | .8 | .1 |
| Video | .7 | .25 | .05 |

Think of each decimal number in the chart as the probability of transitioning from our current state to the next state. For example, if we are in state ‘2 Megabyte Animated GIF’, there is a .65, or 65%, chance of transitioning to state ‘500 Photos’. Notice how the contents of each row add up to 1. This is a way of saying that each row covers a complete set of mutually exclusive outcomes. In other words, there is a 100% chance of the current state transitioning to one of these three states.
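One way to hold the chart in JavaScript is a nested object keyed by current state, then by next state. This is only a sketch: the names `model` and `rowsSumToOne` are made up for illustration, and the check simply confirms the "each row adds up to 1" property described above.

```javascript
// Transition probabilities from the chart: model[current][next].
const model = {
  "2 Megabyte Animated GIF": { "2 Megabyte Animated GIF": 0.2, "500 Photos": 0.65, "Video": 0.15 },
  "500 Photos": { "2 Megabyte Animated GIF": 0.1, "500 Photos": 0.8, "Video": 0.1 },
  "Video": { "2 Megabyte Animated GIF": 0.7, "500 Photos": 0.25, "Video": 0.05 }
};

// Returns true when every row sums to 1, within a small tolerance
// to allow for floating-point rounding.
function rowsSumToOne(model, tolerance = 1e-9) {
  return Object.values(model).every((row) => {
    const total = Object.values(row).reduce((sum, p) => sum + p, 0);
    return Math.abs(total - 1) <= tolerance;
  });
}

console.log(rowsSumToOne(model)); // true
```

Running a check like this whenever the statistics are updated is a cheap way to catch a mistyped or missing entry.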

Now let’s examine what the ‘chain’ in Markov Chain is all about. We’re going to use an OMM to calculate the likelihood of events occurring several states in the future, instead of just one.

Say we are in state ‘500 Photos’. State ‘Video’ is an especially large download. We’d like to begin downloading it even if the user is two states away from navigating to it, provided there is a greater than 15% probability they will end up in that section at that time.

Let’s break the problem down. We begin in state ‘500 Photos’. It doesn’t matter what the state after ‘500 Photos’ is; it can be any of the three. The final state will always be ‘Video’. For any one chain of probabilities, we multiply the individual likelihoods together. Then we add the three separate chains together to get our total probability. In pseudo-equation form:

(500 Photos -> Video) * (Video -> Video) +

(500 Photos -> Animated GIF) * (Animated GIF -> Video) +

(500 Photos -> 500 Photos) * (500 Photos -> Video)

=

(.1 * .05) + (.1 * .15) + (.8 * .1)

=

.005 + .015 + .08

=

.1

Based on our model, we have a 10% chance of navigating to ‘Video’ two navigation steps from now. This is below our 15% threshold, so we would not start downloading it yet.
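The same calculation can be sketched in a few lines of JavaScript. The function name `twoStepProbability` and the constant `PRELOAD_THRESHOLD` are assumptions made for this example; the numbers come straight from the chart.

```javascript
// Transition probabilities from the chart: model[current][next].
const model = {
  "2 Megabyte Animated GIF": { "2 Megabyte Animated GIF": 0.2, "500 Photos": 0.65, "Video": 0.15 },
  "500 Photos": { "2 Megabyte Animated GIF": 0.1, "500 Photos": 0.8, "Video": 0.1 },
  "Video": { "2 Megabyte Animated GIF": 0.7, "500 Photos": 0.25, "Video": 0.05 }
};

// Probability of being in `target` exactly two transitions after `start`,
// summed over every possible intermediate state.
function twoStepProbability(model, start, target) {
  return Object.keys(model[start]).reduce(
    (total, middle) => total + model[start][middle] * model[middle][target],
    0
  );
}

const PRELOAD_THRESHOLD = 0.15; // the 15% cutoff from the example
const p = twoStepProbability(model, "500 Photos", "Video");
const shouldPreload = p > PRELOAD_THRESHOLD;

console.log(p.toFixed(3), shouldPreload); // 0.100 false
```

The result is rounded for display because floating-point multiplication rarely lands on exactly .1.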

We can infer another key advantage of Markov Models from this example. To express the probabilities of all possible states for any length of Markov Chain, we only need to store s^2 numbers, with ‘s’ representing the number of possible states. In contrast, a statistical model that conditions on the entire history of past states requires on the order of s^n numbers, where ‘n’ represents the chain length. Obviously, the non-Markov method can become very large very quickly as chain length increases.
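To see why s^2 numbers are enough, here is a rough sketch that reuses the same 3×3 table to look any number of steps ahead. It assumes the chart is stored as an array of rows in the same order as the `states` list; `step` and `distributionAfter` are illustrative names, not part of any library.

```javascript
const states = ["2 Megabyte Animated GIF", "500 Photos", "Video"];

// matrix[current][next], rows and columns in the order of `states`.
const matrix = [
  [0.2, 0.65, 0.15],
  [0.1, 0.8, 0.1],
  [0.7, 0.25, 0.05]
];

// Advances a probability distribution over the states by one transition.
function step(distribution, matrix) {
  return matrix[0].map((_, next) =>
    distribution.reduce((sum, p, current) => sum + p * matrix[current][next], 0)
  );
}

// Distribution over the states after `n` transitions from `startIndex`.
function distributionAfter(matrix, startIndex, n) {
  let distribution = matrix.map((_, i) => (i === startIndex ? 1 : 0));
  for (let i = 0; i < n; i++) {
    distribution = step(distribution, matrix);
  }
  return distribution;
}

const afterTwo = distributionAfter(matrix, states.indexOf("500 Photos"), 2);
console.log(afterTwo[states.indexOf("Video")].toFixed(3)); // 0.100
```

Changing the last argument to 5 or 50 needs no extra stored data; only the number of loop iterations grows.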

I’ve created a helper library (the GitHub repository is located here) which contains a method that returns the probability of a single Markov Chain occurring. Just pass in a chain and probability model as parameters. Let’s look at how to calculate the above solution using the library:
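As a stand-in, here is a minimal sketch of what such a method might look like. The name `chainProbability`, its parameter order, and the nested-object model shape are all assumptions for illustration and should not be read as the library's actual API.

```javascript
// Transition probabilities from the chart: model[current][next].
const model = {
  "2 Megabyte Animated GIF": { "2 Megabyte Animated GIF": 0.2, "500 Photos": 0.65, "Video": 0.15 },
  "500 Photos": { "2 Megabyte Animated GIF": 0.1, "500 Photos": 0.8, "Video": 0.1 },
  "Video": { "2 Megabyte Animated GIF": 0.7, "500 Photos": 0.25, "Video": 0.05 }
};

// Hypothetical helper: probability of one specific chain of states,
// i.e. the product of each consecutive transition probability.
function chainProbability(chain, model) {
  let probability = 1;
  for (let i = 0; i < chain.length - 1; i++) {
    probability *= model[chain[i]][chain[i + 1]];
  }
  return probability;
}

// Sum the three chains that start at '500 Photos' and end at 'Video'.
const total = Object.keys(model)
  .map((middle) => chainProbability(["500 Photos", middle, "Video"], model))
  .reduce((sum, p) => sum + p, 0);

console.log(total.toFixed(3)); // 0.100
```

Each call covers a single chain, so the three calls are summed to reproduce the total from the worked example.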

It’s important to understand that the accuracy of our prediction is only as good as the statistics collected, and the validity of our underlying Markov Assumption (for exhibit A of what can happen if this isn’t the case, see this excellent Wired article on the probability algorithm that contributed to the 2008 financial crisis). If, in fact, there are much more important underlying trends that affect state transitions than just the previous state, we’re in trouble. Regarding statistics collected, we can always augment our QA stats with locally collected and stored user navigation information to create a more accurate model.
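As a rough sketch of the local-collection idea, transition probabilities can be estimated by counting how often each state follows another in a recorded navigation history. The function name `estimateModel` and the sample `history` array are invented for this example; how the result is stored, and how it is blended with the QA figures, is left open.

```javascript
// Builds transition probabilities from an ordered list of visited states
// by counting each (current -> next) pair and normalizing every row.
function estimateModel(history) {
  const counts = {};
  for (let i = 0; i < history.length - 1; i++) {
    const current = history[i];
    const next = history[i + 1];
    counts[current] = counts[current] || {};
    counts[current][next] = (counts[current][next] || 0) + 1;
  }

  const model = {};
  for (const [current, row] of Object.entries(counts)) {
    const total = Object.values(row).reduce((sum, n) => sum + n, 0);
    model[current] = {};
    for (const [next, n] of Object.entries(row)) {
      model[current][next] = n / total;
    }
  }
  return model;
}

// Made-up navigation history for one user session.
const history = [
  "500 Photos",
  "500 Photos",
  "Video",
  "500 Photos",
  "2 Megabyte Animated GIF"
];

const localModel = estimateModel(history);
console.log(localModel["500 Photos"]);
```

With so few observations the estimates are crude, and states that were never left (or never seen) have no row at all, so a real implementation would want far more data before trusting them.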

*Check back soon for part 2 of the series. We’ll examine how to increase the usefulness of the core OMM data structure by incorporating the aforementioned underlying trends to influence state transitions.*