%run ../initscript.py
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from ipywidgets import *
%matplotlib inline
Sequential Data Models
Assuming that data points are independent and identically distributed (i.i.d.) lets us express the likelihood function as a product, over all data points, of the probability distribution evaluated at each data point. For many applications the i.i.d. assumption does not hold, for example:
Time-series: stock market, speech, video analysis
Ordered: text, genes
One of the simplest ways to relax the i.i.d. assumption is to consider a Markov model.
A first-order Markov chain of observations \(\mathbf{x}_t\), in which each observation depends only on the one immediately before it, has joint distribution
\begin{align*} p(\mathbf{x}_1, \ldots, \mathbf{x}_T) = p(\mathbf{x}_1) \prod_{t=2}^{T} p(\mathbf{x}_t|\mathbf{x}_{t-1}). \end{align*}
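To make the factorization concrete, here is a minimal sketch that evaluates the log of this joint distribution for a discrete-valued chain. It assumes the observations are integer state labels; the names `markov_chain_log_prob`, `initial`, and `transition` are illustrative and do not come from the notebook.

```python
import numpy as np

def markov_chain_log_prob(states, initial, transition):
    """Log joint probability of a discrete sequence under a first-order Markov chain.

    Assumed (hypothetical) parameterization:
    initial[i]       = p(x_1 = i)
    transition[i, j] = p(x_t = j | x_{t-1} = i)
    """
    states = np.asarray(states)
    # First factor: p(x_1)
    log_p = np.log(initial[states[0]])
    # Remaining factors: p(x_t | x_{t-1}) for t = 2, ..., T
    log_p += np.sum(np.log(transition[states[:-1], states[1:]]))
    return log_p

initial = np.array([0.6, 0.4])
transition = np.array([[0.7, 0.3],
                       [0.2, 0.8]])
sequence = [0, 0, 1, 1]

# Joint probability is 0.6 * 0.7 * 0.3 * 0.8
chain_log_p = markov_chain_log_prob(sequence, initial, transition)
print(np.exp(chain_log_p))
```

Working in log space avoids numerical underflow when the sequence is long, since the product of many probabilities quickly becomes very small.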
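Hidden Markov Models
A hidden Markov model pairs each observation \(\mathbf{x}_t\) with a latent state \(\mathbf{z}_t\). The latent states form a first-order Markov chain, and each observation depends only on its own latent state. The joint distribution is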
\begin{align*} p(\mathbf{X}, \mathbf{Z} | \theta) = p(\mathbf{z}_1|\pmb{\pi}) \left( \prod_{t=2}^{T} p(\mathbf{z}_t|\mathbf{z}_{t-1}, \mathbf{A}) \right) \prod_{t=1}^{T} p(\mathbf{x}_t|\mathbf{z}_{t}, \psi) \end{align*}
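where \(\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_T\}\), \(\mathbf{Z}=\{\mathbf{z}_1, \ldots, \mathbf{z}_T\}\), and \(\theta = \{ \pmb{\pi}, \mathbf{A}, \psi \}\) denotes the set of parameters governing the model: \(\pmb{\pi}\) for the initial latent state, \(\mathbf{A}\) for the transitions between latent states, and \(\psi\) for the distribution of each observation given its latent state.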
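The three factors of the joint distribution can be evaluated directly when both the latent states and the observations are discrete. The sketch below assumes that case and represents \(\psi\) as an emission matrix `B`; this matrix, the function name `hmm_joint_log_prob`, and the example values are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def hmm_joint_log_prob(x, z, pi, A, B):
    """Log of p(X, Z | theta) for an HMM with discrete states and observations.

    Assumed (hypothetical) parameterization:
    pi[k]   = p(z_1 = k)
    A[j, k] = p(z_t = k | z_{t-1} = j)
    B[k, v] = p(x_t = v | z_t = k)   (stands in for psi)
    """
    x = np.asarray(x)
    z = np.asarray(z)
    # Initial latent state: p(z_1 | pi)
    log_p = np.log(pi[z[0]])
    # Latent transitions: p(z_t | z_{t-1}, A) for t = 2, ..., T
    log_p += np.sum(np.log(A[z[:-1], z[1:]]))
    # Emissions: p(x_t | z_t, psi) for t = 1, ..., T
    log_p += np.sum(np.log(B[z, x]))
    return log_p

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])

z = [0, 0, 1]  # latent states
x = [0, 1, 1]  # observations

# (0.5 * 0.9 * 0.1) * (0.7 * 0.3 * 0.9)
hmm_log_p = hmm_joint_log_prob(x, z, pi, A, B)
print(np.exp(hmm_log_p))
```

In practice \(\mathbf{Z}\) is unobserved, so this quantity is a building block rather than the likelihood itself: the likelihood of \(\mathbf{X}\) alone requires summing the joint over all latent sequences.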