%run ../initscript.py
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from ipywidgets import *
%matplotlib inline

Sequential Data Models

Assuming that data points are independent and identically distributed (i.i.d.) allows us to express the likelihood function as the product, over all data points, of the probability distribution evaluated at each data point. In many applications, however, the i.i.d. assumption does not hold. Examples include:

  • Time-series: stock market, speech, video analysis

  • Ordered: text, genes
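
For reference, the i.i.d. factorization of the likelihood can be written out as follows. The symbols \(N\) (number of data points) and \(\theta\) (model parameters) are notation assumed here for illustration.

\begin{align*} p(\mathbf{x}_1, \ldots, \mathbf{x}_N | \theta) = \prod_{n=1}^{N} p(\mathbf{x}_n | \theta). \end{align*}

Sequential data breaks this factorization because the order of the data points carries information.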

One of the simplest ways to relax the i.i.d. assumption is to consider a Markov model.

In a first-order Markov chain, each observation \(\mathbf{x}_t\) depends only on the observation immediately before it, so the joint distribution is

\begin{align*} p(\mathbf{x}_1, \ldots, \mathbf{x}_T) = p(\mathbf{x}_1) \prod_{t=2}^{T} p(\mathbf{x}_t|\mathbf{x}_{t-1}). \end{align*}
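
The factorization above can be evaluated directly when the observations take a finite number of values. The following is a minimal sketch under that assumption; the function name `markov_chain_log_prob`, the two-state `initial` vector, and the `transition` matrix are hypothetical values chosen for illustration, not part of the original notes.

```python
import numpy as np

def markov_chain_log_prob(states, initial, transition):
    """Log of p(x_1) * prod_{t=2}^T p(x_t | x_{t-1}) for a discrete chain.

    Assumes states are integer indices, initial[i] = p(x_1 = i), and
    transition[i, j] = p(x_t = j | x_{t-1} = i).
    """
    states = np.asarray(states)
    # Contribution of the first observation, p(x_1).
    log_p = np.log(initial[states[0]])
    # Contributions of the transitions p(x_t | x_{t-1}) for t = 2..T.
    log_p += np.sum(np.log(transition[states[:-1], states[1:]]))
    return log_p

# Hypothetical two-state chain used only for illustration.
initial = np.array([0.6, 0.4])
transition = np.array([[0.7, 0.3],
                       [0.2, 0.8]])
sequence = [0, 0, 1, 1]

log_p = markov_chain_log_prob(sequence, initial, transition)
print(np.exp(log_p))  # equals 0.6 * 0.7 * 0.3 * 0.8
```

Working in log space avoids numerical underflow, which becomes an issue when the product runs over a long sequence.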

Hidden Markov Models

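A hidden Markov model introduces a latent variable \(\mathbf{z}_t\) for each observation \(\mathbf{x}_t\). The latent variables form a first-order Markov chain, and each observation depends only on its own latent variable. The joint distribution is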

\begin{align*} p(\mathbf{X}, \mathbf{Z} | \theta) = p(\mathbf{z}_1|\pmb{\pi}) \left( \prod_{t=2}^{T} p(\mathbf{z}_t|\mathbf{z}_{t-1}, \mathbf{A}) \right) \prod_{t=1}^{T} p(\mathbf{x}_t|\mathbf{z}_{t}, \psi) \end{align*}

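where \(\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_T\}\), \(\mathbf{Z} = \{\mathbf{z}_1, \ldots, \mathbf{z}_T\}\), and \(\theta = \{ \pmb{\pi}, \mathbf{A}, \psi \}\) denotes the set of parameters governing the model: \(\pmb{\pi}\) parameterizes the distribution of the initial latent variable, \(\mathbf{A}\) the transitions between latent variables, and \(\psi\) the distribution of each observation given its latent variable.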
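
As a rough illustration of how the three factors combine, the sketch below evaluates the log of the joint distribution for a given pair of observation and latent sequences. It assumes discrete latent states and discrete observations, so that \(\psi\) can be represented by an emission matrix `B`; this representation, the function name `hmm_joint_log_prob`, and all numerical values are assumptions made for the example.

```python
import numpy as np

def hmm_joint_log_prob(x, z, pi, A, B):
    """Log of p(X, Z | theta) for a discrete HMM.

    Assumes pi[k] = p(z_1 = k), A[j, k] = p(z_t = k | z_{t-1} = j),
    and B[k, v] = p(x_t = v | z_t = k) (a discrete stand-in for psi).
    """
    x = np.asarray(x)
    z = np.asarray(z)
    # Initial latent state: p(z_1 | pi).
    log_p = np.log(pi[z[0]])
    # Latent transitions: prod_{t=2}^T p(z_t | z_{t-1}, A).
    log_p += np.sum(np.log(A[z[:-1], z[1:]]))
    # Emissions: prod_{t=1}^T p(x_t | z_t, psi).
    log_p += np.sum(np.log(B[z, x]))
    return log_p

# Hypothetical parameters: 2 latent states, 3 observation symbols.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])

z = [0, 0, 1]  # latent state sequence
x = [0, 1, 2]  # observed symbol sequence

log_joint = hmm_joint_log_prob(x, z, pi, A, B)
print(np.exp(log_joint))  # equals 0.5 * (0.9 * 0.1) * (0.7 * 0.2 * 0.6)
```

Note that this evaluates the joint distribution for a known latent sequence. In practice \(\mathbf{Z}\) is unobserved, and computing the likelihood of \(\mathbf{X}\) alone requires summing over all latent sequences.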

Linear Dynamical System