[1]:

%run ../initscript.py
HTML("""
<div id="popup" style="padding-bottom:5px; display:none;">
    <div>Enter Password:</div>
    <input id="password" type="password"/>
    <button onclick="done()" style="border-radius: 12px;">Submit</button>
</div>
<button onclick="unlock()" style="border-radius: 12px;">Unclock</button>
<a href="#" onclick="code_toggle(this); return false;">show code</a>
""")

[1]:

show code

[2]:

%run ../../notebooks/loadtsfuncs.py
%matplotlib inline
toggle()

[2]:

show code

Time Series Data¶

Time series is a sequence of observations recorded at regular time intervals with many applications such as in demand and sales, number of visitors to a website, stock price, etc. In this section, we focus on two time series datasets that one is the US houses sales and the other is the soft drink sales.

Read and Plot Data¶

The python package pandas is used to read .cvs data file. The first 5 rows are shown as below.

[3]:

df_house.head()

[3]:

	sales	year	month
date
1991-01-01	401	1991	Jan
1991-02-01	482	1991	Feb
1991-03-01	507	1991	Mar
1991-04-01	508	1991	Apr
1991-05-01	517	1991	May

[4]:

df_drink.head()

[4]:

	sales	year	quarter
date
2001-03-31	1807.37	2001	Q1
2001-06-30	2355.32	2001	Q2
2001-09-30	2591.83	2001	Q3
2001-12-31	2236.39	2001	Q4
2002-03-31	1549.14	2002	Q1

There are univariate and multivariate time series where - A univariate time series is a series with a single time-dependent variable, and - A Multivariate time series has more than one time-dependent variable. Each variable depends not only on its past values but also has some dependency on other variables. This dependency is used for forecasting future values.

Our datasets are univariate time series. Time series data can be thought of as special cases of panel data. Panel data (or longitudinal data) also involves measurements over time. The difference is that, in addition to time series, it also contains one or more related variables that are measured for the same time periods.

Now, We plot the time series data

[5]:

plot_time_series(df_house, 'sales', title='House Sales')

../../_images/docs_time_series_time_series_data_9_0.png

[6]:

plot_time_series(df_drink, 'sales', title='Drink Sales')

../../_images/docs_time_series_time_series_data_10_0.png

White Noise¶

A time series is white noise if the observations are independent and identically distributed with a mean of zero. This means that all observations have the same variance and each value has a zero correlation with all other values in the series. White noise is an important concept in time series analysis and forecasting because:

Predictability: if the time series is white noise, then, by definition, it is random. We cannot reasonably model it and make predictions.
Model diagnostics: the series of errors from a time series forecast model should ideally be white noise.

[7]:

pd.Series(np.random.randn(200)).plot(title='Random White Noise')
plt.show()

../../_images/docs_time_series_time_series_data_13_0.png

Random Walk¶

The series itself is not random. However, its differences - that is, the changes from one period to the next - are random. The random walk model is

\[Y_t = Y_{t-1} + \mu + e_t\]

and the difference form of random walk model is

\[DY_{t} = Y_t - Y_{t-1} = \mu + e_t\]

where \(\mu\) is the drift. Apparently, the series tends to trend upward if \(\mu > 0\) or downward if \(\mu < 0\).

[8]:

interact(random_walk, drift=widgets.FloatSlider(min=-0.1,max=0.1,step=0.01,value=0,description='Drift:'));

Seasonal Plot¶

The datasets are either a monthly or quarterly time series. They may follows a certain repetitive pattern every year. So, we can plot each year as a separate line in the same plot. This allows us to compare the year-wise patterns side-by-side.

[9]:

seasonal_plot(df_house, ['month','sales'], title='House Sales')

../../_images/docs_time_series_time_series_data_19_0.png

[10]:

seasonal_plot(df_drink, ['quarter','sales'], title='Drink Sales')

../../_images/docs_time_series_time_series_data_20_0.png

For house sales, we do not see a clear repetitive pattern in each year. It is also difficult to identify a clear trend among years.

For drink sales, there is a clear pattern repeating every year. It shows a steep increase in drink sales every 2nd quarter. Then, it is falling slightly in the 3rd quarter and so on. As years progress, the drink sales increase overall.

Boxplot of Seasonal and Trend Distribution¶

We can visualize the trend and how it varies each year in a year-wise or month-wise boxplot for house sales, as well as quarter-wise boxplot for drink sales.

[11]:

boxplot(df_house, ['month','sales'], title='House Sales')

../../_images/docs_time_series_time_series_data_24_0.png

[12]:

boxplot(df_drink, ['quarter','sales'], title='Drink Sales')

../../_images/docs_time_series_time_series_data_25_0.png

Smoothen a Time Series¶

Smoothening of a time series may be useful in: - Reducing the effect of noise in a signal get a fair approximation of the noise-filtered series. - The smoothed version of series can be used as a feature to explain the original series itself. - Visualize the underlying trend better

Moving average smoothing is certainly the most common smoothening method.

[13]:

interact(moving_average, span=widgets.IntSlider(min=3,max=30,step=3,value=6,description='Span:'));

There are other methods such as LOESS smoothing (LOcalized regrESSion) and LOWESS smoothing (LOcally Weighted regrESSion).

LOESS fits multiple regressions in the local neighborhood of each point. It is implemented in the statsmodels package, where we can control the degree of smoothing using frac argument which specifies the percentage of data points nearby that should be considered to fit a regression model.

[14]:

interact(lowess_smooth, frac=widgets.FloatSlider(min=0.05,max=0.3,step=0.05,value=0.05,description='Frac:'));