Autoregressive Moving Average (ARMA): Sunspots data


In [1]:
from __future__ import print_function
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt

import statsmodels.api as sm
In [2]:
from statsmodels.graphics.api import qqplot

Sunspots Data

In [3]:
print(sm.datasets.sunspots.NOTE)
::

    Number of Observations - 309 (Annual 1700 - 2008)
    Number of Variables - 1
    Variable name definitions::

        SUNACTIVITY - Number of sunspots for each year

    The data file contains a 'YEAR' variable that is not returned by load.


In [4]:
dta = sm.datasets.sunspots.load_pandas().data
In [5]:
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
del dta["YEAR"]
In [6]:
dta.plot(figsize=(12,8));
In [7]:
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(dta.values.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(dta, lags=40, ax=ax2)
In [8]:
arma_mod20 = sm.tsa.ARMA(dta, (2,0)).fit()
print(arma_mod20.params)
const                49.659296
ar.L1.SUNACTIVITY     1.390656
ar.L2.SUNACTIVITY    -0.688571
dtype: float64
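
The fitted AR(2) coefficients have complex characteristic roots, so the model describes a damped oscillation. As a rough sketch that is not part of the original notebook, the implied cycle length can be read off the angle of those roots. The coefficients are copied from the output above, and the variable names are illustrative.

```python
import numpy as np

# AR(2) coefficients copied from the fitted model output above
phi1, phi2 = 1.390656, -0.688571

# Characteristic equation of y_t = phi1*y_{t-1} + phi2*y_{t-2} + e_t:
#   z**2 - phi1*z - phi2 = 0
roots = np.roots([1.0, -phi1, -phi2])

# Complex roots imply an oscillation; the modulus controls the damping
# and the angle of the root gives the frequency in radians per year.
modulus = np.abs(roots[0])
period_years = 2 * np.pi / np.abs(np.angle(roots[0]))

print("root modulus:", modulus)
print("implied cycle length (years):", period_years)
```

A modulus below one means the oscillation dies out, and a period close to eleven years would agree with the well-known sunspot cycle.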

In [9]:
arma_mod30 = sm.tsa.ARMA(dta, (3,0)).fit()
In [10]:
print(arma_mod20.aic, arma_mod20.bic, arma_mod20.hqic)
2622.63633806 2637.56970317 2628.60672591

In [11]:
print(arma_mod30.params)
const                49.749943
ar.L1.SUNACTIVITY     1.300810
ar.L2.SUNACTIVITY    -0.508093
ar.L3.SUNACTIVITY    -0.129649
dtype: float64

In [12]:
print(arma_mod30.aic, arma_mod30.bic, arma_mod30.hqic)
2619.4036287 2638.07033508 2626.8666135
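
The three numbers printed for each model differ only in how heavily they penalize extra parameters. The sketch below uses the standard textbook definitions and backs the log-likelihood out of the printed AR(2) AIC; the parameter count of 4 (constant, two AR terms, error variance) is an assumption that happens to reproduce the printed BIC and HQIC.

```python
import numpy as np

def information_criteria(llf, k, nobs):
    """Return (AIC, BIC, HQIC) for a log-likelihood llf with k parameters."""
    aic = -2.0 * llf + 2.0 * k
    bic = -2.0 * llf + k * np.log(nobs)
    hqic = -2.0 * llf + 2.0 * k * np.log(np.log(nobs))
    return aic, bic, hqic

nobs = 309    # annual observations, 1700-2008
k_ar2 = 4     # assumed: const, two AR terms, error variance

# Log-likelihood backed out from the printed AIC of the AR(2) model
llf_ar2 = (2 * k_ar2 - 2622.63633806) / 2.0

aic, bic, hqic = information_criteria(llf_ar2, k_ar2, nobs)
print(aic, bic, hqic)
```

Lower values are better. On the printed numbers, AIC and HQIC favor the AR(3) model, while BIC, with its heavier penalty, marginally favors AR(2).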

  • Does our model obey the theory?
In [13]:
sm.stats.durbin_watson(arma_mod30.resid.values)
Out[13]:
1.9564809246103119
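
The Durbin-Watson statistic is the sum of squared first differences of the residuals divided by their sum of squares. Values near 2 suggest no first-order autocorrelation, values near 0 suggest strong positive autocorrelation. A minimal NumPy sketch of that definition, using synthetic data rather than the sunspot residuals:

```python
import numpy as np

def durbin_watson_stat(resid):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})**2) / sum(e_t**2)."""
    resid = np.asarray(resid, dtype=float)
    diff = np.diff(resid)
    return np.sum(diff ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(0)
white = rng.standard_normal(2000)   # synthetic uncorrelated residuals
trending = np.cumsum(white)         # synthetic, strongly autocorrelated

print(durbin_watson_stat(white))     # expected to be close to 2
print(durbin_watson_stat(trending))  # expected to be close to 0
```

The value of about 1.96 reported above is therefore consistent with little first-order autocorrelation left in the AR(3) residuals, although it says nothing about higher lags.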
In [14]:
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
ax = arma_mod30.resid.plot(ax=ax);
In [15]:
resid = arma_mod30.resid
In [16]:
stats.normaltest(resid)
Out[16]:
NormaltestResult(statistic=49.845018228245657, pvalue=1.5006928610248047e-11)
In [17]:
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid, line='q', ax=ax, fit=True)
In [18]:
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid.values.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid, lags=40, ax=ax2)
In [19]:
r,q,p = sm.tsa.acf(resid.values.squeeze(), qstat=True)
data = np.c_[range(1,41), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
print(table.set_index('lag'))
           AC          Q      Prob(>Q)
lag
1    0.009179   0.026286  8.712035e-01
2    0.041793   0.573042  7.508714e-01
3   -0.001335   0.573601  9.024483e-01
4    0.136089   6.408919  1.706205e-01
5    0.092468   9.111824  1.046861e-01
6    0.091948  11.793238  6.674359e-02
7    0.068748  13.297194  6.518998e-02
8   -0.015020  13.369222  9.976153e-02
9    0.187592  24.641902  3.393919e-03
10   0.213718  39.321986  2.229482e-05
11   0.201082  52.361129  2.344958e-07
12   0.117182  56.804180  8.574287e-08
13  -0.014055  56.868317  1.893909e-07
14   0.015398  56.945556  3.997671e-07
15  -0.024967  57.149311  7.741493e-07
16   0.080916  59.296761  6.872186e-07
17   0.041138  59.853730  1.110947e-06
18  -0.052021  60.747420  1.548437e-06
19   0.062496  62.041683  1.831648e-06
20  -0.010301  62.076971  3.381252e-06
21   0.074453  63.926645  3.193595e-06
22   0.124955  69.154763  8.978380e-07
23   0.093162  72.071026  5.799798e-07
24  -0.082152  74.346679  4.713029e-07
25   0.015695  74.430035  8.289063e-07
26  -0.025037  74.642894  1.367287e-06
27  -0.125861  80.041140  3.722575e-07
28   0.053225  81.009974  4.716289e-07
29  -0.038693  81.523800  6.916647e-07
30  -0.016904  81.622219  1.151663e-06
31  -0.019296  81.750931  1.868770e-06
32   0.104990  85.575058  8.927975e-07
33   0.040086  86.134560  1.247511e-06
34   0.008829  86.161803  2.047829e-06
35   0.014588  86.236440  3.263813e-06
36  -0.119329  91.248891  1.084456e-06
37  -0.036665  91.723859  1.521925e-06
38  -0.046193  92.480509  1.938737e-06
39  -0.017768  92.592877  2.990684e-06
40  -0.006220  92.606700  4.696991e-06
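
The Q column is the cumulative Ljung-Box statistic built from the autocorrelations in the AC column. As a sketch of the textbook formula (not the statsmodels implementation itself), the first few rows can be reproduced from the printed autocorrelations:

```python
import numpy as np

def ljung_box_q(acf_values, nobs):
    """Cumulative Ljung-Box Q for autocorrelations at lags 1..m.

    Q(m) = n * (n + 2) * sum_{k=1..m} r_k**2 / (n - k)
    """
    r = np.asarray(acf_values, dtype=float)
    lags = np.arange(1, len(r) + 1)
    return nobs * (nobs + 2) * np.cumsum(r ** 2 / (nobs - lags))

# First three residual autocorrelations copied from the table above
r = [0.009179, 0.041793, -0.001335]
q = ljung_box_q(r, nobs=309)
print(q)
```

Under the null hypothesis of no autocorrelation, Q at lag m is compared against a chi-squared distribution, which is where the Prob(>Q) column comes from.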

  • The small p-values from lag 9 onward indicate autocorrelation remaining in the residuals, i.e. a lack of fit.
  • In-sample dynamic prediction. How well does our model do?
In [20]:
predict_sunspots = arma_mod30.predict('1990', '2012', dynamic=True)
print(predict_sunspots)
1990-12-31    167.047417
1991-12-31    140.993005
1992-12-31     94.859124
1993-12-31     46.860918
1994-12-31     11.242608
1995-12-31     -4.721269
1996-12-31     -1.166891
1997-12-31     16.185704
1998-12-31     39.021886
1999-12-31     59.449868
2000-12-31     72.170135
2001-12-31     75.376776
2002-12-31     70.436455
2003-12-31     60.731589
2004-12-31     50.201804
2005-12-31     42.076039
2006-12-31     38.114300
2007-12-31     38.454655
2008-12-31     41.963824
2009-12-31     46.869291
2010-12-31     51.423261
2011-12-31     54.399716
2012-12-31     55.321689
Freq: A-DEC, dtype: float64
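
With dynamic=True, each forecast after the start date is computed from earlier forecasts rather than from observed values. The sketch below illustrates that recursion for a pure AR model. It assumes that the reported const is the process mean, the function name is hypothetical, and the three starting observations are made-up illustrative values, not the actual sunspot counts.

```python
import numpy as np

def dynamic_ar_forecast(history, const, phis, steps):
    """Recursive AR(p) forecast in which each step reuses earlier forecasts.

    Assumes the model (y_t - const) = sum_i phis[i-1] * (y_{t-i} - const) + e_t.
    """
    phis = np.asarray(phis, dtype=float)
    p = len(phis)
    # Last p observations as deviations from the mean, oldest first
    window = list(np.asarray(history[-p:], dtype=float) - const)
    out = []
    for _ in range(steps):
        # Reverse the window so the most recent value pairs with phi_1
        nxt = float(np.dot(phis, window[::-1]))
        out.append(nxt + const)
        window = window[1:] + [nxt]
    return np.array(out)

const = 49.749943                        # from the AR(3) output above
phis = [1.300810, -0.508093, -0.129649]  # from the AR(3) output above
last_obs = [30.0, 100.0, 160.0]          # hypothetical starting values

forecast = dynamic_ar_forecast(last_obs, const, phis, steps=500)
print(forecast[:5])
print(forecast[-1])   # long-horizon forecasts settle at the mean
```

Because no new observations enter the recursion, the forecasts oscillate with shrinking amplitude and drift toward the mean, which matches the pattern in the printed predictions.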

In [21]:
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['1950':].plot(ax=ax)
fig = arma_mod30.plot_predict('1990', '2012', dynamic=True, ax=ax, plot_insample=False)
In [22]:
def mean_forecast_err(y, yhat):
    return y.sub(yhat).mean()
In [23]:
mean_forecast_err(dta.SUNACTIVITY, predict_sunspots)
Out[23]:
5.6369500629250364
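
The mean forecast error measures bias only: positive and negative errors cancel, so a small value does not by itself mean the forecasts are close. A possible complement, sketched here with made-up numbers and a hypothetical helper name, is to report the mean absolute error and root mean squared error alongside it.

```python
import numpy as np

def forecast_errors(y, yhat):
    """Return bias (mean error), MAE and RMSE for aligned arrays y and yhat."""
    err = np.asarray(y, dtype=float) - np.asarray(yhat, dtype=float)
    return {
        "mean_error": err.mean(),
        "mae": np.abs(err).mean(),
        "rmse": np.sqrt((err ** 2).mean()),
    }

# Toy values chosen so the errors cancel exactly; not sunspot data
y = [10.0, 20.0, 30.0]
yhat = [15.0, 20.0, 25.0]
metrics = forecast_errors(y, yhat)
print(metrics)
```

Here the mean error is zero even though two of the three forecasts miss by five units, which the MAE and RMSE make visible.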

Exercise: Can you obtain a better fit for the Sunspots model? (Hint: sm.tsa.AR has a method select_order)

Simulated ARMA(4,1): Model Identification is Difficult

In [24]:
from statsmodels.tsa.arima_process import arma_generate_sample, ArmaProcess
In [25]:
np.random.seed(1234)
# include zero-th lag
arparams = np.array([1, .75, -.65, -.55, .9])
maparams = np.array([1, .65])

Let's make sure this model is estimable.

In [26]:
arma_t = ArmaProcess(arparams, maparams)
In [27]:
# isinvertible is a property, not a method, so it takes no parentheses
arma_t.isinvertible
In [28]:
# isstationary is a property, not a method, so it takes no parentheses
arma_t.isstationary
  • What does this mean?
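
Stationarity depends on the roots of the AR lag polynomial and invertibility on the roots of the MA lag polynomial: each property holds when every root lies outside the unit circle. A minimal NumPy sketch of that check follows. The helper name is hypothetical, and the coefficients are given zero lag first, in the same layout as arparams and maparams above.

```python
import numpy as np

def roots_outside_unit_circle(lag_poly):
    """True if all roots of c0 + c1*z + c2*z**2 + ... have modulus > 1.

    lag_poly lists the coefficients with the zero lag first.
    """
    coeffs = np.asarray(lag_poly, dtype=float)
    # np.roots expects the highest-degree coefficient first
    roots = np.roots(coeffs[::-1])
    return bool(np.all(np.abs(roots) > 1.0))

# MA polynomial from the cell above: invertibility check
print(roots_outside_unit_circle([1, .65]))

# AR polynomial from the cell above: stationarity check
print(roots_outside_unit_circle([1, .75, -.65, -.55, .9]))
```

A process whose AR polynomial fails this check does not settle around a fixed mean and variance, so simulated paths can grow without bound.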
In [29]:
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
# generate_sample takes the sample length as nsample, not size
ax.plot(arma_t.generate_sample(nsample=50));
In [30]:
arparams = np.array([1, .35, -.15, .55, .1])
maparams = np.array([1, .65])
arma_t = ArmaProcess(arparams, maparams)
arma_t.isstationary
In [31]:
arma_rvs = arma_t.generate_sample(nsample=500, burnin=250, scale=2.5)
In [32]:
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(arma_rvs, lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(arma_rvs, lags=40, ax=ax2)
  • For mixed ARMA(p, q) processes, the autocorrelation function is a mixture of exponentials and damped sine waves after the first (q-p) lags.
  • The partial autocorrelation function is a mixture of exponentials and damped sine waves after the first (p-q) lags.
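
For the simplest mixed case, ARMA(1,1), this decay pattern can be written in closed form: the lag-1 autocorrelation depends on both parameters, and from lag 2 onward the ACF decays geometrically at rate phi. The sketch below uses the textbook parameterization y_t = phi*y_{t-1} + e_t + theta*e_{t-1}, which is an assumption of this example and differs in sign convention from the lag-polynomial arrays passed to ArmaProcess.

```python
import numpy as np

def arma11_acf(phi, theta, nlags):
    """Theoretical ACF (lags 0..nlags) of y_t = phi*y_{t-1} + e_t + theta*e_{t-1}."""
    rho = np.empty(nlags + 1)
    rho[0] = 1.0
    if nlags >= 1:
        # The lag-1 term mixes the AR and MA parameters
        rho[1] = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
    for k in range(2, nlags + 1):
        # Beyond lag 1 only the AR part drives the decay
        rho[k] = phi * rho[k - 1]
    return rho

acf_vals = arma11_acf(phi=0.7, theta=0.4, nlags=5)  # illustrative parameters
print(acf_vals)
```

With higher orders the same idea holds, but the decay is a mixture of several exponential or sinusoidal terms, which is why reading p and q off the sample ACF and PACF is hard.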
In [33]:
arma11 = sm.tsa.ARMA(arma_rvs, (1,1)).fit()
resid = arma11.resid
r,q,p = sm.tsa.acf(resid, qstat=True)
data = np.c_[range(1,41), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
print(table.set_index('lag'))
In [34]:
arma41 = sm.tsa.ARMA(arma_rvs, (4,1)).fit()
resid = arma41.resid
r,q,p = sm.tsa.acf(resid, qstat=True)
data = np.c_[range(1,41), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
print(table.set_index('lag'))

Exercise: How good an in-sample prediction can you obtain for another series, say, CPI?

In [35]:
macrodta = sm.datasets.macrodata.load_pandas().data
macrodta.index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3'))
cpi = macrodta["cpi"]

Hint:

In [36]:
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
ax = cpi.plot(ax=ax);
ax.legend();

The p-value of the augmented Dickey-Fuller test is close to 1, so we cannot reject the null hypothesis that the series has a unit root.

In [37]:
print(sm.tsa.adfuller(cpi)[1])
0.990432818834
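
A series with a unit root is usually differenced before an ARMA model is fitted. The sketch below shows the idea on a synthetic random walk, not on the CPI series: taking first differences recovers the stationary increments. For a price index, one common choice (an assumption here, not something the notebook prescribes) is to difference the logarithm, which gives an approximate inflation rate.

```python
import numpy as np

rng = np.random.default_rng(42)
steps = rng.standard_normal(200)   # stationary increments (synthetic)
walk = np.cumsum(steps)            # synthetic unit-root series, not CPI

# First differencing removes the unit root and recovers the increments
diffed = np.diff(walk)

print(walk[:5])
print(diffed[:5])
```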