pmdarima.arima.ARIMA

class pmdarima.arima.ARIMA(order, seasonal_order=None, start_params=None, method=None, transparams=True, solver='lbfgs', maxiter=None, disp=0, callback=None, suppress_warnings=False, out_of_sample_size=0, scoring='mse', scoring_args=None, trend=None, with_intercept=True, **sarimax_kwargs)
An ARIMA estimator.
An ARIMA, or autoregressive integrated moving average, is a generalization of an autoregressive moving average (ARMA) and is fitted to time-series data in an effort to forecast future points. ARIMA models can be especially efficacious in cases where data shows evidence of non-stationarity.
The “AR” part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior observed) values. The “MA” part indicates that the regression error is actually a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The “I” (for “integrated”) indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once). The purpose of each of these features is to make the model fit the data as well as possible.
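The differencing step behind the "I" term can be illustrated with a minimal sketch. The helper name `difference` is hypothetical and is not part of pmdarima; it only shows what replacing values with the difference from the previous value means when done `d` times.

```python
def difference(values, d=1):
    """Apply first-order differencing d times (hypothetical helper)."""
    result = list(values)
    for _ in range(d):
        # Each pass replaces the series with the change between neighbours,
        # so the series gets one element shorter per pass.
        result = [current - previous for previous, current in zip(result, result[1:])]
    return result


series = [1, 4, 9, 16, 25]
once = difference(series, d=1)   # [3, 5, 7, 9]
twice = difference(series, d=2)  # [2, 2, 2]
```

Here a quadratic trend becomes constant after two passes, which is the kind of effect the `d` term is meant to achieve.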
Non-seasonal ARIMA models are generally denoted ARIMA(p,d,q), where the parameters p, d, and q are non-negative integers: p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p,d,q)(P,D,Q)m, where m refers to the number of periods in each season, and the uppercase P, D, Q refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.
When two out of the three terms are zeros, the model may be referred to based on the non-zero parameter, dropping "AR", "I" or "MA" from the acronym describing the model. For example, ARIMA(1,0,0) is AR(1), ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1). [1]
See notes for more practical information on the ARIMA class.
Parameters: order : iterable or array-like, shape=(3,)
The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters to use. p is the order (number of time lags) of the autoregressive model, and is a non-negative integer. d is the degree of differencing (the number of times the data have had past values subtracted), and is a non-negative integer. q is the order of the moving-average model, and is a non-negative integer.
seasonal_order : array-like, shape=(4,), optional (default=None)
The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
start_params : array-like, optional (default=None)
Starting parameters for ARMA(p,q). If None, the default is given by ARMA._fit_start_params.
transparams : bool, optional (default=True)
Whether or not to transform the parameters to ensure stationarity. Uses the transformation suggested in Jones (1980). If False, no checking for stationarity or invertibility is done.
method : str, one of {'css-mle', 'mle', 'css'}, optional (default=None)
This is the log-likelihood to maximize. If "css-mle", the conditional sum of squares likelihood is maximized and its values are used as starting values for the computation of the exact likelihood via the Kalman filter. If "mle", the exact likelihood is maximized via the Kalman filter. If "css", the conditional sum of squares likelihood is maximized. All three methods use start_params as starting parameters. See above for more information. If fitting a seasonal ARIMA, the default is 'lbfgs'.
solver : str or None, optional (default='lbfgs')
Solver to be used. The default is 'lbfgs' (limited memory Broyden-Fletcher-Goldfarb-Shanno). Other choices are 'bfgs', 'newton' (Newton-Raphson), 'nm' (Nelder-Mead), 'cg' (conjugate gradient), 'ncg' (non-conjugate gradient), and 'powell'. By default, the limited memory BFGS uses m=12 to approximate the Hessian, projected gradient tolerance of 1e-8 and factr = 1e2. You can change these by using kwargs.
maxiter : int, optional (default=None)
The maximum number of function evaluations. Statsmodels defaults this value to 50 for SARIMAX models and 500 for ARIMA and ARMA models. If passed as None, will use the seasonal order to determine which to use (50 for seasonal, 500 otherwise).
disp : int, optional (default=0)
If True, convergence information is printed. For the default 'lbfgs' solver, disp controls the frequency of the output during the iterations. disp < 0 means no output in this case.
callback : callable, optional (default=None)
Called after each iteration as callback(xk) where xk is the current parameter vector. This is only used in non-seasonal ARIMA models.
suppress_warnings : bool, optional (default=False)
Many warnings might be thrown inside of statsmodels. If suppress_warnings is True, all of these warnings will be squelched.
out_of_sample_size : int, optional (default=0)
The number of examples from the tail of the time series to hold out and use as validation examples. The model will not be fit on these samples, but the observations will be added into the model's endog and exog arrays so that future forecast values originate from the end of the endogenous vector. See update().
For instance:
    y = [0, 1, 2, 3, 4, 5, 6]
    out_of_sample_size = 2
    > Fit on: [0, 1, 2, 3, 4]
    > Score on: [5, 6]
    > Append [5, 6] to end of self.arima_res_.data.endog values
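The hold-out behaviour in the instance above can be sketched in plain Python. The function `split_holdout` is a hypothetical illustration of the split only; it does not call pmdarima and makes no claim about how the library implements it internally.

```python
def split_holdout(y, out_of_sample_size=0):
    """Split a series into (fit_on, score_on) using a tail hold-out (hypothetical helper)."""
    y = list(y)
    if out_of_sample_size < 0 or out_of_sample_size >= len(y):
        raise ValueError("out_of_sample_size must be in [0, len(y))")
    cut = len(y) - out_of_sample_size
    # The head is used for fitting, the tail for scoring.
    return y[:cut], y[cut:]


fit_on, score_on = split_holdout([0, 1, 2, 3, 4, 5, 6], out_of_sample_size=2)
# After scoring, the held-out tail is appended back, so forecasts
# start from the end of the full series.
endog_after_fit = fit_on + score_on
```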
scoring : str, optional (default='mse')
If performing validation (i.e., if out_of_sample_size > 0), the metric to use for scoring the out-of-sample data. One of {'mse', 'mae'}.
scoring_args : dict, optional (default=None)
A dictionary of keyword arguments to be passed to the scoring metric.
trend : str or None, optional (default=None)
The trend parameter. If with_intercept is True, trend will be used. If with_intercept is False, the trend will be set to a no-intercept value. If trend is None and with_intercept is True, 'c' will be used as a default.
with_intercept : bool, optional (default=True)
Whether to include an intercept term. Default is True.
**sarimax_kwargs : keyword args, optional
Optional arguments to pass to the constructor for seasonal ARIMA fits. Examples of potentially valuable kwargs:
- time_varying_regression : boolean. Whether or not coefficients on the exogenous regressors are allowed to vary over time.
- enforce_stationarity : boolean. Whether or not to transform the AR parameters to enforce stationarity in the autoregressive component of the model.
- enforce_invertibility : boolean. Whether or not to transform the MA parameters to enforce invertibility in the moving average component of the model.
- simple_differencing : boolean. Whether or not to use partially conditional maximum likelihood estimation for seasonal ARIMA models. If True, differencing is performed prior to estimation, which discards the first \(s D + d\) initial rows but results in a smaller state-space formulation. If False, the full SARIMAX model is put in state-space form so that all data points can be used in estimation. Default is False.
Attributes
arima_res_ : ModelResultsWrapper
The model results, per statsmodels.
oob_ : float
The MAE or MSE of the out-of-sample records, if out_of_sample_size is > 0, else np.nan.
oob_preds_ : np.ndarray or None
The predictions for the out-of-sample records, if out_of_sample_size is > 0, else None.
Notes
- Since the ARIMA class currently wraps statsmodels.tsa.arima_model.ARIMA, which does not provide support for seasonality, the only way to fit seasonal ARIMAs is to manually lag/pre-process your data appropriately. This might change in the future. [2]
- After the model fit, many more methods will become available to the fitted model (e.g., pvalues(), params()). These are delegate methods which wrap the internal ARIMA results instance.
References
[R37] https://wikipedia.org/wiki/Autoregressive_integrated_moving_average
[R38] Statsmodels ARIMA documentation: http://bit.ly/2wc9Ra8
Methods
add_new_observations(y[, exogenous])              Update the endog/exog samples after a model fit.
aic()                                             Get the AIC, the Akaike Information Criterion.
aicc()                                            Get the AICc, the corrected Akaike Information Criterion.
arparams()                                        Get the parameters associated with the AR coefficients in the model.
arroots()                                         The roots of the AR coefficients.
bic()                                             Get the BIC, the Bayes Information Criterion.
bse()                                             Get the standard errors of the parameters.
conf_int([alpha])                                 Returns the confidence interval of the fitted parameters.
df_model()                                        The model degrees of freedom: k_exog + k_trend + k_ar + k_ma.
df_resid()                                        Get the residual degrees of freedom.
fit(y[, exogenous])                               Fit an ARIMA to a vector, y, of observations with an optional matrix of exogenous variables.
fit_predict(y[, exogenous, n_periods])            Fit an ARIMA to a vector, y, of observations with an optional matrix of exogenous variables, and then generate predictions.
get_params([deep])                                Get parameters for this estimator.
hqic()                                            Get the Hannan-Quinn Information Criterion.
maparams()                                        Get the value of the moving average coefficients.
maroots()                                         The roots of the MA coefficients.
oob()                                             If the model was built with out_of_sample_size > 0, a validation score will have been computed.
params()                                          Get the parameters of the model.
plot_diagnostics([variable, lags, fig, figsize])  Plot an ARIMA's diagnostics.
predict([n_periods, exogenous, …])                Forecast future values.
predict_in_sample([exogenous, start, end, …])     Generate in-sample predictions from the fit ARIMA model.
pvalues()                                         Get the p-values associated with the t-values of the coefficients.
resid()                                           Get the model residuals.
set_params(**params)                              Set the parameters of this estimator.
summary()                                         Get a summary of the ARIMA model.
to_dict()                                         Get the ARIMA model as a dictionary.
update(y[, exogenous, maxiter])                   Update the model fit with additional observed endog/exog values.
__init__(order, seasonal_order=None, start_params=None, method=None, transparams=True, solver='lbfgs', maxiter=None, disp=0, callback=None, suppress_warnings=False, out_of_sample_size=0, scoring='mse', scoring_args=None, trend=None, with_intercept=True, **sarimax_kwargs)
Initialize self. See help(type(self)) for accurate signature.

add_new_observations(y, exogenous=None, **kwargs)
Update the endog/exog samples after a model fit.
After fitting your model and creating forecasts, you’re going to need to attach new samples to the data you fit on. These are used to compute new forecasts (but using the same estimated parameters).
Parameters: y : array-like or iterable, shape=(n_samples,)
The time-series data to add to the endogenous samples on which the ARIMA estimator was previously fit. This may either be a Pandas Series object or a numpy array. This should be a one-dimensional array of finite floats.
exogenous : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables. If the model was fit with an exogenous array of covariates, it will be required for updating the observed values.
**kwargs : keyword args
Any keyword args that should be passed as **fit_kwargs in the new model fit.

aic()
Get the AIC, the Akaike Information Criterion:
-2 * llf + 2 * df_model
Where df_model (the number of degrees of freedom in the model) includes all AR parameters, MA parameters, parameters on constant terms, and the variance.
Returns: aic : float
The AIC
References
[R39] https://en.wikipedia.org/wiki/Akaike_information_criterion
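As a quick check of the AIC expression, the following sketch evaluates the standard definition, -2 * llf + 2 * df_model, for made-up inputs. The function below is a standalone illustration, not the pmdarima method, and the log-likelihood value is invented for the example.

```python
def aic(llf, df_model):
    """Akaike Information Criterion from a log-likelihood and a parameter count."""
    return -2.0 * llf + 2.0 * df_model


# Illustrative numbers only: a log-likelihood of -100 with 3 free parameters.
example_aic = aic(llf=-100.0, df_model=3)  # 206.0
```

Lower values indicate a better trade-off between fit and complexity when comparing models fitted to the same data.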

aicc()
Get the AICc, the corrected Akaike Information Criterion:
AIC + 2 * df_model * (df_model + 1) / (nobs - df_model - 1)
Where df_model (the number of degrees of freedom in the model) includes all AR parameters, MA parameters, parameters on constant terms, and the variance, and nobs is the sample size.
Returns: aicc : float
The AICc
References
[R40] https://en.wikipedia.org/wiki/Akaike_information_criterion#AICc
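The small-sample correction can be sketched the same way. This standalone function assumes the usual definition, AIC + 2k(k + 1) / (n - k - 1), and guards against the non-positive denominator that occurs when the sample is too small; the guard is an addition for the sketch, not documented library behaviour.

```python
def aicc(llf, df_model, nobs):
    """Corrected AIC: the AIC plus a small-sample penalty term."""
    denominator = nobs - df_model - 1
    if denominator <= 0:
        raise ValueError("nobs must exceed df_model + 1 for the AICc to be defined")
    base_aic = -2.0 * llf + 2.0 * df_model
    return base_aic + 2.0 * df_model * (df_model + 1) / denominator


# Illustrative numbers only: 28 observations, 3 parameters, llf = -100.
example_aicc = aicc(llf=-100.0, df_model=3, nobs=28)  # 206.0 + 24 / 24 = 207.0
```

The correction shrinks toward zero as nobs grows, so AICc and AIC agree for large samples.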

arparams()
Get the parameters associated with the AR coefficients in the model.
Returns: arparams : array-like
The AR coefficients.

arroots()
The roots of the AR coefficients are the solution to:
(1 - arparams[0] * z - arparams[1] * z^2 - ... - arparams[p-1] * z^k_ar) = 0
Stability requires that the roots in modulus lie outside the unit circle.
Returns: arroots : array-like
The roots of the AR coefficients.
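For low orders the stability condition can be checked by hand. The sketch below solves the AR polynomial in closed form for p = 1 and p = 2 only, assuming a non-zero highest-order coefficient; `ar_roots` and `is_stable` are hypothetical helper names, and a general implementation would use a numerical root finder instead.

```python
import cmath


def ar_roots(arparams):
    """Roots of 1 - a1*z - a2*z^2 = 0 for an AR(1) or AR(2) model (hypothetical helper)."""
    if len(arparams) == 1:
        (a1,) = arparams
        # 1 - a1*z = 0  =>  z = 1 / a1
        return [1.0 / a1]
    if len(arparams) == 2:
        a1, a2 = arparams
        # Rearranged as a2*z^2 + a1*z - 1 = 0 and solved with the quadratic formula.
        disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
        return [(-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2)]
    raise ValueError("this sketch only handles p = 1 or p = 2")


def is_stable(arparams):
    """True when every root lies strictly outside the unit circle."""
    return all(abs(root) > 1.0 for root in ar_roots(arparams))


stable_ar1 = is_stable([0.5])        # root at z = 2
unstable_ar1 = is_stable([1.2])      # root at z = 1 / 1.2, inside the unit circle
unstable_ar2 = is_stable([0.5, 0.6])  # one root has modulus below 1
```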

bic()
Get the BIC, the Bayes Information Criterion:
-2 * llf + log(nobs) * df_model
Where if the model is fit using conditional sum of squares, the number of observations nobs does not include the p pre-sample observations.
Returns: bic : float
The BIC
References
[R41] https://en.wikipedia.org/wiki/Bayesian_information_criterion
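The BIC and the note about pre-sample observations can be sketched together. Both functions are standalone illustrations; `effective_nobs` is a hypothetical helper that encodes the statement that the p pre-sample observations are excluded under conditional sum of squares, and it is an assumption about how that adjustment would be applied.

```python
import math


def bic(llf, df_model, nobs):
    """Bayes Information Criterion: -2 * llf + log(nobs) * df_model."""
    return -2.0 * llf + math.log(nobs) * df_model


def effective_nobs(n_samples, p, method):
    """Observation count used in the criterion (hypothetical helper).

    Assumption: with method == "css" the p pre-sample observations are dropped.
    """
    return n_samples - p if method == "css" else n_samples


nobs_css = effective_nobs(n_samples=100, p=2, method="css")  # 98
nobs_mle = effective_nobs(n_samples=100, p=2, method="mle")  # 100
example_bic = bic(llf=-100.0, df_model=3, nobs=nobs_mle)
```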

bse()
Get the standard errors of the parameters. These are computed using the numerical Hessian.
Returns: bse : array-like
The BSE

conf_int(alpha=0.05, **kwargs)
Returns the confidence interval of the fitted parameters.
Parameters: alpha : float, optional (default=0.05)
The significance level for the confidence interval, i.e., the default alpha = .05 returns a 95% confidence interval.
**kwargs : keyword args or dict
Keyword arguments to pass to the confidence interval function. Could include 'cols' or 'method'.

df_model()
The model degrees of freedom: k_exog + k_trend + k_ar + k_ma.
Returns: df_model : array-like
The degrees of freedom in the model.

df_resid()
Get the residual degrees of freedom:
nobs - df_model
Returns: df_resid : array-like
The residual degrees of freedom.
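The two degrees-of-freedom quantities follow directly from the expressions given for df_model and df_resid. A minimal sketch, with made-up counts for a model with two AR terms, one MA term, a constant and no exogenous regressors (the function names mirror the methods but are standalone illustrations):

```python
def df_model(k_exog, k_trend, k_ar, k_ma):
    """Model degrees of freedom: k_exog + k_trend + k_ar + k_ma."""
    return k_exog + k_trend + k_ar + k_ma


def df_resid(nobs, model_df):
    """Residual degrees of freedom: nobs - df_model."""
    return nobs - model_df


model_df = df_model(k_exog=0, k_trend=1, k_ar=2, k_ma=1)  # 4
resid_df = df_resid(nobs=100, model_df=model_df)          # 96
```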

fit(y, exogenous=None, **fit_args)
Fit an ARIMA to a vector, y, of observations with an optional matrix of exogenous variables.
Parameters: y : array-like or iterable, shape=(n_samples,)
The time-series to which to fit the ARIMA estimator. This may either be a Pandas Series object (statsmodels can internally use the dates in the index), or a numpy array. This should be a one-dimensional array of floats, and should not contain any np.nan or np.inf values.
exogenous : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.
**fit_args : dict or kwargs
Any keyword arguments to pass to the statsmodels ARIMA fit.

fit_predict(y, exogenous=None, n_periods=10, **fit_args)
Fit an ARIMA to a vector, y, of observations with an optional matrix of exogenous variables, and then generate predictions.
Parameters: y : array-like or iterable, shape=(n_samples,)
The time-series to which to fit the ARIMA estimator. This may either be a Pandas Series object (statsmodels can internally use the dates in the index), or a numpy array. This should be a one-dimensional array of floats, and should not contain any np.nan or np.inf values.
exogenous : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.
n_periods : int, optional (default=10)
The number of periods in the future to forecast.
fit_args : dict or kwargs, optional (default=None)
Any keyword args to pass to the fit method.

get_params(deep=True)
Get parameters for this estimator.
Parameters: deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.

hqic()
Get the Hannan-Quinn Information Criterion:
-2 * llf + 2 * df_model * log(log(nobs))
Like bic(), if the model is fit using conditional sum of squares then the k_ar pre-sample observations are not counted in nobs.
Returns: hqic : float
The HQIC
References
[R42] https://en.wikipedia.org/wiki/Hannan-Quinn_information_criterion
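One way to see how the HQIC relates to the AIC and BIC documented in this reference is to compare the penalty each criterion charges per parameter. The sketch below assumes the standard definitions (2 for AIC, log(nobs) for BIC, 2 * log(log(nobs)) for HQIC); `penalty_per_parameter` is a hypothetical helper and the inputs are illustrative.

```python
import math


def hqic(llf, df_model, nobs):
    """Hannan-Quinn criterion: -2 * llf + 2 * df_model * log(log(nobs))."""
    if nobs <= 1:
        raise ValueError("nobs must exceed 1 so that log(log(nobs)) is defined")
    return -2.0 * llf + 2.0 * df_model * math.log(math.log(nobs))


def penalty_per_parameter(nobs):
    """Per-parameter penalty of each criterion at a given sample size (hypothetical helper)."""
    return {
        "aic": 2.0,
        "bic": math.log(nobs),
        "hqic": 2.0 * math.log(math.log(nobs)),
    }


penalties = penalty_per_parameter(100)
example_hqic = hqic(llf=-100.0, df_model=3, nobs=100)
```

At 100 observations the HQIC penalty falls between the AIC and BIC penalties, so it tends to select models of intermediate size.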

maparams()
Get the value of the moving average coefficients.
Returns: maparams : array-like
The MA coefficients.

maroots()
The roots of the MA coefficients are the solution to:
(1 + maparams[0] * z + maparams[1] * z^2 + ... + maparams[q-1] * z^q) = 0
Stability requires that the roots in modulus lie outside the unit circle.
Returns: maroots : array-like
The MA roots.
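A candidate root can be verified by evaluating the MA polynomial at that point. The sketch below uses Horner's rule; `ma_polynomial` is a hypothetical helper, and the MA(1) coefficient is chosen only so the arithmetic is exact.

```python
def ma_polynomial(maparams, z):
    """Evaluate 1 + maparams[0]*z + ... + maparams[q-1]*z^q with Horner's rule."""
    result = 0.0
    for coefficient in reversed(maparams):
        result = (result + coefficient) * z
    return 1.0 + result


# For an MA(1) model with coefficient 0.5, solving 1 + 0.5*z = 0 gives z = -2.
theta = 0.5
root = -1.0 / theta
residual = ma_polynomial([theta], root)  # 0.0, confirming the root
outside_unit_circle = abs(root) > 1.0    # True
```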

oob()
If the model was built with out_of_sample_size > 0, a validation score will have been computed. Otherwise it will be np.nan.
Returns: oob_ : float
The "out-of-bag" score.

params()
Get the parameters of the model. The order of variables is the trend coefficients and the k_exog() exogenous coefficients, then the k_ar() AR coefficients, and finally the k_ma() MA coefficients.
Returns: params : array-like
The parameters of the model.

plot_diagnostics(variable=0, lags=10, fig=None, figsize=None)
Plot an ARIMA's diagnostics.
Diagnostic plots for standardized residuals of one endogenous variable.
Parameters: variable : integer, optional
Index of the endogenous variable for which the diagnostic plots should be created. Default is 0.
lags : integer, optional
Number of lags to include in the correlogram. Default is 10.
fig : Matplotlib Figure instance, optional
If given, subplots are created in this figure instead of in a new figure. Note that the 2x2 grid will be created in the provided figure using fig.add_subplot().
figsize : tuple, optional
If a figure is created, this argument allows specifying a size. The tuple is (width, height).
See also
statsmodels.graphics.gofplots.qqplot, pmdarima.utils.visualization.plot_acf
Notes
Produces a 2x2 plot grid with the following plots (ordered clockwise from top left):
- Standardized residuals over time
- Histogram plus estimated density of standardized residuals, along with a Normal(0,1) density plotted for reference
- Normal Q-Q plot, with Normal reference line
- Correlogram
References
[R43] https://www.statsmodels.org/dev/_modules/statsmodels/tsa/statespace/mlemodel.html#MLEResults.plot_diagnostics

predict(n_periods=10, exogenous=None, return_conf_int=False, alpha=0.05)
Forecast future values.
Generate predictions (forecasts) n_periods in the future. Note that if exogenous variables were used in the model fit, they must also be supplied to predict, which will fail otherwise.
Parameters: n_periods : int, optional (default=10)
The number of periods in the future to forecast.
exogenous : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.
return_conf_int : bool, optional (default=False)
Whether to get the confidence intervals of the forecasts.
alpha : float, optional (default=0.05)
The confidence intervals for the forecasts are (1 - alpha) %
Returns: forecasts : array-like, shape=(n_periods,)
The array of forecasted values.
conf_int : array-like, shape=(n_periods, 2), optional
The confidence intervals for the forecasts. Only returned if return_conf_int is True.

predict_in_sample(exogenous=None, start=None, end=None, dynamic=False, return_conf_int=False, alpha=0.05, typ='levels')
Generate in-sample predictions from the fit ARIMA model.
Predicts the original training (in-sample) time series values. This can be useful when wanting to visualize the fit, and qualitatively inspect the efficacy of the model, or when wanting to compute the residuals of the model.
Parameters: exogenous : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.
start : int, optional (default=None)
Zero-indexed observation number at which to start forecasting, i.e., the first forecast is start.
end : int, optional (default=None)
Zero-indexed observation number at which to end forecasting, i.e., the last forecast is end.
dynamic : bool, optional (default=False)
The dynamic keyword affects in-sample prediction. If dynamic is False, then the in-sample lagged values are used for prediction. If dynamic is True, then in-sample forecasts are used in place of lagged dependent variables. The first forecasted value is start.
return_conf_int : bool, optional (default=False)
Whether to get the confidence intervals of the forecasts.
alpha : float, optional (default=0.05)
The confidence intervals for the forecasts are (1 - alpha) %
typ : str, optional (default='levels')
The type of prediction to make. Options are ('linear', 'levels'). This is only used when the underlying model is ARIMA (not ARMA or SARIMAX).
- 'linear': makes linear predictions in terms of the differenced endogenous variables.
- 'levels': predicts the levels of the original endogenous variables.
Returns: preds : array
The predicted values.
conf_int : array-like, shape=(n_periods, 2), optional
The confidence intervals for the predictions. Only returned if return_conf_int is True.

pvalues()
Get the p-values associated with the t-values of the coefficients. Note that the coefficients are assumed to have a Student's T distribution.
Returns: pvalues : array-like
The p-values.

resid()
Get the model residuals. If the model is fit using 'mle', then the residuals are created via the Kalman Filter. If the model is fit using 'css', then the residuals are obtained via scipy.signal.lfilter adjusted such that the first k_ma() residuals are zero. These zero residuals are not returned.
Returns: resid : array-like
The model residuals.

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns: self
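The `<component>__<parameter>` naming convention can be illustrated with a small parser. This is a sketch of the convention only; `split_nested_param` and the key `"arima__maxiter"` are hypothetical and do not describe how set_params is implemented.

```python
def split_nested_param(key):
    """Split '<component>__<parameter>' into its two parts (hypothetical helper).

    Returns (None, key) when the key has no component prefix.
    """
    component, separator, parameter = key.partition("__")
    if not separator:
        return None, key
    return component, parameter


nested = split_nested_param("arima__maxiter")  # ("arima", "maxiter")
simple = split_nested_param("maxiter")         # (None, "maxiter")
```

Splitting on the first double underscore only means deeper nesting stays in the parameter part, where the named component can resolve it in turn.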

to_dict()
Get the ARIMA model as a dictionary.
Return the dictionary representation of the ARIMA model.
Returns: res : dictionary
The ARIMA model as a dictionary.

update(y, exogenous=None, maxiter=None, **kwargs)
Update the model fit with additional observed endog/exog values.
Updating an ARIMA adds new observations to the model, updating the MLE of the parameters accordingly by performing several new iterations (maxiter) from the existing model parameters.
Parameters: y : array-like or iterable, shape=(n_samples,)
The time-series data to add to the endogenous samples on which the ARIMA estimator was previously fit. This may either be a Pandas Series object or a numpy array. This should be a one-dimensional array of finite floats.
exogenous : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables. If the model was fit with an exogenous array of covariates, it will be required for updating the observed values.
maxiter : int, optional (default=None)
The number of iterations to perform when updating the model. If None, will perform max(5, n_samples // 10) iterations.
**kwargs : keyword args
Any keyword args that should be passed as **fit_kwargs in the new model fit.
Notes
- Internally, this calls fit again using the OLD model parameters as the starting parameters for the new model's MLE computation.
 Internally, this calls