why is it that in sarimax we only warn for non-invertible/stationary start params and in arima we fa


why is it that in sarimax we only warn for non-invertible/stationary start params and in arima we fail? #6225

ihadanny opened this issue Nov 4, 2019 · 9 comments

In this commit: https://github.com/statsmodels/statsmodels/commit/d03474da1aae5dac54c2b4441311d01517cd2567 the SARIMAX model was changed to only warn on non-invertible/non-stationary start_params and use all-zero start params instead, while on ARMA we continue to fail if that happens.

Why? What's the reasoning behind not doing this automatically? Will putting zeros instead always lead to fitting a non-invertible/non-stationary model? Or is it bad otherwise?

Because the starting parameter estimators are usually consistent estimators of the true parameters (even if not efficient), if they suggest a non-stationary model then that likely indicates problems with the model specification. The original behavior of statsmodels was to raise an error in this case. However, there is nothing wrong with at least trying to fit a model in these cases, so using arbitrary stationary starting parameters (like all zeros) is a valid option.

More recently, there has been a greater emphasis on model selection / automatic forecasting / cross validation type exercises, where a large number of model specifications are evaluated, and the error became cumbersome to work around. Because of this, we modified SARIMAX to only issue a warning, so that these tasks would be easier.
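The warn-and-fall-back behavior described above can be sketched in a few lines. This is only an illustration, not the statsmodels implementation: `is_stationary` and `choose_start_params` are hypothetical helpers, the warning text is made up for the example, and only the autoregressive side is handled.

```python
import warnings


def is_stationary(ar_params):
    """Return True if the AR coefficients imply a stationary process.

    Uses the step-down (inverse Durbin-Levinson) recursion: the process is
    stationary exactly when every implied partial autocorrelation lies
    strictly inside (-1, 1).
    """
    phi = list(ar_params)
    while phi:
        r = phi[-1]  # partial autocorrelation at the current order
        if abs(r) >= 1.0:
            return False
        k = len(phi)
        # Recover the coefficients of the AR(k-1) model implied by AR(k).
        phi = [(phi[j] + r * phi[k - 2 - j]) / (1.0 - r * r) for j in range(k - 1)]
    return True


def choose_start_params(estimated_ar):
    """Hypothetical helper: keep the estimates if usable, else warn and use zeros."""
    if is_stationary(estimated_ar):
        return list(estimated_ar)
    warnings.warn("Non-stationary starting AR parameters; using zeros instead.")
    return [0.0] * len(estimated_ar)


print(choose_start_params([0.5, 0.2]))  # stationary, so the estimates are kept
print(choose_start_params([0.5, 0.6]))  # 0.5 + 0.6 > 1, so a warning is issued
```

Raising an exception in place of `warnings.warn` would reproduce the older behavior that the thread says ARIMA still has.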

We did not retrofit ARIMA because that model is essentially in a "maintenance-only" state.

(Closing as answered, but feel free to follow up if you have questions or comments).

thanks for your patience and quick response!!! several followup questions:

Once the SARIMAX fit is done, how are you handling a non-invertible/non-stationary result? Do you warn about it? Raise an error? Are you trying to convert it to an invertible/stationary result by some method?

and what is the code doing if the SARIMAX fit is done and the result is almost non-invertible/stationary? (the roots are close to the unit circle)


    ARIMA because that model is essentially in a "maintenance-only" state

- so are you recommending that we'll use SARIMAX instead whenever possible? or is there an advantage in continuing to use ARIMA?

If you have enforce_stationarity=True and enforce_invertibility=True (the defaults), then it is not possible to get a non-stationary / non-invertible model.

If you set those equal to False, then we just return the results, whether or not they are stationary / invertible. There's nothing wrong with a non-stationary / non-invertible model.

and what is the code doing if the SARIMAX fit is done and the result is almost non-invertible/stationary? (the roots are close to the unit circle)

We don't do anything special in this case, we just return the results as usual.

ARIMA because that model is essentially in a "maintenance-only" state - so are you recommending that we'll use SARIMAX instead whenever possible? or is there an advantage in continuing to use ARIMA?

It's probably best to use SARIMAX, yes.

it is not possible to get an non-stationary / non-invertible model

can you please point me to a paper or to the code of how you're doing that? are you replacing the bad roots with their reciprocals? or is it another procedure?

There's nothing wrong with a non-stationary / non-invertible model

why do you say so? in https://otexts.com/fpp2/arima-r.html they say that:

Any roots close to the unit circle may be numerically unstable, and the corresponding model will not be good for forecasting

Doesn't that mean that non-stationary / non-invertible models are bad? Or is it only a problem if the roots are close to the unit circle, not if they are far inside/outside the circle? And if it's not a problem, then why did you say about the non-stationarity of the start_params that:

if they suggest a non-stationary model then that likely indicates problems with the model specification

I don't understand why it's a problem there but not here.

It's probably best to use SARIMAX, yes

Then maybe it's a good idea that the popular https://github.com/tgsmith61591/pmdarima package use SARIMAX instead of ARIMA...


As for pmdarima, you will need to discuss that in the pmdarima tracker, since that project is not affiliated with statsmodels.

The log likelihood is only defined for stationary time series when using full MLE. This is how and why the model is restricted to be stationary. Invertibility isn't a strict requirement but helps with point identification. "Unstable" when near the unit circle means that the roots are not precisely estimated. This leads to problems forecasting, since forecasts from 0.99, 0.975, and 0.9 look very different after a few steps.
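To make the forecasting point concrete, here is a minimal sketch assuming a zero-mean AR(1) process, for which the h-step-ahead point forecast is the coefficient raised to the power h times the last observed value. The helper `ar1_forecast` and the starting value of 10.0 are assumptions chosen for illustration only.

```python
def ar1_forecast(last_value, phi, steps):
    """h-step-ahead point forecasts of a zero-mean AR(1): phi**h * last_value."""
    return [last_value * phi ** h for h in range(1, steps + 1)]


last_value = 10.0  # arbitrary illustrative value of the last observation
for phi in (0.99, 0.975, 0.9):
    path = ar1_forecast(last_value, phi, steps=20)
    # Compare how quickly each forecast path decays toward the mean of zero.
    print(f"phi={phi}: h=5 -> {path[4]:.2f}, h=20 -> {path[19]:.2f}")
```

Small differences in a coefficient near one compound with the horizon, which is why imprecise estimates near the unit circle matter for forecasts.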

can you please point me to a paper or to the code of how you're doing that? are you replacing the bad roots with their reciprocals? or is it another procedure?

We maximize the likelihood function numerically, and we do not consider parameter combinations that would lead to a non-stationary / non-invertible model (as long as enforce_stationarity=True and enforce_invertibility=True).

Specifically, this is done by numerically maximizing over an unconstrained parameter space so that these parameters essentially describe partial autocorrelations. Then we convert these unconstrained (partial autocorrelation) parameters into the corresponding autoregressive or moving average components, which will be stationary / invertible by definition. The citation is:

Monahan, John F. 1984. "A Note on Enforcing Stationarity in Autoregressive-moving Average Models." Biometrika 71 (2) (August 1): 403-404.
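A minimal sketch of this kind of reparameterization follows. It is written in the spirit of the approach described above, not copied from statsmodels: the function name `constrain_stationary`, the squashing function `x / sqrt(1 + x^2)`, and the sign convention are assumptions for illustration. The optimizer would work with the unconstrained values, and the model would only ever see the transformed coefficients.

```python
import math


def constrain_stationary(unconstrained):
    """Map any real-valued vector to stationary AR coefficients.

    Step 1 squashes each value into (-1, 1) so it can act as a partial
    autocorrelation. Step 2 applies the Durbin-Levinson recursion to turn
    the partial autocorrelations into AR coefficients phi_1..phi_p for
    y_t = phi_1 * y_{t-1} + ... + phi_p * y_{t-p} + e_t.
    """
    pacf = [x / math.sqrt(1.0 + x * x) for x in unconstrained]
    phi = []
    for r in pacf:
        k = len(phi)
        # Update the lower-order coefficients, then append the new last one.
        phi = [phi[j] - r * phi[k - 1 - j] for j in range(k)] + [r]
    return phi


# Even extreme unconstrained values yield a stationary AR(2).
phi1, phi2 = constrain_stationary([5.0, -3.0])
print(phi1, phi2)
print(phi1 + phi2 < 1 and phi2 - phi1 < 1 and abs(phi2) < 1)  # AR(2) stationarity conditions
```

The same mapping applied to moving average parameters would give invertibility in place of stationarity.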

Thank you very much for these insights, I'll be sure to check out this paper and try my best to understand the technique.
But just to close this logic puzzle, there's one piece that's eluding me:

When I asked you whether a non-stationary fit from the start_params method (CSS or some other approximate estimator) is a problem, you said it's a big problem, and you even considered throwing an error:

But when I asked you about a non-stationary result of the fancy MLE estimation, you said that it's not really a problem:

There's nothing wrong with a non-stationary / non-invertible model.

And you're not at all worried about almost non-stationary results, which can happen and which Hyndman ( https://otexts.com/fpp2/arima-r.html ) suggests never using:

The auto.arima() function is even stricter, and will not select a model with roots close to the unit circle either

The answer to your question is that there are three different issues here:

1. What is the correct order of integration of a time series

2. What constitutes a valid model for estimation

3. What parameter values are numerically stable

For (1): If your data is integrated (non-stationary) then if you select an SARIMAX model that enforces stationarity, you will not be able to recover the true data generating process. What will happen is that the parameter estimates will likely be very close to a non-stationary model, but as I mentioned above, they are constrained to be stationary.
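A small simulation illustrates the point. This sketch assumes a pure random walk as the integrated data and uses a plain least-squares AR(1) estimate as a stand-in, since it needs no libraries. It is not SARIMAX's constrained maximum likelihood estimator, so the estimate here is not forced below one, but it shows the same pull toward the boundary. The helper names and the seed are hypothetical.

```python
import random


def simulate_random_walk(n, seed=0):
    """Integrated (non-stationary) series: cumulative sum of Gaussian noise."""
    rng = random.Random(seed)
    y, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        y.append(level)
    return y


def ar1_least_squares(y):
    """Least-squares slope of y_t on y_{t-1}, with no intercept."""
    num = sum(a * b for a, b in zip(y[1:], y[:-1]))
    den = sum(b * b for b in y[:-1])
    return num / den


y = simulate_random_walk(2000)
phi_hat = ar1_least_squares(y)
print(f"estimated AR(1) coefficient: {phi_hat:.4f}")  # expected to land very close to 1
```

An estimate this close to one suggests differencing the series rather than forcing a stationary AR model onto it.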

That is why if you select a model that enforces stationarity, but the estimated starting parameters are non-stationary, we issue the warning, so that you know that your model may be inappropriate for the data.

In the page you linked to, Hyndman is describing a procedure to automatically determine a SARIMAX model specification, including the order of integration. He is using a heuristic procedure to do this, and so he has apparently made the choice that it is best to reject models that are very close to being non-stationary (I would guess in favor of an additional application of differencing).

It is different here, though, because SARIMAX requires that you specify the model you want to fit. It then finds the best parameters for the given specification. The closer analogue to SARIMAX in R is Arima() and not auto.arima().

For (2): SARIMAX is estimated by putting the model into state space form, and there is no theoretical problem with non-stationary state space models. The likelihood function is slightly different due to different initializations, but our model class can handle this case with no problem.

For (3): there are various known statistical issues with numerical stability around the boundaries of parameter constraints. We bound our parameters very slightly away from the boundary for this reason. Hyndman's concern, however, appears to me to be not so much about numerical stability as it is about finding a good heuristic method for automatically selecting model orders.

auto_arima's error_action="ignore" does not work when alternative training methods are specified alkaline-ml/pmdarima#312