statsmodels: SARIMAX python np.linalg.linalg.LinAlgError: LU decomposition error

Hello, I have a problem with time series analysis. I have a dataset with 5 features. Following is the subset of my input dataset:

date,price,year,day,totaltx 1/1/2016 0:00,434.46,2016,1,126762 1/2/2016 0:00,433.59,2016,2,147449 1/3/2016 0:00,430.36,2016,3,148661 1/4/2016 0:00,433.49,2016,4,185279 1/5/2016 0:00,432.25,2016,5,178723 1/6/2016 0:00,429.46,2016,6,184207

My endogenous data is price column and exogenous data is totaltx price. This is the code I am running and getting an error:

import statsmodels.api as sm
import pandas as pd
import numpy as np
from numpy.linalg import LinAlgError

def arima(filteredData, coinOutput, window, horizon, trainLength):
    start_index = 0
    end_index = 0
    inputNumber = filteredData.shape[0]
    predictions = np.array([], dtype=np.float32)
    prices = np.array([], dtype=np.float32)
    # sliding on time series data with 1 day step
    while ((end_index) < inputNumber - 1):
        end_index = start_index + trainLength
        trainFeatures = filteredData[start_index:end_index]["totaltx"]
        trainOutput = coinOutput[start_index:end_index]["price"]

        arima = sm.tsa.statespace.SARIMAX(endog=trainOutput.values, exog=trainFeatures.values, order=(window, 0, 0))
        arima_fit = arima.fit(disp=0)
        testdata=filteredData[end_index:end_index+1]["totaltx"]
        total_sample = end_index-start_index
        predicted = arima_fit.predict(start=total_sample, end=total_sample, exog=np.array(testdata.values).reshape(-1,1))
        price = coinOutput[end_index:end_index + 1]["price"].values

        predictions = np.append(predictions, predicted)
        prices = np.append(prices, price)

        start_index = start_index + 1
    return predictions, prices

def processCoins(bitcoinPrice, window, horizon):
    output = bitcoinPrice[horizon:][["date", "day", "year", "price"]]
    return output

trainLength=100;
for window in [3,5]:
    for horizon in [1,2,5,7,10]:
        bitcoinPrice = pd.read_csv("..\\prices.csv", sep=",")
        coinOutput = processCoins(bitcoinPrice, window, horizon)
        predictions, prices = arima(bitcoinPrice, coinOutput, window, horizon, trainLength)

In this code, I am using rolling window regression technique. I am training arima for start_index:end_index and predicting the test data with end_index:end_index+1

This the error that is thrown from my code:

Traceback (most recent call last):
  File "C:/PycharmProjects/coinLogPrediction/src/arima.py", line 115, in <module>
    predictions, prices = arima(filteredBitcoinPrice, coinOutput, window, horizon, trainLength, outputFile)
  File "C:/PycharmProjects/coinLogPrediction/src/arima.py", line 64, in arima
    arima_fit = arima.fit(disp=0)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\mlemodel.py", line 469, in fit
    skip_hessian=True, **kwargs)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\base\model.py", line 466, in fit
    full_output=full_output)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 191, in _fit
    hess=hessian)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 410, in _fit_lbfgs
    **extra_kwargs)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py", line 193, in fmin_l_bfgs_b
    **opts)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py", line 328, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py", line 273, in func_and_grad
    f = fun(x, *args)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\optimize\optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\base\model.py", line 440, in f
    return -self.loglike(params, *args) / nobs
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\mlemodel.py", line 646, in loglike
    loglike = self.ssm.loglike(complex_step=complex_step, **kwargs)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\kalman_filter.py", line 825, in loglike
    kfilter = self._filter(**kwargs)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\kalman_filter.py", line 747, in _filter
    self._initialize_state(prefix=prefix, complex_step=complex_step)
  File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\representation.py", line 723, in _initialize_state
    self._statespaces[prefix].initialize_stationary(complex_step)
  File "_representation.pyx", line 1351, in statsmodels.tsa.statespace._representation.dStatespace.initialize_stationary
  File "_tools.pyx", line 1151, in statsmodels.tsa.statespace._tools._dsolve_discrete_lyapunov
numpy.linalg.linalg.LinAlgError: LU decomposition error.

I believe that there is a bug in statsmodels if I do not have any error. Can you please help me to solve it?

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Comments: 22 (6 by maintainers)

Commits related to this issue

Most upvoted comments

Hi folks, I was having the same error,

Erroneous code: mod = sm.tsa.SARIMAX(y, order=(0 1,0), seasonal_order=(1,0,0,12)) res = mod.fit()

This gave me error : LinAlgError: Schur decomposition solver error

I was able to solve this error after doing some modification in code: mod = sm.tsa.SARIMAX(y, order=(0 1,0), seasonal_order=(1,0,0,12),enforce_stationarity=False) res = mod.fit()

Hope this helps…🙂

(previous comment was submitted too fast)

Ultimately the issue here is the behavior when the parameters imply a non-stationary model, but only barely. I would guess that the symptom of an exception being raised (or not) has to do with the underlying LAPACK library.

Can you check what happens if you run the following?

mod = sm.tsa.SARIMAX(np.zeros(10))
res = mod.filter([1, 1.])
res.filter_results.initial_state_cov

I’m guessing that you will get an exception. When I run this, I get:

[[-4.98960077e+291]]

and of course this is not a valid covariance matrix. That is because there when the process is not covariance stationary then of course there is no stationary distribution, but the default initialization method, “stationary”, tries to solve for such a distribution. That’s why @SureshArr was able to solve the problem by using diffuse initialization, which does not assume a stationary distribution.

The problem is not the above example, though, where an error is appropriate because I have specified a non-stationary model. Solving for the stationary covariance requires solving a discrete Lyapunov equation, and the problem is what precision the underlying solver is able to handle.

To see this, just consider using scipy.linalg.solve_discrete_lyapunov. For me, I get:

> print(solve_discrete_lyapunov([[1 - 6e-17]], [[1.]]))
array([[4.50359963e+15]])
> print(solve_discrete_lyapunov([[1 - 5e-17]], [[1.]]))
...
LinAlgError: Matrix is singular.

Both of these models are theoretically stationary, but somewhere between 1-6e-17 and 1-5e-17, we lose enough precision that the model appears non-stationary.

Anyway, we use an internal Cython version of the discrete Lyapunov solver, and when you get the LU error, that’s basically responding to the same numerical problem that causes the LinAlgError: Matrix is singular. above.

Okay, so what should be done?

  1. In the example I gave above, we should probably raise an exception rather than run the Kalman filter with that invalid (negative) covariance matrix.
  2. Rather than the LU decomposition error, we should raise a more information error (‘Implied model is not stationary to numerical precision’ or something).

But I don’t think we can prevent this error condition from developing in all cases. We use a parameter transformation method to enforce parameters implying stationary models, but we can’t (as far as I know) enforce that it will be stationary to whatever precision is required by the discrete Lyapunov solver.