statsmodels: SARIMAX python np.linalg.linalg.LinAlgError: LU decomposition error
Hello, I have a problem with time series analysis. I have a dataset with 5 features. Following is the subset of my input dataset:
date,price,year,day,totaltx 1/1/2016 0:00,434.46,2016,1,126762 1/2/2016 0:00,433.59,2016,2,147449 1/3/2016 0:00,430.36,2016,3,148661 1/4/2016 0:00,433.49,2016,4,185279 1/5/2016 0:00,432.25,2016,5,178723 1/6/2016 0:00,429.46,2016,6,184207
My endogenous data is price column and exogenous data is totaltx price. This is the code I am running and getting an error:
import statsmodels.api as sm
import pandas as pd
import numpy as np
from numpy.linalg import LinAlgError
def arima(filteredData, coinOutput, window, horizon, trainLength):
start_index = 0
end_index = 0
inputNumber = filteredData.shape[0]
predictions = np.array([], dtype=np.float32)
prices = np.array([], dtype=np.float32)
# sliding on time series data with 1 day step
while ((end_index) < inputNumber - 1):
end_index = start_index + trainLength
trainFeatures = filteredData[start_index:end_index]["totaltx"]
trainOutput = coinOutput[start_index:end_index]["price"]
arima = sm.tsa.statespace.SARIMAX(endog=trainOutput.values, exog=trainFeatures.values, order=(window, 0, 0))
arima_fit = arima.fit(disp=0)
testdata=filteredData[end_index:end_index+1]["totaltx"]
total_sample = end_index-start_index
predicted = arima_fit.predict(start=total_sample, end=total_sample, exog=np.array(testdata.values).reshape(-1,1))
price = coinOutput[end_index:end_index + 1]["price"].values
predictions = np.append(predictions, predicted)
prices = np.append(prices, price)
start_index = start_index + 1
return predictions, prices
def processCoins(bitcoinPrice, window, horizon):
output = bitcoinPrice[horizon:][["date", "day", "year", "price"]]
return output
trainLength=100;
for window in [3,5]:
for horizon in [1,2,5,7,10]:
bitcoinPrice = pd.read_csv("..\\prices.csv", sep=",")
coinOutput = processCoins(bitcoinPrice, window, horizon)
predictions, prices = arima(bitcoinPrice, coinOutput, window, horizon, trainLength)
In this code, I am using rolling window regression technique. I am training arima for start_index:end_index and predicting the test data with end_index:end_index+1
This the error that is thrown from my code:
Traceback (most recent call last):
File "C:/PycharmProjects/coinLogPrediction/src/arima.py", line 115, in <module>
predictions, prices = arima(filteredBitcoinPrice, coinOutput, window, horizon, trainLength, outputFile)
File "C:/PycharmProjects/coinLogPrediction/src/arima.py", line 64, in arima
arima_fit = arima.fit(disp=0)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\mlemodel.py", line 469, in fit
skip_hessian=True, **kwargs)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\base\model.py", line 466, in fit
full_output=full_output)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 191, in _fit
hess=hessian)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 410, in _fit_lbfgs
**extra_kwargs)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py", line 193, in fmin_l_bfgs_b
**opts)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py", line 328, in _minimize_lbfgsb
f, g = func_and_grad(x)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\optimize\lbfgsb.py", line 273, in func_and_grad
f = fun(x, *args)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\optimize\optimize.py", line 292, in function_wrapper
return function(*(wrapper_args + args))
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\base\model.py", line 440, in f
return -self.loglike(params, *args) / nobs
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\mlemodel.py", line 646, in loglike
loglike = self.ssm.loglike(complex_step=complex_step, **kwargs)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\kalman_filter.py", line 825, in loglike
kfilter = self._filter(**kwargs)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\kalman_filter.py", line 747, in _filter
self._initialize_state(prefix=prefix, complex_step=complex_step)
File "C:\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\statespace\representation.py", line 723, in _initialize_state
self._statespaces[prefix].initialize_stationary(complex_step)
File "_representation.pyx", line 1351, in statsmodels.tsa.statespace._representation.dStatespace.initialize_stationary
File "_tools.pyx", line 1151, in statsmodels.tsa.statespace._tools._dsolve_discrete_lyapunov
numpy.linalg.linalg.LinAlgError: LU decomposition error.
I believe that there is a bug in statsmodels if I do not have any error. Can you please help me to solve it?
About this issue
- Original URL
- State: open
- Created 5 years ago
- Comments: 22 (6 by maintainers)
Hi folks, I was having the same error,
Erroneous code: mod = sm.tsa.SARIMAX(y, order=(0 1,0), seasonal_order=(1,0,0,12)) res = mod.fit()
This gave me error : LinAlgError: Schur decomposition solver error
I was able to solve this error after doing some modification in code: mod = sm.tsa.SARIMAX(y, order=(0 1,0), seasonal_order=(1,0,0,12),enforce_stationarity=False) res = mod.fit()
Hope this helps…🙂
(previous comment was submitted too fast)
Ultimately the issue here is the behavior when the parameters imply a non-stationary model, but only barely. I would guess that the symptom of an exception being raised (or not) has to do with the underlying LAPACK library.
Can you check what happens if you run the following?
I’m guessing that you will get an exception. When I run this, I get:
and of course this is not a valid covariance matrix. That is because there when the process is not covariance stationary then of course there is no stationary distribution, but the default initialization method, “stationary”, tries to solve for such a distribution. That’s why @SureshArr was able to solve the problem by using diffuse initialization, which does not assume a stationary distribution.
The problem is not the above example, though, where an error is appropriate because I have specified a non-stationary model. Solving for the stationary covariance requires solving a discrete Lyapunov equation, and the problem is what precision the underlying solver is able to handle.
To see this, just consider using
scipy.linalg.solve_discrete_lyapunov. For me, I get:Both of these models are theoretically stationary, but somewhere between
1-6e-17and1-5e-17, we lose enough precision that the model appears non-stationary.Anyway, we use an internal Cython version of the discrete Lyapunov solver, and when you get the LU error, that’s basically responding to the same numerical problem that causes the
LinAlgError: Matrix is singular.above.Okay, so what should be done?
But I don’t think we can prevent this error condition from developing in all cases. We use a parameter transformation method to enforce parameters implying stationary models, but we can’t (as far as I know) enforce that it will be stationary to whatever precision is required by the discrete Lyapunov solver.