statsmodels: VAR.select_order fails with error: leading minor not positive definite

Description

When using AIC/BIC to estimate the lag order of a VAR on panel data, VAR.select_order fails with the error: leading minor not positive definite

Steps/Code to Reproduce

import sys
if sys.version_info[0] < 3: 
	from StringIO import StringIO
else:
	from io import StringIO

data = StringIO( """year,EMP,EXP,FDI,GDP,IMP,INV,POP,SCH
1985-12-31,-0.600002288818,-0.519251093649,0.0172895179326,343.047077715,-0.0155128274468,21.0900857189,10432.0,1.55879974365
1986-12-31,0.20000076294,-2.92588396895,-0.0471372879359,3781.80778054,-3.17834442278,21.6702492132,12088.0,0.934272766113
1987-12-31,-0.79999923706,-0.785600765759,0.127376039775,2767.24634409,0.183283017284,22.1634119422,12895.0,1.12815093994
1988-12-31,-0.100002288818,0.764353061268,0.281765102438,1372.11759093,0.337772953884,23.2364725578,3995.0,1.02146148682
1989-12-31,0.200000762939,1.28699396662,0.170768030872,8.4478333761,1.31272158497,24.1526823616,-7679.0,1.69625854492
1990-12-31,-0.200000762939,-0.671089285508,0.0334772596474,4090.27884202,-0.579022873342,24.292503403,-16110.0,2.24333953857
1991-12-31,-0.299999237061,0.356437465294,0.154186052052,-12.821621682,-0.114743997494,23.485842831,-283602.0,4.64342498779
1992-12-31,-0.5,-0.0191752044561,0.36237724611,2154.64071382,-0.990565321401,21.8468545733,245400.0,3.76345062256
1993-12-31,-0.599998474121,-0.539051088858,0.0100402471518,-1433.79606655,-1.54661046608,19.4364526769,-36354.0,0.619705200195
1994-12-31,-0.799999237061,0.851716486486,-0.433294236903,1122.26814589,1.05187537807,20.2064323616,-35122.0,8.32598876953
1995-12-31,0.399997711182,0.944024676778,0.347354312854,3412.44313497,0.852879203581,20.4010393749,-5722.0,0.019630432128
1996-12-31,0.0,0.416521681018,-0.113278657663,-22.7131728208,0.247989894414,19.5180904722,-3508.0,-1.5471496582
1997-12-31,-0.399997711182,2.42025194749,0.216734153813,-2655.83389725,1.22453573511,19.3478075214,549.0,-1.14661407471
1998-12-31,0.399997711182,0.553106401446,0.376023071468,741.94367509,0.871294299925,20.5757976296,9699.0,-1.00560760498
1999-12-31,0.299999237061,-0.0758958929632,1.11134508028,-302.072727339,0.303178743186,21.255270179,88978.0,-0.967002868653
2000-12-31,0.900001525879,2.4535495512,-0.041176207454,-2333.65416814,3.57280938856,22.3863413728,105359.0,-0.779197692871
2001-12-31,0.70000076294,-0.3792611061,0.602483229768,61.6759094867,-0.563435854478,22.0726142006,29150.0,-0.896873474121
2002-12-31,0.200000762939,-0.754410210115,-0.194424244833,1747.92485184,-1.1467465118,21.2296120016,2903.0,-0.170776367187
2003-12-31,0.899997711182,-1.42890925869,-1.14195337928,5415.93898089,-0.854627183828,21.098508768,-8219.0,0.466323852539
2004-12-31,-0.200000762939,0.298148376192,-0.615046802383,4183.56096417,0.766655201432,21.8033505576,20396.0,-0.255020141602
2005-12-31,-0.199996948242,0.459826336657,2.19020741025,1004.98378135,1.49540992583,22.3756728357,14446.0,4.35261535645
2006-12-31,-0.100002288818,0.807009225064,-0.469819962461,1664.78220523,1.22822148338,23.1735632265,-32439.0,-0.199272155762
2007-12-31,0.600002288818,-0.0474539712776,-0.24951771788,5056.0754404,0.407912321994,24.1112830028,-47167.0,-0.237197875977
2008-12-31,0.399997711182,0.257833629677,-0.820000150457,3812.48173737,0.718299970195,24.0879324599,-36092.0,-0.211791992187
2009-12-31,-0.700000762939,-3.31459631034,-1.64367108012,-3781.93429981,-3.64139261289,21.3046095006,-26707.0,-0.447036743164
2010-12-31,-0.199996948242,1.97159972357,0.787280471432,-925.365182428,2.4289912674,21.911241588,-11586.0,0.148376464843
2011-12-31,-0.100002288818,1.76026885872,0.07422687689,3101.7096733,2.44029648423,23.2105916425,-5204.0,-0.104484558105
2012-12-31,-0.200000762939,0.718314939639,0.00374871044388,-2969.45146641,0.31306555971,22.6469611568,1750.0,0.442909240723
2013-12-31,-0.199996948242,0.0862602849314,-0.352932056144,1733.17355291,-0.187841040292,22.307559936,-4707.0,-0.724670410157
2014-12-31,-0.200000762939,0.441108502551,-0.913401220112,-24.3592027136,0.552591351855,22.5902706231,211536.0,-0.296638488769""" )


from statsmodels.tsa.api import VAR
import pandas as pd


sta_data = pd.read_csv(data,index_col="year")
sta_data.index = pd.DatetimeIndex(sta_data.index)

model = VAR(sta_data)

maxlags = 10

print(model.select_order(maxlags))
print(model.fit(maxlags=maxlags, ic='aic').summary())

Expected Results

This output is an example only, not the actual expected result:

                 VAR Order Selection
======================================================
            aic          bic          fpe         hqic
------------------------------------------------------
0        -27.70       -27.65    9.358e-13       -27.68
1        -28.02      -27.82*    6.745e-13      -27.94*
2        -28.03       -27.66    6.732e-13       -27.88
3       -28.04*       -27.52   6.651e-13*       -27.83
4        -28.03       -27.36    6.681e-13       -27.76
5        -28.02       -27.19    6.773e-13       -27.69
6        -27.97       -26.98    7.147e-13       -27.57
7        -27.93       -26.79    7.446e-13       -27.47
8        -27.94       -26.64    7.407e-13       -27.41
9        -27.96       -26.50    7.280e-13       -27.37
10       -27.91       -26.30    7.629e-13       -27.26
11       -27.86       -26.09    8.076e-13       -27.14
12       -27.83       -25.91    8.316e-13       -27.05
13       -27.80       -25.73    8.594e-13       -26.96
14       -27.80       -25.57    8.627e-13       -26.90
15       -27.81       -25.43    8.599e-13       -26.85
======================================================
* Minimum

{'aic': 3, 'bic': 1, 'fpe': 3, 'hqic': 1}

Actual Results

Traceback (most recent call last):
  File "Scratchbox.py", line 52, in <module>
    print model.select_order(maxlags)
  File "C:\Anaconda2\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py", line 505, in select_order
    for k, v in iteritems(result.info_criteria):
  File "C:\Anaconda2\lib\site-packages\statsmodels\base\wrapper.py", line 35, in __getattribute__
    obj = getattr(results, attr)
  File "C:\Anaconda2\lib\site-packages\statsmodels\tools\decorators.py", line 94, in __get__
    _cachedval = self.fget(obj)
  File "C:\Anaconda2\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py", line 1468, in info_criteria
    ld = logdet_symm(self.sigma_u_mle)
  File "C:\Anaconda2\lib\site-packages\statsmodels\tools\linalg.py", line 213, in logdet_symm
    c, _ = linalg.cho_factor(m, lower=True)
  File "C:\Anaconda2\lib\site-packages\scipy\linalg\decomp_cholesky.py", line 132, in cho_factor
    check_finite=check_finite)
  File "C:\Anaconda2\lib\site-packages\scipy\linalg\decomp_cholesky.py", line 30, in _cholesky
    raise LinAlgError("%d-th leading minor not positive definite" % info)
numpy.linalg.linalg.LinAlgError: 6-th leading minor not positive definite

Versions

Windows-10-10.0.10586
('Python', '2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)]')
('NumPy', '1.11.1')
('pandas', u'0.18.1')
('statsmodels', '0.6.1')

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Thanks, so this breaks at the same point when calculating the log-likelihood value, which is also used for the information criteria. It means that fit itself worked and we have params, even though they won't be very good with the small degrees of freedom (small sample size relative to the size of the model).

For now you cannot do much about this except to look for a longer time series (e.g. quarterly data) or to reduce the number of variables in the VAR. Essentially, the sample is much too short to get reliable estimates for time series analysis.
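As a rough back-of-the-envelope check (the helper below is illustrative only, not part of statsmodels): a VAR(p) with k variables estimates k·p + 1 coefficients per equation and loses p observations to lags, so T − p must exceed k·p + 1 just for the system to be estimable at all, and far more for a well-conditioned residual covariance.

```python
def max_feasible_lag(nobs, nvars):
    # Largest lag order p such that the effective sample (nobs - p)
    # still exceeds the k*p + 1 coefficients per equation.
    p = 0
    while nobs - (p + 1) > nvars * (p + 1) + 1:
        p += 1
    return p

# The data above has 30 annual observations and 8 variables,
# so even lag orders well below maxlags=10 exhaust the sample:
print(max_feasible_lag(30, 8))  # -> 3
```

This is why maxlags=10 cannot work here: already at moderate lag orders there are more parameters per equation than usable observations, and the residual covariance becomes singular.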

We won't be able to get more information out of the data, but a possible solution is to use some penalization or regularization, or to impose more prior structure. Specific to the estimation of the covariance of the residuals: we could use SVD or an eigenvalue decomposition instead of Cholesky and handle a singular sigma_u_mle. I'm not sure what the interpretation of a singular covariance matrix is in this case. We could also force it to be positive definite, but that's a purely numerical solution.
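A minimal sketch of the eigenvalue-decomposition idea (not statsmodels code; the clipping threshold is an assumption): where `cho_factor` raises outright on a singular matrix, an eigenvalue route can still return a log-determinant by flooring tiny eigenvalues, which is exactly the "purely numerical" fix mentioned above.

```python
import numpy as np
from numpy.linalg import LinAlgError

def logdet_via_eigh(m, eps=1e-12):
    # Log-determinant of a symmetric matrix via its eigenvalues.
    # Unlike a Cholesky factorization, this does not fail on a
    # singular or slightly indefinite input -- but clipping small
    # eigenvalues changes the answer, so it only papers over the
    # underlying degeneracy.
    w = np.linalg.eigvalsh(m)
    return np.sum(np.log(np.clip(w, eps, None)))

# A rank-deficient "covariance" matrix: Cholesky fails, eigh does not.
a = np.array([[1.0, 1.0], [1.0, 1.0]])
try:
    np.linalg.cholesky(a)
except LinAlgError:
    print("cholesky failed")
print(logdet_via_eigh(a))
```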

As a more general solution, I think this is also a candidate for #2942: even if we can estimate a positive definite covariance matrix in not-quite-so-small samples, it might still be very noisy, and adding some shrinkage or regularization will most likely improve the estimate, e.g. Ledoit-Wolf or a regularized Tyler estimate. (I guess this is the most promising approach in this case.) (Also a bit related: #3250, penalization/prior info for degenerate variance instead of degenerate correlation.)
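To illustrate the shrinkage idea, here is a simplified sketch that shrinks a sample covariance toward a scaled identity target. Note the shrinkage weight `alpha` is fixed here for illustration; the actual Ledoit-Wolf estimator chooses it from the data.

```python
import numpy as np

def shrink_covariance(s, alpha=0.1):
    # Convex combination of the sample covariance S and the
    # scaled-identity target (tr(S)/k) * I:
    #   S* = (1 - alpha) * S + alpha * (tr(S)/k) * I
    # Any alpha > 0 makes S* strictly positive definite whenever
    # S is positive semidefinite, so the Cholesky step succeeds.
    k = s.shape[0]
    mu = np.trace(s) / k
    return (1.0 - alpha) * s + alpha * mu * np.eye(k)

# Even a singular (rank-1) sample covariance becomes positive definite:
s = np.array([[1.0, 1.0], [1.0, 1.0]])
s_star = shrink_covariance(s, alpha=0.1)
print(np.linalg.eigvalsh(s_star))  # all eigenvalues strictly positive
```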

Another related issue: large sparse VAR #2933, or large penalized VAR. A sparse VAR will not be applicable here, because the correlations in macroeconomic data are unlikely to be sparse.