statsmodels: Inconsistent results for mixed linear models under Python 2.7 and 3.4

statsmodels.mixedlm generates different results under Python 2.7 and Python 3.4. Moreover, under Python 3.4 the model does not converge when re_formula is used with a variable slope.

Here is a simple example:

import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
from copy import deepcopy

np.random.seed(seed=0)

data1 = pd.DataFrame(columns=['x','y','id'])
data2 = pd.DataFrame(columns=['x','y','id'])
data1['x'] = np.random.normal(size=100)
data1['y'] = deepcopy(data1['x'])*3 + np.random.normal(size=100)
data1['id'] = np.ones(100)
data2['x'] = np.random.normal(size=100)
data2['y'] = deepcopy(data2['x'])*2 + np.random.normal(size=100)
data2['id'] = np.ones(100)*2

data = pd.concat([data1,data2])
data = data.reset_index(drop=True)

md = smf.mixedlm('y ~ x', data, groups=data['id'], re_formula='~ x')
mdf = md.fit(method='cg')
print(mdf.summary())

Python 2.7 converges and generates the following results:

Mixed Linear Model Regression Results
==============================================================
Model:               MixedLM   Dependent Variable:   y        
No. Observations:    200       Method:               REML    
No. Groups:          2         Scale:                0.9636  
Min. group size:     100       Likelihood:           -286.7528
Max. group size:     100       Converged:            Yes      
Mean group size:     100.0                                    
--------------------------------------------------------------
                   Coef.  Std.Err.   z    P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept           -0.048    0.532 -0.091 0.928 -1.091  0.994
x                    2.631    1.673  1.572 0.116 -0.649  5.911
Intercept RE         0.558                                    
Intercept RE x x RE  1.660                                    
x RE                 5.597                                    
==============================================================

Python 3.4 does not converge and gives these results:

/usr/local/lib/python3.4/dist-packages/statsmodels/base/model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)
          Mixed Linear Model Regression Results
=========================================================
Model:              MixedLM Dependent Variable: y        
No. Observations:   200     Method:             REML     
No. Groups:         2       Scale:              0.9680   
Min. group size:    100     Likelihood:         -284.9909
Max. group size:    100     Converged:          No       
Mean group size:    100.0                                
---------------------------------------------------------
                    Coef.  Std.Err. z P>|z| [0.025 0.975]
---------------------------------------------------------
Intercept           -0.057                               
x                    2.620                               
Intercept RE         0.034                               
Intercept RE x x RE  0.129                               
x RE                 0.482                               
=========================================================

These results are quite different for some of the parameters. Any ideas?

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 21 (12 by maintainers)

Most upvoted comments

@joergdietrich Thanks for the report. We would like this to be as consistent as possible with other packages. However, the sitka example in the notebook and the simulated example above both have their MLE and RMLE on the boundary of the parameter space. This may be fairly common in practice, but it leads to many computational and statistical issues.

I went back to the simulated example above and ran it in R and in statsmodels using 3 different optimizers. The optimized values of the log-likelihoods are:

R        -284.9909
CG       -284.99089658226671
Powell   -285.00451084828364
BFGS     -284.99189483242549

The final gradient values (for the transformed parameters in the optimization) are:

CG      3.69642747e-04,   3.15785599e-05,  -1.48518980e-03
BFGS    1.15947898e+00,  -1.28081785e-04,  -9.35476367e-06
Powell  0.07870292       -0.03229172       -0.11194649

So CG is pretty well converged but the others are not quite there.
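The gradient norms make this concrete; a quick check using the values reported above:

```python
import numpy as np

# Final gradients (transformed parameters) reported above, per optimizer.
grads = {
    'CG':     [3.69642747e-04,  3.15785599e-05, -1.48518980e-03],
    'BFGS':   [1.15947898e+00, -1.28081785e-04, -9.35476367e-06],
    'Powell': [0.07870292,     -0.03229172,     -0.11194649],
}

# The Euclidean norm of the gradient is a rough measure of how close each
# optimizer got to a stationary point: CG is orders of magnitude closer
# than BFGS, and well ahead of Powell.
norms = {name: float(np.linalg.norm(g)) for name, g in grads.items()}
for name, n in norms.items():
    print(name, n)
```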

Looking at the fitted parameters, here is what we get in statsmodels under CG:

--------------------------------------------------------
              Coef.  Std.Err.   z    P>|z| [0.025 0.975]
--------------------------------------------------------
Intercept     -0.057    0.187 -0.303 0.762 -0.423  0.310
x              2.620    0.590  4.438 0.000  1.463  3.777
Group Var      0.034    3.737
Group x x Cov  0.129    6.487
x Var          0.482    3.913

And here is what R produces:

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 0.03434  0.1853
          x           0.48174  0.6941   1.00
 Residual             0.96804  0.9839
Number of obs: 200, groups:  id, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept)  -0.0566     0.1484  -0.381
x             2.6198     0.4959   5.283

These two solutions achieve essentially the same likelihood value, and the parameter estimates themselves are also nearly identical.

I hope the discrepancies that people are seeing are not just due to misreading the output. We report variances and covariances, so, e.g., the correlation between the two random effects in our output is

0.129 / sqrt(0.034 * 0.482) ≈ 1

(with the rounded summary values the ratio lands marginally above 1). This is the same correlation parameter that R reports as 1.00, so it is as close as we can hope for in my view.
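A quick arithmetic check of that ratio (pure Python; with the rounded summary values the result lands marginally above 1 purely from rounding):

```python
import math

# Variance/covariance entries as printed in the statsmodels summary above.
var_intercept = 0.034   # Group Var
cov = 0.129             # Group x x Cov
var_x = 0.482           # x Var

# Implied correlation between the two random effects; R reports this as 1.00.
corr = cov / math.sqrt(var_intercept * var_x)
print(corr)
```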

I haven’t looked at the sitka example in a while, but my recollection is that we achieve a slightly higher log-likelihood value than R.

We have quite a bit of empirical data, here and elsewhere, that the log-likelihood, score, and Hessian are correctly implemented for the MLE and RMLE. However, the optimizers that we use (from scipy) are intended for problems in which the optimum is in the interior of the parameter space. When the optimum is on the boundary, the MLE/RMLE may not even be unique, and even if it is unique, there is no guarantee that these algorithms will find it. I don't know which optimizers R and Matlab use, but as far as I know they are first-order methods similar to what scipy uses, and therefore subject to the same limitations.
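A toy illustration (not the statsmodels internals) of why a boundary optimum arises: in a hypothetical one-way layout where two groups happen to have identical means, the unconstrained moment estimate of the between-group variance comes out negative, so the constrained estimate is pinned at the boundary value of zero:

```python
import numpy as np

# Two groups with identical values, hence identical means.
g1 = np.arange(10.0)
g2 = np.arange(10.0)
n = len(g1)

# Classical one-way ANOVA mean squares.
msw = (g1.var(ddof=1) + g2.var(ddof=1)) / 2.0          # within-group MS
msb = n * np.var([g1.mean(), g2.mean()], ddof=1)       # between-group MS = 0

# Unconstrained between-group variance estimate is negative here, so the
# constrained estimate sits on the boundary of the parameter space at 0.
raw = (msb - msw) / n
sigma2_between = max(raw, 0.0)
print(raw, sigma2_between)
```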

I believe we are also all using the same transformed objective function, which profiles out the scale parameter and then represents the random effects covariance matrix in terms of its Cholesky factor.
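A minimal numpy sketch of that parameterization, plugging in the rounded R estimates from above: the random-effects covariance is built as Psi = L @ L.T from a lower-triangular factor L, which is positive semidefinite by construction. Since the estimated correlation is 1.00, the second column of L is zero, i.e. the factor is singular and the optimum sits on the boundary:

```python
import numpy as np

# Lower-triangular Cholesky factor implied by the rounded R output:
# std. devs. 0.1853 and 0.6941 with correlation 1.00, so the second
# column collapses to zero (a singular, boundary solution).
L = np.array([[0.1853, 0.0],
              [0.6941, 0.0]])

# Random-effects covariance: reproduces R's variances (0.03434, 0.48174)
# and the covariance statsmodels reports as 0.129.
psi = L @ L.T
print(psi)
```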

We should probably restructure the notebooks so that the introductory notebook uses examples that are not on the boundary, with a separate notebook discussing these more advanced issues.