statsmodels: Inconsistent results for mixed linear models under Python 2.7 and 3.4
statsmodels.mixedlm generates different results under Python 2.7 and Python 3.4. Moreover, under Python 3.4 the model fails to converge when re_formula is used to add a random slope.
Here is a simple example:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
from copy import deepcopy

np.random.seed(seed=0)

# Group 1: y = 3*x + noise
data1 = pd.DataFrame(columns=['x', 'y', 'id'])
data1['x'] = np.random.normal(size=100)
data1['y'] = deepcopy(data1['x']) * 3 + np.random.normal(size=100)
data1['id'] = np.ones(100)

# Group 2: y = 2*x + noise
data2 = pd.DataFrame(columns=['x', 'y', 'id'])
data2['x'] = np.random.normal(size=100)
data2['y'] = deepcopy(data2['x']) * 2 + np.random.normal(size=100)
data2['id'] = np.ones(100) * 2

data = pd.concat([data1, data2]).reset_index(drop=True)

# Random intercept and random slope for x, fit with conjugate gradient
md = smf.mixedlm('y ~ x', data, groups=data['id'], re_formula='~ x')
mdf = md.fit(method='cg')
print(mdf.summary())
Python 2.7 converges and generates the following results:
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: y
No. Observations: 200 Method: REML
No. Groups: 2 Scale: 0.9636
Min. group size: 100 Likelihood: -286.7528
Max. group size: 100 Converged: Yes
Mean group size: 100.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept -0.048 0.532 -0.091 0.928 -1.091 0.994
x 2.631 1.673 1.572 0.116 -0.649 5.911
Intercept RE 0.558
Intercept RE x x RE 1.660
x RE 5.597
==============================================================
Python 3.4 does not converge and gives these results:
/usr/local/lib/python3.4/dist-packages/statsmodels/base/model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
(the warning above is emitted four times)
Mixed Linear Model Regression Results
=========================================================
Model: MixedLM Dependent Variable: y
No. Observations: 200 Method: REML
No. Groups: 2 Scale: 0.9680
Min. group size: 100 Likelihood: -284.9909
Max. group size: 100 Converged: No
Mean group size: 100.0
---------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
---------------------------------------------------------
Intercept -0.057
x 2.620
Intercept RE 0.034
Intercept RE x x RE 0.129
x RE 0.482
=========================================================
These estimates are quite different from the Python 2.7 results for some of the parameters. Any ideas?
About this issue
- State: open
- Created 7 years ago
- Comments: 21 (12 by maintainers)
@joergdietrich Thanks for the report. We would like this to be as consistent as possible with other packages. However, the Sitka example in the notebook and the simulated example above both have their MLE and REML estimates on the boundary of the parameter space. This may be somewhat common in practice, but it leads to many computational and statistical issues.
I went back to the simulated example above and ran it in R and in statsmodels using 3 different optimizers. The optimized values of the log-likelihoods are:
The final gradient values (for the transformed parameters in the optimization) are
So CG is pretty well converged but the others are not quite there.
Looking at the fitted parameters, here is what we get in statsmodels under CG:
And here is what R produces:
These two solutions achieve essentially the same likelihood value, and the parameter estimates themselves are also nearly identical.
I hope that the discrepancies that people are seeing are not just due to misreading the output. We report variances and covariances, so, e.g., the correlation between the two random effects implied by our output corresponds to the correlation parameter that R reports as 1, so it is as close as we can hope for, in my view.
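As a quick sanity check, the correlation implied by the Python 2.7 CG summary above can be recomputed by hand from the printed variance components (0.558 for the intercept, 5.597 for the slope) and their covariance (1.660); this is a sketch using only those reported values:

```python
import math

# Variance components from the Python 2.7 CG summary above
var_intercept = 0.558        # Intercept RE
var_slope = 5.597            # x RE
cov_int_slope = 1.660        # Intercept RE x x RE (covariance)

# Correlation implied by the reported (co)variances
corr = cov_int_slope / math.sqrt(var_intercept * var_slope)
print(round(corr, 3))  # → 0.939
```

So the statsmodels output, read as a correlation rather than a covariance, is close to the 1 that R reports.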
I haven’t looked at the Sitka example in a while, but my recollection is that we achieve a slightly higher log-likelihood value than R.
We have quite a bit of empirical evidence, here and elsewhere, that the log-likelihood, score, and Hessian are correctly implemented for the MLE and REML estimates. However, the optimizers that we use (from scipy) are intended for problems in which the optimum is in the interior of the parameter space. When the optimum is on the boundary, the MLE/REML estimate may not even be unique, and even if it is unique, there is no guarantee that these algorithms will find it. I don’t know which optimizers R and Matlab use, but as far as I know they are some type of first-order method similar to what scipy uses, and are therefore subject to the same limitations.
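To see the boundary issue concretely (a toy illustration, not statsmodels internals): a random-effects correlation of exactly 1 makes the 2x2 covariance matrix singular, i.e. it sits on the edge of the positive-semidefinite cone, where an unconstrained optimizer has no interior optimum to converge to. Using the variance components from the CG fit above:

```python
import math

# 2x2 random-effects covariance with |correlation| = 1
a, b = 0.558, 5.597      # variances from the CG fit above
c = math.sqrt(a * b)     # covariance when the correlation is exactly 1

# Eigenvalues of [[a, c], [c, b]] via the quadratic formula
disc = math.sqrt((a - b) ** 2 + 4 * c ** 2)
eig_min = (a + b - disc) / 2
eig_max = (a + b + disc) / 2

# The smallest eigenvalue is 0 (up to rounding): the matrix is
# singular, so the optimum lies on the boundary of the PSD cone.
print(eig_min)
```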
I believe we are also all using the same transformed objective function, which profiles out the scale parameter and then represents the random-effects covariance matrix in terms of its Cholesky factor.
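The Cholesky parameterization makes the boundary easy to see. For a 2x2 random-effects covariance, the lower-triangular factor has three free entries and Sigma = L L', so the implied correlation is l21 / sqrt(l21^2 + l22^2); a correlation of exactly 1 corresponds to the free parameter l22 hitting 0. A minimal sketch with illustrative numbers (not values from the fit):

```python
import math

def cov_from_cholesky(l11, l21, l22):
    """2x2 covariance Sigma = L @ L.T built from the free entries of
    its lower-triangular Cholesky factor L (illustrative helper)."""
    return [[l11 * l11,             l11 * l21],
            [l11 * l21, l21 * l21 + l22 * l22]]

def implied_corr(l21, l22):
    """Random-effects correlation implied by the Cholesky entries."""
    return l21 / math.sqrt(l21 * l21 + l22 * l22)

# Interior point: any real (l11, l21, l22) gives a valid covariance
print(cov_from_cholesky(2.0, 3.0, 4.0))  # → [[4.0, 6.0], [6.0, 25.0]]
print(implied_corr(3.0, 4.0))            # → 0.6

# Boundary point: l22 = 0 forces the correlation to exactly 1,
# i.e. a singular covariance on the edge of the parameter space
print(implied_corr(3.0, 0.0))            # → 1.0
```

The transformation guarantees Sigma is always positive semidefinite, but it does not keep the optimizer away from the singular boundary at l22 = 0.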
We should probably restructure the notebooks so that the introductory notebook uses examples that are not on the boundary; a separate notebook could then discuss these more advanced issues.