scikit-learn: Suggestion: Add support for unpenalized logistic regression
`LinearRegression` provides unpenalized OLS, and `SGDClassifier`, which supports `loss="log"`, also supports `penalty="none"`. But if you want plain old unpenalized logistic regression, you have to fake it by setting `C` in `LogisticRegression` to a large number, or use `Logit` from statsmodels instead.
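To make the two workarounds concrete, here is a minimal sketch (the dataset and the value of `C` are my own illustration; any sufficiently large `C` drowns out the default L2 penalty):

```python
# Sketch: approximating unpenalized logistic regression two ways.
import statsmodels.api as sm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)

# Workaround 1: C is the *inverse* regularization strength, so a huge
# value makes the default L2 penalty negligible.
sk_fit = LogisticRegression(C=1e10).fit(X, y)

# Workaround 2: statsmodels' Logit fits plain maximum likelihood,
# with no penalty at all (it does not add an intercept automatically).
sm_fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

print(sk_fit.coef_.ravel())   # should closely match...
print(sm_fit.params[1:])      # ...the statsmodels slopes
```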
About this issue
- State: closed
- Created 8 years ago
- Reactions: 11
- Comments: 34 (20 by maintainers)
You’re asking: why would I want to do logistic regression without regularization? Because (1) sometimes the sample is large enough in proportion to the number of features that regularization won’t buy one anything, and (2) sometimes the best-fitting coefficients are of interest, as opposed to maximizing predictive accuracy.
I can’t say much about (1) since computation isn’t my forte. For (2), I am a data analyst with a background in statistics. I know that scikit-learn focuses on traditional machine learning, but it is in my opinion the best Python package for data analysis right now, and I think it will benefit from not limiting itself too much. (I also think, following Larry Wasserman and Andrew Gelman, that statistics and machine learning would mutually benefit from intermingling more, but I guess that’s its own can of worms.) All coefficients will change with regularization; that’s what regularization does.
For the folks who really want unregularised logistic regression (like myself): I’ve been having to settle for statsmodels, writing a wrapper class that mimics the scikit-learn API.
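For anyone going the same route, here’s a rough sketch of what such a wrapper could look like (a hypothetical minimal version assuming binary 0/1 labels, not the commenter’s actual code):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.base import BaseEstimator, ClassifierMixin

class UnpenalizedLogisticRegression(BaseEstimator, ClassifierMixin):
    """Plain maximum-likelihood logistic regression via statsmodels' Logit."""

    def fit(self, X, y):
        # statsmodels does not add an intercept column automatically.
        self.result_ = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
        self.classes_ = np.unique(y)
        return self

    def predict_proba(self, X):
        p = self.result_.predict(sm.add_constant(X))
        return np.column_stack([1.0 - p, p])

    def predict(self, X):
        return (self.result_.predict(sm.add_constant(X)) >= 0.5).astype(int)
```

Inheriting from `BaseEstimator` and `ClassifierMixin` gives you `get_params`/`set_params` and `score` for free, so a wrapper like this slots into pipelines and cross-validation.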
Any updates on this? This is a big blocker for my willingness to recommend scikit-learn to people. It’s also not at all obvious to people coming from other libraries that scikit-learn does regularization by default and that there’s no way to disable it.
Hey, what is the status of this topic? I’d be really interested in an unpenalized logistic regression. That way, p-values will mean something statistically speaking. Otherwise I will have to continue using R 😢 for such use cases… Thanks, Alex
@shermstats So @Kodiologist suggested adding `penalty="none"` to make it more explicit, which would just be an alias for `C=np.inf`. It makes sense to me to make this more explicit in this way. Do you have thoughts on that? Then that would be what’s in the documentation. And I agree that bold might be a good idea. I think for someone with an ML background this is (maybe?) expected; for someone with a stats background, it seems very surprising. You could try looking at R or statsmodels for ideas. I’m not familiar with their methods, but they’re reasonably fast and use no regularization at all.
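Concretely, the proposal would change the spelling rather than the behavior (a sketch; the `penalty="none"` line is the proposed alias, which doesn’t exist at the time of this thread):

```python
from sklearn.linear_model import LogisticRegression

# What users do today: an effectively infinite C makes the L2 penalty
# negligible, but nothing about the code says "unpenalized".
clf = LogisticRegression(C=1e12)

# The proposal: say what you mean. This would just be an alias for C=np.inf.
#     clf = LogisticRegression(penalty="none")   # proposed, hypothetical here
```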
Or statsmodels?
@mblondel is there an alternative to “iterative solvers”? You won’t get exactly the unregularized option, right?
@Kodiologist why do you want this?