scikit-learn: Suggestion: Add support for unpenalized logistic regression

LinearRegression provides unpenalized OLS, and SGDClassifier, which supports loss="log", also supports penalty="none". But if you want plain old unpenalized logistic regression, you have to fake it by setting C in LogisticRegression to a large number, or use Logit from statsmodels instead.
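A minimal sketch of that large-C workaround, with made-up data (the parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# C is the inverse of the regularization strength, so a very large value
# makes the L2 penalty numerically negligible, but never exactly zero.
almost_unpenalized = LogisticRegression(C=1e12)
almost_unpenalized.fit(X, y)
print(almost_unpenalized.coef_)
```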

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 11
  • Comments: 34 (20 by maintainers)

Most upvoted comments

You’re asking: why would I want to do logistic regression without regularization? Because (1) sometimes the sample is large enough relative to the number of features that regularization won’t buy you anything, and (2) sometimes the best-fitting coefficients are of interest in themselves, as opposed to maximizing predictive accuracy.

I can’t say much about (1) since computation isn’t my forte. For (2), I am a data analyst with a background in statistics. I know that scikit-learn focuses on traditional machine learning, but it is in my opinion the best Python package for data analysis right now, and I think it will benefit from not limiting itself too much. (I also think, following Larry Wasserman and Andrew Gelman, that statistics and machine learning would mutually benefit from intermingling more, but I guess that’s its own can of worms.) All coefficients will change with regularization; that’s what regularization does.

I’m not opposed to adding penalty="none", but I’m not sure what the benefit of adding a redundant option is.

  1. It becomes clearer how to get an unpenalized model.
  2. It becomes clearer to the reader what code that uses an unpenalized model is trying to do (see the sketch after this list).
  3. It allows sklearn to change its implementation of unregularized models in the future without breaking people’s code.
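For example, here is how the two spellings compare (a sketch assuming a recent scikit-learn; the explicit spelling eventually shipped as penalty="none" in 0.21 and became penalty=None in 1.2):

```python
from sklearn.linear_model import LogisticRegression

# The workaround: the reader has to know that a huge C "means" no penalty.
implicit = LogisticRegression(C=1e12)

# The proposal: the intent is stated outright in the code.
explicit = LogisticRegression(penalty=None)
```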

For the folks who really want unregularised logistic regression (like myself): I’ve been settling for statsmodels and writing a wrapper class that mimics the scikit-learn API.
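A minimal sketch of such a wrapper, assuming statsmodels is installed (the class name and details are illustrative, not the commenter’s actual code):

```python
import numpy as np
import statsmodels.api as sm

class UnpenalizedLogisticRegression:
    """Unregularized logistic regression with a scikit-learn-like API,
    backed by statsmodels' Logit (plain maximum likelihood, no penalty)."""

    def fit(self, X, y):
        X = sm.add_constant(np.asarray(X))         # statsmodels needs an explicit intercept
        self._result = sm.Logit(y, X).fit(disp=0)  # MLE, no regularization
        self.intercept_ = self._result.params[:1]
        self.coef_ = self._result.params[1:].reshape(1, -1)
        return self

    def predict_proba(self, X):
        X = sm.add_constant(np.asarray(X), has_constant="add")
        p = self._result.predict(X)                # P(y = 1)
        return np.column_stack([1 - p, p])

    def predict(self, X):
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)
```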

Any updates on this? This is a big blocker for my willingness to recommend scikit-learn to people. It’s also not at all obvious to people coming from other libraries that scikit-learn does regularization by default and that there’s no way to disable it.

Hey, what is the status of this topic? I’d be really interested in an unpenalized logistic regression. That way, p-values would actually mean something statistically. Otherwise I will have to continue using R 😢 for such use cases… Thanks, Alex

@shermstats So @Kodiologist suggested adding penalty="none" to make it more explicit, which would just be an alias for C=np.inf. It makes sense to me to make this more explicit in this way. Do you have thoughts on that? That would then be what’s in the documentation. And I agree that bold might be a good idea. I think for someone with an ML background this is (maybe?) expected, while for someone with a stats background it seems very surprising.
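A hypothetical sketch of what that alias could look like internally (not scikit-learn’s actual code; the function name is made up):

```python
import numpy as np

def resolve_C(penalty, C):
    """Map the proposed penalty="none" onto the existing machinery.
    Rescaled, the objective is sum(log-loss) + (1/(2C)) * ||w||^2,
    so C = inf drops the L2 term entirely."""
    return np.inf if penalty == "none" else C

print(resolve_C("none", 1.0))  # inf
print(resolve_C("l2", 1.0))    # 1.0
```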

What solvers do you suggest implementing? How would they be different from the solvers we already have with C -> infty?

You could try looking at R or statsmodels for ideas. I’m not familiar with their methods, but they’re reasonably fast and use no regularization at all.
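For reference, R’s glm and statsmodels fit this with Newton-type iteratively reweighted least squares (IRLS). A minimal, illustrative sketch of that scheme (not their actual code):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Unpenalized logistic regression via iteratively reweighted least
    squares (Newton-Raphson). X should include an intercept column and
    y be in {0, 1}; assumes no perfect separation, where the weights
    would degenerate and the MLE does not exist."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))   # current predicted probabilities
        w = p * (1.0 - p)                # working weights
        z = eta + (y - p) / w            # working response
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)  # weighted LS step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```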

Or statsmodels?

@mblondel is there an alternative to “iterative solvers”? You won’t get exactly the unregularized solution, right?

@Kodiologist why do you want this?