scikit-learn: Suggestion: Add support for unpenalized logistic regression

LinearRegression provides unpenalized OLS, and SGDClassifier, which supports loss="log", also supports penalty="none". But if you want plain old unpenalized logistic regression, you have to fake it by setting C in LogisticRegression to a large number, or use Logit from statsmodels instead.
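A minimal sketch of that large-C workaround, with made-up data (the parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# C is the inverse of the regularization strength, so a very large value
# makes the L2 penalty numerically negligible, but never exactly zero.
almost_unpenalized = LogisticRegression(C=1e12)
almost_unpenalized.fit(X, y)
print(almost_unpenalized.coef_)
```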

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 11
  • Comments: 34 (20 by maintainers)

Most upvoted comments

You’re asking: why would I want to do logistic regression without regularization? Because (1) sometimes the sample is large enough relative to the number of features that regularization won’t buy you anything, and (2) sometimes the best-fitting coefficients are of interest in themselves, as opposed to maximizing predictive accuracy.

I can’t say much about (1) since computation isn’t my forte. For (2), I am a data analyst with a background in statistics. I know that scikit-learn focuses on traditional machine learning, but it is in my opinion the best Python package for data analysis right now, and I think it will benefit from not limiting itself too much. (I also think, following Larry Wasserman and Andrew Gelman, that statistics and machine learning would mutually benefit from intermingling more, but I guess that’s its own can of worms.) All coefficients will change with regularization; that’s what regularization does.

I’m not opposed to adding penalty="none", but I’m not sure what the benefit of adding a redundant option is.

  1. It becomes clearer how to get an unpenalized model.
  2. It becomes clearer to the reader what code that uses an unpenalized model is trying to do (see the sketch after this list).
  3. It allows sklearn to change its implementation of unregularized models in the future without breaking people’s code.
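For example, here is how the two spellings compare (a sketch assuming a recent scikit-learn; the explicit spelling eventually shipped as penalty="none" in 0.21 and became penalty=None in 1.2):

```python
from sklearn.linear_model import LogisticRegression

# The workaround: the reader has to know that a huge C "means" no penalty.
implicit = LogisticRegression(C=1e12)

# The proposal: the intent is stated outright in the code.
explicit = LogisticRegression(penalty=None)
```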

For the folks who really want unregularised logistic regression (like myself): I’ve been settling for statsmodels and writing a wrapper class that mimics the scikit-learn API.
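A minimal sketch of such a wrapper, assuming statsmodels is installed (the class name and details are illustrative, not the commenter’s actual code):

```python
import numpy as np
import statsmodels.api as sm

class UnpenalizedLogisticRegression:
    """Unregularized logistic regression with a scikit-learn-like API,
    backed by statsmodels' Logit (plain maximum likelihood, no penalty)."""

    def fit(self, X, y):
        X = sm.add_constant(np.asarray(X))         # statsmodels needs an explicit intercept
        self._result = sm.Logit(y, X).fit(disp=0)  # MLE, no regularization
        self.intercept_ = self._result.params[:1]
        self.coef_ = self._result.params[1:].reshape(1, -1)
        return self

    def predict_proba(self, X):
        X = sm.add_constant(np.asarray(X), has_constant="add")
        p = self._result.predict(X)                # P(y = 1)
        return np.column_stack([1 - p, p])

    def predict(self, X):
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)
```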

Any updates on this? This is a big blocker for my willingness to recommend scikit-learn to people. It’s also not at all obvious to people coming from other libraries that scikit-learn does regularization by default and that there’s no way to disable it.

Hey, what is the status of this topic? I’d be really interested in an unpenalized logistic regression. That way, p-values would actually mean something statistically. Otherwise I will have to continue using R 😢 for such use cases… Thanks, Alex

@shermstats So @Kodiologist suggested adding penalty="none" to make it more explicit, which would just be an alias for C=np.inf. It makes sense to me to make this more explicit in this way. Do you have thoughts on that? That would then be what’s in the documentation. And I agree that bold might be a good idea. I think for someone with an ML background this is (maybe?) expected, while for someone with a stats background it seems very surprising.
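A hypothetical sketch of what that alias could look like internally (not scikit-learn’s actual code; the function name is made up):

```python
import numpy as np

def resolve_C(penalty, C):
    """Map the proposed penalty="none" onto the existing machinery.
    Rescaled, the objective is sum(log-loss) + (1/(2C)) * ||w||^2,
    so C = inf drops the L2 term entirely."""
    return np.inf if penalty == "none" else C

print(resolve_C("none", 1.0))  # inf
print(resolve_C("l2", 1.0))    # 1.0
```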

What solvers do you suggest implementing? How would they be different from the solvers we already have with C -> infty?

You could try looking at R or statsmodels for ideas. I’m not familiar with their methods, but they’re reasonably fast and use no regularization at all.
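For reference, R’s glm and statsmodels fit this with Newton-type iteratively reweighted least squares (IRLS). A minimal, illustrative sketch of that scheme (not their actual code):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Unpenalized logistic regression via iteratively reweighted least
    squares (Newton-Raphson). X should include an intercept column and
    y be in {0, 1}; assumes no perfect separation, where the weights
    would degenerate and the MLE does not exist."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))   # current predicted probabilities
        w = p * (1.0 - p)                # working weights
        z = eta + (y - p) / w            # working response
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)  # weighted LS step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```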

Or statsmodels?

@mblondel is there an alternative to “iterative solvers”? You won’t get exactly the unregularized solution, right?

@Kodiologist why do you want this?