scikit-learn: Verbosity option is not working in GridSearchCV (Jupyter notebook)
Describe the bug
So this issue has been addressed before here by darrencl, but the user didn’t follow up on lesteve’s response.
The problem is that GridSearchCV doesn’t show the elapsed time periodically, or any log at all. I am setting `n_jobs=-1` and `verbose=1`. I tried setting `n_jobs` to other values, and the same with `verbose`, but nothing happened.
Note that this didn’t happen until I updated scikit-learn from version 0.22.1 to 1.0.2.
lesteve in his response assumed that this problem is due to `ipykernel<6`, which is not the case for me.
Steps/Code to Reproduce
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
iris = datasets.load_iris()
params = {'n_estimators':[10,20,30,40,50,60],
'max_depth':[20,50,60,70,80]}
grid_obj = GridSearchCV(estimator=RandomForestClassifier(), param_grid=params, n_jobs=-1, verbose=1, cv=5)
grid_obj.fit(iris.data, iris.target)
Expected Results
This is the output when using version 0.22.1
Actual Results
Fitting 5 folds for each of 30 candidates, totalling 150 fits
Versions
System:
python: 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
executable: D:\Programs\ApplicationsSetup\anaconda3\python.exe
machine: Windows-10-10.0.19041-SP0
Python dependencies:
pip: 21.2.2
setuptools: 58.0.4
sklearn: 1.0.2
numpy: 1.19.2
scipy: 1.6.2
Cython: 0.29.25
pandas: 1.4.1
matplotlib: 3.5.1
joblib: 1.1.0
threadpoolctl: 2.2.0
!jupyter --version
Selected Jupyter core packages...
IPython : 7.31.1
ipykernel : 6.4.1
ipywidgets : 7.6.5
jupyter_client : 6.1.12
jupyter_core : 4.9.1
jupyter_server : 1.13.5
jupyterlab : 3.2.9
nbclient : 0.5.11
nbconvert : 6.1.0
nbformat : 5.1.3
notebook : 6.4.9
qtconsole : 5.2.2
traitlets : 5.1.1
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 4
- Comments: 22 (9 by maintainers)
@lesteve I’m also having the problem on SLURM, even after updating scikit-learn to 1.3.0. It only works properly when `n_jobs=1`.
I also tried your suggestion in the comment you referenced above, but it didn’t help. I’m not entirely sure if I did it right. I used the argument `--signal=B:TERM@n_duration` to send the SIGTERM signal `n_duration` seconds before the job’s time limit is reached. Then I ran `sys.stdout.flush()` and `sys.stderr.flush()` in a function that I passed to `signal.signal(signal.SIGTERM, flush_function)`. It didn’t resolve the issue.

@lesteve should we open another issue for this behavior outside of Jupyter then? I have a similar issue outside of Jupyter Notebook as @princyok and @Jose-Verdu-Diaz. Fresh sklearn install at v1.3.0 from conda-forge. It occurs for me with SLURM jobs as @Jose-Verdu-Diaz describes, but also with regular command line execution on Linux. The behavior occurs as long as `n_jobs != 1` (as far as I can tell). Only the first line, regarding the total number of combinations and validation sets, is printed.
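For reference, a minimal sketch of the flush handler described above (the handler name `flush_function` and the 60-second lead time are illustrative; the SLURM signal is assumed to be requested with something like `sbatch --signal=B:TERM@60`):

```python
import signal
import sys

def flush_function(signum, frame):
    # Flush any buffered output so logs written so far reach the job's output file
    sys.stdout.flush()
    sys.stderr.flush()

# Run the handler when SLURM delivers SIGTERM shortly before the time limit
signal.signal(signal.SIGTERM, flush_function)
```

As noted above, this flushes Python-level buffers but did not resolve the missing verbose output, so the buffering may be happening elsewhere (e.g. in the worker processes or on the SLURM side).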
When I use:
I can see the verbose output in the terminal instead of the notebook. I do not know why, but it works for me.
Yes, please start by opening a separate issue for the regular command line execution on Linux, which definitely seems unexpected. Please add a standalone snippet reproducing the problem, which will make it a lot more likely that it gets looked at and eventually fixed.
On the SLURM one, it may be that there is some buffering going on in different places (SLURM side, OS side, …), see https://github.com/dask/dask-jobqueue/issues/299#issuecomment-609002077 for some quick experiments I made a while ago on a SGE cluster.
I have the same problem. Linux, scikit-learn 0.23.2, not running in a notebook. I’m running from the command line and printing to a file. It prints properly when `n_jobs=None` (same as `n_jobs=1`), but no progress is printed when `n_jobs=-1`.
Regardless of what `n_jobs` is set to, it prints the first line, “Fitting n folds for each of nn candidates, totalling nnn fits”, and then stops there if `n_jobs=-1`.
Are there any updates on this? I have the same problem on Apple M1. I’m also teaching machine learning, and I see that most of my students have this problem as well. We set `n_jobs=4` for them. No verbosity setting gives any output while running, but the output appears after the job is completed (often even in a different cell). It’s very annoying, since it’s demotivating for the students if they don’t know how far along their grid search is.
Reading the top post in more detail: use `verbose > 1` if you want more details about the cross-validation progress. Maybe you did not need it in scikit-learn 0.22.1, but you definitely need it now. You can double-check with `n_jobs=1` and `verbose=10`; you should definitely see some output about the cross-validation progress.

A screenshot with a slight variation (the first `.fit` does not show any progress, the second `.fit` does): when I use the multiprocessing backend I do get the output consistently, so maybe there is a weird interaction between loky and ipykernel sub-process output capturing?
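As a sketch of the multiprocessing-backend workaround hinted at in the comment above (re-using the grid search from the original report, with a smaller illustrative parameter grid; whether this restores output in your environment is not guaranteed):

```python
from joblib import parallel_backend
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

iris = datasets.load_iris()
params = {'n_estimators': [10, 20], 'max_depth': [20, 50]}
grid_obj = GridSearchCV(estimator=RandomForestClassifier(),
                        param_grid=params, n_jobs=2, verbose=10, cv=2)

# Force joblib's multiprocessing backend instead of the default loky backend;
# per the comment above, verbose output then appears consistently.
with parallel_backend('multiprocessing'):
    grid_obj.fit(iris.data, iris.target)
```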