scikit-learn: Setting random_state and np.random.seed does not ensure reproducibility
I think it would be great, and would make things a lot easier, if there were a top-level API for scikit-learn:
scikit-learn.set_random_seed
This would help a lot with reproducibility, as one would not have to remember to set the random state for each algorithm that is called. It would have to deal with multiprocessing though, I guess.
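To make the request concrete, here is a rough sketch of what such a helper could look like in user code today. The name `set_all_random_states` is hypothetical and not part of scikit-learn; the sketch relies only on the existing `get_params` / `set_params` estimator methods and assumes every relevant source of randomness is exposed as a `random_state` parameter.

```python
from sklearn.base import BaseEstimator
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def set_all_random_states(estimator: BaseEstimator, seed: int) -> BaseEstimator:
    """Hypothetical helper (NOT a scikit-learn API): set every
    ``random_state`` parameter on ``estimator`` and on its nested
    sub-estimators to ``seed``."""
    # get_params(deep=True) also lists nested parameters as "<name>__<param>".
    params = estimator.get_params(deep=True)
    to_set = {
        name: seed
        for name in params
        if name == "random_state" or name.endswith("__random_state")
    }
    if to_set:
        estimator.set_params(**to_set)
    return estimator


pipe = Pipeline([
    ("scale", StandardScaler()),  # has no random_state, left untouched
    ("clf", RandomForestClassifier(n_estimators=10)),
])
set_all_random_states(pipe, seed=42)
print(pipe.get_params()["clf__random_state"])
```

This only reaches parameters that are literally named `random_state`; anything seeded elsewhere (data generation, CV splitters created separately, third-party code) would still need to be handled by hand.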
About this issue
- State: closed
- Created 7 years ago
- Comments: 27 (13 by maintainers)
I’m asking because right now I have problems with reproducibility. I set np.random.seed as well as each algorithm's random state, but the results are still a bit different each time I run the scripts.
This looks like a multiprocessing issue. When I run this with n_jobs=1, it seems that I always get the same result.
This was previously requested in https://github.com/scikit-learn/scikit-learn/issues/5781, and the solution (i.e. using the numpy global random seed) is documented in the FAQ.
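For anyone wanting to check this on their own setup, a minimal sketch of a reproducibility test is below. The dataset, the choice of `RandomForestClassifier`, and the helper name `fit_and_score` are assumptions made for illustration, not taken from the original scripts. It fits the same model twice with `n_jobs=1`, once with an explicit `random_state` and once relying only on the numpy global seed, and compares the outputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data; fixed seed so the inputs are identical on every run.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)


def fit_and_score(random_state=None):
    """Fit a small forest on one core and return its predicted probabilities."""
    # Reset the global numpy seed; estimators with random_state=None draw from it.
    np.random.seed(0)
    model = RandomForestClassifier(
        n_estimators=20, random_state=random_state, n_jobs=1
    )
    model.fit(X, y)
    return model.predict_proba(X)


# Same explicit random_state on both fits.
explicit_match = np.array_equal(fit_and_score(0), fit_and_score(0))
# random_state=None on both fits, relying on the global numpy seed.
global_match = np.array_equal(fit_and_score(), fit_and_score())
print(explicit_match, global_match)
```

If either comparison comes out different on a given pipeline, swapping in the real estimators one at a time should help narrow down which step introduces the variation.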
Sorry, I forgot to remove the password protection. It should be public now.