scikit-learn: Setting random_state and np.random.seed does not ensure reproducibility

I think it would be great, and would make things a lot easier, if there were a top-level API for scikit-learn:

scikit-learn.set_random_seed

This would help a lot with reproducibility, as one would not have to remember to set the random state for each algorithm that is called. It would have to deal with multiprocessing, though, I guess.
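To make the request concrete, here is a rough sketch of what such a helper could look like in user code. The function `set_random_seed` is hypothetical and is not part of scikit-learn; the sketch only relies on the fact that estimators and data generators left at `random_state=None` draw from NumPy's global random number generator. The dataset and the choice of `RandomForestClassifier` are arbitrary assumptions for illustration.

```python
import random

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def set_random_seed(seed):
    """Hypothetical helper; scikit-learn does not ship this function."""
    random.seed(seed)     # Python's built-in RNG
    np.random.seed(seed)  # NumPy's global RNG, used when random_state=None


def fit_and_predict(seed):
    """Seed once, then leave every random_state at its default of None."""
    set_random_seed(seed)
    X, y = make_classification(n_samples=200, n_features=8)
    clf = RandomForestClassifier(n_estimators=10)  # n_jobs left at its default
    return clf.fit(X, y).predict_proba(X)


first = fit_and_predict(42)
second = fit_and_predict(42)
print("identical runs:", np.array_equal(first, second))
```

This only covers code that runs in a single process and reads the global generator; as noted above, anything that spawns worker processes would need separate handling.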

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 27 (13 by maintainers)

Most upvoted comments

I'm asking because right now I have problems with reproducibility. I set np.random.seed as well as each algorithm's random_state, but the results are still slightly different each time I run the scripts.

This looks like a multiprocessing issue. When I run this with n_jobs=1, I seem to always get the same result.
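One way to narrow this down is to fit the same seeded model twice for each n_jobs setting and compare the outputs. The snippet below is only a diagnostic sketch: the thread does not say which estimators were involved, so `RandomForestClassifier` and the synthetic dataset are stand-ins, and `runs_match` is a name made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with a fixed seed so the input itself never changes.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def runs_match(n_jobs):
    """Fit the same seeded model twice and report whether the outputs are identical."""
    outputs = []
    for _ in range(2):
        clf = RandomForestClassifier(n_estimators=20, random_state=0, n_jobs=n_jobs)
        outputs.append(clf.fit(X, y).predict_proba(X))
    return np.array_equal(outputs[0], outputs[1])


serial_ok = runs_match(1)
parallel_ok = runs_match(2)
print("n_jobs=1 reproducible:", serial_ok)
print("n_jobs=2 reproducible:", parallel_ok)
```

If only the parallel run differs, the source of the variation is likely in how work is split or combined across workers rather than in the seeds themselves. Swapping in the estimator from the failing script would be the next step.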

This was previously requested in https://github.com/scikit-learn/scikit-learn/issues/5781, and the solution (i.e. using NumPy's global random seed) is documented in the FAQ.

Sorry, I forgot to remove the password protection. It should be public now.