pytorch-lightning: Cannot replicate training results with seed_everything and deterministic flag = True with DDP

🐛 Bug

I noticed this when I was adding more metric calculations to the LightningModule, for example computing a confusion matrix at the end of the validation/test epoch. Before and after adding these functions (which do not appear to depend on any random seed), the training results were not exactly the same.

However, once these functions were added, re-running the training again did give the same results each time, so each version of the code is reproducible on its own; the results only change between versions.
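The expectation behind this report can be checked in miniature: a purely deterministic computation (such as a confusion matrix) consumes no random numbers, so inserting it between draws should leave the RNG stream untouched. Below is a toy sketch using Python's stdlib `random`; the `confusion_matrix_stub` helper is hypothetical and merely stands in for the metric code described above, not for any pytorch-lightning API.

```python
import random

def confusion_matrix_stub(preds, targets):
    # Hypothetical stand-in for the metric added at validation epoch end:
    # purely deterministic, touches no RNG state.
    counts = {}
    for p, t in zip(preds, targets):
        counts[(p, t)] = counts.get((p, t), 0) + 1
    return counts

# Run 1: draws with the deterministic metric in between.
random.seed(42)
a = [random.random() for _ in range(3)]
confusion_matrix_stub([0, 1, 1], [0, 1, 0])  # deterministic work only
b = random.random()

# Run 2: same seed, no metric call in between.
random.seed(42)
a2 = [random.random() for _ in range(3)]
b2 = random.random()

# The metric did not advance the RNG stream, so both runs match.
assert a == a2 and b == b2
```

If adding such a function changes training results under a fixed seed, something else (e.g. non-deterministic CUDA ops or DDP process seeding) must be consuming or perturbing randomness.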

To Reproduce

Code sample

Expected behavior

The training results should be identical even when deterministic functions (which do not consume the random seed) are added.
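For context, the determinism setup being exercised is presumably along these lines (a sketch only; `MyModel` is a hypothetical LightningModule, the seed value is arbitrary, and the argument names follow the pytorch-lightning API of that era, e.g. `distributed_backend="ddp"`):

```python
import pytorch_lightning as pl

pl.seed_everything(1234)           # seeds the Python, NumPy and PyTorch RNGs

trainer = pl.Trainer(
    gpus=4,                        # 4-GPU DDP, as in the environment section
    distributed_backend="ddp",
    deterministic=True,            # request deterministic cuDNN behaviour
)
trainer.fit(MyModel())             # MyModel: hypothetical LightningModule
```

With this configuration, two runs of the unchanged script are expected to produce identical metrics, and adding seed-independent code should not break that.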

Environment

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/PyTorchLightning/pytorch-lightning/master/tests/collect_env_details.py
# For security purposes, please check the contents of collect_env_details.py before running it.
python collect_env_details.py
  • PyTorch Version (e.g., 1.0): 1.4
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.7
  • CUDA/cuDNN version: 10.1
  • GPU models and configuration: 4 GPUs DDP
  • Any other relevant information:

Additional context

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

@awaelchli

Thanks, I will do that and let you know.