pytorch-lightning: Comet logger cannot be pickled after creating an experiment

๐Ÿ› Bug

The Comet logger cannot be pickled after an experiment (at least an OfflineExperiment) has been created.

To Reproduce

Steps to reproduce the behavior:

initialize the logger object (works fine)

from pytorch_lightning.loggers import CometLogger
import tests.base.utils as tutils
from pytorch_lightning import Trainer
import pickle

model, _ = tutils.get_default_model()
logger = CometLogger(save_dir='test')
pickle.dumps(logger)

initialize a Trainer object with the logger (works fine)

trainer = Trainer(
    max_epochs=1,
    logger=logger
)
pickle.dumps(logger)
pickle.dumps(trainer)

access the experiment attribute, which creates the OfflineExperiment object (fails)

logger.experiment
pickle.dumps(logger)
>> TypeError: can't pickle _thread.lock objects
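The root cause is that the experiment object internally holds thread locks, and Python cannot pickle those. A minimal demonstration (using a bare `threading.Lock` as a stand-in for the experiment's internals) reproduces the same error:

```python
import pickle
import threading

# Any object that transitively contains a thread lock fails to pickle
# with the same TypeError the logger raises after creating an experiment.
try:
    pickle.dumps(threading.Lock())
except TypeError as exc:
    print(type(exc).__name__, exc)
```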

Expected behavior

We should be able to pickle loggers for distributed training.
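One common workaround pattern (a hedged sketch, not Lightning's actual implementation) is to drop the lazily created experiment in `__getstate__` so the logger pickles cleanly and recreates the experiment on first access in the child process. `LazyLogger` below is a hypothetical toy stand-in for `CometLogger`:

```python
import pickle
import threading

class LazyLogger:
    """Toy stand-in for a logger whose experiment is created lazily and
    holds a thread lock (as comet_ml's OfflineExperiment does internally)."""

    def __init__(self):
        self._experiment = None

    @property
    def experiment(self):
        if self._experiment is None:
            # A bare lock mimics the unpicklable state of a real experiment.
            self._experiment = threading.Lock()
        return self._experiment

    def __getstate__(self):
        # Drop the unpicklable experiment before pickling; the property
        # recreates it on first access in the spawned process.
        state = self.__dict__.copy()
        state["_experiment"] = None
        return state

logger = LazyLogger()
_ = logger.experiment                           # creates the lock
restored = pickle.loads(pickle.dumps(logger))   # succeeds with __getstate__
```

The trade-off is that the restored logger points at a fresh experiment, so the child process must be told which run to resume (e.g. via an experiment key pickled alongside the state).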

Environment

  • CUDA:
      - GPU:
      - available: False
      - version: None
  • Packages:
      - numpy: 1.18.1
      - pyTorch_debug: False
      - pyTorch_version: 1.4.0
      - pytorch-lightning: 0.7.5
      - tensorboard: 2.1.0
      - tqdm: 4.42.0
  • System:
      - OS: Darwin
      - architecture: 64bit
      - processor: i386
      - python: 3.7.6
      - version: Darwin Kernel Version 19.3.0: Thu Jan 9 20:58:23 PST 2020; root:xnu-6153.81.5~1/RELEASE_X86_64

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 15 (2 by maintainers)

Most upvoted comments

I still see this bug with the WandB logger as well.

I don't know if it helps or if this is the right place, but a similar error occurs when running in ddp mode with the WandB logger.

WandB uses a lambda function at some point.

Does the logger have to be pickled? Couldn't it log only on rank 0 at epoch_end?
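The rank-0-only idea can be sketched with a minimal decorator (hedged: Lightning ships its own `rank_zero_only` utility; this is a stdlib approximation, and the `LOCAL_RANK` environment variable is an assumption about how the launcher exposes the rank):

```python
import os

def rank_zero_only(fn):
    """Run the wrapped function only on the process with rank 0,
    reading the rank from the LOCAL_RANK environment variable."""
    def wrapped(*args, **kwargs):
        if int(os.environ.get("LOCAL_RANK", "0")) == 0:
            return fn(*args, **kwargs)
        return None  # no-op on non-zero ranks
    return wrapped

@rank_zero_only
def log_metric(name, value):
    # In a real setup this would forward to the wandb/comet experiment.
    return (name, value)
```

With this guard the experiment object never needs to cross a process boundary, sidestepping the pickling problem entirely.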

Traceback (most recent call last):
  File "../train.py", line 140, in <module>
    main(args.gpus, args.nodes, args.fast_dev_run, args.mixed_precision, project_config, hparams)
  File "../train.py", line 117, in main
    trainer.fit(model)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 751, in fit
    mp.spawn(self.ddp_train, nprocs=self.num_processes, args=(model,))
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 149, in start_processes
    process.start()
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TorchHistory.add_log_hooks_to_pytorch_module.<locals>.<lambda>'

also related: #1704

Currently having this issue with WandbLogger.

I still have this error with 1.5.10 on macOS.

Error executing job with overrides: ['train.pl_trainer.fast_dev_run=False', 'train.pl_trainer.gpus=0', 'train.pl_trainer.precision=32', 'logging.wandb_arg.mode=offline']
Traceback (most recent call last):
  File "/Users/ric/Documents/PhD/Projects/ed-experiments/src/train.py", line 78, in main
    train(conf)
  File "/Users/ric/Documents/PhD/Projects/ed-experiments/src/train.py", line 70, in train
    trainer.fit(pl_module, datamodule=pl_data_module)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 140, in run
    self.on_run_start(*args, **kwargs)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 86, in on_run_start
    self._dataloader_iter = _update_dataloader_iter(data_fetcher, self.batch_progress.current.ready)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/utilities.py", line 121, in _update_dataloader_iter
    dataloader_iter = enumerate(data_fetcher, batch_idx)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 198, in __iter__
    self._apply_patch()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 133, in _apply_patch
    apply_to_collections(self.loaders, self.loader_iters, (Iterator, DataLoader), _apply_patch_fn)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 181, in loader_iters
    loader_iters = self.dataloader_iter.loader_iters
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 537, in loader_iters
    self._loader_iters = self.create_loader_iters(self.loaders)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 577, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/apply_func.py", line 104, in apply_to_collection
    v = apply_to_collection(
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/apply_func.py", line 96, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 177, in __iter__
    self._loader_iter = iter(self.loader)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
    w.start()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TorchHistory.add_log_hooks_to_pytorch_module.<locals>.<lambda>'