pytorch-lightning: Progress bar doesn't show up on Kaggle TPU with `num_workers` greater than `0`.

🐛 Bug

As the issue title says: the progress bar doesn’t show up on Kaggle TPU with num_workers greater than 0.

Disclaimer: I haven’t tested this program on Google Colab TPU.

To Reproduce

Set `num_workers` to any value greater than zero, up to the number of CPU cores. On Kaggle, the following code sets it to 4.

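import multiprocessing
from torch.utils.data import DataLoader

# train_dataset and BATCH_SIZE are assumed to be defined earlier in the notebook.
train_dataset_loader = DataLoader(train_dataset,
                                  batch_size=BATCH_SIZE,
                                  shuffle=True,
                                  num_workers=multiprocessing.cpu_count(),  # 4 on Kaggle
                                  drop_last=True)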

Training completes successfully, but instead of a progress bar, only the following output is shown:

2021-10-04 04:35:45.588313: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib
/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py:103: LightningDeprecationWarning: The signature of `Callback.on_train_epoch_end` has changed in v1.3. `outputs` parameter has been removed. Support for the old signature will be removed in v1.5
  "The signature of `Callback.on_train_epoch_end` has changed in v1.3."
/opt/conda/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/tpu_spawn.py:192: UserWarning: cleaning up tpu spawn environment...
  rank_zero_warn("cleaning up tpu spawn environment...")
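
To narrow down the trigger, it may help to rerun the same notebook with different worker counts, including `0`, which the issue title implies is the case where the bar does appear. Below is a minimal sketch of a helper for that. The name `pick_num_workers` is hypothetical and not part of the original report or of any library; it only clamps a requested count to the range described above.

```python
import multiprocessing
from typing import Optional


def pick_num_workers(requested: Optional[int] = None) -> int:
    """Clamp a requested DataLoader worker count to [0, CPU cores].

    Hypothetical helper for illustration; with no argument it returns
    multiprocessing.cpu_count(), matching the snippet in this report.
    """
    max_workers = multiprocessing.cpu_count()
    if requested is None:
        return max_workers
    return max(0, min(requested, max_workers))


# 0 loads data in the main process; any positive value is the case reported here.
for n in (0, 1, pick_num_workers()):
    print(f"num_workers={n}")
```

Each printed value can then be passed as `num_workers` to the `DataLoader` call above to check at which setting the progress bar disappears.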

Expected behavior

The progress bar should show up.

Environment

PyTorch Lightning Version: 1.4.9
PyTorch Version: 1.8.0a0+6e9f2c8
Python version: 3.7.10
OS (e.g., Linux): Linux
CUDA/cuDNN version: N/A
GPU models and configuration: N/A
How you installed PyTorch (conda, pip, source): pip
If compiling from source, the output of torch.__config__.show(): N/A
Any other relevant information: TPU on Kaggle

Additional context

N/A

cc @kaushikb11 @rohitgr7 @tchaton

About this issue

State: open
Created 3 years ago
Comments: 33 (22 by maintainers)

Most upvoted comments

@RahulBhalley @Programmer-RD-AI I will take a stab at this with GCP TPUs.

kaushikb11 on Oct 13, 2021