neptune-client: BUG: Error msg during training - Timestamp must be non-decreasing for series attribute

Describe the bug

When running the Neptune logger in PyTorch Lightning with DDP on more than one GPU, the console is flooded with errors reading "Error occurred during asynchronous operation processing: Timestamp must be non-decreasing for series attribute". If the Neptune logger runs in offline mode, or if it is removed, the error is not logged. There are so many of these errors that even the training progress bar is hard to make out.

Reproduction

I was able to reproduce this when running with 4 GPUs: https://colab.research.google.com/drive/1TOadmpet63eSXz6LMHVvdM-D6Gy0LDxe?usp=sharing
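For context, here is a minimal sketch of the kind of setup that triggers the messages. The linked Colab notebook is the actual reproduction; the toy module, the project/API-token placeholders, and the exact logger and trainer argument names below are assumptions and may differ between pytorch-lightning and neptune-client versions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.loggers import NeptuneLogger


class ToyModule(pl.LightningModule):
    """Minimal module just to exercise the logger; not the model from the Colab."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


def make_loader():
    # Random data is enough to exercise logging.
    x = torch.randn(256, 32)
    y = torch.randn(256, 1)
    return DataLoader(TensorDataset(x, y), batch_size=16)


if __name__ == "__main__":
    # Logger arguments are illustrative placeholders; the parameter names
    # (e.g. project_name vs. project) depend on the pytorch-lightning /
    # neptune-client versions in use.
    neptune_logger = NeptuneLogger(
        api_key="<NEPTUNE_API_TOKEN>",
        project_name="<workspace>/<project>",
    )

    # DDP on 4 GPUs is the configuration that floods the console with the
    # "Timestamp must be non-decreasing" messages; single-GPU runs do not.
    trainer = pl.Trainer(
        gpus=4,
        accelerator="ddp",  # PL 1.4.x spelling; newer releases use strategy="ddp"
        logger=neptune_logger,
        max_epochs=1,
    )
    trainer.fit(ToyModule(), make_loader())
```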

Expected behavior

If this is a valid error message, there is no hint about what action needs to be taken. If the messages are harmless or not valid, kindly suggest a way to suppress them.
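As a possible stopgap until the root cause is fixed, below is a sketch of how the messages might be suppressed with Python's standard logging module. It assumes (not verified against neptune-client internals) that the "Error occurred during asynchronous operation processing" messages are emitted through loggers under the "neptune" namespace; if the client writes them directly to stderr instead, this will have no effect. The first line silences the whole namespace below CRITICAL; the filter variant is narrower and drops only the timestamp messages, provided the records reach the root logger's handlers.

```python
import logging


class _DropTimestampErrors(logging.Filter):
    """Drop only the non-decreasing-timestamp messages, keep other Neptune errors."""

    def filter(self, record: logging.LogRecord) -> bool:
        return "Timestamp must be non-decreasing" not in record.getMessage()


# Broad option: silence everything below CRITICAL from the "neptune" namespace
# (assumes Neptune's loggers live under this name and do not set their own levels).
logging.getLogger("neptune").setLevel(logging.CRITICAL)

# Narrower option: filter only the timestamp messages at the handler level
# (assumes the records propagate to the root logger's handlers).
for handler in logging.getLogger().handlers:
    handler.addFilter(_DropTimestampErrors())
```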

Traceback

Error occurred during asynchronous operation processing: Timestamp must be non-decreasing for series attribute: monitoring/stdout. Invalid point: 2021-10-15T13:25:02.767Z

Environment

The output of the environment collection script (collect_env.py):

PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.31

Python version: 3.8.11 (default, Aug 3 2021, 15:09:35) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: RTX A6000
GPU 1: RTX A6000
GPU 2: RTX A6000
GPU 3: RTX A6000
GPU 4: RTX A6000
GPU 5: RTX A6000
GPU 6: RTX A6000
GPU 7: RTX A6000

Nvidia driver version: 460.91.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] neptune-pytorch-lightning==0.9.7
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] torch==1.9.0+cu111
[pip3] torch-poly-lr-decay==0.0.1
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.4.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.74 h6bb024c_0 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.3.0 h06a4308_520
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.0 py38h42c9631_2
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] mypy 0.910 pypi_0 pypi
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] neptune-client 0.12.0 pypi_0 pypi
[conda] neptune-contrib 0.27.3 pypi_0 pypi
[conda] neptune-pytorch-lightning 0.9.7 pypi_0 pypi
[conda] numpy 1.21.1 pypi_0 pypi
[conda] numpy-base 1.21.2 py38h79a1101_0
[conda] pytorch-lightning 1.4.9 pypi_0 pypi
[conda] torch 1.9.0+cu111 pypi_0 pypi
[conda] torch-poly-lr-decay 0.0.1 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchmetrics 0.4.1 pypi_0 pypi

Additional context

(image attached in the original issue)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 19 (10 by maintainers)

Most upvoted comments

Hi @kamil-kaczmarek, I am no longer having any issues with the suggested workaround.