pytorch-lightning: WandbLogger doesn't format config correctly

Bug description

Summary: WandbLogger(config=config) does not provide the same behavior as wandb.init(config=config) in recent versions of pytorch-lightning.

Explanation:

Passing config into wandb.init is supposed to create a nicely formatted config in WandB. Example: wandb.init(config={'key1': {'key2': 'value'}}). The Run Overview in wandb then looks like this: [image]

A more complicated example of a nested config: [image]. In this WandB run, the keys are logged and searchable as e.g. model.pool.0 (with corresponding value 4).

However, this is what it looks like when you run pytorch_lightning.loggers.WandbLogger(config={'test_key': {'key2': 'test_value'}}), which is supposed to pass the config entry straight through to wandb.init: [image]

Note that the keys are no longer nested and there’s only one level of hierarchy where the values are massive dictionaries. Instead of the WandB config having a key of test_key.key2 with value of test_value, there is only a key of test_key with a value of {'key2': 'test_value'}.
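To make the expected key naming concrete, here is a minimal sketch of how a nested config maps to dotted keys such as test_key.key2 or model.pool.0. The helper name flatten_config is hypothetical, and this is only an illustration of the naming described above, not WandB's actual implementation.

```python
from collections.abc import Mapping, Sequence


def flatten_config(config, prefix=""):
    """Flatten nested mappings and lists into {dotted_key: leaf_value}.

    Hypothetical helper for illustration only; not part of wandb or Lightning.
    """
    if isinstance(config, Mapping):
        items = config.items()
    elif isinstance(config, Sequence) and not isinstance(config, (str, bytes)):
        # List entries are addressed by their index, e.g. model.pool.0
        items = enumerate(config)
    else:
        # Leaf value: store it under the path accumulated so far.
        return {prefix: config}

    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_config(value, path))
    return flat


expected = flatten_config({"test_key": {"key2": "test_value"}})
nested = flatten_config({"model": {"pool": [4, 2]}})
```

With this sketch, expected holds a single key test_key.key2, which is the layout the issue expects to see, rather than a key test_key whose value is a whole dictionary.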

What version are you seeing the problem on?

v2_0

Note: I have used older versions of pytorch-lightning that do not have this issue. I’m not sure if it is a regression and have not had time to bisect.

How to reproduce the bug

See above

Error messages and logs

# Error messages and logs here please

Environment

❯ python collect_env_details.py

Current environment
  • CUDA: - GPU: - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - available: True - version: 12.1
  • Lightning: - lightning-utilities: 0.8.0 - pytorch-fast-transformers: 0.4.0 - pytorch-lightning: 2.0.1.post0 - pytorch-quantization: 2.1.2 - torch: 2.0.0a0+1767026 - torch-tensorrt: 1.4.0.dev0 - torchaudio: 2.0.1+3b40834 - torchmetrics: 0.11.4 - torchtext: 0.13.0a0+fae8e8c - torchvision: 0.15.0a0
  • Packages: - absl-py: 1.4.0 - accelerate: 0.18.0 - aiohttp: 3.8.4 - aiosignal: 1.3.1 - alembic: 1.10.3 - antlr4-python3-runtime: 4.9.3 - apex: 0.1 - appdirs: 1.4.4 - argcomplete: 3.0.5 - argon2-cffi: 21.3.0 - argon2-cffi-bindings: 21.2.0 - asttokens: 2.2.1 - astunparse: 1.6.3 - async-timeout: 4.0.2 - attrs: 22.2.0 - audioread: 3.0.0 - autopage: 0.5.1 - backcall: 0.2.0 - beautifulsoup4: 4.11.2 - bleach: 6.0.0 - blessed: 1.20.0 - blis: 0.7.9 - boto: 2.49.0 - cachetools: 5.3.0 - catalogue: 2.0.8 - cauchy-mult: 0.1 - certifi: 2022.12.7 - cffi: 1.15.1 - charset-normalizer: 3.1.0 - click: 8.1.3 - cliff: 4.2.0 - cloudpickle: 2.2.1 - cmaes: 0.9.1 - cmake: 3.24.1.1 - cmd2: 2.4.3 - colorlog: 6.7.0 - comm: 0.1.2 - confection: 0.0.4 - contourpy: 1.0.7 - crcmod: 1.7 - cryptography: 40.0.2 - cubinlinker: 0.2.2+2.g4de3e99 - cuda-python: 12.1.0rc5+1.gc7fd38c.dirty - cudf: 23.2.0 - cugraph: 23.2.0 - cugraph-dgl: 23.2.0 - cugraph-service-client: 23.2.0 - cugraph-service-server: 23.2.0 - cuml: 23.2.0 - cupy-cuda12x: 12.0.0b3 - cycler: 0.11.0 - cymem: 2.0.7 - cython: 0.29.33 - dask: 2023.1.1 - dask-cuda: 23.2.0 - dask-cudf: 23.2.0 - datasets: 2.11.0 - debugpy: 1.6.6 - decorator: 5.1.1 - deepspeed: 0.9.1 - defusedxml: 0.7.1 - dill: 0.3.6 - distributed: 2023.1.1 - docker-pycreds: 0.4.0 - docutils: 0.19 - dropout-layer-norm: 0.1 - eeghdf: 0.2.4 - einops: 0.6.1 - en-core-web-sm: 3.5.0 - exceptiongroup: 1.1.1 - execnet: 1.9.0 - executing: 1.2.0 - expecttest: 0.1.3 - fasteners: 0.18 - fastjsonschema: 2.16.3 - fastrlock: 0.8.1 - fftconv: 0.1 - filelock: 3.10.0 - flash-attn: 1.0.3.post0 - fonttools: 4.38.0 - frozenlist: 1.3.3 - fsspec: 2023.1.0 - ft-attention: 0.1 - fused-dense-lib: 0.0.0 - future: 0.18.3 - fvcore: 0.1.5.post20221221 - gast: 0.4.0 - gcs-oauth2-boto-plugin: 3.0 - gdown: 4.7.1 - gitdb: 4.0.10 - gitpython: 3.1.31 - google-apitools: 0.5.32 - google-auth: 2.16.2 - google-auth-oauthlib: 0.4.6 - google-reauth: 0.1.1 - gpustat: 1.1 - graphsurgeon: 0.4.6 - greenlet: 2.0.2 - grpcio: 
1.51.3 - gsutil: 5.23 - h5py: 3.8.0 - heapdict: 1.0.1 - hjson: 3.1.0 - httplib2: 0.20.4 - huggingface-hub: 0.13.4 - hydra-colorlog: 1.2.0 - hydra-core: 1.3.2 - hydra-optuna-sweeper: 1.2.0 - hypothesis: 5.35.1 - idna: 3.4 - importlib-metadata: 6.0.0 - importlib-resources: 5.12.0 - iniconfig: 2.0.0 - intel-openmp: 2021.4.0 - iopath: 0.1.10 - ipdb: 0.13.13 - ipykernel: 6.21.3 - ipython: 8.11.0 - ipython-genutils: 0.2.0 - ipywidgets: 8.0.6 - jaraco.classes: 3.2.3 - jedi: 0.18.2 - jeepney: 0.8.0 - jinja2: 3.1.2 - joblib: 1.2.0 - json5: 0.9.11 - jsonschema: 4.17.3 - jupyter: 1.0.0 - jupyter-client: 8.0.3 - jupyter-console: 6.6.3 - jupyter-core: 5.2.0 - jupyter-tensorboard: 0.2.0 - jupyterlab: 2.3.2 - jupyterlab-pygments: 0.2.2 - jupyterlab-server: 1.2.0 - jupyterlab-widgets: 3.0.7 - jupytext: 1.14.5 - keopscore: 2.1.2 - keyring: 23.13.1 - kiwisolver: 1.4.4 - langcodes: 3.3.0 - librosa: 0.9.2 - lightning-utilities: 0.8.0 - lit: 15.0.7 - llvmlite: 0.39.1 - locket: 1.0.0 - mako: 1.2.4 - markdown: 3.4.1 - markdown-it-py: 2.2.0 - markupsafe: 2.1.2 - matplotlib: 3.7.0 - matplotlib-inline: 0.1.6 - mdit-py-plugins: 0.3.5 - mdurl: 0.1.2 - mistune: 2.0.5 - mkl: 2021.1.1 - mkl-devel: 2021.1.1 - mkl-include: 2021.1.1 - mlperf-logging: 2.1.0 - mock: 5.0.1 - monotonic: 1.6 - more-itertools: 9.1.0 - mpmath: 1.3.0 - msgpack: 1.0.4 - multidict: 6.0.4 - multiprocess: 0.70.14 - munch: 2.5.0 - murmurhash: 1.0.9 - nbclient: 0.7.2 - nbconvert: 7.2.10 - nbformat: 5.7.3 - nest-asyncio: 1.5.6 - networkx: 2.6.3 - ninja: 1.11.1 - notebook: 6.4.10 - numba: 0.56.4+1.g9a03de713 - numpy: 1.22.2 - nvidia-dali-cuda110: 1.23.0 - nvidia-ml-py: 11.525.112 - nvidia-pyindex: 1.0.9 - nvitop: 1.1.2 - nvtx: 0.2.5 - oauth2client: 4.1.3 - oauthlib: 3.2.2 - omegaconf: 2.3.0 - onnx: 1.13.0 - opencv: 4.6.0 - opt-einsum: 3.3.0 - optuna: 2.10.1 - packaging: 23.0 - pandas: 1.5.2 - pandocfilters: 1.5.0 - parso: 0.8.3 - partd: 1.3.0 - pathtools: 0.1.2 - pathy: 0.10.1 - pbr: 5.11.1 - pexpect: 4.8.0 - pickleshare: 0.7.5 - 
pillow: 9.2.0 - pip: 21.2.4 - pkginfo: 1.9.6 - pkgutil-resolve-name: 1.3.10 - platformdirs: 3.1.1 - pluggy: 1.0.0 - ply: 3.11 - polygraphy: 0.44.2 - pooch: 1.7.0 - portalocker: 2.7.0 - preshed: 3.0.8 - prettytable: 3.6.0 - prometheus-client: 0.16.0 - prompt-toolkit: 3.0.38 - protobuf: 3.20.3 - psutil: 5.9.4 - ptxcompiler: 0.7.0+27.gbcb4096 - ptyprocess: 0.7.0 - pure-eval: 0.2.2 - py-cpuinfo: 9.0.0 - pyarrow: 10.0.1.dev0+ga6eabc2b.d20230220 - pyasn1: 0.4.8 - pyasn1-modules: 0.2.8 - pybind11: 2.10.3 - pycocotools: 2.0+nv0.7.1 - pycparser: 2.21 - pydantic: 1.10.6 - pygments: 2.14.0 - pylibcugraph: 23.2.0 - pylibcugraphops: 23.2.0 - pylibraft: 23.2.0 - pynvml: 11.5.0 - pyopenssl: 23.1.1 - pyparsing: 3.0.9 - pyperclip: 1.8.2 - pyrootutils: 1.0.4 - pyrsistent: 0.19.3 - pysocks: 1.7.1 - pytest: 7.2.2 - pytest-rerunfailures: 11.1.2 - pytest-shard: 0.1.2 - pytest-xdist: 3.2.1 - python-dateutil: 2.8.2 - python-dotenv: 1.0.0 - python-hostlist: 1.23.0 - pytorch-fast-transformers: 0.4.0 - pytorch-lightning: 2.0.1.post0 - pytorch-quantization: 2.1.2 - pytz: 2022.7.1 - pyu2f: 0.1.5 - pyyaml: 6.0 - pyzmq: 25.0.1 - qtconsole: 5.4.2 - qtpy: 2.3.1 - raft-dask: 23.2.0 - readme-renderer: 37.3 - regex: 2022.10.31 - requests: 2.28.2 - requests-oauthlib: 1.3.1 - requests-toolbelt: 0.10.1 - resampy: 0.4.2 - responses: 0.18.0 - retry-decorator: 1.1.1 - rfc3986: 2.0.0 - rich: 13.3.4 - rmm: 23.2.0 - rotary-emb: 0.1 - rsa: 4.7.2 - scikit-learn: 1.2.0 - scipy: 1.6.3 - seaborn: 0.12.2 - secretstorage: 3.3.3 - send2trash: 1.8.0 - sentencepiece: 0.1.98 - sentry-sdk: 1.20.0 - setproctitle: 1.3.2 - setuptools: 65.5.1 - six: 1.16.0 - smart-open: 6.3.0 - smmap: 5.0.0 - sortedcontainers: 2.4.0 - soundfile: 0.12.1 - soupsieve: 2.4 - spacy: 3.5.1 - spacy-legacy: 3.0.12 - spacy-loggers: 1.0.4 - sphinx-glpi-theme: 0.3 - sqlalchemy: 2.0.10 - srsly: 2.4.6 - stack-data: 0.6.2 - stevedore: 5.0.0 - strings-udf: 23.2.0 - structured-kernels: 0.1.0 - sympy: 1.11.1 - tabulate: 0.9.0 - tbb: 2021.8.0 - tblib: 1.7.0 - 
tensorboard: 2.9.0 - tensorboard-data-server: 0.6.1 - tensorboard-plugin-wit: 1.8.1 - tensorrt: 8.5.3.1 - termcolor: 2.2.0 - terminado: 0.17.1 - thinc: 8.1.9 - threadpoolctl: 3.1.0 - thriftpy2: 0.4.16 - timm: 0.6.13 - tinycss2: 1.2.1 - tokenizers: 0.13.3 - toml: 0.10.2 - tomli: 2.0.1 - toolz: 0.12.0 - torch: 2.0.0a0+1767026 - torch-tensorrt: 1.4.0.dev0 - torchaudio: 2.0.1+3b40834 - torchmetrics: 0.11.4 - torchtext: 0.13.0a0+fae8e8c - torchvision: 0.15.0a0 - tornado: 6.2 - tqdm: 4.65.0 - traitlets: 5.9.0 - transformer-engine: 0.6.0 - transformers: 4.28.1 - treelite: 3.1.0 - treelite-runtime: 3.1.0 - triton: 2.0.0.dev20221202 - twine: 4.0.2 - typer: 0.7.0 - types-dataclasses: 0.6.6 - typing-extensions: 4.5.0 - ucx-py: 0.30.0 - uff: 0.6.9 - urllib3: 1.26.14 - wandb: 0.15.1 - wasabi: 1.1.1 - wcwidth: 0.2.6 - webencodings: 0.5.1 - werkzeug: 2.2.3 - wheel: 0.38.4 - widgetsnbextension: 4.0.7 - xdoctest: 1.0.2 - xentropy-cuda-lib: 0.1 - xxhash: 3.2.0 - yacs: 0.1.8 - yarl: 1.9.1 - zict: 2.2.0 - zipp: 3.14.0 - zstandard: 0.21.0
  • System: - OS: Linux - architecture: - 64bit - ELF - processor: x86_64 - python: 3.8.10 - version: #1 SMP Debian 4.19.269-1 (2022-12-20)

More info

No response

cc @awaelchli @morganmcg1 @borisdayma @scottire @parambharat

About this issue

  • State: open
  • Created a year ago
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Okay, I finally tracked down the issue. It turned out that I was passing in an omegaconf DictConfig object (https://omegaconf.readthedocs.io/en/2.1_branch/index.html) for the config rather than a raw Python dictionary. DictConfig is the dictionary type used by Hydra (https://hydra.cc/), and it seems that WandB doesn’t handle it the same way as a basic Python dictionary.

The solution is to convert the config to a Python dict before passing it into WandbLogger(config=config) or before calling LightningModule.save_hyperparameters(config). With Hydra, one should call OmegaConf.to_container(config). Hopefully this issue helps other people who run into the same problem, because I think Lightning + Hydra + WandB is a fairly common ML stack these days.
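A minimal sketch of that workaround follows. The helper name to_plain_config is hypothetical, and passing resolve=True (which also evaluates ${...} interpolations) is an assumption that goes beyond the plain OmegaConf.to_container(config) call mentioned above. The import is guarded so the helper is a no-op when omegaconf is not installed.

```python
def to_plain_config(config):
    """Return a plain Python container for an OmegaConf config.

    Hypothetical helper: anything that is not an OmegaConf config
    (for example a regular dict) is returned unchanged.
    """
    try:
        from omegaconf import OmegaConf
    except ImportError:
        # Without omegaconf installed, config cannot be a DictConfig.
        return config

    if OmegaConf.is_config(config):
        # Assumption: resolve=True is optional; drop it to keep interpolations.
        return OmegaConf.to_container(config, resolve=True)
    return config


# Illustrative usage inside a Hydra entry point (cfg is the DictConfig
# that Hydra passes in; variable names are assumptions):
#   logger = WandbLogger(config=to_plain_config(cfg))
#   self.save_hyperparameters(to_plain_config(cfg))
plain = to_plain_config({"test_key": {"key2": "test_value"}})
```

Doing the conversion once, at the boundary where the config leaves Hydra, keeps the rest of the code free of library-specific checks.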

@awaelchli Thanks for addressing this and making the PR, and sorry for assuming the issue was with Lightning when it turned out to be an unfortunate interaction between multiple libraries.

I think it is perhaps possible for the libraries to help alleviate these sorts of issues; for example LightningModule.save_hyperparameters(config) could convert the config from any Mapping type to a raw dictionary before passing into the logger.

In fact, reading through the Lightning code, it seems that log_hyperparams() (https://github.com/Lightning-AI/lightning/blob/bd05aa96eddbfcb6f010228ec91ce09f1db4fd29/src/lightning/pytorch/loggers/wandb.py#L419) assumes the input is of type Dict. I think this issue occurred because I was passing in a Mapping, and Python doesn’t enforce type annotations at runtime. Perhaps it makes sense to handle the case where params is a Mapping (like the log_metrics() method right below) and recursively convert it to a Dict in the _convert_params() function? After all, the docstring of _convert_params() says “Ensure parameters are a dict or convert to dict if necessary.”
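The recursive conversion suggested here could look roughly like the sketch below. This is not Lightning's actual _convert_params() code: mapping_to_dict and the stand-in class DictLikeConfig are hypothetical names used only to show the idea with the standard library.

```python
from collections.abc import Mapping


def mapping_to_dict(params):
    """Recursively convert any Mapping (and nested Mappings) into plain dicts."""
    if isinstance(params, Mapping):
        return {key: mapping_to_dict(value) for key, value in params.items()}
    if isinstance(params, (list, tuple)):
        # Lists and tuples both come back as lists in this sketch.
        return [mapping_to_dict(item) for item in params]
    return params


class DictLikeConfig(Mapping):
    """Hypothetical stand-in for a dict-like config type that is not a dict."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Nested dicts are wrapped too, mimicking a nested config object.
        return DictLikeConfig(value) if isinstance(value, dict) else value

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


config = DictLikeConfig({"test_key": {"key2": "test_value"}})
plain = mapping_to_dict(config)
```

Here config passes an isinstance(config, Mapping) check but is not a dict, which is the situation described above; plain is built from ordinary dicts at every level.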

But perhaps it’s just the responsibility of the user to make sure all the libraries are interacting properly.

I have fixed my own issue after identifying the problem, but I do think that what you described would be more robust to potential related issues. As is, even the Hydra + Lightning combination is probably fairly common and users would all run into this non-obvious issue.

Thanks for showing the snippet and screenshot. Let me dig in some more to see if there is something strange going on with my setup.