pytorch-lightning: WandbLogger doesn't format config correctly
Bug description
Summary: WandbLogger(config=config) does not provide the same behavior as wandb.init(config=config) in recent versions of pytorch-lightning.
Explanation:
Passing config into wandb.init is supposed to create a nicely formatted config in WandB.
Example:
wandb.init(config={'key1': {'key2': 'value'}})
The Run Overview in wandb looks like this:
A more complicated example of a nested config:
In this WandB run, the keys are logged and searchable as, e.g., model.pool.0 (with corresponding value 4).
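To make the dotted-key naming concrete, here is a minimal sketch of how a nested config could map onto keys such as model.pool.0. It is not WandB's actual implementation; flatten_config is a hypothetical helper, and every value other than model.pool.0 = 4 is made up for illustration.

```python
from typing import Any, Dict


def flatten_config(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Hypothetical helper: map nested dicts/lists onto dotted keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, (list, tuple)):
        # List entries are addressed by their index, e.g. "pool.0".
        items = enumerate(value)
    else:
        # Leaf value: store it under the key path built so far.
        return {prefix: value}

    flat: Dict[str, Any] = {}
    for key, child in items:
        child_prefix = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_config(child, child_prefix))
    return flat


# Made-up config; only model.pool.0 == 4 mirrors the example in the issue.
config = {"model": {"pool": [4, 2], "name": "demo"}}
flat = flatten_config(config)
print(flat["model.pool.0"])  # 4
```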
However, this is what it looks like when you run
pytorch_lightning.loggers.WandbLogger(config={'test_key': {'key2': 'test_value'}})
(which is supposed to pass the config entry straight through to wandb.init):
Note that the keys are no longer nested: there is only one level of hierarchy, and the values are entire dictionaries. Instead of the WandB config having a key of test_key.key2 with a value of test_value, there is only a key of test_key with a value of {'key2': 'test_value'}.
What version are you seeing the problem on?
v2_0
Note: I have used older versions of pytorch-lightning that do not have this issue. I’m not sure if it is a regression and have not had time to bisect.
How to reproduce the bug
See above
Error messages and logs
# Error messages and logs here please
Environment
❯ python collect_env_details.py
Current environment
- CUDA: - GPU: - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - NVIDIA A100-SXM4-40GB - available: True - version: 12.1
- Lightning: - lightning-utilities: 0.8.0 - pytorch-fast-transformers: 0.4.0 - pytorch-lightning: 2.0.1.post0 - pytorch-quantization: 2.1.2 - torch: 2.0.0a0+1767026 - torch-tensorrt: 1.4.0.dev0 - torchaudio: 2.0.1+3b40834 - torchmetrics: 0.11.4 - torchtext: 0.13.0a0+fae8e8c - torchvision: 0.15.0a0
- Packages: - absl-py: 1.4.0 - accelerate: 0.18.0 - aiohttp: 3.8.4 - aiosignal: 1.3.1 - alembic: 1.10.3 - antlr4-python3-runtime: 4.9.3 - apex: 0.1 - appdirs: 1.4.4 - argcomplete: 3.0.5 - argon2-cffi: 21.3.0 - argon2-cffi-bindings: 21.2.0 - asttokens: 2.2.1 - astunparse: 1.6.3 - async-timeout: 4.0.2 - attrs: 22.2.0 - audioread: 3.0.0 - autopage: 0.5.1 - backcall: 0.2.0 - beautifulsoup4: 4.11.2 - bleach: 6.0.0 - blessed: 1.20.0 - blis: 0.7.9 - boto: 2.49.0 - cachetools: 5.3.0 - catalogue: 2.0.8 - cauchy-mult: 0.1 - certifi: 2022.12.7 - cffi: 1.15.1 - charset-normalizer: 3.1.0 - click: 8.1.3 - cliff: 4.2.0 - cloudpickle: 2.2.1 - cmaes: 0.9.1 - cmake: 3.24.1.1 - cmd2: 2.4.3 - colorlog: 6.7.0 - comm: 0.1.2 - confection: 0.0.4 - contourpy: 1.0.7 - crcmod: 1.7 - cryptography: 40.0.2 - cubinlinker: 0.2.2+2.g4de3e99 - cuda-python: 12.1.0rc5+1.gc7fd38c.dirty - cudf: 23.2.0 - cugraph: 23.2.0 - cugraph-dgl: 23.2.0 - cugraph-service-client: 23.2.0 - cugraph-service-server: 23.2.0 - cuml: 23.2.0 - cupy-cuda12x: 12.0.0b3 - cycler: 0.11.0 - cymem: 2.0.7 - cython: 0.29.33 - dask: 2023.1.1 - dask-cuda: 23.2.0 - dask-cudf: 23.2.0 - datasets: 2.11.0 - debugpy: 1.6.6 - decorator: 5.1.1 - deepspeed: 0.9.1 - defusedxml: 0.7.1 - dill: 0.3.6 - distributed: 2023.1.1 - docker-pycreds: 0.4.0 - docutils: 0.19 - dropout-layer-norm: 0.1 - eeghdf: 0.2.4 - einops: 0.6.1 - en-core-web-sm: 3.5.0 - exceptiongroup: 1.1.1 - execnet: 1.9.0 - executing: 1.2.0 - expecttest: 0.1.3 - fasteners: 0.18 - fastjsonschema: 2.16.3 - fastrlock: 0.8.1 - fftconv: 0.1 - filelock: 3.10.0 - flash-attn: 1.0.3.post0 - fonttools: 4.38.0 - frozenlist: 1.3.3 - fsspec: 2023.1.0 - ft-attention: 0.1 - fused-dense-lib: 0.0.0 - future: 0.18.3 - fvcore: 0.1.5.post20221221 - gast: 0.4.0 - gcs-oauth2-boto-plugin: 3.0 - gdown: 4.7.1 - gitdb: 4.0.10 - gitpython: 3.1.31 - google-apitools: 0.5.32 - google-auth: 2.16.2 - google-auth-oauthlib: 0.4.6 - google-reauth: 0.1.1 - gpustat: 1.1 - graphsurgeon: 0.4.6 - greenlet: 2.0.2 - grpcio: 
1.51.3 - gsutil: 5.23 - h5py: 3.8.0 - heapdict: 1.0.1 - hjson: 3.1.0 - httplib2: 0.20.4 - huggingface-hub: 0.13.4 - hydra-colorlog: 1.2.0 - hydra-core: 1.3.2 - hydra-optuna-sweeper: 1.2.0 - hypothesis: 5.35.1 - idna: 3.4 - importlib-metadata: 6.0.0 - importlib-resources: 5.12.0 - iniconfig: 2.0.0 - intel-openmp: 2021.4.0 - iopath: 0.1.10 - ipdb: 0.13.13 - ipykernel: 6.21.3 - ipython: 8.11.0 - ipython-genutils: 0.2.0 - ipywidgets: 8.0.6 - jaraco.classes: 3.2.3 - jedi: 0.18.2 - jeepney: 0.8.0 - jinja2: 3.1.2 - joblib: 1.2.0 - json5: 0.9.11 - jsonschema: 4.17.3 - jupyter: 1.0.0 - jupyter-client: 8.0.3 - jupyter-console: 6.6.3 - jupyter-core: 5.2.0 - jupyter-tensorboard: 0.2.0 - jupyterlab: 2.3.2 - jupyterlab-pygments: 0.2.2 - jupyterlab-server: 1.2.0 - jupyterlab-widgets: 3.0.7 - jupytext: 1.14.5 - keopscore: 2.1.2 - keyring: 23.13.1 - kiwisolver: 1.4.4 - langcodes: 3.3.0 - librosa: 0.9.2 - lightning-utilities: 0.8.0 - lit: 15.0.7 - llvmlite: 0.39.1 - locket: 1.0.0 - mako: 1.2.4 - markdown: 3.4.1 - markdown-it-py: 2.2.0 - markupsafe: 2.1.2 - matplotlib: 3.7.0 - matplotlib-inline: 0.1.6 - mdit-py-plugins: 0.3.5 - mdurl: 0.1.2 - mistune: 2.0.5 - mkl: 2021.1.1 - mkl-devel: 2021.1.1 - mkl-include: 2021.1.1 - mlperf-logging: 2.1.0 - mock: 5.0.1 - monotonic: 1.6 - more-itertools: 9.1.0 - mpmath: 1.3.0 - msgpack: 1.0.4 - multidict: 6.0.4 - multiprocess: 0.70.14 - munch: 2.5.0 - murmurhash: 1.0.9 - nbclient: 0.7.2 - nbconvert: 7.2.10 - nbformat: 5.7.3 - nest-asyncio: 1.5.6 - networkx: 2.6.3 - ninja: 1.11.1 - notebook: 6.4.10 - numba: 0.56.4+1.g9a03de713 - numpy: 1.22.2 - nvidia-dali-cuda110: 1.23.0 - nvidia-ml-py: 11.525.112 - nvidia-pyindex: 1.0.9 - nvitop: 1.1.2 - nvtx: 0.2.5 - oauth2client: 4.1.3 - oauthlib: 3.2.2 - omegaconf: 2.3.0 - onnx: 1.13.0 - opencv: 4.6.0 - opt-einsum: 3.3.0 - optuna: 2.10.1 - packaging: 23.0 - pandas: 1.5.2 - pandocfilters: 1.5.0 - parso: 0.8.3 - partd: 1.3.0 - pathtools: 0.1.2 - pathy: 0.10.1 - pbr: 5.11.1 - pexpect: 4.8.0 - pickleshare: 0.7.5 - 
pillow: 9.2.0 - pip: 21.2.4 - pkginfo: 1.9.6 - pkgutil-resolve-name: 1.3.10 - platformdirs: 3.1.1 - pluggy: 1.0.0 - ply: 3.11 - polygraphy: 0.44.2 - pooch: 1.7.0 - portalocker: 2.7.0 - preshed: 3.0.8 - prettytable: 3.6.0 - prometheus-client: 0.16.0 - prompt-toolkit: 3.0.38 - protobuf: 3.20.3 - psutil: 5.9.4 - ptxcompiler: 0.7.0+27.gbcb4096 - ptyprocess: 0.7.0 - pure-eval: 0.2.2 - py-cpuinfo: 9.0.0 - pyarrow: 10.0.1.dev0+ga6eabc2b.d20230220 - pyasn1: 0.4.8 - pyasn1-modules: 0.2.8 - pybind11: 2.10.3 - pycocotools: 2.0+nv0.7.1 - pycparser: 2.21 - pydantic: 1.10.6 - pygments: 2.14.0 - pylibcugraph: 23.2.0 - pylibcugraphops: 23.2.0 - pylibraft: 23.2.0 - pynvml: 11.5.0 - pyopenssl: 23.1.1 - pyparsing: 3.0.9 - pyperclip: 1.8.2 - pyrootutils: 1.0.4 - pyrsistent: 0.19.3 - pysocks: 1.7.1 - pytest: 7.2.2 - pytest-rerunfailures: 11.1.2 - pytest-shard: 0.1.2 - pytest-xdist: 3.2.1 - python-dateutil: 2.8.2 - python-dotenv: 1.0.0 - python-hostlist: 1.23.0 - pytorch-fast-transformers: 0.4.0 - pytorch-lightning: 2.0.1.post0 - pytorch-quantization: 2.1.2 - pytz: 2022.7.1 - pyu2f: 0.1.5 - pyyaml: 6.0 - pyzmq: 25.0.1 - qtconsole: 5.4.2 - qtpy: 2.3.1 - raft-dask: 23.2.0 - readme-renderer: 37.3 - regex: 2022.10.31 - requests: 2.28.2 - requests-oauthlib: 1.3.1 - requests-toolbelt: 0.10.1 - resampy: 0.4.2 - responses: 0.18.0 - retry-decorator: 1.1.1 - rfc3986: 2.0.0 - rich: 13.3.4 - rmm: 23.2.0 - rotary-emb: 0.1 - rsa: 4.7.2 - scikit-learn: 1.2.0 - scipy: 1.6.3 - seaborn: 0.12.2 - secretstorage: 3.3.3 - send2trash: 1.8.0 - sentencepiece: 0.1.98 - sentry-sdk: 1.20.0 - setproctitle: 1.3.2 - setuptools: 65.5.1 - six: 1.16.0 - smart-open: 6.3.0 - smmap: 5.0.0 - sortedcontainers: 2.4.0 - soundfile: 0.12.1 - soupsieve: 2.4 - spacy: 3.5.1 - spacy-legacy: 3.0.12 - spacy-loggers: 1.0.4 - sphinx-glpi-theme: 0.3 - sqlalchemy: 2.0.10 - srsly: 2.4.6 - stack-data: 0.6.2 - stevedore: 5.0.0 - strings-udf: 23.2.0 - structured-kernels: 0.1.0 - sympy: 1.11.1 - tabulate: 0.9.0 - tbb: 2021.8.0 - tblib: 1.7.0 - 
tensorboard: 2.9.0 - tensorboard-data-server: 0.6.1 - tensorboard-plugin-wit: 1.8.1 - tensorrt: 8.5.3.1 - termcolor: 2.2.0 - terminado: 0.17.1 - thinc: 8.1.9 - threadpoolctl: 3.1.0 - thriftpy2: 0.4.16 - timm: 0.6.13 - tinycss2: 1.2.1 - tokenizers: 0.13.3 - toml: 0.10.2 - tomli: 2.0.1 - toolz: 0.12.0 - torch: 2.0.0a0+1767026 - torch-tensorrt: 1.4.0.dev0 - torchaudio: 2.0.1+3b40834 - torchmetrics: 0.11.4 - torchtext: 0.13.0a0+fae8e8c - torchvision: 0.15.0a0 - tornado: 6.2 - tqdm: 4.65.0 - traitlets: 5.9.0 - transformer-engine: 0.6.0 - transformers: 4.28.1 - treelite: 3.1.0 - treelite-runtime: 3.1.0 - triton: 2.0.0.dev20221202 - twine: 4.0.2 - typer: 0.7.0 - types-dataclasses: 0.6.6 - typing-extensions: 4.5.0 - ucx-py: 0.30.0 - uff: 0.6.9 - urllib3: 1.26.14 - wandb: 0.15.1 - wasabi: 1.1.1 - wcwidth: 0.2.6 - webencodings: 0.5.1 - werkzeug: 2.2.3 - wheel: 0.38.4 - widgetsnbextension: 4.0.7 - xdoctest: 1.0.2 - xentropy-cuda-lib: 0.1 - xxhash: 3.2.0 - yacs: 0.1.8 - yarl: 1.9.1 - zict: 2.2.0 - zipp: 3.14.0 - zstandard: 0.21.0
- System: - OS: Linux - architecture: - 64bit - ELF - processor: x86_64 - python: 3.8.10 - version: #1 SMP Debian 4.19.269-1 (2022-12-20)
More info
No response
cc @awaelchli @morganmcg1 @borisdayma @scottire @parambharat
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 16 (6 by maintainers)
Okay, I finally tracked down the issue. It turned out I was passing in not a raw Python dictionary for the config but an omegaconf DictConfig object (https://omegaconf.readthedocs.io/en/2.1_branch/index.html). This is the dictionary object used by Hydra (https://hydra.cc/), and it seems that WandB does not format it properly when it receives this fancy dictionary object instead of a basic Python dictionary.
The solution is to convert the config to a Python dict before passing it into WandbLogger(config=config) or before calling LightningModule.save_hyperparameters(config). In the case of Hydra, one should call OmegaConf.to_container(config). Hopefully this issue helps other people who run into the same problem, because I think Lightning + Hydra + WandB is a fairly common ML stack these days.
@awaelchli Thanks for addressing this and making the PR, and sorry for assuming the issue was with Lightning when it turned out to be an unfortunate interaction between multiple libraries.
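For anyone hitting the same thing, the workaround (converting the config to a plain dict before it reaches the logger) might look roughly like the sketch below. to_plain_config is a hypothetical helper name, not part of Lightning, Hydra, or WandB; the omegaconf import is guarded so the snippet also runs where omegaconf is not installed, and resolve=True is an optional choice that expands interpolations.

```python
from typing import Any, Mapping


def to_plain_config(config: Mapping[str, Any]) -> dict:
    """Hypothetical helper: return a plain dict for use as a logger config."""
    try:
        from omegaconf import OmegaConf  # optional dependency
    except ImportError:
        OmegaConf = None

    if OmegaConf is not None and OmegaConf.is_config(config):
        # resolve=True expands ${...} interpolations into concrete values.
        return OmegaConf.to_container(config, resolve=True)
    # Already a regular mapping: make a shallow copy as a plain dict.
    return dict(config)


plain = to_plain_config({"test_key": {"key2": "test_value"}})
print(type(plain).__name__, plain)

# Usage sketch with a Hydra config object named cfg (not executed here):
#   logger = WandbLogger(config=to_plain_config(cfg))
#   self.save_hyperparameters(to_plain_config(cfg))
```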
I think it is perhaps possible for the libraries to help alleviate these sorts of issues; for example, LightningModule.save_hyperparameters(config) could convert the config from any Mapping type to a raw dictionary before passing it into the logger.
In fact, reading through the Lightning code, it seems like log_hyperparams() (https://github.com/Lightning-AI/lightning/blob/bd05aa96eddbfcb6f010228ec91ce09f1db4fd29/src/lightning/pytorch/loggers/wandb.py#L419) assumes the input is of type Dict, but I think this issue occurred because I was passing in a Mapping, and Python doesn’t actually enforce type annotations at runtime. Perhaps it makes sense to handle the case when params is a Mapping (like the log_metrics() method right below) and recursively convert it to a Dict in the _convert_params() function? After all, the docstring of _convert_params() says “Ensure parameters are a dict or convert to dict if necessary.”
But perhaps it’s just the responsibility of the user to make sure all the libraries are interacting properly.
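As a rough illustration of the recursive conversion suggested above, the sketch below turns any Mapping into nested plain dicts. It is not Lightning's code: to_plain_dict is a hypothetical helper, and MappingProxyType from the standard library stands in for a Mapping that is not a dict subclass (such as a DictConfig).

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


def to_plain_dict(value: Any) -> Any:
    """Hypothetical helper: recursively turn Mapping objects into dicts."""
    if isinstance(value, Mapping):
        return {key: to_plain_dict(child) for key, child in value.items()}
    if isinstance(value, (list, tuple)):
        # Containers may hold further mappings, so recurse into them too.
        return [to_plain_dict(child) for child in value]
    return value


# A Mapping that passes a Dict type hint unnoticed but is not a dict.
params = MappingProxyType({"test_key": MappingProxyType({"key2": "test_value"})})
print(isinstance(params, dict))  # False

plain = to_plain_dict(params)
print(plain)
```

Sequence types other than list and tuple would need their own branch; this sketch leaves them untouched.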
I have fixed my own issue after identifying the problem, but I do think that what you described would be more robust to potential related issues. As is, even the Hydra + Lightning combination is probably fairly common and users would all run into this non-obvious issue.
Thanks for showing the snippet and screenshot. Let me dig in some more to see if there is something strange going on with my setup.