pytorch-lightning: AttributeError: '_ResultMetric' object has no attribute '_forward_pre_hooks'
Bug description
Using `self.log` with `torch.compile` results in a failure. https://github.com/pytorch/pytorch/pull/103621 was applied to mitigate the “failed to reach fixed point” error with Python 3.8.
PyTorch Lightning 2.0.5 was used for the experiment.
What version are you seeing the problem on?
v2.0
How to reproduce the bug
import torch
from lightning.pytorch import Trainer
from lightning.pytorch.demos.boring_classes import BoringModel


def test_compiled_model_to_log_metric_with_cpu(tmp_path):
    class MyModel(BoringModel):
        def training_step(self, batch, batch_idx):
            loss = self.step(batch)
            self.log("loss", loss)  # triggers the creation of a _ResultMetric
            return loss

    model = MyModel()
    compiled_model = torch.compile(model)

    trainer = Trainer(
        default_root_dir=tmp_path,
        accelerator="cpu",
        fast_dev_run=True,
        devices=1,
        enable_checkpointing=False,
        enable_model_summary=False,
        enable_progress_bar=False,
    )
    trainer.fit(compiled_model)
    assert set(trainer.callback_metrics) == {"loss"}
Error messages and logs
> raise AttributeError("'{}' object has no attribute '{}'".format(
type(self).__name__, name))
E AttributeError: '_ResultMetric' object has no attribute '_forward_pre_hooks'
E
E from user code:
E File "/home/janand/anaconda3/envs/pylight/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 188, in __init__
E super().__init__()
E
E Set torch._dynamo.config.verbose=True for more information
E
E
E You can suppress this exception and fall back to eager by setting:
E torch._dynamo.config.suppress_errors = True
../../../../anaconda3/envs/pylight/lib/python3.8/site-packages/torch/nn/modules/module.py:1617: AttributeError
Environment
Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
More info
No response
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15 (15 by maintainers)
I found the difference and extracted a raw PyTorch example:
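The original example from this comment is not preserved in this extract. A heavily hedged reconstruction of the pattern it describes, assuming it combined a globally patched `nn.Module.__setattr__` (printing every assigned value) with `torch.compile`:

import torch
from torch import nn

_orig_setattr = nn.Module.__setattr__

def noisy_setattr(self, name, value):
    print(value)  # triggers repr(value) on every assignment
    _orig_setattr(self, name, value)

nn.Module.__setattr__ = noisy_setattr

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = nn.Linear(2, 2)  # assignment goes through the patch

    def forward(self, x):
        return self.inner(x)

compiled = torch.compile(Net())
compiled(torch.randn(1, 2))

Whether this raises the exact error depends on the torch version; it is meant only to illustrate the setup being discussed.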
torch==2.1.0.dev20230818+cu118
We can report this to torch and see if it can be addressed. However, this use case of patching `__setattr__` on all nn.Modules is quite exotic and probably not future-proof.
@jerome-habana my recommendation for you is to just make the `__setattr__` compatible with `torch.compile` explicitly by making it aware of the internal root module. For example, you could handle it in your `__setattr__` override as sketched below. This way, you avoid calling `print` on the dynamo module, which resulted in the repr issue.
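The original snippet from this comment is not preserved in this extract. The following is only a hedged reconstruction of such an override, assuming the patch prints every assigned value and that skipping (or unwrapping) dynamo's `OptimizedModule` wrapper is the intended special case:

import torch
import torch._dynamo
from torch import nn

_original_setattr = nn.Module.__setattr__

def patched_setattr(self, name, value):
    # Assumption: avoid printing dynamo's OptimizedModule, because
    # repr() on it reaches into the wrapped root module, which may
    # still be mid-construction when this hook fires.
    if isinstance(value, torch._dynamo.OptimizedModule):
        # Alternatively, print the fully constructed root module:
        # print(value._orig_mod)
        pass
    else:
        print(value)
    _original_setattr(self, name, value)

nn.Module.__setattr__ = patched_setattr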
I was able to reduce it to just a torchmetrics example where an equivalent error is raised (this is the cause of the same error we are seeing in Lightning's `_ResultMetric`, which inherits from torchmetrics):
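The original snippet is not preserved in this extract. A hedged reconstruction of the described reduction, assuming torchmetrics' `MeanMetric` stands in for the actual metric class used in the thread:

import torch
from torch import nn
from torchmetrics import MeanMetric

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(2, 2)

    def forward(self, x):
        out = self.layer(x)
        # The metric is created in the middle of forward(), mirroring
        # the creation of Lightning's _ResultMetric on the first
        # self.log() call.
        metric = MeanMetric()
        metric.update(out.mean())
        return out

model = torch.compile(Model())
model(torch.randn(1, 2))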
Traceback:
The instantiation of the metric in the middle of `forward()` simulates what is happening in Lightning when `self.log()` is called. During the execution of this, the torch-compiled code calls `__setattr__` during the creation of that module. This then goes through the patched function, where `print(value)` is evaluated, which in turn calls `repr(value)`, which accesses attributes of that object that don't exist yet because the object is still being created. This recursive dependency is the problem, but I don't know how to solve it. The next step is to reduce this even further to a raw PyTorch example, and then check whether PyTorch can address this use case.
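To make the recursive dependency concrete, here is a minimal illustration (my own sketch, not from the thread; the `Probe` class is hypothetical) of why `repr()` fails on a half-constructed nn.Module:

from torch import nn

class Probe(nn.Module):
    def __init__(self):
        # repr() before super().__init__() fails because internal
        # state like _modules has not been created yet. Raises:
        # AttributeError: 'Probe' object has no attribute '_modules'
        print(repr(self))
        super().__init__()

Probe()

The attribute named in the error depends on which internal attribute is touched first (`_modules` here, `_forward_pre_hooks` in the issue), but the mechanism is the same.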
cc @carmocca for visibility
@Borda @carmocca Any updates on this issue?