pytorch-lightning: Logging with "self.log" in training_step does not create any outputs in progress bar or external Logger when loss isn't returned

🐛 Bug

I think the newly introduced log function does not log properly when used in training_step. The same code in validation_step produces the desired results.

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log("loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        self.log("my_metric_train", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        ##### Doesn't Work (no loss is returned) #######


    def validation_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log("val_loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        self.log("my_metric_val", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        ##### Works #######

Please reproduce using

https://gist.github.com/tobiascz/bb2c6de83263eb38181052840062b5ac
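
For readers who can't open the gist, here is a minimal, self-contained sketch along the same lines (a BoringModel-style module trained on random data; the dataset sizes, learning rate and Trainer limits below are illustrative and not taken from the gist):

    import torch
    from torch.utils.data import DataLoader
    import pytorch_lightning as pl


    class BoringModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def forward(self, x):
            return self.layer(x)

        def loss(self, batch, prediction):
            # arbitrary loss so the example runs end to end
            return torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))

        def training_step(self, batch, batch_idx):
            output = self.layer(batch)
            loss = self.loss(batch, output)
            self.log("loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            self.log("my_metric_train", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            # no `return loss` here -> the metrics never reach the progress bar or logger

        def validation_step(self, batch, batch_idx):
            output = self.layer(batch)
            loss = self.loss(batch, output)
            self.log("val_loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            self.log("my_metric_val", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            # also no return, yet these metrics are logged as expected

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    if __name__ == "__main__":
        train_data = DataLoader(torch.randn(64, 32), batch_size=2)
        val_data = DataLoader(torch.randn(64, 32), batch_size=2)
        trainer = pl.Trainer(max_epochs=1, limit_train_batches=8, limit_val_batches=8)
        trainer.fit(BoringModel(), train_data, val_data)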

Expected behavior

Logs created in training_step should show up in the progress bar and in the loggers (such as the TensorBoard logger), just as the same code does in validation_step.

Environment

  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0+cu101
    • pytorch-lightning: 0.10.0
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

Just a heads up for future people: there is a Trainer flag, log_every_n_steps, which defaults to 50, so if you encounter an issue similar to the one described in this thread, try lowering it.
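
For illustration, a minimal sketch of setting that flag (the value of 1 below is just an example, not a recommendation):

    import pytorch_lightning as pl

    # Emit logged training metrics every step instead of the default of every
    # 50 steps; with the default, short runs can end before anything is written,
    # which looks like training metrics are missing.
    trainer = pl.Trainer(log_every_n_steps=1)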

Hello, I’m using pytorch-lightning 1.3.2, and facing a similar issue as well

  def training_step(self, batch, batch_idx):
    loss, label_loss, score_loss, pred_score_matrix, pred_label_matrix = self(*batch)
    self.log("train_loss", loss)
    self.log("train_label_loss", label_loss)
    self.log("train_score_loss", score_loss)
    return loss

  def validation_step(self, batch, batch_idx):
    loss, label_loss, score_loss, pred_score_matrix, pred_label_matrix = self(*batch)
    self.log("val_loss", loss)
    self.log("val_label_loss", label_loss)
    self.log("val_score_loss", score_loss)
    return loss

  def test_step(self, batch, batch_idx):
    loss, label_loss, score_loss, pred_score_matrix, pred_label_matrix = self(*batch)
    self.log("test_loss", loss)
    self.log("test_label_loss", label_loss)
    self.log("test_score_loss", score_loss)
    return loss

Only val_loss, val_label_loss and val_score_loss are logged to TensorBoard; the training and test loss values are nowhere to be seen.
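
In case it helps to compare, a hedged sketch of making the training-side logging explicit rather than relying on the defaults (in training_step, self.log defaults to per-step logging, which is additionally throttled by the Trainer's log_every_n_steps; the flag values below are only an example):

  def training_step(self, batch, batch_idx):
    loss, label_loss, score_loss, pred_score_matrix, pred_label_matrix = self(*batch)
    # request both per-step values and an epoch-level aggregate for the logger
    self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
    self.log("train_label_loss", label_loss, on_step=True, on_epoch=True)
    self.log("train_score_loss", score_loss, on_step=True, on_epoch=True)
    return loss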

Thanks for pointing this out @itsikad. Based on the comment above, I am reopening the issue.

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log("loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        self.log("my_metric_train", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        ##### Doesn't Work #######

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log("loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        self.log("my_metric_train", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        return loss
        ##### Works #######

Expected behavior: logging in training_step should be independent of optimising the model, i.e. of returning a loss. Even if training_step has some issue and does not return a loss, the logging should work as expected.


I have the same issue as the pytorch-lightning 1.3.2 comment earlier in the thread. Did you find any solution or workaround?

Hey @hecoding,

You might want to have a look at this: https://github.com/PyTorchLightning/pytorch-lightning/pull/4618

Best, T.C

Sometimes there’s no unambiguous way of returning a single loss, though, e.g. in GAN training. What I’m doing to bypass the bug right now is the following, hopefully with no impact on the optimization (I’m using automatic_optimization=False, by the way):

    def training_step(self, batch, batch_idx, optimizer_idx):
        ...
        D_loss = ...
        G_loss = ...
        ...
        self.log("D_loss", D_loss)
        self.log("G_loss", G_loss)

        # dummy return value so the logging path still runs; with
        # automatic_optimization=False, Lightning does not backpropagate
        # the returned tensor
        return torch.tensor(0)

Dear @tobiascz,

Thanks for noticing this wrong behaviour. I will look into it asap.

Best regards, Thomas Chaton.

@tobiascz I believe it isn’t a common issue, but from a design perspective, is this the desired behavior? I would expect the logs to work independently of whether the user returned the loss or not; however, a user warning or error should be raised in case the training step returns None.