pytorch-lightning: Logging with "self.log" in training_step does not create any outputs in progress bar or external Logger when loss isn't returned

🐛 Bug

I think the newly introduced log function does not log properly when used in training_step. The same code in validation_step produces the desired results.

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log("loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        self.log("my_metric_train", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        ##### Doesn't Work (no loss is returned) #######


    def validation_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log("val_loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        self.log("my_metric_val", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        ##### Works #######

Please reproduce using

https://gist.github.com/tobiascz/bb2c6de83263eb38181052840062b5ac
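
For readers who can't open the gist, here is a minimal, self-contained sketch along the same lines (a BoringModel-style module trained on random data; the dataset sizes, learning rate and Trainer limits below are illustrative and not taken from the gist):

    import torch
    from torch.utils.data import DataLoader
    import pytorch_lightning as pl


    class BoringModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def forward(self, x):
            return self.layer(x)

        def loss(self, batch, prediction):
            # arbitrary loss so the example runs end to end
            return torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))

        def training_step(self, batch, batch_idx):
            output = self.layer(batch)
            loss = self.loss(batch, output)
            self.log("loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            self.log("my_metric_train", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            # no `return loss` here -> the metrics never reach the progress bar or logger

        def validation_step(self, batch, batch_idx):
            output = self.layer(batch)
            loss = self.loss(batch, output)
            self.log("val_loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            self.log("my_metric_val", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
            # also no return, yet these metrics are logged as expected

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    if __name__ == "__main__":
        train_data = DataLoader(torch.randn(64, 32), batch_size=2)
        val_data = DataLoader(torch.randn(64, 32), batch_size=2)
        trainer = pl.Trainer(max_epochs=1, limit_train_batches=8, limit_val_batches=8)
        trainer.fit(BoringModel(), train_data, val_data)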

Expected behavior

Logs created in training_step should show up in the progress bar and in the loggers (such as the TensorBoard logger), just as the same code does in validation_step.

Environment

  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0+cu101
    • pytorch-lightning: 0.10.0
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

Just a heads up for future people: there is a Trainer flag, log_every_n_steps, which defaults to 50, so if you encounter an issue similar to the one described in this thread, try lowering it.
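
For illustration, a minimal sketch of setting that flag (the value of 1 below is just an example, not a recommendation):

    import pytorch_lightning as pl

    # Emit logged training metrics every step instead of the default of every
    # 50 steps; with the default, short runs can end before anything is written,
    # which looks like training metrics are missing.
    trainer = pl.Trainer(log_every_n_steps=1)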

Hello, I’m using pytorch-lightning 1.3.2, and facing a similar issue as well

  def training_step(self, batch, batch_idx):
    loss, label_loss, score_loss, pred_score_matrix, pred_label_matrix = self(*batch)
    self.log("train_loss", loss)
    self.log("train_label_loss", label_loss)
    self.log("train_score_loss", score_loss)
    return loss

  def validation_step(self, batch, batch_idx):
    loss, label_loss, score_loss, pred_score_matrix, pred_label_matrix = self(*batch)
    self.log("val_loss", loss)
    self.log("val_label_loss", label_loss)
    self.log("val_score_loss", score_loss)
    return loss

  def test_step(self, batch, batch_idx):
    loss, label_loss, score_loss, pred_score_matrix, pred_label_matrix = self(*batch)
    self.log("test_loss", loss)
    self.log("test_label_loss", label_loss)
    self.log("test_score_loss", score_loss)
    return loss

Only val_loss, val_label_loss and val_score_loss are logged to TensorBoard; the training and test loss values are nowhere to be seen.
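
In case it helps to compare, a hedged sketch of making the training-side logging explicit rather than relying on the defaults (in training_step, self.log defaults to per-step logging, which is additionally throttled by the Trainer's log_every_n_steps; the flag values below are only an example):

  def training_step(self, batch, batch_idx):
    loss, label_loss, score_loss, pred_score_matrix, pred_label_matrix = self(*batch)
    # request both per-step values and an epoch-level aggregate for the logger
    self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
    self.log("train_label_loss", label_loss, on_step=True, on_epoch=True)
    self.log("train_score_loss", score_loss, on_step=True, on_epoch=True)
    return loss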

Thanks for pointing this out @itsikad. Based on the comment above, I am reopening the issue.

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log("loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        self.log("my_metric_train", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        ##### Doesn't Work #######

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        self.log("loss", loss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        self.log("my_metric_train", 1001, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        return loss
        ##### Works #######

Expected behavior: logging in training_step should be independent of optimising the model, i.e. of returning a loss. Even if training_step has some issue and does not return a loss, the logging should work as expected.


I have the same issue as the pytorch-lightning 1.3.2 comment earlier in the thread. Did you find any solution or workaround?

Hey @hecoding,

You might want to have a look at this: https://github.com/PyTorchLightning/pytorch-lightning/pull/4618

Best, T.C

Sometimes there’s no unambiguous way of returning a single loss, though, e.g. in GAN training. What I’m doing to bypass the bug right now is the following, hopefully with no impact on the optimization (I’m using automatic_optimization=False, by the way):

    def training_step(self, batch, batch_idx, optimizer_idx):
        ...
        D_loss = ...
        G_loss = ...
        ...
        self.log("D_loss", D_loss)
        self.log("G_loss", G_loss)

        # dummy return value so the logging path still runs; with
        # automatic_optimization=False, Lightning does not backpropagate
        # the returned tensor
        return torch.tensor(0)

Dear @tobiascz,

Thanks for noticing this wrong behaviour. I will look into it asap.

Best regards, Thomas Chaton.

@tobiascz I believe it isn’t a common issue, but from a design perspective, is this the desired behavior? I would expect the logs to work independently of whether the user returned the loss or not; however, a user warning or error should be raised in case the training step returns None.