wandb: [App] Table not updating at each call of log

I have some code in which I want to log some metadata at each step. I thought I could log this metadata in a Table; unfortunately, the table is not updated in the app after the first call to log.

My code is something like

run = wandb.init(**config)
with run:
    metadata = ["I love my phone", "1", "1"]
    table = wandb.Table(data=[metadata], columns=["Text", "Predicted Label", "True Label"])
    run.log({"examples": table})
    for step in steps:
        ## some code
        table.add_data(*metadata)
        run.log({"examples": table})

On the app, I can only see the initial table with its data; no rows are added to it. Note that the corresponding artifact is also not updated. Maybe this is due to wandb assigning the same identity to the initial and updated table and not logging it again?

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 10
  • Comments: 48 (4 by maintainers)

Most upvoted comments

First of all, I’m sorry on behalf of W&B for not updating this thread in a while.

There are a few options here.

Use W&B History: If you’re just logging text data, not rich media like images, you can use the typical wandb.log({'predicted': 'hello', 'label': 'world'}) API and then use the UI to get a W&B Table.

  • In a workspace, click “Add Panel”
  • Add a Weave panel by clicking Weave
  • Type runs.history.concat to configure it to concatenate all of your history logs
  • You can then configure the columns shown by clicking columns in the bottom right of the Table. This will function as a Table that you can continuously log to, regardless of which run the data comes from.

See Tables Update: Create a new wandb.Table object each iteration and log it to the same key with wandb.log. This should update in the UI as you train.

Compare 2 Tables: If you want to compare logged Tables:

  • Navigate to the Artifacts tab of your project
  • Click an Artifact
  • Click File
  • Click <table_name>.table.json. That should display your Table.

Now you can click compare with a different version of your artifact. You can either concatenate the two Tables so the table shown will have all the rows from both Tables, or join them using one of the columns, so all other columns are merged.

Concatenate Tables: Note: Advanced & may be limited by performance depending on the size of your Table You can create a Weave panel in your workspace and use runs[0].loggedArtifactVersions.map((row, index) => row.file("<table_key>.table.json")) where <table_key> is the key you’ve logged the table to. This will create one big table with all of the rows from your logged Tables (make sure to create a new Table object every iteration). This is using the Weave query language which is still in active development and may change in the future.

Hello! I am also experiencing the same issue. In the first log call, the table is logged, but any future log calls with an updated table (updated via add_data) do not display the updated data in the UI. I also do not find any of the updated rows in the artifacts section.

I am also experiencing this issue.

faced with the same issue, would be nice to have a fix

I am surprised this is still not fixed over a year after being raised.

The docs here are quite misleading, as they give the impression that it’s possible to incrementally log text during training. However, it’s not possible to do that with Weights & Biases at the moment. Having to set up workarounds for this adds an annoying amount of friction when setting up WandB logging for training with LLMs.

+1 I am also facing this issue, for now I am defaulting to logging the tables after training.

+1 on this

A work around I’ve just found is creating a new instance of a Table which contains the same columns and data as the old one, and logging that.

new_table = wandb.Table(
    columns=self.wandb_table.columns, data=self.wandb_table.data
)
self.wandb.log({"predictions": new_table}, commit=False)

Before doing this, it would only create a single json for the table in the RUN_DIR/files/media/table folder, but now it’s making one for each time this command is run.

Maybe this can help narrowing down the problem?

Still facing the same issue.

I suffered from this problem too. Thanks to @amitkparekh’s solution, I can get it to work. I made a small tweak tho: I just log a shallow copy of the table:

from copy import copy
...

my_table.add_data(...)
wandb.log({"my_table": copy(my_table)})

I experienced the same problem.

It turns out that the tables are being updated when I look at the artifacts section - there is a new version under the artifact, and I can view the table by going to Files --> {path_to_table} --> {table_name.json}.

But for some reason it is not updating in the UI.

Same here

Any updates on this 6 months later?

So after the first log, any future logs do nothing. Even after the run finishes, it still does not update.

It’s like for the given instance of table, it will only ever be uploaded once. That’s why my workaround solution was to continue adding rows to the table as described in the help, but I log a brand new instantiation of Table using the columns and data from the old one.

A work around I’ve just found is creating a new instance of a Table which contains the same columns and data as the old one, and logging that.

new_table = wandb.Table(
    columns=self.wandb_table.columns, data=self.wandb_table.data
)
self.wandb.log({"predictions": new_table}, commit=False)

Before doing this, it would only create a single json for the table in the RUN_DIR/files/media/table folder, but now it’s making one for each time this command is run.

Maybe this can help narrowing down the problem?

As of March 14, 2024, this is THE MOST correct trick.

This trick does involve repeated in-memory copies and file writes, but believe me, it’s the best solution “right now”.

Also facing this issue.

This is still a problem. And same with confusion matrices.

Still same issue. Why is it not resolved yet?

Following - i’m facing this same error

Hi! We are still working on this issue - it requires a large refactor of our table-handling logic, which is why it has not been implemented yet. I’ll post back here once I have some good news about the progress of this feature!

Hi @amitkparekh is this one of tables that weren’t updating properly?

Sorry, no it isn’t the same tables.

On my own project, I’ve been having the same issue where after logging an instance of wandb.Table once, all future “wandb.log’s” are ignored.

I am following the same logic outlined above by @oumarkaba: creating a table at the start, then adding data and logging it each epoch.

It seems that after the first time you log a table, logging that same instance, with or without any data changes, does nothing. Looking within the local files for the run, no new table artifacts are created after the first logging, so something in the client-side code may be preventing it from being updated.

Using the workaround, where I just log a newly created wandb.Table instance using the same columns and data from the table within the state, it works properly — creating the local table artifact files and updating the dashboard as expected.


For completeness, I am using allennlp and have extended their WandBCallback. This is the full module of the callback I am using.

  • I’m creating the wandb.Table instance on start
  • I update the table at the end of each epoch
  • I create a new instance of the table and log the new instance, just as in the above workaround, and then discard the new instance to save memory.
import logging
from typing import Any

import wandb
from allennlp.training import GradientDescentTrainer
from allennlp.training.callbacks import TrainerCallback, WandBCallback


logger = logging.getLogger(__name__)


def get_training_stage_from_key(key: str) -> str:
    if key.startswith("training"):
        return "training"
    if key.startswith("validation"):
        return "validation"
    return "unknown"


@TrainerCallback.register("alt_wandb")
class AltWandBCallback(WandBCallback):
    def on_start(
        self,
        trainer: GradientDescentTrainer,
        is_primary: bool = True,
        **kwargs: Any,
    ) -> None:
        if not is_primary:
            return None

        super().on_start(trainer, is_primary, **kwargs)

        # Create prediction table
        columns = ["epoch", "stage", "task", "prediction", "target"]
        self.wandb_table = wandb.Table(columns=columns)

    def on_epoch(
        self,
        trainer: GradientDescentTrainer,
        metrics: dict[str, Any],
        epoch: int,
        is_primary: bool = True,
        **kwargs: Any,
    ) -> None:
        if not is_primary:
            return None

        self._log_predictions_table(metrics, epoch)

        filtered_metrics = {
            name: metric for name, metric in metrics.items() if "__" not in name
        }

        super().on_epoch(trainer, filtered_metrics, epoch, is_primary, **kwargs)

    def _log_predictions_table(self, metrics: dict[str, Any], epoch: int) -> None:
        prediction_metric_keys = {"training__predictions", "validation__predictions"}

        for key in prediction_metric_keys:
            if key not in metrics:
                continue

            stage = get_training_stage_from_key(key)

            for task_prediction in metrics[key].values():
                logger.info(f"Adding data for {task_prediction}")
                self.wandb_table.add_data(epoch, stage, *task_prediction)

        new_table = wandb.Table(
            columns=self.wandb_table.columns, data=self.wandb_table.data
        )

        self.wandb.log({"predictions": new_table}, commit=False)