[Q] wandb: ERROR Failed to sample metric: Not Supported
In the official example:
Step 3: Track gradients with wandb.watch() and everything else with wandb.log()
def train(model, loader, criterion, optimizer, config):
    # Tell wandb to watch what the model gets up to: gradients, weights, and more!
    wandb.watch(model, criterion, log="all", log_freq=10)

    # Run training and track with wandb
    total_batches = len(loader) * config.epochs
    example_ct = 0  # number of examples seen
    batch_ct = 0
    for epoch in tqdm(range(config.epochs)):
        for _, (images, labels) in enumerate(loader):
            loss = train_batch(images, labels, model, optimizer, criterion)
            example_ct += len(images)
            batch_ct += 1

            # Report metrics every 25th batch
            if ((batch_ct + 1) % 25) == 0:
                train_log(loss, example_ct, epoch)
def train_batch(images, labels, model, optimizer, criterion):
    images, labels = images.to(device), labels.to(device)

    # Forward pass ➡
    outputs = model(images)
    loss = criterion(outputs, labels)

    # Backward pass ⬅
    optimizer.zero_grad()
    loss.backward()

    # Step with optimizer
    optimizer.step()

    return loss
Define train_log():

def train_log(loss, example_ct, epoch):
    # Where the magic happens
    wandb.log({"epoch": epoch, "loss": loss}, step=example_ct)
    print(f"Loss after {str(example_ct).zfill(5)} examples: {loss:.3f}")
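For context, a rough sketch of how these functions are typically driven; the project name, toy model, random data, and hyperparameters below are placeholders rather than part of the tutorial:

import torch
import torch.nn as nn
import wandb
from torch.utils.data import DataLoader, TensorDataset
from tqdm import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model and data purely for illustration -- swap in your own.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
dataset = TensorDataset(torch.randn(1024, 1, 28, 28), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=32)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# wandb.watch() only has an effect after wandb.init() has started a run.
with wandb.init(project="watch-gradients-demo", config=dict(epochs=2)):
    train(model, loader, criterion, optimizer, wandb.config)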
And here is my code:
(most of the other code is the same as the example)
def train(model, train_dataloader, loss_function, optimizer, config, device):
    model.train()
    # tell wandb to watch the model
    wandb.watch(model, loss_function, log="all", log_freq=10)

    total_batches = len(train_dataloader) * config.epochs
    example_count = 0
    batch_count = 0
    print(f"Use {device} to train!")
    for epoch in tqdm(range(config.epochs)):
        for _, (images, labels) in enumerate(train_dataloader):
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = loss_function(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            example_count += len(images)
            batch_count += 1

            # report metrics every 25 batches
            if ((batch_count + 1) % 25) == 0:
                wandb.log({"epoch": epoch, "loss": loss}, step=example_count)
                print(f"epoch {epoch + 1} loss: {loss}")
    print("TRAIN FINISHED!")
I want to use wandb.watch() to track the gradients and other parameters, but only "loss", "accuracy", and "epoch" appear in my dashboard.
Everything else works except for tracking parameters. I don't know whether "wandb: ERROR Failed to sample metric: Not Supported" is the root cause of the failure to track parameters such as gradients and weights.
By the way, when I use wandb without wandb.watch(), the "wandb: ERROR Failed to sample metric: Not Supported" error still appears, but it does not seem to affect wandb.log().
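For reference, a minimal standalone sketch like the one below (the toy model, random data, and project name are placeholders, not my actual setup) can be used to check whether wandb.watch() produces gradient and parameter histograms at all, independently of the rest of the pipeline:

import torch
import torch.nn as nn
import wandb

with wandb.init(project="watch-minimal-check"):   # placeholder project name
    model = nn.Linear(10, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # log_freq=1 so gradient/parameter histograms should show up quickly
    wandb.watch(model, criterion, log="all", log_freq=1)

    for step in range(20):
        x, y = torch.randn(8, 10), torch.randn(8, 1)
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        wandb.log({"loss": loss.item()})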
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 4
- Comments: 21 (5 by maintainers)
It works well with wandb==0.13.4 on Windows. @ShaohanTian @wangerlie does this issue only occur for you in the most recent versions? Would it be possible for you to check with 0.13.4?
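A quick way to confirm which version is actually installed, from Python:

import wandb
print(wandb.__version__)   # compare against 0.13.4 before/after pinning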
Hi @ewth, it appears that you're getting a slightly different error: wandb: ERROR Failed to serialize metric: division by zero. Any chance you're using define_metric? Could you check whether you're dividing by zero during your training?
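For reference, a sketch of the kind of pattern being asked about (the metric names and accuracy bookkeeping here are hypothetical, not from @ewth's code); a denominator that is still 0 at logging time is what would raise a division-by-zero error:

import wandb

with wandb.init(project="define-metric-demo"):   # placeholder project name
    # Plot accuracy against a custom x-axis instead of the default step
    wandb.define_metric("examples_seen")
    wandb.define_metric("accuracy", step_metric="examples_seen")

    correct, total = 0, 0
    for batch_idx in range(5):
        # ... evaluate a batch and update `correct` and `total` here ...
        if total > 0:   # guard: correct / total raises ZeroDivisionError when total == 0
            wandb.log({"accuracy": correct / total, "examples_seen": total})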