wandb: Runs returned by `wandb.Api` contain duplicates not present on the web app

Describe the bug

I want to analyse a large group of runs, but the list of runs returned by the API contains duplicates in place of some of the runs.

import wandb
import pandas as pd
import numpy as np

api = wandb.Api()
filter_dict = {
    "group": "fnornn-test-seeds-40hz-fixed",
}
runs = api.runs("iir-modal/physmodjax", filter_dict)
print(len(runs))  #18962, same as in web app
data = []
for run in runs:
    data.append(
        {
            "seed": run.config["seed"],
            "num_steps_train": run.config["datamodule"]["num_steps_train"],
            "train_loss": run.summary["train loss"],
            "run_name": run.name,
        },
    )
print(len(data))  #18962
df = pd.DataFrame(data)
df_indx = df.set_index(["seed", "num_steps_train"])
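# Positions of rows whose (seed, num_steps_train) index repeats an earlier row
indx_list = np.where(df_indx.index.duplicated())
print(len(indx_list[0]))  # 87, this is bad

# np.where returns a tuple of arrays; index with its first (and only) array of positions
indx_dup = df_indx.index[indx_list[0]]
for i in range(len(indx_dup)):
    print(df_indx.loc[indx_dup[i]])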
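Duplicated `(seed, num_steps_train)` pairs are only an indirect signal. A more direct check is to count run IDs in the returned list. The following is a minimal sketch, assuming each returned item exposes a unique `.id` attribute (wandb `Run` objects do). `find_duplicate_ids`, `dedupe_by_id` and `StandInRun` are hypothetical names: the stand-in class exists only so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


def find_duplicate_ids(runs: Iterable) -> Dict[str, List[int]]:
    """Map every run ID that occurs more than once to the list positions where it occurs."""
    positions: Dict[str, List[int]] = {}
    for pos, run in enumerate(runs):
        positions.setdefault(run.id, []).append(pos)
    return {run_id: where for run_id, where in positions.items() if len(where) > 1}


def dedupe_by_id(runs: Iterable) -> list:
    """Keep only the first occurrence of each run ID, preserving order."""
    seen = set()
    unique = []
    for run in runs:
        if run.id not in seen:
            seen.add(run.id)
            unique.append(run)
    return unique


@dataclass
class StandInRun:
    """Hypothetical stand-in for a wandb Run; only the `id` attribute is used."""
    id: str


sample = [StandInRun("4kp6z7va"), StandInRun("vhcasx2u"),
          StandInRun("vhcasx2u"), StandInRun("z6k0i9jq")]
print(find_duplicate_ids(sample))  # {'vhcasx2u': [1, 2]}
print(len(dedupe_by_id(sample)))   # 3
```

With the real API, materialize the results once (`run_list = list(runs)`) and pass `run_list` to both helpers. De-duplicating only hides the symptom: the runs that were skipped in place of the duplicates are still absent from the list.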


These runs are not duplicated in the web app.

Indexing the runs directly shows the same run at two neighbouring positions:

print(runs[100])
print(runs[99])
<Run iir-modal/physmodjax/4kp6z7va (finished)>
<Run iir-modal/physmodjax/4kp6z7va (finished)>

There is nothing special about either the duplicated runs or the 87 runs missing in their place, except that their positions are suspiciously round in general (just before or just after multiples of 50), and the two copies in each pair are always close to each other:

for ind in indx_list[0]:
    # print(df.loc[ind])
    ind_rep = df.loc[df["run_name"] == df.loc[ind, "run_name"]].index
    print(
        f"runs[{ind_rep[0]}]: {runs[ind_rep[0]]} and runs[{ind_rep[1]}]: {runs[ind_rep[1]]}"
    )
runs[99]: <Run iir-modal/physmodjax/4kp6z7va (finished)> and runs[100]: <Run iir-modal/physmodjax/4kp6z7va (finished)>
runs[449]: <Run iir-modal/physmodjax/vhcasx2u (finished)> and runs[450]: <Run iir-modal/physmodjax/vhcasx2u (finished)>
runs[497]: <Run iir-modal/physmodjax/z6k0i9jq (finished)> and runs[500]: <Run iir-modal/physmodjax/z6k0i9jq (finished)>
runs[499]: <Run iir-modal/physmodjax/ujk0lwhk (finished)> and runs[501]: <Run iir-modal/physmodjax/ujk0lwhk (finished)>
runs[598]: <Run iir-modal/physmodjax/lrug2iut (finished)> and runs[600]: <Run iir-modal/physmodjax/lrug2iut (finished)>
runs[699]: <Run iir-modal/physmodjax/6ztlfdqu (finished)> and runs[701]: <Run iir-modal/physmodjax/6ztlfdqu (finished)>
runs[747]: <Run iir-modal/physmodjax/m0f8fyq3 (finished)> and runs[750]: <Run iir-modal/physmodjax/m0f8fyq3 (finished)>
runs[749]: <Run iir-modal/physmodjax/rahnut8i (finished)> and runs[751]: <Run iir-modal/physmodjax/rahnut8i (finished)>
runs[798]: <Run iir-modal/physmodjax/asbcv093 (finished)> and runs[800]: <Run iir-modal/physmodjax/asbcv093 (finished)>
runs[849]: <Run iir-modal/physmodjax/9e478eit (finished)> and runs[850]: <Run iir-modal/physmodjax/9e478eit (finished)>
runs[949]: <Run iir-modal/physmodjax/v2a4vxbh (finished)> and runs[951]: <Run iir-modal/physmodjax/v2a4vxbh (finished)>
runs[997]: <Run iir-modal/physmodjax/4ltf1ku8 (finished)> and runs[1000]: <Run iir-modal/physmodjax/4ltf1ku8 (finished)>
runs[1148]: <Run iir-modal/physmodjax/7rfifmar (finished)> and runs[1150]: <Run iir-modal/physmodjax/7rfifmar (finished)>
runs[1449]: <Run iir-modal/physmodjax/43vznps1 (finished)> and runs[1450]: <Run iir-modal/physmodjax/43vznps1 (finished)>
runs[1499]: <Run iir-modal/physmodjax/i720ra6r (finished)> and runs[1500]: <Run iir-modal/physmodjax/i720ra6r (finished)>
runs[1599]: <Run iir-modal/physmodjax/2enq3lb8 (finished)> and runs[1600]: <Run iir-modal/physmodjax/2enq3lb8 (finished)>
runs[1749]: <Run iir-modal/physmodjax/1xdoe710 (finished)> and runs[1751]: <Run iir-modal/physmodjax/1xdoe710 (finished)>
runs[1999]: <Run iir-modal/physmodjax/fkfayt36 (finished)> and runs[2000]: <Run iir-modal/physmodjax/fkfayt36 (finished)>
runs[2099]: <Run iir-modal/physmodjax/be1hpp9n (finished)> and runs[2100]: <Run iir-modal/physmodjax/be1hpp9n (finished)>
runs[2148]: <Run iir-modal/physmodjax/hjidwkdm (finished)> and runs[2151]: <Run iir-modal/physmodjax/hjidwkdm (finished)>
runs[2199]: <Run iir-modal/physmodjax/aj4sm9r0 (finished)> and runs[2202]: <Run iir-modal/physmodjax/aj4sm9r0 (finished)>
runs[2249]: <Run iir-modal/physmodjax/5tvbwwbf (finished)> and runs[2251]: <Run iir-modal/physmodjax/5tvbwwbf (finished)>
runs[2298]: <Run iir-modal/physmodjax/en65g0ld (finished)> and runs[2301]: <Run iir-modal/physmodjax/en65g0ld (finished)>
runs[2399]: <Run iir-modal/physmodjax/nvtpml5b (finished)> and runs[2400]: <Run iir-modal/physmodjax/nvtpml5b (finished)>
runs[2398]: <Run iir-modal/physmodjax/sxsee24x (finished)> and runs[2401]: <Run iir-modal/physmodjax/sxsee24x (finished)>
runs[2498]: <Run iir-modal/physmodjax/sy9i81b8 (finished)> and runs[2500]: <Run iir-modal/physmodjax/sy9i81b8 (finished)>
runs[2699]: <Run iir-modal/physmodjax/q2dzsmnq (finished)> and runs[2701]: <Run iir-modal/physmodjax/q2dzsmnq (finished)>
runs[2749]: <Run iir-modal/physmodjax/ngizfdbh (finished)> and runs[2750]: <Run iir-modal/physmodjax/ngizfdbh (finished)>
runs[2799]: <Run iir-modal/physmodjax/gvkigc5y (finished)> and runs[2800]: <Run iir-modal/physmodjax/gvkigc5y (finished)>
runs[2998]: <Run iir-modal/physmodjax/sk37dspl (finished)> and runs[3000]: <Run iir-modal/physmodjax/sk37dspl (finished)>
runs[3349]: <Run iir-modal/physmodjax/q6o6im9c (finished)> and runs[3350]: <Run iir-modal/physmodjax/q6o6im9c (finished)>
runs[3449]: <Run iir-modal/physmodjax/0q9001ir (finished)> and runs[3450]: <Run iir-modal/physmodjax/0q9001ir (finished)>
runs[3549]: <Run iir-modal/physmodjax/iyh34hh0 (finished)> and runs[3550]: <Run iir-modal/physmodjax/iyh34hh0 (finished)>
runs[3699]: <Run iir-modal/physmodjax/9v289xww (finished)> and runs[3701]: <Run iir-modal/physmodjax/9v289xww (finished)>
runs[3897]: <Run iir-modal/physmodjax/fy3vxfmi (finished)> and runs[3900]: <Run iir-modal/physmodjax/fy3vxfmi (finished)>
runs[4099]: <Run iir-modal/physmodjax/ysn4j48x (finished)> and runs[4100]: <Run iir-modal/physmodjax/ysn4j48x (finished)>
runs[4399]: <Run iir-modal/physmodjax/8yp0g96x (finished)> and runs[4401]: <Run iir-modal/physmodjax/8yp0g96x (finished)>
runs[4499]: <Run iir-modal/physmodjax/cir7zz23 (finished)> and runs[4500]: <Run iir-modal/physmodjax/cir7zz23 (finished)>
runs[4749]: <Run iir-modal/physmodjax/yrj5xuc6 (finished)> and runs[4750]: <Run iir-modal/physmodjax/yrj5xuc6 (finished)>
runs[4849]: <Run iir-modal/physmodjax/d1vnuoyu (finished)> and runs[4850]: <Run iir-modal/physmodjax/d1vnuoyu (finished)>
runs[4949]: <Run iir-modal/physmodjax/nyv06wht (finished)> and runs[4950]: <Run iir-modal/physmodjax/nyv06wht (finished)>
runs[4998]: <Run iir-modal/physmodjax/md7dxhfp (finished)> and runs[5001]: <Run iir-modal/physmodjax/md7dxhfp (finished)>
runs[5048]: <Run iir-modal/physmodjax/sx1829pq (finished)> and runs[5050]: <Run iir-modal/physmodjax/sx1829pq (finished)>
runs[5299]: <Run iir-modal/physmodjax/zse5hsqg (finished)> and runs[5300]: <Run iir-modal/physmodjax/zse5hsqg (finished)>
runs[5349]: <Run iir-modal/physmodjax/jgsfd6kn (finished)> and runs[5351]: <Run iir-modal/physmodjax/jgsfd6kn (finished)>
runs[5398]: <Run iir-modal/physmodjax/qvy2wosy (finished)> and runs[5400]: <Run iir-modal/physmodjax/qvy2wosy (finished)>
runs[5648]: <Run iir-modal/physmodjax/h0u79suh (finished)> and runs[5651]: <Run iir-modal/physmodjax/h0u79suh (finished)>
runs[5999]: <Run iir-modal/physmodjax/skksbe5k (finished)> and runs[6000]: <Run iir-modal/physmodjax/skksbe5k (finished)>
runs[5998]: <Run iir-modal/physmodjax/boxd5i0q (finished)> and runs[6001]: <Run iir-modal/physmodjax/boxd5i0q (finished)>
runs[6099]: <Run iir-modal/physmodjax/mwfq41nm (finished)> and runs[6100]: <Run iir-modal/physmodjax/mwfq41nm (finished)>
runs[6098]: <Run iir-modal/physmodjax/uiinqnl5 (finished)> and runs[6104]: <Run iir-modal/physmodjax/uiinqnl5 (finished)>
runs[6249]: <Run iir-modal/physmodjax/x2xi2pln (finished)> and runs[6251]: <Run iir-modal/physmodjax/x2xi2pln (finished)>
runs[6399]: <Run iir-modal/physmodjax/7a977w1z (finished)> and runs[6400]: <Run iir-modal/physmodjax/7a977w1z (finished)>
runs[6397]: <Run iir-modal/physmodjax/gbkm4c81 (finished)> and runs[6401]: <Run iir-modal/physmodjax/gbkm4c81 (finished)>
runs[6398]: <Run iir-modal/physmodjax/y2o8y9bz (finished)> and runs[6402]: <Run iir-modal/physmodjax/y2o8y9bz (finished)>
runs[6448]: <Run iir-modal/physmodjax/sahtepl9 (finished)> and runs[6452]: <Run iir-modal/physmodjax/sahtepl9 (finished)>
runs[6649]: <Run iir-modal/physmodjax/6z6vk03m (finished)> and runs[6650]: <Run iir-modal/physmodjax/6z6vk03m (finished)>
runs[6749]: <Run iir-modal/physmodjax/yr89n9w8 (finished)> and runs[6751]: <Run iir-modal/physmodjax/yr89n9w8 (finished)>
runs[6798]: <Run iir-modal/physmodjax/v1rpos7j (finished)> and runs[6800]: <Run iir-modal/physmodjax/v1rpos7j (finished)>
runs[6849]: <Run iir-modal/physmodjax/aqj6ys44 (finished)> and runs[6851]: <Run iir-modal/physmodjax/aqj6ys44 (finished)>
runs[6896]: <Run iir-modal/physmodjax/gkq37u0d (finished)> and runs[6900]: <Run iir-modal/physmodjax/gkq37u0d (finished)>
runs[6949]: <Run iir-modal/physmodjax/48c43ae6 (finished)> and runs[6951]: <Run iir-modal/physmodjax/48c43ae6 (finished)>
runs[6999]: <Run iir-modal/physmodjax/g6xipn8w (finished)> and runs[7000]: <Run iir-modal/physmodjax/g6xipn8w (finished)>
runs[7048]: <Run iir-modal/physmodjax/idtyf4u8 (finished)> and runs[7050]: <Run iir-modal/physmodjax/idtyf4u8 (finished)>
runs[7149]: <Run iir-modal/physmodjax/rfrg4lve (finished)> and runs[7150]: <Run iir-modal/physmodjax/rfrg4lve (finished)>
runs[7297]: <Run iir-modal/physmodjax/hga92bca (finished)> and runs[7300]: <Run iir-modal/physmodjax/hga92bca (finished)>
runs[7496]: <Run iir-modal/physmodjax/4mt7w6co (finished)> and runs[7500]: <Run iir-modal/physmodjax/4mt7w6co (finished)>
runs[7497]: <Run iir-modal/physmodjax/gm7quxia (finished)> and runs[7503]: <Run iir-modal/physmodjax/gm7quxia (finished)>
runs[7499]: <Run iir-modal/physmodjax/tvo6qipq (finished)> and runs[7506]: <Run iir-modal/physmodjax/tvo6qipq (finished)>
runs[7549]: <Run iir-modal/physmodjax/yw17kf45 (finished)> and runs[7550]: <Run iir-modal/physmodjax/yw17kf45 (finished)>
runs[7599]: <Run iir-modal/physmodjax/4zggqle2 (finished)> and runs[7603]: <Run iir-modal/physmodjax/4zggqle2 (finished)>
runs[8099]: <Run iir-modal/physmodjax/btfalbbj (finished)> and runs[8100]: <Run iir-modal/physmodjax/btfalbbj (finished)>
runs[8149]: <Run iir-modal/physmodjax/cdstls5y (finished)> and runs[8150]: <Run iir-modal/physmodjax/cdstls5y (finished)>
runs[8249]: <Run iir-modal/physmodjax/mgwt8lrf (finished)> and runs[8250]: <Run iir-modal/physmodjax/mgwt8lrf (finished)>
runs[8298]: <Run iir-modal/physmodjax/wuxyy5xe (finished)> and runs[8300]: <Run iir-modal/physmodjax/wuxyy5xe (finished)>
runs[8349]: <Run iir-modal/physmodjax/xanfgzqu (finished)> and runs[8351]: <Run iir-modal/physmodjax/xanfgzqu (finished)>
runs[8348]: <Run iir-modal/physmodjax/2gvnxbtb (finished)> and runs[8352]: <Run iir-modal/physmodjax/2gvnxbtb (finished)>
runs[8346]: <Run iir-modal/physmodjax/pm61152y (finished)> and runs[8353]: <Run iir-modal/physmodjax/pm61152y (finished)>
runs[8398]: <Run iir-modal/physmodjax/ax4djvc6 (finished)> and runs[8402]: <Run iir-modal/physmodjax/ax4djvc6 (finished)>
runs[8443]: <Run iir-modal/physmodjax/9b1oj8vf (finished)> and runs[8450]: <Run iir-modal/physmodjax/9b1oj8vf (finished)>
runs[8598]: <Run iir-modal/physmodjax/abnwba4x (finished)> and runs[8600]: <Run iir-modal/physmodjax/abnwba4x (finished)>
runs[8897]: <Run iir-modal/physmodjax/5o3frsvu (finished)> and runs[8900]: <Run iir-modal/physmodjax/5o3frsvu (finished)>
runs[9099]: <Run iir-modal/physmodjax/s2k8e9a7 (finished)> and runs[9100]: <Run iir-modal/physmodjax/s2k8e9a7 (finished)>
runs[9149]: <Run iir-modal/physmodjax/vuyhs0ce (finished)> and runs[9150]: <Run iir-modal/physmodjax/vuyhs0ce (finished)>
runs[9199]: <Run iir-modal/physmodjax/p5ogebi3 (finished)> and runs[9200]: <Run iir-modal/physmodjax/p5ogebi3 (finished)>
runs[9248]: <Run iir-modal/physmodjax/fns5zgqp (finished)> and runs[9250]: <Run iir-modal/physmodjax/fns5zgqp (finished)>
runs[9299]: <Run iir-modal/physmodjax/zggfogud (finished)> and runs[9301]: <Run iir-modal/physmodjax/zggfogud (finished)>
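The "suspiciously round" positions above can be checked mechanically. The following is a minimal sketch, assuming results are fetched in pages of 50 runs (believed to be the default `per_page` of `Api.runs`, but treat that value as an assumption). For a handful of pairs copied from the output above, it tests whether the two copies of a run land on different, adjacent pages. `page_of` and `straddles_page_boundary` are hypothetical helpers.

```python
PAGE_SIZE = 50  # assumed page size, not confirmed from the wandb source

# (first_position, second_position) of some duplicated runs, taken from the output above
pairs = [(99, 100), (449, 450), (497, 500), (499, 501), (2398, 2401),
         (6098, 6104), (7499, 7506), (8346, 8353), (9299, 9301)]


def page_of(position: int, page_size: int = PAGE_SIZE) -> int:
    """Zero-based page number for a zero-based list position."""
    return position // page_size


def straddles_page_boundary(first: int, second: int, page_size: int = PAGE_SIZE) -> bool:
    """True when the two positions fall on different pages."""
    return page_of(first, page_size) != page_of(second, page_size)


straddling = [p for p in pairs if straddles_page_boundary(*p)]
print(f"{len(straddling)}/{len(pairs)} pairs straddle a page boundary")  # 9/9 pairs straddle a page boundary
```

If the same holds for all 87 pairs, the duplication happens where one page ends and the next begins. That would point at the pagination layer rather than at the stored runs, which is consistent with the web app showing no duplicates.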

This behaviour persists after calling api.flush() and after updating to 0.16.0.

There is nothing recorded in /tmp/debug-cli.carlos.log. I don’t know whether the wandb.Api object keeps a log anywhere, and I haven’t been able to find anything about it in the docs.

Additional Files

No response

Environment

WandB version: 0.15.12 and 0.16.0

OS: Ubuntu 20.04.5 LTS (release 20.04, codename focal)

Python version: 3.9.18

Versions of relevant libraries: pandas==2.1.1

Additional Context

No response

About this issue

  • State: closed
  • Created 7 months ago
  • Comments: 19 (8 by maintainers)

Most upvoted comments

Hi @cdelavegamartin, thanks so much for the additional information here; this is very helpful. The error seems to lie in how the API returns paginated results rather than in our backend, as the runs show up as unique on the App side. I am logging thousands of dummy runs in a project to try to reproduce this behavior, and I will let you know if any other information is needed on your side, but this should suffice for now. Thanks once again, and I will keep you updated here!
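The pagination explanation above can be illustrated with a toy model. This is one plausible failure mode, not the confirmed cause. The model assumes offset-based paging over a sort key that several runs share (here a coarse `created_at`), where the backend does not return tied runs in the same order on every request. The real wandb API may paginate differently (for example with cursors). Every name below is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ToyRun:
    id: int
    created_at: int  # coarse timestamp; several runs share each value


def make_runs(n: int, tie_size: int) -> List[ToyRun]:
    # Every `tie_size` consecutive runs share one created_at value.
    return [ToyRun(id=i, created_at=i // tie_size) for i in range(n)]


def fetch_page(runs: List[ToyRun], offset: int, limit: int,
               request_no: int, stable: bool) -> List[ToyRun]:
    # Order by created_at. With stable=True ties are always broken by ascending id;
    # otherwise the tie order flips on every other request, standing in for a
    # backend that gives no consistent order for equal keys.
    sign = 1 if stable or request_no % 2 == 0 else -1
    ordered = sorted(runs, key=lambda r: (r.created_at, sign * r.id))
    return ordered[offset:offset + limit]


def paginate(runs: List[ToyRun], page_size: int, stable: bool) -> List[ToyRun]:
    # Offset-based paging: each request re-sorts and slices out the next page.
    fetched: List[ToyRun] = []
    request_no = 0
    while len(fetched) < len(runs):
        fetched.extend(fetch_page(runs, len(fetched), page_size, request_no, stable))
        request_no += 1
    return fetched


def duplicates_and_missing(fetched: List[ToyRun], n: int) -> Tuple[List[int], List[int]]:
    counts = Counter(r.id for r in fetched)
    duplicated = sorted(i for i, c in counts.items() if c > 1)
    missing = sorted(set(range(n)) - set(counts))
    return duplicated, missing


toy_runs = make_runs(12, tie_size=3)
for stable in (True, False):
    fetched = paginate(toy_runs, page_size=5, stable=stable)
    print(stable, len(fetched), duplicates_and_missing(fetched, len(toy_runs)))
# True 12 ([], [])
# False 12 ([3, 11], [5, 9])
```

As in the report, the total count still matches (12 here, 18962 in the issue), while some runs appear twice and others are skipped, right around page boundaries. In this model, making the sort order unique per run (ties broken by ID) removes the effect entirely.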