wandb: Runs returned by `wandb.Api` contain duplicates not present on the web app
Describe the bug
I want to do some analysis of a large group of runs, but the list returned by the API contains duplicates in place of some of the runs.
import wandb
import pandas as pd
import numpy as np

api = wandb.Api()
filter_dict = {
    "group": "fnornn-test-seeds-40hz-fixed",
}
runs = api.runs("iir-modal/physmodjax", filter_dict)
print(len(runs))  # 18962, same as in web app

# Collect the config/summary fields of interest for every returned run
data = []
for run in runs:
    data.append(
        {
            "seed": run.config["seed"],
            "num_steps_train": run.config["datamodule"]["num_steps_train"],
            "train_loss": run.summary["train loss"],
            "run_name": run.name,
        },
    )
print(len(data))  # 18962

# Each (seed, num_steps_train) pair should be unique; look for repeats
df = pd.DataFrame(data)
df_indx = df.set_index(["seed", "num_steps_train"])
indx_list = np.where(df_indx.index.duplicated())
print(len(indx_list[0]))  # 87, this is bad
indx_dup = df_indx.index[indx_list]
for i in range(len(indx_dup)):
    print(df_indx.loc[indx_dup[i]])
These runs are not duplicated in the web app. For example, when looking at the runs by index:
print(runs[100])
print(runs[99])
<Run iir-modal/physmodjax/4kp6z7va (finished)>
<Run iir-modal/physmodjax/4kp6z7va (finished)>
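A more direct check is to count how often each run ID occurs in the returned list, rather than relying on config values. The following is only a minimal sketch: `duplicated_run_ids` is a hypothetical helper, and the stand-in objects (including the made-up ID `unique01`) merely mimic the `id` attribute of the runs so the snippet works without network access.

```python
from collections import Counter
from types import SimpleNamespace


def duplicated_run_ids(runs):
    """Return {run_id: count} for every run ID that occurs more than once."""
    counts = Counter(run.id for run in runs)
    return {run_id: n for run_id, n in counts.items() if n > 1}


# Stand-in objects (assumption: only the `id` attribute matters here).
fake_runs = [
    SimpleNamespace(id="4kp6z7va"),
    SimpleNamespace(id="4kp6z7va"),  # same ID returned twice, as in runs[99] and runs[100]
    SimpleNamespace(id="unique01"),  # hypothetical ID, not taken from the project
]

dups = duplicated_run_ids(fake_runs)
print(dups)  # {'4kp6z7va': 2}
```

With the real data, the same helper could be called as `duplicated_run_ids(runs)` on the object returned by `api.runs(...)`.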
There is nothing special about either the duplicated runs or the 87 that are missing in their place, except that the indices are suspiciously round in general, and the two copies of each duplicate are always close to each other:
for ind in indx_list[0]:
    # print(df.loc[ind])
    # Find every position at which this run name appears in the returned list
    ind_rep = df.loc[df["run_name"] == df.loc[ind, "run_name"]].index
    print(
        f"runs[{ind_rep[0]}]: {runs[ind_rep[0]]} and runs[{ind_rep[1]}]: {runs[ind_rep[1]]}"
    )
runs[99]: <Run iir-modal/physmodjax/4kp6z7va (finished)> and runs[100]: <Run iir-modal/physmodjax/4kp6z7va (finished)>
runs[449]: <Run iir-modal/physmodjax/vhcasx2u (finished)> and runs[450]: <Run iir-modal/physmodjax/vhcasx2u (finished)>
runs[497]: <Run iir-modal/physmodjax/z6k0i9jq (finished)> and runs[500]: <Run iir-modal/physmodjax/z6k0i9jq (finished)>
runs[499]: <Run iir-modal/physmodjax/ujk0lwhk (finished)> and runs[501]: <Run iir-modal/physmodjax/ujk0lwhk (finished)>
runs[598]: <Run iir-modal/physmodjax/lrug2iut (finished)> and runs[600]: <Run iir-modal/physmodjax/lrug2iut (finished)>
runs[699]: <Run iir-modal/physmodjax/6ztlfdqu (finished)> and runs[701]: <Run iir-modal/physmodjax/6ztlfdqu (finished)>
runs[747]: <Run iir-modal/physmodjax/m0f8fyq3 (finished)> and runs[750]: <Run iir-modal/physmodjax/m0f8fyq3 (finished)>
runs[749]: <Run iir-modal/physmodjax/rahnut8i (finished)> and runs[751]: <Run iir-modal/physmodjax/rahnut8i (finished)>
runs[798]: <Run iir-modal/physmodjax/asbcv093 (finished)> and runs[800]: <Run iir-modal/physmodjax/asbcv093 (finished)>
runs[849]: <Run iir-modal/physmodjax/9e478eit (finished)> and runs[850]: <Run iir-modal/physmodjax/9e478eit (finished)>
runs[949]: <Run iir-modal/physmodjax/v2a4vxbh (finished)> and runs[951]: <Run iir-modal/physmodjax/v2a4vxbh (finished)>
runs[997]: <Run iir-modal/physmodjax/4ltf1ku8 (finished)> and runs[1000]: <Run iir-modal/physmodjax/4ltf1ku8 (finished)>
runs[1148]: <Run iir-modal/physmodjax/7rfifmar (finished)> and runs[1150]: <Run iir-modal/physmodjax/7rfifmar (finished)>
runs[1449]: <Run iir-modal/physmodjax/43vznps1 (finished)> and runs[1450]: <Run iir-modal/physmodjax/43vznps1 (finished)>
runs[1499]: <Run iir-modal/physmodjax/i720ra6r (finished)> and runs[1500]: <Run iir-modal/physmodjax/i720ra6r (finished)>
runs[1599]: <Run iir-modal/physmodjax/2enq3lb8 (finished)> and runs[1600]: <Run iir-modal/physmodjax/2enq3lb8 (finished)>
runs[1749]: <Run iir-modal/physmodjax/1xdoe710 (finished)> and runs[1751]: <Run iir-modal/physmodjax/1xdoe710 (finished)>
runs[1999]: <Run iir-modal/physmodjax/fkfayt36 (finished)> and runs[2000]: <Run iir-modal/physmodjax/fkfayt36 (finished)>
runs[2099]: <Run iir-modal/physmodjax/be1hpp9n (finished)> and runs[2100]: <Run iir-modal/physmodjax/be1hpp9n (finished)>
runs[2148]: <Run iir-modal/physmodjax/hjidwkdm (finished)> and runs[2151]: <Run iir-modal/physmodjax/hjidwkdm (finished)>
runs[2199]: <Run iir-modal/physmodjax/aj4sm9r0 (finished)> and runs[2202]: <Run iir-modal/physmodjax/aj4sm9r0 (finished)>
runs[2249]: <Run iir-modal/physmodjax/5tvbwwbf (finished)> and runs[2251]: <Run iir-modal/physmodjax/5tvbwwbf (finished)>
runs[2298]: <Run iir-modal/physmodjax/en65g0ld (finished)> and runs[2301]: <Run iir-modal/physmodjax/en65g0ld (finished)>
runs[2399]: <Run iir-modal/physmodjax/nvtpml5b (finished)> and runs[2400]: <Run iir-modal/physmodjax/nvtpml5b (finished)>
runs[2398]: <Run iir-modal/physmodjax/sxsee24x (finished)> and runs[2401]: <Run iir-modal/physmodjax/sxsee24x (finished)>
runs[2498]: <Run iir-modal/physmodjax/sy9i81b8 (finished)> and runs[2500]: <Run iir-modal/physmodjax/sy9i81b8 (finished)>
runs[2699]: <Run iir-modal/physmodjax/q2dzsmnq (finished)> and runs[2701]: <Run iir-modal/physmodjax/q2dzsmnq (finished)>
runs[2749]: <Run iir-modal/physmodjax/ngizfdbh (finished)> and runs[2750]: <Run iir-modal/physmodjax/ngizfdbh (finished)>
runs[2799]: <Run iir-modal/physmodjax/gvkigc5y (finished)> and runs[2800]: <Run iir-modal/physmodjax/gvkigc5y (finished)>
runs[2998]: <Run iir-modal/physmodjax/sk37dspl (finished)> and runs[3000]: <Run iir-modal/physmodjax/sk37dspl (finished)>
runs[3349]: <Run iir-modal/physmodjax/q6o6im9c (finished)> and runs[3350]: <Run iir-modal/physmodjax/q6o6im9c (finished)>
runs[3449]: <Run iir-modal/physmodjax/0q9001ir (finished)> and runs[3450]: <Run iir-modal/physmodjax/0q9001ir (finished)>
runs[3549]: <Run iir-modal/physmodjax/iyh34hh0 (finished)> and runs[3550]: <Run iir-modal/physmodjax/iyh34hh0 (finished)>
runs[3699]: <Run iir-modal/physmodjax/9v289xww (finished)> and runs[3701]: <Run iir-modal/physmodjax/9v289xww (finished)>
runs[3897]: <Run iir-modal/physmodjax/fy3vxfmi (finished)> and runs[3900]: <Run iir-modal/physmodjax/fy3vxfmi (finished)>
runs[4099]: <Run iir-modal/physmodjax/ysn4j48x (finished)> and runs[4100]: <Run iir-modal/physmodjax/ysn4j48x (finished)>
runs[4399]: <Run iir-modal/physmodjax/8yp0g96x (finished)> and runs[4401]: <Run iir-modal/physmodjax/8yp0g96x (finished)>
runs[4499]: <Run iir-modal/physmodjax/cir7zz23 (finished)> and runs[4500]: <Run iir-modal/physmodjax/cir7zz23 (finished)>
runs[4749]: <Run iir-modal/physmodjax/yrj5xuc6 (finished)> and runs[4750]: <Run iir-modal/physmodjax/yrj5xuc6 (finished)>
runs[4849]: <Run iir-modal/physmodjax/d1vnuoyu (finished)> and runs[4850]: <Run iir-modal/physmodjax/d1vnuoyu (finished)>
runs[4949]: <Run iir-modal/physmodjax/nyv06wht (finished)> and runs[4950]: <Run iir-modal/physmodjax/nyv06wht (finished)>
runs[4998]: <Run iir-modal/physmodjax/md7dxhfp (finished)> and runs[5001]: <Run iir-modal/physmodjax/md7dxhfp (finished)>
runs[5048]: <Run iir-modal/physmodjax/sx1829pq (finished)> and runs[5050]: <Run iir-modal/physmodjax/sx1829pq (finished)>
runs[5299]: <Run iir-modal/physmodjax/zse5hsqg (finished)> and runs[5300]: <Run iir-modal/physmodjax/zse5hsqg (finished)>
runs[5349]: <Run iir-modal/physmodjax/jgsfd6kn (finished)> and runs[5351]: <Run iir-modal/physmodjax/jgsfd6kn (finished)>
runs[5398]: <Run iir-modal/physmodjax/qvy2wosy (finished)> and runs[5400]: <Run iir-modal/physmodjax/qvy2wosy (finished)>
runs[5648]: <Run iir-modal/physmodjax/h0u79suh (finished)> and runs[5651]: <Run iir-modal/physmodjax/h0u79suh (finished)>
runs[5999]: <Run iir-modal/physmodjax/skksbe5k (finished)> and runs[6000]: <Run iir-modal/physmodjax/skksbe5k (finished)>
runs[5998]: <Run iir-modal/physmodjax/boxd5i0q (finished)> and runs[6001]: <Run iir-modal/physmodjax/boxd5i0q (finished)>
runs[6099]: <Run iir-modal/physmodjax/mwfq41nm (finished)> and runs[6100]: <Run iir-modal/physmodjax/mwfq41nm (finished)>
runs[6098]: <Run iir-modal/physmodjax/uiinqnl5 (finished)> and runs[6104]: <Run iir-modal/physmodjax/uiinqnl5 (finished)>
runs[6249]: <Run iir-modal/physmodjax/x2xi2pln (finished)> and runs[6251]: <Run iir-modal/physmodjax/x2xi2pln (finished)>
runs[6399]: <Run iir-modal/physmodjax/7a977w1z (finished)> and runs[6400]: <Run iir-modal/physmodjax/7a977w1z (finished)>
runs[6397]: <Run iir-modal/physmodjax/gbkm4c81 (finished)> and runs[6401]: <Run iir-modal/physmodjax/gbkm4c81 (finished)>
runs[6398]: <Run iir-modal/physmodjax/y2o8y9bz (finished)> and runs[6402]: <Run iir-modal/physmodjax/y2o8y9bz (finished)>
runs[6448]: <Run iir-modal/physmodjax/sahtepl9 (finished)> and runs[6452]: <Run iir-modal/physmodjax/sahtepl9 (finished)>
runs[6649]: <Run iir-modal/physmodjax/6z6vk03m (finished)> and runs[6650]: <Run iir-modal/physmodjax/6z6vk03m (finished)>
runs[6749]: <Run iir-modal/physmodjax/yr89n9w8 (finished)> and runs[6751]: <Run iir-modal/physmodjax/yr89n9w8 (finished)>
runs[6798]: <Run iir-modal/physmodjax/v1rpos7j (finished)> and runs[6800]: <Run iir-modal/physmodjax/v1rpos7j (finished)>
runs[6849]: <Run iir-modal/physmodjax/aqj6ys44 (finished)> and runs[6851]: <Run iir-modal/physmodjax/aqj6ys44 (finished)>
runs[6896]: <Run iir-modal/physmodjax/gkq37u0d (finished)> and runs[6900]: <Run iir-modal/physmodjax/gkq37u0d (finished)>
runs[6949]: <Run iir-modal/physmodjax/48c43ae6 (finished)> and runs[6951]: <Run iir-modal/physmodjax/48c43ae6 (finished)>
runs[6999]: <Run iir-modal/physmodjax/g6xipn8w (finished)> and runs[7000]: <Run iir-modal/physmodjax/g6xipn8w (finished)>
runs[7048]: <Run iir-modal/physmodjax/idtyf4u8 (finished)> and runs[7050]: <Run iir-modal/physmodjax/idtyf4u8 (finished)>
runs[7149]: <Run iir-modal/physmodjax/rfrg4lve (finished)> and runs[7150]: <Run iir-modal/physmodjax/rfrg4lve (finished)>
runs[7297]: <Run iir-modal/physmodjax/hga92bca (finished)> and runs[7300]: <Run iir-modal/physmodjax/hga92bca (finished)>
runs[7496]: <Run iir-modal/physmodjax/4mt7w6co (finished)> and runs[7500]: <Run iir-modal/physmodjax/4mt7w6co (finished)>
runs[7497]: <Run iir-modal/physmodjax/gm7quxia (finished)> and runs[7503]: <Run iir-modal/physmodjax/gm7quxia (finished)>
runs[7499]: <Run iir-modal/physmodjax/tvo6qipq (finished)> and runs[7506]: <Run iir-modal/physmodjax/tvo6qipq (finished)>
runs[7549]: <Run iir-modal/physmodjax/yw17kf45 (finished)> and runs[7550]: <Run iir-modal/physmodjax/yw17kf45 (finished)>
runs[7599]: <Run iir-modal/physmodjax/4zggqle2 (finished)> and runs[7603]: <Run iir-modal/physmodjax/4zggqle2 (finished)>
runs[8099]: <Run iir-modal/physmodjax/btfalbbj (finished)> and runs[8100]: <Run iir-modal/physmodjax/btfalbbj (finished)>
runs[8149]: <Run iir-modal/physmodjax/cdstls5y (finished)> and runs[8150]: <Run iir-modal/physmodjax/cdstls5y (finished)>
runs[8249]: <Run iir-modal/physmodjax/mgwt8lrf (finished)> and runs[8250]: <Run iir-modal/physmodjax/mgwt8lrf (finished)>
runs[8298]: <Run iir-modal/physmodjax/wuxyy5xe (finished)> and runs[8300]: <Run iir-modal/physmodjax/wuxyy5xe (finished)>
runs[8349]: <Run iir-modal/physmodjax/xanfgzqu (finished)> and runs[8351]: <Run iir-modal/physmodjax/xanfgzqu (finished)>
runs[8348]: <Run iir-modal/physmodjax/2gvnxbtb (finished)> and runs[8352]: <Run iir-modal/physmodjax/2gvnxbtb (finished)>
runs[8346]: <Run iir-modal/physmodjax/pm61152y (finished)> and runs[8353]: <Run iir-modal/physmodjax/pm61152y (finished)>
runs[8398]: <Run iir-modal/physmodjax/ax4djvc6 (finished)> and runs[8402]: <Run iir-modal/physmodjax/ax4djvc6 (finished)>
runs[8443]: <Run iir-modal/physmodjax/9b1oj8vf (finished)> and runs[8450]: <Run iir-modal/physmodjax/9b1oj8vf (finished)>
runs[8598]: <Run iir-modal/physmodjax/abnwba4x (finished)> and runs[8600]: <Run iir-modal/physmodjax/abnwba4x (finished)>
runs[8897]: <Run iir-modal/physmodjax/5o3frsvu (finished)> and runs[8900]: <Run iir-modal/physmodjax/5o3frsvu (finished)>
runs[9099]: <Run iir-modal/physmodjax/s2k8e9a7 (finished)> and runs[9100]: <Run iir-modal/physmodjax/s2k8e9a7 (finished)>
runs[9149]: <Run iir-modal/physmodjax/vuyhs0ce (finished)> and runs[9150]: <Run iir-modal/physmodjax/vuyhs0ce (finished)>
runs[9199]: <Run iir-modal/physmodjax/p5ogebi3 (finished)> and runs[9200]: <Run iir-modal/physmodjax/p5ogebi3 (finished)>
runs[9248]: <Run iir-modal/physmodjax/fns5zgqp (finished)> and runs[9250]: <Run iir-modal/physmodjax/fns5zgqp (finished)>
runs[9299]: <Run iir-modal/physmodjax/zggfogud (finished)> and runs[9301]: <Run iir-modal/physmodjax/zggfogud (finished)>
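The "round" indices can be made more concrete by checking where each pair falls relative to a page of results. The sketch below assumes a page size of 50 (the default `per_page` of `api.runs`, as far as I know; treat it as an assumption) and uses a small sample of index pairs copied from the output above. The helper names are hypothetical.

```python
PAGE_SIZE = 50  # assumption: default per_page used by api.runs

# (first_index, second_index) pairs copied from the listing above (sample only)
pairs = [
    (99, 100),
    (449, 450),
    (497, 500),
    (499, 501),
    (2199, 2202),
    (7499, 7506),
    (8443, 8450),
]


def straddles_page_boundary(first, second, page_size=PAGE_SIZE):
    """True if the two indices fall on different pages of size `page_size`."""
    return first // page_size != second // page_size


def offset_into_page(index, page_size=PAGE_SIZE):
    """Position of `index` within its page (0 means first item of the page)."""
    return index % page_size


all_straddle = all(straddles_page_boundary(a, b) for a, b in pairs)
second_offsets = [offset_into_page(b) for _, b in pairs]

print(all_straddle)    # True
print(second_offsets)  # [0, 0, 0, 1, 2, 6, 0]
```

In this sample, the first copy always sits at the end of one page and the second copy at the start of the next, which would be consistent with the duplication happening at page boundaries.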
This behaviour persists even after calling api.flush() and after updating to 0.16.0.
There is nothing recorded in /tmp/debug-cli.carlos.log. I don’t know if the wandb.Api object keeps a log anywhere; I haven’t been able to find anything in the docs.
Additional Files
No response
Environment
WandB version: 0.15.12 and 0.16.0
OS: Ubuntu 20.04.5 LTS (Release: 20.04, Codename: focal)
Python version: 3.9.18
Versions of relevant libraries: pandas==2.1.1
Additional Context
No response
About this issue
- State: closed
- Created 7 months ago
- Comments: 19 (8 by maintainers)
Hi @cdelavegamartin, thanks so much for the additional information here, this is very helpful. The error seems to be with the API returning paginated results rather than with our backend, as the runs show up as unique on the App side. I am logging thousands of dummy runs to a project to try to reproduce this behavior. I will let you know if any other information is needed from your side, but this should suffice for now. Thanks once again, and I will keep you updated here!
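To illustrate the kind of pagination failure that would match these symptoms, here is a toy model. It is only an illustration under stated assumptions, not wandb's actual client or backend logic: it assumes pages are fetched by offset and that the server-side ordering of two runs differs between two page requests. All names and IDs are hypothetical.

```python
from typing import List


def fetch_page(ordered_ids: List[str], offset: int, limit: int) -> List[str]:
    """Return one page of IDs from the ordering the server used for this request."""
    return ordered_ids[offset:offset + limit]


# Hypothetical orderings: "b" and "c" swap places between the two requests,
# e.g. because they tie on the sort key and the tie is broken inconsistently.
order_on_first_request = ["a", "b", "c", "d"]
order_on_second_request = ["a", "c", "b", "d"]

page_size = 2
collected = (
    fetch_page(order_on_first_request, 0, page_size)
    + fetch_page(order_on_second_request, page_size, page_size)
)

print(collected)       # ['a', 'b', 'b', 'd']
print(len(collected))  # 4
```

The total count still matches the number of items, but "b" appears on both sides of the page boundary and "c" is never returned, which is the same pattern as in the report: the length agrees with the web app while duplicates take the place of missing runs.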