wandb: [CLI]: wandb.finish() stuck when uploaded all data
Describe the bug
When running a training loop multiple times and calling wandb.finish() after each run, the program hangs for a very long time even though the output shows that all data has been uploaded.
def run_multiple_times():
    while True:
        wandb.init(reinit=True, ...)
        # training code...
        wandb.finish()
wandb: Waiting for W&B process to finish... (success).
wandb: | 20.180 MB of 20.180 MB uploaded (0.000 MB deduped)
Additional Files
No response
Environment
WandB version: 0.13.9
OS: 5.4.0-135-generic #152-Ubuntu
Python version: 3.10.9
Versions of relevant libraries:
Additional Context
No response
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 16
- Comments: 43
Encountered the same issue and tried the steps below:
1. Run top -u myusername on the command line to find the PID of wandb-service, 1754207 in my case (you may have multiple wandb-service processes, so assume this PID is the one causing the issue).
2. Run kill -9 1754207 to stop this wandb-service process.
Problem luckily solved.
Reference here: https://stackoverflow.com/questions/54752710/nfs-file-cant-be-removed-resource-busy-but-pid-unknown
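The two manual steps above can be sketched in Python. This is only an illustration, not an official W&B tool: the helper names parse_wandb_service_pids and kill_wandb_services are hypothetical, and it assumes a Linux machine where the stuck process shows up with "wandb-service" in its command name, as in the top output described above. Note that it signals every matching process owned by the user, including ones that belong to healthy runs, and that killing the service may discard data that has not been synced yet.

```python
import getpass
import os
import signal
import subprocess


def parse_wandb_service_pids(ps_output, needle="wandb-service"):
    """Extract PIDs from `ps -o pid=,comm=` output whose command contains needle."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def kill_wandb_services(user=None, sig=signal.SIGKILL):
    """Send sig (default SIGKILL, like `kill -9`) to the user's wandb-service processes.

    Assumption: matching on the command name is enough to identify the
    stuck process; this is not checked against any W&B API.
    """
    user = user or getpass.getuser()
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=,comm="],
        capture_output=True,
        text=True,
        check=False,
    )
    pids = parse_wandb_service_pids(result.stdout)
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            # The process exited between listing and killing.
            pass
    return pids
```

Calling kill_wandb_services() returns the list of PIDs it signalled, so the result can be logged before starting the next run.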
I believe the issue is slow upload speed. If you give it a couple of hours, the process should finish on its own.
On Sun, Jun 18, 2023 at 7:28 PM rkn @.***> wrote:
Same problem.
I am also experiencing this issue, which makes it impossible to use sweeps because the runs get stuck on a wandb sync. In my case, this also results in extremely long syncs during the run (not just at the end), and sometimes the workspace does not update for 10-20 minutes. I am using wandb version 0.14.0, and I am not having any issues with my internet connection.
same issue
same issue…
while waiting for the official fix, here is my script to quickly kill wandb-service
same problem. During training, the metrics are uploaded to wandb's server without issues. When wandb.finish() is called, the program gets stuck.
I had the same issue when logging matplotlib.Figure with wandb.log(). Although it is not a real fix, the following method automatically kills the wandb process.
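As a rough illustration of what an automatic workaround of this kind could look like (this is not the commenter's code), the sketch below waits for a call such as wandb.finish() for a bounded time and then runs a cleanup callback if it has not returned. The helper name call_with_timeout and the on_timeout callback are hypothetical, and whether wandb.finish() behaves well when called from a non-main thread is an assumption that would need checking.

```python
import threading


def call_with_timeout(fn, timeout_s, on_timeout=None):
    """Run fn() in a daemon thread and wait at most timeout_s seconds.

    Returns True if fn finished in time. Otherwise calls on_timeout
    (if given) and returns False; the worker thread is left running
    in the background and will not block interpreter exit.
    """
    worker = threading.Thread(target=fn, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        if on_timeout is not None:
            on_timeout()
        return False
    return True


# Hypothetical usage, assuming `import wandb` and a cleanup function of your own:
#   finished = call_with_timeout(wandb.finish, timeout_s=120,
#                                on_timeout=my_kill_wandb_service)
```

The timeout should be generous enough for a normal upload to complete, since cutting the sync short may lose data that has not reached the server.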
I have run into a similar issue. WandB version: 0.14.2. Python: 3.10.10. OS: CentOS 7.
log
debug.log
debug-internal.log