wandb: Sync with server is extremely slow
wandb --version && python --version && uname
- wandb, version 0.10.5
- Python, version 3.7.6
- Linux Ubuntu 20.04
Python script to reproduce:
import wandb
wandb.init()
wandb.log({'a':1})
The script takes more than 10 minutes to finish because it gets stuck in the sync phase. On another machine with the same configuration it works, and I have not been able to pinpoint the problem. The beginning of debug.log is the following:
2020-10-09 17:56:17,590 INFO MainThread:14442 [wandb_init.py:_log_setup():292] Logging user logs to wandb/run-20201009_175617-2p03aens/logs/debug.log
2020-10-09 17:56:17,590 INFO MainThread:14442 [wandb_init.py:_log_setup():293] Logging internal logs to wandb/run-20201009_175617-2p03aens/logs/debug-internal.log
2020-10-09 17:56:17,590 INFO MainThread:14442 [wandb_setup.py:_flush():68] setting env: {}
2020-10-09 17:56:17,590 INFO MainThread:14442 [wandb_setup.py:_flush():68] setting user settings: {}
2020-10-09 17:56:17,590 INFO MainThread:14442 [wandb_setup.py:_flush():68] multiprocessing start_methods=fork,spawn,forkserver
2020-10-09 17:56:23,256 INFO MainThread:14442 [wandb_run.py:_console_start():1265] atexit reg
2020-10-09 17:56:23,257 INFO MainThread:14442 [wandb_run.py:_redirect():1135] redirect: SettingsConsole.REDIRECT
2020-10-09 17:56:23,257 INFO MainThread:14442 [wandb_run.py:_redirect():1138] Redirecting console.
2020-10-09 17:56:23,257 INFO MainThread:14442 [redirect.py:install():196] install start
2020-10-09 17:56:23,258 INFO MainThread:14442 [redirect.py:install():211] install stop
2020-10-09 17:56:23,258 INFO MainThread:14442 [redirect.py:install():196] install start
2020-10-09 17:56:23,258 INFO MainThread:14442 [redirect.py:install():211] install stop
2020-10-09 17:56:23,258 INFO MainThread:14442 [wandb_run.py:_redirect():1182] Redirects installed.
2020-10-09 17:56:23,259 INFO MainThread:14442 [wandb_run.py:_atexit_cleanup():1238] got exitcode: 0
2020-10-09 17:56:23,260 INFO MainThread:14442 [wandb_run.py:_restore():1210] restore
2020-10-09 17:56:23,260 INFO MainThread:14442 [redirect.py:uninstall():215] uninstall start
2020-10-09 17:56:23,260 INFO MainThread:14442 [redirect.py:_stop():264] _stop: stdout
2020-10-09 17:56:23,260 INFO MainThread:14442 [redirect.py:_stop():270] _stop closed: stdout
2020-10-09 17:56:23,260 INFO stdout :14442 [redirect.py:_pipe_relay():119] relay done saw last write: stdout
2020-10-09 17:56:23,260 INFO stdout :14442 [redirect.py:_pipe_relay():135] relay done done: stdout
2020-10-09 17:56:23,260 INFO MainThread:14442 [redirect.py:_stop():276] _stop joined: stdout
2020-10-09 17:56:23,260 INFO MainThread:14442 [redirect.py:_stop():278] _stop rd closed: stdout
2020-10-09 17:56:23,261 INFO MainThread:14442 [redirect.py:uninstall():219] uninstall done
2020-10-09 17:56:23,261 INFO MainThread:14442 [redirect.py:uninstall():215] uninstall start
2020-10-09 17:56:23,261 INFO MainThread:14442 [redirect.py:_stop():264] _stop: stderr
2020-10-09 17:56:23,261 INFO MainThread:14442 [redirect.py:_stop():270] _stop closed: stderr
2020-10-09 17:56:23,261 INFO stderr :14442 [redirect.py:_pipe_relay():119] relay done saw last write: stderr
2020-10-09 17:56:23,261 INFO stderr :14442 [redirect.py:_pipe_relay():135] relay done done: stderr
2020-10-09 17:56:23,261 INFO MainThread:14442 [redirect.py:_stop():276] _stop joined: stderr
2020-10-09 17:56:23,261 INFO MainThread:14442 [redirect.py:_stop():278] _stop rd closed: stderr
2020-10-09 17:56:23,261 INFO MainThread:14442 [redirect.py:uninstall():219] uninstall done
2020-10-09 17:56:23,716 INFO MainThread:14442 [wandb_run.py:_wait_for_finish():1349] got exit ret: uuid: "977487dc190d4a7b85170a7da99da7ca"
response {
poll_exit_response {
file_counts {
wandb_count: 1
}
pusher_stats {
total_bytes: 472
}
}
}
2020-10-09 17:56:25,723 INFO MainThread:14442 [wandb_run.py:_wait_for_finish():1349] got exit ret: uuid: "ae39742a10bb4754aadce46e0e72d889"
response {
poll_exit_response {
file_counts {
wandb_count: 4
}
pusher_stats {
total_bytes: 1089
}
}
}
2020-10-09 17:56:27,730 INFO MainThread:14442 [wandb_run.py:_wait_for_finish():1349] got exit ret: uuid: "0c5c130d2f1e4edf98d2e9ff0a1a040e"
response {
poll_exit_response {
file_counts {
wandb_count: 4
}
pusher_stats {
total_bytes: 1089
}
}
}
2020-10-09 17:56:29,736 INFO MainThread:14442 [wandb_run.py:_wait_for_finish():1349] got exit ret: uuid: "8725216e2144469e83762b70c7f20e62"
response {
poll_exit_response {
file_counts {
wandb_count: 4
}
pusher_stats {
total_bytes: 1089
}
}
}
Debug files: debug.log, debug-internal.log
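In the excerpt above, the client polls for the exit status about every two seconds, while file_counts stays at wandb_count: 4 and pusher_stats stays at total_bytes: 1089 after the second poll. That pattern would be consistent with the upload not progressing. Below is a minimal, stdlib-only sketch for pulling those values out of a debug.log. It assumes the line format matches the excerpt above. The regexes are inferred from that excerpt rather than from the wandb source, and the function names are hypothetical, not part of wandb.

```python
import re
from datetime import datetime

# Header line logged by _wait_for_finish() for each poll_exit response.
# Pattern inferred from the debug.log excerpt above (assumption, not wandb source).
HEADER_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO .*_wait_for_finish\(\).*got exit ret"
)
# The total_bytes field inside the pusher_stats block of that response.
BYTES_RE = re.compile(r"^\s*total_bytes:\s*(\d+)")


def parse_poll_exit(log_text):
    """Return (timestamp, total_bytes) pairs, one per poll_exit response."""
    polls = []
    current_ts = None
    for line in log_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current_ts = datetime.strptime(header.group(1), "%Y-%m-%d %H:%M:%S,%f")
            continue
        value = BYTES_RE.match(line)
        if value and current_ts is not None:
            polls.append((current_ts, int(value.group(1))))
            current_ts = None  # expect one total_bytes per response
    return polls


def stalled_polls(polls):
    """Count polls where total_bytes did not grow since the previous poll."""
    return sum(1 for prev, cur in zip(polls, polls[1:]) if cur[1] <= prev[1])


# Abbreviated copy of the relevant lines from the excerpt above.
SAMPLE = """\
2020-10-09 17:56:23,261 INFO MainThread:14442 [redirect.py:uninstall():219] uninstall done
2020-10-09 17:56:23,716 INFO MainThread:14442 [wandb_run.py:_wait_for_finish():1349] got exit ret: uuid: "977487dc190d4a7b85170a7da99da7ca"
pusher_stats {
total_bytes: 472
}
2020-10-09 17:56:25,723 INFO MainThread:14442 [wandb_run.py:_wait_for_finish():1349] got exit ret: uuid: "ae39742a10bb4754aadce46e0e72d889"
pusher_stats {
total_bytes: 1089
}
2020-10-09 17:56:27,730 INFO MainThread:14442 [wandb_run.py:_wait_for_finish():1349] got exit ret: uuid: "0c5c130d2f1e4edf98d2e9ff0a1a040e"
pusher_stats {
total_bytes: 1089
}
2020-10-09 17:56:29,736 INFO MainThread:14442 [wandb_run.py:_wait_for_finish():1349] got exit ret: uuid: "8725216e2144469e83762b70c7f20e62"
pusher_stats {
total_bytes: 1089
}
"""

polls = parse_poll_exit(SAMPLE)
for ts, total in polls:
    print(ts.time(), total)
print("polls without upload progress:", stalled_polls(polls))
```

On these lines, the sketch finds 472 bytes at the first poll and 1089 bytes at each of the next three, which gives two consecutive polls with no growth. For a real run, pass the file contents instead, for example parse_poll_exit(open("wandb/run-20201009_175617-2p03aens/logs/debug.log").read()).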
About this issue
- State: closed
- Created 4 years ago
- Comments: 20 (5 by maintainers)
Hi, I have the exact same problem. My setup is:
- wandb, version 0.13.1
- Python, version 3.9.7
- Linux Ubuntu 20.04
The sync is very slow and takes around 15 minutes for a file of size 0.005 MB.
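To put "very slow" in numbers, here is a back-of-the-envelope calculation. It assumes 0.005 MB means decimal megabytes (1 MB = 1,000,000 bytes) and takes the reported 15 minutes at face value. The helper name is hypothetical.

```python
def effective_throughput(num_bytes, seconds):
    """Average transfer rate in bytes per second."""
    return num_bytes / seconds


# Assumption: 0.005 MB in decimal units (about 5,000 bytes) over about 15 minutes.
rate = effective_throughput(0.005 * 1_000_000, 15 * 60)
print(f"about {rate:.1f} bytes/s")
```

This works out to roughly 5.6 bytes per second. That rate is orders of magnitude below what an ordinary network connection sustains.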
Same problem here. It used to work, but all of a sudden it started getting stuck in the syncing step. I also checked for network issues (with git push and pull, plus some network tests), and no other network-related task had any problem.
Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.79.