wandb: ERROR: Abnormal program exit

wandb --0.10.0 && python --3.6.12

  • Weights and Biases version: 0.10.0
  • Python version: 3.6
  • Operating System: Linux Ubuntu 18.06

Description

Cannot run wandb.init()

What I Did

(kg) junhao@compute006:~$ python
Python 3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import wandb
>>> wandb.init()
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: You can find your API key in your browser here: https://app.wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:
wandb: Appending key for api.wandb.ai to your netrc file: /home/junhao/.netrc
Problem at: <stdin> 1 <module>
Traceback (most recent call last):
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 479, in init
    run = wi.init()
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 358, in init
    _backend=backend, _disable_warning=True, _settings=self.settings
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/sdk/wandb_login.py", line 95, in _login
    res = _backend.interface.communicate_login(key, anonymous)
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/interface/interface.py", line 438, in communicate_login
    "Couldn't communicate with backend after %s seconds" % timeout
wandb.errors.error.Error: Couldn't communicate with backend after 5 seconds
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 479, in init
    run = wi.init()
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 358, in init
    _backend=backend, _disable_warning=True, _settings=self.settings
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/sdk/wandb_login.py", line 95, in _login
    res = _backend.interface.communicate_login(key, anonymous)
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/interface/interface.py", line 438, in communicate_login
    "Couldn't communicate with backend after %s seconds" % timeout
wandb.errors.error.Error: Couldn't communicate with backend after 5 seconds

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/junhao/anaconda3/envs/kg/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 513, in init
    six.raise_from(Exception("problem"), error_seen)
  File "<string>", line 3, in raise_from
Exception: problem
>>> B

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 23 (2 by maintainers)

Most upvoted comments

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.84. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

For me, downgrading wandb to 0.9.7 mitigated the problem. It seems to affect only the newest version.

@vanpelt I am currently experiencing this issue now as well.

Same for me. I made no changes to my env, and the script no longer runs. Updating wandb did not help.

Yep, definitely back. But it's happened to me for the first time ever today without any package up/down-grade.

Traceback (most recent call last):
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1075, in init
    wi.setup(kwargs)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 165, in setup
    self._wl = wandb_setup.setup()
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 312, in setup
    ret = _setup(settings=settings)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 307, in _setup
    wl = _WandbSetup(settings=settings)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 293, in __init__
    _WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 106, in __init__
    self._setup()
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 234, in _setup
    self._setup_manager()
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 265, in _setup_manager
    self._manager = wandb_manager._Manager(
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_manager.py", line 111, in __init__
    self._service.start()
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 119, in start
    self._launch_server()
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 115, in _launch_server
    assert ports_found
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 192, in <module>
    runner.run()
  File "main.py", line 96, in run
    self.trainer: BaseTrainer = cls(**self.trainer_config)
  File "/home/mila/s/schmidtv/ocp-project/run-repos/ocp-1/ocpmodels/trainers/base_trainer.py", line 144, in __init__
    self.load()
  File "/home/mila/s/schmidtv/ocp-project/run-repos/ocp-1/ocpmodels/trainers/base_trainer.py", line 153, in load
    self.load_logger()
  File "/home/mila/s/schmidtv/ocp-project/run-repos/ocp-1/ocpmodels/trainers/base_trainer.py", line 184, in load_logger
    self.logger = registry.get_logger_class(logger_name)(self.config)
  File "/home/mila/s/schmidtv/ocp-project/run-repos/ocp-1/ocpmodels/common/logger.py", line 112, in __init__
    self.run = wandb.init(
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1115, in init
    raise Exception("problem") from error_seen
Exception: problem


Self-canceling SLURM job 2666801
slurmstepd: error: *** JOB 2666801 ON rtx7 CANCELLED AT 2023-01-10T20:43:59 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 2666801.0 ON rtx7 CANCELLED AT 2023-01-10T20:43:59 ***
Traceback (most recent call last):
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/__main__.py", line 3, in <module>
    cli.cli(prog_name="python -m wandb")
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/cli/cli.py", line 97, in wrapper
    return func(*args, **kwargs)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/cli/cli.py", line 282, in service
    server.serve()
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/service/server.py", line 130, in serve
    self._inform_used_ports(grpc_port=grpc_port, sock_port=sock_port)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/service/server.py", line 65, in _inform_used_ports
    pf.write(self._port_fname)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/site-packages/wandb/sdk/service/port_file.py", line 27, in write
    f = tempfile.NamedTemporaryFile(prefix=bname, dir=dname, mode="w", delete=False)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/tempfile.py", line 540, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/mila/s/schmidtv/.conda/envs/ocp-a100/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpxm30s71a/port-6513.txtksswlbb9'
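
The last frame above fails inside tempfile.NamedTemporaryFile with "No such file or directory", which suggests the temporary directory was missing when the service tried to write its port file. A minimal diagnostic sketch, assuming you can run a few lines on the affected compute node before the job starts; check_temp_dir and the "port-check-" prefix are hypothetical names and are not part of wandb:

```python
import tempfile


def check_temp_dir(path=None):
    """Try to create a named temp file in `path` (default: the system temp dir).

    Returns (directory, ok, detail). Hypothetical helper, not a wandb API.
    """
    directory = path or tempfile.gettempdir()
    try:
        # Same stdlib call as the failing frame in the traceback.
        with tempfile.NamedTemporaryFile(prefix="port-check-", dir=directory, mode="w") as f:
            f.write("ok")
        return directory, True, "writable"
    except OSError as exc:
        # FileNotFoundError and PermissionError are both OSError subclasses.
        return directory, False, "{}: {}".format(type(exc).__name__, exc)


if __name__ == "__main__":
    print(check_temp_dir())
```

If the check fails on the node, the problem is likely in the node's temp directory setup rather than in the training script. A passing check does not rule out the directory being removed later.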

fyi, this seems to have worked for me: submit a job with

WANDB__SERVICE_WAIT=300 python code.py

Edit: adding the link where I found this solution: https://github.com/wandb/wandb/issues/4224
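
If changing the job's command line is inconvenient, the same variable can be set from inside the script. This is a minimal sketch, assuming the variable has to be in the environment before wandb starts its service process; set_service_wait is a hypothetical helper, and whether WANDB__SERVICE_WAIT is honored depends on the installed wandb version:

```python
import os


def set_service_wait(seconds=300, environ=os.environ):
    """Set WANDB__SERVICE_WAIT unless the environment already defines it.

    Hypothetical helper: a value exported by the job script takes precedence.
    """
    environ.setdefault("WANDB__SERVICE_WAIT", str(seconds))
    return environ["WANDB__SERVICE_WAIT"]


if __name__ == "__main__":
    set_service_wait(300)
    # Import wandb and call wandb.init(...) only after this point.
```

Using setdefault keeps any value already exported on the command line, so the two approaches do not conflict.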

I am still experiencing similar issues; see the output below.

Traceback (most recent call last):
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1075, in init
    wi.setup(kwargs)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 165, in setup
    self._wl = wandb_setup.setup()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 312, in setup
    ret = _setup(settings=settings)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 307, in _setup
    wl = _WandbSetup(settings=settings)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 293, in __init__
    _WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 106, in __init__
    self._setup()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 234, in _setup
    self._setup_manager()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 265, in _setup_manager
    self._manager = wandb_manager._Manager(
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_manager.py", line 108, in __init__
    self._service.start()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 112, in start
    self._launch_server()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 108, in _launch_server
    assert ports_found
AssertionError
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1075, in init
    wi.setup(kwargs)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 165, in setup
    self._wl = wandb_setup.setup()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 312, in setup
    ret = _setup(settings=settings)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 307, in _setup
    wl = _WandbSetup(settings=settings)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 293, in __init__
    _WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 106, in __init__
    self._setup()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 234, in _setup
    self._setup_manager()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 265, in _setup_manager
    self._manager = wandb_manager._Manager(
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_manager.py", line 108, in __init__
    self._service.start()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 112, in start
    self._launch_server()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 108, in _launch_server
    assert ports_found
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train_masters.py", line 897, in <module>
    wandb.init(project="Masters", config=args, resume=False, group="final",
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1116, in init
    raise Exception("problem") from error_seen
Exception: problem
Traceback (most recent call last):
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/__main__.py", line 3, in <module>
    cli.cli(prog_name="python -m wandb")
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/cli/cli.py", line 97, in wrapper
    return func(*args, **kwargs)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/cli/cli.py", line 282, in service
    server.serve()
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/service/server.py", line 130, in serve
    self._inform_used_ports(grpc_port=grpc_port, sock_port=sock_port)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/service/server.py", line 65, in _inform_used_ports
    pf.write(self._port_fname)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/site-packages/wandb/sdk/service/port_file.py", line 25, in write
    f = tempfile.NamedTemporaryFile(prefix=bname, dir=dname, mode="w", delete=False)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/tempfile.py", line 540, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/scratch/hywluc001/conda-envs/pointnet/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpsnxv_nef/port-18551.txtacbmdqof'

I'm getting a similar error with the following stack trace:

  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 1075, in init
    wi.setup(kwargs)
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 165, in setup
    self._wl = wandb_setup.setup()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 312, in setup
    ret = _setup(settings=settings)
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 307, in _setup
    wl = _WandbSetup(settings=settings)
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 293, in __init__
    _WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 106, in __init__
    self._setup()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 234, in _setup
    self._setup_manager()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 266, in _setup_manager
    _use_grpc=use_grpc, settings=self._settings
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_manager.py", line 108, in __init__
    self._service.start()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/service/service.py", line 112, in start
    self._launch_server()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/service/service.py", line 108, in _launch_server
    assert ports_found
AssertionError
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 1075, in init
    wi.setup(kwargs)
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 165, in setup
    self._wl = wandb_setup.setup()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 312, in setup
    ret = _setup(settings=settings)
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 307, in _setup
    wl = _WandbSetup(settings=settings)
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 293, in __init__
    _WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 106, in __init__
    self._setup()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 234, in _setup
    self._setup_manager()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_setup.py", line 266, in _setup_manager
    _use_grpc=use_grpc, settings=self._settings
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/wandb_manager.py", line 108, in __init__
    self._service.start()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/service/service.py", line 112, in start
    self._launch_server()
  File "/h/phil/anaconda3/envs/rna_contrast/lib/python3.7/site-packages/wandb/sdk/service/service.py", line 108, in _launch_server
    assert ports_found
AssertionError

with wandb v0.13.5

Any advice on how to get around this? It fails sporadically: some jobs keep running, while others crash with this error from time to time.
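
Since the failure is sporadic, one generic stopgap is to retry the failing call with a growing delay. This is only a sketch: retry and its parameters are hypothetical names and not part of wandb, and nothing in this thread confirms that calling wandb.init() again in the same process recovers after the service failed to start.

```python
import time


def retry(fn, attempts=3, delay=1.0, backoff=2.0, exceptions=(Exception,), sleep=time.sleep):
    """Call fn() up to `attempts` times, sleeping between failed attempts.

    The delay is multiplied by `backoff` after each failure. The last error
    is re-raised if every attempt fails. Hypothetical helper, not a wandb API.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions as exc:
            last_error = exc
            if attempt == attempts:
                break
            sleep(delay)
            delay *= backoff
    raise last_error


# Hypothetical usage in a training script:
# run = retry(lambda: wandb.init(project="Masters"), attempts=3, delay=10.0)
```

If every attempt fails, the original exception still surfaces, so the job log keeps the same traceback as before.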