wandb: Unable to launch multiple runs with Joblib

Hi, thanks for the awesome library. I found it really helpful for my research!

However, I have encountered issues when trying to launch multiple runs with joblib.

from math import sqrt
from joblib import Parallel, delayed
import wandb

def f(x):
    wandb.init(project="symppl", reinit=True)
    for i in range(10):
        loss = i
        # Log metrics with wandb
        wandb.log({"Loss": loss})
    wandb.finish()
    return sqrt(x)

def main():
    res = Parallel(n_jobs=2)(delayed(f)(i**2) for i in range(4))
    print(res)

if __name__ == "__main__":
    main()

This fails with the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 231, in prepare
    set_start_method(data['start_method'], force=True)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 247, in set_start_method
    self._actual_context = self.get_context(method)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 239, in get_context
    return super().get_context(method)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 193, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'loky'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 231, in prepare
    set_start_method(data['start_method'], force=True)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 247, in set_start_method
    self._actual_context = self.get_context(method)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 239, in get_context
    return super().get_context(method)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 193, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'loky'

Changing joblib’s backend to multiprocessing also causes an error.
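
For reference, the only change from the script above is the backend argument on the Parallel call:

res = Parallel(n_jobs=2, backend="multiprocessing")(delayed(f)(i**2) for i in range(4))

With that change, the output is: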

Problem at: /Users/ethan/dev/wandb/run.py 6 f
Problem at: /Users/ethan/dev/wandb/run.py 6 f
Traceback (most recent call last):
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 575, in init
    run = wi.init()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 367, in init
    backend.ensure_launched(
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 81, in ensure_launched
    self.wandb_process.start()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
Traceback (most recent call last):
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 575, in init
    run = wi.init()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 367, in init
    backend.ensure_launched(
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 81, in ensure_launched
    self.wandb_process.start()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
wandb: ERROR Abnormal program exit
wandb: ERROR Abnormal program exit
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 575, in init
    run = wi.init()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 367, in init
    backend.ensure_launched(
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 81, in ensure_launched
    self.wandb_process.start()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "/Users/ethan/dev/wandb/run.py", line 6, in f
    wandb.init(project="symppl", reinit=True)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 612, in init
    six.raise_from(Exception("problem"), error_seen)
  File "<string>", line 3, in raise_from
Exception: problem
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 19, in <module>
    main()
  File "run.py", line 15, in main
    res = Parallel(n_jobs=2, backend="multiprocessing")(delayed(f)(i**2) for i in range(4))
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 1042, in __call__
    self.retrieve()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
Exception: problem
/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

My workflow consists of sweeping over hyperparameters with Hydra, so this currently blocks me from using Hydra with a parallel launcher (e.g. joblib). I think it would also block support for Hydra multiruns; see https://github.com/wandb/client/issues/1233#issuecomment-693139976
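
For context, here is a minimal sketch of the kind of Hydra app I would like to run in parallel. This is only an illustration: it assumes the hydra-joblib-launcher plugin is installed and that a config.yaml next to the script defines an lr field; both names are placeholders.

import hydra
import wandb
from omegaconf import DictConfig

@hydra.main(config_name="config")
def run(cfg: DictConfig):
    # Each multirun job calls wandb.init, which is what currently fails under joblib.
    wandb.init(project="symppl", reinit=True)
    wandb.log({"lr": cfg.lr})
    wandb.finish()

if __name__ == "__main__":
    run()

# Launched in parallel with, e.g.:
#   python app.py --multirun lr=0.01,0.001 hydra/launcher=joblib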

Is there a way that I can work around this issue?

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 18 (7 by maintainers)

Most upvoted comments

Hey @tiborsekera, with the current release of the client library you can set the following environment variable so that everything works with joblib:

os.environ["WANDB_START_METHOD"] = "thread"
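
Applied to the repro script from this issue, that would look roughly like the sketch below. The only requirement is that the variable is set before the parallel jobs are launched, so that wandb.init sees it in every worker.

import os
from math import sqrt

from joblib import Parallel, delayed
import wandb

# Must be set before wandb.init runs; the joblib workers inherit it.
os.environ["WANDB_START_METHOD"] = "thread"

def f(x):
    wandb.init(project="symppl", reinit=True)
    for i in range(10):
        wandb.log({"Loss": i})
    wandb.finish()
    return sqrt(x)

if __name__ == "__main__":
    print(Parallel(n_jobs=2)(delayed(f)(i**2) for i in range(4)))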

@ethanluoyc

Thanks for reporting this. As a temporary workaround until we fix this, you can do:

pip install wandb==0.9.7

The only other change you will need in your sample script is to change wandb.finish() to wandb.join().
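
With 0.9.7 the worker from your sample script would look like this (everything else stays the same):

def f(x):
    wandb.init(project="symppl", reinit=True)
    for i in range(10):
        wandb.log({"Loss": i})
    wandb.join()  # 0.9.7 uses wandb.join() in place of wandb.finish()
    return sqrt(x)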

We are using a different approach in 0.10.x that appears to be conflicting with joblib’s multiprocessing implementations.

I was able to reproduce the issue and made a hotfix. We'll need to put this through some more rigorous testing, but if @varun19299 or @ethanluoyc want to give joblib a try with 0.10.x, you can install my hotfix branch with:

pip install --force-reinstall git+git://github.com/wandb/client.git@task/joblib#egg=wandb