joblib: Memory Leak in joblib.Parallel

Apologies if this issue has been flagged before; I didn’t see anything about it. I’ve been running the following bit of code:

from joblib import Parallel, delayed
from scipy.stats import binom, hypergeom

def compute(margin, N_wl, pi, alpha, n):
    p1 = 0.5 + margin/2
    # upper alpha quantile of the null distribution
    q_alpha = hypergeom.ppf(1-alpha, N_wl, N_wl/2, n)
    # probability of selecting n out of N_wl at sampling rate pi
    prob_select_N = binom.pmf(n, N_wl, pi)
    # probability of the alternative distribution falling above q_alpha
    pvalue_nw = hypergeom.sf(q_alpha, N_wl, N_wl*p1, n)
    return prob_select_N*pvalue_nw

def compute_unconditional_power(margin, N_wl, pi, alpha):
    '''
    Compute the unconditional power of the test.

    margin = vote margin (votes for w / votes for w or l) in the population
    N_wl = the total number of ballots for either the winner or the loser in the population
    pi = the sampling probability
    alpha = the type I error rate
    '''
    unlikely_draw_lower = binom.ppf(0.005, N_wl, pi)
    unlikely_draw_upper = binom.ppf(0.995, N_wl, pi)

    # num_cores is set elsewhere in the script
    powers = Parallel(n_jobs=num_cores)(
        delayed(compute)(margin, N_wl, pi, alpha, n)
        for n in range(int(unlikely_draw_lower), int(unlikely_draw_upper)))

    return sum(powers)

And I’ve noticed two things: the way the parallel processes get spun up in 0.12.1 is different from 0.11, and this code, which works fine in 0.11, results in a memory leak in 0.12.1. Typically I get the following error, which as far as I can tell is just joblib’s way of handling an OOM:

/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/process_executor.py:634:  UserWarning: A worker timeout while some jobs were given to the executor. You might want to use a longer timeout for the executor. 
  "the executor.", UserWarning
Traceback (most recent call last):
  File "gen_plot_data.py", line 214, in <module>
    main()
  File "gen_plot_data.py", line 165, in main
    bbp_ss = get_bbp_sample_size(prop_winner, Ntot, alpha)
  File "gen_plot_data.py", line 115, in get_bbp_sample_size
    quants[quant] = get_sample_for_power(margin, Ntot, alpha, quant/100.0, 1/float(Ntot))
  File "gen_plot_data.py", line 106, in get_sample_for_power
    x = compute_unconditional_power(margin, Ntot, pi, alpha)
  File "gen_plot_data.py", line 93, in compute_unconditional_power
    for n in range(int(unlikely_draw_lower), int(unlikely_draw_upper)))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 962, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 865, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 515, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/_base.py", line 431, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/_base.py", line 382, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.BrokenProcessPool: A process in the executor was terminated abruptly while the future was running or pending.
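For reference, the change in how processes get spun up is presumably loky becoming the default backend in 0.12; a minimal sketch (with a toy function standing in for my actual workload) of pinning the old multiprocessing backend to compare against:

```python
from joblib import Parallel, delayed

def square(n):
    # stand-in for the real workload
    return n * n

# joblib >= 0.12 defaults to the loky backend; backend="multiprocessing"
# restores the backend that 0.11 used by default.
results = Parallel(n_jobs=2, backend="multiprocessing")(
    delayed(square)(n) for n in range(10))
```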

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 33 (22 by maintainers)

Most upvoted comments

Python 3.6, joblib version 0.12.5. I am seeing the following warnings, but the script was able to finish (I have 10 workers).

/Users/xi/miniconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py:700: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  "timeout or by a memory leak.", UserWarning
(the same warning is repeated three more times)

Does this warning mean the calculation in my code is wrong? I am not sure whether I can trust the results… When I run the jobs independently, I get the same results, so they seem solid. But the warning still worries me…
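A quick way to sanity-check this (a toy sketch, not the real computation) is to compare the parallel output against a plain sequential run; joblib returns results in input order, so the two should match exactly:

```python
from joblib import Parallel, delayed

def f(x):
    # stand-in for the real per-item computation
    return 3 * x + 1

sequential = [f(x) for x in range(50)]
parallel = Parallel(n_jobs=2)(delayed(f)(x) for x in range(50))
assert sequential == parallel  # results come back in input order
```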

I had a similar problem, and following @ogrisel’s suggestion above, forcing garbage collection via gc.collect() inside the inner function (the one I was calling in parallel) seems to have resolved it. Thanks.
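For anyone else landing here, the workaround looks roughly like this (a sketch with a dummy workload; the key line is the gc.collect() call before the worker function returns):

```python
import gc

from joblib import Parallel, delayed

def worker(n):
    # dummy workload standing in for the real inner function
    result = sum(i * i for i in range(n))
    gc.collect()  # force a collection so the worker releases memory promptly
    return result

totals = Parallel(n_jobs=2)(delayed(worker)(n) for n in [10, 100, 1000])
```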

joblib 0.12.2 now restarts workers when it detects a memory leak. @umbernhard @jramapuram can you please try on your workload with joblib 0.12.2 to check that this fixes the problem?