joblib: Memory Leak in joblib.Parallel
Apologies if this issue has been flagged before; I didn't see anything about it. I've been running the following bit of code:
```python
from multiprocessing import cpu_count

from joblib import Parallel, delayed
from scipy.stats import binom, hypergeom

num_cores = cpu_count()

def compute(margin, N_wl, pi, alpha, n):
    p1 = 0.5 + margin/2
    # upper alpha quantile of the null distribution
    q_alpha = hypergeom.ppf(1 - alpha, N_wl, N_wl/2, n)
    # probability of selecting n out of N_wl at sampling rate pi
    prob_select_N = binom.pmf(n, N_wl, pi)
    # probability of the alternative distribution falling above q_alpha
    pvalue_nw = hypergeom.sf(q_alpha, N_wl, N_wl*p1, n)
    return prob_select_N * pvalue_nw

def compute_unconditional_power(margin, N_wl, pi, alpha):
    '''
    Compute the unconditional power of the test.
    margin = vote margin (votes for w / votes for w or l) in the population
    N_wl = total number of ballots for either the winner or loser in the population
    pi = the sampling probability
    alpha = the type I error rate
    '''
    unlikely_draw_lower = binom.ppf(0.005, N_wl, pi)
    unlikely_draw_upper = binom.ppf(0.995, N_wl, pi)
    powers = Parallel(n_jobs=num_cores)(
        delayed(compute)(margin, N_wl, pi, alpha, n)
        for n in range(int(unlikely_draw_lower), int(unlikely_draw_upper)))
    return sum(powers)
```
And I've noticed two things: the way the parallel processes get spun up in 0.12.1 differs from 0.11, and this code, which works fine in 0.11, leaks memory in 0.12.1. Typically I get the following error, which as far as I can tell is just joblib's way of handling an OOM:
```
/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/process_executor.py:634: UserWarning: A worker timeout while some jobs were given to the executor. You might want to use a longer timeout for the executor.
  "the executor.", UserWarning
Traceback (most recent call last):
  File "gen_plot_data.py", line 214, in <module>
    main()
  File "gen_plot_data.py", line 165, in main
    bbp_ss = get_bbp_sample_size(prop_winner, Ntot, alpha)
  File "gen_plot_data.py", line 115, in get_bbp_sample_size
    quants[quant] = get_sample_for_power(margin, Ntot, alpha, quant/100.0, 1/float(Ntot))
  File "gen_plot_data.py", line 106, in get_sample_for_power
    x = compute_unconditional_power(margin, Ntot, pi, alpha)
  File "gen_plot_data.py", line 93, in compute_unconditional_power
    for n in range(int(unlikely_draw_lower), int(unlikely_draw_upper)))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 962, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 865, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 515, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/_base.py", line 431, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/_base.py", line 382, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.BrokenProcessPool: A process in the executor was terminated abruptly while the future was running or pending.
```
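When chasing this kind of OOM, it can help to log memory usage from inside the worker itself to see whether it grows across tasks. A minimal stdlib sketch (not part of the original report; `resource` is Unix-only, and `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS):

```python
import resource

def max_rss_kb():
    # Peak resident set size of the current process.
    # Units: kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

Calling this at the start and end of each parallel task, and printing the difference, shows whether memory is accumulating in the workers between tasks.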
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 33 (22 by maintainers)
Python 3.6, joblib version 0.12.5. I am seeing the same warnings, but the script was able to finish (I have 10 workers).
Does this warning mean the calculation in my code is wrong? I am not sure whether I can trust the results… When I run the tasks independently, I get the same results, so they seem solid. But the warning still worries me…
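One quick way to build confidence is to compare a serial run against the parallel run of the same function and check they agree. A toy sketch of that check (the `power_term` function is a hypothetical stand-in for the real per-`n` computation, and the stdlib `ThreadPoolExecutor` stands in for `joblib.Parallel` just to keep the snippet self-contained):

```python
from concurrent.futures import ThreadPoolExecutor

def power_term(n):
    # Hypothetical stand-in for the real scipy-based per-n term.
    return 1.0 / (n + 1)

def serial_sum(lo, hi):
    # Reference result computed without any parallelism.
    return sum(power_term(n) for n in range(lo, hi))

def parallel_sum(lo, hi, workers=4):
    # map() yields results in input order, so the summation
    # order (and hence the floating-point result) matches serial.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(power_term, range(lo, hi)))
```

If the two sums agree, the warning is about worker lifecycle (timeouts/memory), not about the numerical results.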
I had a similar problem, and following @ogrisel's suggestion above, forcing garbage collection via gc.collect() inside the inner function (the one I was calling in parallel) seems to have resolved it. Thanks.
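For anyone else landing here, that workaround can be sketched roughly like this (the `compute` below is a hypothetical stand-in for the real scipy-based worker from the original post):

```python
import gc

def compute(x):
    # Hypothetical stand-in for the real worker function.
    return x * x

def compute_with_gc(x):
    # Run the worker, then force a garbage-collection pass before
    # returning, so each task releases memory inside the worker process.
    try:
        return compute(x)
    finally:
        gc.collect()
```

You would then pass `compute_with_gc` to `delayed(...)` in place of the original worker function.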
joblib 0.12.2 now restarts workers when it detects a memory leak. @umbernhard @jramapuram can you please try on your workload with joblib 0.12.2 to check that this fixes the problem?