keras: How to parallelize fit_generator? (PicklingError)
I tried several ways but cannot get parallelization of sample/data generation to work. A gist with my attempts is linked below. Am I doing something wrong, or is there a bug?
https://gist.github.com/stmax82/283ef735c8e2601ef841de8b37243ee1
I suppose my fourth try is the correct one, but when I set `pickle_safe=True` I get the error:
PicklingError: Can't pickle <function generator_queue.<locals>.data_generator_task at 0x000000001B042EA0>: attribute lookup data_generator_task on keras.engine.training failed
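The traceback points at the root cause: `data_generator_task` is defined inside `generator_queue`, i.e. it is a local function, and Python's `pickle` cannot serialize local functions at all. A minimal standalone sketch of that limitation (the names mirror the traceback; this is not Keras's actual code):

```python
import pickle

def generator_queue():
    # mirrors Keras's internal structure: a worker function
    # defined inside another function (a "local object")
    def data_generator_task():
        pass
    return data_generator_task

try:
    pickle.dumps(generator_queue())
    picklable = True
except Exception:
    # Python 3 raises: AttributeError: Can't pickle local object
    # 'generator_queue.<locals>.data_generator_task'
    # (Python 2 raised pickle.PicklingError, as in the report above)
    picklable = False

print(picklable)  # False
```

Windows `multiprocessing` must pickle the worker function to send it to the child process, which is why `pickle_safe=True` fails there regardless of what your own generator looks like.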
About this issue
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 15 (5 by maintainers)
TL;DR: Do NOT set `pickle_safe=True`. You're bound for trouble.

Extensive explanation: I've been investigating the way workers are used in the `*_generator` function set (see issues #5071, #6745). So far, my conclusion is that the way `pickle_safe=True` is implemented is, at best, flawed beyond recovery and should be avoided completely. Here's what I've gathered:

- On Windows, which has no `fork()`, a `multiprocessing.Process` is created (simplifying heavily) by starting a whole new application process, pickling the data the new process needs and sending it over a pipe, together with other data required to simulate a `fork()`. (See this article for a more detailed and precise explanation of why that is necessary.)
- On Linux, thanks to `fork()`'s magic, there is no need to pickle and unpickle the generator. However, an identical, independent clone of the original generator is created and used independently in each child process! This means your data is enqueued once per worker every time a worker calls `next()` and obtains a new batch.

Take this example:
The output of this code under Linux is:
while under Windows it breaks with the error:
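The behaviour described above can be reproduced with a minimal standalone sketch (illustrative code, not the original example from the comment): under `fork()`, every child process inherits its own clone of the generator, so each worker produces the same first batch.

```python
import multiprocessing as mp

def make_gen():
    # stand-in for a user-supplied Keras data generator
    yield from range(3)

def worker(gen, q):
    # under fork(), each child inherits an independent clone of the
    # generator, so every child sees the same first item
    q.put(next(gen))

if __name__ == "__main__":
    # fork is available on Linux/macOS only; spawn (Windows) would
    # have to pickle the generator, which is impossible
    mp.set_start_method("fork")
    gen = make_gen()
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(gen, q)) for _ in range(2)]
    for p in procs:
        p.start()
    results = [q.get() for _ in range(2)]
    for p in procs:
        p.join()
    print(results)  # [0, 0] -- both workers enqueued the same batch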
Considering all this, I believe the `pickle_safe` argument is misleading, wrong, and potentially harmful, and IMO it should be removed altogether. Until then, stick to `pickle_safe=False` in your code to avoid headaches.
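With `pickle_safe=False`, batches are produced by background threads feeding a shared queue, so nothing is ever pickled and the single generator is consumed exactly once. A rough standalone sketch of that producer/consumer pattern (illustrative names and structure, not Keras's actual implementation):

```python
import queue
import threading

def batch_stream():
    # stand-in for a Keras data generator (hypothetical)
    yield from range(5)

def enqueue(gen, q):
    # one background thread consumes the single shared generator:
    # no pickling, no duplicated batches
    for batch in gen:
        q.put(batch)
    q.put(None)  # sentinel: generator exhausted

q = queue.Queue(maxsize=10)
t = threading.Thread(target=enqueue, args=(batch_stream(), q), daemon=True)
t.start()

batches = []
while True:
    b = q.get()
    if b is None:
        break
    batches.append(b)

print(batches)  # [0, 1, 2, 3, 4] -- each batch produced exactly once
```

The trade-off is that threads share the GIL, so a CPU-heavy generator won't actually run in parallel; but for I/O-bound generators (reading files, decoding images) this is usually fast enough, and it is the only mode that works identically on Linux and Windows.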