joblib: slow memory retrieval (significantly slower than simple pickle)
Hi,
I’m a little confused about why reading from and writing to the (file-based) “memory” takes so much more time than bare pickling/unpickling.
In my case, func() is a tiny memorized function that takes a short string argument and returns a (short) dict with (long) lists of ~complex objects. For some reason, retrieving the result from the cache takes significantly more time than just unpickling the file. The resulting file is approximately 70 MB.
I observe the same thing for any other function.
%prun func(some_str)
ncalls tottime percall cumtime percall filename:lineno(function)
1 12.436 12.436 52.011 52.011 pickle.py:1014(load)
41531482 7.665 0.000 11.931 0.000 pickle.py:226(read)
1922386 5.547 0.000 7.339 0.000 pickle.py:1504(load_build)
41531483 4.266 0.000 4.266 0.000 {method 'read' of '_io.BufferedReader' objects}
6490284 3.753 0.000 6.666 0.000 pickle.py:1439(load_long_binput)
2645763 2.666 0.000 4.764 0.000 pickle.py:1192(load_binunicode)
30070039 2.403 0.000 2.403 0.000 {built-in method builtins.isinstance}
4140172 1.870 0.000 3.225 0.000 pickle.py:1415(load_binget)
1922386 1.369 0.000 2.049 0.000 pickle.py:1316(load_newobj)
9196954 1.359 0.000 1.359 0.000 {built-in method _struct.unpack}
1922386 1.114 0.000 8.724 0.000 numpy_pickle.py:319(load_build)
10857316 0.962 0.000 0.962 0.000 {method 'pop' of 'list' objects}
14536246 0.873 0.000 0.873 0.000 {method 'append' of 'list' objects}
1922386 0.873 0.000 1.218 0.000 pickle.py:1472(load_setitem)
1922393 0.816 0.000 0.816 0.000 {built-in method builtins.getattr}
676815 0.765 0.000 1.384 0.000 pickle.py:1458(load_appends)
1922387 0.730 0.000 0.832 0.000 pickle.py:1257(load_empty_dictionary)
1 0.715 0.715 53.099 53.099 <string>:1(<module>)
1245385 0.559 0.000 0.848 0.000 pickle.py:1451(load_append)
...
%prun len(pickle.load(open("..file..", 'rb')))
ncalls tottime percall cumtime percall filename:lineno(function)
1 4.587 4.587 4.587 4.587 {built-in method _pickle.load}
1 0.553 0.553 5.140 5.140 <string>:1(<module>)
1 0.000 0.000 5.140 5.140 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
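The two profiles differ mainly in which unpickler does the work: the cached call spends its time in pickle.py (the pure-Python implementation, reached through numpy_pickle.py), while the direct call goes through the built-in _pickle.load. A minimal sketch that reproduces that gap with the standard library only, no joblib involved. The payload shape and the helper names (make_payload, time_load) are assumptions for illustration, not the data from this report:

```python
import io
import pickle
import time


def make_payload(n=20000):
    # A short dict holding long lists of small records (illustrative shape only).
    return {
        "a": [{"name": str(i), "values": [i, i + 1]} for i in range(n)],
        "b": [{"name": str(-i), "values": [i, i * 2]} for i in range(n)],
    }


def time_load(unpickler_cls, data):
    # Unpickle `data` with the given Unpickler class and return (object, seconds).
    start = time.perf_counter()
    obj = unpickler_cls(io.BytesIO(data)).load()
    return obj, time.perf_counter() - start


data = pickle.dumps(make_payload(), protocol=pickle.HIGHEST_PROTOCOL)

# pickle.Unpickler is the C implementation when the _pickle module is available.
fast_obj, fast_s = time_load(pickle.Unpickler, data)
# pickle._Unpickler is the pure-Python implementation that lives in pickle.py.
slow_obj, slow_s = time_load(pickle._Unpickler, data)

print(f"C unpickler: {fast_s:.3f}s, pure-Python unpickler: {slow_s:.3f}s")
```

Both loads return the same object; only the time differs, and the ratio will depend on the payload and the Python version.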
About this issue
- State: open
- Created 8 years ago
- Comments: 37 (29 by maintainers)
I think I got the following patch to memory.py to work:
Of course it's a huge hack that just bypasses everything. I wonder if it breaks anything.
Actually, thinking about it, maybe the cleanest thing to do is to add a use_joblib_pickling (for lack of a better name) argument to Memory, which should be True by default.
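To illustrate what bypassing joblib's pickling could look like in spirit, here is a rough sketch of a standalone disk cache that uses only the standard pickle module. It is not the patch mentioned above and not joblib's API: plain_pickle_cache and everything inside it are hypothetical names, and the sketch assumes the arguments are picklable in a stable way (short strings, as in the original report).

```python
import functools
import hashlib
import os
import pickle
import tempfile


def plain_pickle_cache(cache_dir):
    """Hypothetical decorator: cache results on disk with the stdlib pickle only."""
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key on the function name and the pickled arguments.
            # Assumption: arguments pickle to the same bytes on every call.
            key_bytes = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            path = os.path.join(cache_dir, hashlib.sha256(key_bytes).hexdigest() + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)  # C-accelerated load where available
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh, protocol=pickle.HIGHEST_PROTOCOL)
            return result

        return wrapper

    return decorator


calls = []  # records real executions so cache hits are visible


@plain_pickle_cache(tempfile.mkdtemp())
def func(some_str):
    calls.append(some_str)
    return {"words": some_str.split()}


first = func("a b c")   # computed and written to disk
second = func("a b c")  # read back from disk, func body not executed
```

The trade-off is that a cache like this gives up what joblib's own pickling layer is there for, such as its special handling of numpy arrays, and it does not invalidate entries when the function's code changes.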