numba: Slow performance on repeated deserialization
I’m running into performance problems when I deserialize a numba function many times within the same process. I have a small example here:
Process 1
import numba
@numba.njit
def f(x):
    total = 0
    for i in range(len(x)):
        total += x[i]
    return total
import cloudpickle
cloudpickle.dumps(f)
b'\x80\x04\x95\xa8\x01\x00\x00\x00\x00\x00\x00\x8c\x0fnumba.serialize\x94\x8c\x12_rebuild_reduction\x94\x93\x94(\x8c\x16numba.targets.registry\x94\x8c\rCPUDispatcher\x94\x93\x94\x8c$64ebbc90-6b3a-11e8-963f-49110aa33909\x94(K\x04C\x043\r\r\n\x94C\xa4\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00C\x00\x00\x00s*\x00\x00\x00d\x01}\x01x t\x00t\x01|\x00\x83\x01\x83\x01D\x00]\x10}\x02|\x01|\x00|\x02\x19\x007\x00}\x01q\x12W\x00|\x01S\x00)\x02N\xe9\x00\x00\x00\x00)\x02\xda\x05range\xda\x03len)\x03\xda\x01x\xda\x05total\xda\x01i\xa9\x00r\x07\x00\x00\x00\xfa\x1e<ipython-input-2-359c566f9f92>\xda\x01f\x01\x00\x00\x00s\x08\x00\x00\x00\x00\x02\x04\x01\x12\x01\x10\x01\x94\x87\x94}\x94(\x8c\x05range\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94\x8c\x03len\x94\x8c\x08builtins\x94\x8c\x03len\x94\x93\x94\x8c\x08__name__\x94\x8c\x08__main__\x94u\x8c\x01f\x94Nt\x94}\x94}\x94\x8c\x08nopython\x94\x88s\x8c\x06direct\x94\x88]\x94t\x94R\x94.'
Process 2
import pickle
import numpy as np
x = np.random.random(1000)
b = b'\x80\x04\x95\xa8\x01\x00\x00\x00\x00\x00\x00\x8c\x0fnumba.serialize\x94\x8c\x12_rebuild_reduction\x94\x93\x94(\x8c\x16numba.targets.registry\x94\x8c\rCPUDispatcher\x94\x93\x94\x8c$64ebbc90-6b3a-11e8-963f-49110aa33909\x94(K\x04C\x043\r\r\n\x94C\xa4\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00C\x00\x00\x00s*\x00\x00\x00d\x01}\x01x t\x00t\x01|\x00\x83\x01\x83\x01D\x00]\x10}\x02|\x01|\x00|\x02\x19\x007\x00}\x01q\x12W\x00|\x01S\x00)\x02N\xe9\x00\x00\x00\x00)\x02\xda\x05range\xda\x03len)\x03\xda\x01x\xda\x05total\xda\x01i\xa9\x00r\x07\x00\x00\x00\xfa\x1e<ipython-input-2-359c566f9f92>\xda\x01f\x01\x00\x00\x00s\x08\x00\x00\x00\x00\x02\x04\x01\x12\x01\x10\x01\x94\x87\x94}\x94(\x8c\x05range\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94\x8c\x03len\x94\x8c\x08builtins\x94\x8c\x03len\x94\x93\x94\x8c\x08__name__\x94\x8c\x08__main__\x94u\x8c\x01f\x94Nt\x94}\x94}\x94\x8c\x08nopython\x94\x88s\x8c\x06direct\x94\x88]\x94t\x94R\x94.'
We see that repeatedly deserializing and then calling the function stays slow, at roughly 65 ms per call after the first:
In [5]: %time pickle.loads(b)(x)
CPU times: user 255 ms, sys: 23.8 ms, total: 279 ms
Wall time: 281 ms
Out[5]: 515.6796588752262
In [6]: %time pickle.loads(b)(x)
CPU times: user 63.4 ms, sys: 3.84 ms, total: 67.2 ms
Wall time: 66.2 ms
Out[6]: 515.6796588752262
In [7]: %time pickle.loads(b)(x)
CPU times: user 68.3 ms, sys: 0 ns, total: 68.3 ms
Wall time: 67.1 ms
Out[7]: 515.6796588752262
The cost of deserializing alone is relatively low:
In [8]: %time f = pickle.loads(b)
CPU times: user 619 µs, sys: 66 µs, total: 685 µs
Wall time: 696 µs
And if we call the same deserialized function many times, only the first call is slow:
In [9]: %time f(x)
CPU times: user 65.9 ms, sys: 3.42 ms, total: 69.3 ms
Wall time: 68.6 ms
Out[9]: 515.6796588752262
In [10]: %time f(x)
CPU times: user 25 µs, sys: 2 µs, total: 27 µs
Wall time: 34.6 µs
Out[10]: 515.6796588752262
In [11]: %time f(x)
CPU times: user 36 µs, sys: 3 µs, total: 39 µs
Wall time: 52.7 µs
Out[11]: 515.6796588752262
So my guess is that we should be deduplicating things in some way. Is this in scope for Numba to resolve or is this something that I should be handling on my end?
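For handling this on the caller's side, one possible approach is to memoize deserialization on the payload bytes, so identical payloads return the same function object and every call after the first reuses it. This is only a minimal sketch using the standard library; `loads_cached` is a hypothetical helper name, not part of Numba or cloudpickle, and the cache size is an arbitrary choice.

```python
import pickle
from functools import lru_cache


# Hypothetical helper: bytes objects are hashable, so the payload itself
# can serve as the cache key.
@lru_cache(maxsize=128)
def loads_cached(payload: bytes):
    """Deserialize `payload` once and reuse the result for identical bytes."""
    return pickle.loads(payload)
```

With this in place, `loads_cached(b)(x)` would replace `pickle.loads(b)(x)` in the session above. Note that every caller receives the same shared object, which is fine for functions but may not be for mutable data.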
About this issue
- State: closed
- Created 6 years ago
- Comments: 16 (6 by maintainers)
Commits related to this issue
- Keep a queue of references to last N deserialized functions. Fixes #3026 — committed to seibert/numba by seibert 6 years ago
- Merge pull request #3151 from seibert/cache_longer Keep a queue of references to last N deserialized functions. Fixes #3026 — committed to numba/numba by stuartarchibald 6 years ago
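The commit titles above describe keeping a queue of references to the last N deserialized functions. The general idea can be sketched as follows. This is an illustration of the technique only, not Numba's actual code: the class name `RecentRegistry`, its method, and the default size are all assumptions. A weak-value dictionary lets an object be found again by key while it is alive, and a bounded deque holds strong references to the most recent objects so they are not garbage collected between uses.

```python
import weakref
from collections import deque


class RecentRegistry:
    """Hypothetical registry: look objects up by key, keep the last N alive."""

    def __init__(self, maxlen=10):
        # Weak references: an entry disappears once nothing else holds the object.
        self._by_key = weakref.WeakValueDictionary()
        # Strong references to the most recently used objects; the oldest
        # reference is dropped automatically when the deque is full.
        self._recent = deque(maxlen=maxlen)

    def get_or_create(self, key, factory):
        obj = self._by_key.get(key)
        if obj is None:
            # Not alive anymore (or never seen): rebuild and register it.
            obj = factory()
            self._by_key[key] = obj
        self._recent.append(obj)
        return obj
```

In a deserialization path, `key` could be the unique identifier stored in the pickle and `factory` the code that rebuilds the function. The stored objects must support weak references, which ordinary class instances and functions do.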
A modest time delay would suit my needs well.
On Fri, Jun 8, 2018, 3:25 PM Siu Kwan Lam notifications@github.com wrote: