numba: Structured arrays 10 times slower than two dimensional arrays in nopython mode
Hello,
Using structured arrays instead of class properties, as suggested by Graham Markall ( https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/Mqi9SXzqdTg ), is about 10 times slower than using two-dimensional arrays.
Code example implemented with two-dimensional arrays: https://gist.github.com/ufechner7/d4fd1b75af0f1e5cd48c
Benchmark code: https://gist.github.com/ufechner7/95db14f734edd51dcd9b
Benchmark results:
```
time for numba sub_plain     [µs]: 0.33
time for numba sub_rec_array [µs]: 1.97
time for numba sub_array     [µs]: 0.19
```
This prevents me from using record arrays, even though the code would be shorter and cleaner if they could be used.
Code using record arrays:
```python
x_dt = np.dtype([('v_wind',     np.float64),
                 ('v_wind_gnd', np.float64),
                 ('result',     np.float64)])
buf = np.zeros((3, 3))  # initialize the record array with zeros
vec3 = np.recarray(3, dtype=x_dt, buf=buf)
sub(vec3.v_wind, vec3.v_wind_gnd, vec3.result)
```
Code using two dimensional arrays:
```python
V_wind = 0  # (westwind, downwind direction to the east)
V_wind_gnd = 1
Result = 2
vec3 = np.zeros((3, 3))
sub(vec3[V_wind], vec3[V_wind_gnd], vec3[Result])
```
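For reference, here is a self-contained sketch of both variants that runs without Numba (the `@jit` decoration from the gists is omitted, and `sub` is assumed to compute `result = v_wind - v_wind_gnd`, which matches the field names but is a guess at the gist's actual body):

```python
import numpy as np

x_dt = np.dtype([('v_wind',     np.float64),
                 ('v_wind_gnd', np.float64),
                 ('result',     np.float64)])

def sub(v_wind, v_wind_gnd, result):
    # assumed body: element-wise subtraction written into `result`
    np.subtract(v_wind, v_wind_gnd, out=result)

# Record-array variant
buf = np.zeros((3, 3))                      # buffer backing the record array
rec = np.recarray(3, dtype=x_dt, buf=buf)
rec.v_wind[:] = [8.0, 9.0, 10.0]
rec.v_wind_gnd[:] = [1.0, 2.0, 3.0]
sub(rec.v_wind, rec.v_wind_gnd, rec.result)

# Two-dimensional-array variant
V_wind, V_wind_gnd, Result = 0, 1, 2
vec = np.zeros((3, 3))
vec[V_wind] = [8.0, 9.0, 10.0]
vec[V_wind_gnd] = [1.0, 2.0, 3.0]
sub(vec[V_wind], vec[V_wind_gnd], vec[Result])

print(rec.result, vec[Result])  # both hold the element-wise difference
```

Both variants store the same nine floats; only the way the three rows are addressed (named fields vs. integer row indices) differs.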
It would be nice if this could be fixed.
About this issue
- Original URL
- State: closed
- Created 9 years ago
- Comments: 21 (11 by maintainers)
It’s been suggested to me that there is a lack of documentation on what Numba actually does when it invokes a jitted function, so I’ll explain a bit more here about what Numba is doing, in order to clarify things:
When you call a function that has been decorated with the `jit` decorator, Numba has to figure out the types of all the arguments, in order either to compile a specialised version of the jitted function for those argument types (every time a new set of argument types is used, Numba must compile a new specialisation) or to retrieve a cached compiled version for those specific argument types.

For scalar types and arrays of scalar types, this is fairly simple: each dtype has a unique integer that identifies it, so comparing argument types is just a case of comparing integers, which is very fast. For structured types, it is more complicated, since every field needs to be compared. In order to do this, we have to call a function which computes the “data type descriptor” of the structured type (i.e. http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#c.PyArray_Descr). This takes much longer than just looking up the integer corresponding to a type, and it also takes longer to compare against the data type descriptors of the already-compiled versions of the function.
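The difference between the two kinds of type identification can be illustrated with plain NumPy (this is an analogy, not Numba's actual dispatcher code): a scalar dtype carries a unique type number, while a structured dtype is only fully described by its field layout.

```python
import numpy as np

scalar_dt = np.dtype(np.float64)
struct_dt = np.dtype([('v_wind',     np.float64),
                      ('v_wind_gnd', np.float64),
                      ('result',     np.float64)])

# A scalar dtype is identified by a unique small integer (NPY_DOUBLE == 12),
# so comparing two scalar dtypes is just an integer comparison.
print(scalar_dt.num)

# Every structured dtype shares the generic void type number (NPY_VOID == 20),
# so the number alone cannot distinguish layouts -- the fields themselves
# (names, types, offsets) have to be inspected to decide whether two match.
print(struct_dt.num)
print(struct_dt.names)
```

This field-by-field inspection is the per-call cost the dispatcher pays when a structured type appears in the argument list.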
This lookup of argument types and comparisons happens within the Numba dispatcher, which is also responsible for marshalling any arguments into their native types before passing them to the compiled code.
What you are seeing in the benchmark above is that the time taken to call the function increases by about 2 µs when a structured type is involved, because of all this extra work going on in the dispatcher. The generated code itself has not slowed down, but its runtime is a very small part of the total runtime, because it is much shorter than the time spent in the dispatcher.
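The general effect is easy to reproduce even without Numba: when a function body is tiny, per-call overhead dominates the total runtime. A rough pure-Python sketch (the wrapper call here stands in for the dispatcher, and the numbers will vary by machine):

```python
import timeit
import numpy as np

a = np.zeros(3)
b = np.zeros(3)
out = np.zeros(3)

def sub(x, y, result):
    # tiny body: one element-wise subtraction on 3 elements
    np.subtract(x, y, out=result)

# Time the bare operation vs. the same operation behind a function call.
body = timeit.timeit(lambda: np.subtract(a, b, out=out), number=100_000)
call = timeit.timeit(lambda: sub(a, b, out), number=100_000)
print(f"direct ufunc: {body:.4f}s   wrapped call: {call:.4f}s")
```

The extra time in the wrapped version is pure call overhead; the work done on the three elements is identical in both cases.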
In summary:
- The `sub` function’s body has a very short execution time.