dask: Questions about map_blocks and _meta
This is a general question but is more focused on lower-level dask development so I figured I’d ask it here. This has to do with the recent _meta changes from #2977, #4543, and others from @mrocklin @shoyer and @pentschev.
Starting with dask 2.0, some of the tests in my Satpy library are failing because they check how often my map_blocks’d function is called. From what I can tell, the function is now called once to determine the meta information and again for actual usage.
So my main question is: is it possible to pass a keyword argument to map_blocks that says “this is the array type that will be returned by this function”, or could the default be changed to the type of the input arrays? Is there some way to skip this meta generation step? Would this interfere with the sparse/masked array support?
Relevant source code: https://github.com/dask/dask/blob/master/dask/array/utils.py#L118-L133
About this issue
- State: open
- Created 5 years ago
- Comments: 16 (16 by maintainers)
Yeah I think that is the origin of .blocks. There are various use cases that benefit from grabbing individual chunks and working on them. That said, with the addition of HLG, there is some benefit to rewriting things in terms of map_blocks instead.
We should definitely have function arguments to allow explicitly providing meta in any location where it would be inferred from a user-defined function. This is similar to the existing situation with dtype. Running user provided functions on bogus inputs to determine metadata should really be a last resort.
@pentschev could you kindly remind why it’s important to keep track of _meta at all on dask arrays? Which dask operations need to know the types of contained arrays?
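For anyone wanting to reproduce the extra call described in the original question, here is a minimal sketch. It assumes a dask release whose map_blocks accepts a meta keyword (as far as I know this was added after this issue was opened, so it may not exist in 2.0 itself). The add_one function and the calls counter are hypothetical names used only for illustration, and the exact call counts may differ between dask versions, so none are quoted here.

```python
import numpy as np
import dask.array as da

# Hypothetical call counter, used only to observe how often the function runs.
calls = {"n": 0}

def add_one(block):
    calls["n"] += 1
    return block + 1

x = da.ones((4, 4), chunks=(2, 2))  # 2 x 2 = 4 blocks

# Without meta: dask may also run add_one on an empty array to infer the output type.
calls["n"] = 0
y = da.map_blocks(add_one, x, dtype=x.dtype)
y.compute(scheduler="synchronous")
calls_without_meta = calls["n"]

# With meta: an empty array of the expected type and dtype is passed explicitly,
# so the inference step has nothing left to work out.
calls["n"] = 0
z = da.map_blocks(add_one, x, dtype=x.dtype, meta=np.array((), dtype=x.dtype))
result = z.compute(scheduler="synchronous")
calls_with_meta = calls["n"]

print("calls without meta:", calls_without_meta)
print("calls with meta:   ", calls_with_meta)
```

The synchronous scheduler is used only so the counter is updated from a single thread. A test that asserts on call counts is probably more robust if it passes meta explicitly rather than depending on how often dask chooses to probe the function.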