napari: Large multiscale zarrs slow on pan/zoom when clipped

🐛 Bug

Hello, I’m very excited to incorporate napari into our histology workflow, but we are encountering an issue that degrades the user experience. Passing in a list of dask arrays built from an image pyramid makes pan and zoom very slow (several seconds per frame), and it gets worse with larger images. If I pass only the base level of the pyramid, interactions are silky smooth.

To Reproduce

Steps to reproduce the behavior:

  1. Create a moderately sized zarr pyramid:
import numpy
import skimage.data as data
import zarr
from skimage.transform import pyramid_gaussian


def create_zarr(path: str, image: numpy.ndarray, chunk_size: int = 512) -> None:
    # 5-level Gaussian pyramid (base plus 4 levels), downscaled 4x per level.
    pyramid = pyramid_gaussian(image, downscale=4, max_layer=4, multichannel=True)

    store = zarr.DirectoryStore(path)
    with zarr.group(store, overwrite=True) as group:
        series = []
        for i, layer in enumerate(pyramid):
            # Per-level dataset name; kept separate from the `path` argument.
            level_path = "base" if i == 0 else f"L{i}"
            group.create_dataset(level_path, data=layer, chunks=(chunk_size, chunk_size, 3))
            series.append({"path": level_path})

        # Record the multiscale layout as group metadata.
        multiscales = [{"name": "pyramid", "datasets": series, "type": "pyramid"}]
        group.attrs["multiscales"] = multiscales


create_zarr("./image.zarr", numpy.tile(data.astronaut(), reps=(10, 10, 1)))
  2. Load the zarr into napari
import os
from typing import List

import dask.array as da
import napari


def get_pyramid(path: str) -> List[da.Array]:
    # The base level comes first, followed by the downsampled levels L1..Ln.
    levels = ["base"] + sorted(
        s for s in os.listdir(path) if not s.startswith(".") and s != "base"
    )
    return [da.from_zarr(f"{path}/{level}").rechunk() for level in levels]


pyramid = get_pyramid("image.zarr")

with napari.gui_qt():
    viewer = napari.Viewer()
    viewer.add_image(pyramid)  # pass pyramid[0] instead to load only the base level

Expected behavior

Buttery smooth pan and zoom when a pyramid is passed in, not just when a single image is.

Environment

napari: 0.4.0
Platform: Windows-10-10.0.19041-SP0
Python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:22:52) [MSC v.1916 64 bit (AMD64)]
Qt: 5.12.9
PyQt5: 5.12.3
NumPy: 1.18.5
SciPy: 1.5.3
Dask: 2.30.0
VisPy: 0.6.5

GL version: 4.6.0 NVIDIA 452.57
MAX_TEXTURE_SIZE: 32768

Plugins:
- napari_plugin_engine: 0.1.8
- svg: 0.1.4

Performance traces

See multiscale.txt for the trace when panning with a pyramid. See single.txt for the trace when panning with a single image. (Renamed from json to txt for upload purposes)


About this issue

  • State: open
  • Created 4 years ago
  • Comments: 27 (18 by maintainers)

Most upvoted comments

Yeah, panning is similar. You can help panning a lot just by loading more same-level tiles, like expanding the intersected frustum a bit on all sides, so no multi-level loading is required. But then loading coarser levels should be in the mix somewhere, since that gets you more coverage faster.

In general, we need two systems that don’t exist yet. One is a priority queue of things to load, from various locations and various levels: on-screen, nearly on-screen, next/prev slices, one level up, two levels up, etc., with some “policy” that we can tweak and tune, plus some config options, mostly for how aggressively to pre-cache.
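Purely as illustration, here is a minimal sketch of such a load queue; the TileRequest fields, the (level, row, col) addressing, and the policy weights are all hypothetical, not napari’s actual design:

import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class TileRequest:
    priority: float                    # lower value = load sooner
    level: int = field(compare=False)  # pyramid level (0 = base)
    row: int = field(compare=False)
    col: int = field(compare=False)


class TileLoadQueue:
    """Hypothetical load queue: on-screen tiles first, then nearly on-screen
    ones, then coarser levels, with the weights acting as the tunable policy."""

    def __init__(self, near_penalty: float = 1.0, level_penalty: float = 2.0):
        self._heap: list = []
        self.near_penalty = near_penalty    # cost for off-screen-but-nearby tiles
        self.level_penalty = level_penalty  # cost per level above the current one

    def push(self, level: int, row: int, col: int, on_screen: bool, levels_up: int = 0) -> None:
        priority = (0.0 if on_screen else self.near_penalty) + levels_up * self.level_penalty
        heapq.heappush(self._heap, TileRequest(priority, level, row, col))

    def pop_next(self):
        # A loader worker would call this to pick the next tile to fetch.
        return heapq.heappop(self._heap) if self._heap else None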

The second system is a way, during rendering, to draw the “best available” data, wherever it is. This is just a very fast search process. Mostly it’s drawing the current level, but in some cases it might have to hunt around. Maybe we load the root tile (one 256x256 texture) up front as the worst-case fallback; we might be inside a single pixel of it, but at least it’s something.
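And a minimal sketch of that “best available” search, assuming a hypothetical cache keyed by (level, row, col) where moving one level up halves the tile indices; again, not napari’s actual data structures:

def best_available(cache, level, row, col, max_level):
    """Return the finest already-loaded tile covering (level, row, col).

    Walks up the pyramid one level at a time; loading the root tile up front
    (the worst-case fallback) guarantees the search finds something."""
    while level <= max_level:
        tile = cache.get((level, row, col))
        if tile is not None:
            return tile
        # The parent tile at the next coarser level covers this tile too.
        level, row, col = level + 1, row // 2, col // 2
    return None  # only possible if even the root tile is missing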

Really it should be easy to make it look good once we have the machinery/vocabulary to describe what we want to happen. All of this was planned since day one. We are just getting there now.

[Screenshot: Screen Shot 2020-12-02 at 2 35 54 PM]

Yeah, when it pops to this level it needs to load 36 256x256 tiles from RAM to the GPU. Each takes something like 50 ms, which is way too long. But I think the first step is to amortize it down to one load per frame. That would take 36 frames; at 20 Hz that’s almost 2 seconds. For now, it will go black and then you see the tiles popping in.
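A sketch of what amortizing to one load per frame could look like; TextureUploader and upload_to_gpu are made-up names standing in for whatever actually moves a tile from RAM to VRAM:

from collections import deque


class TextureUploader:
    """Hypothetical per-frame upload budget: instead of pushing all 36 tiles
    to the GPU in one frame (36 x ~50 ms), drain at most `per_frame` uploads
    on each draw so the UI keeps refreshing while the rest trickle in."""

    def __init__(self, per_frame: int = 1):
        self.per_frame = per_frame
        self.pending = deque()  # tiles already in RAM, waiting for VRAM

    def schedule(self, tile) -> None:
        self.pending.append(tile)

    def on_draw(self, upload_to_gpu) -> None:
        # Called once per frame; upload_to_gpu stands in for the real
        # RAM-to-VRAM texture transfer.
        for _ in range(min(self.per_frame, len(self.pending))):
            upload_to_gpu(self.pending.popleft())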

The next step is to keep rendering the old tiles even though they are “too big”. Basically, you draw each parent tile until its 4 children are ready, then you swap. That gives you the standard tiled-rendering behavior where things get a bit blurry and then, tile by tile, get clearer, all within those 2 seconds.

So those are two separate steps; neither is totally simple, but neither requires invention, it’s just wiring things up. I can hopefully look at them this week.

Then I would circle back to why it takes 50 ms to load one tile… In a game you might load hundreds of textures in 5 ms; going from RAM to VRAM is fast. So 50 ms for one 256x256 texture is slower than we’d like.

[Screenshot: Screen Shot 2020-12-02 at 2 40 59 PM]

Is the PR description verbatim what you are using to test, @wooahn?

I can test with NAPARI_OCTREE=1 and see how it compares. It’s less complete than NAPARI_ASYNC, but if this is a good test case, I can try to make it work.

Ultimately octree should result in the best performance. But it might not be there today.

Also, if I pass in the full pyramid and the napari window itself is small, I can interact with the image just fine, but as soon as I maximize the window to my screen (2560x1440) and the image starts clipping that’s when it gets slow (a few frames per second).

This is expected. Currently with pyramids, with every small pan we create a new “view” of the underlying array (potentially requiring IO if there is no caching implemented), and we send that view to the GPU. The size of said view is controlled by the canvas size, so for a big display, you are sending a huge array over to the GPU every single time you reveal a new pixel, which, we know, is ridiculous. 😂 For a small window size, the array that you are sending is much smaller, so performance is acceptable.

We are working on this! Thank you for your patience! 😬
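To get a feel for this outside napari, you can time how long it takes just to materialize a canvas-sized view of the base level from the reproducer above. This is only a rough stand-in for what the viewer does on a pan, and the slice bounds here are arbitrary:

import time

import dask.array as da

level = da.from_zarr("image.zarr/base")   # base level written by create_zarr() above

for side in (512, 1440, 2560):            # roughly: small window vs. a 2560x1440 screen
    start = time.perf_counter()
    view = level[:side, :side].compute()  # materialize the canvas-sized view
    elapsed = time.perf_counter() - start
    print(f"{side}x{side} view: {view.nbytes / 1e6:.0f} MB in {elapsed * 1000:.0f} ms")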

Oh interesting. Yeah those adjustments definitely help, but even though our images are already in uint8, they’re quite large so they get pretty sluggish anyway. Also, if I pass in the full pyramid and the napari window itself is small, I can interact with the image just fine, but as soon as I maximize the window to my screen (2560x1440) and the image starts clipping that’s when it gets slow (a few frames per second).

Thanks again for looking into this!

Hi @sofroniewn, thanks for responding so quickly!

I’ve just tried creating the zarr with a chunk size of 128 and removing the call to rechunk, but the image is still very slow to pan and zoom. I only had the rechunk in there because I was working with some large SVS and SCN files and noticed that rechunking improved performance for those kinds of files.

I have also tried to use the following async.json file using NAPARI_ASYNC, but the performance improves only slightly, at least from what I can tell using the performance widget (NAPARI_PERFMON).

{
  "synchronous": false,
  "num_workers": 8,
  "use_processes": false,
  "force_delay_seconds": 0
}
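For reference, a sketch of how this could be wired up, assuming NAPARI_ASYNC accepts the path to that JSON file (as described above) and that NAPARI_PERFMON=1 is enough to turn on the performance widget; both variables have to be set before napari is imported:

import os

# Assumption: NAPARI_ASYNC takes the path of the JSON config above, and
# NAPARI_PERFMON=1 turns on the performance widget. Both must be set before
# napari is imported so the experimental machinery picks them up.
os.environ["NAPARI_ASYNC"] = "./async.json"
os.environ["NAPARI_PERFMON"] = "1"

import dask.array as da
import napari

with napari.gui_qt():
    viewer = napari.Viewer()
    # Levels written by create_zarr() above: "base" plus L1..L4.
    viewer.add_image([da.from_zarr(f"image.zarr/{p}") for p in ("base", "L1", "L2", "L3", "L4")])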

When I have some spare time I’d be happy to provide call graphs from pstats, like in #1885.