distributed: Bokeh dashboard failures

I think I’ve seen other reports of this recently, but here’s an example of the bokeh dashboard being sad:

https://gist.github.com/mrocklin/417439bf259b8974b502e67d2ac98f22

import dask
import dask.array as da
import zarr
from dask.distributed import Client

client = Client()

shape = (100000, 100000)
chunksize = (5000, 5000)
data = zarr.ones(shape, chunks=chunksize)

def f1d(chunk):
    # Reduce the whole chunk to a single element, keeping it 2-D
    return chunk.sum(keepdims=True)

def f2d(chunk, axis=None):
    # Reduce along one axis, keeping it 2-D
    return chunk.sum(axis=axis, keepdims=True)

x = da.from_array(data, chunks=chunksize)
a = x.map_blocks(f1d)
b = x.map_blocks(f2d, axis=0)
c = x.map_blocks(f2d, axis=1)

aa, bb, cc = dask.persist(a, b, c)

Running this produces a stream of warnings:
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('alpha', 519), ('color', 481), ('duration', 519), ('duration_text', 481), ('key', 481), ('name', 481), ('start', 519), ('worker', 481), ('worker_thread', 481), ('y', 519)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('alpha', 555), ('color', 483), ('duration', 555), ('duration_text', 483), ('key', 483), ('name', 483), ('start', 555), ('worker', 483), ('worker_thread', 483), ('y', 555)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('alpha', 582), ('color', 490), ('duration', 582), ('duration_text', 490), ('key', 490), ('name', 490), ('start', 582), ('worker', 490), ('worker_thread', 490), ('y', 582)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('alpha', 614), ('color', 478), ('duration', 614), ('duration_text', 478), ('key', 478), ('name', 478), ('start', 614), ('worker', 478), ('worker_thread', 478), ('y', 614)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('alpha', 644), ('color', 492), ('duration', 644), ('duration_text', 492), ('key', 492), ('name', 492), ('start', 644), ('worker', 492), ('worker_thread', 492), ('y', 644)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('alpha', 672), ('color', 480), ('duration', 672), ('duration_text', 480), ('key', 480), ('name', 480), ('start', 672), ('worker', 480), ('worker_thread', 480), ('y', 672)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('alpha', 700), ('color', 492), ('duration', 700), ('duration_text', 492), ('key', 492), ('name', 492), ('start', 700), ('worker', 492), ('worker_thread', 492), ('y', 700)
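For reference on what each block computes: `keepdims=True` preserves dimensionality, so every chunk maps to a well-shaped 2-D result. A minimal NumPy sketch, using a tiny 4×4 chunk of ones in place of the 5000×5000 chunks above:

```python
import numpy as np

# Tiny stand-in for one (5000, 5000) chunk of ones
chunk = np.ones((4, 4))

whole = chunk.sum(keepdims=True)         # full reduction -> shape (1, 1)
cols = chunk.sum(axis=0, keepdims=True)  # column sums    -> shape (1, 4)
rows = chunk.sum(axis=1, keepdims=True)  # row sums       -> shape (4, 1)
```

This is why `map_blocks` needs no `chunks=` hint here: the per-block output stays 2-D in every case.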

Sorry for the not entirely minimal example. Hopefully it’s easy to reproduce though.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 4
  • Comments: 16 (14 by maintainers)

Most upvoted comments

3.2.2 is available on PyPI and the Bokeh anaconda.org channel. I am not plugged into the process for availability on conda-forge and defaults these days.

Fix PR is verified and merged. I will try to backport and cut a 3.2.2 release in the next few days

Thanks @bryevdv for all the help! Confirmed 3.2.2 works. Since this is just a warning, I don’t think it’s worth the trouble of excluding bokeh 3.2.1 from installing with Distributed, though if others feel differently please say so. Closing.

👍 Should be Monday

That message indicates a usage error. Creating a CDS with different column lengths has never been supported, but I think a hard error was only added recently (previously you’d just get silent failures / undefined behavior). I don’t think that check went in with 3.2.1 though (very little went into 3.2.1 – https://github.com/bokeh/bokeh/milestone/82?closed=1).

If you need to update a CDS with new data that has a different length, then you need to update the entire .data at once:

new_data_dict = ... # new dict with all new columns the same length

source.data = new_data_dict # "atomic" update
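
The invariant here is that every column in a `ColumnDataSource` must have the same length at all times; assigning a complete new dict to `.data` avoids the transient mismatch that per-column updates create. A plain-Python sketch of that length check (a hypothetical `validate_cds_data` helper for illustration, not Bokeh’s actual implementation):

```python
def validate_cds_data(data):
    """Raise if the columns of a CDS-style dict differ in length."""
    lengths = {name: len(col) for name, col in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns must be of the same length, got {lengths}")
    return data

# Updating columns one at a time can leave the dict inconsistent mid-update,
# which is the shape of the warning in this issue:
bad = {"start": [0.1, 0.2, 0.3], "color": ["red", "blue"]}

# Building the full dict first, then swapping it in atomically, keeps
# all columns aligned:
good = {"start": [0.1, 0.2, 0.3], "color": ["red", "blue", "green"]}
validate_cds_data(good)
```
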