bokeh: [BUG] vbar_stack with DataFrame as source ignores entire row if first data column is NaN
Software versions
- OS: Linux 5.4.6
- Browser: Chrome 78.0.3904.108 (Official Build) (64-bit)
- Python: 3.8.1
- JupyterLab: 1.2.4
- Pandas: 0.25.3
- Bokeh: 1.4.0
Issue
When plotting a vbar_stack from a pandas.DataFrame, the entire row of the data source is ignored if the first data column contains a NaN in that row.
If a standard Python dict is used as a data source, the output is plotted as expected.
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource

# Case 1: plain dict as the data source, with None marking missing values.
data = dict(index=[1, 2, 3, 4],
            a=[4, 5, None, 7],
            b=[9, None, 7, 6])
source = ColumnDataSource(data)

f = figure()
f.vbar_stack(x="index", stackers=["a", "b"], source=source,
             width=0.5, color=["red", "blue"])
show(f)

import pandas as pd

# Case 2: the same values in a DataFrame, where None becomes NaN.
source = ColumnDataSource(pd.DataFrame(index=[1, 2, 3, 4],
                                       data=[dict(a=4, b=9),
                                             dict(a=5, b=None),
                                             dict(a=None, b=7),
                                             dict(a=7, b=6),
                                             ]))

f = figure()
f.vbar_stack(x="index", stackers=["a", "b"], source=source,
             width=0.5, color=["red", "blue"])
show(f)
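
To see which rows are affected without rendering anything, the DataFrame from the second snippet can be inspected with pandas alone. This is only a sketch for narrowing down the report: the names `missing` and `rows_missing_first` are made up for illustration, and nothing here touches Bokeh itself.

```python
import pandas as pd

# Same rows as in the report, built the same way (a list of dicts).
df = pd.DataFrame(index=[1, 2, 3, 4],
                  data=[dict(a=4, b=9),
                        dict(a=5, b=None),
                        dict(a=None, b=7),
                        dict(a=7, b=6)])

# Boolean mask of missing cells, one row per bar position.
missing = df.isna()
print(missing)

# Index values where the first stacker ("a") is missing. According to
# the report, these are the bars that disappear entirely.
rows_missing_first = df.index[df["a"].isna()].tolist()
print(rows_missing_first)
```

Rows where only the second stacker is missing show up in `missing["b"]` but not in `rows_missing_first`, which matches the asymmetry described in the issue title.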
Expected behavior with dict as data source

Unexpected behavior with DataFrame as data source

Mitigation
I currently let Pandas fill the NaNs with zeros.
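
A minimal sketch of that workaround, assuming the fill is applied to the stacker columns before the frame is handed to ColumnDataSource (the variable names `stackers` and `filled` are illustrative, not taken from the report):

```python
import pandas as pd

# Same rows as in the report.
df = pd.DataFrame(index=[1, 2, 3, 4],
                  data=[dict(a=4, b=9),
                        dict(a=5, b=None),
                        dict(a=None, b=7),
                        dict(a=7, b=6)])

# Replace NaN with 0 in the stacker columns only, leaving the
# original frame untouched.
stackers = ["a", "b"]
filled = df.copy()
filled[stackers] = filled[stackers].fillna(0)

print(filled)
```

`filled` can then be passed to `ColumnDataSource` in place of the original frame. Note that a zero is not the same thing as a missing value: the stack is drawn as if that segment had height 0, which may or may not be what the data means.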
Thanks for Bokeh, guys!
About this issue
- State: open
- Created 4 years ago
- Comments: 18 (11 by maintainers)
@pohlt that case (“always in a data frame”) I think we can probably make consistent. Earlier I was really referring to the list vs data frame differences, which cannot always be bridged.
FWIW I think I’d start trying to “make things work” as in the vega case. Otherwise we are looking at: