seaborn: What to do with missing values in heatmaps

Running this snippet:

import seaborn as sns
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': [1, 1, 1],
                        'b': [2, np.nan, 2],
                        'c': [3, 3, np.nan]})

sns.heatmap(df)

We get something like this:

[image: missing-get-small]

Missing values are silently assigned the color of the minimum in the colorbar. I wonder whether this is a good default or whether missing values should be treated differently. I would treat them differently by default; the question is then how (perhaps a missing_color parameter, with a warning if that color also appears in the colormap/colorbar).
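To make the suggestion concrete, here is a rough sketch of what a missing_color option might look like if built on top of the existing API. The helper name heatmap_with_missing_color and its missing_color argument are hypothetical and not part of seaborn; the sketch assumes that masked cells are left undrawn, so the axes background color shows through them.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.colors import to_rgba


def heatmap_with_missing_color(df, missing_color="lightgray", cmap="viridis"):
    """Hypothetical helper (not a seaborn function): show NaN cells in missing_color."""
    # Warn if the chosen color is close to a color the colormap itself uses,
    # since a missing cell could then be mistaken for a real value.
    sampled = plt.get_cmap(cmap)(np.linspace(0, 1, 256))  # (256, 4) RGBA rows
    target = np.array(to_rgba(missing_color))
    if np.any(np.all(np.isclose(sampled, target, atol=0.02), axis=1)):
        warnings.warn(f"{missing_color!r} also appears in colormap {cmap!r}")

    fig, ax = plt.subplots()
    # Mask the missing cells, then color the background that shows through them.
    sns.heatmap(df, mask=df.isnull(), cmap=cmap, ax=ax)
    ax.set_facecolor(missing_color)
    return ax


df = pd.DataFrame(data={'a': [1, 1, 1],
                        'b': [2, np.nan, 2],
                        'c': [3, 3, np.nan]})
ax = heatmap_with_missing_color(df)
```

The tolerance of 0.02 for the "color is already in the colormap" check is an arbitrary choice for illustration.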

About this issue

  • State: closed
  • Created 10 years ago
  • Reactions: 9
  • Comments: 17 (10 by maintainers)

Most upvoted comments

For now, you can mask them manually:

import seaborn as sns
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': [1, 1, 1],
                        'b': [2, np.nan, 2],
                        'c': [3, 3, np.nan]})

mask = df.isnull()
sns.heatmap(df, mask=mask)

[image] (edit: added image)

Is this what you suggest should happen by default?

There’s a masking argument that works pretty well, but it could be improved in at least two respects. First, it could be useful to add an option that makes masked-out cells the same color as the image background (this would be especially pleasing for heatmaps of upper-triangular or lower-triangular matrices).
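For the triangular case mentioned above, a minimal sketch of building such a mask might look like the following. The random data and the variable names (data, corr) are made up for illustration; only the mask argument comes from the thread.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data, used only to get a symmetric matrix to plot.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
corr = data.corr()

# True marks cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones(corr.shape, dtype=bool))

with sns.axes_style("white"):
    fig, ax = plt.subplots()
    sns.heatmap(corr, mask=mask, cmap="viridis", square=True, ax=ax)
```

Only the lower triangle is drawn; swapping np.triu for np.tril would hide the lower triangle instead.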

@jm-contreras I think this already happens? Perhaps you mean something different:

with sns.axes_style("white"):
    sns.heatmap(flights_missing, cmap=cmap, mask=flights_missing.isnull())
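The snippet above relies on flights_missing and cmap, neither of which is defined in the thread. A self-contained version, with small made-up stand-ins for both, could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-ins: the values and the choice of colormap are assumptions.
flights_missing = pd.DataFrame(
    [[112.0, np.nan, 145.0],
     [118.0, 126.0, np.nan],
     [np.nan, 141.0, 171.0]],
    index=["Jan", "Feb", "Mar"],
    columns=[1949, 1950, 1951],
)
cmap = plt.get_cmap("viridis")

with sns.axes_style("white"):
    fig, ax = plt.subplots()  # created inside the context so the style applies
    sns.heatmap(flights_missing, cmap=cmap,
                mask=flights_missing.isnull(), ax=ax)
```

With the "white" style the axes background is white, so the masked cells blend into it.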


I agree that masking should happen by default; in fact, I don’t really see how the current behaviour would ever be useful as it either doesn’t affect things or produces misleading visualisations.