seaborn: relplot error when using str as hue

When I try to use a string or categorical column of a DataFrame as hue in relplot, I get the following error: AttributeError: 'str' object has no attribute 'view'

Minimal example:

import pandas as pd
import seaborn as sns

test_df = pd.DataFrame(
    [{"a": 0.0, "b": 1.0, "c": "1", "d": "1"},
     {"a": 0.0, "b": 1.0, "c": "2", "d": "2"}]
)
g = sns.relplot(x="a", y="b", col="c", hue="d", data=test_df)

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 16
  • Comments: 22 (6 by maintainers)


Most upvoted comments

This is my first time ever commenting on an open issue, so please be kind if I am wrong!

I was facing a similar issue until I found this post on Stack Overflow:

  • Problem: I have about 16 unique numeric string values as id (like '215', '112233', …).
  • The fix: make the palette size equal to the number of unique numeric string values.
  • Example: sns.lineplot(data=df, x='x', y='y', hue='id', palette=sns.color_palette("Set1", 16))
  • or: sns.scatterplot(data=df, x='x', y='y', hue='id', palette=sns.color_palette("Set1", 16))

I think the problem here is that seaborn is duck typing whether the hue variable is categorical or numeric based on whether it can be converted to float without erroring, but then using the original data for getting the colors. Passing numerics as strings is kind of a weird corner case, but it shouldn’t be impossible. There’s a similar issue in PairGrid (#1347) and so a good general solution is needed.
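A minimal pure-Python sketch of the mismatch described above, assuming a simplified model of the hue logic: the type check converts a copy of the values to float, but the color mapping then receives the original values. The helper names (infer_hue_type, numeric_hue_positions, map_hue) are hypothetical and are not seaborn's API; the sketch reproduces the same class of failure (a TypeError here), not seaborn's exact AttributeError.

```python
def infer_hue_type(values):
    """Hypothetical duck-typing check: numeric if every value converts to float."""
    try:
        [float(v) for v in values]
    except (TypeError, ValueError):
        return "categorical"
    return "numeric"


def numeric_hue_positions(values):
    """Hypothetical numeric mapping: scale values to [0, 1] for a colormap."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return [(v - lo) / span for v in values]


def map_hue(values, convert_first):
    kind = infer_hue_type(values)
    if kind == "categorical":
        return kind, None
    # The type check used float(v), but unless the values are converted
    # here as well, the mapping receives the original strings and fails.
    data = [float(v) for v in values] if convert_first else list(values)
    return kind, numeric_hue_positions(data)


print(map_hue(["1", "2"], convert_first=True))  # ('numeric', [0.0, 1.0])
```

In this toy model, map_hue(["1", "2"], convert_first=False) raises a TypeError because it subtracts one string from another; passing the converted copy on to the mapping avoids the failure.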

I think in the meantime, a workaround would be to provide explicit color values to the palette (either as a list or a dictionary), which should skip the numeric color-mapping code:

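import pandas as pd
import seaborn as sns

test_df = pd.DataFrame(
    [{"a": 0.0, "b": 1.0, "c": "1", "d": "1"},
     {"a": 0.0, "b": 1.0, "c": "2", "d": "2"}]
)
# An explicit list palette makes seaborn treat hue as categorical
g = sns.relplot(x="a", y="b", col="c", hue="d", palette=["r", "b"], data=test_df)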
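The dictionary form of this workaround can be built from the data instead of by hand. A minimal sketch, assuming you want one color per unique hue value in order of first appearance; palette_for is a hypothetical helper, not part of seaborn, and the color names are only examples. If there are more levels than base colors, the colors repeat.

```python
from itertools import cycle


def palette_for(values, base_colors):
    """Hypothetical helper: map each unique hue value to a color.

    Levels keep their order of first appearance; base_colors is cycled,
    so colors repeat if there are more levels than colors.
    """
    levels = list(dict.fromkeys(values))  # unique values, first-seen order
    colors = cycle(base_colors)
    return {level: next(colors) for level in levels}


hue_values = ["1", "2", "1"]  # e.g. the string column test_df["d"]
palette = palette_for(hue_values, ["r", "b"])
print(palette)  # {'1': 'r', '2': 'b'}
```

The result can then be passed as palette=palette to sns.relplot; list(palette.values()) gives the list form, whose length equals the number of unique hue values.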

Reproducible examples for scatterplot:

import pandas as pd
import seaborn as sns

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4], 'hue': [10, 20]})
df['hue'] = df['hue'].astype('category')
sns.scatterplot(x='x', y='y', hue='hue', data=df)

TypeError: data type not understood

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4], 'hue': [10, 20]})
df['hue'] = df['hue'].astype(str)
sns.scatterplot(x='x', y='y', hue='hue', data=df)

AttributeError: 'str' object has no attribute 'view'

Another inconsistent behavior that seems connected to this issue:

Case 1 (will generate a legend for [0, 1, 2])

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"Condition": np.random.choice([1, 2], 100),
                   "ScoreA": np.random.normal(0, 1, 100),
                   "ScoreB": np.random.normal(0, 1, 100)})
sns.relplot(x="ScoreA", y="ScoreB", hue="Condition", data=df)

Case 2 (will generate a legend for [0, 1, 2, 3])

df = pd.DataFrame({"Condition": np.random.choice([1, 2, 3], 100),
                  "ScoreA": np.random.normal(0, 1, 100),
                  "ScoreB": np.random.normal(0, 1, 100)})
sns.relplot(x="ScoreA", y="ScoreB", hue="Condition", data=df)

Case 3 (will generate a legend for [1, 2, 3, 4])

df = pd.DataFrame({"Condition": np.random.choice([1, 2, 3, 4], 100),
                  "ScoreA": np.random.normal(0, 1, 100),
                  "ScoreB": np.random.normal(0, 1, 100)})
sns.relplot(x="ScoreA", y="ScoreB", hue="Condition", data=df)

I just ran into the same problem. I had multiple sets of data identified by a category, in my case a date ('11_25_18'). Trying to use this as the hue parameter gave an error. When I renamed the category to 'exp11_25_18', the error went away. To me, this feels like a bug.
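One possible explanation, offered as an assumption about seaborn's internals at the time rather than a confirmed cause: if the hue check relied on Python-level float conversion, '11_25_18' would pass it, because since Python 3.6 float() accepts underscores between digits (PEP 515), the same grouping allowed in numeric literals. The value would then be treated as numeric, while 'exp11_25_18' would not. parses_as_float is a hypothetical helper used only to show the conversion.

```python
def parses_as_float(text):
    """Return True if Python's built-in float() accepts the string."""
    try:
        float(text)
    except ValueError:
        return False
    return True


print(float("11_25_18"))               # underscores are dropped: 112518.0
print(parses_as_float("11_25_18"))     # True
print(parses_as_float("exp11_25_18"))  # False
```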

@teckwanikaran thank you very much for sharing this! You saved my day!

I was facing this issue after I first tried to provide a numerical dataset as the hue input. However, this did not produce the expected result: it split the graph into three different categories, although the data contained 36 unique numerical values. Also, the three categories seemed completely “random” and did not represent the input data at all. What is the intended behaviour for a numerical input?

Leaving a note below an incorrect answer helps in such cases.

Anyway, you may not call it a bug, but I still think it’s better not to let numpy decide whether the user wants categoricals.

I proposed something similar for matplotlib, which of course also has to guess what the user wants, but solves this a bit differently.