seaborn: relplot error when using str as hue

When I try to use a string or categorical column of a DataFrame as hue in relplot, I get the following error: AttributeError: 'str' object has no attribute 'view'

Minimal example:

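import pandas as pd
import seaborn as sns

test_df = pd.DataFrame(
    [{"a": 0.0, "b": 1.0, "c": "1", "d": "1"},
     {"a": 0.0, "b": 1.0, "c": "2", "d": "2"}]
)
# Raises AttributeError: 'str' object has no attribute 'view' (as reported)
g = sns.relplot(x="a", y="b", col="c", hue="d", data=test_df)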

About this issue

State: closed
Created 6 years ago
Reactions: 16
Comments: 22 (6 by maintainers)

Commits related to this issue

Merge branch 'issue-1515' of https://github.com/benlindsay/seaborn into issue_omnibus Closes #1515 — committed to mwaskom/seaborn by mwaskom 5 years ago

Most upvoted comments

This is my first time ever commenting on an open issue, so please be kind if I am wrong!

I was facing similar issues until I came across this post on Stack Overflow.

Problem: I have about 16 unique numeric string values as IDs (e.g. ‘215’, ‘112233’, …).
The fix: make the palette size equal to the number of unique numeric string values.
Example:
sns.lineplot(data=df, x='x', y='y', hue='id', palette=sns.color_palette("Set1", 16))
or
sns.scatterplot(data=df, x='x', y='y', hue='id', palette=sns.color_palette("Set1", 16))


teckwanikaran on Apr 11, 2019

I think the problem here is that seaborn is duck typing whether the hue variable is categorical or numeric based on whether it can be converted to float without erroring, but then using the original data for getting the colors. Passing numerics as strings is kind of a weird corner case, but it shouldn’t be impossible. There’s a similar issue in PairGrid (#1347) and so a good general solution is needed.
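To make the duck-typing idea concrete, here is a hypothetical helper (`guess_is_numeric` is not a seaborn function) that guesses "numeric" whenever the values convert to float. It only illustrates the mismatch described above: numeric strings pass the float check, yet the stored data are still strings, which is what the colour-mapping code would then receive. Comparing with pandas' dtype-based `is_numeric_dtype` shows a guess that looks at the dtype instead of the values.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def guess_is_numeric(values):
    """Hypothetical duck-typing check: 'numeric' if every value converts to float."""
    try:
        np.asarray(values, dtype=float)
    except (TypeError, ValueError):
        return False
    return True


# Numbers stored as strings (explicit object dtype)
hue = pd.Series(["1", "2"], dtype=object)

print(guess_is_numeric(hue))      # True: "1" and "2" convert to float
print(is_numeric_dtype(hue))      # False: the dtype is object, not numeric
print(type(hue.iloc[0]).__name__) # str: the original values are still strings
```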

I think in the meantime, a workaround would be to provide explicit color values to the palette (either as a list or a dictionary), which should skip the numeric color-mapping code:

import pandas as pd
import seaborn as sns

test_df = pd.DataFrame(
    [{"a": 0.0, "b": 1.0, "c": "1", "d": "1"},
     {"a": 0.0, "b": 1.0, "c": "2", "d": "2"}]
)
# Explicit colors, one per hue level
g = sns.relplot(x="a", y="b", col="c", hue="d", palette=["r", "b"], data=test_df)
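The dictionary form of the same workaround could look like the sketch below. It keys each colour by the exact string value found in column "d", so there is no ambiguity about which level gets which colour. Building the dict from `sns.color_palette("Set1", ...)` is just one choice for illustration; any matplotlib colour specs would do.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd
import seaborn as sns

test_df = pd.DataFrame(
    [{"a": 0.0, "b": 1.0, "c": "1", "d": "1"},
     {"a": 0.0, "b": 1.0, "c": "2", "d": "2"}]
)

# One explicit colour per hue level, keyed by the (string) values in column "d"
levels = sorted(test_df["d"].unique())
palette = dict(zip(levels, sns.color_palette("Set1", len(levels))))

g = sns.relplot(x="a", y="b", col="c", hue="d", palette=palette, data=test_df)
```

A dict is a bit more robust than a list here, since the mapping does not depend on the order in which the hue levels happen to be sorted.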


mwaskom on Jul 27, 2018

Reproducible examples for scatterplot:

import pandas as pd
import seaborn as sns
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4], 'hue': [10, 20]})
df['hue'] = df['hue'].astype('category')
sns.scatterplot(x='x', y='y', hue='hue', data=df)

TypeError: data type not understood

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4], 'hue': [10, 20]})
df['hue'] = df['hue'].astype(str)
sns.scatterplot(x='x', y='y', hue='hue', data=df)

AttributeError: 'str' object has no attribute 'view'

gokceneraslan on Apr 2, 2019

Another inconsistent behavior that seems connected to this issue:

Case 1 (will generate a legend for [0, 1, 2])

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"Condition": np.random.choice([1, 2], 100),
                   "ScoreA": np.random.normal(0, 1, 100),
                   "ScoreB": np.random.normal(0, 1, 100)})
sns.relplot(x="ScoreA", y="ScoreB", hue="Condition", data=df)

Case 2 (will generate a legend for [0, 1, 2, 3])

df = pd.DataFrame({"Condition": np.random.choice([1, 2, 3], 100),
                  "ScoreA": np.random.normal(0, 1, 100),
                  "ScoreB": np.random.normal(0, 1, 100)})
sns.relplot(x="ScoreA", y="ScoreB", hue="Condition", data=df)

Case 3 (will generate a legend for [1, 2, 3, 4])

df = pd.DataFrame({"Condition": np.random.choice([1, 2, 3, 4], 100),
                  "ScoreA": np.random.normal(0, 1, 100),
                  "ScoreB": np.random.normal(0, 1, 100)})
sns.relplot(x="ScoreA", y="ScoreB", hue="Condition", data=df)
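My understanding (not confirmed in this thread) is that for a numeric hue the default "brief" legend shows evenly spaced sample values rather than the observed levels, which would explain legends like [0, 1, 2, 3]. Two ways to get one entry per condition are sketched below, assuming a seaborn release that includes the fix for this issue: `legend="full"` keeps the hue numeric but lists every unique value, while converting the column to strings declares it categorical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
df = pd.DataFrame({"Condition": rng.choice([1, 2, 3], 100),
                   "ScoreA": rng.normal(0, 1, 100),
                   "ScoreB": rng.normal(0, 1, 100)})

# Option 1: keep Condition numeric, but request one legend entry per unique value
g_full = sns.relplot(x="ScoreA", y="ScoreB", hue="Condition", data=df,
                     legend="full")

# Option 2: state the intent explicitly by turning Condition into string labels
df["ConditionLabel"] = df["Condition"].astype(str)
g_cat = sns.relplot(x="ScoreA", y="ScoreB", hue="ConditionLabel", data=df)
```

Option 2 keeps the colours categorical (distinct hues), whereas option 1 still maps the values onto a continuous colormap.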

QuentinAndre on Nov 27, 2018

@teckwanikaran it works!

bhattaraiprabhat on Apr 25, 2019

I just ran into the same problem. I had multiple sets of data identified by a category, in my case a date: ‘11_25_18’. Trying to use this as the ‘hue’ parameter gave an error. When I renamed the category to ‘exp11_25_18’, the error went away. To me, this feels like a bug.
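A possible explanation, assuming the numeric-or-categorical guess relies on float conversion as described in an earlier comment: since Python 3.6 (PEP 515), `float()` accepts underscores between digits, so ‘11_25_18’ parses as the number 112518.0 while ‘exp11_25_18’ does not. The helper below is a hypothetical illustration, not seaborn code.

```python
def converts_to_float(value):
    """Hypothetical check: True if Python's float() accepts the string."""
    try:
        float(value)
    except ValueError:
        return False
    return True


for label in ["11_25_18", "exp11_25_18"]:
    print(label, converts_to_float(label))
# 11_25_18 True
# exp11_25_18 False

print(float("11_25_18"))  # 112518.0: underscores are treated as digit separators
```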

danolson1 on Mar 8, 2019

@teckwanikaran thank you very much for sharing this! You saved my day!

I was facing this issue after first trying to provide a numerical dataset as hue input. However, this did not actually produce the expected result. It split the graph into three different categories, although the data contained 36 unique numerical values. Also, the three categories were completely “random”: they did not represent the input data at all. What is the intended behaviour for a numerical input?

VincentSch4rf on Dec 2, 2019

Leaving a note below an incorrect answer helps in such cases.

Anyway, you may not call it a bug, but I still think it’s better not to let numpy decide whether the user wants categoricals or not.

I did propose something similar to add to matplotlib, which of course also has the problem of guessing what the user wants, but solves this a bit differently.

ImportanceOfBeingErnest on Jul 29, 2018

Related, on why numeric strings can be useful: https://stackoverflow.com/questions/51525284/the-hue-parameter-in-seaborn-relplot-skips-an-integer-when-given-numerical-d

ImportanceOfBeingErnest on Jul 29, 2018