pandas: GOTCHA: You can't use dot notation to add a column to a DataFrame

I ran into a nasty gotcha I have not seen mentioned in the documentation. In general, you can refer to DataFrame columns like a dictionary key or like an object attribute. But the behavior is inconsistent. In particular, you can modify an existing column using dot notation, but you can’t add a new column.

The following example demonstrates.

import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# this adds a column to df
df['three'] = df.two + 1

# this successfully modifies an existing column
df.one += 1

# this seems to work but does not add a column to df
df.four = df.two + 2

print(df.three)
print(df.four)  # works, but only because 'four' is now a plain attribute
print(df)       # no 'four' column in the output

Is this behavior unavoidable?

About this issue

  • State: closed
  • Created 10 years ago
  • Reactions: 3
  • Comments: 21 (16 by maintainers)

Most upvoted comments

See the docs here: http://pandas-docs.github.io/pandas-docs-travis/indexing.html#attribute-access. I guess an additional warning to that effect could be added.

This whole attribute access business is a bit of a hybrid, e.g. it WILL work if the column already exists, but not for a completely new column.
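A minimal sketch of that hybrid behavior, assuming a reasonably recent pandas. The small frame and the variable names are made up for illustration, and the warning filter is there only because newer pandas versions emit a UserWarning on the attribute assignment.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [1.0, 2.0, 3.0]})

# Existing column: attribute assignment is routed to the column.
df.one = df.one + 1
one_values = df["one"].tolist()

# New name: this only sets a plain Python attribute on the object.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # recent pandas warns about this line
    df.four = df.two + 2

has_column = "four" in df.columns      # the column was not created
has_attribute = hasattr(df, "four")    # but the attribute exists

# Bracket assignment (or DataFrame.assign) is what actually adds a column.
df["five"] = df["two"] + 2
df = df.assign(six=df["two"] + 3)
columns_after = list(df.columns)
```

So bracket notation or `assign` is the safe way to create a column; dot notation is best kept for reading.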

The reason is that it is VERY error prone; e.g. it is very easy to overwrite an existing method (though this could raise). Furthermore, people often want to simply assign an attribute, e.g. df.name = 'foo' or whatever, so it is very tricky to figure out what is intended (e.g. is that an additional column?)
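To illustrate the ambiguity, here is a small sketch (the frame and its column names are hypothetical, chosen so that one column collides with the existing `DataFrame.count` method):

```python
import pandas as pd

df = pd.DataFrame({"count": [10, 20, 30], "value": [1, 2, 3]})

# Attribute lookup finds the DataFrame.count method, not the 'count' column.
attr_is_method = callable(df.count)

# Bracket access always returns the column.
col_values = df["count"].tolist()

# A scalar attribute assignment is just metadata on the object, not a column.
df.name = "foo"
name_is_column = "name" in df.columns
```

With a name like `count`, dot access silently gives you the method, and with `df.name = 'foo'` there is no way to tell whether a column or a plain attribute was meant.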

IMHO attribute assignment should be banned completely, with only read-only access allowed, but I may be in the minority.

dupe of this issue: https://github.com/pydata/pandas/issues/6160