pandas: DataFrame.copy(deep=True) is not a deep copy of the index
Code Sample, a copy-pastable example if possible
import pandas as pd

df1 = pd.DataFrame(index=['a', 'b'], columns=['foo', 'muu'])
df1.index.name = "foo"
print(df1)
# create deep copy of df1 and change a value in the index
df2 = df1.copy(deep=True)
df2.index.name = "bar"
df2.index.values[0] = 'c' # changes both df1 and df2
print(df1)
print(df2)
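To check directly whether the copy shares its index buffer with the original, one can compare the underlying arrays with NumPy's `np.shares_memory` (a plain NumPy utility, not pandas API; the result depends on the pandas version, and `Index.to_numpy` requires pandas >= 0.24):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(index=['a', 'b'], columns=['foo', 'muu'])
df2 = df1.copy(deep=True)

# On affected pandas versions this prints True: both indexes
# point at the same underlying buffer despite deep=True.
print(np.shares_memory(df1.index.to_numpy(), df2.index.to_numpy()))
```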
Problem description
DataFrame.copy(deep=True) is not a deep copy of the index.
In
maybe deep should be set to True?
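Until this is resolved, one possible workaround is to rebuild the copy's index from a freshly copied NumPy array, so no buffer is shared with the original (a sketch, not official pandas advice; it assumes pandas >= 0.24 for `Index.to_numpy`):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(index=['a', 'b'], columns=['foo', 'muu'])
df1.index.name = "foo"

df2 = df1.copy(deep=True)
# Rebuild the index from a fresh array so nothing is shared with df1.
df2.index = pd.Index(df1.index.to_numpy().copy(), name=df1.index.name)

# The two indexes are now fully independent objects and buffers.
assert df2.index is not df1.index
assert not np.shares_memory(df1.index.to_numpy(), df2.index.to_numpy())
```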
Expected Output
foo muu
foo
a NaN NaN
b NaN NaN
foo muu
foo
c NaN NaN
b NaN NaN
foo muu
bar
c NaN NaN
b NaN NaN
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None python: 3.6.3.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-53-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8
pandas: 0.21.0 pytest: 3.2.1 pip: 9.0.1 setuptools: 36.5.0.post20170921 Cython: 0.26.1 numpy: 1.13.1 scipy: 0.19.1 pyarrow: 0.8.0 xarray: 0.9.6 IPython: 6.1.0 sphinx: 1.6.3 patsy: 0.4.1 dateutil: 2.6.1 pytz: 2017.2 blosc: None bottleneck: 1.2.1 tables: 3.4.2 numexpr: 2.6.2 feather: None matplotlib: 2.0.2 openpyxl: 2.4.8 xlrd: 1.1.0 xlwt: 1.3.0 xlsxwriter: 0.9.8 lxml: 3.8.0 bs4: 4.6.0 html5lib: 0.999999999 sqlalchemy: 1.1.13 pymysql: None psycopg2: None jinja2: 2.9.6 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: 0.5.0
About this issue
- Original URL
- State: open
- Created 6 years ago
- Reactions: 4
- Comments: 20 (12 by maintainers)
IMO, copy(deep=True) should completely sever all connections between the original and the copied object - compare the official Python docs (https://docs.python.org/3/library/copy.html). So, IMO, deep=True should come to mean what deep='all' does currently (and the latter can then be removed).

Re:
This is not a valid argument IMO - it's up to me as a user (consenting adults and all…) what I do with my objects, including the indexes, and if I make a deep copy, it's a justified expectation (I would even argue: a built-in expectation of the word "deep") that this will not mess with the original.
Plus, if I'm already deep-copying the much larger values of a DataFrame, not copying the index only saves a comparatively irrelevant amount of memory.

OK, I think the documentation of copy is unclear then:
"Make a deep copy, including a copy of the data and the indices."
I also ran into this today, and discovered that even though the id of the index was different on the copy, modifying the cp.index.to_numpy() values was corrupting the original.

I am totally in line with @DanielGoldfarb's point 1:
A fix for this could be composed of the following elements:
- Have deep=True behave (and be documented) as intuition suggests, that is, with absolutely no shared items between the copy and the original. The deep='all' alias can stay around, but if it is not yet official, maybe it should better be dropped now.
- Accept a new deep='values' where only the values are deep-copied. This is the same behaviour as today's deep=True; make it the default to preserve legacy compatibility and speed.
- Optionally accept a new deep='index' where only the index is deep-copied. I would not really know why this would be needed, but it is just for symmetry of the API.

Would this be ok for everyone?
The example looks to work on master. Could use a test.