pandas: BUG: Unclear FutureWarning regarding inplace iloc setitem

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np, pandas as pd
values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])
new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new
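To see exactly what this one-line .loc assignment emits, the same example can be wrapped in warnings.catch_warnings(record=True). This is a sketch only. Whether anything is recorded depends on the installed pandas version: the report is against 1.5.0, and other releases may stay silent. The column ends up holding 10 and 11 either way.

```python
import warnings

import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])
new = np.array([10, 11]).astype(np.int16)

# Record every warning raised by the .loc assignment instead of printing it,
# so the category and message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.loc[:, "a"] = new

for w in caught:
    print(w.category.__name__, str(w.message)[:60])

# The values are set in every case; only the warning (and dtype) may differ.
print(df["a"].tolist(), df["a"].dtype)
```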

Issue Description

FutureWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)

This is confusing because I did not do df.iloc, I did df.loc. In the release notes, the subsection header mentions .loc, but the text only talks about .iloc.

Additionally, it was very difficult to put together a reproducible example until I found a related issue demonstrating that it matters whether the old and new series have different dtypes. This is reasonably clear from the release notes themselves, but not from the warning message.
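The dtype dependence can be isolated by running the same .loc assignment twice: once with values of the column's own dtype and once with int16, as in the report. This is a sketch. loc_setitem_future_warnings is a hypothetical helper, and according to the report only the mismatched-dtype case should warn on pandas 1.5.0. Later versions may not warn at all.

```python
import warnings

import numpy as np
import pandas as pd

BASE = np.arange(4).reshape(2, 2)


def loc_setitem_future_warnings(dtype):
    """Hypothetical helper: set column "a" with .loc using values of the
    given dtype; return the frame and any FutureWarning messages raised."""
    df = pd.DataFrame(BASE.copy(), columns=["a", "b"])
    new = np.array([10, 11], dtype=dtype)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df.loc[:, "a"] = new
    messages = [str(w.message) for w in caught if issubclass(w.category, FutureWarning)]
    return df, messages


# Same dtype as the existing column vs. int16, as in the report.
same_df, same_msgs = loc_setitem_future_warnings(BASE.dtype)
diff_df, diff_msgs = loc_setitem_future_warnings(np.int16)
print("same dtype:", len(same_msgs), "FutureWarning(s)")
print("int16:", len(diff_msgs), "FutureWarning(s)")
```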

Expected Behavior

I assume that this change affects both .loc and .iloc, in which case the warning message could be updated to say so; if it’s a false alarm for .loc, it would be good to suppress it.

The warning message could also be a bit clearer about why it was triggered (even in general terms).

Installed Versions

INSTALLED VERSIONS

commit : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763
python : 3.10.0.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Mon Aug 22 20:20:07 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T8110
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.1.2
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 11
  • Comments: 34 (23 by maintainers)

Most upvoted comments

I’m not sure that’s a sufficient change, @jbrockmendel, because df.isetitem will not work in that case and there is no corresponding df.setitem. I also suspect it’s possible to get into a situation where doing df[col] = data gives you a SettingWithCopyWarning telling you to do df.loc[:, col] = data, but then doing df.loc[:, col] = data gives you a warning telling you to do df[col] = data. I think people will find that confusing.
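One common way to stay out of that loop, sketched here as a general idiom rather than anything proposed in this thread, is to take an explicit .copy() of a filtered subset before assigning into it. The subset is then an independent DataFrame, so plain df[col] = data replaces the column without SettingWithCopyWarning and without going through .loc at all. The frame and column names are illustrative.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

# Boolean filtering returns a new object that older pandas may flag as a
# possible view of df; .copy() marks it as independent of df.
sub = df[df["a"] > 2].copy()

data = np.array([70, 80], dtype=np.int16)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Plain column assignment: swaps in `data` as the new "b" column.
    sub["b"] = data

print(sub)
print([w.category.__name__ for w in caught])
```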

What’s the difference between inplace and setting a new array? Neither the warning nor the release notes make this very clear to me.

I have a script that runs df.loc[i:ii, c] = newvals in a loop over a number of dataframes, and only some of the cases emit this warning, but in all cases the values are set correctly.

I also don’t see how either df[df.columns[i]] = newvals or df.isetitem(i, newvals) can be a substitute for either df.loc[i:ii, c] or df.iloc[i:ii, c].
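Neither suggested replacement covers a row slice. For a partial assignment that always replaces the column with a new array, one option is to modify a copy of the column and then assign the whole column back with df[col] = .... This is a sketch that assumes copying the column is acceptable. set_rows_replacing_column is a hypothetical helper, not a pandas API.

```python
import numpy as np
import pandas as pd


def set_rows_replacing_column(df, rows, col, newvals):
    """Hypothetical helper: equivalent to df.loc[rows, col] = newvals, but the
    edit happens on a copy of the column, which is then assigned back with
    df[col] = ..., so the original column array is never written in place."""
    column = df[col].copy()
    column.loc[rows] = newvals  # label-based, so slice(1, 3) covers rows 1..3
    df[col] = column
    return df


df = pd.DataFrame({"a": np.arange(6), "b": np.arange(6) * 10})
set_rows_replacing_column(df, slice(1, 3), "a", [100, 101, 102])
print(df)
```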

… as I read the discussion here and in the PR, there doesn’t really seem to be a consensus about what to do with this warning. (Remove it, change it, …)

They must undo this warning because it creates highly undesirable consequences.

Next, they should book and complete an Ayahuasca therapy session because they obviously cannot resolve this issue at their current level of consciousness and understanding.

this issue is quite annoying…

@max0x7ba just out of curiosity, how much are you paying for this massively complex, 3.5k-issue, OSS that you are totally entitled to SLAs and world-class service on?

It’s a warning. Relax or find a different tool.

What’s the difference between inplace and setting a new array? Neither the warning nor the release notes make this very clear to me.

Definitely open to suggestions for better wording. Let me try to explain using the OP example:

import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)

df = pd.DataFrame(values, columns=["a", "b"])

At this point the DataFrame df is directly backed by the original values, so doing something like values[0, 0] = 99 would affect the values in df.
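The "directly backed" claim can be checked with np.shares_memory. In this sketch copy=False is passed explicitly. That is an assumption added for clarity: the report targets pandas 1.5.0, which did not copy an ndarray by default, and newer releases may copy by default.

```python
import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)

# Explicitly ask pandas not to copy the NumPy array on construction.
df = pd.DataFrame(values, columns=["a", "b"], copy=False)

# True when the DataFrame's data lives in the same memory as `values`.
print(np.shares_memory(values, df.to_numpy()))
```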

There are types of setting (e.g. df.iloc[0, 0] = 11) that will be “inplace” and will edit the original values, and others (e.g. df["b"] = 42) that will create a new array and NOT edit the original values.
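The two kinds of setting can be compared side by side on a frame built with copy=False, which is assumed here so that the frame starts out sharing memory with `values`. Whether the element-wise .iloc write reaches `values` depends on the pandas version and Copy-on-Write settings, so that line is only printed. Replacing column "b" with [] assignment leaves the original array's column untouched.

```python
import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"], copy=False)

# Element-wise .iloc setting: tries to write into the existing array, which
# here may be the memory of `values` itself (version-dependent).
df.iloc[0, 0] = 11
print("values[0, 0]:", values[0, 0], "df.iloc[0, 0]:", df.iloc[0, 0])

# Whole-column [] assignment: pandas stores a brand-new array for "b",
# so column 1 of `values` keeps its old contents.
df["b"] = 42
print("values[:, 1]:", values[:, 1], "df['b']:", df["b"].tolist())
```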

Unfortunately, because of [reasons], the existing behavior is not super-consistent in when we do inplace vs not-inplace. That’s why this particular case is deprecated, so in 2.0 we can make the behavior more consistent.

In particular, in your case:

new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new  # <- issues the warning about future behavior changing

In the current behavior, df.loc[:, "a"] = new is NOT inplace, but in the future it will be. The warning here is just in case you really care about inplace-vs-not, to keep the old behavior you need to do df["a"] = new instead.
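Following the suggestion above, df["a"] = new always swaps in `new` as the column, so the column takes on int16. This sketch runs that alongside the .loc form. The .loc dtype is only printed: it is int16 where .loc replaces the column (the 1.5 behavior described above) and the original integer dtype where .loc writes in place (the announced future behavior).

```python
import numpy as np
import pandas as pd

new = np.array([10, 11]).astype(np.int16)

# Old behavior, kept explicitly: replace the column with a new array.
df = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["a", "b"])
df["a"] = new
print("df['a'] = new      ->", df["a"].dtype)

# .loc form: the resulting dtype depends on the pandas version.
df_loc = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["a", "b"])
df_loc.loc[:, "a"] = new
print("df.loc[:, 'a'] = new ->", df_loc["a"].dtype)
```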

Is that helpful? If you have suggestions to make the warning or docs clearer, please let us know.

I’m getting this warning indirectly when I call df.update(). I believe it does need to be fixed in pandas, at least in that spot.

Actually even looking at the release notes, I don’t think I understand exactly what is deprecated here. The relevant section starts with

Most of the time setting values with DataFrame.iloc() attempts to set values inplace, only falling back to inserting a new array if necessary.

But then it says

This behavior is deprecated. In a future version, setting an entire column with iloc will attempt to operate inplace.

Isn’t this what already happens (“most of the time”)? And if it’s about different dtypes, how will that work? Do you mean to say it will coerce the dtype when setting the data?

Closed by #50044

As an aside, I’d suggest enforcing the code of conduct more and banning toxic users; there’s nothing to be gained by having them around, and they disincentivise people from contributing.

@max0x7ba please refrain from such comments (the second paragraph), that’s not helpful at all

When a commit causes totally unnecessary consternation from the users, but the author of the commit doesn’t take any responsibility and keeps dragging his feet for 4 months by not resolving the otherwise unnecessary problem, you don’t get to choose the feedback you get, do you?

Get real and fix the problem promptly, please.

@jbrockmendel Yes, that is what I ended up doing, but it’s not what I was hoping for. To the best of my knowledge, other pandas warnings point to something I can act on directly, either to correct a potentially dangerous situation or to prepare for a future change; making the appropriate changes in those cases silences the warnings.

In this case, it seems more informative than a true warning, at least in the case where I am OK with the setting being in-place in the future (which I am). I am OK with using the warnings filter, but I was hoping for some way to suppress it at the pandas level so downstream users don’t potentially see it as well.

Making it a deprecation warning is fine, imo. Then, as a package developer, I can suppress it from my logs once I have verified it’s not a problem, and users won’t be bothered. I think the main problem right now is that end users get spammed with this warning for code that they have no control over and that might be perfectly fine.
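Until pandas changes the warning itself, a package can silence only this message for its own code paths with the standard warnings module, matching on the message text. This is a sketch: suppress_inplace_setitem_warning is a hypothetical helper, and the regex assumes the pandas 1.5 wording quoted at the top of this issue, so it would need updating if that text changes. warnings.filterwarnings matches `message` against the start of the warning text, case-insensitively.

```python
import contextlib
import warnings

# Start of the pandas 1.5 FutureWarning text quoted in this issue (regex).
_INPLACE_SETITEM_MSG = r"In a future version, df\.iloc\[:, i\] = newvals"


@contextlib.contextmanager
def suppress_inplace_setitem_warning():
    """Hypothetical helper: ignore only the inplace-setitem FutureWarning
    inside the block; all other warnings behave as before."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message=_INPLACE_SETITEM_MSG, category=FutureWarning
        )
        yield
```

Usage would be `with suppress_inplace_setitem_warning(): df.loc[:, "a"] = new` around the library's own assignment, so end users never see the message.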

yeah, you could use with tm.assert_produces_warning(None) for that (we should really run the whole test suite with -W error, but we’re not there yet, so for now let’s explicitly assert no warning is raised)
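Here tm refers to pandas._testing, a private module intended for pandas’ own test suite. Outside pandas, the same "this call must not warn" check can be sketched with the standard library alone by escalating warnings to exceptions, which is also what running under -W error does. call_without_warnings is a hypothetical name.

```python
import warnings


def call_without_warnings(func, *args, **kwargs):
    """Hypothetical helper: run func(*args, **kwargs) with every warning
    turned into an exception, so any warning makes the caller fail."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return func(*args, **kwargs)
```

In a test, `call_without_warnings(df.update, other)` would then fail on the warning discussed in this issue, wherever it is still emitted.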

I’m getting this warning indirectly when I call df.update(). I believe it does need to be fixed in pandas, at least in that spot.

Me too, df.update() does:

self.loc[:, col] = expressions.where(mask, this, that)

The warning message essentially says that DataFrame.update currently does not update but will in the future versions, which doesn’t sound right.
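The df.update() path can be exercised on its own. In the sketch below, the int64 target and the float `other` with a NaN are assumptions chosen to produce a dtype mismatch, and NaN in `other` means "keep the existing value". The recorded warnings and the dtype after the call show, for the installed pandas version, whether the internal self.loc[:, col] = ... replaced the column or wrote into it. The updated values are the same either way.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})                 # int64 column
other = pd.DataFrame({"a": [10.0, np.nan, 30.0]})   # NaN: keep the old value

print("before:", df["a"].dtype)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.update(other)  # modifies df itself and returns None

print("after:", df["a"].tolist(), df["a"].dtype)
print([w.category.__name__ for w in caught])
```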