pandas: BUG: inconsistent replace

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1.0, 5)
   0    1
0  1  5.0
1  2  2.0

>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1, 5)
   0    1
0  5  5.0
1  2  2.0

Problem description

Maybe I don’t understand something, or this is just nonsense: replacing 1.0 changes only the float column, while replacing 1 changes both columns.

Expected Output

>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1.0, 5)
   0    1
0  1  5.0
1  2  2.0

>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1, 5)
   0    1
0  5  1.0
1  2  2.0

Or

>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1.0, 5)
   0    1
0  5  5.0
1  2  2.0

>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1, 5)
   0    1
0  5  5.0
1  2  2.0

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-62-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.19.0
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0
Cython : None
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : 0.6.2
lxml.etree : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.50.1

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21 (19 by maintainers)

Most upvoted comments

the issue number, i.e. 35376

By replacing it with is_integer(element) or is_float(element) this case is covered.

is_float(element) is a start, but don’t we only want to include floats that are int-like? So the check would be (is_float(element) and element.is_integer()). To be even more careful, we should check that casting to the target dtype is lossless.
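The check proposed in this comment could look roughly like the sketch below. The helper name `int_block_can_hold` and its signature are hypothetical and are not part of pandas; the sketch only illustrates "int-like float plus lossless cast" for an integer target dtype, using NumPy alone.

```python
import numpy as np


def int_block_can_hold(element, dtype=np.dtype("int64")):
    """Hypothetical helper: could an integer array of `dtype` contain `element`?"""
    # bool is a subclass of int in Python; this sketch does not treat it as int-like.
    if isinstance(element, (bool, np.bool_)):
        return False

    info = np.iinfo(dtype)

    if isinstance(element, (int, np.integer)):
        # An integer fits if it lies within the range of the target dtype.
        return info.min <= int(element) <= info.max

    if isinstance(element, (float, np.floating)):
        # float.is_integer() is False for NaN, inf and fractional values.
        if not float(element).is_integer():
            return False
        # An integer-valued float converts to a Python int exactly, so the
        # cast is lossless as long as the result fits in the target dtype.
        return info.min <= int(element) <= info.max

    return False
```

Under these assumptions, `int_block_can_hold(1.0)` and `int_block_can_hold(1)` both succeed, while `1.5`, `float("nan")`, and `2.0**63` (out of range for int64) are rejected.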

Can we remove these lines without affecting the results?

The line before the ones you quoted is if not self._can_hold_element(to_replace):, so we should only get here if the array doesn’t contain to_replace. If we are getting here with the example from the OP, that suggests a problem in _can_hold_element
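To illustrate why a too-strict "can hold" check produces the behaviour in the original report, here is a toy model of the short-circuit on plain NumPy arrays. Everything in it (`can_hold_strict`, `can_hold_intlike`, `toy_replace`) is a hypothetical stand-in and not the actual pandas `Block.replace` or `_can_hold_element` code.

```python
import numpy as np


def can_hold_strict(values, to_replace):
    # Hypothetical too-strict check: integer arrays accept only integer scalars.
    if values.dtype.kind == "i":
        return isinstance(to_replace, (int, np.integer))
    return True


def can_hold_intlike(values, to_replace):
    # Hypothetical relaxed check: integer arrays also accept int-like floats.
    if values.dtype.kind == "i" and isinstance(to_replace, (float, np.floating)):
        return float(to_replace).is_integer()
    return can_hold_strict(values, to_replace)


def toy_replace(values, to_replace, value, can_hold):
    # Short-circuit: if the array supposedly cannot contain `to_replace`,
    # return an unchanged copy without comparing any elements.
    if not can_hold(values, to_replace):
        return values.copy()
    out = values.copy()
    out[values == to_replace] = value
    return out


ints = np.array([1, 2])
skipped = toy_replace(ints, 1.0, 5, can_hold_strict)    # short-circuits, 1 is kept
replaced = toy_replace(ints, 1.0, 5, can_hold_intlike)  # 1 == 1.0, so it is replaced
```

With the strict check, the integer column is skipped for `1.0` even though `1 == 1.0`, which matches the inconsistency reported above; the relaxed check lets the comparison run.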

The behaviour changed with #27768, so it was not intentional.

01f90c187f0eec0e8178371d7c066e600c9e105b is the first bad commit
commit 01f90c187f0eec0e8178371d7c066e600c9e105b
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Mon Aug 12 11:58:42 2019 -0700

    CLN: short-circuit case in Block.replace (#27768)

cc @jbrockmendel

0.25.3 was giving the expected output, so marking as a regression for now, pending further investigation on whether the change was intentional.

>>> pd.__version__
'0.25.3'
>>>
>>> pd.DataFrame([[1, 1.0], [2, 2.0]]).replace(1.0, 5)
   0    1
0  5  5.0
1  2  2.0
>>>
>>> pd.DataFrame([[1, 1.0], [2, 2.0]]).replace(1, 5)
   0    1
0  5  5.0
1  2  2.0
>>>