pandas: BUG: inconsistent replace
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code sample
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1.0, 5)
   0    1
0  1  5.0
1  2  2.0
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1, 5)
   0    1
0  5  5.0
1  2  2.0
Problem description
Maybe I am misunderstanding something, but this looks inconsistent: replacing 1.0 leaves the integer column untouched, while replacing 1 changes both columns.
Expected Output
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1.0, 5)
   0    1
0  1  5.0
1  2  2.0
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1, 5)
   0    1
0  5  1.0
1  2  2.0
Or
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1.0, 5)
   0    1
0  5  5.0
1  2  2.0
>>> pd.DataFrame([[1,1.0],[2,2.0]]).replace(1, 5)
   0    1
0  5  5.0
1  2  2.0
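For context on the second alternative: in plain Python, 1 and 1.0 compare equal, so a purely value-based replace would match in both columns whichever literal is passed. The sketch below illustrates that with nested lists only. The helper replace_value is hypothetical and is not pandas API, and it ignores dtypes entirely (pandas would display 5.0 in the float column).

```python
def replace_value(rows, to_replace, value):
    """Return a copy of rows with every cell equal to to_replace swapped for value.

    Hypothetical helper for illustration only; it relies on Python's ==,
    under which 1 == 1.0 is True.
    """
    return [[value if cell == to_replace else cell for cell in row] for row in rows]


rows = [[1, 1.0], [2, 2.0]]

# Passing the int literal or the float literal selects the same cells.
by_int = replace_value(rows, 1, 5)
by_float = replace_value(rows, 1.0, 5)

print(by_int == by_float)
```

Under value equality the two calls cannot differ, which is the consistency the second expected output asks for.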
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-62-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.19.0
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0
Cython : None
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : 0.6.2
lxml.etree : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.50.1
About this issue
- State: closed
- Created 4 years ago
- Comments: 21 (19 by maintainers)
Commits related to this issue
- BUG: is_float added to IntBlock._can_hold_element Added return is_integer(element) or is_float(element) to the IntBlock._can_hold_element method because an Block of ints can be replaced from int. Rea... — committed to QuentinN42/pandas by QuentinN42 4 years ago
- BUG: is_float(e) and e.is_integer() added to IntBlock._can_hold_element As @jbrockmendel said in #35376, you can replace an int by a float in an IntBlock only if the element is an integer. — committed to QuentinN42/pandas by QuentinN42 4 years ago
- replace test added Added the #35376 error as a test. — committed to QuentinN42/pandas by QuentinN42 4 years ago
the issue number, i.e. 35376
is_float(element) is a start, but don’t we only want to include floats that are int-like? So (is_float(element) and element.is_integer()). To be even more careful, we should check that casting to the target dtype is lossless.

The line before the ones you quoted is if not self._can_hold_element(to_replace):, so we should only get here if the array doesn’t contain to_replace. If we are getting here with the example from the OP, that suggests a problem in _can_hold_element.

The behaviour changed with #27768, so not intentional.
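The int-like check suggested in the comments above can be sketched in plain Python. This is only an illustration of the idea, not the pandas implementation: the helpers is_int_like and fits_losslessly are hypothetical names, the bits parameter stands in for the target integer dtype, and excluding bool is an assumption made for this sketch.

```python
def is_int_like(element):
    """True for ints and for floats with no fractional part (e.g. 1.0).

    Assumption for this sketch: bools are not treated as integers.
    """
    if isinstance(element, bool):
        return False
    if isinstance(element, int):
        return True
    # float.is_integer() is False for nan and inf, so those are rejected too.
    return isinstance(element, float) and element.is_integer()


def fits_losslessly(element, bits=64):
    """True if element is int-like and fits in a signed integer of `bits` bits."""
    if not is_int_like(element):
        return False
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return lo <= int(element) <= hi


print(fits_losslessly(1.0))   # int-like float that fits in 64 bits
print(fits_losslessly(1.5))   # has a fractional part
```

With a check like this, 1.0 would be accepted as a replacement target for an integer block while 1.5 or an out-of-range value would not.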
01f90c187f0eec0e8178371d7c066e600c9e105b is the first bad commit
commit 01f90c187f0eec0e8178371d7c066e600c9e105b
Author: jbrockmendel <jbrockmendel@gmail.com>
Date: Mon Aug 12 11:58:42 2019 -0700
cc @jbrockmendel
0.25.3 was giving the expected output, so marking as a regression for now, pending further investigation on whether the change was intentional.