pandas: [0.24.1] New nullable integer fillna with non-int doesn't coerce to object
Code Sample
import pandas as pd
sample_data = []
sample_data.append({"integer_column":None})
sample_data.append({"integer_column":1})
sample_data.append({"integer_column":2})
df = pd.DataFrame(sample_data)
# Previous type is object
# df.dtypes
df.loc[:,'integer_column'] = df.loc[:,'integer_column'].astype('Int64')
# Check new type is Int64, nullable
# df.dtypes
df.fillna('null_string')
Problem description
With the new nullable Int64 dtype, it is not possible to fill missing (“NaN”) values with a value that is not an integer, such as a string.
Error raised
TypeError: <U11 cannot be converted to an IntegerDtype
Expected Output
The returned DataFrame should have its NaN values replaced with the value passed to .fillna(), with the column coerced to object dtype if that value is not an integer.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.24.1
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.18
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
About this issue
- State: open
- Created 5 years ago
- Reactions: 3
- Comments: 22 (15 by maintainers)
Commits related to this issue
- BUG: permit str dtype -> IntegerDtype conversions Resolves #25472, resolves #25288. — committed to alexreg/pandas by alexreg 3 years ago
- BUG: permit str dtype -> IntegerDtype conversions Resolves #25472, #25288. — committed to alexreg/pandas by alexreg 3 years ago
I would assume that .fillna() would coerce the Series to object dtype when I try to fill it with an object, just as adding a float to it coerces it to float. However, filling it with a string raises the TypeError shown above.
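A minimal sketch of the float coercion mentioned above, assuming a pandas version that has the nullable Int64 dtype: arithmetic with a float moves the result to a float dtype instead of raising. Which float dtype comes back (NumPy float64 or the nullable Float64) depends on the pandas version, so the sketch only relies on it being a float kind.

```python
import pandas as pd

# Nullable integer Series with one missing value.
ints = pd.Series([1, None, 3], dtype="Int64")

# Adding a float upcasts to a float dtype; the missing value stays missing.
result = ints + 0.5

print(result.dtype)
```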
I brought it up here: https://github.com/pandas-dev/pandas/issues/25288#issuecomment-592917095
I still think it makes more sense to be able to just use .fillna() without explicitly casting first. As I stated earlier in this thread, this is the default behaviour of .fillna() on other dtypes, and it is also the logical counterpart of adding a float to an Int64. What you are suggesting means that, from a user's perspective, sometimes I can call .fillna() directly and sometimes I have to cast and then fill. As a user, I would prefer .fillna() to behave consistently.
Because I need to fill the <NA> values with something else, just as adding a float to an integer turns it into a float. And as I said, this is already .fillna()'s current behaviour on floats. But I also understand there is something to be said for not doing so. Maybe a boolean argument such as coerce=True would be a solution?
This works fine if you use an actual integer value to fill, so there's not really much point in using Int64 in this case, since you're still asking for an object. In any case, I suppose it should still coerce to object for you, as using a float here would. Investigation and PRs are always welcome.
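For comparison, a short sketch of the case described as working fine, assuming a pandas version with the nullable Int64 dtype: an integer fill value is valid for Int64, so the dtype is preserved.

```python
import pandas as pd

s = pd.Series([None, 1, 2], dtype="Int64")

# An integer fill value fits the Int64 dtype, so no coercion is needed.
filled = s.fillna(0)

print(filled.dtype)     # Int64
print(filled.tolist())  # [0, 1, 2]
```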
@alexreg Notwithstanding Joris's (reasonable) objection, the place to change this behaviour is ExtensionBlock.fillna: either a try/except around values = self.values.fillna or a better implementation of _can_hold_element.
I'll get to it eventually if no one else does, but it isn't a priority for me ATM.
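The try/except idea can be approximated at the user level. The sketch below defines a hypothetical helper, fillna_coerce, which is not part of pandas and does not touch ExtensionBlock; its coerce flag only mirrors the coerce=True argument suggested earlier in the thread. It first lets pandas fill while keeping the original dtype and, if that raises, widens the Series to object and fills again.

```python
import pandas as pd


def fillna_coerce(series: pd.Series, value, coerce: bool = True) -> pd.Series:
    """Hypothetical helper (not a pandas API): fill NA, falling back to object."""
    try:
        # First attempt: keep the original dtype (e.g. Int64).
        return series.fillna(value)
    except (TypeError, ValueError):
        if not coerce:
            raise
        # Fallback: object dtype can hold any Python value, including strings.
        return series.astype(object).fillna(value)


s = pd.Series([None, 1, 2], dtype="Int64")
print(fillna_coerce(s, 0).dtype)              # Int64
print(fillna_coerce(s, "null_string").dtype)  # object
```

Catching the exception keeps the fast, type-stable path for valid fill values and only pays for the object conversion when the value cannot be stored in the original dtype.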
This behaviour can also be seen when calling .fillna() with a string on a Series of floats.
In fact, since I'm using pandas for an ETL tool, this doesn't look nice to me. Having to change the type to “object” inevitably adds “.0” after the integer numbers and breaks my code.
The workaround I used is to remove the “.0” suffix after “astype(object)” and then fill the NaN values.
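A minimal sketch of the “.0” problem and one way around it, assuming a recent pandas: a column with a missing value is inferred as float64, so filling it with a string keeps the values as 1.0 and 2.0, whereas converting to Int64 first and then casting explicitly to object keeps the integers intact. This is the manual cast-then-fill pattern discussed in the thread, not a change to .fillna() itself.

```python
import pandas as pd

# A missing value makes pandas infer float64, so 1 and 2 are stored as 1.0 and 2.0.
floats = pd.Series([None, 1, 2])

# Filling with a string upcasts float64 to object, but the values keep their ".0".
filled_floats = floats.fillna("null_string")

# Converting to nullable Int64 first keeps the values integral; the explicit
# astype(object) then lets the column also hold the string fill value.
filled_ints = floats.astype("Int64").astype(object).fillna("null_string")

print([str(v) for v in filled_floats])
print([str(v) for v in filled_ints])
```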
I vaguely recall some discussion on whether ExtensionArray.fillna should allow coercing the array to the dtype of the fill_value. I don't recall whether we reached a final conclusion. It's somewhat inconvenient to have to call .astype manually before filling with a different dtype, but the type stability ensured by ExtensionArray[T].fillna -> ExtensionArray[T] is nice.
On Tue, Feb 12, 2019 at 10:35 PM, William Ayd <notifications@github.com> wrote: