pandas: [0.24.1] New nullable integer fillna with non-int doesn't coerce to object

Code Sample

import pandas as pd

sample_data = []

sample_data.append({"integer_column":None})
sample_data.append({"integer_column":1})
sample_data.append({"integer_column":2})

df = pd.DataFrame(sample_data)

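# Check the dtype before the cast
print(df.dtypes)

# Assign the whole column so the new dtype replaces the old one
df["integer_column"] = df["integer_column"].astype("Int64")

# Check that the new dtype is Int64 (nullable)
print(df.dtypes)

# In pandas 0.24.1 this raises the TypeError shown below
df.fillna('null_string')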

Problem description

With the new nullable Int64 dtype, missing (“NaN”) values cannot be filled with a value of a different type, such as a string.

Error raised

TypeError: <U11 cannot be converted to an IntegerDtype

Expected Output

The new DataFrame should have its NaN values replaced with the value passed to the .fillna() method.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.1
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.18
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 3
  • Comments: 22 (15 by maintainers)


Most upvoted comments

I would expect .fillna() to coerce the series to object dtype when I fill it with an object, just as adding a float to it coerces it to float:

>>> pd.Series([1, 2, None], dtype='Int64') + 0.5
0    1.5
1    2.5
2    NaN
dtype: float64

However:

>>> pd.Series([1, 2, None], dtype='Int64').fillna('')
TypeError: <U1 cannot be converted to an IntegerDtype

I brought it up here: https://github.com/pandas-dev/pandas/issues/25288#issuecomment-592917095

I still think it makes more sense to be able to use .fillna() directly, without casting first. As I said earlier in this thread, this is the default behaviour of .fillna() on other dtypes, and it is also consistent with what happens when, for example, a float is added to an Int64 series.

From a user perspective, what you are suggesting means that sometimes I can call .fillna() directly and sometimes I have to cast and then fill. As a user, I would prefer .fillna() to behave consistently.
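For reference, the explicit cast-then-fill pattern can be sketched as follows, assuming pandas 0.24 or later; the series and the "null_string" fill value mirror the original report. Casting to object first gives up the nullable integer dtype, after which .fillna() accepts any Python object.

```python
import pandas as pd

# Nullable integer series with one missing value, as in the report.
s = pd.Series([None, 1, 2], dtype="Int64")

# Cast to object first, then fill: object dtype can hold both ints and strings.
filled = s.astype(object).fillna("null_string")

print(filled.dtype)     # object
print(filled.tolist())  # ['null_string', 1, 2]
```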

Because I need to fill the <NA> values with something else, just as adding a float to an integer series turns it into a float series. As I said, this is already how .fillna() behaves on floats.

But I also understand there is something to be said for not doing so. Maybe a boolean argument such as coerce=True would be a solution?

This works fine if you fill with an actual integer value, so there’s not much point in using Int64 in this case, since you’re still asking for an object.

In any case, I suppose it should still coerce to object for you, as using a float here would. Investigation and PRs are always welcome.
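A minimal illustration of the point that filling with an actual integer works, assuming pandas 0.24 or later: the fill value fits the dtype, so no coercion is needed and the result stays Int64.

```python
import pandas as pd

s = pd.Series([None, 1, 2], dtype="Int64")

# An integer fill value is compatible with Int64, so the nullable
# integer dtype is preserved.
filled = s.fillna(0)

print(filled.dtype)     # Int64
print(filled.tolist())  # [0, 1, 2]
```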

@alexreg Notwithstanding Joris’s (reasonable) objection, the place to change the behavior is ExtensionBlock.fillna: either put a try/except around values = self.values.fillna, or use a better implementation of _can_hold_element.
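The try/except idea can be approximated at user level without touching pandas internals. The sketch below is an assumption, not pandas API: `fillna_coerce` and its `coerce` flag are hypothetical names (the flag echoes the coerce=True suggestion above). It first tries the dtype-preserving fillna and only falls back to object dtype when the fill value is rejected; it does not change ExtensionBlock.fillna itself.

```python
import pandas as pd


def fillna_coerce(s: pd.Series, value, coerce: bool = True) -> pd.Series:
    """Hypothetical helper (not part of pandas): fill missing values in ``s``,
    falling back to object dtype if the series' dtype rejects ``value``."""
    try:
        # Dtype-preserving path: succeeds when ``value`` fits the dtype,
        # e.g. an int for an Int64 series.
        return s.fillna(value)
    except (TypeError, ValueError):
        if not coerce:
            raise
        # Fallback: object dtype can hold any Python object, mirroring what
        # float64 series already do when filled with a string.
        return s.astype(object).fillna(value)


s = pd.Series([1, 2, None], dtype="Int64")
print(fillna_coerce(s, 0).dtype)   # Int64
print(fillna_coerce(s, "").dtype)  # object
```

Catching the exception keeps the common, type-stable path unchanged and only pays for the object cast when it is actually needed.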

I’ll get to it eventually if no one else does, but it isn’t a priority for me ATM.

The same coercion already happens when calling .fillna() with a string on a float series:

>>> pd.Series([1, 2, None], dtype='float64').fillna('')
0    1
1    2
2     
dtype: object

In fact, since I’m using pandas for an ETL tool, this doesn’t look nice to me: changing the type to “object” inevitably adds “.0” after the integer numbers and breaks my code.

My workaround is to strip the “.0” suffix after “astype(object)” and then fill the NaN values.
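For the ETL case, one alternative to stripping the “.0” by hand is to route a float column through Int64 before casting to object, so the values are integers again when rendered as text. This is a sketch, not the commenter's code, and it assumes every non-missing float is a whole number (otherwise the Int64 cast fails).

```python
import pandas as pd

# A float64 column with a missing value, as produced when ints meet NaN.
s = pd.Series([1.0, None, 3.0])

# Direct route: object dtype keeps the float values, so text output shows ".0".
naive = s.astype(object).fillna("").astype(str)

# Alternative route: float64 -> Int64 -> object turns the values back into
# Python ints before filling, so no ".0" suffix appears.
clean = s.astype("Int64").astype(object).fillna("").astype(str)

print(naive.tolist())  # ['1.0', '', '3.0']
print(clean.tolist())  # ['1', '', '3']
```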

I vaguely recall some discussion on whether ExtensionArray.fillna should allow coercing the array to the dtype of the fill_value. I don’t recall if we reached a final conclusion. It’s somewhat inconvenient to have to call .astype manually before filling with a value of a different dtype, but the type stability ensured by ExtensionArray[T].fillna -> ExtensionArray[T] is nice.

(Email reply to William Ayd’s comment of Tue, Feb 12, 2019 at 10:35 PM, quoted earlier in this thread.)