pandas: pd.Series.loc.__getitem__ promotes to float64 instead of raising KeyError
Code Sample, a copy-pastable example if possible
For:
result = a2b.loc[vals] # pd.Series()[ np.array ]
If a2b is a series that maps {int64:int64} and vals is an int64 array, the result should be a series that maps {int64:int64}, or a KeyError should be thrown.
Pasteable repro:
import pandas as pd
import numpy as np
a2b = pd.Series(
index = np.array([ 9724501000001103, 9724701000001109, 9725101000001107,
9725301000001109, 9725601000001103, 9725801000001104,
9730701000001104, 10049011000001109, 10328511000001105]),
data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
999000011000001104, 999000011000001104, 999000011000001104,
999000011000001104, 999000011000001104, 999000011000001104])
)
assert a2b.dtype==np.int64
assert a2b.index.dtype==np.int64
key = np.array([ 9724501000001103, 9724701000001109, 9725101000001107,
9725301000001109, 9725601000001103, 9725801000001104,
9730701000001104,
10047311000001102, # Missing in a2b.index
10049011000001109,
10328511000001105])
result = a2b.loc[key]
result
assert result.dtype==np.int64
assert result.index.dtype==np.int64
What happens:
In [2]: import pandas as pd
...: import numpy as np
...: a2b = pd.Series(
...: index = np.array([ 9724501000001103, 9724701000001109, 9725101000001107,
...: 9725301000001109, 9725601000001103, 9725801000001104,
...: 9730701000001104, 10049011000001109, 10328511000001105]),
...: data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
...: 999000011000001104, 999000011000001104, 999000011000001104,
...: 999000011000001104, 999000011000001104, 999000011000001104])
...: )
...: assert a2b.dtype==np.int64
...: assert a2b.index.dtype==np.int64
...: key = np.array([ 9724501000001103, 9724701000001109, 9725101000001107,
...: 9725301000001109, 9725601000001103, 9725801000001104,
...: 9730701000001104,
...: 10047311000001102, # Missing in a2b.index
...: 10049011000001109,
...: 10328511000001105])
...: result = a2b.loc[key]
...: result
...:
Out[2]:
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
NaN NaN
9.990000e+17 NaN
9.990000e+17 NaN
dtype: float64
In [3]: assert result.dtype==np.int64
...: assert result.index.dtype==np.int64
...:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-3-be86ec17a393> in <module>()
----> 1 assert result.dtype==np.int64
2 assert result.index.dtype==np.int64
AssertionError:
Problem description
I don’t like this behavior because:
- I have silently lost my data because of the cast to float64 (these int64 values are larger than 2**53, so float64 cannot represent them exactly)
- in other calls to __getitem__, a KeyError is raised if a value is not found in the index.
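To make the first point concrete, here is a minimal sketch (NumPy only, no pandas) of what the float64 cast does to one of the values from the repro above. The variable names are just for illustration.

```python
import numpy as np

# One of the data values from the report, stored as int64.
original = np.int64(999000011000001104)

# Simulate the silent promotion: int64 -> float64 -> back to int64.
as_float = np.float64(original)
round_tripped = np.int64(as_float)

# float64 has a 53-bit significand, so not every integer above 2**53
# survives the round trip unchanged.
lost = int(round_tripped) != int(original)
print(int(original), int(round_tripped), lost)
```

So even if the NaN row is dropped afterwards, casting the result back to int64 does not recover the original values.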
Expected Output
Asserts should not fail.
Output of pd.show_versions()
In [4]: pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.15.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: 0.5.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
About this issue
- State: closed
- Created 5 years ago
- Comments: 16 (10 by maintainers)
I see the KeyError with __getitem__, but the message given when trying a2b.loc[key] indicates that, while no error is thrown now, it will be in the future. It seems to me that, while the current behavior is not ideal, it is expected.
But I’m still trying to sleuth out whether there’s another issue here. @jreback do you mean that because one of the labels in key is not in a2b, then that label shouldn’t show up in the index of a2b.loc[key]?
use .iloc, as that is what is designed for selecting by position, as the docs indicate: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#selection-by-position
getitem is falling back here, as described: http://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#miscellaneous-indexing-faq
this is expected behavior and is not likely to change
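For anyone who needs the strict behavior asked for in the report regardless of pandas version, one possible workaround is to check for missing labels before calling .loc. This is only a sketch: `loc_strict` is a hypothetical helper name, not a pandas API, and the sample below reuses a few values from the repro.

```python
import numpy as np
import pandas as pd


def loc_strict(series, labels):
    """Hypothetical helper: label lookup that raises KeyError if any label is missing."""
    labels = pd.Index(labels)
    missing = labels.difference(series.index)
    if len(missing) > 0:
        raise KeyError("labels not in index: {}".format(list(missing)))
    # Every label is present, so no NaN fill and no dtype promotion is needed.
    return series.loc[labels]


a2b = pd.Series(
    np.array([999000011000001104, 999000011000001104], dtype=np.int64),
    index=np.array([9724501000001103, 9724701000001109], dtype=np.int64),
)

# All labels present: the int64 dtypes are preserved.
ok = loc_strict(a2b, np.array([9724701000001109, 9724501000001103], dtype=np.int64))

# One label missing: a KeyError is raised instead of a float64 result.
try:
    loc_strict(a2b, np.array([9724501000001103, 10047311000001102], dtype=np.int64))
    raised = False
except KeyError:
    raised = True
```

The explicit check costs one extra pass over the labels, but the failure is loud and the values are never cast to float64.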