pandas: pd.Series.loc.__getitem__ promotes to float64 instead of raising KeyError

Code Sample, a copy-pastable example if possible

For:


result = a2b.loc[vals]       # pd.Series()[ np.array ]

If a2b is a series that maps {int64:int64} and vals is an int64 array, the result should be a series that maps {int64:int64}, or a KeyError should be thrown

Pasteable repo:

       import pandas as pd
       import numpy as np
       a2b = pd.Series(
           index = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
                     9725301000001109,  9725601000001103,  9725801000001104,
                     9730701000001104, 10049011000001109, 10328511000001105]),
           data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
                        999000011000001104, 999000011000001104, 999000011000001104,
                        999000011000001104, 999000011000001104, 999000011000001104])
       )
       assert a2b.dtype==np.int64
       assert a2b.index.dtype==np.int64
       key = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
                             9725301000001109,  9725601000001103,  9725801000001104,
                             9730701000001104,
                             10047311000001102, # Misin in a2b.index
                             10049011000001109,
                             10328511000001105])
       result = a2b.loc[key]
       result
       assert result.dtype==np.int64
       assert result.index.dtype==np.int64

What happens:

In [2]:         import pandas as pd
   ...:         import numpy as np
   ...:         a2b = pd.Series(
   ...:             index = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
   ...:                       9725301000001109,  9725601000001103,  9725801000001104,
   ...:                       9730701000001104, 10049011000001109, 10328511000001105]),
   ...:             data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
   ...:                          999000011000001104, 999000011000001104, 999000011000001104,
   ...:                          999000011000001104, 999000011000001104, 999000011000001104])
   ...:         )
   ...:         assert a2b.dtype==np.int64
   ...:         assert a2b.index.dtype==np.int64
   ...:         key = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
   ...:                               9725301000001109,  9725601000001103,  9725801000001104,
   ...:                               9730701000001104,
   ...:                               10047311000001102, # Misin in a2b.index
   ...:                               10049011000001109,
   ...:                               10328511000001105])
   ...:         result = a2b.loc[key]
   ...:         result
   ...: 
Out[2]: 
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
NaN             NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
dtype: float64

In [3]:         assert result.dtype==np.int64
   ...:         assert result.index.dtype==np.int64
   ...: 
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-be86ec17a393> in <module>()
----> 1 assert result.dtype==np.int64
      2 assert result.index.dtype==np.int64

AssertionError: 

Problem description

I don’t like this behavior because:

  • I have quietly lost all my data due to cast to float64
  • in other calls to getitem a KeyError is raised if a value is not found in the index.

Expected Output

Asserts should not fail.

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line] In [4]: pd.show_versions()

INSTALLED VERSIONS

commit: None python: 2.7.15.candidate.1 python-bits: 64 OS: Linux OS-release: 4.15.0-46-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.None

pandas: 0.22.0 pytest: None pip: 18.1 setuptools: 40.6.2 Cython: 0.29.1 numpy: 1.16.1 scipy: 1.2.0 pyarrow: None xarray: None IPython: 5.0.0 sphinx: None patsy: 0.5.1 dateutil: 2.6.0 pytz: 2016.10 blosc: None bottleneck: None tables: None numexpr: 2.6.8 feather: None matplotlib: 2.1.0 openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: 4.6.0 html5lib: 0.9999999 sqlalchemy: 1.2.17 pymysql: None psycopg2: 2.7.7 (dt dec pq3 ext lo64) jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 16 (10 by maintainers)

Most upvoted comments

I see the KeyError with __getitem__, but the message given when trying a2b.loc[key] indicates that, while no error is thrown now, it will be in the future. It seems to me that, while the current behavior is not ideal, it is expected.

But I’m still trying to sleuth out whether there’s another issue here. @jreback do you mean that because one of the labels in key is not in a2b, then that label shouldn’t show up in the index of a2b.loc[key]?

use .iloc as that is what is designed for selecting by position as the docs indicate: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#selection-by-position

getitem it falling back here as described http://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#miscellaneous-indexing-faq

this is as expected behavior and is not likely to change