pandas: inconsistencies in dataframe.dtypes.isin() - I am getting inconsistent results across different runs of the same code.
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas
data = [[2, 'tom', 10], [3, 'nick', 15], [4, 'juli', 14]]
myDf = pandas.DataFrame(data, columns=['a','b','c'])
print(myDf.dtypes)
print(myDf.dtypes.isin(['int64']))
Problem description
The problem is that I get inconsistent results when running the same code repeatedly.
The first run gives me:
a False
b False
c False
and a later run (sometimes it takes up to 10 attempts) gives me:
a True
b False
c True
The second output is the correct one and matches myDf.dtypes:
a int64
b object
c int64
To reiterate, I am not changing the code, but when I keep running it I sometimes get the first output and sometimes the second. The input, code, and interpreter stay the same throughout. I tried this on different machines and interpreters and still get the same inconsistent results.
Python version: 3.7. pandas version: 1.0.
Expected Output
A consistent result (the current output only sometimes matches the expected one).
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 19.3.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
About this issue
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 20 (13 by maintainers)
This is https://github.com/numpy/numpy/issues/7242
Thanks! The code sets PYTHONHASHSEED and then verifies that with this set, the result is deterministic. When the value of the hash seed is 1, 5, 6, 13, or 15 you get the True False True pattern, and otherwise you get False False False (for seeds up to 20). The trick is that you must set this seed before starting up the Python interpreter, which is why you need to use subprocess.call.
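The script referred to in this comment is not included in the thread, so the following is only a minimal sketch of the approach it describes, not the commenter's actual code. It starts a fresh interpreter per seed with PYTHONHASHSEED set in the child's environment. The names run_with_hash_seed, is_deterministic and PANDAS_SNIPPET are hypothetical, and subprocess.run is used here in place of subprocess.call so that the child's output can be captured.

```python
import os
import subprocess
import sys

# The reporter's example, to be executed in a child interpreter.
# (Hypothetical name; requires pandas in the child environment.)
PANDAS_SNIPPET = """
import pandas
data = [[2, 'tom', 10], [3, 'nick', 15], [4, 'juli', 14]]
myDf = pandas.DataFrame(data, columns=['a', 'b', 'c'])
print(myDf.dtypes.isin(['int64']).tolist())
"""


def run_with_hash_seed(seed, snippet):
    """Run `snippet` in a new interpreter with PYTHONHASHSEED=seed.

    The seed is only read at interpreter startup, so it has to be
    passed through the child's environment rather than set in-process.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def is_deterministic(seed, snippet, runs=3):
    """Return True if repeated runs with the same seed print the same output."""
    outputs = {run_with_hash_seed(seed, snippet) for _ in range(runs)}
    return len(outputs) == 1
```

Calling run_with_hash_seed(seed, PANDAS_SNIPPET) for each seed in range(21) would show which seeds give which pattern on a given installation. The result may differ from the seeds listed above depending on the pandas and numpy versions in use.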