pandas: inconsistencies in DataFrame.dtypes.isin(): I am getting inconsistent results across different runs of the same code.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas

data = [[2, 'tom', 10], [3, 'nick', 15], [4, 'juli', 14]]

myDf = pandas.DataFrame(data, columns=['a','b','c'])

print(myDf.dtypes)

print(myDf.dtypes.isin(['int64']))

Problem description

The problem is that running the same code repeatedly gives inconsistent results.

The first run gives:

a  False
b  False
c  False

A later run (sometimes it takes up to 10 attempts) gives:

a  True
b  False
c  True

The second result is correct and matches `myDf.dtypes`:

a     int64
b    object
c     int64

To reiterate: I am not changing the code, yet when I keep running it I sometimes get the first output and sometimes the second. The input, the code, and the interpreter are all unchanged. I tried it on different machines and interpreters and still get the same inconsistent results.

Python version: 3.7; pandas version: 1.0.

Expected Output

Consistent results across runs, i.e. always the second output above (which the current code already produces some of the time).
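Until the underlying behaviour changes, the same check can be written in ways that do not depend on hashing a numpy dtype against a Python string. The sketch below is a suggested workaround, not something proposed in the report, and it assumes the columns are inferred as `int64` as shown above. It compares with `==` element by element, converts the dtypes to plain strings before calling `isin`, or lets `select_dtypes` pick the columns.

```python
import pandas as pd

# Same data as the report above (assumes 'a' and 'c' are inferred as int64).
data = [[2, 'tom', 10], [3, 'nick', 15], [4, 'juli', 14]]
myDf = pd.DataFrame(data, columns=['a', 'b', 'c'])

# Option 1: element-wise ==; each dtype object is compared to the string
# directly, so no hash lookup is involved.
by_eq = myDf.dtypes == 'int64'

# Option 2: turn the dtypes into plain strings first, so isin() looks up
# strings among strings and the hashes on both sides agree.
by_str_isin = myDf.dtypes.astype(str).isin(['int64'])

# Option 3: let pandas select the int64 columns directly.
int_cols = myDf.select_dtypes(include='int64').columns.tolist()

print(by_eq.tolist())
print(by_str_isin.tolist())
print(int_cols)
```

Option 2 keeps the original `isin` call shape, which may be convenient when the list of wanted dtypes is long.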

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 19.3.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

About this issue

State: open
Created 4 years ago

Most upvoted comments

This is https://github.com/numpy/numpy/issues/7242
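A small illustration of the mismatch that appears to be at play: a numpy dtype compares equal to its string name, but the two objects hash differently. The linked NumPy issue is not quoted here, so treat the connection as our reading. pandas' `isin` looks values up in a hash table, and a hash-based lookup can only find `'int64'` when the dtype's hash happens to land in the same slot. Since the string's hash depends on `PYTHONHASHSEED`, that would explain why the result varies between runs. The variable names below are illustrative only.

```python
import numpy as np

int64_dtype = np.dtype('int64')

# Equality with the string form is defined: numpy converts 'int64' to a dtype.
eq_result = int64_dtype == 'int64'

# The hashes are computed differently for dtype objects and str objects.
same_hash = hash(int64_dtype) == hash('int64')

# A list lookup relies only on ==, so it finds the dtype.
in_list = int64_dtype in ['int64']

# A set lookup compares full hashes before ==, so it (practically always) misses.
in_set = int64_dtype in {'int64'}

print(eq_result, same_hash, in_list, in_set)
```

Python's own `set` compares the stored hashes before calling `==`, so it misses consistently. A hash table that only uses the hash to choose a slot and then calls `==` would instead match whenever the slots coincide, which depends on the seed.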

rhshadrach on Oct 31, 2023

Thanks! The code sets PYTHONHASHSEED and then verifies that, with this set, the result is deterministic. When the hash seed is 1, 5, 6, 13, or 15 you get the `True False True` pattern, and otherwise you get `False False False` (checking seeds up to 20). The trick is that the seed must be set before the Python interpreter starts, which is why you need to use `subprocess.call`.
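The script referred to in that comment is not reproduced in this thread. What follows is a sketch of the same kind of experiment, not the commenter's code. It uses `subprocess.run` rather than `subprocess.call` so the child's output can be captured, and `CHILD_CODE` and `run_with_seed` are hypothetical names. Each call starts a fresh interpreter with `PYTHONHASHSEED` set in its environment, runs the reproducer from the report, and returns the printed `isin` result.

```python
import os
import subprocess
import sys

# Child program: the reproducer from the report, printing the isin() result.
CHILD_CODE = """
import pandas
data = [[2, 'tom', 10], [3, 'nick', 15], [4, 'juli', 14]]
myDf = pandas.DataFrame(data, columns=['a', 'b', 'c'])
print(myDf.dtypes.isin(['int64']).tolist())
"""


def run_with_seed(seed):
    """Run CHILD_CODE in a new interpreter with PYTHONHASHSEED=seed."""
    # The hash seed is read only at interpreter start-up, so it must be placed
    # in the environment of a new process; changing os.environ in the running
    # interpreter would not affect its own string hashes.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout.strip()
```

For example, `for seed in range(21): print(seed, run_with_seed(seed))` scans the same seed range as the comment. Running any single seed twice should print the same list both times.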