pandas: CI: py 3.10 build failing
@seberg this build is using numpy 1.22dev, looks like a bunch of the failures are raising in np.iinfo(np.int64).max
return np.iinfo(np.int64).max
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <[AttributeError("'iinfo' object has no attribute 'kind'") raised in repr()] iinfo object at 0x7f986b9aa830>
int_type = <class 'numpy.int64'>
    def __init__(self, int_type):
        try:
            self.dtype = numeric.dtype(int_type)
        except TypeError:
>           self.dtype = numeric.dtype(type(int_type))
E           TypeError: 'numpy.dtype[bool_]' object is not callable
/opt/hostedtoolcache/Python/3.10.0-beta.2/x64/lib/python3.10/site-packages/numpy/core/getlimits.py:518: TypeError
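For readers puzzled by the "object is not callable" message: it is what you would see if the module-level name dtype inside numeric had been rebound from the class to an instance, which is what later comments in this thread report. A minimal sketch of that failure mode without NumPy, using hypothetical stand-ins (FakeDType, FakeNumeric and make_iinfo_dtype are made up for illustration and are not NumPy code):

```python
class FakeDType:
    """Hypothetical stand-in for numpy.dtype; not NumPy's real class."""

    def __init__(self, obj):
        self.obj = obj


class FakeNumeric:
    """Hypothetical stand-in for the numeric module namespace."""

    dtype = FakeDType


def make_iinfo_dtype(numeric, int_type):
    # Same try/except shape as the __init__ shown in the traceback above.
    try:
        return numeric.dtype(int_type)
    except TypeError:
        return numeric.dtype(type(int_type))


# Works while `dtype` still refers to the class.
ok = make_iinfo_dtype(FakeNumeric, int)

# Rebind the name to an *instance*, mimicking the reported corruption.
FakeNumeric.dtype = FakeDType(bool)

try:
    make_iinfo_dtype(FakeNumeric, int)
    error_message = None
except TypeError as exc:
    # The first call fails, the fallback in `except` fails the same way.
    error_message = str(exc)

print(error_message)
```

Both calls hit the same non-callable instance, so the TypeError escapes from the fallback branch, matching the line marked in the traceback.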
About this issue
- State: closed
- Created 3 years ago
- Reactions: 4
- Comments: 24 (17 by maintainers)
Aha, I think I have a lead… To cut it down more, this line is sufficient to trigger the issue:
And that should end up side-stepping almost all code in maybe_convert_objects. Not feeling like getting pandas dev setup running right now, but there is one line here:
which is called from cython using:
Now, that may be nothing, but np.full lives in the numeric module. And it does use a dtype, which is the boolean dtype we end up with here. Obviously, that also should not mess with the module scope, but at least the np.core.numeric module gets involved there.
EDIT: Continuing down the rabbit hole a bit. In fact the value is mutated by the time the trace function says that np.full is called (or by the time tracing reports it). No call before it seems to happen at all (np.empty, etc. are all C implemented though, so maybe that is why).
EDIT2: I opened a Python issue here: https://bugs.python.org/issue46451
I’ve run into this problem while trying to debug (in PyCharm) some code that uses pandas 1.3.5, and I was able to create a minimal reproducible example:
Note that pandas is changing the value of numpy.core.numeric.dtype, which originally is a class:
If we comment out sys.settrace(trace) and debug the code, the output is a little bit different:
If we uncomment # import scipy.linalg.lapack, the output is a little bit more complex (the first error I got, and similar to the error in the original report above, and also reported in this question in StackOverflow):
Using a custom trace function, I’ve pinpointed that the global dtype is changed right after this call: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/indexes/base.py#L6411
This call goes into Cython code and, looking at it, I’ve found this suspicious assignment that may be the cause (but I’m not sure, as there’s also the weird issue that this only happens when there’s a tracer or a profiler): https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/_libs/lib.pyx#L2655 (also another one here: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/_libs/lib.pyx#L1429).
(Also, not sure why would an assignment like that change a global in a numpy module, bug in Cython perhaps?)
It doesn’t matter whether you have cython installed. It matters which cython was used for building pandas, scikit-learn, …
All of these packages need to update slowly so that you can avoid installing your own cython but still get the fix.
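For anyone who wants to repeat the "custom trace function" approach mentioned above on their own code, here is a rough sketch of the idea: install a tracer with sys.settrace and, on every trace event, check whether a module global still refers to the same object. The helper watch_global and the demo module are hypothetical names made up for this example; this is not the trace function used in the report. Note that calls implemented in C produce no trace events of their own, so the sketch can only name the first Python-level event after the change.

```python
import sys
import types


def watch_global(module, name, func):
    """Run func() under a tracer and record where module.<name> is rebound."""
    state = {"last": getattr(module, name)}
    changes = []

    def tracer(frame, event, arg):
        current = getattr(module, name, None)
        if current is not state["last"]:
            # First trace event seen after the global was rebound.
            changes.append((frame.f_code.co_name, frame.f_lineno, event))
            state["last"] = current
        return tracer  # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return changes


# Demo with a throwaway module instead of numpy.core.numeric.
demo = types.ModuleType("demo")
demo.dtype = int


def culprit():
    demo.dtype = 3  # rebinding the watched global


def workload():
    culprit()


changes = watch_global(demo, "dtype", workload)
for func_name, lineno, event in changes:
    print(f"global changed by the time of {event!r} in {func_name} (line {lineno})")
```

On a real case you would pass the module and attribute you suspect (for example the numeric module and "dtype") and a function that runs the failing code.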
Great debugging there! Still utterly puzzling 😃. Just to note, I can reproduce the example in python3.10.1, but not python3.9.0 (Maybe we knew that long ago). Further, it does not matter whether I run python compiled for debugging. valgrind does not find anything (not that I would have expected that).
So we know that this is sensitive to python3.10 and has to do with tracing being active? We also know it is probably related to Cython. And I feel I have heard about tricky changes in Python 3.10 that affected cython? It feels like it is probably time to open either a python or cython issue about this?
will be in the 1.4.3 release. discussion on release date in #46610
The important part is whether tracing is enabled (i.e. typically a debugger or profiler is being used). In that case you will run into this issue. Check also https://github.com/cython/cython/issues/4609
Basically, your options are to upgrade Cython (to the non-released version as of now), to use the Cython 3 alpha, or to use the correct compile time option to disable the faulty paths.
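Since the deciding factor is whether tracing is enabled, a quick way to check from inside a running process is to ask the interpreter whether a trace or profile function is currently installed. A small sketch (tracing_or_profiling_active is a hypothetical helper name; it only sees hooks visible through sys.gettrace and sys.getprofile in the current thread, so it may not cover every tool):

```python
import sys


def tracing_or_profiling_active():
    """Return True if a trace or profile function is set in this thread."""
    return sys.gettrace() is not None or sys.getprofile() is not None


if tracing_or_profiling_active():
    print("A tracer or profiler is active; the issue described here may apply.")
else:
    print("No tracer or profiler detected in this thread.")
```

Running the same script once normally and once under a debugger or profiler should show the difference.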
Mark Shannon asked for a repro, and I had another look and it seems like Cython generates somewhat complicated stuff (PyEval...). So moved it to cython/cython#4609, on the plus side, there is really nothing fancy about it and you can trivially reproduce this without pandas/numpy and just cython. (I still have no idea if it is Cython or Python going wrong.)
@jbrockmendel I think I’ve figured it out. So it turns out that sys.setprofile, which is called in our tests for read_csv, is somehow changing the value of np.core.numeric.dtype. In #43910, where I skip this test, the Python 3.10 tests all pass.
One explanation might be that we are not resetting sys.setprofile back correctly, but the sys.setprofile(None) call should be the correct way to reset it back.
I will continue looking into this.
cc @mzeitlin11
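On the "resetting sys.setprofile back" point: one way to rule out a leaked profiler is to save the previous hook and restore it in a finally block, rather than always resetting to None. A minimal sketch, assuming a hypothetical helper named temporary_profile (this is not the pandas test code, and per the comments above it would not by itself avoid the bug while the profiler is active):

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_profile(func):
    """Install func as the profile hook, then restore the previous hook."""
    previous = sys.getprofile()
    sys.setprofile(func)
    try:
        yield
    finally:
        # Restore the earlier hook (often None) even if the body raised.
        sys.setprofile(previous)


calls = []


def record(frame, event, arg):
    # Only record Python-level function calls.
    if event == "call":
        calls.append(frame.f_code.co_name)


def work():
    return sum(range(3))


before = sys.getprofile()
with temporary_profile(record):
    work()
after = sys.getprofile()

print("work" in calls, after is before)
```

Restoring the saved hook instead of None also keeps an outer profiler (for example one installed by the test runner) working after the block.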