arrow: [Python] BUG: Reading ORC segfaults on windows (if TZDIR isn't set)
Describe the bug, including details regarding any error messages, version, and platform.
ORC on Windows hasn’t been available in conda-forge for long, and we’ve only recently enabled it for the C++ portion of Arrow. I have now tried to switch it on for pyarrow as well in https://github.com/conda-forge/arrow-cpp-feedstock/pull/1086, and the test suite segfaults as soon as it gets to test_dataset.py::test_orc_format.
stacktrace
[...]
test_dataset.py::test_ipc_format[threaded] PASSED [ 20%]
test_dataset.py::test_ipc_format[serial] PASSED [ 20%]
Fatal Python error: Aborted
Thread 0x00000ea0 (most recent call first):
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pyarrow\tests\test_dataset.py", line 261 in to_table
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pyarrow\tests\test_dataset.py", line 2991 in test_orc_format
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\python.py", line 194 in pytest_pyfunc_call
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\python.py", line 1799 in runtest
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\runner.py", line 169 in pytest_runtest_call
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\runner.py", line 262 in <lambda>
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\runner.py", line 341 in from_call
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\runner.py", line 261 in call_runtest_hook
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\runner.py", line 222 in call_and_report
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\runner.py", line 133 in runtestprotocol
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\runner.py", line 114 in pytest_runtest_protocol
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\main.py", line 348 in pytest_runtestloop
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\main.py", line 323 in _main
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\main.py", line 269 in wrap_session
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\main.py", line 316 in pytest_cmdline_main
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\config\__init__.py", line 166 in main
File "D:\bld\apache-arrow_1686428319811\_test_env\Lib\site-packages\_pytest\config\__init__.py", line 189 in console_main
File "D:\bld\apache-arrow_1686428319811\_test_env\Scripts\pytest-script.py", line 9 in <module>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pyarrow.lib, pyarrow._hdfsio, pyarrow._fs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pandas._libs.tslib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, fastparquet.cencoding, fastparquet.speedups, pyarrow.gandiva, pyarrow._acero, pyarrow._csv, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._dataset_parquet, pyarrow._orc, pyarrow._parquet_encryption, pyarrow._flight, pyarrow._substrait, _cffi_backend, pyarrow._pyarrow_cpp_tests, pyarrow._feather, pyarrow._json, numpy.linalg.lapack_lite, scipy._lib._ccallback_c, scipy.sparse._sparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, 
scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, pyarrow_cython_example, bound_function_visit_strings (total: 104)
Tests failed for pyarrow-tests-12.0.0-py311h385a57a_8_cpu.conda - moving package to D:\bld\broken
WARNING:conda_build.build:Tests failed for pyarrow-tests-12.0.0-py311h385a57a_8_cpu.conda - moving package to D:\bld\broken
TESTS FAILED: pyarrow-tests-12.0.0-py311h385a57a_8_cpu.conda
Component(s)
Format, Packaging, Python
About this issue
- State: closed
- Created a year ago
- Comments: 56 (56 by maintainers)
Commits related to this issue
- GH-36026: Fix ORC test segfault in the python wheel windows test — committed to wgtmac/arrow by wgtmac 3 months ago
- GH-36026: [Python] Fix ORC test segfault in the python wheel windows test (#40609) ### Rationale for this change The pyarrow orc reader always crashes when it tries to create an internal orc reader.... — committed to apache/arrow by wgtmac 3 months ago
- GH-36026: [C++][ORC] Check TZDB availability for ORC — committed to wgtmac/arrow by wgtmac 3 months ago
- ORC-1663: [C++] Enable TestTimezone.testMissingTZDB on Windows ### What changes were proposed in this pull request? Enable TestTimezone.testMissingTZDB unit test to run on Windows. ### Why are the ... — committed to apache/orc by wgtmac 3 months ago
- GH-36026: [C++][ORC] Catch all ORC exceptions to avoid crash (#40697) ### Rationale for this change When /usr/share/zoneinfo is unavailable and TZDIR env is unset, creating C++ ORC reader will crash... — committed to apache/arrow by wgtmac 3 months ago
- ORC-1684: [C++] Find tzdb without TZDIR when in conda-environments ### What changes were proposed in this pull request? Find tzdb without having to set `TZDIR` when in a conda-environment (where `tz... — committed to apache/orc by h-vetinari 3 months ago
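The commit titles above mention checking tzdb availability and finding the tzdb without `TZDIR` in conda environments. As a rough illustration of that kind of lookup, here is a minimal Python sketch. It is not the C++ code from either project: the helper name `find_tzdb`, the candidate order, and the fall-through when `TZDIR` points at a missing directory are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Mapping, Optional

# Default location named in the GH-36026 commit message above.
DEFAULT_TZDB = Path("/usr/share/zoneinfo")


def find_tzdb(env: Mapping[str, str],
              default: Path = DEFAULT_TZDB) -> Optional[Path]:
    """Return the first existing tzdb directory, or None.

    Hypothetical helper: the candidate order (TZDIR, default location,
    conda prefix) is an assumption, not the libraries' actual order.
    """
    candidates = []
    if env.get("TZDIR"):
        candidates.append(Path(env["TZDIR"]))
    candidates.append(default)
    if env.get("CONDA_PREFIX"):
        # Location suggested in the thread for conda environments.
        candidates.append(Path(env["CONDA_PREFIX"]) / "share" / "zoneinfo")
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

A test suite could call `find_tzdb(os.environ)` up front and skip ORC tests when it returns `None`, rather than letting the reader abort the process.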
I can confirm that setting `TZDIR` makes pyarrow built with `PYARROW_WITH_ORC=1` pass the test suite also on windows. 🥳

Thanks a lot for debugging this, great to have this finally sorted out!

How should we fix this though? I see that in #40609 you’re downloading the tzdb, but that’s not really viable for us in conda-forge. It would be good if pyarrow could automatically check `%CONDA_PREFIX%\share\zoneinfo` when looking for the tzdb (relative to the site-packages directory on windows, the path would be `../../share/zoneinfo`).

Thanks for the debugging!

I started a CI run that sets `TZDIR`, see https://github.com/conda-forge/arrow-cpp-feedstock/pull/1086/commits/fd6a4f60ba8b78c1696bef2df6e789e82af6b4e2. Given that the build phase passed without issue, I did not set `TZDIR` in the build scripts, but I can do that as well if it helps.

Hmm. I hope that the `Findlz4Alt.cmake` can be found by `set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_LIST_DIR}")` in `ArrowConfig.cmake` but it seems that it doesn’t work…

Anyway, this is not related to ORC. We can ignore this by removing `lz4Alt` from `ARROW_SYSTEM_DEPENDENCIES`. `ARROW_SYSTEM_DEPENDENCIES` is only needed for static linking and PyArrow uses shared linking.

`cpp/` build uses its `Findlz4Alt.cmake` (we don’t need to set search path) but `python/` build uses `Findlz4Alt.cmake` installed by `cmake --install` of `cpp/` (we need to set search path). (If my assumption is correct. 😃)

I don’t think so. It seems that this issue exists in the old 1.8.x releases as well.
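For anyone hitting this before a fixed build is available, the `TZDIR` workaround confirmed earlier in the thread could be applied from Python roughly as follows. This is only a sketch: `conda_zoneinfo_dir` and `set_tzdir_if_missing` are hypothetical helper names, and the layout (`<prefix>\Lib\site-packages` with the tzdb under `<prefix>\share\zoneinfo`) is the Windows conda layout described in the thread, not something checked against other environments.

```python
from pathlib import Path
from typing import MutableMapping, Optional


def conda_zoneinfo_dir(site_packages: Path) -> Path:
    """Map <prefix>/Lib/site-packages to <prefix>/share/zoneinfo.

    Equivalent to the ../../share/zoneinfo path mentioned in the thread
    (assumes the Windows conda layout).
    """
    return site_packages.parents[1] / "share" / "zoneinfo"


def set_tzdir_if_missing(env: MutableMapping[str, str],
                         site_packages: Path) -> Optional[str]:
    """Set TZDIR in env when it is unset and the conda tzdb exists.

    Returns the effective TZDIR value, or None if no tzdb was found.
    """
    if env.get("TZDIR"):
        return env["TZDIR"]  # respect an explicit user setting
    candidate = conda_zoneinfo_dir(site_packages)
    if candidate.is_dir():
        env["TZDIR"] = str(candidate)
        return env["TZDIR"]
    return None
```

One way to use it would be `set_tzdir_if_missing(os.environ, Path(sysconfig.get_paths()["purelib"]))` before any ORC file is read. Whether the ORC library sees a value set from inside the running process is an assumption I have not verified; the variant confirmed in the thread is setting `TZDIR` in the environment before the tests start.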