arrow: [Python] Windows fatal exception: access violation
Describe the bug, including details regarding any error messages, version, and platform.
Hi,
When using the pyarrow Flight client, one of my users occasionally sees a Windows fatal exception. The failing query has multiple subqueries across many fields. I have access to the environment and can reproduce the crash. We have found a correlation between the number of fields and the exception: as we decrease the number of fields, the crash occurs less and less consistently.
I realize that getting an issue without exact steps to reproduce is unhelpful. However, I am more than willing to try out test builds or build a custom version to gather more details if I can get some guidance.
I was able to easily build a custom version on Linux per the dev docs, but I tried building a custom pyarrow on Windows and ran into issues right away with detection of the compiler. I have my steps and logs below.
Observations
- This only occurs on Windows 10 or 11; the same query runs fine on Linux/macOS
- This only occurs when running in a Python notebook; running the same code as a script works
- It reproduces with both Python 3.11 and 3.12
- The issue occurs in both a pip-only and a conda environment
- Disabling all virus or Windows security detection does not help
- A Windows event is logged that names arrow_flight.dll as the faulting module
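Since the crash is intermittent, a rough way to quantify "less and less consistently" is to run the reproducer many times and count failures. The following is only a sketch: `failure_rate` is a hypothetical helper, the command shown is a harmless placeholder, and because the crash only appears in a notebook the real command would have to be something that executes the notebook rather than a plain script.

```python
import subprocess
import sys


def failure_rate(command, runs=20):
    """Run `command` repeatedly and return the fraction of non-zero exits."""
    failures = 0
    for _ in range(runs):
        # Capture output so repeated runs do not flood the console.
        result = subprocess.run(command, capture_output=True)
        if result.returncode != 0:
            failures += 1
    return failures / runs


# Placeholder command that always succeeds; substitute the real reproducer.
placeholder = [sys.executable, "-c", "pass"]
print(failure_rate(placeholder, runs=3))
```

Comparing the rate for query variants with different numbers of fields would make the correlation described above measurable.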
Windows Event Log Message
Faulting application name: python3.12.exe, version: 3.12.1150.1013, time stamp: 0x6572422a
Faulting module name: arrow_flight.dll, version: 0.0.0.0, time stamp: 0x65a69ccb
Exception code: 0xc0000005
Fault offset: 0x00000000002dc6b0
Faulting process id: 0x0x4F8
Faulting application start time: 0x0x1DA55FAF308D836
Faulting application path: C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.496.0_x64__qbz5n2kfra8p0\python3.12.exe
Faulting module path: C:\Users\powersj\v3-ear\.venv\Lib\site-packages\pyarrow\arrow_flight.dll
Report Id: f8313105-2c59-4f1a-a8a6-a4227a8ae7d9
Faulting package full name: PythonSoftwareFoundation.Python.3.12_3.12.496.0_x64__qbz5n2kfra8p0
Faulting package-relative application ID: Python
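The fields that matter most in this message are the faulting module, the exception code (0xc0000005 is the Windows status code for an access violation), and the fault offset, which as far as I understand is relative to the module's load address. A small sketch for pulling them out of a pasted message, where `parse_event` is a hypothetical helper and the sample is copied from the log above:

```python
STATUS_ACCESS_VIOLATION = 0xC0000005


def parse_event(text):
    """Split 'Key: value' lines of an event log message into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


sample = """Faulting module name: arrow_flight.dll, version: 0.0.0.0, time stamp: 0x65a69ccb
Exception code: 0xc0000005
Fault offset: 0x00000000002dc6b0"""

event = parse_event(sample)
# The module line carries extra comma-separated details; keep the name only.
module = event["Faulting module name"].split(",")[0]
code = int(event["Exception code"], 16)
offset = int(event["Fault offset"], 16)
print(module, code == STATUS_ACCESS_VIOLATION, hex(offset))
```

With matching debug symbols, the module name and offset are what a debugger would need to map the fault to a function.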
Code
import json
import certifi
from pyarrow.flight import FlightClient, Ticket, FlightCallOptions
import faulthandler
faulthandler.enable()
host = "host"
token = "token"
database = "db"
with open(certifi.where(), "r", encoding="utf-8") as f_cert:
    cert = f_cert.read()
with open("kernel-crash.sql", "r", encoding="utf-8") as f_sql:
    query = f_sql.read()
options = FlightCallOptions(**{
    "headers": [(b"authorization", f"Bearer {token}".encode('utf-8'))],
    "timeout": 300
})
ticket_data = {
    "database": database,
    "sql_query": query,
    "query_type": "sql",
}
ticket = Ticket(json.dumps(ticket_data).encode('utf-8'))
with FlightClient(f"grpc+tls://{host}:443", tls_root_certs=cert) as client:
    reader = client.do_get(ticket, options)
    print(reader.read_all())
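Because a crashing kernel can take its stderr output with it, one option is to point faulthandler at a file instead. This is a minimal sketch, not part of the original reproducer; the log file name and location are arbitrary choices.

```python
import faulthandler
import os
import tempfile

# Arbitrary location for the crash log; any writable path would do.
log_path = os.path.join(tempfile.gettempdir(), "flight-crash.log")

# faulthandler keeps using the file descriptor, so the file object must stay
# open (and referenced) for as long as the handler is enabled.
crash_log = open(log_path, "w", encoding="utf-8")
faulthandler.enable(file=crash_log, all_threads=True)
```

After a crash, the per-thread Python tracebacks should be in that file even if the notebook front end shows nothing.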
Traceback
Windows fatal exception: access violation
Thread 0x000026a8 (most recent call first):
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\parentpoller.py", line 93 in run
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap
Thread 0x00002700 (most recent call first):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 355 in wait
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 655 in wait
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\history.py", line 894 in run
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\history.py", line 60 in only_when_enabled
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\decorator.py", line 232 in fun
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap
Thread 0x00002620 (most recent call first):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 314 in _select
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 323 in select
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1947 in _run_once
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\control.py", line 23 in run
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap
Thread 0x00001ba8 (most recent call first):
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\heartbeat.py", line 106 in run
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap
Thread 0x00001d80 (most recent call first):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 314 in _select
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 323 in select
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1947 in _run_once
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\iostream.py", line 92 in _thread_main
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1010 in run
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap
Current thread 0x000025e0 (most recent call first):
File "C:\Users\powersj\AppData\Local\Temp\ipykernel_9720\769077188.py", line 26 in <module>
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3553 in run_code
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3493 in run_ast_nodes
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3311 in run_cell_async
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\async_helpers.py", line 129 in _pseudo_sync_runner
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3106 in _run_cell
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3051 in run_cell
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\zmqshell.py", line 549 in run_cell
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 446 in do_execute
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 775 in execute_request
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 359 in execute_request
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 437 in dispatch_shell
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 531 in process_one
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 542 in dispatch_queue
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\events.py", line 88 in _run
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1985 in _run_once
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelapp.py", line 739 in start
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\traitlets\config\application.py", line 1075 in launch_instance
File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel_launcher.py", line 17 in <module>
File "<frozen runpy>", line 88 in _run_code
File "<frozen runpy>", line 198 in _run_module_as_main
System Information
$ python --version
Python 3.11.8
(venv)
$ pip list
Package Version
----------------- --------
asttokens 2.4.1
certifi 2024.2.2
colorama 0.4.6
comm 0.2.1
debugpy 1.8.1
decorator 5.1.1
executing 2.0.1
ipdb 0.13.13
ipykernel 6.29.2
ipython 8.21.0
jedi 0.19.1
jupyter_client 8.6.0
jupyter_core 5.7.1
matplotlib-inline 0.1.6
nest-asyncio 1.6.0
numpy 1.26.4
packaging 23.2
parso 0.8.3
pip 23.3.1
platformdirs 4.2.0
prompt-toolkit 3.0.43
psutil 5.9.8
pure-eval 0.2.2
pyarrow 15.0.0
Pygments 2.17.2
python-dateutil 2.8.2
pywin32 306
pyzmq 25.1.2
setuptools 69.0.2
six 1.16.0
stack-data 0.6.3
tornado 6.4
traitlets 5.14.1
wcwidth 0.2.13
wheel 0.42.0
When using conda:
C:\Users\powersj>conda info
active environment : None
user config file : C:\Users\powersj\.condarc
populated config files :
conda version : 23.11.0
conda-build version : not installed
python version : 3.11.5.final.0
solver : libmamba (default)
virtual packages : __archspec=1=x86_64
__conda=23.11.0=0
__win=0=0
base environment : C:\ProgramData\miniconda3 (read only)
conda av data dir : C:\ProgramData\miniconda3\etc\conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/win-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/win-64
https://repo.anaconda.com/pkgs/r/noarch
https://repo.anaconda.com/pkgs/msys2/win-64
https://repo.anaconda.com/pkgs/msys2/noarch
package cache : C:\ProgramData\miniconda3\pkgs
C:\Users\powersj\.conda\pkgs
C:\Users\powersj\AppData\Local\conda\conda\pkgs
envs directories : C:\Users\powersj\.conda\envs
C:\ProgramData\miniconda3\envs
C:\Users\powersj\AppData\Local\conda\conda\envs
platform : win-64
user-agent : conda/23.11.0 requests/2.31.0 CPython/3.11.5 Windows/10 Windows/10.0.22621 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.3
administrator : False
netrc file : None
offline mode : False
Build Attempt
C:\Users\powersj>conda create -y -n pyarrow-dev -c conda-forge ^
More? --file arrow\ci\conda_env_cpp.txt ^
More? --file arrow\ci\conda_env_python.txt ^
More? --file arrow\ci\conda_env_gandiva.txt ^
More? python=3.11
<snip>
C:\Users\powersj>conda activate pyarrow-dev
(pyarrow-dev) C:\Users\powersj>set ARROW_HOME=%CONDA_PREFIX%\Library
(pyarrow-dev) C:\Users\powersj>mkdir arrow\cpp\build
(pyarrow-dev) C:\Users\powersj>pushd arrow\cpp\build
(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>cmake -G "Ninja" ^
More? -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
More? -DCMAKE_UNITY_BUILD=ON ^
More? -DARROW_COMPUTE=ON ^
More? -DARROW_CSV=ON ^
More? -DARROW_CXXFLAGS="/WX /MP" ^
More? -DARROW_DATASET=ON ^
More? -DARROW_FILESYSTEM=ON ^
More? -DARROW_HDFS=ON ^
More? -DARROW_JSON=ON ^
More? -DARROW_PARQUET=ON ^
More? -DARROW_WITH_LZ4=ON ^
More? -DARROW_WITH_SNAPPY=ON ^
More? -DARROW_WITH_ZLIB=ON ^
More? -DARROW_WITH_ZSTD=ON ^
More? -DARROW_FLIGHT=ON ^
More? ..
-- Building using CMake version: 3.28.3
-- The C compiler identification is Clang 17.0.6 with GNU-like command-line
-- The CXX compiler identification is unknown
CMake Error at C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang.cmake:170 (message):
The current configuration mixes Clang and MSVC or some other CL compatible
compiler tool. This is not supported. Use either clang or MSVC as both C,
C++ and/or HIP compilers.
Call Stack (most recent call first):
C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang.cmake:180 (__verify_same_language_values)
C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang-C.cmake:1 (include)
C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/CMakeCInformation.cmake:48 (include)
CMakeLists.txt:95 (project)
CMake Error at CMakeLists.txt:95 (project):
No CMAKE_CXX_COMPILER could be found.
Tell CMake where to find the compiler by setting either the environment
variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
to the compiler, or to the compiler name if it is in the PATH.
-- Configuring incomplete, errors occurred!
(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>
It is not clear to me which compiler I am supposed to use: something from the conda environment, or the locally installed one?
If I set the compiler via the CC and CXX environment variables, I get:
set CC=C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64\cl.exe
set CXX=C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64\cl.exe
<snip>
-- Building using CMake version: 3.28.3
-- The C compiler identification is MSVC 19.39.33519.0
-- The CXX compiler identification is MSVC 19.39.33519.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - broken
CMake Error at C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/CMakeTestCCompiler.cmake:67 (message):
The C compiler
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: 'C:/Users/powersj/arrow/cpp/build/CMakeFiles/CMakeScratch/TryCompile-j51cjy'
Run Build Command(s): C:/Users/powersj/.conda/envs/pyarrow-dev/Library/bin/ninja.exe -v cmTC_f4d4d
[1/2] C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\cl.exe /nologo /DWIN32 /D_WINDOWS /Zi /Ob0 /Od /RTC1 -MDd /showIncludes /FoCMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj /FdCMakeFiles\cmTC_f4d4d.dir\ /FS -c C:\Users\powersj\arrow\cpp\build\CMakeFiles\CMakeScratch\TryCompile-j51cjy\testCCompiler.c
[2/2] C:\WINDOWS\system32\cmd.exe /C "cd . && C:\Users\powersj\.conda\envs\pyarrow-dev\Library\bin\cmake.exe -E vs_link_exe --intdir=CMakeFiles\cmTC_f4d4d.dir --rc=rc --mt=CMAKE_MT-NOTFOUND --manifests -- C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\link.exe /nologo CMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj /out:cmTC_f4d4d.exe /implib:cmTC_f4d4d.lib /pdb:cmTC_f4d4d.pdb /version:0.0 /machine:x64 /debug /INCREMENTAL /subsystem:console kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
FAILED: cmTC_f4d4d.exe
C:\WINDOWS\system32\cmd.exe /C "cd . && C:\Users\powersj\.conda\envs\pyarrow-dev\Library\bin\cmake.exe -E vs_link_exe --intdir=CMakeFiles\cmTC_f4d4d.dir --rc=rc --mt=CMAKE_MT-NOTFOUND --manifests -- C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\link.exe /nologo CMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj /out:cmTC_f4d4d.exe /implib:cmTC_f4d4d.lib /pdb:cmTC_f4d4d.pdb /version:0.0 /machine:x64 /debug /INCREMENTAL /subsystem:console kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
RC Pass 1: command "rc /fo CMakeFiles\cmTC_f4d4d.dir/manifest.res CMakeFiles\cmTC_f4d4d.dir/manifest.rc" failed (exit code 0) with the following output:
The system cannot find the file specified
ninja: build stopped: subcommand failed.
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:95 (project)
-- Configuring incomplete, errors occurred!
(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>
Component(s)
Python
About this issue
- State: open
- Created 4 months ago
- Comments: 26 (17 by maintainers)
For Windows, you’ll want to use vcvarsall.bat or whatever the modern equivalent is; don’t muck with the env vars yourself. Also, possibly try the VS generator for CMake instead of Ninja.
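As a rough illustration of what vcvarsall.bat does, the sketch below runs it in a `cmd` shell and captures the resulting environment, which could then be passed to a build command. Everything here is an assumption rather than a documented Arrow workflow: the `VCVARSALL` path is the usual default for a Visual Studio 2022 Community install, and `parse_set_output` and `msvc_environment` are hypothetical helpers.

```python
import subprocess

# Assumed default location for VS 2022 Community; adjust to the local install.
VCVARSALL = (
    r"C:\Program Files\Microsoft Visual Studio\2022\Community"
    r"\VC\Auxiliary\Build\vcvarsall.bat"
)


def parse_set_output(text):
    """Turn the NAME=value lines printed by `set` into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:  # skip blank lines and cmd's hidden "=C:" entries
            env[name] = value
    return env


def msvc_environment(arch="x64"):
    """Windows only: run vcvarsall.bat and return the environment it sets up."""
    # The doubled outer quotes let cmd keep the quoted path intact.
    command = f'cmd /c ""{VCVARSALL}" {arch} && set"'
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_set_output(result.stdout)
```

In practice, opening a "x64 Native Tools Command Prompt" and activating the conda environment from there achieves the same thing without any scripting.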
I don’t have any clue about the crash itself. We would need a way to reproduce it.
You could also try downloading “Windbg Preview” from the Windows Store and running your script as
windbgx -g python myscript.py
to get a traceback.