arrow: [Python] Windows fatal exception: access violation

Describe the bug, including details regarding any error messages, version, and platform.

Hi,

When using the pyarrow Flight client, I have a user who occasionally sees a Windows fatal exception. This involves a query with multiple subqueries across many fields. I have access to the environment and can reproduce the crash. We have found a correlation between the number of fields and the exception: as we decrease the number of fields, the crash occurs less and less consistently.
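Since the crash rate seems to track the number of fields, one way to narrow it down is to search for the smallest field count that still crashes. The following is only a sketch: `smallest_failing_count` and the `crashes` callback are hypothetical names, and it assumes that crashes become more likely as fields are added, which an intermittent failure may not strictly honor.

```python
from typing import Callable, Optional


def smallest_failing_count(
    max_fields: int,
    crashes: Callable[[int], bool],
    attempts: int = 5,
) -> Optional[int]:
    """Binary-search the smallest field count for which `crashes` is ever true.

    Assumption: if a query with n fields can crash, so can any larger query.
    """

    def fails(n: int) -> bool:
        # Retry each candidate because the crash is intermittent.
        return any(crashes(n) for _ in range(attempts))

    if not fails(max_fields):
        return None  # could not reproduce even with every field selected

    lo, hi = 1, max_fields
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # still crashes, so try fewer fields
        else:
            lo = mid + 1  # no crash seen, so more fields are needed
    return lo
```

In practice `crashes(n)` would have to run the reproducer with `n` fields in a fresh process (an access violation kills the interpreter) and return True on an abnormal exit.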

I realize that an issue without exact steps to reproduce is unhelpful. However, I am more than willing to try out test builds or build a custom version to gather more details if I can get some guidance.

I was able to easily build a custom version on Linux per the dev docs, but I tried building a custom pyarrow on Windows and ran into issues right away with detection of the compiler. I have my steps and logs below.

Observations

  1. This only occurs on Windows 10 or 11; the same query runs fine on Linux/macOS
  2. This only occurs when running in a Python notebook; running the same code as a script works
  3. It reproduces with both Python 3.11 and 3.12
  4. The issue occurs in both pip-only and conda environments
  5. Disabling all antivirus and Windows security detection does not help
  6. A Windows event is logged that names arrow_flight.dll as the faulting module

Windows Event Log Message

Faulting application name: python3.12.exe, version: 3.12.1150.1013, time stamp: 0x6572422a
Faulting module name: arrow_flight.dll, version: 0.0.0.0, time stamp: 0x65a69ccb
Exception code: 0xc0000005
Fault offset: 0x00000000002dc6b0
Faulting process id: 0x0x4F8
Faulting application start time: 0x0x1DA55FAF308D836
Faulting application path: C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.496.0_x64__qbz5n2kfra8p0\python3.12.exe
Faulting module path: C:\Users\powersj\v3-ear\.venv\Lib\site-packages\pyarrow\arrow_flight.dll
Report Id: f8313105-2c59-4f1a-a8a6-a4227a8ae7d9
Faulting package full name: PythonSoftwareFoundation.Python.3.12_3.12.496.0_x64__qbz5n2kfra8p0
Faulting package-relative application ID: Python
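
For reference, exception code 0xc0000005 is the NTSTATUS value for an access violation, which matches the faulthandler message in the traceback. A minimal sketch for decoding such codes follows; the helper name `describe_exit_code` and the small lookup table are illustrative only and far from exhaustive.

```python
# A few well-known NTSTATUS values; illustrative, not a complete table.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(code: int) -> str:
    """Map a Windows exit code (signed or unsigned) to an NTSTATUS name."""
    unsigned = code & 0xFFFFFFFF  # normalise negative 32-bit values
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(describe_exit_code(0xC0000005))
```

The same helper can be applied to the return code of a child process that ran the reproducer, since a crashed process typically exits with the exception code.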

Code

import json
import certifi

from pyarrow.flight import FlightClient, Ticket, FlightCallOptions

import faulthandler
faulthandler.enable()

host = "host"
token = "token"
database = "db"

with open(certifi.where(), "r", encoding="utf-8") as f_cert:
    cert = f_cert.read()

with open("kernel-crash.sql", "r", encoding="utf-8") as f_sql:
    query = f_sql.read()

options = FlightCallOptions(**{
    "headers": [(b"authorization", f"Bearer {token}".encode('utf-8'))],
    "timeout": 300
})
ticket_data = {
    "database": database,
    "sql_query": query,
    "query_type": "sql",
}
ticket = Ticket(json.dumps(ticket_data).encode('utf-8'))
with FlightClient(f"grpc+tls://{host}:443", tls_root_certs=cert) as client:
    reader = client.do_get(ticket, options)
    print(reader.read_all())
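
Because the notebook kernel may die before the traceback is displayed, it can help to send the faulthandler output to a file instead of stderr. This is a sketch under that assumption; the log file name is hypothetical and any writable path would do.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "flight_faulthandler.log")

# The file must stay open while faulthandler is enabled, because the
# handler writes to its file descriptor at crash time.
crash_log = open(log_path, "w", encoding="utf-8")
faulthandler.enable(file=crash_log, all_threads=True)

# ... run the Flight query from the reproducer here ...
```

After a crash, the per-thread Python stacks should be in the log file even if the notebook front end shows nothing.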

Traceback

Windows fatal exception: access violation

Thread 0x000026a8 (most recent call first):
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\parentpoller.py", line 93 in run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Thread 0x00002700 (most recent call first):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 355 in wait
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 655 in wait
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\history.py", line 894 in run
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\history.py", line 60 in only_when_enabled
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\decorator.py", line 232 in fun
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Thread 0x00002620 (most recent call first):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 314 in _select
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 323 in select
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1947 in _run_once
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\control.py", line 23 in run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Thread 0x00001ba8 (most recent call first):
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\heartbeat.py", line 106 in run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Thread 0x00001d80 (most recent call first):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 314 in _select
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 323 in select
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1947 in _run_once
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\iostream.py", line 92 in _thread_main
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1010 in run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Current thread 0x000025e0 (most recent call first):
  File "C:\Users\powersj\AppData\Local\Temp\ipykernel_9720\769077188.py", line 26 in <module>
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3553 in run_code
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3493 in run_ast_nodes
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3311 in run_cell_async
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\async_helpers.py", line 129 in _pseudo_sync_runner
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3106 in _run_cell
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3051 in run_cell
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\zmqshell.py", line 549 in run_cell
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 446 in do_execute
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 775 in execute_request
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 359 in execute_request
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 437 in dispatch_shell
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 531 in process_one
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 542 in dispatch_queue
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\events.py", line 88 in _run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1985 in _run_once
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelapp.py", line 739 in start
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\traitlets\config\application.py", line 1075 in launch_instance
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel_launcher.py", line 17 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

System Information

$ python --version
Python 3.11.8
(venv)
$ pip list
Package           Version
----------------- --------
asttokens         2.4.1
certifi           2024.2.2
colorama          0.4.6
comm              0.2.1
debugpy           1.8.1
decorator         5.1.1
executing         2.0.1
ipdb              0.13.13
ipykernel         6.29.2
ipython           8.21.0
jedi              0.19.1
jupyter_client    8.6.0
jupyter_core      5.7.1
matplotlib-inline 0.1.6
nest-asyncio      1.6.0
numpy             1.26.4
packaging         23.2
parso             0.8.3
pip               23.3.1
platformdirs      4.2.0
prompt-toolkit    3.0.43
psutil            5.9.8
pure-eval         0.2.2
pyarrow           15.0.0
Pygments          2.17.2
python-dateutil   2.8.2
pywin32           306
pyzmq             25.1.2
setuptools        69.0.2
six               1.16.0
stack-data        0.6.3
tornado           6.4
traitlets         5.14.1
wcwidth           0.2.13
wheel             0.42.0

When using conda:

C:\Users\powersj>conda info

     active environment : None
       user config file : C:\Users\powersj\.condarc
 populated config files :
          conda version : 23.11.0
    conda-build version : not installed
         python version : 3.11.5.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=x86_64
                          __conda=23.11.0=0
                          __win=0=0
       base environment : C:\ProgramData\miniconda3  (read only)
      conda av data dir : C:\ProgramData\miniconda3\etc\conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\ProgramData\miniconda3\pkgs
                          C:\Users\powersj\.conda\pkgs
                          C:\Users\powersj\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\powersj\.conda\envs
                          C:\ProgramData\miniconda3\envs
                          C:\Users\powersj\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/23.11.0 requests/2.31.0 CPython/3.11.5 Windows/10 Windows/10.0.22621 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.3
          administrator : False
             netrc file : None
           offline mode : False

Build Attempt

C:\Users\powersj>conda create -y -n pyarrow-dev -c conda-forge ^
More?       --file arrow\ci\conda_env_cpp.txt ^
More?       --file arrow\ci\conda_env_python.txt ^
More?       --file arrow\ci\conda_env_gandiva.txt ^
More?       python=3.11

<snip>

C:\Users\powersj>conda activate pyarrow-dev

(pyarrow-dev) C:\Users\powersj>set ARROW_HOME=%CONDA_PREFIX%\Library

(pyarrow-dev) C:\Users\powersj>mkdir arrow\cpp\build

(pyarrow-dev) C:\Users\powersj>pushd arrow\cpp\build

(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>cmake -G "Ninja" ^
More?       -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
More?       -DCMAKE_UNITY_BUILD=ON ^
More?       -DARROW_COMPUTE=ON ^
More?       -DARROW_CSV=ON ^
More?       -DARROW_CXXFLAGS="/WX /MP" ^
More?       -DARROW_DATASET=ON ^
More?       -DARROW_FILESYSTEM=ON ^
More?       -DARROW_HDFS=ON ^
More?       -DARROW_JSON=ON ^
More?       -DARROW_PARQUET=ON ^
More?       -DARROW_WITH_LZ4=ON ^
More?       -DARROW_WITH_SNAPPY=ON ^
More?       -DARROW_WITH_ZLIB=ON ^
More?       -DARROW_WITH_ZSTD=ON ^
More?       -DARROW_FLIGHT=ON ^
More?       ..
-- Building using CMake version: 3.28.3
-- The C compiler identification is Clang 17.0.6 with GNU-like command-line
-- The CXX compiler identification is unknown
CMake Error at C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang.cmake:170 (message):
  The current configuration mixes Clang and MSVC or some other CL compatible
  compiler tool.  This is not supported.  Use either clang or MSVC as both C,
  C++ and/or HIP compilers.
Call Stack (most recent call first):
  C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang.cmake:180 (__verify_same_language_values)
  C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang-C.cmake:1 (include)
  C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/CMakeCInformation.cmake:48 (include)
  CMakeLists.txt:95 (project)


CMake Error at CMakeLists.txt:95 (project):
  No CMAKE_CXX_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


-- Configuring incomplete, errors occurred!

(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>

It is not clear to me which compiler I am supposed to use: one from the conda environment or the locally installed one?

If I try setting the compiler via the CC and CXX environment variables, I get:

set CC=C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64\cl.exe
set CXX=C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64\cl.exe

<snip>
-- Building using CMake version: 3.28.3
-- The C compiler identification is MSVC 19.39.33519.0
-- The CXX compiler identification is MSVC 19.39.33519.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - broken
CMake Error at C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/CMakeTestCCompiler.cmake:67 (message):
  The C compiler

    "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: 'C:/Users/powersj/arrow/cpp/build/CMakeFiles/CMakeScratch/TryCompile-j51cjy'

    Run Build Command(s): C:/Users/powersj/.conda/envs/pyarrow-dev/Library/bin/ninja.exe -v cmTC_f4d4d
    [1/2] C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\cl.exe  /nologo   /DWIN32 /D_WINDOWS  /Zi /Ob0 /Od /RTC1 -MDd /showIncludes /FoCMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj /FdCMakeFiles\cmTC_f4d4d.dir\ /FS -c C:\Users\powersj\arrow\cpp\build\CMakeFiles\CMakeScratch\TryCompile-j51cjy\testCCompiler.c
    [2/2] C:\WINDOWS\system32\cmd.exe /C "cd . && C:\Users\powersj\.conda\envs\pyarrow-dev\Library\bin\cmake.exe -E vs_link_exe --intdir=CMakeFiles\cmTC_f4d4d.dir --rc=rc --mt=CMAKE_MT-NOTFOUND --manifests  -- C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\link.exe /nologo CMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj  /out:cmTC_f4d4d.exe /implib:cmTC_f4d4d.lib /pdb:cmTC_f4d4d.pdb /version:0.0 /machine:x64  /debug /INCREMENTAL /subsystem:console  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
    FAILED: cmTC_f4d4d.exe
    C:\WINDOWS\system32\cmd.exe /C "cd . && C:\Users\powersj\.conda\envs\pyarrow-dev\Library\bin\cmake.exe -E vs_link_exe --intdir=CMakeFiles\cmTC_f4d4d.dir --rc=rc --mt=CMAKE_MT-NOTFOUND --manifests  -- C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\link.exe /nologo CMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj  /out:cmTC_f4d4d.exe /implib:cmTC_f4d4d.lib /pdb:cmTC_f4d4d.pdb /version:0.0 /machine:x64  /debug /INCREMENTAL /subsystem:console  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
    RC Pass 1: command "rc /fo CMakeFiles\cmTC_f4d4d.dir/manifest.res CMakeFiles\cmTC_f4d4d.dir/manifest.rc" failed (exit code 0) with the following output:
    The system cannot find the file specified
    ninja: build stopped: subcommand failed.





  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:95 (project)


-- Configuring incomplete, errors occurred!

(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>

Component(s)

Python

About this issue

  • State: open
  • Created 4 months ago
  • Comments: 26 (17 by maintainers)

Most upvoted comments

For Windows, you’ll want to use vcvarsall.bat or whatever the modern equivalent is; don’t muck with the env vars yourself. Also, possibly try the VS generator for CMake instead of Ninja.
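
As a rough illustration of that advice, the sketch below builds a single cmd.exe command line that loads the MSVC environment and then runs CMake in it. The `vcvarsall.bat` path is an assumption (the usual location for a Visual Studio 2022 Community install), and `cmake_in_msvc_env` is a hypothetical helper; the CMake arguments shown are a shortened subset of the ones in the report.

```python
import subprocess

# Assumed default location for Visual Studio 2022 Community; adjust as needed.
VCVARSALL = (
    r"C:\Program Files\Microsoft Visual Studio\2022\Community"
    r"\VC\Auxiliary\Build\vcvarsall.bat"
)


def cmake_in_msvc_env(cmake_args, arch="x64"):
    """Return one cmd.exe command line: load the MSVC env, then run CMake."""
    cmake_cmd = subprocess.list2cmdline(["cmake", *cmake_args])
    return f'"{VCVARSALL}" {arch} && {cmake_cmd}'


command = cmake_in_msvc_env(["-G", "Ninja", "-DARROW_FLIGHT=ON", ".."])
print(command)
# On Windows this could then be executed from the build directory with:
#   subprocess.run(command, shell=True, check=True)
```

Running both steps in one shell matters because the variables set by `vcvarsall.bat` only exist in the process that called it.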

I don’t have any clue about the crash itself. We would need a way to reproduce it.

You could also try downloading “Windbg Preview” from the Windows Store and running your script as windbgx -g python myscript.py to get a traceback.