snowflake-connector-python: SNOW-856569: Segmentation Fault in cache.py pickle dump
Python version
Python 3.8.12 (default, Nov 17 2021, 08:36:07) [Clang 7.1.0 (tags/RELEASE_710/final)]
Operating system and processor architecture
Linux x86_64
Installed packages
dbt-core==1.5.2
dbt-snowflake==1.5.2
snowflake-connector-python[secure-local-storage] == 3.0.3
What did you do?
Running dbt on our Linux machines with more than 1 thread triggers a segmentation fault in snowflake-connector-python for versions >= 3.x.x. We’re not seeing this issue locally on macOS.
What did you expect to see?
The offending line is here, along with the truncated stack trace:
Fatal Python error: Segmentation fault
Stack (most recent call first):
File "pypi_snowflake_connector_python/site-packages/snowflake/connector/cache.py", line 511 in _save
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 17 (4 by maintainers)
@sfc-gh-mkeller Glad the repro repo was useful.
I’ve tested with that repo against two commits: the fix from @peterallenwebb’s PR on this branch, and the (at the time of writing) tip of your current PR on this one. I didn’t get any segfaults in 20 runs for either of them via
`for run in {1..20}; do make test; done > results.txt`. I also did a quick test of 10 runs of
`dbt compile --threads 30` in our dbt project, where we originally saw this error repro, and also didn’t get any segfaults there. I wouldn’t call my testing alone conclusive, but on previous builds that almost certainly would’ve triggered at least one segfault, so I’d be optimistic about the fixes at least.
In PR #1635 I try to fix the underlying issue. I’d greatly appreciate it if you guys could help me test the code ❤️ I forked @verhey’s repo (thanks a ton for the repro!) to verify that this works with my changes, but I’d like to get some more organic testing to be done as well!
I was able to avoid the segfault by replacing these lines:
https://github.com/snowflakedb/snowflake-connector-python/blob/v3.0.4/src/snowflake/connector/cache.py#L527-L528
with
I tried this after noting that the faulting thread reliably had the same native stack trace:
That suggested a problem with the CPython implementation of pickle.dump(). Perhaps part of the issue is that the object being pickled is modified by another thread during serialization?
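If concurrent mutation during `pickle.dump()` is indeed the trigger, one general mitigation is to snapshot the cache under a lock and pickle the snapshot, so the serializer never walks an object graph another thread can mutate. This is only a sketch of that idea (the class and method names here are hypothetical, not the connector's actual `cache.py` code or the fix in the PR):

```python
import copy
import pickle
import tempfile
import threading

class SnapshotCache:
    """Illustrative cache that pickles a private snapshot.

    Hypothetical names for illustration only; this is not the
    connector's cache implementation, just the snapshot-under-lock idea.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def save(self, path):
        # Copy while holding the lock: writers cannot mutate mid-copy.
        with self._lock:
            snapshot = copy.deepcopy(self._data)
        # Pickle the snapshot outside the lock; no other thread holds a
        # reference to it, so pickle.dump sees a stable object graph.
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)

cache = SnapshotCache()
cache.set("token", "abc123")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    cache.save(tmp.name)
with open(tmp.name, "rb") as f:
    restored = pickle.load(f)
```

An equivalent variant is to call `pickle.dumps()` on the live dict while holding the lock and write the resulting bytes outside it; either way, the key point is that the serializer never races with writers.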
I verified the fix using the repro project (https://github.com/verhey/snowflake-thread-repro) created by @verhey. I was able to trigger the segfault every few runs with that method, and found that with the change above there was no segfault in over 20 runs.
@A132770 This python stack trace, particularly with the thread stopped at line 528 in cache.py, fits the pattern exactly.
In our case we found it temporarily sufficient to roll back to before the cache change was implemented (2.7.9). Looking at some of our various containers, it appears that some slightly newer versions that include the cache change (e.g. 2.7.12) also did not manifest the segfaults, so it’s not quite clear to me at what version this issue started, or why it seemed to be ‘activated’ on 6/28. Hope this helps someone!
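For anyone needing the same stopgap, the rollback amounts to pinning the connector to a pre-cache-change release in your requirements file. A minimal sketch (assumes your other dependencies, e.g. your dbt-snowflake version, still allow a 2.x connector):

```text
# requirements.txt — temporary workaround: pin the connector
# to a release before the cache change described in this thread
snowflake-connector-python[secure-local-storage]==2.7.9
```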
I did re-run my testing with your latest code in #1635 , @sfc-gh-mkeller, and found that the segfaults were fixed.
I can’t share our entire dbt project, but here’s an example repo that I’ve been able to get the error to repro in.