snakemake: --google-lifesciences segmentation fault
Snakemake version
Tested on 7.0.0, 6.15.5 and 6.15.0
Describe the bug
Segmentation fault (core dumped) when executing with --google-lifesciences.
Logs
Minimal example
Snakefile:
rule all:
    input: expand("done{i}.txt", i=range(100))

rule test:
    output: "done{i}.txt"
    shell: "echo hi > {output}"
Command line:
snakemake --google-lifesciences --default-remote-prefix snake-test -j10
Additional context
Tried -j10 and -j100, to no effect; the same error occurs either way.
haha yes good observations indeed! I absolutely love using Google Cloud, but the APIs are constantly moving targets and there are many ways to do the same thing. I try to make the best decision for the time, but I suspect the right choice also changes over time.
Yea, it totally makes sense to use the storage client. So it does seem that the storage client generates its own credentials from the environment. I guess it could be crosstalk? But that seems unlikely, as @CowanCS1 mentions. Gonna push some commits and test on a larger workflow to see if I can replicate.
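For context, a minimal sketch (not Snakemake's actual code) of how the google-cloud-storage client derives its credentials from the environment when none are passed explicitly; the bucket name is just borrowed from the example above:

```python
from google.cloud import storage

# With no explicit credentials, the client falls back to Application Default
# Credentials, e.g. the service-account file pointed to by
# GOOGLE_APPLICATION_CREDENTIALS, or the ambient credentials on a GCE VM.
client = storage.Client()

# Bucket name taken from the --default-remote-prefix in the example above.
bucket = client.bucket("snake-test")
print(bucket.exists())  # makes an authenticated API call
```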
Thanks @cademirch 😃
I’ve tested this version, and can confirm that it eliminates the SIGSEGV, all of the exotic SIGABRT errors, and all of the SSL exceptions. Nice!
Unexpectedly, it also eliminates another issue I was seeing with this test script, where a subset of output files (5-10%) were either not present in cloud storage or present but not recognized by the job. I saw the latter type of MissingOutputExceptions frequently in my own pipeline. With this version, all of the outputs are present every time.
That’s actually all of the issues I was tracking, so hurrah! 👍
@vsoch Thanks for reaching out to get advice - this version is probably creating 10-20 connections for each of these quick jobs, and that count would increase linearly with time due to the status checks. I hesitate to go fully into implementing a pool of connections, since I could imagine some edge cases like stale connections which we'd have to handle, for minimal benefit compared to simpler solutions. My currently favored compromise is to create a single HTTP connection for each call to `_run` and then maintain one for `_wait_jobs`, which would reduce the initial connection count 5-10x and eliminate the scaling with time. Since this version is working fine, I'll wait to implement anything until you get feedback.

Cheers all
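A rough sketch of that compromise, under stated assumptions: the helper and resource paths are illustrative, and only the method names `_run` and `_wait_jobs` come from the discussion above; the real executor is structured differently.

```python
from googleapiclient import discovery


def build_lifesciences_client():
    # Each discovery client owns its own httplib2.Http transport, which is
    # not thread-safe, so concurrent calls should not share one instance.
    return discovery.build("lifesciences", "v2beta", cache_discovery=False)


def _run(job):
    # Fresh connection per job submission; never shared across threads.
    service = build_lifesciences_client()
    # ... submit via service.projects().locations().pipelines().run(...)


def _wait_jobs(jobs):
    # One long-lived connection reused for all status polls, so the
    # connection count no longer grows with time.
    service = build_lifesciences_client()
    # ... poll service.projects().locations().operations().get(...) per job
```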
@CowanCS1 Running the MRE with `-j 1` right now and it has not failed yet. Thanks for running the gdb btw.

@vsoch I gave it a shot earlier today by trying to get gdb to attach to the running snakemake process, but had no success unfortunately. I'll look into it more in the morning, though.
Adding to @CowanCS1’s findings. I ran the test with faulthandler enabled and got the following traceback:
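For reference, a minimal sketch of one way to enable faulthandler for a run like this (an assumption about the setup, not necessarily how it was enabled here):

```python
# Enable faulthandler so that a fatal signal (SIGSEGV, SIGABRT, ...) dumps the
# Python traceback of every thread before the process dies. Equivalently, set
# PYTHONFAULTHANDLER=1 in the environment or run python with -X faulthandler.
import faulthandler

faulthandler.enable()
```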