grpc: Segmentation fault in python google cloud libraries

Please answer these questions before submitting your issue.

What version of gRPC and what language are you using?

I’m using Python, with grpcio==1.7.0. We are using these gcloud libraries:

google-api-core==0.1.1
google-auth==1.1.1
google-cloud==0.29.0
google-cloud-bigquery==0.27.0
google-cloud-bigtable==0.28.1
google-cloud-core==0.27.1
google-cloud-datastore==1.4.0
google-cloud-dns==0.28.0
google-cloud-error-reporting==0.28.0
google-cloud-firestore==0.28.0
google-cloud-language==0.31.0
google-cloud-logging==1.4.0
google-cloud-monitoring==0.28.0
google-cloud-pubsub==0.29.0
google-cloud-resource-manager==0.28.0
google-cloud-runtimeconfig==0.28.0
google-cloud-spanner==0.29.0
google-cloud-speech==0.30.0
google-cloud-storage==1.6.0
google-cloud-trace==0.16.0
google-cloud-translate==1.3.0
google-cloud-videointelligence==0.28.0
google-cloud-vision==0.28.0
google-gax==0.15.15
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
grpc-google-iam-v1==0.11.4
grpcio==1.7.0

What operating system (Linux, Windows, …) and version?

(venv) tanakaed@triage-bot:~/server$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

What runtime / compiler are you using (e.g. python version or version of gcc)

Python info:

(venv) tanakaed@triage-bot:~/server$ python --version
Python 3.6.2 :: Anaconda, Inc.
(venv) tanakaed@triage-bot:~/server$ conda info
Current conda install:

           platform : linux-64
      conda version : 4.3.27
   conda is private : False
  conda-env version : 4.3.27
conda-build version : 3.0.23
     python version : 3.6.2.final.0
   requests version : 2.18.4
   root environment : /home/tanakaed/anaconda3  (writable)
default environment : /home/tanakaed/anaconda3/envs/venv
   envs directories : /home/tanakaed/anaconda3/envs
                      /home/tanakaed/.conda/envs
      package cache : /home/tanakaed/anaconda3/pkgs
                      /home/tanakaed/.conda/pkgs
       channel URLs : https://repo.continuum.io/pkgs/main/linux-64
                      https://repo.continuum.io/pkgs/main/noarch
                      https://repo.continuum.io/pkgs/free/linux-64
                      https://repo.continuum.io/pkgs/free/noarch
                      https://repo.continuum.io/pkgs/r/linux-64
                      https://repo.continuum.io/pkgs/r/noarch
                      https://repo.continuum.io/pkgs/pro/linux-64
                      https://repo.continuum.io/pkgs/pro/noarch
        config file : None
         netrc file : None
       offline mode : False
         user-agent : conda/4.3.27 requests/2.18.4 CPython/3.6.2 Linux/4.10.0-38-generic debian/stretch/sid glibc/2.23    
            UID:GID : 1001:1002

tanakaed@triage-bot:~$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
Copyright © 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

What did you do?

We have Python code that both publishes and pulls messages from Pub/Sub, plus code that interfaces with Google Datastore and Google Logging; I don’t know which of these is triggering the segmentation fault. The code runs fine for a while, but after ~60 minutes of running and processing some cases, it segfaults. I ran my Python script inside gdb and this is what I got:

Thread 9 “python” received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff054a700 (LWP 29398)]
gpr_ref_non_zero (r=0x0) at src/core/lib/support/sync.c:93
93      src/core/lib/support/sync.c: No such file or directory.
(gdb) backtrace
#0  gpr_ref_non_zero (r=0x0) at src/core/lib/support/sync.c:93
#1  0x00007ffff12c8365 in grpc_stream_ref (refcount=<optimized out>) at src/core/lib/transport/transport.c:50
#2  0x00007ffff12f3490 in send_security_metadata (batch=0x7fff8c0820f0, elem=0x7fff8c0821a0, exec_ctx=0x7ffff0549ec0) at src/core/lib/security/transport/client_auth_filter.c:216
#3  on_host_checked (exec_ctx=exec_ctx@entry=0x7ffff0549ec0, arg=arg@entry=0x7fff8c0820f0, error=<optimized out>) at src/core/lib/security/transport/client_auth_filter.c:231
#4  0x00007ffff12f396f in auth_start_transport_stream_op_batch (exec_ctx=0x7ffff0549ec0, elem=0x7fff8c0821a0, batch=0x7fff8c0820f0) at src/core/lib/security/transport/client_auth_filter.c:316
#5  0x00007ffff1300f68 in waiting_for_pick_batches_resume (elem=<optimized out>, elem=<optimized out>, exec_ctx=0x7ffff0549ec0) at src/core/ext/filters/client_channel/client_channel.c:953
#6  create_subchannel_call_locked (error=0x0, elem=<optimized out>, exec_ctx=0x7ffff0549ec0) at src/core/ext/filters/client_channel/client_channel.c:1016
#7  pick_done_locked (exec_ctx=0x7ffff0549ec0, elem=<optimized out>, error=0x0) at src/core/ext/filters/client_channel/client_channel.c:1042
#8  0x00007ffff12932f3 in grpc_combiner_continue_exec_ctx (exec_ctx=exec_ctx@entry=0x7ffff0549ec0) at src/core/lib/iomgr/combiner.c:259
#9  0x00007ffff129bdf8 in grpc_exec_ctx_flush (exec_ctx=exec_ctx@entry=0x7ffff0549ec0) at src/core/lib/iomgr/exec_ctx.c:93
#10 0x00007ffff129c3c1 in run_closures (exec_ctx=0x7ffff0549ec0, list=…) at src/core/lib/iomgr/executor.c:81
#11 executor_thread (arg=arg@entry=0x5555565d3e00) at src/core/lib/iomgr/executor.c:181
#12 0x00007ffff1285c37 in thread_body (v=<optimized out>) at src/core/lib/support/thd_posix.c:53
#13 0x00007ffff7bc16ba in start_thread (arg=0x7ffff054a700) at pthread_create.c:333
#14 0x00007ffff78f73dd in clone () at …/sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb)
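As an aside for anyone capturing crashes like this without gdb: Python’s standard-library faulthandler can dump the Python-level tracebacks of all threads when native code segfaults. A minimal sketch, using only the stdlib (it does not show native frames, so gdb is still needed for the C-level backtrace above):

```python
import faulthandler
import sys

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS, and SIGILL so that
# a crash inside a native extension (e.g. grpcio's C core) prints the Python
# traceback of every thread to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)

print(faulthandler.is_enabled())  # True once enabled
```

The same effect is available without code changes by running the script with `python -X faulthandler script.py` or by setting `PYTHONFAULTHANDLER=1`.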

What did you expect to see?

No segmentation fault

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 31 (14 by maintainers)

Most upvoted comments

I have the same problem and a similar stack trace, and I have been able to reproduce the bug in a short snippet (see below). I am using Arch Linux, Python 3.6.4, and the following libraries:

gapic-google-cloud-datastore-v1==0.15.3
google-api-core==0.1.3
google-auth==1.2.1
google-cloud-core==0.28.0
google-cloud-datastore==1.4.0
google-cloud-pubsub==0.30.1
google-gax==0.15.16
googleapis-common-protos==1.5.3
grpc-google-iam-v1==0.11.4
proto-google-cloud-datastore-v1==0.90.4

I have run the following code 39 times, and it segfaulted every time, with an average running time of 47 seconds. The subscription I am pulling from contains a few hundred messages. I think the bug is linked to the number of requests to Datastore, because when I remove `datastore_client.query(kind='TestMessage')` it segfaults far less often. I tried to reproduce the bug without Pub/Sub, by spawning 50 threads that query Datastore, but it never segfaulted.

import os
import random
import time
from datetime import datetime

from google.cloud import pubsub, datastore

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = 'some-file.json'
subscriber_client = pubsub.SubscriberClient()
datastore_client = datastore.Client()
KEYS = list(range(100))

def consume(message):
    # Write one entity per Pub/Sub message...
    data = {}
    data['key1'] = random.choice(KEYS)
    data['time'] = datetime.now().isoformat()
    key = datastore_client.key('Test', str(data['key1']), 'TestMessage', str(data['time']))
    entity = datastore.Entity(key)
    entity.update(data)
    datastore_client.put(entity)

    # ...then run a query; removing this query makes the segfault far rarer.
    query = datastore_client.query(kind='TestMessage')
    list(query.fetch())

subscription = subscriber_client.subscription_path("some-project", "some-subscription")
subscriber_client.subscribe(subscription, consume)

# Keep the main thread alive; the subscriber runs in background threads.
while True:
    time.sleep(60)

We’ve cut a patch release for 1.9.1 containing the fix. Please reopen if the issue is not resolved.
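Since the fix shipped in grpcio 1.9.1, a quick sanity check is to compare the installed version against that minimum. A minimal sketch (the helper names here are ours, and it assumes plain dotted-integer version strings like those in this thread):

```python
def version_tuple(version):
    # "1.9.1" -> (1, 9, 1); assumes plain dotted-integer version strings
    return tuple(int(part) for part in version.split("."))

def has_fix(installed, minimum="1.9.1"):
    # Tuple comparison orders versions correctly, e.g. (1, 10, 0) > (1, 9, 1)
    return version_tuple(installed) >= version_tuple(minimum)

print(has_fix("1.7.0"))   # the version in this report -> False
print(has_fix("1.9.1"))   # the patched release -> True
```

In practice `grpc.__version__` (or `pip show grpcio`) supplies the installed version string; real version strings can carry pre-release suffixes such as `1.9.1rc1`, which this sketch does not handle.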