grpc: ServerBuilder::BuildAndStart hangs on Ubuntu 19.10

What version of gRPC and what language are you using?

Tested on v1.24.x and v1.25.x, C++

What operating system (Linux, Windows,…) and version?

Ubuntu 19.10

What runtime / compiler are you using (e.g. python version or version of gcc)

gcc/g++ 9.2.1-9ubuntu2

What did you do?

  • Compiled/installed gRPC according to the installation instructions.
  • Compiled examples/cpp/helloworld example.
  • Tried running ./greeter_server

What did you expect to see?

The helloworld application working properly.

What did you see instead?

v1.25.x Test (hangs):
$ GRPC_VERBOSITY=debug GRPC_TRACE=all,-timer_check,-timer ./greeter_server 
D1114 16:43:04.016775905   14313 ev_posix.cc:174]            Using polling engine: epollex
D1114 16:43:04.016867235   14313 dns_resolver_ares.cc:503]   Using ares dns resolver

…and it hangs here, specifically on the line of this example that calls builder.BuildAndStart().

$ strace ./greeter_server
...
epoll_create1(EPOLL_CLOEXEC)            = 3
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK)   = 4
epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLEXCLUSIVE|EPOLLONESHOT|EPOLLET, {u32=0, u64=0}}) = -1 EINVAL (Invalid argument)
close(4)                                = 0
close(3)                                = 0
epoll_create1(EPOLL_CLOEXEC)            = 3
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK)   = 4
epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLET, {u32=3886127653, u64=94029310171685}}) = 0
uname({sysname="Linux", nodename="<redacted>", ...}) = 0
futex(0x7fdf7afa4500, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fdf7b89dda8, FUTEX_WAKE_PRIVATE, 2147483647) = 0

v1.24.x Test (crashes with SIGABRT):
$ GRPC_VERBOSITY=debug GRPC_TRACE=all,-timer_check,-timer gdb ./greeter_server 
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff725c700 (LWP 31408)]
[New Thread 0x7fffeffff700 (LWP 31409)]
D1114 17:04:06.843734995   31401 ev_posix.cc:174]            Using polling engine: epollex
E1114 17:04:06.843823520   31401 handshaker_registry.cc:103] assertion failed: g_handshaker_factory_lists != nullptr

Thread 1 "greeter_server" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7715899 in __GI_abort () at abort.c:79
#2  0x00007ffff743e2eb in grpc_core::HandshakerRegistry::RegisterHandshakerFactory(bool, grpc_core::HandshakerType, std::unique_ptr<grpc_core::HandshakerFactory, grpc_core::DefaultDelete<grpc_core::HandshakerFactory> >) () from /usr/local/lib/libgrpc.so.8
#3  0x00007ffff74b8f12 in grpc_core::SecurityRegisterHandshakerFactories() () from /usr/local/lib/libgrpc.so.8
#4  0x00007ffff74ba76d in grpc_security_init() () from /usr/local/lib/libgrpc.so.8
#5  0x00007ffff7435a23 in grpc_init () from /usr/local/lib/libgrpc.so.8
#6  0x0000555555565ca4 in grpc::GrpcLibraryCodegen::GrpcLibraryCodegen(bool) ()
#7  0x00007ffff7bf151e in grpc_impl::ServerBuilder::BuildAndStart() () from /usr/local/lib/libgrpc++.so.1
#8  0x0000555555578e29 in RunServer() ()
#9  0x0000555555578f99 in main ()

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 8
  • Comments: 39 (9 by maintainers)

Most upvoted comments

There are some ODR violations in libgrpc++ (master branch); for example, client_channel.cc is compiled into both libgrpc.so and libgrpc++.so. That leads to the same grpc_core::TraceFlag grpc_client_channel_call_trace being constructed twice at the same address (notice this=0x7fffff7a07f0 in both traces below), creating a loop in the trace flag chain.

Global objects are merged during dynamic loading, but their initializers are not: each library still runs its own static initializer on the merged object. This is a well-known phenomenon (examples: 1, 2, 3, 4).

#1  0x00007fffff6e4e24 in grpc_core::TraceFlagList::Add (flag=0x7fffff7a07f0 <grpc_core::grpc_client_channel_call_trace>) at src/core/lib/debug/trace.cc:80
#2  0x00007fffff6e4f01 in grpc_core::TraceFlag::TraceFlag (this=0x7fffff7a07f0 <grpc_core::grpc_client_channel_call_trace>, default_enabled=false, name=0x7fffff288a06 "client_channel_call") at src/core/lib/debug/trace.cc:98
#3  0x00007fffff0816bc in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at src/core/ext/filters/client_channel/client_channel.cc:100
#4  0x00007fffff0816ed in _GLOBAL__sub_I_client_channel.cc(void) () at src/core/ext/filters/client_channel/client_channel.cc:4046
#5  0x00007fffff7cf37a in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7ffffffedde8, env=env@entry=0x7ffffffeddf8) at dl-init.c:72
#6  0x00007fffff7cf476 in call_init (env=0x7ffffffeddf8, argv=0x7ffffffedde8, argc=1, l=<optimized out>) at dl-init.c:30
#7  _dl_init (main_map=0x7fffff7e9190, argc=1, argv=0x7ffffffedde8, env=0x7ffffffeddf8) at dl-init.c:119
#8  0x00007fffff7c10ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#9  0x0000000000000001 in ?? ()
#10 0x00007ffffffedfff in ?? ()
#11 0x0000000000000000 in ?? ()

(gdb) info sym 0x00007fffff0816ed
_GLOBAL__sub_I_client_channel.cc + 19 in section .text of /usr/local/lib/libgrpc.so.9

#1  0x00007fffff6e4e24 in grpc_core::TraceFlagList::Add (flag=0x7fffff7a07f0 <grpc_core::grpc_client_channel_call_trace>) at src/core/lib/debug/trace.cc:80
#2  0x00007fffff6e4f01 in grpc_core::TraceFlag::TraceFlag (this=0x7fffff7a07f0 <grpc_core::grpc_client_channel_call_trace>, default_enabled=false, name=0x7fffff6f5006 "client_channel_call") at src/core/lib/debug/trace.cc:98
#3  0x00007fffff64d19e in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at src/core/ext/filters/client_channel/client_channel.cc:100
#4  0x00007fffff64d1cf in _GLOBAL__sub_I_client_channel.cc(void) () at src/core/ext/filters/client_channel/client_channel.cc:4046
#5  0x00007fffff7cf37a in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7ffffffedde8, env=env@entry=0x7ffffffeddf8) at dl-init.c:72
#6  0x00007fffff7cf476 in call_init (env=0x7ffffffeddf8, argv=0x7ffffffedde8, argc=1, l=<optimized out>) at dl-init.c:30
#7  _dl_init (main_map=0x7fffff7e9190, argc=1, argv=0x7ffffffedde8, env=0x7ffffffeddf8) at dl-init.c:119
#8  0x00007fffff7c10ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#9  0x0000000000000001 in ?? ()
#10 0x00007ffffffedfff in ?? ()
#11 0x0000000000000000 in ?? ()

(gdb) info sym 0x00007fffff64d1cf
_GLOBAL__sub_I_client_channel.cc + 19 in section .text of /usr/local/lib/libgrpc++.so.1
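A minimal, self-contained sketch of the mechanism described above. It does not use the real grpc_core classes: DemoFlag, DemoFlagList, merged_storage and DuplicateInitMakesCycle are hypothetical, simplified stand-ins. The constructor registers the object with a naive push-front, and it is run twice on the same storage, the way two static initializers run on one interposed global.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-ins, NOT the real grpc_core::TraceFlag / TraceFlagList.
struct DemoFlag;

struct DemoFlagList {
  static DemoFlag* root;
  static void Add(DemoFlag* flag);  // naive push-front, like the unpatched Add
};

struct DemoFlag {
  const char* name;
  DemoFlag* next;  // not reset by the constructor; written by Add()
  // Like TraceFlag, the constructor registers the object in the global list.
  explicit DemoFlag(const char* n) : name(n) { DemoFlagList::Add(this); }
};

DemoFlag* DemoFlagList::root = nullptr;

void DemoFlagList::Add(DemoFlag* flag) {
  flag->next = root;
  root = flag;
}

// Floyd's tortoise-and-hare: true if following `next` never reaches nullptr.
bool HasCycle(const DemoFlag* head) {
  const DemoFlag* slow = head;
  const DemoFlag* fast = head;
  while (fast != nullptr && fast->next != nullptr) {
    slow = slow->next;
    fast = fast->next->next;
    if (slow == fast) return true;
  }
  return false;
}

// Raw storage standing in for the single, merged global symbol.
alignas(DemoFlag) unsigned char merged_storage[sizeof(DemoFlag)];

// Constructs the "global" twice at the same address, as happens when both
// libraries' static initializers run on the interposed object.
bool DuplicateInitMakesCycle() {
  DemoFlagList::root = nullptr;
  new (merged_storage) DemoFlag("client_channel_call");  // "libgrpc.so" init
  assert(!HasCycle(DemoFlagList::root));  // one initializer: list is fine
  new (merged_storage) DemoFlag("client_channel_call");  // "libgrpc++.so" init
  return HasCycle(DemoFlagList::root);
}
```

DuplicateInitMakesCycle() returns true: after the second construction the flag's next pointer refers to the flag itself, so any loop that walks the list until it reaches nullptr never terminates.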

If libgrpc++ already contains the same implementation, why does it depend on libgrpc at all?
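One way to look for other duplicated translation units is to compare the dynamic symbols that both libraries define. The helpers below (ParseNmOutput, CommonSymbols) are hypothetical; they assume the input is text as printed by nm -D --defined-only, one "address type name" line per symbol.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parses `nm -D --defined-only <lib>` output
// ("<address> <type> <symbol>" per line) and returns the symbol names.
// Lines with fewer than three fields are skipped.
std::set<std::string> ParseNmOutput(const std::string& nm_output) {
  std::set<std::string> symbols;
  std::istringstream lines(nm_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string address, type, name;
    if (fields >> address >> type >> name) symbols.insert(name);
  }
  return symbols;
}

// Hypothetical helper: symbols defined in both libraries, in sorted order.
// These are candidates for ODR violations, not proof of one.
std::vector<std::string> CommonSymbols(const std::set<std::string>& a,
                                       const std::set<std::string>& b) {
  std::vector<std::string> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(common));
  return common;
}
```

For example, the nm output for /usr/local/lib/libgrpc.so.9 and /usr/local/lib/libgrpc++.so.1 could be read into strings and passed through both functions. Inline functions and template instantiations are legitimately emitted as weak symbols in both libraries, so the result still needs filtering (e.g. to data symbols in grpc_core) before drawing conclusions.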

Anyway, as a temporary workaround, add a check to void TraceFlagList::Add(TraceFlag* flag) (in src/core/lib/debug/trace.cc) so that the same flag is never added twice:

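void TraceFlagList::Add(TraceFlag* flag) {
  // Skip flags that are already registered: linking the same object in a
  // second time would make the singly linked list circular.
  for (TraceFlag* t = root_tracer_; t != nullptr; t = t->next_tracer_) {
    if (t == flag) {
      return;
    }
  }
  // New flag: push it onto the front of the list.
  flag->next_tracer_ = root_tracer_;
  root_tracer_ = flag;
}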
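A quick standalone check of the patched Add, assuming a hypothetical trimmed-down Flag struct and free functions (Add, CountFlags, CheckDuplicateAddIsIgnored) in place of the real TraceFlag/TraceFlagList members. Calling Add again for an object that is already in the list, which is effectively what the second static initializer does, should leave the list unchanged and finite.

```cpp
#include <cassert>

// Hypothetical stand-in for grpc_core::TraceFlag; NOT the real class.
struct Flag {
  const char* name;
  Flag* next_tracer_;
};

Flag* root_tracer_ = nullptr;

// Same logic as the patched TraceFlagList::Add: ignore already-listed flags.
void Add(Flag* flag) {
  for (Flag* t = root_tracer_; t != nullptr; t = t->next_tracer_) {
    if (t == flag) return;
  }
  flag->next_tracer_ = root_tracer_;
  root_tracer_ = flag;
}

// Counts list entries; this loop only terminates if the list has no cycle.
int CountFlags() {
  int n = 0;
  for (Flag* t = root_tracer_; t != nullptr; t = t->next_tracer_) ++n;
  return n;
}

// Registers two flags, then registers the first one again.
void CheckDuplicateAddIsIgnored() {
  static Flag a{"client_channel_call", nullptr};
  static Flag b{"other_flag", nullptr};
  root_tracer_ = nullptr;
  Add(&a);
  Add(&b);
  Add(&a);  // duplicate registration: must be a no-op
  assert(CountFlags() == 2);
  assert(root_tracer_ == &b && b.next_tracer_ == &a);
  assert(a.next_tracer_ == nullptr);
}
```

The linear scan makes registration quadratic in the number of flags overall, but it presumably only runs during static initialization. Note that it hides the symptom rather than removing the duplicated objects from libgrpc++.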

I downgraded to 1.24 and it works, but the downgrade was a pain. I think gRPC's make install is broken: sometimes it does not install anything, so I had to install it manually.

Two days of my life are gone and I have a really bad feeling about the maturity of this lib…

@veblush Hello! I checked the latest v1.26.x branch and was able to compile and run the examples, as well as my own programs, with this version on Ubuntu 19.10. Thanks, cheers~

@ya-mouse Just checked: 1.26.0 still has this bug.

Same behavior, none of the mentioned fixes helped me.

Ubuntu 19.10, gRPC v1.25.0, gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, protoc v3.8.0

That’s really crazy; I can do nothing but wait for a fix…

Many thanks, the workaround works fine for me. I am on Ubuntu 18.04.