grpc: ev_poll_posix (Mac OS): fd_orphan fails to remove fd from fork list, postfork_child closes fd that no longer belongs to gRPC, breaks Python subprocess.Popen
What version of gRPC and what language are you using?
$ pip show grpcio Name: grpcio Version: 1.21.1
What operating system (Linux, Windows,…) and version?
Mac OS 10.14.5
What runtime / compiler are you using (e.g. python version or version of gcc)
$ python -V Python 2.7.15
What did you do?
If possible, provide a recipe for reproducing the error. Try being specific and include code snippets if helpful.
- Start examples/python/route_guide/route_guide_server.py. The bug reproduces with default options, but to reproduce faster, it helps to set
options=[('grpc.max_connection_idle_ms', 10000)]
. - Modify examples/python/route_guide/route_guide_client.py to add a short delay, then
subprocess.Popen
call:
def run():
with grpc.insecure_channel('localhost:50051') as channel:
stub = route_guide_pb2_grpc.RouteGuideStub(channel)
print("-------------- GetFeature --------------")
guide_get_feature(stub)
time.sleep(20) # wait for channel to go READY --> IDLE
devnull = open(os.devnull)
print('devnull fd: %d' % devnull.fileno())
try:
subprocess.Popen(['/bin/echo', 'It worked!'], stdin=devnull)
except IOError as e:
print('Exception: ', e)
print('Child traceback: ', e.child_traceback)
raise
What did you expect to see?
It worked!
What did you see instead?
Exception: [Errno 9] Bad file descriptor
Child traceback: Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 973, in _execute_child
_dup2(p2cread, 0)
File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 972, in _dup2
os.dup2(a, b)
OSError: [Errno 9] Bad file descriptor
Traceback (most recent call last):
File "route_guide_client.py", line 129, in <module>
run()
File "route_guide_client.py", line 120, in run
subprocess.Popen(['/bin/echo', 'It worked!'], stdin=devnull)
File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 9] Bad file descriptor
Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).
Server log (nothing special):
$ python route_guide_server.py
D0602 21:01:49.878758000 4527408576 ev_posix.cc:174] Using polling engine: poll
D0602 21:01:49.878840000 4527408576 dns_resolver_ares.cc:480] Using ares dns resolver
I0602 21:02:04.501015000 123145540087808 chttp2_transport.cc:1800] ipv6:[::1]:59007: Sending goaway err={"created":"@1559523724.500976000","description":"max_idle","file":"src/core/ext/filters/max_age/max_age_filter.cc","file_line":259,"http2_error":0}
Client log (“GRPC_VERBOSITY=DEBUG GRPC_TRACE=all,-timer_check python route_guide_client.py”), interesting parts only:
# channel opening, created TCP socket fd 3
I0602 21:01:54.472020000 4501616064 ev_posix.cc:254] (fd-trace) fd_create(3, tcp-client:ipv6:[::1]:50051, 1)
# channel has been idle for 10 seconds, server sends GOAWAY
I0602 21:02:09.475174000 123145316352000 chttp2_transport.cc:1157] ipv6:[::1]:50051: Got goaway [0] err={"created":"@1559523729.475129000","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1151,"grpc_status":14,"http2_error":0,"raw_bytes":"max_idle"}
# fd 3 shutting down
I0602 21:02:09.479882000 123145316352000 connectivity_state.cc:147] SET: 0x7fc8fd07e2a8 client_transport: TRANSIENT_FAILURE --> SHUTDOWN [close_transport]
I0602 21:02:09.479894000 123145316352000 ev_posix.cc:274] (fd-trace) fd_shutdown(3)
I0602 21:02:09.479925000 123145316352000 resource_quota.cc:547] RU shutdown 0x7fc8fb64c120
I0602 21:02:09.479944000 123145316352000 ev_posix.cc:267] (fd-trace) grpc_fd_orphan, fd:3 closed
# open('/dev/null') reuses fd 3
devnull fd: 3
# subprocess relies on fd inherited across fork, but grpc postfork thinks
# it still owns the fd and closes it:
I0602 21:02:14.504559000 4501616064 executor.cc:473] EXECUTOR Executor::SetThreadingAll(0) called
I0602 21:02:14.504602000 4501616064 executor.cc:154] EXECUTOR (default-executor) SetThreading(0) begin
I0602 21:02:14.504633000 123145314742272 executor.cc:249] EXECUTOR (default-executor) [0]: shutdown
I0602 21:02:14.504712000 4501616064 executor.cc:202] EXECUTOR (default-executor) Thread 1 of 1 joined
I0602 21:02:14.504742000 4501616064 executor.cc:224] EXECUTOR (default-executor) SetThreading(0) done
I0602 21:02:14.504752000 4501616064 executor.cc:154] EXECUTOR (resolver-executor) SetThreading(0) begin
I0602 21:02:14.504795000 123145315278848 executor.cc:249] EXECUTOR (resolver-executor) [0]: shutdown
I0602 21:02:14.504857000 4501616064 executor.cc:202] EXECUTOR (resolver-executor) Thread 1 of 1 joined
I0602 21:02:14.504879000 4501616064 executor.cc:224] EXECUTOR (resolver-executor) SetThreading(0) done
I0602 21:02:14.506219000 4501616064 executor.cc:473] EXECUTOR Executor::SetThreadingAll(1) called
I0602 21:02:14.506302000 4501616064 executor.cc:154] EXECUTOR (default-executor) SetThreading(1) begin
I0602 21:02:14.506467000 4501616064 executor.cc:224] EXECUTOR (default-executor) SetThreading(1) done
I0602 21:02:14.506505000 4501616064 executor.cc:154] EXECUTOR (resolver-executor) SetThreading(1) begin
I0602 21:02:14.506677000 4501616064 executor.cc:224] EXECUTOR (resolver-executor) SetThreading(1) done
I0602 21:02:14.506711000 123145315815424 executor.cc:238] EXECUTOR (resolver-executor) [0]: step (sub_depth=0)
I0602 21:02:14.506751000 123145315278848 executor.cc:238] EXECUTOR (default-executor) [0]: step (sub_depth=0)
I0602 21:02:14.507265000 4501616064 executor.cc:473] EXECUTOR Executor::SetThreadingAll(1) called
I0602 21:02:14.511119000 4501616064 executor.cc:154] EXECUTOR (default-executor) SetThreading(1) begin
I0602 21:02:14.512976000 4501616064 executor.cc:224] EXECUTOR (default-executor) SetThreading(1) done
I0602 21:02:14.513005000 123145315278848 executor.cc:238] EXECUTOR (default-executor) [0]: step (sub_depth=0)
I0602 21:02:14.515061000 4501616064 executor.cc:154] EXECUTOR (resolver-executor) SetThreading(1) begin
I0602 21:02:14.519730000 4501616064 executor.cc:224] EXECUTOR (resolver-executor) SetThreading(1) done
I0602 21:02:14.519785000 123145315815424 executor.cc:238] EXECUTOR (resolver-executor) [0]: step (sub_depth=0)
# subprocess.Popen fails in child on dup'ing stdin fd to fd 0
Exception: [Errno 9] Bad file descriptor
Child traceback: Traceback (most recent call last):
File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 973, in _execute_child
_dup2(p2cread, 0)
File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 972, in _dup2
os.dup2(a, b)
OSError: [Errno 9] Bad file descriptor
Client DTruss (syscall trace), starting during the sleep:
dtrace: system integrity protection is on, some features will not be available
PID/THRD SYSCALL(args) = return
...
# fork in parent / child
86401/0x282e0f: fork() = 86414 0
86414/0x282e98: fork() = 0 0
# child closing a series of fd's, including 3
86414/0x282e98: close(0x4) = 0 0
86401/0x282e9b: proc_info(0x5, 0x15181, 0x2) = 0 0
86414/0x282e98: close(0x5) = 0 0
86414/0x282e98: close(0x3) = 0 0
# Python attempting to set up child stdin, getting EBADF
86414/0x282e98: dup2(0x3, 0x0, 0x0) = -1 Err#9
...
It worked!
But probably if the program were to use naked os.fork()
, something else would break badly.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 16 (13 by maintainers)
@lidizheng PR has been posted (20452)… please let me know if there’s anything I need to do on my end to ease the way through for it.
The accounting of file-descriptor reference counts is wrong. Here’s some debug logging from the prior python example script. This sequence shows an outbound socket being created, connection attempt, failure and disposal.
The socket is closed, but the ref count doesn’t drop to 0 (it’s at 4). The code that closes the file-descriptor doesn’t check the ref-count. But aside from that whatever caused the ref-count to bump up (via “multipoller”) isn’t being undone.
Since the ref-count doesn’t reach 0, the FD (3) is not unregistered from the fork list and is thus left to be closed at fork time.