grpc: [Aio] Async IO manager deadlock with current API

The current gRPC Python API will deadlock when used together with the Async IO manager. This problem needs to be solved before merging the first PR.

Reproduction script:

import grpc
import asyncio
from grpc.experimental import aio

async def main():
	print('===Testing Aio Channel===')
	async with aio.insecure_channel('localhost:50051') as channel:
		print(await channel.unary_unary(b'')(b''))
	print('===Finish Aio Channel===')


	print('===Testing Normal Channel===')
	with grpc.insecure_channel('localhost:50051') as channel:
		print(channel.unary_unary(b'')(b''))  # <- deadlock
	print('===Finish Normal Channel===')


aio.init_grpc_aio()
asyncio.get_event_loop().run_until_complete(main())

Log:

I0814 16:23:13.264381376  220688 call.cc:1927]               grpc_call_start_batch(call=0x564af3f8b4a0, ops=0x564af3f69a50, nops=6, tag=0x7f0903e0dc50, reserved=(nil))
I0814 16:23:13.264387822  220688 call.cc:1526]               ops[0]: SEND_INITIAL_METADATA(nil)
I0814 16:23:13.264392291  220688 call.cc:1526]               ops[1]: SEND_MESSAGE ptr=0x564af3dcbc20
I0814 16:23:13.264396790  220688 call.cc:1526]               ops[2]: SEND_CLOSE_FROM_CLIENT
I0814 16:23:13.264401459  220688 call.cc:1526]               ops[3]: RECV_INITIAL_METADATA ptr=0x7f0903e1ee88
I0814 16:23:13.264406097  220688 call.cc:1526]               ops[4]: RECV_MESSAGE ptr=0x7f0903e58f20
I0814 16:23:13.264411516  220688 call.cc:1526]               ops[5]: RECV_STATUS_ON_CLIENT metadata=0x7f0905300bf8 status=0x7f0905300c10 details=0x7f0905300c18
D0814 16:23:13.264423396  220688 dns_resolver.cc:242]        Start resolving.
I0814 16:23:13.264452976  220688 completion_queue.cc:963]    grpc_completion_queue_next(cq=0x564af3f17fd0, deadline=gpr_timespec { tv_sec: 1565824993, tv_nsec: 464450963, clock_type: 1 }, reserved=(nil))
I0814 16:23:13.464982891  220688 completion_queue.cc:963]    grpc_completion_queue_next(cq=0x564af3f17fd0, deadline=gpr_timespec { tv_sec: 1565824993, tv_nsec: 664976288, clock_type: 1 }, reserved=(nil))
I0814 16:23:13.665982102  220688 completion_queue.cc:963]    grpc_completion_queue_next(cq=0x564af3f17fd0, deadline=gpr_timespec { tv_sec: 1565824993, tv_nsec: 865976002, clock_type: 1 }, reserved=(nil))
I0814 16:23:13.866982076  220688 completion_queue.cc:963]    grpc_completion_queue_next(cq=0x564af3f17fd0, deadline=gpr_timespec { tv_sec: 1565824994, tv_nsec: 66976175, clock_type: 1 }, reserved=(nil))

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 21 (11 by maintainers)

Most upvoted comments

@pfreixes +1 to the implicit grpc_init_aio() call as long as we have very clear error messages when a failure occurs as a result of this call.

I’m still considering the implications of having a synchronous server running on top of an already initialized Aio environment, and most importantly whether it makes sense at all. Initially, I would be tempted not to support this use case.

I don’t think it particularly matters how we accomplish it in terms of implementation. The use case that we’re enabling here is coexisting synchronous gRPC and asyncio gRPC in the same process. People need this in order to incrementally migrate from their existing sync stack to asyncio.

As for testing, it would be great if we could figure out a way to run all of the existing synchronous tests with something async added into the test process. Then we can be 100% sure that things are compatible. You might consider adding a new test runner for this.
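
One possible shape for such a runner, as a rough sketch only (it assumes the existing synchronous tests are collected by pytest; the fixture name and the target address are placeholders): a session-scoped, autouse fixture initializes the Aio machinery and touches an Aio channel once, so every existing synchronous test runs in a process where asyncio gRPC is already active.

import asyncio

import pytest
from grpc.experimental import aio


@pytest.fixture(scope='session', autouse=True)
def _aio_environment():
    # Initialize the Aio stack once per test session.
    aio.init_grpc_aio()
    loop = asyncio.new_event_loop()

    async def _touch_aio():
        # Open and close an Aio channel so the asyncio code paths are
        # exercised alongside the synchronous tests.
        async with aio.insecure_channel('localhost:50051'):
            pass

    loop.run_until_complete(_touch_aio())
    yield
    loop.close()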

I’m planning to start coding the solution for supporting the synchronous stack on top of the Aio package today, to circumvent the “deadlock” issue. The solution would basically be inspired by the POC that I made for having the IO loop running in a separate thread [1] (a minimal illustration follows after this comment).

Just so we are all on the same page, the proposed solution would allow us to:

  • Run synchronous client requests on an already existing and initialized Aio environment.

The solution wouldn’t provide support for:

  • Run the Aio package on an already initialized synchronous environment.

I’m still considering the implications of having a synchronous server running on top of an already initialized Aio environment, and most importantly whether it makes sense at all. Initially, I would be tempted not to support this use case.

Also, I’m considering removing the need to explicitly initialize the Aio package by calling the grpc_init_aio function, and instead doing this behind the scenes when either an Aio channel or an Aio server is created. For unsupported combinations, an Exception would be raised. This could theoretically also help to close these issues [2] [3].

/cc @gnossen @lidizheng

[1] https://github.com/Skyscanner/grpc/commit/996636ac853acd2ce60afbb8843181321a9915b4
[2] https://github.com/grpc/grpc/issues/20806
[3] https://github.com/grpc/grpc/issues/19985
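
The core of the idea from [1], as a minimal sketch only (the helper name sync_unary_call is invented, and where exactly aio.init_grpc_aio() has to run relative to the background loop is precisely what the POC works out): the asyncio loop that drives the Aio IO manager lives in a dedicated daemon thread, and synchronous callers submit their RPCs to that loop and block on the result instead of driving a completion queue themselves.

import asyncio
import threading

from grpc.experimental import aio

# NOTE: this assumes initializing the Aio machinery up front is acceptable;
# the actual POC may sequence this differently.
aio.init_grpc_aio()

_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


def sync_unary_call(target, method, request):
    # Run one Aio unary call on the background loop and block the calling
    # (synchronous) thread until its result is available.
    async def _call():
        async with aio.insecure_channel(target) as channel:
            return await channel.unary_unary(method)(request)

    return asyncio.run_coroutine_threadsafe(_call(), _loop).result()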

For option 1, this is on the C-Core team’s roadmap, and it is possible that they will provide an SWE to do it. I agree it is non-trivial, but it will benefit all wrapper languages, particularly for the migration cases.

The RuntimeError solution should serve as a fallback plan. For now, I agree it is better to throw an exception; after all, an exception with a reason is much nicer than a deadlock.
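
As a rough illustration of that fallback (the guard location, the flag name, and the error message below are hypothetical, not taken from the actual implementation), the synchronous entry points could check whether the Aio IO manager has already been initialized in the process and fail loudly instead of letting the first RPC hang on the completion queue:

# Hypothetical sketch only: _grpc_aio_initialized and the wording of the
# error are invented for illustration.
_grpc_aio_initialized = False


def init_grpc_aio():
    global _grpc_aio_initialized
    _grpc_aio_initialized = True
    # ... the real initialization of the Aio IO manager happens here ...


def insecure_channel(target):
    if _grpc_aio_initialized:
        # Raise with a reason instead of deadlocking on the first RPC.
        raise RuntimeError(
            'The synchronous gRPC stack cannot be used after the Aio '
            'IO manager has been initialized in this process.')
    # ... construct the regular synchronous channel here ...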