grpc: Client crashes in Retry test. Unhandled exception at 0xXXXX (grpc_csharp_ext.x64.dll)
What version of gRPC and what language are you using?
2.31.0-pre1, C#
What operating system (Linux, Windows,…) and version?
Windows
What runtime / compiler are you using (e.g. python version or version of gcc)
.net framework v4.7.2
What did you do?
Please provide either 1) A unit test for reproducing the bug or 2) Specific steps for us to follow to reproduce the bug. If there’s not enough information to debug the problem, gRPC team may close the issue at their discretion. You’re welcome to re-open the issue once you have a reproduction.
-
Created a Mock gRpc server with service “service”:“[nameSpace].LegacyQuery” Method: LegacyQuery. (There is one more service in my code but not used in this test)
service LegacyQuery { rpc LegacyQuery (LegacyQueryRequest) returns (stream LegacyQueryResponse) {} }
In this mock service handler, I wrote code like this to fail the first (original) request but succeed the retries
public override Task LegacyQuery( ArraySegment<byte> request, IServerStreamWriter<ArraySegment<byte>> handlerStreamWriter, ServerCallContext context) { if (TryGetRetryAttempts(context, out int attempts) && attempts > 0) { this.SetResponseHeader(context.ResponseTrailers); byte[] bytes = new byte[] { 88, 99 }; return handlerStreamWriter.WriteAsync(new ArraySegment<byte>(bytes)); } else { throw new RpcException(new Status(StatusCode.Unavailable, "Failing the request to test retry."), "failed for retry"); } }
there is one ChannelOption configured on server side for retry: Server side ChannelOption Name - grpc.service_config Value -
{ "methodConfig": [ { "name": [ { "service": "[nameSpace].AnotherQuery", "method": "Query" }, { "service": "[nameSpace].LegacyQuery", "method": "LegacyQuery" } ], "retryPolicy": { "maxAttempts": 3, "initialBackoff": "0.1s", "maxBackoff": "5s", "backoffMultiplier": 2, "retryableStatusCodes": [ "UNAVAILABLE" ] } } ] }
-
The client code is a unit test
try { var call = client.LegacyQuery(legacyQueryRequest); while (await call.ResponseStream.MoveNext().ConfigureAwait(false)) { response = call.ResponseStream.Current; } Assert.IsNotNull(response); } catch (Exception ex) { Console.WriteLine(ex.ToString()); throw; } catch { Console.WriteLine("Non-CLS compliant exception caught."); }
with on ChannelOption for client Name - grpc.service_config Value -
{ "methodConfig": [ { "name": [ { "service": "[nameSpace].LegacyQuery" } ], "waitForReady": true, "retryPolicy": { "maxAttempts": 3, "initialBackoff": "0.5s", "maxBackoff": "5s", "backoffMultiplier": 2.0, "retryableStatusCodes": [ "UNAVAILABLE" ] } } ] }
What did you expect to see?
The first (original) request fails, but retry happened once and succeeds
What did you see instead?
The first (original) request failed. Client side crashes upon the first response with failure is received
Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).
-
crash happened here
int gpr_unref(gpr_refcount* r) { gpr_atm prior = gpr_atm_full_fetch_add(&r->count, -1); GPR_ASSERT(prior > 0); <--- failed here return prior == 1; }
-
exception details
Unhandled exception at 0x00007FF8AB0F2D2D (grpc_csharp_ext.x64.dll) in testhost.exe: Fatal program exit requested. occurred
- crash call stack I observed in VS
grpc_csharp_ext.x64.dll!abort() Line 77 at minkernel\crts\ucrt\src\appcrt\startup\abort.cpp(77) grpc_csharp_ext.x64.dll!gpr_unref(gpr_refcount * r) Line 105 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\gpr\sync.cc(105) [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::`anonymous-namespace'::CallData::SubchannelCallBatchData::Unref() Line 492 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\client_channel\client_channel.cc(492) grpc_csharp_ext.x64.dll!grpc_core::`anonymous namespace'::CallData::RecvTrailingMetadataReady(void * arg, grpc_error * error) Line 3201 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\client_channel\client_channel.cc(3201) [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::Closure::Run(const grpc_core::DebugLocation & closure, grpc_closure *) Line 244 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\closure.h(244) grpc_csharp_ext.x64.dll!grpc_core::SubchannelCall::RecvTrailingMetadataReady(void * arg, grpc_error * error) Line 296 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\client_channel\subchannel.cc(296) [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::Closure::Run(const grpc_core::DebugLocation & closure, grpc_closure *) Line 244 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\closure.h(244) grpc_csharp_ext.x64.dll!recv_trailing_metadata_ready(void * user_data, grpc_error * error) Line 249 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\message_size\message_size_filter.cc(249) [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::Closure::Run(const grpc_core::DebugLocation & closure, grpc_closure *) Line 244 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\closure.h(244) grpc_csharp_ext.x64.dll!recv_trailing_metadata_ready(void * user_data, grpc_error * error) Line 222 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\http\client\http_client_filter.cc(222) [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::Closure::Run(const grpc_core::DebugLocation &) Line 244 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\closure.h(244) grpc_csharp_ext.x64.dll!grpc_core::`anonymous namespace'::CallData::OnRecvTrailingMetadataReady(void * arg, grpc_error * error) Line 320 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\http\message_compress\message_decompress_filter.cc(320) grpc_csharp_ext.x64.dll!exec_ctx_run(grpc_closure * closure, grpc_error * error) Line 46 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\exec_ctx.cc(46) grpc_csharp_ext.x64.dll!grpc_core::ExecCtx::Flush() Line 154 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\exec_ctx.cc(154) grpc_csharp_ext.x64.dll!pollset_work(grpc_pollset * pollset, grpc_pollset_worker * * worker_hdl, __int64 deadline) Line 132 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\pollset_windows.cc(132) grpc_csharp_ext.x64.dll!cq_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) Line 1274 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\surface\completion_queue.cc(1274) grpc_csharp_ext.x64.dll!grpc_completion_queue_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) Line 1350 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\surface\completion_queue.cc(1350) grpc_csharp_ext.x64.dll!grpcsharp_completion_queue_next(grpc_completion_queue * cq) Line 411 at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\csharp\ext\grpc_csharp_ext.c(411) [Managed to Native Transition] Grpc.Core.dll!Grpc.Core.Internal.GrpcThreadPool.RunHandlerLoop(Grpc.Core.Internal.CompletionQueueSafeHandle cq, Grpc.Core.Profiling.IProfiler optionalProfiler) Line 172 at T:\src\github\grpc\src\csharp\Grpc.Core\Internal\GrpcThreadPool.cs(172) mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 980 at f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs(980) mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 928 at f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs(928) mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 917 at f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs(917) mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() Line 111 at f:\dd\ndp\clr\src\BCL\system\threading\thread.cs(111) [Native to Managed Transition] mscorlib.ni.dll!00007ff919683e85()
Request wireshark capture:
431 487.433763 127.0.0.1 127.0.0.1 TCP 344 51921 → 50886 [PSH, ACK] Seq=126 Ack=90 Win=2619648 Len=300 0000 02 00 00 00 45 00 01 54 c0 e7 40 00 80 06 00 00 ....E..T..@..... 0010 7f 00 00 01 7f 00 00 01 ca d1 c6 c6 c7 85 06 fc ................ 0020 c7 db e8 37 50 18 27 f9 39 13 00 00 00 01 11 01 ...7P.'.9....... 0030 04 00 00 00 01 86 83 40 0a 3a 61 75 74 68 6f 72 .......@.:author 0040 69 74 79 0f 31 32 37 2e 30 2e 30 2e 31 3a 35 30 ity.127.0.0.1:50 0050 38 38 36 40 05 3a 70 61 74 68 30 2f 4d 69 63 72 886@.:path0/[Na 0060 6f 73 6f 66 74 2e 42 69 6e 67 2e 58 61 70 2e 47 mespace].G 0070 72 70 63 2e 4c 65 67 61 63 79 51 75 65 72 79 2f rpc.LegacyQuery/ 0080 4c 65 67 61 63 79 51 75 65 72 79 40 02 74 65 08 LegacyQuery@.te. 0090 74 72 61 69 6c 65 72 73 40 0c 63 6f 6e 74 65 6e trailers@.conten 00a0 74 2d 74 79 70 65 10 61 70 70 6c 69 63 61 74 69 t-type.applicati 00b0 6f 6e 2f 67 72 70 63 40 0a 75 73 65 72 2d 61 67 on/grpc@.user-ag 00c0 65 6e 74 37 67 72 70 63 2d 63 73 68 61 72 70 2f ent7grpc-csharp/ 00d0 32 2e 33 31 2e 30 2d 70 72 65 31 20 67 72 70 63 2.31.0-pre1 grpc 00e0 2d 63 2f 31 31 2e 30 2e 30 20 28 77 69 6e 64 6f -c/11.0.0 (windo 00f0 77 73 3b 20 63 68 74 74 70 32 29 40 14 67 72 70 ws; chttp2)@.grp 0100 63 2d 61 63 63 65 70 74 2d 65 6e 63 6f 64 69 6e c-accept-encodin 0110 67 15 69 64 65 6e 74 69 74 79 2c 64 65 66 6c 61 g.identity,defla 0120 74 65 2c 67 7a 69 70 40 0f 61 63 63 65 70 74 2d te,gzip@.accept- 0130 65 6e 63 6f 64 69 6e 67 0d 69 64 65 6e 74 69 74 encoding.identit 0140 79 2c 67 7a 69 70 00 00 09 00 01 00 00 00 01 00 y,gzip.......... 0150 00 00 00 04 0a 02 37 42 ......7B
Response (with failure) wireshark capture
433 502.101974 127.0.0.1 127.0.0.1 TCP 163 50886 → 51921 [PSH, ACK] Seq=90 Ack=426 Win=2619136 Len=119 0000 02 00 00 00 45 00 00 9f c0 e9 40 00 80 06 00 00 ....E.....@..... 0010 7f 00 00 01 7f 00 00 01 c6 c6 ca d1 c7 db e8 37 ...............7 0020 c7 85 08 28 50 18 27 f7 28 6e 00 00 00 00 61 01 ...(P.'.(n....a. 0030 05 00 00 00 01 88 40 0c 63 6f 6e 74 65 6e 74 2d ......@.content- 0040 74 79 70 65 10 61 70 70 6c 69 63 61 74 69 6f 6e type.application 0050 2f 67 72 70 63 40 0b 67 72 70 63 2d 73 74 61 74 /grpc@.grpc-stat 0060 75 73 02 31 34 40 0c 67 72 70 63 2d 6d 65 73 73 us.14@.grpc-mess 0070 61 67 65 22 46 61 69 6c 69 6e 67 20 74 68 65 20 age"Failing the 0080 72 65 71 75 65 73 74 20 74 6f 20 74 65 73 74 20 request to test 0090 72 65 74 72 79 2e 00 00 04 08 00 00 00 00 00 00 retry........... 00a0 00 00 09 ...
-
When I looked into this frame in callstack
void CallData::RecvTrailingMetadataReady(void* arg, grpc_error* error) { SubchannelCallBatchData* batch_data = static_cast<SubchannelCallBatchData*>(arg); grpc_call_element* elem = batch_data->elem;
I found the refs {count=-1} might be suspicious
static_cast<SubchannelCallBatchData*>(arg) 0x000001f5d7fab400 {refs={count=-1 } elem=0x000001f5d80a5690 {filter=0x00007ff8ab1f5720 {grpc_csharp_ext.x64.dll!grpc_channel_filter grpc_client_channel_filter} {...} ...} ...} grpc_core::`anonymous-namespace'::CallData::SubchannelCallBatchData * + **refs {count=-1 }** gpr_refcount + elem 0x000001f5d80a5690 {filter=0x00007ff8ab1f5720 {grpc_csharp_ext.x64.dll!grpc_channel_filter grpc_client_channel_filter} {...} ...} grpc_call_element * + subchannel_call {value_=0x0000000000000000 <NULL> } grpc_core::RefCountedPtr<grpc_core::SubchannelCall> + batch {on_complete=0x000001f5d80cc2f8 {next_data={next=0x0000000000000000 <NULL> mpscq_node={space_={_Val=...} } ...} ...} ...} grpc_transport_stream_op_batch + on_complete {next_data={next=0x0000000000000000 <NULL> mpscq_node={space_={_Val=0.0000000000000000 _Pad=0x000001f5d7fab458 "" } } ...} ...} grpc_closure
See TROUBLESHOOTING.md for how to diagnose problems better.
Anything else we should know about your project / environment?
btw, I did try 2.30.0, while with the same test code and configurations, it crashes on server side, before the mock grpc handler gets a chance to process the first (original) request
it crashes with “write access violation” failure in service_config.cc code
Exception thrown: write access violation. sep was 0x1. occurred
const ServiceConfigParser::ParsedConfigVector* ServiceConfig::GetMethodParsedConfigVector(const grpc_slice& path) const { // Try looking up the full path in the map. auto it = parsed_method_configs_map_.find(path); if (it != parsed_method_configs_map_.end()) return it->second; // If we didn’t find a match for the path, try looking for a wildcard // entry (i.e., change “/service/method” to “/service/”). UniquePtr<char> path_str(grpc_slice_to_c_string(path)); char* sep = strrchr(path_str.get(), ‘/’) + 1; if (sep == nullptr) return nullptr; // Shouldn’t ever happen. *sep = ‘\0’; grpc_slice wildcard_path = grpc_slice_from_static_string(path_str.get()); it = parsed_method_configs_map_.find(wildcard_path); if (it != parsed_method_configs_map_.end()) return it->second; // Try default method config, if set. return default_method_config_vector_; }
here is the call stack
grpc_csharp_ext.x64.dll!grpc_core::ServiceConfig::GetMethodParsedConfigVector(const grpc_slice & path) Line 213 C++ grpc_csharp_ext.x64.dll!
anonymous namespace'::call_data::call_data(grpc_call_element * elem, const
anonymous-namespace’::channel_data & chand, const grpc_call_element_args & args) Line 133 C++ grpc_csharp_ext.x64.dll!message_size_init_call_elem(grpc_call_element * elem, const grpc_call_element_args * args) Line 279 C++ grpc_csharp_ext.x64.dll!grpc_call_stack_init(grpc_channel_stack * channel_stack, int initial_refs, void(*)(void *, grpc_error *) destroy, void * destroy_arg, const grpc_call_element_args * elem_args) Line 180 C++ grpc_csharp_ext.x64.dll!grpc_call_create(const grpc_call_create_args * args, grpc_call * * out_call) Line 424 C++ grpc_csharp_ext.x64.dll!`anonymous namespace’::accept_stream(void * cd, grpc_transport * __formal, const void * transport_server_data) Line 1087 C++ grpc_csharp_ext.x64.dll!grpc_chttp2_parsing_accept_stream(grpc_chttp2_transport * t, unsigned int id) Line 774 C++ grpc_csharp_ext.x64.dll!init_header_frame_parser(grpc_chttp2_transport * t, int is_continuation) Line 646 C++ grpc_csharp_ext.x64.dll!init_frame_parser(grpc_chttp2_transport * t) Line 293 C++ grpc_csharp_ext.x64.dll!grpc_chttp2_perform_read(grpc_chttp2_transport * t, const grpc_slice & slice) Line 179 C++ grpc_csharp_ext.x64.dll!read_action_locked(void * tp, grpc_error * error) Line 2528 C++ grpc_csharp_ext.x64.dll!grpc_combiner_continue_exec_ctx() Line 237 C++ grpc_csharp_ext.x64.dll!grpc_core::ExecCtx::Flush() Line 156 C++ grpc_csharp_ext.x64.dll!pollset_work(grpc_pollset * pollset, grpc_pollset_worker * * worker_hdl, __int64 deadline) Line 132 C++ grpc_csharp_ext.x64.dll!cq_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) Line 1059 C++ grpc_csharp_ext.x64.dll!grpc_completion_queue_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) Line 1135 C++ grpc_csharp_ext.x64.dll!grpcsharp_completion_queue_next(grpc_completion_queue * cq) Line 411 C [Managed to Native Transition] Grpc.Core.dll!Grpc.Core.Internal.GrpcThreadPool.RunHandlerLoop(Grpc.Core.Internal.CompletionQueueSafeHandle cq, Grpc.Core.Profiling.IProfiler optionalProfiler) Line 172 C# mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 980 C# mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 928 C# mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 917 C# mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() Line 111 C# [Native to Managed Transition] kernel32.dll!BaseThreadInitThunk() Unknown ntdll.dll!RtlUserThreadStart() Unknown
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 24 (11 by maintainers)
Commits related to this issue
- C# add RetryTest with service config to reproduce #23618 — committed to jtattermusch/grpc by jtattermusch 4 years ago
- C# add RetryTest with service config to reproduce #23618 — committed to jtattermusch/grpc by jtattermusch 4 years ago
- C# add RetryTest with service config to reproduce #23618 — committed to jtattermusch/grpc by jtattermusch 4 years ago
- C# add RetryTest with service config to reproduce #23618 (#24413) * C# add RetryTest with service config to reproduce #23618 * fixup tests — committed to grpc/grpc by jtattermusch 2 years ago
we are also experiencing this in mono on linux, any info would be great
For 2.37.0 I get:
For 2.38.0 and 2.39.1:
At 2.40.0 the issue seems resolved and the test passes consistently.