grpc: Client crashes in Retry test. Unhandled exception at 0xXXXX (grpc_csharp_ext.x64.dll)

What version of gRPC and what language are you using?

2.31.0-pre1, C#

What operating system (Linux, Windows,…) and version?

Windows

What runtime / compiler are you using (e.g. python version or version of gcc)

.net framework v4.7.2

What did you do?

Please provide either 1) A unit test for reproducing the bug or 2) Specific steps for us to follow to reproduce the bug. If there’s not enough information to debug the problem, gRPC team may close the issue at their discretion. You’re welcome to re-open the issue once you have a reproduction.

  1. Created a Mock gRpc server with service “service”:“[nameSpace].LegacyQuery” Method: LegacyQuery. (There is one more service in my code but not used in this test)

     service LegacyQuery {
       rpc LegacyQuery (LegacyQueryRequest) returns (stream LegacyQueryResponse) {}
     }
    

    In this mock service handler, I wrote code like this to fail the first (original) request but succeed the retries

     public override Task LegacyQuery(
         ArraySegment<byte> request, IServerStreamWriter<ArraySegment<byte>> handlerStreamWriter, ServerCallContext context)
     {
         if (TryGetRetryAttempts(context, out int attempts) && attempts > 0)
         {
             this.SetResponseHeader(context.ResponseTrailers);
             byte[] bytes = new byte[] { 88, 99 };
             return handlerStreamWriter.WriteAsync(new ArraySegment<byte>(bytes));
         }
         else
         {
             throw new RpcException(new Status(StatusCode.Unavailable, "Failing the request to test retry."), "failed for retry");
         }
     }
    

    there is one ChannelOption configured on server side for retry: Server side ChannelOption Name - grpc.service_config Value -

    {
        "methodConfig": [
            {
                "name": [
                    {
                        "service": "[nameSpace].AnotherQuery",
                        "method": "Query"
                    },
                    {
                        "service": "[nameSpace].LegacyQuery",
                        "method": "LegacyQuery"
                    }
                ],
                "retryPolicy": {
                    "maxAttempts": 3,
                    "initialBackoff": "0.1s",
                    "maxBackoff": "5s",
                    "backoffMultiplier": 2,
                    "retryableStatusCodes": [
                        "UNAVAILABLE"
                    ]
                }
            }
        ]
    }
  1. The client code is a unit test

         try
         {
             var call = client.LegacyQuery(legacyQueryRequest);
             while (await call.ResponseStream.MoveNext().ConfigureAwait(false))
             {
                 response = call.ResponseStream.Current;
             }
    
             Assert.IsNotNull(response);
         }
         catch (Exception ex)
         {
             Console.WriteLine(ex.ToString());
             throw;
         }
         catch
         {
             Console.WriteLine("Non-CLS compliant exception caught.");
         }
    

    with on ChannelOption for client Name - grpc.service_config Value -

    {
        "methodConfig": [
            {
                "name": [
                    {
                        "service": "[nameSpace].LegacyQuery"
                    }
                ],
                "waitForReady": true,
                "retryPolicy": {
                    "maxAttempts": 3,
                    "initialBackoff": "0.5s",
                    "maxBackoff": "5s",
                    "backoffMultiplier": 2.0,
                    "retryableStatusCodes": [
                        "UNAVAILABLE"
                    ]
                }
            }
        ]
    }

What did you expect to see?

The first (original) request fails, but retry happened once and succeeds

What did you see instead?

The first (original) request failed. Client side crashes upon the first response with failure is received

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).

  • crash happened here

      int gpr_unref(gpr_refcount* r) {
        gpr_atm prior = gpr_atm_full_fetch_add(&r->count, -1);
        GPR_ASSERT(prior > 0);  <--- failed here
        return prior == 1;
      }
    
  • exception details

  Unhandled exception at 0x00007FF8AB0F2D2D (grpc_csharp_ext.x64.dll) in testhost.exe: Fatal program exit requested. occurred
  • crash call stack I observed in VS
    grpc_csharp_ext.x64.dll!abort() Line 77
        at minkernel\crts\ucrt\src\appcrt\startup\abort.cpp(77)
    grpc_csharp_ext.x64.dll!gpr_unref(gpr_refcount * r) Line 105
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\gpr\sync.cc(105)
    [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::`anonymous-namespace'::CallData::SubchannelCallBatchData::Unref() Line 492
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\client_channel\client_channel.cc(492)
    grpc_csharp_ext.x64.dll!grpc_core::`anonymous namespace'::CallData::RecvTrailingMetadataReady(void * arg, grpc_error * error) Line 3201
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\client_channel\client_channel.cc(3201)
    [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::Closure::Run(const grpc_core::DebugLocation & closure, grpc_closure *) Line 244
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\closure.h(244)
    grpc_csharp_ext.x64.dll!grpc_core::SubchannelCall::RecvTrailingMetadataReady(void * arg, grpc_error * error) Line 296
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\client_channel\subchannel.cc(296)
    [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::Closure::Run(const grpc_core::DebugLocation & closure, grpc_closure *) Line 244
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\closure.h(244)
    grpc_csharp_ext.x64.dll!recv_trailing_metadata_ready(void * user_data, grpc_error * error) Line 249
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\message_size\message_size_filter.cc(249)
    [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::Closure::Run(const grpc_core::DebugLocation & closure, grpc_closure *) Line 244
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\closure.h(244)
    grpc_csharp_ext.x64.dll!recv_trailing_metadata_ready(void * user_data, grpc_error * error) Line 222
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\http\client\http_client_filter.cc(222)
    [Inline Frame] grpc_csharp_ext.x64.dll!grpc_core::Closure::Run(const grpc_core::DebugLocation &) Line 244
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\closure.h(244)
    grpc_csharp_ext.x64.dll!grpc_core::`anonymous namespace'::CallData::OnRecvTrailingMetadataReady(void * arg, grpc_error * error) Line 320
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\ext\filters\http\message_compress\message_decompress_filter.cc(320)
    grpc_csharp_ext.x64.dll!exec_ctx_run(grpc_closure * closure, grpc_error * error) Line 46
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\exec_ctx.cc(46)
    grpc_csharp_ext.x64.dll!grpc_core::ExecCtx::Flush() Line 154
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\exec_ctx.cc(154)
    grpc_csharp_ext.x64.dll!pollset_work(grpc_pollset * pollset, grpc_pollset_worker * * worker_hdl, __int64 deadline) Line 132
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\iomgr\pollset_windows.cc(132)
    grpc_csharp_ext.x64.dll!cq_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) Line 1274
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\surface\completion_queue.cc(1274)
    grpc_csharp_ext.x64.dll!grpc_completion_queue_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) Line 1350
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\core\lib\surface\completion_queue.cc(1350)
    grpc_csharp_ext.x64.dll!grpcsharp_completion_queue_next(grpc_completion_queue * cq) Line 411
        at t:\src\github\grpc\workspace_csharp_ext_windows_x64\src\csharp\ext\grpc_csharp_ext.c(411)
    [Managed to Native Transition]
    Grpc.Core.dll!Grpc.Core.Internal.GrpcThreadPool.RunHandlerLoop(Grpc.Core.Internal.CompletionQueueSafeHandle cq, Grpc.Core.Profiling.IProfiler optionalProfiler) Line 172
        at T:\src\github\grpc\src\csharp\Grpc.Core\Internal\GrpcThreadPool.cs(172)
    mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 980
        at f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs(980)
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 928
        at f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs(928)
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 917
        at f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs(917)
    mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() Line 111
        at f:\dd\ndp\clr\src\BCL\system\threading\thread.cs(111)
    [Native to Managed Transition]
    mscorlib.ni.dll!00007ff919683e85()

Request wireshark capture:

    431	487.433763	127.0.0.1	127.0.0.1	TCP	344	51921 → 50886 [PSH, ACK] Seq=126 Ack=90 Win=2619648 Len=300
    0000   02 00 00 00 45 00 01 54 c0 e7 40 00 80 06 00 00   ....E..T..@.....
    0010   7f 00 00 01 7f 00 00 01 ca d1 c6 c6 c7 85 06 fc   ................
    0020   c7 db e8 37 50 18 27 f9 39 13 00 00 00 01 11 01   ...7P.'.9.......
    0030   04 00 00 00 01 86 83 40 0a 3a 61 75 74 68 6f 72   .......@.:author
    0040   69 74 79 0f 31 32 37 2e 30 2e 30 2e 31 3a 35 30   ity.127.0.0.1:50
    0050   38 38 36 40 05 3a 70 61 74 68 30 2f 4d 69 63 72   886@.:path0/[Na
    0060   6f 73 6f 66 74 2e 42 69 6e 67 2e 58 61 70 2e 47   mespace].G
    0070   72 70 63 2e 4c 65 67 61 63 79 51 75 65 72 79 2f   rpc.LegacyQuery/
    0080   4c 65 67 61 63 79 51 75 65 72 79 40 02 74 65 08   LegacyQuery@.te.
    0090   74 72 61 69 6c 65 72 73 40 0c 63 6f 6e 74 65 6e   trailers@.conten
    00a0   74 2d 74 79 70 65 10 61 70 70 6c 69 63 61 74 69   t-type.applicati
    00b0   6f 6e 2f 67 72 70 63 40 0a 75 73 65 72 2d 61 67   on/grpc@.user-ag
    00c0   65 6e 74 37 67 72 70 63 2d 63 73 68 61 72 70 2f   ent7grpc-csharp/
    00d0   32 2e 33 31 2e 30 2d 70 72 65 31 20 67 72 70 63   2.31.0-pre1 grpc
    00e0   2d 63 2f 31 31 2e 30 2e 30 20 28 77 69 6e 64 6f   -c/11.0.0 (windo
    00f0   77 73 3b 20 63 68 74 74 70 32 29 40 14 67 72 70   ws; chttp2)@.grp
    0100   63 2d 61 63 63 65 70 74 2d 65 6e 63 6f 64 69 6e   c-accept-encodin
    0110   67 15 69 64 65 6e 74 69 74 79 2c 64 65 66 6c 61   g.identity,defla
    0120   74 65 2c 67 7a 69 70 40 0f 61 63 63 65 70 74 2d   te,gzip@.accept-
    0130   65 6e 63 6f 64 69 6e 67 0d 69 64 65 6e 74 69 74   encoding.identit
    0140   79 2c 67 7a 69 70 00 00 09 00 01 00 00 00 01 00   y,gzip..........
    0150   00 00 00 04 0a 02 37 42                           ......7B

Response (with failure) wireshark capture

    433	502.101974	127.0.0.1	127.0.0.1	TCP	163	50886 → 51921 [PSH, ACK] Seq=90 Ack=426 Win=2619136 Len=119
    0000   02 00 00 00 45 00 00 9f c0 e9 40 00 80 06 00 00   ....E.....@.....
    0010   7f 00 00 01 7f 00 00 01 c6 c6 ca d1 c7 db e8 37   ...............7
    0020   c7 85 08 28 50 18 27 f7 28 6e 00 00 00 00 61 01   ...(P.'.(n....a.
    0030   05 00 00 00 01 88 40 0c 63 6f 6e 74 65 6e 74 2d   ......@.content-
    0040   74 79 70 65 10 61 70 70 6c 69 63 61 74 69 6f 6e   type.application
    0050   2f 67 72 70 63 40 0b 67 72 70 63 2d 73 74 61 74   /grpc@.grpc-stat
    0060   75 73 02 31 34 40 0c 67 72 70 63 2d 6d 65 73 73   us.14@.grpc-mess
    0070   61 67 65 22 46 61 69 6c 69 6e 67 20 74 68 65 20   age"Failing the 
    0080   72 65 71 75 65 73 74 20 74 6f 20 74 65 73 74 20   request to test 
    0090   72 65 74 72 79 2e 00 00 04 08 00 00 00 00 00 00   retry...........
    00a0   00 00 09                                          ...
  • When I looked into this frame in callstack

      void CallData::RecvTrailingMetadataReady(void* arg, grpc_error* error) {
        SubchannelCallBatchData* batch_data =
            static_cast<SubchannelCallBatchData*>(arg);
        grpc_call_element* elem = batch_data->elem;
    

    I found the refs {count=-1} might be suspicious

    static_cast<SubchannelCallBatchData*>(arg)	0x000001f5d7fab400 {refs={count=-1 } elem=0x000001f5d80a5690 {filter=0x00007ff8ab1f5720 {grpc_csharp_ext.x64.dll!grpc_channel_filter grpc_client_channel_filter} {...} ...} ...}	grpc_core::`anonymous-namespace'::CallData::SubchannelCallBatchData *
    +		**refs	{count=-1 }**	gpr_refcount
    +		elem	0x000001f5d80a5690 {filter=0x00007ff8ab1f5720 {grpc_csharp_ext.x64.dll!grpc_channel_filter grpc_client_channel_filter} {...} ...}	grpc_call_element *
    +		subchannel_call	{value_=0x0000000000000000 <NULL> }	grpc_core::RefCountedPtr<grpc_core::SubchannelCall>
    +		batch	{on_complete=0x000001f5d80cc2f8 {next_data={next=0x0000000000000000 <NULL> mpscq_node={space_={_Val=...} } ...} ...} ...}	grpc_transport_stream_op_batch
    +		on_complete	{next_data={next=0x0000000000000000 <NULL> mpscq_node={space_={_Val=0.0000000000000000 _Pad=0x000001f5d7fab458 "" } } ...} ...}	grpc_closure

See TROUBLESHOOTING.md for how to diagnose problems better.

Anything else we should know about your project / environment?

btw, I did try 2.30.0, while with the same test code and configurations, it crashes on server side, before the mock grpc handler gets a chance to process the first (original) request

it crashes with “write access violation” failure in service_config.cc code

Exception thrown: write access violation. sep was 0x1. occurred

const ServiceConfigParser::ParsedConfigVector* ServiceConfig::GetMethodParsedConfigVector(const grpc_slice& path) const { // Try looking up the full path in the map. auto it = parsed_method_configs_map_.find(path); if (it != parsed_method_configs_map_.end()) return it->second; // If we didn’t find a match for the path, try looking for a wildcard // entry (i.e., change “/service/method” to “/service/”). UniquePtr<char> path_str(grpc_slice_to_c_string(path)); char* sep = strrchr(path_str.get(), ‘/’) + 1; if (sep == nullptr) return nullptr; // Shouldn’t ever happen. *sep = ‘\0’; grpc_slice wildcard_path = grpc_slice_from_static_string(path_str.get()); it = parsed_method_configs_map_.find(wildcard_path); if (it != parsed_method_configs_map_.end()) return it->second; // Try default method config, if set. return default_method_config_vector_; }

here is the call stack

grpc_csharp_ext.x64.dll!grpc_core::ServiceConfig::GetMethodParsedConfigVector(const grpc_slice & path) Line 213 C++ grpc_csharp_ext.x64.dll!anonymous namespace'::call_data::call_data(grpc_call_element * elem, const anonymous-namespace’::channel_data & chand, const grpc_call_element_args & args) Line 133 C++ grpc_csharp_ext.x64.dll!message_size_init_call_elem(grpc_call_element * elem, const grpc_call_element_args * args) Line 279 C++ grpc_csharp_ext.x64.dll!grpc_call_stack_init(grpc_channel_stack * channel_stack, int initial_refs, void(*)(void *, grpc_error *) destroy, void * destroy_arg, const grpc_call_element_args * elem_args) Line 180 C++ grpc_csharp_ext.x64.dll!grpc_call_create(const grpc_call_create_args * args, grpc_call * * out_call) Line 424 C++ grpc_csharp_ext.x64.dll!`anonymous namespace’::accept_stream(void * cd, grpc_transport * __formal, const void * transport_server_data) Line 1087 C++ grpc_csharp_ext.x64.dll!grpc_chttp2_parsing_accept_stream(grpc_chttp2_transport * t, unsigned int id) Line 774 C++ grpc_csharp_ext.x64.dll!init_header_frame_parser(grpc_chttp2_transport * t, int is_continuation) Line 646 C++ grpc_csharp_ext.x64.dll!init_frame_parser(grpc_chttp2_transport * t) Line 293 C++ grpc_csharp_ext.x64.dll!grpc_chttp2_perform_read(grpc_chttp2_transport * t, const grpc_slice & slice) Line 179 C++ grpc_csharp_ext.x64.dll!read_action_locked(void * tp, grpc_error * error) Line 2528 C++ grpc_csharp_ext.x64.dll!grpc_combiner_continue_exec_ctx() Line 237 C++ grpc_csharp_ext.x64.dll!grpc_core::ExecCtx::Flush() Line 156 C++ grpc_csharp_ext.x64.dll!pollset_work(grpc_pollset * pollset, grpc_pollset_worker * * worker_hdl, __int64 deadline) Line 132 C++ grpc_csharp_ext.x64.dll!cq_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) Line 1059 C++ grpc_csharp_ext.x64.dll!grpc_completion_queue_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) Line 1135 C++ grpc_csharp_ext.x64.dll!grpcsharp_completion_queue_next(grpc_completion_queue * cq) Line 411 C [Managed to Native Transition] Grpc.Core.dll!Grpc.Core.Internal.GrpcThreadPool.RunHandlerLoop(Grpc.Core.Internal.CompletionQueueSafeHandle cq, Grpc.Core.Profiling.IProfiler optionalProfiler) Line 172 C# mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 980 C# mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 928 C# mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 917 C# mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() Line 111 C# [Native to Managed Transition] kernel32.dll!BaseThreadInitThunk() Unknown ntdll.dll!RtlUserThreadStart() Unknown

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 24 (11 by maintainers)

Commits related to this issue

Most upvoted comments

we are also experiencing this in mono on linux, any info would be great

For 2.37.0 I get:

Test Run Aborted with error System.Exception: One or more errors occurred.
 ---> System.Exception: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Exception: An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)

For 2.38.0 and 2.39.1:

Grpc.Core.RpcException : Status(StatusCode="Unavailable", Detail="Attempt 1 failed on purpose", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1649291588.018000000","description":"Error received from peer ipv6:[::1]:63003","file":"..\..\..\src\core\lib\surface\call.cc","file_line":1067,"grpc_message":"Attempt 1 failed on purpose","grpc_status":14}")

At 2.40.0 the issue seems resolved and the test passes consistently.