grpc: after a long time with no requests, calls fail. Did the server die?

Hi

I run a gRPC server and found that if there are no requests for a long time (maybe 24 hours), the next call to the server over the same channel fails with a network error.

We use HAProxy as the load balancer. How can the client check the health of the connection/channel, and how can it repair the channel and reconnect (in Python and in Java)?

Thanks so much.

The error log on the Java side is below:

2016-02-27 15:24:29.786 [grpc-default-worker-ELG-3] ERROR io.netty.handler.codec.http2.Http2ConnectionHandler.error:181 - Sending GOAWAY failed: lastStreamId '0', errorCode '2', debugData 'connection timed out: internal-dev-internel-proxy-1365586739.cn-north-1.elb.amazonaws.com.cn/10.1.6.9:50052'. Forcing shutdown of the connection.
java.nio.channels.ClosedChannelException: null
2016-02-27 15:24:29.788 [http-bio-8080-exec-5] ERROR com.vipkid.security.impl.AuthorizeServiceImpl.getRoleList:55 - Server Error,User Token = 236d50b3-ea62-4174-ad65-c849166d14fa ,e={}
io.grpc.StatusRuntimeException: UNAVAILABLE
    at io.grpc.Status.asRuntimeException(Status.java:431) ~[grpc-core-0.13.1.jar:0.13.1]
    at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:208) ~[grpc-stub-0.13.1.jar:0.13.1]
    at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:141) ~[grpc-stub-0.13.1.jar:0.13.1]
    at com.vipkid.proto.service.AuthServiceGrpc$AuthServiceBlockingStub.getRoleList(AuthServiceGrpc.java:235) ~[auth-proto-1.1-20160226.061926-2.jar:na]
    at com.vipkid.client.service.AuthServiceClient.getRoleList(AuthServiceClient.java:68) ~[auth-client-1.1-20160226.071118-3.jar:na]
    at com.vipkid.security.impl.AuthorizeServiceImpl.getRoleList(AuthorizeServiceImpl.java:52) ~[AuthorizeServiceImpl.class:na]
    at sun.reflect.GeneratedMethodAccessor1046.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_65]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_65]
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) [spring-aop-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) [spring-aop-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) [spring-aop-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:98) [spring-tx-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:262) [spring-tx-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:95) [spring-tx-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) [spring-aop-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207) [spring-aop-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at com.sun.proxy.$Proxy35.getRoleList(Unknown Source) [na:na]
    at com.vipkid.service.StaffAuthService.login(StaffAuthService.java:64) [StaffAuthService.class:na]
    at com.vipkid.service.StaffAuthService$$FastClassBySpringCGLIB$$f265537e.invoke(<generated>) [spring-core-4.0.6.RELEASE.jar:na]
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) [spring-core-4.0.6.RELEASE.jar:4.0.6.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:708) [spring-aop-4.0.6.RELEASE.jar:4.0.6.RELEASE]

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 18 (7 by maintainers)

Most upvoted comments

I am using a server implemented in Python and a Java client.

It’s really easy to find where the problem is. If I run the client and do nothing, then after 30s or 1s, depending on the AWS ELB idle timeout setting, the client logs the message (Channel has been shutdown/terminated by another endpoint).
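One way to keep the load balancer from dropping an idle connection is to have the client send HTTP/2 keepalive pings more often than the idle timeout. The sketch below only builds the option list. It assumes a gRPC release newer than the 0.13/0.14 versions in this thread, one that honors the `grpc.keepalive_*` channel arguments. The helper name `keepalive_options`, the "half the idle timeout" rule, and the 10-second ping timeout are illustrative choices, not something taken from the thread.

```python
from typing import List, Tuple


def keepalive_options(idle_timeout_s: float) -> List[Tuple[str, int]]:
    """Build channel options that ping more often than the proxy's idle timeout.

    `idle_timeout_s` is the idle timeout configured on the load balancer
    (for example 30 for a 30-second ELB setting).
    """
    if idle_timeout_s <= 0:
        raise ValueError("idle_timeout_s must be positive")
    # Assumption: pinging at half the idle timeout leaves enough margin.
    ping_interval_ms = int(idle_timeout_s * 1000 / 2)
    return [
        # Send a keepalive ping after this much time without traffic.
        ("grpc.keepalive_time_ms", ping_interval_ms),
        # Treat the connection as dead if a ping is not acknowledged in time.
        ("grpc.keepalive_timeout_ms", 10_000),
        # Keep pinging even when no RPC is in flight (the idle case here).
        ("grpc.keepalive_permit_without_calls", 1),
    ]


# Hypothetical usage ("proxy-host:50052" is a placeholder target):
# channel = grpc.insecure_channel("proxy-host:50052", options=keepalive_options(30))
```

Whether pings count as activity depends on how the proxy handles the connection, and a server may reject pings that arrive more often than it permits, so both sides may need matching settings.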

If we try to reuse the channel at this point, the call fails with UNAVAILABLE.

In the Java client, I have to close the channel and create a new channel object. It looks like it won’t reconnect on its own.
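The close-and-recreate workaround described above can be wrapped in a small holder so callers do not keep a reference to a dead channel. This is a minimal sketch in Python rather than Java. `RebuildingChannel` and `channel_factory` are hypothetical names, and the factory is whatever the application uses to open a channel.

```python
import threading
from typing import Any, Callable


class RebuildingChannel:
    """Holds one channel and swaps in a fresh one on request.

    `channel_factory` is a hypothetical zero-argument callable, e.g.
    lambda: grpc.insecure_channel("proxy-host:50052").
    """

    def __init__(self, channel_factory: Callable[[], Any]) -> None:
        self._factory = channel_factory
        self._lock = threading.Lock()
        self._channel = channel_factory()

    def get(self) -> Any:
        """Return the channel currently in use."""
        with self._lock:
            return self._channel

    def rebuild(self) -> Any:
        """Replace the current channel with a new one and close the old one."""
        with self._lock:
            old = self._channel
            new = self._factory()
            self._channel = new
        # Close outside the lock; skip objects that have no close() method.
        close = getattr(old, "close", None)
        if callable(close):
            close()
        return new
```

Stubs are bound to the channel they were created with, so after `rebuild()` the caller would also need to create new stubs from the returned channel.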

But…

I also have a Python client and a Python server, also behind AWS ELB. With an idle connection, the first try also gets UNAVAILABLE, but the second try is OK, just like @giladwolff said.
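Since the second try succeeds, a client-side workaround is to retry a call once when it fails with UNAVAILABLE. The sketch below is library-agnostic: `call_with_retry`, `is_unavailable`, and the default of two attempts are illustrative names and values, not part of gRPC. It is only safe for calls that can be repeated without side effects.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    rpc: Callable[[], T],
    is_unavailable: Callable[[Exception], bool],
    attempts: int = 2,
    backoff_s: float = 0.1,
) -> T:
    """Invoke `rpc`, retrying only when `is_unavailable(error)` is true."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return rpc()
        except Exception as error:
            # Give up on the last attempt or on any non-transient error.
            if attempt == attempts or not is_unavailable(error):
                raise
            # Wait a little longer before each further attempt.
            time.sleep(backoff_s * attempt)
    raise AssertionError("unreachable")


# Hypothetical usage with grpc (stub and request are placeholders):
# result = call_with_retry(
#     lambda: stub.GetRoleList(request),
#     lambda e: isinstance(e, grpc.RpcError)
#     and e.code() == grpc.StatusCode.UNAVAILABLE,
# )
```

Passing the error check in as a predicate keeps the helper independent of the gRPC version, which matters here because the thread spans several early releases.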

And yes, this is a client-side problem, but the current behavior is not very friendly.

Thanks, and sorry for my poor English. 😃

On 2016-05-26 8:29 GMT+08:00, Gilad Wolff wrote:

I believe I am hitting the same issue with a Node.js gRPC client (0.14.1) and a Java server (0.14.0). I create a client, and after a longish period of inactivity the first call on the client fails with UNAVAILABLE. If I retry, I get a reply.



http://www.guojing.me