grpc: Python Client 0.14.0 don't recover from connection failure

Issue

Python client version 0.14.0 don’t recover and reconnect to server if any failures occurs. Problem can’t be reproduced in 0.13.1.

Either if the client is started and the server isn’t available. Or if the server restart while client is running.

Tests

Tested on gRPC latest pypi version 0.14.0 as .1 isn’t in pypi yet. On both OSX ElCapitan and Linux (Ubuntu trusty). And using Python and Golang server. With and without IPv6.

To show the problem, I modified greeter_client.py to loop:

diff --git a/examples/python/helloworld/greeter_client.py b/examples/python/helloworld/greeter_client.py
index 40d637f..4b53dcb 100644
--- a/examples/python/helloworld/greeter_client.py
+++ b/examples/python/helloworld/greeter_client.py
@@ -30,6 +30,7 @@
 """The Python implementation of the GRPC helloworld.Greeter client."""

 from __future__ import print_function
+import time

 from grpc.beta import implementations

@@ -41,8 +42,10 @@ _TIMEOUT_SECONDS = 10
 def run():
   channel = implementations.insecure_channel('localhost', 50051)
   stub = helloworld_pb2.beta_create_Greeter_stub(channel)
-  response = stub.SayHello(helloworld_pb2.HelloRequest(name='you'), _TIMEOUT_SECONDS)
-  print("Greeter client received: " + response.message)
+  while True:
+    time.sleep(1)
+    response = stub.SayHello(helloworld_pb2.HelloRequest(name='you'), _TIMEOUT_SECONDS)
+    print("Greeter client received: " + response.message)
(test2) [11:26] ~/test/src/grpc/examples/python/helloworld python greeter_client.py
Greeter client received: Hello, you!
Greeter client received: Hello, you!
Greeter client received: Hello, you!
Greeter client received: Hello, you!
E0531 11:26:07.204037000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 11:26:07.204308000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
Traceback (most recent call last):
  File "greeter_client.py", line 52, in <module>
    run()
  File "greeter_client.py", line 47, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name='you'), _TIMEOUT_SECONDS)
  File "/Users/test2/lib/python2.7/site-packages/grpc/framework/crust/implementations.py", line 75, in __call__
    protocol_options, metadata, request)
  File "/Users/ben/test2/lib/python2.7/site-packages/grpc/framework/crust/_calls.py", line 109, in blocking_unary_unary
    return next(rendezvous)
  File "/Users/test2/lib/python2.7/site-packages/grpc/framework/crust/_control.py", line 415, in next
    raise self._termination.abortion_error
grpc.framework.interfaces.face.face.NetworkError: NetworkError(code=StatusCode.UNAVAILABLE, details="")
E0531 11:26:12.321777000 123145306509312 tcp_client_posix.c:191] failed to connect to 'ipv4:127.0.0.1:50051': timeout occurred
E0531 11:26:12.425114000 123145306509312 tcp_client_posix.c:191] failed to connect to 'ipv6:[::1]:50051': timeout occurred
E0531 11:26:16.325317000 123145306509312 tcp_client_posix.c:191] failed to connect to 'ipv4:127.0.0.1:50051': timeout occurred
E0531 11:26:17.189443000 123145306509312 tcp_client_posix.c:191] failed to connect to 'ipv6:[::1]:50051': timeout occurred

First it’s working, then I stop server and start it. It never recover when server is started.

after this, the process must be killed (see #3820).

Go client output:

2016/05/31 11:21:35 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::1]:50051: getsockopt: connection refused"; Reconnecting to "localhost:50051"
2016/05/31 11:21:37 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::1]:50051: getsockopt: connection refused"; Reconnecting to "localhost:50051"
2016/05/31 11:21:40 Greeting: Hello, world!

it recovered.

How to reproduce

Run unchanged greeter_client.py before greeter_server.py.

the client never connecter ever after that,.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 4
  • Comments: 16 (4 by maintainers)

Most upvoted comments

@bclermont Looks like this problem started from 0.14.0 and really need to fix.

If we close server, client will not recover from connection failure, it try to reconnect but not success.

Try this:

start a server and client, then close server, have a client call (will return network error), then restart the server.

Now, if you are using netstat -an|grep 50051, you should find out two ESTABLISHED connections. But wait some seconds, you will find out the port is changing!

Looks like there is a auto connection recovery mechanism. But has a bug, client try to reconnect (success) but looks like client don’t think is success.

I just tried C++ client as it also use the same C library and it recovered correctly:

~/test/src/grpc/examples/cpp/helloworld git diff greeter_client.cc
diff --git a/examples/cpp/helloworld/greeter_client.cc b/examples/cpp/helloworld/greeter_client.cc
index bf3b63c..e9eaef2 100644
--- a/examples/cpp/helloworld/greeter_client.cc
+++ b/examples/cpp/helloworld/greeter_client.cc
@@ -34,6 +34,8 @@
 #include <iostream>
 #include <memory>
 #include <string>
+#include <unistd.h>
+

 #include <grpc++/grpc++.h>

@@ -88,8 +90,11 @@ int main(int argc, char** argv) {
   GreeterClient greeter(grpc::CreateChannel(
       "localhost:50051", grpc::InsecureChannelCredentials()));
   std::string user("world");
-  std::string reply = greeter.SayHello(user);
-  std::cout << "Greeter received: " << reply << std::endl;
+  while(1) {
+    std::string reply = greeter.SayHello(user);
+    std::cout << "Greeter received: " << reply << std::endl;
+    usleep(1000000);
+  }

   return 0;
 }

output:

~/test/src/grpc/examples/cpp/helloworld ./greeter_client
D0531 11:53:53.948001000 140735317880832 ev_posix.c:100] Using polling engine: poll
Greeter received: Hello world
Greeter received: Hello world
Greeter received: Hello world
Greeter received: Hello world
Greeter received: RPC failed
E0531 11:53:58.964509000 140735317880832 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 11:53:58.964788000 140735317880832 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
Greeter received: RPC failed
E0531 11:54:00.968349000 140735317880832 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 11:54:00.968441000 140735317880832 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
Greeter received: RPC failed
Greeter received: Hello world
Greeter received: Hello world
Greeter received: Hello world

@bclermont looks like master fixed this problem, master grpcio version is 0.15.0.dev0.

Same problem here, mark.

As 0.14.1 isn’t in pypi, I couldn’t test it

I tried gRPC 0.13.1 and the problem isn’t there:

(test3) [12:52] ~/test/src/grpc/examples/python/helloworld python greeter_client.py
E0531 12:52:55.225108000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 12:52:55.225396000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
E0531 12:52:56.225564000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 12:52:56.225672000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
E0531 12:52:57.536165000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 12:52:57.536254000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
E0531 12:52:59.830059000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 12:52:59.830173000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
Greeter client received: Hello you
Greeter client received: Hello you
Greeter client received: Hello you
Greeter client received: Hello you
Greeter client received: Hello you
Greeter client received: Hello you
E0531 12:53:10.162397000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 12:53:10.162692000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
E0531 12:53:11.165139000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv6:[::1]:50051': socket error: connection refused
E0531 12:53:11.165253000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused

maybe it’s related to:

E0531 11:26:07.204308000 123145306509312 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:50051': socket error: connection refused
Traceback (most recent call last):
  File "greeter_client.py", line 52, in <module>
    run()
  File "greeter_client.py", line 47, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name='you'), _TIMEOUT_SECONDS)
  File "/Users/test2/lib/python2.7/site-packages/grpc/framework/crust/implementations.py", line 75, in __call__
    protocol_options, metadata, request)
  File "/Users/ben/test2/lib/python2.7/site-packages/grpc/framework/crust/_calls.py", line 109, in blocking_unary_unary
    return next(rendezvous)
  File "/Users/test2/lib/python2.7/site-packages/grpc/framework/crust/_control.py", line 415, in next
    raise self._termination.abortion_error
grpc.framework.interfaces.face.face.NetworkError: NetworkError(code=StatusCode.UNAVAILABLE, details="")
E0531 11:26:12.321777000 123145306509312 tcp_client_posix.c:191] failed to connect to 'ipv4:127.0.0.1:50051': timeout occurred

after the exception is raised, the log switch from tcp_client_posix.c:173 to tcp_client_posix.c:191

I have the same problem. I found that self._down() can’t exit the threads pool.

@kpayson64 Yes, sure, I think fail fast is better than wait for n times then fail. Please fix this bug ASAP, Python client is unuseable.

hi

master has fixed. Install from source should work. this is bad.

BUT install from source with tag 0.14.1 still include this bug.

We downgrade to 0.13.1.

Hope 0.14.1 will fix this. (When release to pip)

Please TEST it.

Thanks

GuoJing

http://guojing.me http://guojing.me/ The one who love coding. –> ret <– echo null rm -rf /

在 2016年6月2日,下午1:26,Kailash Sethuraman notifications@github.com 写道:

Thanks for flagging this, will take a look asap. /cc @soltanmm https://github.com/soltanmm — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grpc/grpc/issues/6730#issuecomment-223198543, or mute the thread https://github.com/notifications/unsubscribe/AAWKQa0akOTQyejCoXq23kxOnTSHh4bGks5qHmlzgaJpZM4IqJvJ.