serving: TFS doesn't handle grpc request deadline/timeout properly

Bug Report

System information

  • Linux 4.4.121-k8s #1 SMP Sun Mar 11 19:39:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • I observe the same behaviour on all of 1.3, 1.7 and 1.8.

Describe the problem

Even though I set a 30s deadline/timeout on every gRPC request, TensorFlow Serving doesn’t respect it: it keeps queuing and running every request until it completes.

As a result, in my Go gRPC client I can see that all requests have already returned with panic: rpc error: code = DeadlineExceeded desc = context deadline exceeded, but the TFS model server keeps showing incoming requests for a long time, and the request duration (processing time) keeps increasing from the normal ~30s to 1m, 2m, 3m … until it has finished all requests.

What I expect is that the TFS model server ends the request and releases its resources immediately once the request has exceeded its deadline or the client has gone away.

Exact Steps to Reproduce

Use any gRPC client to send heavy requests to the TFS model server continuously, for example 1 request per second for 1 minute, set a short deadline/timeout on each request, and then observe whether the TFS model server respects the deadline/timeout.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 31 (8 by maintainers)

Most upvoted comments

Closing due to 14 days of inactivity - please do re-open if you manage to get a stack trace for this 😃

Hi Inshi,

We haven’t heard from you for 7+ days now. We will close this for now assuming you no longer need help on this issue. If you do need help, please feel free to reopen this issue.

Thanks

we need thread stacks for all threads of the modelserver to help debug background activity (after RPC is terminated/cancelled from the client side).

afaik there is no debian package for pstack and lsstack – so installing these will be non-trivial (assuming they actually work). so please do not use these tools.

i’d recommend installing the elfutils package (apt install elfutils) on the system where the modelserver is running, and then running /usr/bin/eu-stack -p <pid-of-modelserver> to dump the stacks of all threads.

ideally you want to collect stack traces (read: run eu-stack) after the RPC is done, but while the CPU usage is still high (on the server). this will help us see what (unnecessary) background activity is still happening.

Thanks Inshi. Just as an update, we are able to reproduce the time-out part. Folks are collecting the necessary stack traces and investigating further.