etcd: gRPC v1.7.3 transport "panic: send on closed channel" on *serverHandlerTransport

Hi

Similar to #8595, we still get a panic with v3.2.10. Here is the log:

Nov 22 09:44:38 p1-linux-mlsu007 etcd[124081]: compacted raft log at 6678643315
Nov 22 09:44:46 p1-linux-mlsu007 etcd[124081]: apply entries took too long [3.216781052s for 2 entries]
Nov 22 09:44:46 p1-linux-mlsu007 etcd[124081]: avoid queries with large range/delete range!
Nov 22 09:44:47 p1-linux-mlsu007 etcd[124081]: purged file /appl/etcd/data/p1-linux-mlsu007/member/snap/0000000000003b38-000000018e1365ae.snap successfully
Nov 22 09:44:47 p1-linux-mlsu007 etcd[124081]: purged file /appl/etcd/data/p1-linux-mlsu007/member/snap/0000000000003b38-000000018e137937.snap successfully
Nov 22 09:44:47 p1-linux-mlsu007 etcd[124081]: purged file /appl/etcd/data/p1-linux-mlsu007/member/snap/0000000000003b38-000000018e138cc2.snap successfully
Nov 22 09:44:47 p1-linux-mlsu007 etcd[124081]: purged file /appl/etcd/data/p1-linux-mlsu007/member/snap/0000000000003b38-000000018e13a04f.snap successfully
Nov 22 09:44:47 p1-linux-mlsu007 etcd[124081]: purged file /appl/etcd/data/p1-linux-mlsu007/member/snap/0000000000003b38-000000018e13b3da.snap successfully
Nov 22 09:44:48 p1-linux-mlsu007 etcd[124081]: failed to send out heartbeat on time (exceeded the 150ms timeout for 56.359018ms)
Nov 22 09:44:48 p1-linux-mlsu007 etcd[124081]: server is likely overloaded
Nov 22 09:44:48 p1-linux-mlsu007 etcd[124081]: failed to send out heartbeat on time (exceeded the 150ms timeout for 56.399292ms)
Nov 22 09:44:48 p1-linux-mlsu007 etcd[124081]: server is likely overloaded
Nov 22 09:44:59 p1-linux-mlsu007 etcd[124081]: start to snapshot (applied: 6678653654, lastsnap: 6678648315)
Nov 22 09:44:59 p1-linux-mlsu007 etcd[124081]: panic: send on closed channel
Nov 22 09:44:59 p1-linux-mlsu007 systemd[1]: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 22 09:44:59 p1-linux-mlsu007 systemd[1]: Unit etcd.service entered failed state.
Nov 22 09:44:59 p1-linux-mlsu007 systemd[1]: etcd.service failed.
Nov 22 09:45:09 p1-linux-mlsu007 systemd[1]: etcd.service holdoff time over, scheduling restart.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 26 (23 by maintainers)

Most upvoted comments

I have the same issue in 3.2.17+dfsg-1 (Ubuntu 18.04)… my entire cluster dies regularly 😦 It seems the fix in 3.2.11 did not resolve the issue?