fluentd: sometimes out_forward heartbeat(tcp) failed with fluentd>0.14.9

sometimes out_forward heartbeat(tcp) failed

when sender fluentd: 0.14.11 and 0.14.13 aggregator fluentd: 0.14.11 and 0.14.13

This problem does not have occurred in sender v0.14.8.

fluentd.log

2017-03-01 06:49:37 +0000 [warn]: recovered forwarding server '' host="log-aggregator.local" port=10224
2017-03-01 06:49:37 +0000 [warn]: recovered forwarding server '' host="log-aggregator.local" port=10224
2017-03-01 06:49:45 +0000 [warn]: retry succeeded. chunk_id="549a573de08a4928f31d10f236155293"
2017-03-01 06:49:51 +0000 [warn]: retry succeeded. chunk_id="549a579f25b7d32e5db46d5b6ab57ce5"
2017-03-01 06:54:18 +0000 [warn]: detached forwarding server '' host="log-aggregator.local" port=10224 hard_timeout=true
2017-03-01 06:54:18 +0000 [warn]: buffer flush took longer time than slow_flush_log_threshold: elapsed_time=60.00606061401777 slow_flush_log_threshold=20.0 plugin_id="object:2b093717af00"
2017-03-01 06:55:18 +0000 [warn]: failed to flush the buffer. retry_time=0 next_retry_seconds=2017-03-01 06:55:19 +0000 chunk="549a5c80fadbc463279c0c9408e0c210" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2017-03-01 06:56:18 +0000 [warn]: suppressed same stacktrace
2017-03-01 06:56:18 +0000 [warn]: failed to flush the buffer. retry_time=0 next_retry_seconds=2017-03-01 06:55:19 +0000 chunk="549a5c80fadbc463279c0c9408e0c210" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2017-03-01 06:56:18 +0000 [warn]: suppressed same stacktrace
2017-03-01 06:57:18 +0000 [warn]: detached forwarding server '' host="log-aggregator.local" port=10224 hard_timeout=true
2017-03-01 07:00:18 +0000 [warn]: failed to flush the buffer. retry_time=1 next_retry_seconds=2017-03-01 06:56:19 +0000 chunk="549a5c80fadbc463279c0c9408e0c210" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2017-03-01 07:01:18 +0000 [warn]: suppressed same stacktrace
2017-03-01 07:07:18 +0000 [warn]: failed to flush the buffer. retry_time=2 next_retry_seconds=2017-03-01 07:03:20 +0000 chunk="549a5c80fadbc463279c0c9408e0c210" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2017-03-01 07:07:18 +0000 [warn]: suppressed same stacktrace
2017-03-01 07:08:18 +0000 [warn]: buffer flush took longer time than slow_flush_log_threshold: elapsed_time=780.0749890550505 slow_flush_log_threshold=20.0 plugin_id="object:2b093717af00"
2017-03-01 07:08:18 +0000 [warn]: failed to flush the buffer. retry_time=3 next_retry_seconds=2017-03-01 07:07:22 +0000 chunk="549a5c80fadbc463279c0c9408e0c210" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2017-03-01 07:08:18 +0000 [warn]: suppressed same stacktrace

tcpdump in sender-server

sender-server$ tcpdump -s 0 -nn port 10224
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
07:05:31.338389 IP 172.17.0.3.36562 > 10.80.1.45.10224: Flags [F.], seq 2513917648, ack 2404494309, win 229, options [nop,nop,TS val 136819588 ecr 32246235], length 0
07:05:44.186393 IP 172.17.0.3.36562 > 10.80.1.45.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136822800 ecr 32246235], length 0
07:06:09.850392 IP 172.17.0.3.36562 > 10.80.1.45.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136829216 ecr 32246235], length 0
07:06:18.528050 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 230348335, ack 1336453961, win 229, options [nop,nop,TS val 136831385 ecr 32070779], length 0
07:06:18.730397 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136831436 ecr 32070779], length 0
07:06:18.934388 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136831487 ecr 32070779], length 0
07:06:19.342383 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136831589 ecr 32070779], length 0
07:06:20.158387 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136831793 ecr 32070779], length 0
07:06:21.794381 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136832202 ecr 32070779], length 0
07:06:25.066536 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136833020 ecr 32070779], length 0
07:06:31.610412 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136834656 ecr 32070779], length 0
07:06:44.698389 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136837928 ecr 32070779], length 0
07:07:10.906398 IP 172.17.0.3.48160 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136844480 ecr 32070779], length 0
07:07:18.527420 IP 10.80.19.80.54804 > 10.80.31.242.10224: Flags [S], seq 1102294848, win 29200, options [mss 1460,sackOK,TS val 136846385 ecr 0,nop,wscale 7], length 0
07:07:18.527641 IP 10.80.31.242.10224 > 10.80.19.80.54804: Flags [S.], seq 2300741022, ack 1102294849, win 26847, options [mss 8961,sackOK,TS val 32290337 ecr 136846385,nop,wscale 8], length 0
07:07:18.527676 IP 10.80.19.80.54804 > 10.80.31.242.10224: Flags [.], ack 1, win 229, options [nop,nop,TS val 136846385 ecr 32290337], length 0
07:07:18.534713 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 1614423095, ack 1935553673, win 229, options [nop,nop,TS val 136846387 ecr 32071528], length 0
07:07:18.738405 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136846438 ecr 32071528], length 0
07:07:18.942403 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136846489 ecr 32071528], length 0
07:07:19.350413 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136846591 ecr 32071528], length 0
07:07:20.166387 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136846795 ecr 32071528], length 0
07:07:21.802406 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136847204 ecr 32071528], length 0
07:07:25.074384 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136848022 ecr 32071528], length 0
07:07:31.610408 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136849656 ecr 32071528], length 0
07:07:44.698388 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136852928 ecr 32071528], length 0
07:08:10.874391 IP 172.17.0.3.48310 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136859472 ecr 32071528], length 0
07:08:18.293572 IP 10.80.31.242.10224 > 10.80.19.80.54804: Flags [F.], seq 1, ack 1, win 105, options [nop,nop,TS val 32305278 ecr 136846385], length 0
07:08:18.294385 IP 10.80.19.80.54804 > 10.80.31.242.10224: Flags [.], ack 2, win 229, options [nop,nop,TS val 136861327 ecr 32305278], length 0
07:08:18.535001 IP 10.80.19.80.54804 > 10.80.31.242.10224: Flags [F.], seq 1, ack 2, win 229, options [nop,nop,TS val 136861387 ecr 32305278], length 0
07:08:18.535208 IP 10.80.31.242.10224 > 10.80.19.80.54804: Flags [.], ack 2, win 105, options [nop,nop,TS val 32305339 ecr 136861387], length 0
07:08:18.539957 IP 10.80.19.80.50448 > 10.80.1.45.10224: Flags [S], seq 2820230188, win 29200, options [mss 1460,sackOK,TS val 136861388 ecr 0,nop,wscale 7], length 0
07:08:18.542520 IP 10.80.1.45.10224 > 10.80.19.80.50448: Flags [S.], seq 2297201150, ack 2820230189, win 26847, options [mss 8961,sackOK,TS val 32306338 ecr 136861388,nop,wscale 8], length 0
07:08:18.542546 IP 10.80.19.80.50448 > 10.80.1.45.10224: Flags [.], ack 1, win 229, options [nop,nop,TS val 136861389 ecr 32306338], length 0
07:08:18.542739 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 2020798759, ack 1505973842, win 229, options [nop,nop,TS val 136861389 ecr 32072279], length 0
07:08:18.746408 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136861440 ecr 32072279], length 0
07:08:18.950400 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136861491 ecr 32072279], length 0
07:08:19.362390 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136861594 ecr 32072279], length 0
07:08:20.178382 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136861798 ecr 32072279], length 0
07:08:21.814400 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136862207 ecr 32072279], length 0
07:08:25.082385 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136863024 ecr 32072279], length 0
07:08:31.630389 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136864661 ecr 32072279], length 0
07:08:44.730388 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136867936 ecr 32072279], length 0
07:09:10.906437 IP 172.17.0.3.48410 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136874480 ecr 32072279], length 0
07:09:18.130913 IP 10.80.1.45.10224 > 10.80.19.80.50448: Flags [F.], seq 1, ack 1, win 105, options [nop,nop,TS val 32321235 ecr 136861389], length 0
07:09:18.134411 IP 10.80.19.80.50448 > 10.80.1.45.10224: Flags [.], ack 2, win 229, options [nop,nop,TS val 136876287 ecr 32321235], length 0
07:09:18.567265 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 1740187083, ack 3998838662, win 229, options [nop,nop,TS val 136876395 ecr 32074529], length 0
07:09:18.770410 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136876446 ecr 32074529], length 0
07:09:18.974390 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136876497 ecr 32074529], length 0
07:09:19.382381 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136876599 ecr 32074529], length 0
07:09:20.198396 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136876803 ecr 32074529], length 0
07:09:21.834405 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136877212 ecr 32074529], length 0
07:09:25.106399 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136878030 ecr 32074529], length 0
07:09:31.642416 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136879664 ecr 32074529], length 0
07:09:44.730373 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136882936 ecr 32074529], length 0
07:10:10.874392 IP 172.17.0.3.48670 > 10.80.31.242.10224: Flags [F.], seq 0, ack 1, win 229, options [nop,nop,TS val 136889472 ecr 32074529], length 0

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 16 (7 by maintainers)

Commits related to this issue

Most upvoted comments

Thanks for the report. So v0.14.10 changes cause this problem. I will check socket helper and related code.

I’m not sure the expected sequence on this situation, client side close, but Problem2 sequence also happens on other situation. For example, python HTTP server and ruby TCP client also has same sequence.

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

$ ruby -rsocket -e 'sock = TCPSocket.open("127.0.0.1", 8000); sock.close'

$ sudo tcpdump -i lo port 8000
09:09:03.904187 IP localhost.57868 > localhost.8000: Flags [S], seq 4262166317, win 43690, options [mss 65495,sackOK,TS val 6988912 ecr 0,nop,wscale 7], length 0
09:09:03.904202 IP localhost.8000 > localhost.57868: Flags [S.], seq 2989685197, ack 4262166318, win 43690, options [mss 65495,sackOK,TS val 6988912 ecr 6988912,nop,wscale 7], length 0
09:09:03.904214 IP localhost.57868 > localhost.8000: Flags [.], ack 1, win 342, options [nop,nop,TS val 6988912 ecr 6988912], length 0
09:09:03.904238 IP localhost.57868 > localhost.8000: Flags [F.], seq 1, ack 1, win 342, options [nop,nop,TS val 6988912 ecr 6988912], length 0
09:09:03.904445 IP localhost.8000 > localhost.57868: Flags [F.], seq 1, ack 2, win 342, options [nop,nop,TS val 6988912 ecr 6988912], length 0
09:09:03.904455 IP localhost.57868 > localhost.8000: Flags [.], ack 2, win 342, options [nop,nop,TS val 6988912 ecr 6988912], length 0

So with this situation, first FIN’s ACK is skipped on Linux? Mac has different sequence

01:13:24.858278 IP localhost.59517 > localhost.irdmi: Flags [S], seq 3075251422, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 518559330 ecr 0,sackOK,eol], length 0
01:13:24.858392 IP localhost.irdmi > localhost.59517: Flags [S.], seq 1135067205, ack 3075251423, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 518559330 ecr 518559330,sackOK,eol], length 0
01:13:24.858406 IP localhost.59517 > localhost.irdmi: Flags [.], ack 1, win 12759, options [nop,nop,TS val 518559330 ecr 518559330], length 0
01:13:24.858416 IP localhost.irdmi > localhost.59517: Flags [.], ack 1, win 12759, options [nop,nop,TS val 518559330 ecr 518559330], length 0
01:13:24.858439 IP localhost.59517 > localhost.irdmi: Flags [F.], seq 1, ack 1, win 12759, options [nop,nop,TS val 518559330 ecr 518559330], length 0
01:13:24.858453 IP localhost.irdmi > localhost.59517: Flags [.], ack 2, win 12759, options [nop,nop,TS val 518559330 ecr 518559330], length 0
01:13:24.858457 IP localhost.59517 > localhost.irdmi: Flags [.], ack 1, win 12759, options [nop,nop,TS val 518559330 ecr 518559330], length 0
01:13:24.858666 IP localhost.irdmi > localhost.59517: Flags [F.], seq 1, ack 2, win 12759, options [nop,nop,TS val 518559330 ecr 518559330], length 0
01:13:24.859027 IP localhost.59517 > localhost.irdmi: Flags [.], ack 2, win 12759, options [nop,nop,TS val 518559330 ecr 518559330], length 0

@bungoume Sorry for the late. https://github.com/fluent/fluentd/pull/1581 I wrote a patch. Could you test this?