micrometer: StatsD UDP publishing stops after failing to send lines
We have updated several applications to Spring Boot 2.1.8, and metric publishing through the StatsD meter registry has stopped. The actuator/metrics endpoint shows metrics being collected as expected. We publish to localhost on port 8125 at a 1m interval (all defaults).
The actuator/configprops endpoint shows that StatsD metrics publication is enabled:
```
management.metrics.export.statsd-org.springframework.boot.actuate.autoconfigure.metrics.export.statsd.StatsdProperties: {
  prefix: "management.metrics.export.statsd",
  properties: {
    flavor: "DATADOG",
    publishUnchangedMeters: true,
    port: 8125,
    pollingFrequency: {
      units: [
        "SECONDS",
        "NANOS"
      ]
    },
    host: "localhost",
    maxPacketLength: 1400,
    enabled: true
  }
}
```
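For reference, the same settings could be written explicitly in application.properties (a sketch using Spring Boot 2.1.x property names; all values match the defaults shown above):

```properties
# Explicit StatsD export settings (these match the 2.1.x defaults)
management.metrics.export.statsd.enabled=true
management.metrics.export.statsd.flavor=datadog
management.metrics.export.statsd.host=localhost
management.metrics.export.statsd.port=8125
management.metrics.export.statsd.max-packet-length=1400
management.metrics.export.statsd.publish-unchanged-meters=true
```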
Result summary of Boot and Micrometer versions:

| Spring Boot Version | Micrometer Version | Result |
|---|---|---|
| 2.1.8 | 1.1.6 | ❌ metrics not published to statsd |
| 2.1.8 | 1.1.5 | ✅ working |
| 2.1.7 | 1.1.4 | ✅ working |
| 2.1.7 | 1.1.6 | ❌ metrics not published to statsd |
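Given those results, a temporary workaround is to pin Micrometer back to 1.1.5. With Maven and the Spring Boot parent, overriding the `micrometer.version` dependency-management property would look like this (a sketch; Gradle users can set the same property via `ext`):

```xml
<!-- Workaround sketch: pin Micrometer to the last known-good version -->
<properties>
  <micrometer.version>1.1.5</micrometer.version>
</properties>
```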
When running Micrometer 1.1.5 and 1.1.4, the Spring Boot java process shows an established connection on an ephemeral port sending to port 8125.
```
$ netstat -uv | grep 8125
udp        0      0 localhost:53613        localhost:8125         ESTABLISHED
$ sudo netstat -lanp | grep 53613
udp        0      0 ::ffff:127.0.0.1:53613 ::ffff:127.0.0.1:8125  ESTABLISHED 3295/java
```
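As an aside, ESTABLISHED on a UDP socket simply means the socket was connect()ed to a fixed peer address. A minimal Java sketch of such a connected UDP sender (the StatsD line payload is illustrative; this is not Micrometer's actual implementation):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

public class StatsdUdpSketch {
    public static void main(String[] args) throws IOException {
        // Connecting a UDP channel to a fixed peer is what makes netstat
        // report the ephemeral local port as ESTABLISHED.
        try (DatagramChannel channel = DatagramChannel.open()) {
            channel.connect(new InetSocketAddress("127.0.0.1", 8125));
            String line = "method.timed:2.426269|ms|#class:LabelJdbcStore";
            channel.write(ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8)));
            System.out.println("sent " + line.length() + " bytes from local port "
                    + ((InetSocketAddress) channel.getLocalAddress()).getPort());
        }
    }
}
```

Because UDP is connectionless, the write succeeds even with no listener on 8125; the "connection" only fixes the peer address for the socket.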
A tcpdump capture confirms the UDP traffic is being sent and received:
```
$ sudo tcpdump -i lo -n udp port 8125
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
19:17:34.053140 IP 127.0.0.1.53613 > 127.0.0.1.8125: UDP, length 1260
19:17:34.134113 IP 127.0.0.1.53613 > 127.0.0.1.8125: UDP, length 1261
19:17:34.199557 IP 127.0.0.1.53613 > 127.0.0.1.8125: UDP, length 1261
```
Running the same commands against the same application with Micrometer 1.1.6, the java process does not show any ephemeral port connected to 8125.
After changing the logging level to DEBUG for io.micrometer.shaded.reactor, I could see many onNextDropped debug messages, one per meter with its associated line data:

```
2019-09-17 21:51:12,895 DEBUG 3319 (http-nio-5000-exec-2) [Operators] onNextDropped: method.timed:2.426269|ms|#class:LabelJdbcStore,exception:none,method:findLabeledPids
```
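Enabling that shaded Reactor logger in a Spring Boot app is a one-line property (a sketch; the logger name is taken from the log output above):

```properties
logging.level.io.micrometer.shaded.reactor=DEBUG
```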
Since Operators uses multicast, that led me to look at ifconfig, where I found that multicast is not enabled on the loopback interface on our AWS EC2 instances, though it is on eth0.
ifconfig shows no multicast on the loopback:

```
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 0A:F3:B8:FE:4E:2E
          inet addr:10.36.73.178  Bcast:10.36.79.255  Mask:255.255.240.0
          inet6 addr: fe80::8f3:b8ff:fefe:4e2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:5908501 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3941816 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6430044133 (5.9 GiB)  TX bytes:934258311 (890.9 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2918885 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2918885 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5714060914 (5.3 GiB)  TX bytes:5714060914 (5.3 GiB)
```
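The same flag can also be checked from inside the JVM; a quick sketch using java.net.NetworkInterface (interface names such as lo and eth0 vary by host):

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

public class MulticastCheck {
    public static void main(String[] args) throws SocketException {
        // Print whether each interface reports the MULTICAST flag,
        // mirroring what ifconfig shows.
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            System.out.println(nif.getName() + " multicast=" + nif.supportsMulticast());
        }
    }
}
```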
I enabled multicast on the loopback interface, restarted the Spring Boot app, and metrics are now being published.
Turn on multicast:

```
$ sudo ifconfig lo multicast
$ ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MULTICAST  MTU:65536  Metric:1
          RX packets:3058498 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3058498 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5988434965 (5.5 GiB)  TX bytes:5988434965 (5.5 GiB)
```
When the loopback interface is configured without multicast, metrics are still published when using Micrometer 1.1.5. The eth0 interface has multicast turned on, so I wonder whether that interface was being selected in 1.1.5 and no longer is in 1.1.6. I haven't been able to narrow down which change in 1.1.6 caused this to stop working.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 6
- Comments: 17 (8 by maintainers)
Commits related to this issue
- Downgrade to reactor-netty 0.8.6 / netty 4.1.34 This is a workaround to #1591 until we understand the root cause and it is fixed. Resolves #1613 — committed to micrometer-metrics/micrometer by shakuzen 5 years ago
- Rewrite Statsd publishing implementation (#1782) Upgrades to the latest version of reactor-core and reactor-netty. Uses a `DirectProcessor` instead of a `UnicastProcessor` that can only be subscribed... — committed to micrometer-metrics/micrometer by shakuzen 4 years ago
@shakuzen I have deployed one of our affected applications with the 1.1.7 release and it seems to have remedied the reported problem seen with 1.1.6. For us, it did not take days for the break to occur as suggested by @breun above; it was immediate upon deploy. Either way, I will watch it for a while to make sure it doesn't stop working. Thanks for leaving this open and continuing to look into the root cause in reactor-netty and netty.
If I were to guess, the upgrade from reactor-netty 0.8.1 to 0.8.11 (see #1574 and #1561) is a likely candidate for the difference in behavior here. In previous patches, we never upgraded the shaded version of reactor-netty we were using. I will have to look into this more to confirm the root cause and what to do about this, though.
Thank you for the thorough investigation @edwardsre and write-up of the issue.