micrometer: StatsD UDP publishing stops after failing to send lines

We have updated several applications to Spring Boot 2.1.8 and metric publishing using the StatsD meter registry has stopped. The actuator/metrics endpoint shows metrics being collected as expected. We publish to localhost on port 8125 at a 1m interval (all defaults).

The actuator/configprops endpoint shows the metrics publication is enabled for statsd.

management.metrics.export.statsd-org.springframework.boot.actuate.autoconfigure.metrics.export.statsd.StatsdProperties: {
  prefix: "management.metrics.export.statsd",
  properties: {
    flavor: "DATADOG",
    publishUnchangedMeters: true,
    port: 8125,
    pollingFrequency: {
    units: [
      "SECONDS",
      "NANOS"
    ]
    },
    host: "localhost",
    maxPacketLength: 1400,
    enabled: true
  }
}

Result summary of Boot and Micrometer versions

Spring Boot Version Micrometer Version Result
2.1.8 1.1.6 ❌ metrics not published to statsd
2.1.8 1.1.5 ✅ working
2.1.7 1.1.4 ✅ working
2.1.7 1.1.6 ❌ metrics not published to statsd

When running Micrometer 1.1.5 and 1.1.4, the Spring Boot java process shows an established connection on an ephemeral port sending to port 8125.

netstat -uv | grep 8125
udp        0      0 localhost:53613             localhost:8125              ESTABLISHED 
sudo netstat -lanp | grep 53613
udp        0      0 ::ffff:127.0.0.1:53613      ::ffff:127.0.0.1:8125       ESTABLISHED 3295/java     

UDP Traffic being received with a capture of the UDP traffic using tcpdump.

sudo tcpdump -i lo -n udp port 8125
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
19:17:34.053140 IP 127.0.0.1.53613 > 127.0.0.1.8125: UDP, length 1260
19:17:34.134113 IP 127.0.0.1.53613 > 127.0.0.1.8125: UDP, length 1261
19:17:34.199557 IP 127.0.0.1.53613 > 127.0.0.1.8125: UDP, length 1261

Utilizing the same command when running the same application with 1.1.6, the java process does not show any ephemeral port established and bound to 8125.

After changing the logging level to DEBUG for io.micrometer.shaded.reactor, I could see many onNextDropped debug messages output for each meter with its associated line data:

2019-09-17 21:51:12,895 DEBUG 3319 (http-nio-5000-exec-2) [Operators]  onNextDropped: method.timed:2.426269|ms|#class:LabelJdbcStore,exception:none,method:findLabeledPids

Since Operators uses multicast, that led me to look at ifconfig and finding that multicast is not turned on for the loopback interface for our AWS EC2 instances, but it is for eth0.

ifconfig - No Multicast on loopback

ifconfig
eth0      Link encap:Ethernet  HWaddr 0A:F3:B8:FE:4E:2E  
          inet addr:10.36.73.178  Bcast:10.36.79.255  Mask:255.255.240.0
          inet6 addr: fe80::8f3:b8ff:fefe:4e2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:5908501 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3941816 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:6430044133 (5.9 GiB)  TX bytes:934258311 (890.9 MiB)
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2918885 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2918885 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5714060914 (5.3 GiB)  TX bytes:5714060914 (5.3 GiB)

I turned on multicast for the loopback and restarted the Spring Boot app and metrics are now being published.

Turn on multicast

sudo ifconfig lo multicast

ifconfig lo
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MULTICAST  MTU:65536  Metric:1
          RX packets:3058498 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3058498 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5988434965 (5.5 GiB)  TX bytes:5988434965 (5.5 GiB)

When the loopback interface is configured without multicast, metrics are published when using Micrometer 1.1.5. The eth0 interface has multicast turn on so I’m wondering if that interface was being selected in 1.1.5 and now it isn’t in 1.1.6. I haven’t been able to narrow down what change in 1.1.6 caused this to stop working.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 17 (8 by maintainers)

Commits related to this issue

Most upvoted comments

@shakuzen I have deployed one of our affected applications with the 1.1.7 release and it seems to have remedied the reported problem seen with 1.1.6. For us, it did not take days for the break to occur as suggested by @breun above, but was immediate upon deploy. Either way, I will watch it for a while to make sure it doesn’t stop working.

Thanks for leaving this open and continuing to look into the root cause in reactor-netty and netty.

If I were to guess, the upgrade from reactor-netty 0.8.1 to 0.8.11 (see #1574 and #1561) is a likely candidate for the difference in behavior here. In previous patches, we never upgraded the shaded version of reactor-netty we were using. I will have to look into this more to confirm the root cause and what to do about this, though.

Thank you for the thorough investigation @edwardsre and write-up of the issue.