node-newrelic: Undici instrumentation breaks fastify throughput tracking

Description

We enabled undici instrumentation in our production and saw huge decrease in our service throughput, but all other KPIs were stable (which means the issue is the metric, the actual throughput didn’t drop)

We are using fastify with Node 16, our base NR configuration is

NEW_RELIC_FEATURE_FLAG_NEW_PROMISE_TRACKING=true NEW_RELIC_FEATURE_FLAG_UNRESOLVED_PROMISE_CLEANUP=false

We enabled undici instrumentation (newrelic SDK v8.5.1) using these flags:

NEW_RELIC_FEATURE_FLAG_UNDICI_INSTRUMENTATION=true NEW_RELIC_FEATURE_FLAG_UNDICI_ASYNC_TRACKING=false

this caused the first drop u see on the screenshot.

image

our next try was to set undici_asyn_tracking to “true”

NEW_RELIC_FEATURE_FLAG_UNDICI_INSTRUMENTATION=true NEW_RELIC_FEATURE_FLAG_UNDICI_ASYNC_TRACKING=true

This change increased the throughput for a while after app restart, but then the throughput dropped again (this is the second drop on the screenshot)

image

after disabling undici instrumentation, throughput got back to normal.

Expected Behavior

I don’t see how an http client instrumentation(undici) in my service(fastify), can cause the throughput metric to get broken (as I mentioned the real throughput stayed the same, its just the metrics report that got broken)

Your Environment

Fastify v3.22.1 Node 16.11 undici 4.7.1

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

@artur-ma Just wanted to give you a heads up that we’re considering work related to this in the upcoming roadmap. We don’t have a definitive timeline yet. I’ll update when we do. Thanks!

@artur-ma I tried saving references to the channels and ran our test and it is still an issue. However, that’s not to say your issue has not been resolved. The repro case we created is very brute force. So I’m going to make this suggested change and perhaps you could retest, or could you apply the patch below and test yourself without us releasing a version of the agent?

commit c43cbf2bd168e6901124bc6bcc850c7ffd1691ca
Author: Bob Evans <robert.evans25@gmail.com>
Date:   Wed Mar 30 12:09:24 2022 -0400

    updated diag channel subscriptions to be hard references

diff --git a/lib/instrumentation/undici.js b/lib/instrumentation/undici.js
index 17eafdc6..d1b2a8b2 100644
--- a/lib/instrumentation/undici.js
+++ b/lib/instrumentation/undici.js
@@ -69,7 +69,8 @@ module.exports = function addUndiciChannels(agent, undici, modName, shim) {
    * @param {object} params
    * @param {object} params.request undici request object
    */
-  diagnosticsChannel.channel('undici:request:create').subscribe(({ request }) => {
+  const requestCreate = diagnosticsChannel.channel('undici:request:create')
+  requestCreate.subscribe(({ request }) => {
     const parent = getParentSegment()
     request[SYMBOLS.PARENT_SEGMENT] = parent
     if (!parent || (parent && parent.opaque)) {
@@ -113,7 +114,8 @@ module.exports = function addUndiciChannels(agent, undici, modName, shim) {
    * @param {object} params.request undicie request object
    * @param {TLSSocket | net.Socket} socket active socket connection
    */
-  diagnosticsChannel.channel('undici:client:sendHeaders').subscribe(({ request, socket }) => {
+  const sendHeaders = diagnosticsChannel.channel('undici:client:sendHeaders')
+  sendHeaders.subscribe(({ request, socket }) => {
     const parentSegment = request[SYMBOLS.PARENT_SEGMENT]
     if (!parentSegment || (parentSegment && parentSegment.opaque)) {
       return
@@ -156,7 +158,8 @@ module.exports = function addUndiciChannels(agent, undici, modName, shim) {
    * @param {object} params.request undici request object
    * @param {object} params.response { statusCode, headers, statusText }
    */
-  diagnosticsChannel.channel('undici:request:headers').subscribe(({ request, response }) => {
+  const requestHeaders = diagnosticsChannel.channel('undici:request:headers')
+  requestHeaders.subscribe(({ request, response }) => {
     const activeSegment = request[SYMBOLS.SEGMENT]
     if (!activeSegment) {
       return
@@ -187,7 +190,8 @@ module.exports = function addUndiciChannels(agent, undici, modName, shim) {
    *
    * @param {object} params.request undici request object
    */
-  diagnosticsChannel.channel('undici:request:trailers').subscribe(({ request }) => {
+  const requestTrailers = diagnosticsChannel.channel('undici:request:trailers')
+  requestTrailers.subscribe(({ request }) => {
     endAndRestoreSegment(request)
   })
 
@@ -198,7 +202,8 @@ module.exports = function addUndiciChannels(agent, undici, modName, shim) {
    *
    * Note: This event occurs before the error handler so we will always log it for now.
    */
-  diagnosticsChannel.channel('undici:request:error').subscribe(({ request, error }) => {
+  const requestError = diagnosticsChannel.channel('undici:request:error')
+  requestError.subscribe(({ request, error }) => {
     endAndRestoreSegment(request, error)
   })

Commit Sha

I’ve been able to reproduce this a couple ways but only one way consistently: by modifying an endpoint to hang for a period of time and then allowing traffic to resume (in this case by exiting but may be other ways) with other endpoints being hit. It is quite contrived but at least demonstrates a problem. I have not toggled all the potential variable bits to confirm the exact contributing factors.

A standalone reproduction (of at least a similar case to what is documented here) can be found here: https://github.com/michaelgoin/undici-fastify-repro. The default configuration which doesn’t all lump everything under a single transaction may not see a drop in traffic in the UI (in fast may increase due to quick failures of what doesn’t get lumped together) but the logs will show the state conflation.

Nice find @artur-ma. I’m going to see if it addresses the issue you raised here 🤞🏻

Maybe this is somehow related: https://github.com/nodejs/node/issues/42170 even if not, it should be changed for now (till it will be fixed in Node) to avoid garbage collection of subscribers, u have assign the channel to a variable

here for example https://github.com/newrelic/node-newrelic/blob/main/lib/instrumentation/undici.js#L72 it should be

const createChannel = diagnosticsChannel.channel('undici:request:create');
createChannel.subscribe(({ request });

Ok @artur-ma. I’ll try to reproduce. If I can’t, I’ll probably have you go through our technical support to get the right info.