undici: Multiple parallel long fetches with overridden, non-zero, timeouts fails all the time

Bug Description

When running multiple LONG running fetches in parallel, some of them seem to just die and hit a timeout that doesnt really seem relevant

name: HeadersTimeoutError
    message: Headers Timeout Error
    stack: >-
      HeadersTimeoutError: Headers Timeout Error
          at onParserTimeout (/src/archive/undici/lib/client.js:908:28)
          at callTimer (/src/archive/undici/node_modules/@sinonjs/fake-timer
  s/src/fake-timers-src.js:743:24)
          at doTickInner (/src/archive/undici/node_modules/@sinonjs/fake-tim
  ers/src/fake-timers-src.js:1311:29)
          at doTick (/src/archive/undici/node_modules/@sinonjs/fake-timers/s
  rc/fake-timers-src.js:1392:20)
          at Object.tick (/src/archive/undici/node_modules/@sinonjs/fake-tim
  ers/src/fake-timers-src.js:1400:20)
          at Server.<anonymous>
  (/src/archive/undici/test/fetch/fetch-timeouts.js:32:11)
          at Server.emit (node:events:513:28)
          at parserOnIncoming (node:_http_server:1065:12)
          at HTTPParser.parserOnHeadersComplete (node:_http_common:117:17)
    code: UND_ERR_HEADERS_TIMEOUT

Reproducible By

Tweaking minutesPerRequest, numParallelRequests, and timeout here makes for some pretty confusing results. When timeout is 0, everything seems to work correctly. I suspect there are a few timers that are overlapping eachother or getting shared somehow.

Simply set timeout to 0 and see that the test passes, and set timeout to 30 minutes and see that it fails

"use strict";

const { test } = require("tap");

const { fetch, Agent } = require("../..");
const { createServer } = require("http");
const FakeTimers = require("@sinonjs/fake-timers");

// Tweaking request length can have an impact too potentially (Setting to 1 seems to pass, and setting it to 10 min fails on my machine)
const minutesPerRequest = 10;

// On my machine setting this to 0 passes, and setting it to something very high fails with a UND_ERR_HEADERS_TIMEOUT
const timeout = 1000 * 60 * (minutesPerRequest + 5);
// const timeout = 0;

// Tweaking num requests to either 1 or multiple has different impacts
const numParallelRequests = 5;

test("Fetch very long parallel requests, timeout overridden", (t) => {
  const msToDelay = 1000 * 60 * minutesPerRequest;

  t.setTimeout(undefined);
  t.plan(numParallelRequests);

  const clock = FakeTimers.install();
  t.teardown(clock.uninstall.bind(clock));

  const server = createServer((req, res) => {
    setTimeout(() => {
      res.end("hello");
    }, msToDelay);
    clock.tick(msToDelay + 1);
  });
  t.teardown(server.close.bind(server));

  server.listen(0, () => {
    for (let i = 0; i < numParallelRequests; i++) {
      fetch(`http://localhost:${server.address().port}`, {
        path: "/",
        method: "GET",
        dispatcher: new Agent({
          headersTimeout: timeout,
          connectTimeout: timeout,
          bodyTimeout: timeout,
        }),
      })
        .then((response) => response.text())
        .then((response) => {
          t.equal("hello", response);
        })
        .catch((err) => {
          // This should not happen, a timeout error should not occur
          t.error(err);
        });
    }

    clock.tick(msToDelay - 1);
  });
});

Expected Behavior

The fetches to succeed

Logs & Screenshots

Environment

Node v18.13.0

Additional context

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 22 (22 by maintainers)

Most upvoted comments

setting these in socket = tls.connect({}) doesnt work because we are upgrading/assigning the socket, but manually setting socket.setKeepAlive with them after connect() works, so I think this will still require a code change

keepAlive [<boolean>](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Boolean_type) If set to true, it enables keep-alive functionality on the socket immediately after the connection is established, similarly on what is done in [socket.setKeepAlive([enable][, initialDelay])](https://nodejs.org/api/net.html#socketsetkeepaliveenable-initialdelay). Default: false.
keepAliveInitialDelay [<number>](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type) If set to a positive number, it sets the initial delay before the first keepalive probe is sent on an idle socket.Default: 0.

@xconverge would you mind opening a PR?

It should be a new parameter: tcpKeepAliveInterval: number | null - Default: null. @xconverge you can go ahead and start a PR and we can discuss about the default value for this later.

WDYT @ronag? Should this be enabled by default? It might actually help with some ECONNRESETs.

My experience with these kind of issues tells me the fakeTimers problem is not really related.

What this looks like: a firewall is truncating the connection in a way that we do not recognize, generating an error that we ignore.