undici: `fetch` can make `worker_threads` stuck and process hang

Bug Description

When undici is run inside worker_threads it may prevent the worker from terminating and makes the whole process hang.

Reproducible By

This Node script runs undici in worker threads and does 4x requests there before exiting. If this does not reproduce on your fast environment, try increasing the requests and const ROUNDS.

"undici": "^5.21.0"
// Save as `undici.mjs` and run as `node uncidi.mjs`
import { isMainThread, parentPort, Worker } from "node:worker_threads";
import { fileURLToPath } from "node:url";
import { cpus } from "node:os";

const THREADS = cpus().length - 1;
const ROUNDS = 10;
const filename = fileURLToPath(import.meta.url);

if (isMainThread) {
  const stuckWorkers = [];
  const interval = setInterval(() => {
    if (stuckWorkers.length > 0) {
      console.log(`There are ${stuckWorkers.length} stuck workers`);
    }
  }, 1000);

  const timeout = setTimeout(() => {
    setTimeout(() => console.log("Still running"), 5_000); // This is never called so process.exit does something, but it's still unable to terminate worker running undici
    console.log("Forcing process.exit on main thread");
    process.exit();
  }, 30_000);

  const task = async () => {
    const worker = new Worker(filename, {});

    return new Promise((resolve) => {
      worker.on("message", (message) => {
        if (message !== "DONE") return;

        const timer = setTimeout(() => {
          console.log("Unable to terminate");
          stuckWorkers.push(worker);
        }, 10_000);
        worker.terminate().then(() => {
          clearTimeout(timer);
          resolve();
        });
      });
    });
  };

  const pool = Array(ROUNDS).fill(task);

  async function execute() {
    const task = pool.shift();

    if (task) {
      await task();
      return execute();
    }
  }

  await Promise.all(
    Array(THREADS)
      .fill(execute)
      .map((task) => task())
  );
  console.log("All done!");
  clearTimeout(timeout);
  clearInterval(interval);
} else {
  const { fetch } = await import("undici");

  const request = async (pathWithParams) => {
    const url = new URL(pathWithParams, "https://api.spacexdata.com");

    const timer = setTimeout(
      () => console.log("Request taking more than 10s"), // This is never logged so requests are not stuck
      10_000
    );
    const output = await fetch(url.href, {
      method: "GET",
      headers: { "Content-Type": "application/json" },
    });
    clearTimeout(timer);
    return output;
  };
  await Promise.all([
    request("/v4/starlink").then((r) => r.json()),
    request("/v4/starlink/5eed770f096e59000698560d").then((r) => r.json()),
    request("/v4/starlink").then((r) => r.json()),
    request("/v4/starlink/5eed770f096e59000698560d").then((r) => r.json()),
  ]);

  parentPort.postMessage("DONE");
}

Expected Behavior

Package should be safe to run inside worker_threads. If it’s not safe, it should detect such environment and throw an error.

Logs & Screenshots

ari ~/repro  $ node undici.mjs 
Unable to terminate
There are 1 stuck workers
...
There are 1 stuck workers
Forcing process.exit on main thread
^Z
[1]+  Stopped                 node undici.mjs

ari ~/repro  $ pgrep node
24498

ari ~/repro  $ kill  24498
ari ~/repro  $ pgrep node
24498
# ^^ Ok this one is really stuck

ari ~/repro  $ kill  -9 24498
[1]+  Killed: 9               node undici.mjs

ari ~/repro  $ pgrep node
# no output, process killed

Environment

MacOS 13.2.1 Node 18.14.0

Additional context

https://github.com/vitest-dev/vitest/issues/3077

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 13
  • Comments: 30 (18 by maintainers)

Commits related to this issue

Most upvoted comments

Now removing process.exit fixes it:

import { request } from "undici";

request("https://media.tenor.com/qy_WcGdRzfgAAAAC/xluna-high-five.gif").then(res => res.body.arrayBuffer()).then(buf => Buffer.from(buf))
  .then(() => ({
    buffer: Buffer.alloc(10),
    fileExtension: "bin"
  }))
  .then(() => {
    //// comment out process.exit and the worker would not exit anymore
    // process.exit();
  });

I think this might be a bug in Node.js after all. @addaleax, have you got an idea why process.exit() in a worker might cause the worker to freeze?

Even worse, adding NODE_DEBUG=worker make the worker actually terminate correctly.

Hi @AriPerkkio, Is this fixed in the latest node (v21.7.1)? A Webstream fix patch (nodejs/node#51526) has been added. It seems to have fixed this issue.

Great work @tsctx and @jasnell (from https://github.com/nodejs/node/pull/51255, 👋)!

This issue seems to be fixed. I ran these tests using undici@6.2.0 and following Node versions:

  • v18.19.1 stuck on first run
  • v21.7.1 10/10 runs without any worker hangs even with stress
  • v21.7.0 10/10 runs without any worker hangs
  • v21.6.2 10/10 runs without any worker hangs
  • v21.5.0 stuck on first run

Looks like the fix was orignally included in v21.6.2 in https://github.com/nodejs/node/pull/51255, but reworked for v21.7.1 in https://github.com/nodejs/node/pull/51526.

This also seems to fix two other node:worker_thread hangs that were not using Undici at all. I just now tested both of these with v21.7.1 and they no longer hang. With v18.19.1 the worker hang is easily reproduced:

Let’s close this issue as fix on Node’s side has landed and root cause was not in Undici.

That issue seems unrelated, undici does not allocate any FileHandle.

We ended up removing all use of fetch from our codebase and restricting use of fetch by overriding globalThis.fetch and the issue disappeared. 🤷

I wonder if this is the same of https://github.com/nodejs/node/issues/47228

[…] docs explicitly say that it will “[s]top all JavaScript execution in the worker thread as soon as possible”. There’s no guarantee that it’ll actually close the worker if there is a resource (seemingly a stream from the debugging above) still ref’d.

Yes, worker.terminate is likely working correctly. I would expect that once await fetch() has resolved, it would not leave any active resources in the background. By reaching the end of worker file I would expect all JavaScript to have been run. But fetch seems to have some side effect here.

Also when I replace worker_threads with child_process this issue does not happen. And when I use worker_threads and replace undici with node-fetch, this issue does not happen.

Do you notice this issue if you run tests sequentially?

The issue comes up even when only a single worker is used. But if worker fires fetch calls sequentially instead of parallel with Promise.all it does not come up. Also those sequential fetch calls seem to be much faster than parallel ones.

We run hundreds of fetch tests in workers w/o issue, but one at a time.

Here is failing test case: https://github.com/nodejs/undici/pull/2032

Yep. I can confirm this happens.