undici: `fetch` can make `worker_threads` stuck and process hang
Bug Description
When undici
is run inside worker_threads
it may prevent the worker from terminating and makes the whole process hang.
Reproducible By
This Node script runs undici
in worker threads and does 4x requests there before exiting. If this does not reproduce on your fast environment, try increasing the requests and const ROUNDS
.
"undici": "^5.21.0"
// Save as `undici.mjs` and run as `node uncidi.mjs`
import { isMainThread, parentPort, Worker } from "node:worker_threads";
import { fileURLToPath } from "node:url";
import { cpus } from "node:os";
const THREADS = cpus().length - 1;
const ROUNDS = 10;
const filename = fileURLToPath(import.meta.url);
if (isMainThread) {
const stuckWorkers = [];
const interval = setInterval(() => {
if (stuckWorkers.length > 0) {
console.log(`There are ${stuckWorkers.length} stuck workers`);
}
}, 1000);
const timeout = setTimeout(() => {
setTimeout(() => console.log("Still running"), 5_000); // This is never called so process.exit does something, but it's still unable to terminate worker running undici
console.log("Forcing process.exit on main thread");
process.exit();
}, 30_000);
const task = async () => {
const worker = new Worker(filename, {});
return new Promise((resolve) => {
worker.on("message", (message) => {
if (message !== "DONE") return;
const timer = setTimeout(() => {
console.log("Unable to terminate");
stuckWorkers.push(worker);
}, 10_000);
worker.terminate().then(() => {
clearTimeout(timer);
resolve();
});
});
});
};
const pool = Array(ROUNDS).fill(task);
async function execute() {
const task = pool.shift();
if (task) {
await task();
return execute();
}
}
await Promise.all(
Array(THREADS)
.fill(execute)
.map((task) => task())
);
console.log("All done!");
clearTimeout(timeout);
clearInterval(interval);
} else {
const { fetch } = await import("undici");
const request = async (pathWithParams) => {
const url = new URL(pathWithParams, "https://api.spacexdata.com");
const timer = setTimeout(
() => console.log("Request taking more than 10s"), // This is never logged so requests are not stuck
10_000
);
const output = await fetch(url.href, {
method: "GET",
headers: { "Content-Type": "application/json" },
});
clearTimeout(timer);
return output;
};
await Promise.all([
request("/v4/starlink").then((r) => r.json()),
request("/v4/starlink/5eed770f096e59000698560d").then((r) => r.json()),
request("/v4/starlink").then((r) => r.json()),
request("/v4/starlink/5eed770f096e59000698560d").then((r) => r.json()),
]);
parentPort.postMessage("DONE");
}
Expected Behavior
Package should be safe to run inside worker_threads
. If it’s not safe, it should detect such environment and throw an error.
Logs & Screenshots
ari ~/repro $ node undici.mjs
Unable to terminate
There are 1 stuck workers
...
There are 1 stuck workers
Forcing process.exit on main thread
^Z
[1]+ Stopped node undici.mjs
ari ~/repro $ pgrep node
24498
ari ~/repro $ kill 24498
ari ~/repro $ pgrep node
24498
# ^^ Ok this one is really stuck
ari ~/repro $ kill -9 24498
[1]+ Killed: 9 node undici.mjs
ari ~/repro $ pgrep node
# no output, process killed
Environment
MacOS 13.2.1 Node 18.14.0
Additional context
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 13
- Comments: 30 (18 by maintainers)
Commits related to this issue
- fix: only ref socket when in use Refs: https://github.com/nodejs/undici/issues/2026#issuecomment-1502713911 — committed to nodejs/undici by ronag a year ago
- fix: only ref socket when in use Refs: https://github.com/nodejs/undici/issues/2026#issuecomment-1502713911 — committed to nodejs/undici by ronag a year ago
- fix: only ref socket when in use Refs: https://github.com/nodejs/undici/issues/2026#issuecomment-1502713911 — committed to nodejs/undici by ronag a year ago
- fix: only ref socket when in use Refs: https://github.com/nodejs/undici/issues/2026#issuecomment-1502713911 — committed to nodejs/undici by ronag a year ago
- fix: only ref socket when in use Refs: https://github.com/nodejs/undici/issues/2026#issuecomment-1502713911 — committed to nodejs/undici by ronag a year ago
- fix: only ref socket when in use Refs: https://github.com/nodejs/undici/issues/2026#issuecomment-1502713911 — committed to nodejs/undici by ronag a year ago
Now removing
process.exit
fixes it:I think this might be a bug in Node.js after all. @addaleax, have you got an idea why
process.exit()
in a worker might cause the worker to freeze?Even worse, adding
NODE_DEBUG=worker
make the worker actually terminate correctly.Hi @AriPerkkio, Is this fixed in the latest node (v21.7.1)? A Webstream fix patch (nodejs/node#51526) has been added. It seems to have fixed this issue.
Great work @tsctx and @jasnell (from https://github.com/nodejs/node/pull/51255, 👋)!
This issue seems to be fixed. I ran these tests using
undici@6.2.0
and following Node versions:v18.19.1
stuck on first runv21.7.1
10/10 runs without any worker hangs even withstress
v21.7.0
10/10 runs without any worker hangsv21.6.2
10/10 runs without any worker hangsv21.5.0
stuck on first runLooks like the fix was orignally included in
v21.6.2
in https://github.com/nodejs/node/pull/51255, but reworked forv21.7.1
in https://github.com/nodejs/node/pull/51526.This also seems to fix two other
node:worker_thread
hangs that were not using Undici at all. I just now tested both of these withv21.7.1
and they no longer hang. Withv18.19.1
the worker hang is easily reproduced:dompurify
: https://github.com/vitest-dev/vitest/issues/3077#issuecomment-1867306736.fastify
: https://github.com/vitest-dev/vitest/issues/4956#issuecomment-1908312966Let’s close this issue as fix on Node’s side has landed and root cause was not in Undici.
We ended up removing all use of
fetch
from our codebase and restricting use offetch
by overridingglobalThis.fetch
and the issue disappeared. 🤷I wonder if this is the same of https://github.com/nodejs/node/issues/47228
Yes,
worker.terminate
is likely working correctly. I would expect that onceawait fetch()
has resolved, it would not leave any active resources in the background. By reaching the end of worker file I would expect all JavaScript to have been run. Butfetch
seems to have some side effect here.Also when I replace
worker_threads
withchild_process
this issue does not happen. And when I useworker_threads
and replaceundici
withnode-fetch
, this issue does not happen.The issue comes up even when only a single worker is used. But if worker fires
fetch
calls sequentially instead of parallel withPromise.all
it does not come up. Also those sequentialfetch
calls seem to be much faster than parallel ones.Here is failing test case: https://github.com/nodejs/undici/pull/2032
Yep. I can confirm this happens.