undici: Leak and very long hanging when fetch throws `ConnectTimeoutError(UND_ERR_CONNECT_TIMEOUT)`

Version

v21.0.0

Platform

Linux archhost 6.5.7-arch1-1 nodejs/node#1 SMP PREEMPT_DYNAMIC Tue, 10 Oct 2023 21:10:21 +0000 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

  1. pull https://github.com/nikelborm/fetch_timeout_poc
  2. enable wifi and connect to network with internet
  3. connect to wireguard vpn with sending all trafic through this vpn server
  4. disable wifi
  5. run npm start

index.js

"use strict";

const startedAt = Date.now();
console.log('started fetch at: ', new Date(startedAt));

try {
  await fetch('https://jsonplaceholder.typicode.com/todos/1');
} catch (error) {
  console.log(error);
}

const finishedAt = Date.now();
console.log('finished fetch at: ', new Date(finishedAt))

console.log(`fetch took ${finishedAt - startedAt} ms`)

console.log('timestamp when script should finish: ', Date.now())

package.json

{
  "name": "poc",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "start": "node index.js; node -e \"console.log('timestamp when script actually finished: ', Date.now())\""
  }
}

How often does it reproduce? Is there a required condition?

It reproduces every time with described network configuration. No other condition. Same behavior in node v20.7.0 and v18.18.0

What is the expected behavior? Why is that the expected behavior?

Fetch waits 10 seconds, throws error UND_ERR_CONNECT_TIMEOUT. Then script exits immediately after it completed. Difference between timestamp when script should finish and timestamp when script actually finished should be near zero.

What do you see instead?

Fetch is being executed as expected. It kills itself after about 10 seconds and then returns control back to the script. But the problem is that something in the background is still blocking node.js and does not allow process to exit. It just hangs waiting for something to be finished even if the script was completed.

started fetch at:  2023-10-21T00:37:09.239Z
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11372:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:///home/nikel/projects/alarm/index.js:7:3 {
  cause: _ConnectTimeoutError: Connect Timeout Error
      at onConnectTimeout (node:internal/deps/undici/undici:6616:28)
      at node:internal/deps/undici/undici:6574:50
      at Immediate._onImmediate (node:internal/deps/undici/undici:6605:13)
      at process.processImmediate (node:internal/timers:478:21) {
    code: 'UND_ERR_CONNECT_TIMEOUT'
  }
}
finished fetch at:  2023-10-21T00:37:19.391Z
fetch took 10152 ms
timestamp when script should finish:  1697848639391
timestamp when script actually finished:  1697848749482

After all code in the script been executed it takes 110 seconds (1697848749482 - 1697848639391 = 110091) to exit from process.

But enabling wifi back (restoring internet connection) during process’s hanging (after the moment fetch already failed and logged error) magically speeds up exit from process.

Additional information

Running completely offline

If we skip 2-3 steps and just run script offline we get other kind of error and expected behavior.

Following is being printed to console:

started fetch at:  2023-10-21T00:54:31.085Z
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11372:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:///home/nikel/projects/alarm/index.js:7:3 {
  cause: Error: getaddrinfo EAI_AGAIN jsonplaceholder.typicode.com
      at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:118:26) {
    errno: -3001,
    code: 'EAI_AGAIN',
    syscall: 'getaddrinfo',
    hostname: 'jsonplaceholder.typicode.com'
  }
}
finished fetch at:  2023-10-21T00:54:31.250Z
fetch took 165 ms
timestamp when script should finish:  1697849671250
timestamp when script actually finished:  1697849671310

Difference between timestamp when script should finish and timestamp when script actually finished is 60 ms what is fine.

Running online

If we skip 4 step and just run script online we get no errors and expected behavior.

Following is being printed to console:

started fetch at:  2023-10-21T00:58:19.347Z
finished fetch at:  2023-10-21T00:58:19.995Z
fetch took 648 ms
timestamp when script should finish:  1697849899995
timestamp when script actually finished:  1697849900162

Difference between timestamp when script should finish and timestamp when script actually finished is 167 ms what also seems fine.

Running using node-fetch with described network configuration

If we install node-fetch with npm i node-fetch and then import it in the beginning using import fetch from 'node-fetch'; it will replace original fetch and will give us different behavior:

started fetch at:  2023-10-21T01:48:05.984Z
FetchError: request to https://jsonplaceholder.typicode.com/todos/1 failed, reason: getaddrinfo EAI_AGAIN jsonplaceholder.typicode.com
    at ClientRequest.<anonymous> (file:///home/nikel/projects/alarm/node_modules/node-fetch/src/index.js:108:11)
    at ClientRequest.emit (node:events:517:28)
    at TLSSocket.socketErrorListener (node:_http_client:501:9)
    at TLSSocket.emit (node:events:517:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'EAI_AGAIN',
  code: 'EAI_AGAIN',
  erroredSysCall: 'getaddrinfo'
}
finished fetch at:  2023-10-21T01:50:06.143Z
fetch took 120159 ms
timestamp when script should finish:  1697853006143
timestamp when script actually finished:  1697853006236

When fetch from node-fetch fails, it logs last few lines to console and script exits immediately. Difference between timestamp when script should finish and timestamp when script actually finished is 93 ms what also seems fine.

The interesting thing is that fetch from node-fetch throws the same error as error thrown by original fetch when vpn disabled and notebook completely offline.

fetch from node-fetch returns control after 120 seconds. Original fetch returns control after 10 seconds and then hangs in the background 110 seconds. Looks very suspicious (120 == 110 + 10).

About this issue

  • Original URL
  • State: open
  • Created 8 months ago
  • Comments: 26 (24 by maintainers)

Most upvoted comments

downgrading node version with nvm solved for me