got: Help debugging ECONNRESET problems

What would you like to discuss?

My tool uses got@9.6.0 and regularly gets errors like this one from npmjs:

{
    "name": "RequestError",
    "code": "ECONNRESET",
    "host": "registry.npmjs.org",
    "hostname": "registry.npmjs.org",
    "method": "GET",
    "path": "/@typescript-eslint%2Feslint-plugin",
    "protocol": "https:",
    "url": "https://registry.npmjs.org/@typescript-eslint%2Feslint-plugin",
    "gotOptions": {
      "path": "/@typescript-eslint%2Feslint-plugin",
      "protocol": "https:",
      "slashes": true,
      "auth": null,
      "host": "registry.npmjs.org",
      "port": null,
      "hostname": "registry.npmjs.org",
      "hash": null,
      "search": null,
      "query": null,
      "pathname": "/@typescript-eslint%2Feslint-plugin",
      "href": "https://registry.npmjs.org/@typescript-eslint%2Feslint-plugin",
      "headers": {
        "user-agent": "Renovate Bot (GitHub App 2740)",
        "authorization": "** redacted **",
        "cache-control": "no-cache",
        "accept": "application/json",
        "accept-encoding": "gzip, deflate"
      },
      "hooks": {
        "beforeError": [],
        "init": [],
        "beforeRequest": [],
        "beforeRedirect": [],
        "beforeRetry": [],
        "afterResponse": []
      },
      "retry": {
        "methods": {},
        "statusCodes": {},
        "errorCodes": {}
      },
      "decompress": true,
      "throwHttpErrors": true,
      "followRedirect": true,
      "stream": false,
      "form": false,
      "json": true,
      "cache": false,
      "useElectronNet": false,
      "method": "GET"
    },
    "message": "read ECONNRESET",
    "stack": "RequestError: read ECONNRESET\n    at ClientRequest.request.once.error (/home/ubuntu/renovateapp/node_modules/got/source/request-as-event-emitter.js:178:14)\n    at Object.onceWrapper (events.js:286:20)\n    at ClientRequest.emit (events.js:203:15)\n    at ClientRequest.EventEmitter.emit (domain.js:448:20)\n    at ClientRequest.origin.emit.args (/home/ubuntu/renovateapp/node_modules/@szmarczak/http-timer/source/index.js:37:11)\n    at TLSSocket.socketErrorListener (_http_client.js:392:9)\n    at TLSSocket.emit (events.js:198:13)\n    at TLSSocket.EventEmitter.emit (domain.js:448:20)\n    at emitErrorNT (internal/streams/destroy.js:91:8)\n    at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)"
  },

I’m wondering: does the retry part above indicate that no retrying is actually happening? Or could it just be the result of the object being stringified incompletely?
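One plausible explanation for the empty retry objects, assuming got 9 stores methods, statusCodes and errorCodes as Set instances once it has normalized the options: JSON.stringify serializes any Set as {} regardless of its contents. The sketch below uses plain JavaScript (no got involved) to show the effect. The values in the config and the setAwareReplacer helper are made up for illustration.

```javascript
// Hypothetical retry config shaped like the one in the dump, with Sets as values.
const retry = {
  methods: new Set(['GET', 'PUT']),
  statusCodes: new Set([408, 502, 503]),
  errorCodes: new Set(['ECONNRESET', 'ETIMEDOUT']),
};

// A Set has no enumerable own properties, so it serializes as an empty object.
const naive = JSON.stringify(retry);

// Illustrative replacer that converts Sets to arrays before serializing.
function setAwareReplacer(key, value) {
  return value instanceof Set ? [...value] : value;
}

const readable = JSON.stringify(retry, setAwareReplacer);

console.log(naive);
console.log(readable);
```

If that assumption holds, the empty objects in the log say nothing about whether retries are enabled; passing a replacer like this to the logger would show the real values.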

Additionally, is there any further debugging I can do on the err object once such errors are thrown?
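As a rough starting point for inspecting the error, here is a minimal sketch that pulls out the fields visible in the dump above (name, code, method, url, gotOptions.retry) into a compact, loggable summary. The summarizeGotError name is hypothetical, and the Set handling relies on the assumption that got keeps the retry settings as Sets.

```javascript
// Pick the fields most useful for diagnosing a socket-level failure.
// Field names come from the error dump above; missing ones end up undefined.
function summarizeGotError(err) {
  const options = err.gotOptions || {};
  const retry = options.retry || {};
  // Expand Sets so the retry settings are visible in JSON logs.
  const expand = (value) => (value instanceof Set ? [...value] : value);
  return {
    name: err.name,
    code: err.code,
    message: err.message,
    method: err.method,
    url: err.url,
    retryMethods: expand(retry.methods),
    retryErrorCodes: expand(retry.errorCodes),
    timestamp: new Date().toISOString(),
  };
}
```

Usage would be something like logging JSON.stringify(summarizeGotError(err)) in the catch block; the timestamp makes it easier to check whether failures cluster in time.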

Checklist

  • I have read the documentation.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 35 (15 by maintainers)

Most upvoted comments

Nope, we switched to v12 a long time ago.

FYI I was finally able to verify got v11 in production and the npm errors immediately stopped. Thank you for the assistance.

Definitely before that commit; it started without any change. Maybe we can try the custom agent pattern to log sockets.
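For reference, the custom agent idea could look roughly like this: a subclass of Node's built-in https.Agent that overrides createConnection to log socket lifecycle events. The LoggingAgent name and the log callback are made up for this sketch, and it assumes got forwards an agent option to the underlying https.request call.

```javascript
const https = require('https');

// Subclass of https.Agent that reports when sockets are created, closed, or fail.
// `log` is a hypothetical callback; swap in your own logger.
class LoggingAgent extends https.Agent {
  constructor(options, log = console.log) {
    super(options);
    this.log = log;
  }

  createConnection(options, callback) {
    const socket = super.createConnection(options, callback);
    const target = `${options.host}:${options.port}`;
    this.log(`socket created for ${target}`);
    if (socket) {
      socket.once('close', (hadError) => this.log(`socket closed for ${target}, hadError=${hadError}`));
      socket.once('error', (err) => this.log(`socket error for ${target}: ${err.code}`));
    }
    return socket;
  }
}

const agent = new LoggingAgent({ keepAlive: true });
// Assumed usage: got(url, { agent })
```

Correlating the "socket closed" lines with the ECONNRESET timestamps should show whether the resets hit freshly created sockets or reused keep-alive ones.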

Is there any way to determine if we have too many got instances at once, or too many sockets, or other resource problems like that?
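One way to get a rough picture of socket usage is to read the bookkeeping that Node's http.Agent exposes: sockets (in use), freeSockets (idle keep-alive) and requests (queued, waiting for a socket). The sketch below only counts entries; the agentStats and countEntries names are made up, and it tells you nothing about got instances as such.

```javascript
const http = require('http');

// Sum the lengths of the per-origin arrays in one of the agent's maps.
function countEntries(map) {
  return Object.values(map || {}).reduce((total, list) => total + list.length, 0);
}

// Snapshot of how busy an agent currently is.
function agentStats(agent) {
  return {
    activeSockets: countEntries(agent.sockets),
    idleSockets: countEntries(agent.freeSockets),
    queuedRequests: countEntries(agent.requests),
    maxSockets: agent.maxSockets,
  };
}

console.log(agentStats(http.globalAgent));
```

Logging this periodically (for https traffic, pass https.globalAgent or whichever agent is actually in use) would show whether queuedRequests keeps growing, which would point towards socket starvation.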

The past two days have seen a big problem in our production app, which I’ve tentatively traced to got never returning from await calls, even though we’re supplying a timeout. I’m wondering if it’s some type of “silent” problem such as resource exhaustion. This is still on got@9.
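As a stopgap for awaits that never settle, a hard deadline can be layered on top of the library's own timeout with Promise.race. This is only a sketch: withDeadline and the EHARDDEADLINE code are made up here, and the wrapper does not cancel the underlying request, it just stops the caller from waiting on it forever.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withDeadline(promise, ms, label = 'operation') {
  let timer;
  const deadline = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`${label} exceeded hard deadline of ${ms} ms`);
      err.code = 'EHARDDEADLINE'; // made-up code for this sketch
      reject(err);
    }, ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

Usage would be along the lines of await withDeadline(got(url, options), 60000, url). Any EHARDDEADLINE errors in the logs would then confirm that requests really are hanging past got's timeout.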

There is some clustering in time, yes. Certainly the problems tend to be more likely to be together than separate, including when one repository depends on multiple of the problem libraries. I’ll try to sort a snapshot of logs and see how pronounced the clustering is.

I think we can remove it, as it seems to have no effect (assuming I configured it correctly).

Do you know if it’s still applicable?

Yes, because http is based on streams.

After adding in “manual” retries (catching got errors and recursing), I was finally able to lower the error rate. Anecdotally, it seems like most requests recover on the first retry, as I think we had both thought should happen.
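The "catch and recurse" approach described above could be sketched roughly as follows. This is not Renovate's actual code: fetchWithRetry, the list of retryable codes, and the backoff values are all assumptions made for illustration, and doRequest stands for any function returning a promise (for example, () => got(url, options)).

```javascript
// Error codes treated as transient in this sketch (an assumption, adjust as needed).
const RETRYABLE_CODES = new Set(['ECONNRESET', 'ETIMEDOUT', 'EAI_AGAIN']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call doRequest; on a retryable error, wait and recurse with one fewer retry left.
async function fetchWithRetry(doRequest, retriesLeft = 2, delayMs = 100) {
  try {
    return await doRequest();
  } catch (err) {
    if (retriesLeft <= 0 || !RETRYABLE_CODES.has(err.code)) {
      throw err; // out of retries, or not a transient error
    }
    await sleep(delayMs);
    return fetchWithRetry(doRequest, retriesLeft - 1, delayMs * 2);
  }
}
```

Doubling the delay on each attempt is a design choice to avoid hammering the registry when failures cluster in time.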

So it seems that retries in Got 9 are buggy. I’d prefer to switch to Got 10.

My best guess is that Renovate’s custom got caching behaviour prevents got’s own retry mechanism from succeeding.

Dunno. Possibly.

I wonder if it’s possible to add a catch here and delete the cache in case of failure?

Should be possible. Just remember to rethrow the error.
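A minimal sketch of that suggestion, assuming the cache stores in-flight promises keyed by URL (the thread does not show Renovate's real cache, so the Map and the cachedRequest name are hypothetical): on failure the entry is deleted so the next call starts a fresh request, and the error is rethrown so callers still see it.

```javascript
// Hypothetical promise cache keyed by URL.
const cache = new Map();

function cachedRequest(key, doRequest) {
  if (!cache.has(key)) {
    // Promise.resolve().then(...) also captures synchronous throws from doRequest.
    const pending = Promise.resolve()
      .then(doRequest)
      .catch((err) => {
        cache.delete(key); // do not keep serving a failed result
        throw err; // rethrow so the caller (and any retry logic) sees the failure
      });
    cache.set(key, pending);
  }
  return cache.get(key);
}
```

Without the delete, every retry for the same key would receive the same rejected promise, which would match the guess above about caching defeating the retry mechanism.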

I’ll send you a PR with Got 10 soon.