next.js: Starting at 13.4.13-canary.0 Internal Server Error due to connection refused

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

13.4.13-canary.0

Which area(s) of Next.js are affected? (leave empty if unsure)

No response

Link to the code that reproduces this issue or a replay of the bug

no response

To Reproduce

Update to this version and start a production server.

Describe the Bug

  • error Failed to handle request for /sw.js
    TypeError: fetch failed
        at Object.fetch (node:internal/deps/undici/undici:11576:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async invokeRequest (/app/node_modules/.pnpm/next@13.4.13-canary.0_@babel+core@7.22.5_react-dom@18.2.0_react@18.2.0_sass@1.64.0/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:34:23)
        at async requestHandler (/app/node_modules/.pnpm/next@13.4.13-canary.0_@babel+core@7.22.5_react-dom@18.2.0_react@18.2.0_sass@1.64.0/node_modules/next/dist/server/lib/start-server.js:329:35)
        at async Server.<anonymous> (/app/node_modules/.pnpm/next@13.4.13-canary.0_@babel+core@7.22.5_react-dom@18.2.0_react@18.2.0_sass@1.64.0/node_modules/next/dist/server/lib/start-server.js:148:13) {
      cause: Error: connect ECONNREFUSED 127.0.0.1:41367
          at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1592:16) {
        errno: -111,
        code: 'ECONNREFUSED',
        syscall: 'connect',
        address: '127.0.0.1',
        port: 41367
      }
    }

Expected Behavior

Server Working

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

No response

NEXT-1510

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 31
  • Comments: 57 (15 by maintainers)

Most upvoted comments

Reproduction

I have made a simple reproduction repository which reproduces the issue every time. The only changes I’ve made to the default reproduction template were adding the Dockerfile of the with-docker example, and setting output: 'standalone' in the next config.

Steps

  1. Clone https://github.com/darthmaim-reproductions/vercel-next.js-53171
  2. Run this command to build and start the next.js app in a docker container
    docker build -t vercel-next.js-53171 . && docker run --rm -p3000:3000 vercel-next.js-53171
    
  3. Access http://localhost:3000/
  4. The server responds with Internal Server Error
  5. The server logs output
    - ready started server on e68ae5711fa3:3000, url: http://e68ae5711fa3:3000
    Listening on port 3000 url: http://e68ae5711fa3:3000
    - error Failed to handle request for /
    TypeError: fetch failed
        at Object.fetch (node:internal/deps/undici/undici:11576:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async invokeRequest (/app/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:21:12)
        at async requestHandler (/app/node_modules/next/dist/server/lib/start-server.js:336:33)
        at async Server.<anonymous> (/app/node_modules/next/dist/server/lib/start-server.js:152:13) {
      cause: Error: connect ECONNREFUSED 127.0.0.1:41303
          at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16) {
        errno: -111,
        code: 'ECONNREFUSED',
        syscall: 'connect',
        address: '127.0.0.1',
        port: 41303
      }
    }
    - error Failed to handle request for /favicon.ico
    TypeError: fetch failed
        at Object.fetch (node:internal/deps/undici/undici:11576:11)
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async invokeRequest (/app/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:21:12)
        at async requestHandler (/app/node_modules/next/dist/server/lib/start-server.js:336:33)
        at async Server.<anonymous> (/app/node_modules/next/dist/server/lib/start-server.js:152:13) {
      cause: Error: connect ECONNREFUSED 127.0.0.1:41303
          at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16) {
        errno: -111,
        code: 'ECONNREFUSED',
        syscall: 'connect',
        address: '127.0.0.1',
        port: 41303
      }
    }
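The logs above show a pattern that recurs throughout this thread: a generic "TypeError: fetch failed" whose cause property carries the actual socket error (ECONNREFUSED, ECONNRESET, and so on). As a minimal sketch, assuming Node 18+ where fetch is global, the underlying code can be extracted when probing a server yourself. The helper names describeFetchFailure and probe are hypothetical and not part of Next.js.

```javascript
// Hypothetical helper: pull the socket-level details out of a "fetch failed" error.
// Node's fetch (undici) attaches the underlying error as `err.cause`.
function describeFetchFailure(err) {
  const cause = err && err.cause;
  if (!cause) {
    return { code: undefined, address: undefined, port: undefined };
  }
  return { code: cause.code, address: cause.address, port: cause.port };
}

// Hypothetical probe: request a URL and report either the HTTP status
// or the socket-level reason the request failed.
async function probe(url) {
  try {
    const res = await fetch(url);
    return { ok: res.ok, status: res.status };
  } catch (err) {
    return { ok: false, ...describeFetchFailure(err) };
  }
}
```

Calling probe('http://127.0.0.1:3000/') from inside the container and comparing the reported address with the one the server claims to listen on is one way to narrow down a bind mismatch.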
    

For self-hosted apps that use output: standalone, Next.js v13.4.13 introduced some odd and really hard-to-reproduce behaviours. This is an attempt to summarize my experience. Note that everything described below worked fine in v13.4.12.

Below is the next.config.js file:

/** @type {import('next').NextConfig} */
const nextConfig = {
    output: 'standalone',
}

module.exports = nextConfig

For reproduction do the following:

  1. Build an app where output: standalone is configured.
  2. For easier reproduction copy the static and public folder as described here: https://nextjs.org/docs/app/api-reference/next-config-js/output#automatically-copying-traced-files
  3. Start the app with node server.js on a server where localhost resolves to an ipv6 address.
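To check whether a given server falls into this category, the resolution of localhost can be inspected from Node itself. This is a rough sketch, not something from the original report: summarizeLookup and checkLocalhost are hypothetical names, and the order of results depends on the OS resolver configuration (for example /etc/hosts).

```javascript
const dns = require('node:dns');

// Summarize dns.lookup results: which address families are present
// and which one the resolver returned first.
function summarizeLookup(results) {
  const families = results.map((r) => (r.family === 6 ? 'IPv6' : 'IPv4'));
  return {
    first: families[0],
    hasIPv4: families.includes('IPv4'),
    hasIPv6: families.includes('IPv6'),
  };
}

// Resolve "localhost" through the OS resolver, keeping the resolver's own order.
async function checkLocalhost() {
  const results = await dns.promises.lookup('localhost', {
    all: true,
    verbatim: true,
  });
  console.log(results, summarizeLookup(results));
}

checkLocalhost().catch((err) => console.error('lookup failed:', err.code));
```

If the summary reports first: 'IPv6', a client connecting to localhost will likely try ::1 before 127.0.0.1, which matches the environment described in step 3.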

In such an environment, the following 2 scenarios fail:

1. Internationalization middleware always redirects to localhost

Reproduction:

  1. Use the example from the Next.js repository: https://github.com/vercel/next.js/tree/canary/examples/app-dir-i18n-routing
  2. Configure output: standalone.
  3. Increase Next.js version to 13.4.13.
  4. Host the app on a different server (not your local machine) where localhost resolves to an ipv6 address.
  5. Visit the corresponding url in the browser, e.g.: http://my-test-server.mycompanydomain:3000

Result: The internationalization middleware redirects to: http://localhost:3000/en

Expected: The internationalization middleware should redirect to: http://my-test-server.mycompanydomain:3000/en

2. Redirects in Server Actions force a hard reload of the whole page in the browser

When running on a Server that only uses ipv4, Server Actions now work perfectly fine thanks to: https://github.com/vercel/next.js/pull/53373 and https://github.com/vercel/next.js/pull/53368 (great work by the way, I was really looking forward to that).

However, when hosted on a Server where localhost resolves to an ipv6 address and a Server Action uses redirect("/some-other-path") an error message is logged and the redirect is enforced via a hard page reload in the browser:

Reproduction:

  1. Use any simple service where Server Actions are enabled.
  2. Call a Server Action via a form’s action that does a redirect to another page or the same page (it doesn’t matter).

Result: The new page is visited via a hard reload in the web browser and the following error is logged to the console: [image: redirect_error]

Expected: Navigation should not reload the whole page and no error message should be logged.

Additional Info

These were two simple scenarios I was able to reproduce. Moreover, there also seems to be a problem regarding environment variables for output: standalone in 13.4.13: https://github.com/vercel/next.js/issues/53579 and https://github.com/vercel/next.js/issues/53367

This makes it quite hard to test how things like manually setting HOSTNAME=0.0.0.0 or HOSTNAME=127.0.0.1 would work. In my case, neither of the two reproduction scenarios described could be solved this way.

One more thing I noticed is that in 13.4.13 starting the server logs the following: [image: bug_reproduction_13]

Note the first line: "ready started server on 0.0.0.0:3000, url: http://localhost:3000". Is this intended to be that way?

I hope that this can help with troubleshooting. I really think that this is a hard one to reproduce, as problems become present in different aspects of the app, but only in a specific environment.

Thank you for providing the minimal reproduction – I’ve marked this one to be dug into further. In the meantime, please remain on 13.4.12 or lower while we dig into this 🙏

Error finally detected! As mentioned by @balazsorban44, it’s an IPv6 handling error. If you force the use of 0.0.0.0 in Docker environments (because you probably can’t use 127.0.0.1 there), it causes issues when you invoke a rewrite like this:

return NextResponse.rewrite(
  new URL(
    url,
    request.url,
  ),
  getResponseInit(),
);

The next call will invoke the middleware again with the new request, which is not intended. To avoid this, modify it to:

return NextResponse.rewrite(
  // Workaround to avoid the middleware being called twice;
  // remove once Next.js handles this better.
  new URL(
    url,
    request.url
      .replace('https://localhost', 'https://0.0.0.0')
      .replace('http://localhost', 'http://0.0.0.0'),
  ),
  getResponseInit(),
);

The middleware is not called twice because Next.js detects it as the same request and simply re-routes it.
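For illustration, here is what the replaced base URL does to the rewrite target, using only the standard URL class. The function name rewriteBase and the sample paths are made up for this sketch; the two replace calls are the ones from the workaround above.

```javascript
// Hypothetical helper wrapping the two replace calls from the workaround.
// String.prototype.replace with a string pattern only touches the first match,
// which here is the origin at the start of the URL.
function rewriteBase(requestUrl) {
  return requestUrl
    .replace('https://localhost', 'https://0.0.0.0')
    .replace('http://localhost', 'http://0.0.0.0');
}

// A path-absolute target resolved against the patched base keeps the port
// but takes its host from the base, so the result points at 0.0.0.0.
const target = new URL(
  '/en/dashboard',
  rewriteBase('http://localhost:3000/dashboard?x=1'),
);
console.log(target.href); // http://0.0.0.0:3000/en/dashboard
```

Requests that do not arrive with a localhost origin pass through rewriteBase unchanged, so the workaround only affects the case described in this comment.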

This reproduces on 13.4.13 as well, so recommending that version without a fix is not a good idea.

I’ve been losing my mind as well with too much time sunk into this…

I’m on Next version: 13.4.19

I constantly get these errors in my logs. I imagine they come from the Kubernetes liveness and readiness probes: the container launches just fine, but the probes later seem to fail with 500 errors. I used to use a basic health check page, then I tried an API route that simply returns 200 OK.

It seems odd to me that the localPort is always changing, but maybe that’s the internal port?

Errors

- ready started server on [::]:3000, url: http://localhost:3000
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at async invokeRequest (/app/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:17:12)
    at async invokeRender (/app/node_modules/next/dist/server/lib/router-server.js:254:29)
    at async handleRequest (/app/node_modules/next/dist/server/lib/router-server.js:447:24)
    at async requestHandler (/app/node_modules/next/dist/server/lib/router-server.js:464:13)
    at async Server.<anonymous> (/app/node_modules/next/dist/server/lib/start-server.js:117:13) {
  cause: SocketError: other side closed
      at Socket.onSocketEnd (/app/node_modules/next/dist/compiled/undici/index.js:1:63301)
      at Socket.emit (node:events:526:35)
      at endReadableNT (node:internal/streams/readable:1376:12)
      at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
    code: 'UND_ERR_SOCKET',
    socket: {
      localAddress: '::1',
      localPort: 43952,
      remoteAddress: undefined,
      remotePort: undefined,
      remoteFamily: undefined,
      timeout: undefined,
      bytesWritten: 999,
      bytesRead: 542
    }
  }
}
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at async invokeRequest (/app/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:17:12)
    at async invokeRender (/app/node_modules/next/dist/server/lib/router-server.js:254:29)
    at async handleRequest (/app/node_modules/next/dist/server/lib/router-server.js:447:24)
    at async requestHandler (/app/node_modules/next/dist/server/lib/router-server.js:464:13)
    at async Server.<anonymous> (/app/node_modules/next/dist/server/lib/start-server.js:117:13) {
  cause: Error: read ECONNRESET
      at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
    errno: -104,
    code: 'ECONNRESET',
    syscall: 'read'
  }
}

I also tried setting HOSTNAME to "0.0.0.0" in my Dockerfile, but that does not help. It’s also odd that the bootup message still says: - ready started server on [::]:3000, url: http://localhost:3000

Oddities

I just forwarded a tunnel to my container and the root page (which I previously used for probes) loaded quickly. But when I hit my new /api/health-check, it took so long that I opened it in a second tab. That one eventually loaded, while the first request gave me an ERR_CONNECTION_RESET in Chrome. Oddly, nothing related to this seems to have shown up in the logs 🤔

It seems that these requests are majorly lagging now or failing. But if I access the app from the domain on the internet everything is loading up quickly and smoothly oddly… 😔

The first request on the tunnel to a page goes through quickly, but then any subsequent request lags horribly or times out 😬 Utilizing curl on the container via localhost seems to consistently work as well.

Hello,

Same issue on our side. (Build standalone+docker+kubernetes).

App is running fine locally outside docker container.

Everything did work on kubernetes env for next.js 13.4.12.

When using 13.4.13, the readiness and liveness probes cannot reach the Kubernetes pods and fail with the following error:

Liveness probe failed: Get "http://172.16.*.*:3000/healthz": dial tcp 172.16.*.*:3000: connect: connection refused

Please note as well that our application uses an env var named "HOSTNAME", which conflicts with the process.env.HOSTNAME used inside the generated Next.js server.js file. I had to implement a postbuild script that rewrites server.js so it does not check this env var and always uses localhost.
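A postbuild patch along these lines could look roughly like the sketch below. It is not the commenter's actual script: patchServerSource and patchFile are hypothetical names, and the expression being replaced is the one quoted from .next/standalone/server.js elsewhere in this thread, so it may not match other Next.js versions.

```javascript
const fs = require('node:fs');

// Assumption: this is the expression as quoted in this thread for 13.4.13.
// Other versions of the generated server.js may phrase it differently.
const HOSTNAME_EXPR = "process.env.HOSTNAME || 'localhost'";

// Replace the env lookup with a fixed 'localhost' so an unrelated
// HOSTNAME variable no longer changes the address the server binds to.
function patchServerSource(source) {
  if (!source.includes(HOSTNAME_EXPR)) {
    throw new Error('expected hostname expression not found; refusing to patch');
  }
  return source.replace(HOSTNAME_EXPR, "'localhost'");
}

// Read, patch, and write back the generated file at the given path.
function patchFile(path) {
  const original = fs.readFileSync(path, 'utf8');
  fs.writeFileSync(path, patchServerSource(original));
}
```

Failing loudly when the expression is missing is deliberate: a silent no-op would leave the conflict in place after a Next.js upgrade changes the generated file.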

It is not fixed on v13.4.20-canary.9.

Please reopen this issue

It should be noted that:

  • by default, k8s injects the HOSTNAME environment variable into pods, and its value is the pod name.

So in k8s, the server will never listen on 0.0.0.0 after updating to 13.4.13, unless we update the deployments or start commands.

Always nice when a patch introduces a breaking change 🙄. What’s the workaround for runtime environmental variables? As I can get the app to run in docker by setting the hostname (as above) but the runtime envs are not found (kinda essential as they are for auth) this is on a standalone build. (checked on the next canary release and the same issue is there too).

I had the same issues with 13.4.13-canary.5. Rollback to 13.4.12 solved it for me.

As well as @charnog I do not have any rewrites in the app.

Here is the error from my container (sorry I no longer have it in plaintext)

image

nextjs: ^13.4.13

I looked into the Next.js code and found a possible cause of the issue. The solution I found worked fine when the Next.js app was started via a Docker container.

If you look into .next/standalone/server.js you’ll see the following lines of code

const hostname = process.env.HOSTNAME || 'localhost'
...
startServer({
...
  hostname: hostname === 'localhost' ? '0.0.0.0' : hostname,
...

It seems that process.env.HOSTNAME was set by Docker to some random-looking string, so the Next.js server was started with that hostname instead of '0.0.0.0'. This means that connections were only accepted when that hostname was used as the target address, and requests to localhost or 127.0.0.1 were refused.
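The two quoted lines can be condensed into a small function to make the behaviour easier to see. This is only a restatement of the snippet above for illustration; resolveListenHostname is a hypothetical name, and the container ID is borrowed from the reproduction logs earlier in the thread.

```javascript
// Mirrors the quoted server.js logic: fall back to 'localhost' when HOSTNAME
// is unset, and bind to all interfaces only when the value is 'localhost'.
function resolveListenHostname(env) {
  const hostname = env.HOSTNAME || 'localhost';
  return hostname === 'localhost' ? '0.0.0.0' : hostname;
}

console.log(resolveListenHostname({})); // 0.0.0.0
console.log(resolveListenHostname({ HOSTNAME: 'localhost' })); // 0.0.0.0
console.log(resolveListenHostname({ HOSTNAME: 'e68ae5711fa3' })); // e68ae5711fa3
```

Any HOSTNAME value other than 'localhost', including one injected by the container runtime, is passed straight through as the bind address.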

Workaround solution

The server inside the container should accept connections on any address (localhost, 127.0.0.1, …), so it must bind to 0.0.0.0. To achieve this I set HOSTNAME to localhost in the Dockerfile, so that in the aforementioned .next/standalone/server.js it resolves to hostname: '0.0.0.0' for startServer. Hope this helps someone.

Same problem here. It started with 13.4.13-canary.0, and I can also see it in 13.4.13-canary.5. I’ve been trying to figure it out on my local machine using Docker, but no luck. I can’t really pinpoint what’s causing it. We don’t have rewrites in the middleware (as @joacub mentioned), for most of the requests, NextResponse.next() is executed, but we do have rewrites in the config. It could somehow be related.

I know a small example would help, but like I said, I can’t narrow it down yet. Still trying to reproduce it locally.

Anyway, here are the logs from 13.4.13-canary.5 from our pods in the Kubernetes cluster. Maybe they’ll help somehow.

TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at async invokeRequest (/app/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:21:12)
    at async invokeRender (/app/node_modules/next/dist/server/lib/router-server.js:226:29)
    at async handleRequest (/app/node_modules/next/dist/server/lib/router-server.js:419:24)
    at async requestHandler (/app/node_modules/next/dist/server/lib/router-server.js:436:13) {
  cause: SocketError: other side closed
      at Socket.onSocketEnd (/app/node_modules/next/dist/compiled/undici/index.js:1:63301)
      at Socket.emit (node:events:526:35)
      at endReadableNT (node:internal/streams/readable:1359:12)
      at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
    code: 'UND_ERR_SOCKET',
    socket: {
      localAddress: '127.0.0.1',
      localPort: 52428,
      remoteAddress: undefined,
      remotePort: undefined,
      remoteFamily: undefined,
      timeout: undefined,
      bytesWritten: 770,
      bytesRead: 207
    }
  }
}
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at async invokeRequest (/app/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:21:12)
    at async invokeRender (/app/node_modules/next/dist/server/lib/router-server.js:226:29)
    at async handleRequest (/app/node_modules/next/dist/server/lib/router-server.js:419:24)
    at async requestHandler (/app/node_modules/next/dist/server/lib/router-server.js:436:13) {
  cause: Error: read ECONNRESET
      at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
    errno: -104,
    code: 'ECONNRESET',
    syscall: 'read'
  }
}

No, it’s not fixed. At least not for me. Kubernetes pods are assigned a default HOSTNAME: [image]

However, in 13.4.19 the app somehow listens on an IP address instead of the hostname. After many restarts the liveness probe detects that it is alive, and then Next (Node) starts producing SSL errors: [image]

does anybody know any workaround for this?

not using docker & we’re getting the same error on our repo. downgrading to 13.4.12 doesn’t cause this issue but then our vercel build fails because of some weird “out of space” errors. upgrading to 13.4.14-canary.1 gives the same error as in OP & then just goes into an infinite spam of:

[dx:next] TRPCClientError: Unexpected token < in JSON at position 0
[dx:next]     at TRPCClientError.from (file:///Users/sasha/housing_cloud/web/node_modules/@trpc/client/dist/TRPCClientError-fef6cf44.mjs:26:16)
[dx:next]     at file:///Users/sasha/housing_cloud/web/node_modules/@trpc/client/dist/httpUtils-ad76b802.mjs:125:36
[dx:next]     at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
[dx:next]   meta: {
[dx:next]     response: Response {
[dx:next]       [Symbol(realm)]: null,
[dx:next]       [Symbol(state)]: [Object],
[dx:next]       [Symbol(headers)]: [HeadersList]
[dx:next]     }
[dx:next]   },
[dx:next]   shape: undefined,
[dx:next]   data: undefined,
[dx:next]   [cause]: SyntaxError: Unexpected token < in JSON at position 0
[dx:next]       at JSON.parse (<anonymous>)
[dx:next]       at parseJSONFromBytes (node:internal/deps/undici/undici:6498:19)
[dx:next]       at successSteps (node:internal/deps/undici/undici:6472:27)
[dx:next]       at node:internal/deps/undici/undici:1145:60
[dx:next]       at node:internal/process/task_queues:140:7
[dx:next]       at AsyncResource.runInAsyncScope (node:async_hooks:204:9)
[dx:next]       at AsyncResource.runMicrotask (node:internal/process/task_queues:137:8)
[dx:next]       at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[dx:next] }

We face the same issue beginning from 13.4.13-canary.0 - I tried to quickly reproduce with https://github.com/vercel/next.js/blob/canary/examples/with-docker/README.md and app dir. But it works every time.

We have quite a fancy setup (turborepo, multiple apps, multiple middlewares) but as stated here in the issue it seems to also happen without any of that.

Setting the hostname like in the example: https://github.com/vercel/next.js/blob/canary/examples/with-docker/Dockerfile#L60 didn’t work for me

Only thing I can share from the error docker container is the line which throws the error:

[screenshots: 2023-07-28 at 17 09, 2023-07-28 at 17 10]

From https://github.com/vercel/next.js/blob/canary/packages/next/src/server/lib/start-server.ts#L332

Looks a lot like this change https://github.com/vercel/next.js/commit/c11bce5989073d135d63cfbbf63f71bd687c891b#diff-0ff576bd02e76f96680accff48ecbe4126a7f6bc4eafa547b6d44dee49bab770R246 from @shuding & @ijjk

@DuCanhGH We need to override it because our application uses a runtime variable named "HOSTNAME" that conflicts with the one server.js tries to use. Replacing hostname with 0.0.0.0 inside server.js seems to work.

This has been fixed ✅

Update - I was setting the HOSTNAME in the wrong file, once moved to the entrypoint script it all works now. Thanks!

reverting back down to 13.4.12 resolved the issue, along with setting ENV HOSTNAME localhost in Dockerfile. hoping this and the related env var issue when in output: standalone gets resolved so i can unpin

@charnog yeah they really thought replacing localhost with 127.0.0.1 is the way to go 💀 I still haven’t got any response by the way.

Btw great work with next-pwa working really nice.

This is the second time I’m letting you know, please follow the bug report template when opening an issue. #52621 (comment)

Please add a minimal reproduction so we can investigate.

Please, allow me to explain the situation more clearly. I have encountered a recurring issue with the server not starting in production. The problem does not have any clear reproduction steps, and it happens consistently. It’s important to note that the issue is not caused by anything on my end, as I’ve tested it on a completely empty Next.js project, and the problem persists.

This problem was first noticed by another person as well, and we have been trying to pinpoint the specific code that triggers this error. Despite our efforts, the root cause remains elusive. While I understand that you may want more information from me, I assure you that we have thoroughly investigated the situation.

I would appreciate any guidance or assistance you can provide to resolve this matter. As it is affecting the production environment, I’m eager to find a solution as soon as possible. Please let me know if you need any further details or if there’s anything specific you would like me to do to help with the investigation and resolution of this issue. Thank you for your support