workerd: 🐛 wrangler dev error: Received signal #11: Segmentation fault: 11

Hello!

Recently we’ve been seeing this issue with one of our three worker scripts (the other two seem fine): after running in dev mode for a while, it segfaults and becomes unresponsive. This happens anywhere from 10 seconds to 10 minutes after booting. I haven’t found a way to reproduce it on demand, but it always happens eventually.

This is on macOS, on Apple M1 machines.

The full error printed when this happens (with --log-level debug) is:

*** Received signal #11: Segmentation fault: 11
stack: 

Other errors are always logged above it, but they change every time and may be unrelated, since they show up constantly regardless of how long it takes to hit the segfault. Here’s an example (again with --log-level debug):

workerd/io/worker-entrypoint.c++:218: info: exception = kj/compat/http.c++:2673: disconnected: WebSocket disconnected between frames without sending `Close`.
stack: 105822c40 10689dd88 105824bd8 10529a004 10506a108 10506a517 104f40d78
workerd/io/worker-entrypoint.c++:218: info: exception = kj/compat/http.c++:2673: disconnected: WebSocket disconnected between frames without sending `Close`.
stack: 105822c40 10689dd88 105824bd8 105824d97 10689dd88 105842d8c 105824bd8 10529a004 10506a108 10506a517 104f40d78
workerd/io/worker-entrypoint.c++:218: info: exception = kj/compat/http.c++:2673: disconnected: WebSocket disconnected between frames without sending `Close`.
stack: 105822c40 10689dd88 105824bd8 105824d97 10689dd88 105842d8c 105824bd8 105824d97 10689dd88 105842d8c 10529a004 10506a108 10506a517 104f40d78
workerd/server/server.c++:2838: error: Uncaught exception: kj/compat/http.c++:2673: disconnected: worker_do_not_log; Request failed due to internal error
stack: 105822c40 10689dd88 105824bd8 105824d97 10689dd88 105842d8c 105824bd8 105824d97 10689dd88 105842d8c 10529a004 10506a108 10506a517 104f40d78 104a85c38 10584ebdc 10584fb84 10

workerd/io/worker-entrypoint.c++:218: info: exception = kj/async.c++:220: disconnected: other end of WebSocketPipe was destroyed

workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component; part = P��-p��-.sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component; part = `�-(��@��-$(���|�.sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104

workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component; part = �

stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104

workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component

stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component; part = 0�- -.sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component; part = (o� L2
P�-�0��-.sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:306: failed: expected part.findFirst('/') == kj::none [(can't stringify) == (can't stringify)]; '/' character in path component; did you mean to use Path::parse()?; part = http://fake-host/w/27a79965-642c-44eb-8bd4-0e857bb0554e/presence.sqlite
stack: 1068e91a7 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component; part = (o� L2
P�-�0��-��-.sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component; part = P��-`�-.sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [0 == 71]; NUL character in path component; part = �ٙ-.sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [5 == 71]; NUL character in path component; part = @\�-�ٙ-.sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104f51d07 104f5189f 104a6c59b 104a694eb 104a7143f 104a70f77 104a841c8 10529d3ec 10529e930 104d2f034 104d2f38c
workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [0 == 71]; NUL character in path component; part = .sqlite
stack: 1068e913b 1068e9357 104a6d39f 104a6d26f 104f72ddb 104f729a7 104
*** Received signal #11: Segmentation fault: 11
stack:

Let me know if any other info I can provide would be helpful.

For now I’m going to try to wrap the wrangler dev process in a retry loop.

About this issue

  • State: closed
  • Created 7 months ago
  • Reactions: 4
  • Comments: 30 (10 by maintainers)

Most upvoted comments

Upvoted, +1! This one is particularly annoying, as there doesn’t seem to be a workaround, and it’s making it impossible for us to unit-test waking up from hibernation.

I’ve been able to make a minimal reliable replication of this crash: https://github.com/nvie/cloudflare-repros/tree/segfault#readme

It uses a Durable Object with the WebSocket Hibernation API, and pretty much nothing else.

I’m on an M3 Max, on Sonoma 14.4.1.

Update from April 17, 2024 (previously April 15): still an issue after updating all Cloudflare packages to their latest versions.

Yes! This is amazing — thanks so much, @MellowYarker! 🙌

Let me know if this is still a problem after the next release goes out. It looks like we aren’t putting one out this week, so I expect it to go out next week.

Edit: Looks like a release has gone out now. Just need to wait for Miniflare to catch up.

@nvie thanks for the repro! I see a segfault on Linux too (very surprising, tbh).

It looks like it’s segfaulting in the ActorContainerRef destructor. Not sure why and will need to look into it further. I’ll give this a proper look over the next week or two.

In case anybody else needs it, here is the script I wrote to wrap wrangler and restart it after a segfault:

dev.ts

import { ChildProcessWithoutNullStreams, spawn } from 'child_process'
import stripAnsi from 'strip-ansi'

class WranglerMonitor {
	private process: ChildProcessWithoutNullStreams | null = null

	public start(): void {
		this.stop() // Ensure any existing process is stopped
		console.log(`Starting wrangler...`)
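		// NODE_ENV defaults to 'development' unless it is already set in the environment
		const child = spawn('wrangler', ['dev', '--env', 'dev'], {
			env: {
				NODE_ENV: 'development',
				...process.env,
			},
		})
		this.process = child

		// Remove every carriage return (a string pattern only replaces the first one)
		// and strip ANSI colour codes before inspecting the output
		const clean = (data: Buffer): string => stripAnsi(data.toString().replace(/\r/g, '').trim())

		// Ignore late output from a process that has already been replaced,
		// so it cannot trigger a second restart
		child.stdout.on('data', (data: Buffer) => {
			if (child === this.process) this.handleOutput(clean(data))
		})

		child.stderr.on('data', (data: Buffer) => {
			if (child === this.process) this.handleOutput(clean(data), true)
		})
	}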

	private handleOutput(output: string, err = false): void {
		if (!output) return
		if (output.includes('Segmentation fault')) {
			console.error('Segfault detected. Restarting Wrangler...')
			this.restart()
		} else if (!err) {
			// Only stdout is echoed; other stderr output is suppressed
			console.log(output.replace(/\[mf:inf\]/g, ''))
		}
	}

	private restart(): void {
		console.log('Restarting wrangler...')
		this.stop()
		setTimeout(() => this.start(), 100) // Restart after a short delay
	}

	private stop(): void {
		if (this.process) {
			this.process.kill()
			this.process = null
		}
	}
}

new WranglerMonitor().start()

Run it with npx tsx <filename>. Bun might work too.

There’s probably a Bash one-liner that can do the same thing, but I am not bashfully gifted.
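One caveat with checking each 'data' chunk directly: a child process stream does not guarantee that a message arrives in one chunk, so "Segmentation fault" could in principle be split across two chunks and missed. A minimal sketch of a line buffer that only hands back complete lines is below. LineBuffer and hasSegfault are hypothetical names, not part of wrangler or the script above, and the sketch assumes the crash message is newline-terminated (as in the "stack:" line that follows it in the logs). In the wrapper you would keep one buffer per stream, call push() from the 'data' handler, and call flush() when the stream ends.

```typescript
// Hypothetical helper (not part of wrangler): accumulates raw stdout/stderr
// chunks and only returns complete lines, so a marker split across two
// chunks is still seen in one piece.
class LineBuffer {
	private pending = ''

	// Append a chunk and return every line it completes.
	push(chunk: string): string[] {
		this.pending += chunk.replace(/\r/g, '')
		const lines = this.pending.split('\n')
		// The last element is an incomplete line ('' if the chunk ended with '\n').
		this.pending = lines.pop() ?? ''
		return lines
	}

	// Return whatever is left once the stream has ended.
	flush(): string[] {
		const rest = this.pending
		this.pending = ''
		return rest ? [rest] : []
	}
}

// True if any completed line contains the crash message seen in the logs.
function hasSegfault(lines: string[]): boolean {
	return lines.some((line) => line.includes('Segmentation fault'))
}

// Usage: the marker is split across two chunks.
const buf = new LineBuffer()
console.log(hasSegfault(buf.push('*** Received signal #11: Segmentation '))) // false
console.log(hasSegfault(buf.push('fault: 11\nstack:\n'))) // true
```

The trade-off is a small delay: a line is only inspected once its newline arrives, which seems acceptable for a restart trigger.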