workerd: šŸ› Bug Report — Runtime APIs: `alarm-scheduler` fails for inscrutable reason

Sometimes when I schedule alarms in a Durable Object, I get the following error:

workerd/server/alarm-scheduler.c++:203: warning: exception = kj/filesystem.c++:304: failed: expected strlen(part.begin()) == part.size() [6 == 71]; NUL character in path component; part = ��zUdc5238b9�$�v$7559429.sqlite

I tried looking through the source code, but it's very hard to determine why this is occurring, especially because I don't think KJ is open source. The behaviour appears inconsistently (restarting usually fixes it), but if it helps, the kinds of calls I'm making are:

  • Schedule around 500ms in the future
  • Schedule at the current date (via Temporal.Now.instant(), which uses Date.now() under the hood)

For both, I sometimes check for the presence of an existing alarm first, but not always. Any hints?
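For reference, here's roughly what those calls look like (a sketch rather than my exact code; getAlarm/setAlarm are the standard DurableObjectStorage methods):

// Inside a Durable Object method, where `this.state` is the DurableObjectState.

// Pattern 1: schedule roughly 500ms in the future.
await this.state.storage.setAlarm(Date.now() + 500);

// Pattern 2: schedule at the current instant via Temporal.
// Temporal.Instant exposes the epoch-millisecond number that setAlarm accepts.
await this.state.storage.setAlarm(Temporal.Now.instant().epochMilliseconds);

// Sometimes, but not always, preceded by a check for an existing alarm:
if ((await this.state.storage.getAlarm()) === null) {
  await this.state.storage.setAlarm(Date.now() + 500);
}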

About this issue

  • State: closed
  • Created 8 months ago
  • Reactions: 5
  • Comments: 16 (7 by maintainers)

Most upvoted comments

Okay, I have a reproduction!

Use the code from above:

index.ts:

export class Test implements DurableObject {
  constructor(private readonly state: DurableObjectState) {}

  async fetch() {
    // Set an alarm in the past so it fires as soon as possible.
    await this.state.storage.setAlarm(Date.now() - 1000);
    return new Response("OK");
  }

  async alarm() {
    console.log("Alarm execution succeeded");
  }
}

const index: ExportedHandler<{ TEST: DurableObjectNamespace }> = {
  // Route every request to a single named Durable Object instance.
  fetch: async (_, { TEST }) => await TEST.get(TEST.idFromName("test")).fetch("https://do"),
};

export default index;

wrangler.toml:

compatibility_date = "2023-11-23"
name = "debugging"
workers_dev = true
main = "./index.ts"
minify = true

[[durable_objects.bindings]]
name = "TEST"
class_name = "Test"

.vscode/launch.json:

{
	"configurations": [
		{
			"name": "Wrangler",
			"type": "node",
			"request": "attach",
			"port": 9229,
			"attachExistingChildren": false,
			"autoAttachChildProcesses": false
		}
	]
}

I've only reproduced this in VS Code. The steps are:

  1. Bind a Logpoint to whatever line you’d like (I bound to the console.log line)
  2. wrangler dev
  3. Attach a debugger via VS Code and wait for it to bind
  4. curl localhost:8787
  5. If you don’t see the error, disconnect the debugger, wait a second, and re-attach it. Repeat steps 3–5 until you see it (should show up frequently enough)
  6. If you still don’t see the error, restart Wrangler and repeat steps 2–5.

I don't know how many of these steps are required, but I'm fairly confident this only shows up when a debugger is attached. I was able to get the repro working within 1–3 tries most of the time; occasionally it wouldn't reproduce at all and I'd have to fall back to step 6. I assume it doesn't reliably reproduce because it's memory-related?
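If it really is a race, one hypothetical way to widen the window might be to set the alarm many times per request. This is an untested variation on the repro's fetch handler above, not something I've confirmed helps:

async fetch() {
  // Hypothetical: repeat the alarm write to give the race more chances to hit.
  for (let i = 0; i < 100; i++) {
    await this.state.storage.setAlarm(Date.now() - 1000);
  }
  return new Response("OK");
}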

I'm not super confident about my ability to build a workerd branch and host it inside miniflare inside wrangler, but if you need me to, I could give it a crack.

I took a quick look at alarm-scheduler.c++, and there's currently only one place where we pass along a kj::PathPtr that could potentially be a problem here: if the kj::Path backing it is freed too early, the PathPtr dangles and the path bytes read as garbage, which would explain the corrupted filename in the error above. I've opened https://github.com/cloudflare/workerd/pull/1442, which ensures the kj::Path is attached and kept alive. I haven't repro'd the actual segfault locally yet, so it would be helpful if someone who has repro'd it could give the potential fix in #1442 a try.

Thanks for that, and sorry! I tried looking a bit further, but reading the KJ code didn't enlighten me much. It's hard to find where workerd/wrangler is calling that deep into the stack without a useful stack trace, which I'm not getting (mine just spits out an LLVM_SYMBOLIZER error, a la cloudflare/workers-sdk#3631). I would love to debug this myself, but it's just too hard. Should I move this to workers-sdk?