parcel: Parcel 2.3.1 segfaults for an unknown reason on first build on M1 macOS
🐛 bug report
🎛 Configuration (.babelrc, package.json, cli command)
Platform: Node 17.5.0 (aarch64), macOS 12.2, Apple M1
package.json:
{
  "private": true,
  "source": "src/index.html",
  "scripts": {
    "build": "rm -rf dist/ && parcel build --no-source-maps --log-level verbose",
    "dev": "parcel --port 3000",
    "start": "serve dist"
  },
  "devDependencies": {
    "@parcel/transformer-sass": "latest",
    "parcel": "latest",
    "serve": "latest"
  }
}
No other config files.
CLI command: `npx parcel --version; rm -rf .parcel-cache; npm run build`
Output:
2.3.1
> build
> rm -rf dist/ && parcel build --no-source-maps --log-level verbose
✨ Built in 758ms
dist/index.html 2.62 KB 156ms
dist/favicon.4397b7fe.svg 787 B 147ms
dist/index.eb21fdc4.css 3.18 KB 310ms
dist/index.35aa3a8b.js 6.25 KB 113ms
sh: line 1: 10246 Segmentation fault: 11 parcel build --no-source-maps --log-level verbose
It seems to generate the correct result though.
🤔 Expected Behavior
There should be no segmentation fault.
😯 Current Behavior
It crashes with a segmentation fault, although the generated output seems to be correct.
💁 Possible Solution
Unknown.
🔦 Context
The crash does not actually seem to affect me, though, since the build output still appears to be correct.
💻 Code Sample
It is a simple demo site with plain HTML + Sass + vanilla TypeScript; no extra dependencies, as you can see above.
🌍 Your Environment
| Software | Version(s) |
| --- | --- |
| Parcel | 2.3.1 |
| Node | 17.5.0 (aarch64) |
| npm/Yarn | npm 8.5.0 |
| Operating System | macOS 12.2 on Apple M1 |
About this issue
- State: closed
- Created 2 years ago
- Comments: 24 (8 by maintainers)
Commits related to this issue
- Default to not using overlapping sync for now in worker threads until deeper NodeJS worker thread termination issues can be solved, https://github.com/parcel-bundler/parcel/issues/7702 — committed to kriszyp/lmdb-js by kriszyp 2 years ago
- Workaround for segfault on linux See https://github.com/parcel-bundler/parcel/issues/7702 — committed to mivade/htcondor_status by mivade 2 years ago
@LekoArts that’s great to hear, and yes, tl;dr, hopefully v2.2.2 addresses this issue for now. For a little longer story…
lmdb-js@v2 introduced a faster mechanism for committing transactions, whereby a commit can be written and the caller can proceed while OS-cached data is flushed to disk asynchronously; a later event indicates when that flush has completed. Other users found this extremely performant and effective, so it was turned on by default in v2.2. However, that is also when these segfaults started occurring in Parcel.
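For context, here is a minimal sketch of roughly what that looks like from the lmdb-js side. The `overlappingSync` option and the `flushed` promise reflect my reading of the lmdb-js docs and may differ between versions, so treat this as illustrative rather than authoritative:

```js
// Illustrative only: how the overlapping-sync commit model surfaces in
// lmdb-js (option/property names per my reading of the lmdb-js docs).
const { open } = require('lmdb');

const db = open({
  path: './cache.lmdb',
  // lmdb-js v2.2 enabled this behavior by default; shown explicitly here.
  overlappingSync: true,
});

async function writeEntry() {
  // Resolves once the transaction is committed; the caller can proceed
  // even though the OS-cached data has not been synced to disk yet.
  await db.put('some-key', { hello: 'world' });

  // Resolves later, when the asynchronous disk flush has completed. It is
  // this second, background task that can outlive a terminated worker
  // thread in the scenario described below.
  await db.flushed;
}

writeEntry().then(() => console.log('committed and flushed'));
```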
Initially I had assumed there must be some memory handling fault in this new async flush mechanism that was corrupting memory and leading to these segfaults. Many rabbit trails spent verifying memory handling before the segfault showed no problems; everything was solid. Eventually I realized that there was no prior memory corruption: the error was occurring exactly where the stack trace you (@LekoArts) reported said it was occurring 😱!
This stack trace shows that the segfault occurs while creating a V8 handle/scope. Why would that segfault? This goes deep into how NodeJS handles async tasks in worker threads. When a write transaction in LMDB is completed, a second task goes into NodeJS/libuv’s task queue to flush to disk. In the meantime, since the transaction is committed, Parcel can (rightly) declare the job done and ask to terminate the threads.

Thread termination is a pretty perilous and complicated action, though. It is not like terminating a process, where the OS knows exactly what the process owns and can automatically clean it all up; thread termination requires application-level cooperation, and in the case of NodeJS there is a specific procedure for what it will stop doing and what it won’t. NodeJS’s conception of thread termination is that it will finish executing its current JS task(s), then end and free the V8 isolate associated with the worker thread, but it does not wait for pending tasks in the task queue to finish. However, those queued tasks still continue to execute, since they are part of libuv’s shared worker pool. Consequently, when one of these tasks completes (specifically the disk-flush task), it queues up its completion callback, but that completion (JS) callback is set to run against a V8 isolate that no longer exists (it has been freed), which leads to the segmentation fault. This seems like a NAN bug, in that it attempts to call the callback regardless of the associated isolate’s state.
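To make the ordering concrete, here is a small self-contained analogy that uses Node’s built-in `fs` module as a stand-in for lmdb’s disk-flush task. It is only an analogy: Node’s own bindings tolerate a terminated isolate, whereas the NAN code path described above invoked the completion callback anyway, which is where the crash comes from.

```js
// Analogy only: the ordering that bites lmdb-js, sketched with Node's
// fs module standing in for the asynchronous disk-flush task.
const { Worker, isMainThread, parentPort } = require('node:worker_threads');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on('message', () => {
    // From the caller's perspective the "transaction" is committed, so the
    // pool tears the thread down, analogous to Parcel ending its worker
    // farm while lmdb's flush task is still sitting in libuv's queue.
    worker.terminate();
  });
} else {
  // Queue work on libuv's shared threadpool (stand-in for the flush).
  fs.writeFile(path.join(os.tmpdir(), 'flush-demo.txt'), 'data', () => {
    // This completion callback needs the worker's V8 isolate. Node's own
    // bindings handle the isolate being gone; a NAN-based addon that
    // invokes the callback against the freed isolate is what segfaults.
  });
  // Report "done" before the flush has actually landed on disk.
  parentPort.postMessage('committed');
}
```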
So what can be done about this? The most direct solution would be to override the NAN functions so the callback is not called once the worker thread has been terminated (there is also a persistent handle that has to be nulled out), and this does actually seem to prevent the segfault in the provided test case. However, this solution does not seem to be foolproof; if the task runs long enough, not only does it extend beyond the life of the V8 isolate, but the thread-termination procedure that shuts down the libuv event loop will sometimes crash, reporting that there are open libuv handles. More research is needed, but using NAN’s async tasks just doesn’t seem capable of working well with thread termination. However, for lmdb@v2.3 I have been working on porting all the code from NAN to NAPI (which has a more stable API and requires distributing far fewer binaries), and this seems like an appropriate place to replace the NAN async tasks with direct NAPI-based async tasks that will hopefully work better.
As for the v2.2.x line, I have simply turned the new overlapping sync option off by default in worker threads. This is a temporary measure; I certainly hope to fully enable this by default in the future, but only after ensuring that the async tasks can really work reliably in conjunction with thread termination.
I have been able to reproduce this now (with @artnez’s repo) and am debugging it, so I am hopefully narrowing in on the cause/fix.
Try adding `PARCEL_WORKERS=0` to your commands; it will probably do everything serially. If your project is large, you might have better luck with `PARCEL_WORKER_BACKEND=process`, so that you still get some multiprocessing (a sketch showing both variables follows below).
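A hypothetical wrapper script, in case it helps to see where those variables go; only `PARCEL_WORKERS` and `PARCEL_WORKER_BACKEND` come from this thread, the wrapper itself and its file name are just an illustration:

```js
// build-serial.js (hypothetical wrapper): run the same Parcel build, but
// force the worker farm off per the workaround suggested above.
const { spawnSync } = require('node:child_process');

const result = spawnSync('npx', ['parcel', 'build', '--no-source-maps'], {
  stdio: 'inherit',
  env: {
    ...process.env,
    PARCEL_WORKERS: '0',              // do everything serially, in-process
    // PARCEL_WORKER_BACKEND: 'process', // alternative: subprocess workers
  },
});

process.exit(result.status ?? 1);
```

This is equivalent to prefixing the npm script invocation, e.g. `PARCEL_WORKERS=0 npm run build`.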
Going to close this issue since it appears to be fixed by newer lmdb. If you are still seeing it, make sure your lock file has been updated to lmdb 2.2.2.
I just tested with https://github.com/DoctorEvidence/lmdb-js/commit/544b3fda402f24a70a0e946921e4c9134c5adf85 and I was still getting the segfault. My test environment is an M1 Mac (ARM64) on macOS Monterey 12.2.1.
I’m also seeing this on my M1 MacBook Pro (but other colleagues are not on an M1 yet), and here’s a reproduction: https://github.com/LekoArts/parcel-segfault-repro
It also has a segfault log from `segfault-handler` (registration sketch below).
edit: I changed the repo to only use the Parcel JS API, not the whole Gatsby process. The old repro is still accessible on the `gatsby-version` branch.
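For anyone who wants to capture a similar native stack trace, registering `segfault-handler` at the very top of the entry script is enough. This follows the package’s documented usage, but double-check against its README for the version you install:

```js
// Capture a native stack trace on SIGSEGV, as in the repro above.
// This follows segfault-handler's documented usage; check its README.
const SegfaultHandler = require('segfault-handler');

// Writes a crash report (with the native stack) to crash.log in the cwd.
SegfaultHandler.registerHandler('crash.log');

// ...then start the build (e.g. via the Parcel JS API) as usual.
```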
I’ve been dealing with this issue as well. Here is a minimal repro case for testing: https://github.com/artnez/parcel-segfault-repro
It definitely has something to do with multithreading, because `PARCEL_WORKER_BACKEND=process` (switching to subprocess workers) fixes it. The trace also indicates as much.

Thanks, this worked for me. Is `PARCEL_WORKERS=0` discussed in the documentation?