parcel: Parcel 2.3.1 segfaults for an unknown reason on first build on M1 macOS
🐛 bug report
🎛 Configuration (.babelrc, package.json, cli command)
Platform: Node 17.5.0 (aarch64), macOS 12.2, Apple M1
package.json:
{
  "private": true,
  "source": "src/index.html",
  "scripts": {
    "build": "rm -rf dist/ && parcel build --no-source-maps --log-level verbose",
    "dev": "parcel --port 3000",
    "start": "serve dist"
  },
  "devDependencies": {
    "@parcel/transformer-sass": "latest",
    "parcel": "latest",
    "serve": "latest"
  }
}
No other config files.
CLI command: `npx parcel --version; rm -rf .parcel-cache; npm run build`
Output:
2.3.1
> build
> rm -rf dist/ && parcel build --no-source-maps --log-level verbose
✨ Built in 758ms
dist/index.html 2.62 KB 156ms
dist/favicon.4397b7fe.svg 787 B 147ms
dist/index.eb21fdc4.css 3.18 KB 310ms
dist/index.35aa3a8b.js 6.25 KB 113ms
sh: line 1: 10246 Segmentation fault: 11 parcel build --no-source-maps --log-level verbose
It seems to generate the correct result though.
🤔 Expected Behavior
There should be no segmentation fault.
😯 Current Behavior
It crashes with a segmentation fault, although the generated output seems to be correct.
💁 Possible Solution
Unknown.
🔦 Context
The crash does not actually seem to affect me, though, since the build output still appears to be correct.
💻 Code Sample
It is a simple demo site with plain HTML + Sass + vanilla TypeScript; no extra dependencies, as you can see above.
🌍 Your Environment
| Software | Version(s) |
| --- | --- |
| Parcel | 2.3.1 |
| Node | 17.5.0 (aarch64) |
| npm/Yarn | npm 8.5.0 |
| Operating System | macOS 12.2 on Apple M1 |
About this issue
- State: closed
- Created 2 years ago
- Comments: 24 (8 by maintainers)
Commits related to this issue
- Default to not using overlapping sync for now in worker threads until deeper NodeJS worker thread termination issues can be solved, https://github.com/parcel-bundler/parcel/issues/7702 — committed to kriszyp/lmdb-js by kriszyp 2 years ago
- Workaround for segfault on linux See https://github.com/parcel-bundler/parcel/issues/7702 — committed to mivade/htcondor_status by mivade 2 years ago
@LekoArts that’s great to hear, and yes, tl;dr, hopefully v2.2.2 addresses this issue for now. For a little longer story…
lmdb-js@v2 introduced a faster mechanism for committing transactions, whereby a commit can be written and the caller can proceed while OS-cached data is flushed to disk asynchronously; a later event indicates when that flush has completed. Other users found this extremely performant and effective, so it was turned on by default in v2.2. However, that is also when these segfaults started occurring in Parcel.
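For context, here is a minimal sketch of roughly what that looks like from the lmdb-js side. The `overlappingSync` option and the `flushed` promise reflect my reading of the lmdb-js docs and may differ between versions, so treat this as illustrative rather than authoritative:

```js
// Illustrative only: how the overlapping-sync commit model surfaces in
// lmdb-js (option/property names per my reading of the lmdb-js docs).
const { open } = require('lmdb');

const db = open({
  path: './cache.lmdb',
  // lmdb-js v2.2 enabled this behavior by default; shown explicitly here.
  overlappingSync: true,
});

async function writeEntry() {
  // Resolves once the transaction is committed; the caller can proceed
  // even though the OS-cached data has not been synced to disk yet.
  await db.put('some-key', { hello: 'world' });

  // Resolves later, when the asynchronous disk flush has completed. It is
  // this second, background task that can outlive a terminated worker
  // thread in the scenario described below.
  await db.flushed;
}

writeEntry().then(() => console.log('committed and flushed'));
```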
Initially I had assumed there must be some memory handling fault in this new async flush mechanism that was corrupting memory and leading to these segfaults. Many rabbit trails spent verifying memory handling before the segfault showed no problems; everything was solid. Eventually I realized that there was no prior memory corruption: the error was occurring exactly where the stack trace you (@LekoArts) reported said it was occurring 😱!
This stack trace shows that the segfault occurs while creating a V8 handle/scope. Why would that segfault? This goes deep into how NodeJS handles async tasks in worker threads. When a write transaction in LMDB is completed, a second task goes into NodeJS/libuv’s task queue to flush to disk. In the meantime, since the transaction is committed, Parcel can (rightly) declare the job done and ask to terminate the threads.

Thread termination is a pretty perilous and complicated action, though. It is not like terminating a process, where the OS knows exactly what the process owns and can automatically clean it all up; thread termination requires application-level cooperation, and in the case of NodeJS there is a specific procedure for what it will stop doing and what it won’t. NodeJS’s conception of thread termination is that it will finish executing its current JS task(s), then end and free the V8 isolate associated with the worker thread, but it does not wait for pending tasks in the task queue to finish. However, those queued tasks still continue to execute, since they are part of libuv’s shared worker pool. Consequently, when one of these tasks completes (specifically the disk-flush task), it queues up its completion callback, but that completion (JS) callback is set to run against a V8 isolate that no longer exists (it has been freed), which leads to the segmentation fault. This seems like a NAN bug, in that it attempts to call the callback regardless of the associated isolate’s state.
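To make the ordering concrete, here is a small self-contained analogy that uses Node’s built-in `fs` module as a stand-in for lmdb’s disk-flush task. It is only an analogy: Node’s own bindings tolerate a terminated isolate, whereas the NAN code path described above invoked the completion callback anyway, which is where the crash comes from.

```js
// Analogy only: the ordering that bites lmdb-js, sketched with Node's
// fs module standing in for the asynchronous disk-flush task.
const { Worker, isMainThread, parentPort } = require('node:worker_threads');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on('message', () => {
    // From the caller's perspective the "transaction" is committed, so the
    // pool tears the thread down, analogous to Parcel ending its worker
    // farm while lmdb's flush task is still sitting in libuv's queue.
    worker.terminate();
  });
} else {
  // Queue work on libuv's shared threadpool (stand-in for the flush).
  fs.writeFile(path.join(os.tmpdir(), 'flush-demo.txt'), 'data', () => {
    // This completion callback needs the worker's V8 isolate. Node's own
    // bindings handle the isolate being gone; a NAN-based addon that
    // invokes the callback against the freed isolate is what segfaults.
  });
  // Report "done" before the flush has actually landed on disk.
  parentPort.postMessage('committed');
}
```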
So what can be done about this? The most direct solution would be to override the NAN functions so the callback is not called once the worker thread has been terminated (there is also a persistent handle that has to be nulled out), and this does actually seem to prevent the segfault in the provided test case. However, this solution does not seem to be foolproof; if the task runs long enough, not only does it extend beyond the life of the V8 isolate, but the thread-termination procedure that shuts down the libuv event loop will sometimes crash, reporting that there are open libuv handles. More research is needed, but using NAN’s async tasks just doesn’t seem capable of working well with thread termination. However, for lmdb@v2.3 I have been working on porting all the code from NAN to NAPI (which has a more stable API and requires distributing far fewer binaries), and this seems like an appropriate place to replace the NAN async tasks with direct NAPI-based async tasks that will hopefully work better.
As for the v2.2.x line, I have simply turned the new overlapping sync option off by default in worker threads. This is a temporary measure; I certainly hope to fully enable this by default in the future, but only after ensuring that the async tasks can really work reliably in conjunction with thread termination.
I have been able to reproduce this now (with @artnez’s repo) and am debugging it, so I am hopefully narrowing in on the cause/fix.
Try adding `PARCEL_WORKERS=0` to your commands; it will probably do everything serially. If your project is large, you might have better luck with `PARCEL_WORKER_BACKEND=process`, so that you still get some multiprocessing (a sketch showing both variables follows below).
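A hypothetical wrapper script, in case it helps to see where those variables go; only `PARCEL_WORKERS` and `PARCEL_WORKER_BACKEND` come from this thread, the wrapper itself and its file name are just an illustration:

```js
// build-serial.js (hypothetical wrapper): run the same Parcel build, but
// force the worker farm off per the workaround suggested above.
const { spawnSync } = require('node:child_process');

const result = spawnSync('npx', ['parcel', 'build', '--no-source-maps'], {
  stdio: 'inherit',
  env: {
    ...process.env,
    PARCEL_WORKERS: '0',              // do everything serially, in-process
    // PARCEL_WORKER_BACKEND: 'process', // alternative: subprocess workers
  },
});

process.exit(result.status ?? 1);
```

This is equivalent to prefixing the npm script invocation, e.g. `PARCEL_WORKERS=0 npm run build`.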
Going to close this issue since it appears to be fixed by newer lmdb. If you are still seeing it, make sure your lock file has been updated to lmdb 2.2.2.
I just tested with https://github.com/DoctorEvidence/lmdb-js/commit/544b3fda402f24a70a0e946921e4c9134c5adf85 and I was still getting the segfault. My test environment is an M1 Mac (ARM64) on macOS Monterey 12.2.1.
I’m also seeing this on my M1 MacBook Pro (but other colleagues are not on an M1 yet), and here’s a reproduction: https://github.com/LekoArts/parcel-segfault-repro
It also has a segfault log from `segfault-handler` (registration sketch below).
edit: I changed the repo to only use the Parcel JS API, not the whole Gatsby process. The old repro is still accessible on the `gatsby-version` branch.
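For anyone who wants to capture a similar native stack trace, registering `segfault-handler` at the very top of the entry script is enough. This follows the package’s documented usage, but double-check against its README for the version you install:

```js
// Capture a native stack trace on SIGSEGV, as in the repro above.
// This follows segfault-handler's documented usage; check its README.
const SegfaultHandler = require('segfault-handler');

// Writes a crash report (with the native stack) to crash.log in the cwd.
SegfaultHandler.registerHandler('crash.log');

// ...then start the build (e.g. via the Parcel JS API) as usual.
```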
I’ve been dealing with this issue as well. Here is a minimal repro case for testing: https://github.com/artnez/parcel-segfault-repro
It definitely has something to do with multithreading, because `PARCEL_WORKER_BACKEND=process` (switching to subprocess workers) fixes it. The trace also indicates as much.

Thanks, this worked for me. Is `PARCEL_WORKERS=0` discussed in the documentation?