# node-addon-api: Segmentation faults - Inconsistent errors with lots of async workers and objects
Hello,
I’m working on a pull request for a project that uses node-addon-api. When I create a lot of objects via `AsyncWorker`, I get segmentation faults.

I traced some of them down to creating a fresh `Napi::Env` via `info.Env()` versus using the persistent object’s `Env()` call. Is there a difference here, or a reason why I can’t do that?
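For context, the two patterns being compared look roughly like this (a hypothetical fragment, not the project’s actual code). An `Env` from `info.Env()` is only guaranteed valid for the duration of that callback, and no `Napi::Env` may be used from a worker thread at all; JS values should only be created back on the main thread in `OnOK()` / `OnError()`:

```cpp
// Safe: env comes from the current JS-invoked callback.
Napi::Value SomeMethod(const Napi::CallbackInfo& info) {
  Napi::Env env = info.Env();            // valid only during this call
  return Napi::String::New(env, "ok");
}

// Unsafe territory: AsyncWorker::Execute() runs on a worker thread.
// Touching any Napi::Env there (cached from the wrapped object or
// otherwise) can corrupt V8 handles; do JS work in OnOK()/OnError(),
// which run on the main thread with a valid env.
void Execute() /* override */ {
  // ... pure C++ only, no Napi::Env usage ...
}
```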
I’m now running into other issues. These problems only occur after more than 10,000 objects have been created and destroyed.

This occurs when I run `makeLargeZim.ts`, which calls `creator.addItem`. `addItem` creates an `AsyncWorker` that stores a promise which is resolved once the libzim call completes. This is done because libzim waits until other threads within its library have finished their queued work, which would block the Node.js / libuv main event thread.

This works fine most of the time with 10, 100, 1,000, or 10,000 objects, but when I try ~1,000,000 I get an error. It’s not a memory issue; there is plenty of RAM. Any guidance would be helpful. Thank you.
The backtrace from gdb is below (truncated):

```
#0  0x0000555555edbe10 in v8::internal::GlobalHandles::Create(v8::internal::Object) ()
#1  0x0000555555d7aee4 in v8::V8::GlobalizeReference(v8::internal::Isolate*, unsigned long*) ()
#2  0x0000555555b2c171 in v8impl::Reference::New(napi_env__*, v8::Local<v8::Value>, unsigned int, bool, void (*)(napi_env__*, void*, void*), void*, void*) ()
#3  0x0000555555b3356c in napi_wrap ()
#4  0x00007fffe674e91a in Napi::ObjectWrap<Blob>::ObjectWrap (this=0x555557f48400, callbackInfo=...) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:4151
#5  0x00007fffe67341fe in Blob::Blob (this=0x555557f48400, info=...) at ../src/blob.h:17
#6  0x00007fffe67737ca in Napi::ObjectWrap<Blob>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#2}::operator()() const (__closure=0x7fffffff8550) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:4519
#7  0x00007fffe6789e54 in Napi::details::WrapCallback<Napi::ObjectWrap<Blob>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#2}>(Napi::ObjectWrap<Blob>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#2}) (callback=...) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:79
#8  0x00007fffe6773917 in Napi::ObjectWrap<Blob>::ConstructorCallbackWrapper (env=0x55555764dfb0, info=0x7fffffff8620) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:4517
#9  0x0000555555b2b9df in ?? ()
#10 0x0000555555dd7738 in v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) ()
#11 0x0000555555dd7c08 in ?? ()
#12 0x0000555555dd8611 in v8::internal::Builtins::InvokeApiFunction(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::HeapObject>) ()
#13 0x0000555555e9e618 in ?? ()
#14 0x0000555555e9f004 in v8::internal::Execution::New(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) ()
#15 0x0000555555d99f49 in v8::Function::NewInstanceWithSideEffectType(v8::Local<v8::Context>, int, v8::Local<v8::Value>*, v8::SideEffectType) const ()
#16 0x0000555555b34e63 in napi_new_instance ()
#17 0x00007fffe673011a in Napi::Function::New (this=0x7fffffff8e00, argc=1, args=0x7fffffff8e90) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:2384
#18 0x00007fffe67300c7 in Napi::Function::New (this=0x7fffffff8e00, args=...) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:2373
#19 0x00007fffe673175e in Napi::FunctionReference::New (this=0x5555579e6de8, args=...) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:3408
#20 0x00007fffe67346fd in Blob::New (env=..., blob=...) at ../src/blob.h:47
#21 0x00007fffe673d2b6 in StringProvider::feed (this=0x555557fa2ac0, info=...) at ../src/contentProvider.h:152
#22 0x00007fffe6724153 in operator() (__closure=0x7fffffff9010) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:170
#23 0x00007fffe672ae2f in Napi::details::WrapCallback<Napi::details::TemplatedInstanceCallback<StringProvider, &StringProvider::feed>(napi_env, napi_callback_info)::<lambda()> >(struct {...}) (callback=...) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:79
#24 0x00007fffe67241e3 in Napi::details::TemplatedInstanceCallback<StringProvider, &StringProvider::feed> (env=0x55555764dfb0, info=0x7fffffff90a0) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:167
#25 0x0000555555b2b9df in ?? ()
#26 0x0000555555dd7738 in v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) ()
#27 0x0000555555dd7e32 in ?? ()
#28 0x0000555555dd8552 in v8::internal::Builtins::InvokeApiFunction(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::HeapObject>) ()
#29 0x0000555555e9e618 in ?? ()
#30 0x0000555555e9ee12 in v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) ()
#31 0x0000555555d9a469 in v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) ()
#32 0x0000555555b31209 in napi_call_function ()
#33 0x00007fffe672ffeb in Napi::Function::Call (this=0x7fffffff98d0, recv=0x555557565a38, argc=0, args=0x0) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:2339
#34 0x00007fffe672feee in Napi::Function::Call (this=0x7fffffff98d0, recv=0x555557565a38, args=...) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:2306
#35 0x00007fffe673c9a8 in ContentProviderWrapper::feed()::{lambda(Napi::Env, Napi::Function)#1}::operator()(Napi::Env, Napi::Function) const (__closure=0x7fffcc4b83a0, env=...) at ../src/contentProvider.h:74
#36 0x00007fffe67a27de in std::__invoke_impl<void, ContentProviderWrapper::feed()::{lambda(Napi::Env, Napi::Function)#1}&, Napi::Env, Napi::Function>(std::__invoke_other, ContentProviderWrapper::feed()::{lambda(Napi::Env, Napi::Function)#1}&, Napi::Env&&, Napi::Function&&) (__f=...) at /usr/include/c++/11.2.0/bits/invoke.h:61
#37 0x00007fffe678f69c in std::__invoke_r<void, ContentProviderWrapper::feed()::{lambda(Napi::Env, Napi::Function)#1}&, Napi::Env, Napi::Function>(ContentProviderWrapper::feed()::{lambda(Napi::Env, Napi::Function)#1}&, Napi::Env&&, Napi::Function&&) (__fn=...) at /usr/include/c++/11.2.0/bits/invoke.h:111
#38 0x00007fffe67783e3 in std::_Function_handler<void (Napi::Env, Napi::Function), ContentProviderWrapper::feed()::{lambda(Napi::Env, Napi::Function)#1}>::_M_invoke(std::_Any_data const&, Napi::Env&&, Napi::Function&&) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/11.2.0/bits/std_function.h:291
#39 0x00007fffe674cf89 in std::function<void (Napi::Env, Napi::Function)>::operator()(Napi::Env, Napi::Function) const (this=0x7fffcc4b83a0, __args#0=..., __args#1=...) at /usr/include/c++/11.2.0/bits/std_function.h:560
#40 0x00007fffe6732c30 in Napi::ThreadSafeFunction::CallJS (env=0x55555764dfb0, jsCallback=0x555557565a28, data=0x7fffcc4b83a0) at ./code/node-libzim/node_modules/node-addon-api/napi-inl.h:5780
#41 0x0000555555b4adae in ?? ()
#42 0x00007ffff7f3d92d in ?? () from /usr/lib/libuv.so.1
#43 0x00007ffff7f59d0e in ?? () from /usr/lib/libuv.so.1
#44 0x00007ffff7f43438 in uv_run () from /usr/lib/libuv.so.1
```
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 44 (34 by maintainers)
@KevinEady Thank you. Does this Valgrind report give you any hints?
We discussed in the 10 June Node-API meeting trying to make a standalone reproduction of this problem that would be a little easier to debug / troubleshoot. If we mimic the several-thousand creation of `AsyncWorker`s in our standalone test and do not see the crash, we can (hopefully) assume there’s something incorrect in your code that’s causing it.

I will continue working on this standalone reproduction in order to try to get a similar crash or rule out core issues.
AFAIK after talking to @kelvinhammond, the bug/problem is still there.
@kelvinhammond You can join through https://join.slack.com/t/kiwixoffline/shared_invite/zt-19s7tsi68-xlgHdmDr5c6MJ7uFmJuBkg
Hi @kelvinhammond, @kelson42,

I’m not too sure about the internal workings of the addon or libzim, but looking at the stack trace it seems like a use-after-free error. The addon’s `Blob` is freed through the garbage collector, running its destructor and decreasing the refcount of the `std::shared_ptr<zim::Blob> blob_`. It also appears that the shared_ptr’s `zim::Blob` contents are being directly returned here. Since this doesn’t make a copy of the shared pointer, is it possible that the `zim::Blob` ends up being freed before libzim has had a chance to act on it? Just a guess here.

I’m running valgrind again with a higher `--num-callers=50` argument to try to get more information in the use-after-free stack. I’m getting some different – yet similar – valgrind warnings:

So it seems like there is some invalid thread safety regarding the `Blob` on the Node thread and the task runner on the thread created by libzim.

Does this provide any information / clues?

EDIT: Here’s a link to the latest action run: https://github.com/KevinEady/node-libzim/actions/runs/4128364660/jobs/7132716816
So, the latest update I have: running with `valgrind` (action) displays some memory read errors but does complete successfully, whereas the non-valgrind (action) run gives the segfault.

The valgrind warnings are of this fashion:

Not sure if these memory warnings could cause the behavior witnessed with the segfaulting.
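For anyone reproducing this, the valgrind invocation looks roughly like the following (the script path is illustrative, not the project’s actual entry point); `--num-callers=50` deepens the recorded stacks and `--track-origins=yes` reports where bad values originated:

```
valgrind --num-callers=50 --track-origins=yes \
  node ./dist/test/makeLargeZim.js
```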
@KevinEady Yes, the code that triggers it is commented out in the `test-mem-leak` script. This is still an issue.

Sorry, forgot to answer this question. We merged, but we are at exactly the same stage and have opened a dedicated ticket at https://github.com/openzim/node-libzim/issues/80
I have been pulling my hair out trying to understand what’s going on. In a previous meeting there was a suggestion that maybe there is some buffer overflow in the stack / heap, but I’m really not sure how to verify this.

I just wanted to see what would happen on a later version of Node, and using 18.4.0 I get different stack trace errors, e.g.:

Unfortunately I had to wipe my machine and lost my debug build of Node 16.15.0… I am going to build a debug version of 18.14.0 and see if there are any significant changes.
@KevinEady v16.15.0