bluebird: Segmentation Fault with Bluebird 3.6/3.7 and Node 10

TL;DR – We’re seeing an intermittent segmentation fault with Bluebird in one of our tap-based test suites, and have a reproduction over here.

  1. What version of bluebird is the issue happening on?

3.6 and 3.7

  2. What platform and version? (For example Node.js 0.12 or Google Chrome 32)

Node.js version 10 on macOS 10 – it does not appear to happen on Node 12. Other platforms not tested.

  3. Did this issue happen with an earlier version of bluebird?

It does not happen with Bluebird 3.5.

Details

One of the tests in our test suite has recently been failing randomly with a segmentation fault. We found some time to track it down, and it seems to have started happening with Bluebird 3.6 (maybe the async_hooks stuff?). The segmentation fault doesn’t happen on every run – if you check out this repository we’ve set up a small “run until fail” reproduction case. The code in the repro is significantly stripped down from the code in our actual test suite.
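For reference, a “run until fail” wrapper can be a very small script – the sketch below is an assumption on our part (the real script is in the linked repository), but it shows the idea: re-run the given command until it exits non-zero (a segfault surfaces as exit code 139, i.e. 128 + SIGSEGV).

#!/bin/bash
# Minimal "run until fail" sketch (assumed; see the repo for the actual
# script): keep re-running the given command until it fails.
while true; do
  "$@"
  status=$?
  if [ $status -ne 0 ]; then
    echo "command failed with exit code $status"
    exit $status
  fi
done

Usage would be something like: ./run-until-fail.sh node methods.tap.js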

It’s a very “spooky action at a distance” style bug – sometimes the test runs fine, sometimes it fails with something like the following:

TAP version 13
# Subtest: bluebird
    # Subtest: new Promise() throw
        1..0
    ok 1 - new Promise() throw # time=4.55ms

./run-until-fail.sh: line 3: 36728 Segmentation fault: 11  $@

We also captured the crash in lldb with a debug build of Node 10.

(lldb) r methods.tap.js
Process 36969 launched: '/Users/astorm/bin/node-debug' (x86_64)
TAP version 13
# Subtest: bluebird
    # Subtest: new Promise() throw
        1..0
    ok 1 - new Promise() throw # time=63.303ms

Process 36969 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000101cb25de node-debug`heap_insert(heap=0x00000001035094d8, newnode=0x0000000105c11498, less_than=(node-debug`timer_less_than at timer.c:39)) at heap-inl.h:140
   137 	
   138 	  /* Insert the new node. */
   139 	  newnode->parent = *parent;
-> 140 	  *child = newnode;
   141 	  heap->nelts += 1;
   142 	
   143 	  /* Walk up the tree and check at each node if the heap property holds.
Target 0: (node-debug) stopped.

The full backtrace from the same session follows:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000101cb25de node-debug`heap_insert(heap=0x00000001035094d8, newnode=0x0000000105c0ba48, less_than=(node-debug`timer_less_than at timer.c:39)) at heap-inl.h:140
    frame #1: 0x0000000101cb23b4 node-debug`uv_timer_start(handle=0x0000000105c0b9e0, cb=(node-debug`node::(anonymous namespace)::TimerWrap::OnTimeout(uv_timer_s*) at timer_wrap.cc:134), timeout=1, repeat=0) at timer.c:93
    frame #2: 0x00000001002f2af1 node-debug`node::(anonymous namespace)::TimerWrap::Start(args=0x00007ffeefbfc670) at timer_wrap.cc:120
    frame #3: 0x00000001007f4e21 node-debug`v8::internal::FunctionCallbackArguments::Call(this=0x00007ffeefbfc7e0, handler=0x00003c3547afc889) at api-arguments-inl.h:94
    frame #4: 0x00000001007f2aae node-debug`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(isolate=0x0000000106004c00, function=Handle<v8::internal::HeapObject> @ 0x00007ffeefbfc8f0, new_target=Handle<v8::internal::HeapObject> @ 0x00007ffeefbfc8e8, fun_data=Handle<v8::internal::FunctionTemplateInfo> @ 0x00007ffeefbfc8e0, receiver=Handle<v8::internal::Object> @ 0x00007ffeefbfc8d8, args=BuiltinArguments @ 0x00007ffeefbfc910) at builtins-api.cc:109
    frame #5: 0x00000001007f111a node-debug`v8::internal::Builtin_Impl_HandleApiCall(args=BuiltinArguments @ 0x00007ffeefbfca40, isolate=0x0000000106004c00) at builtins-api.cc:139
    frame #6: 0x00000001007f0c4d node-debug`v8::internal::Builtin_HandleApiCall(args_length=6, args_object=0x00007ffeefbfcb28, isolate=0x0000000106004c00) at builtins-api.cc:127
    frame #7: 0x000022036e6c1d64
    frame #8: 0x000022036e5a0ab6
    frame #9: 0x000022036e59336f
    frame #10: 0x000022036e719b42
    frame #11: 0x000022036e5a0ab6
    frame #12: 0x000022036e58f303
    frame #13: 0x000022036e5a0ab6
    frame #14: 0x000022036e5a0ab6
    frame #15: 0x000022036e58f303
    frame #16: 0x000022036e5a0ab6
    frame #17: 0x000022036e5a0ab6
    frame #18: 0x000022036e5a0ab6
    frame #19: 0x000022036e5a0ab6
    frame #20: 0x000022036e5a0ab6
    frame #21: 0x000022036e5a0ab6
    frame #22: 0x000022036e58f303
    frame #23: 0x000022036e5a0ab6
    frame #24: 0x000022036e5a0ab6
    frame #25: 0x000022036e5a0ab6
    frame #26: 0x000022036e58f303
    frame #27: 0x000022036e5a0ab6
    frame #28: 0x000022036e5a0ab6
    frame #29: 0x000022036e5a0ab6
    frame #30: 0x000022036e58f303
    frame #31: 0x000022036e5a0ab6
    frame #32: 0x000022036e5a0ab6
    frame #33: 0x000022036e5a0ab6
    frame #34: 0x000022036e5a0ab6
    frame #35: 0x000022036e5a0ab6
    frame #36: 0x000022036e58f303
    frame #37: 0x000022036e5a0ab6
    frame #38: 0x000022036e5a0ab6
    frame #39: 0x000022036e5a0ab6
    frame #40: 0x000022036e58f303
    frame #41: 0x000022036e5a0ab6
    frame #42: 0x000022036e5a0ab6
    frame #43: 0x000022036e5a0ab6
    frame #44: 0x000022036e5a0ab6
    frame #45: 0x000022036e5a0ab6
    frame #46: 0x000022036e58f303
    frame #47: 0x000022036e5a0ab6
    frame #48: 0x000022036e5a0ab6
    frame #49: 0x000022036e597015
    frame #50: 0x000022036e58cfa1
    frame #51: 0x00000001010f11bd node-debug`v8::internal::GeneratedCode<v8::internal::Object*, v8::internal::Object*, v8::internal::Object*, v8::internal::Object*, int, v8::internal::Object***>::Call(this=0x00007ffeefbfdf88, args=0x00003c35e57826f1, args=0x00003c35bea07191, args=0x00003c353ee9f7a1, args=1, args=0x00007ffeefbfe740) at simulator.h:113
    frame #52: 0x00000001010eebe6 node-debug`v8::internal::(anonymous namespace)::Invoke(isolate=0x0000000106004c00, is_construct=false, target=Handle<v8::internal::Object> @ 0x00007ffeefbfe0a0, receiver=Handle<v8::internal::Object> @ 0x00007ffeefbfe098, argc=1, args=0x00007ffeefbfe740, new_target=Handle<v8::internal::Object> @ 0x00007ffeefbfe090, message_handling=kReport, execution_target=kCallable) at execution.cc:155
    frame #53: 0x00000001010ee439 node-debug`v8::internal::(anonymous namespace)::CallInternal(isolate=0x0000000106004c00, callable=Handle<v8::internal::Object> @ 0x00007ffeefbfe170, receiver=Handle<v8::internal::Object> @ 0x00007ffeefbfe168, argc=1, argv=0x00007ffeefbfe740, message_handling=kReport, target=kCallable) at execution.cc:191
    frame #54: 0x00000001010ee2cd node-debug`v8::internal::Execution::Call(isolate=0x0000000106004c00, callable=Handle<v8::internal::Object> @ 0x00007ffeefbfe1d0, receiver=Handle<v8::internal::Object> @ 0x00007ffeefbfe1c8, argc=1, argv=0x00007ffeefbfe740) at execution.cc:202
    frame #55: 0x0000000100683828 node-debug`v8::Function::Call(this=0x0000000107050640, context=(val_ = 0x000000010704f700), recv=(val_ = 0x0000000106021c40), argc=1, argv=0x00007ffeefbfe740) at api.cc:5206
    frame #56: 0x00000001000cbd0a node-debug`node::InternalMakeCallback(env=0x00007ffeefbfe9f8, recv=(val_ = 0x0000000106021c40), callback=(val_ = 0x0000000107050640), argc=1, argv=0x00007ffeefbfe740, asyncContext=(async_id = 35, trigger_async_id = 31)) at node.cc:672
    frame #57: 0x00000001000158b9 node-debug`node::AsyncWrap::MakeCallback(this=0x0000000105f020f0, cb=(val_ = 0x0000000107050640), argc=1, argv=0x00007ffeefbfe740) at async_wrap.cc:683
    frame #58: 0x00000001002f320c node-debug`node::(anonymous namespace)::TimerWrap::OnTimeout(handle=0x0000000105f02140) at timer_wrap.cc:143
    frame #59: 0x0000000101cb2c4f node-debug`uv__run_timers(loop=0x0000000103509220) at timer.c:174
    frame #60: 0x0000000101cba014 node-debug`uv_run(loop=0x0000000103509220, mode=UV_RUN_DEFAULT) at core.c:355
    frame #61: 0x00000001000e1bbe node-debug`node::Start(isolate=0x0000000106004c00, isolate_data=0x000000010601e200, args=size=2, exec_args=size=0) at node.cc:2895
    frame #62: 0x00000001000dcb59 node-debug`node::Start(event_loop=0x0000000103509220, args=size=2, exec_args=size=0) at node.cc:2994
    frame #63: 0x00000001000dbf4f node-debug`node::Start(argc=2, argv=0x0000000105f00000) at node.cc:3053
    frame #64: 0x0000000101f9015e node-debug`main(argc=2, argv=0x00007ffeefbff888) at node_main.cc:124
    frame #65: 0x0000000100001034 node-debug`start + 52

Most upvoted comments

@astormnewrelic As far as I know, the core dump doesn’t contain symbols, which are necessary to figure out what code actually corresponds to which functions in the original binary.

I might be wrong, but for me, running bt inside lldb -c core.30607 gives only the raw function pointers, whereas the backtrace for lldb build/node/out/Debug/node -c core.30607 gives the actual corresponding function names (and line numbers) 🙂
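Concretely, that difference looks like this (paths and core file name are taken from the example above, not prescriptive):

# Without the matching binary: frames show up as raw addresses only.
lldb -c core.30607
(lldb) bt

# With the debug build that produced the core: frames resolve to function
# names and line numbers.
lldb build/node/out/Debug/node -c core.30607
(lldb) bt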

Just to be clear, the OpenSSL issue really doesn’t cause anything besides valgrind warnings. It’s unrelated.

@benjamingr That still shouldn’t be an issue, right? Like, if calling timer.refresh() makes the process crash, that’s not “just” a bluebird issue?

Right, this still looks like a bug in core and I am still not sure why it happens. Why would OpenSSL even be related to timer.refresh() causing a segfault?

@AndreMaz thanks for the update and for the thorough debugging! It is very appreciated.

@benjamingr after removing the timer.refresh() call I got zero segfaults 😄 Everything seems to be working properly.

@addaleax you might be right regarding this being unrelated to async_hooks

The changes also include https://github.com/petkaantonov/bluebird/commit/60ef7a0e23fd320a11281f67c64a39ff95612ce9#diff-d1f5ad3087cc94d3dc6651e3265219b7R59, which calls timer.refresh().
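For context, timer.refresh() (added in Node 10.2) re-arms an existing timeout handle instead of allocating a new one. A minimal sketch of that pattern – hypothetical, not bluebird’s actual scheduler code – looks something like:

// Hypothetical sketch: reuse one timeout handle via timer.refresh()
// instead of calling setTimeout() for every scheduled tick.
let queued = [];
let timer = null;

function schedule(fn) {
  queued.push(fn);
  if (timer === null) {
    timer = setTimeout(drain, 0);
  } else {
    // Re-arms the existing libuv timer – the uv_timer_start() call at the
    // top of the crash stack is reached through here.
    timer.refresh();
  }
}

function drain() {
  const jobs = queued;
  queued = [];
  for (const fn of jobs) fn();
}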

@AndreMaz can you see if removing the timer.refresh() call there fixes the issue?

Hey @addaleax, my Linux machine is at work, so I’ll do it tomorrow.

Just more talking out loud to myself – people on holiday should not respond 😉

I’m going to give the following script a try with git bisect run between v10.1 and v10.2. I’ve got things screened off in a medium EC2 instance where builds are taking around an hour, and we’re in roughly 7-step territory between bisects.

We’ll see what falls over in the (PST/Portland) morning 😃

#!/bin/bash
# git bisect run script: exit 125 tells bisect to skip a revision that
# fails to build; exit 1 marks it bad; falling off the end exits 0 (good).
make clean || exit 125
./configure || exit 125
make -j4 || exit 125

# Keep a copy of each tested binary, named after the commit, for later.
cp out/Release/node ~/binaries/`git rev-parse HEAD`

# The crash is intermittent, so run the repro 100 times before trusting
# a "good" verdict.
for i in `seq 100`; do
  ./out/Release/node /home/ec2-user/repro-bluebird-segfault/methods.tap.js
  if [ $? -ne 0 ]
  then
    exit 1
  fi
done
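
Invocation would then be roughly as follows (exact tag names and script name assumed here; git bisect start takes the bad revision first):

# Mark v10.2.0 bad and v10.1.0 good, then let git drive the script above
# through the ~7 candidate revisions.
git bisect start v10.2.0 v10.1.0
git bisect run ./bisect-test.sh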

Thanks @benjamingr – I’ll see what I can do w/r/t a bisect – although I’ve never done it on the node source tree before with a long C++ build in there, so we’ll see how things go 😃. If anyone reading has known science for setting something like that up, links/info would be appreciated.

This looks like a Node.js core bug in v10, most likely involving async_hooks. Thanks a lot for the repro!