rippled: Segmentation fault in txs_iter_impl function (Version: 1.6.0)

Issue Description

We’re running a full-history rippled 1.6.0 server, and multiple processes call the getLedger API against it, each incrementing the ledger version (block number) between calls. At random, rippled crashes with a segmentation fault. This is potentially a very dangerous issue.

We suspect it is caused by incorrect handling of concurrent memory access.

Steps to Reproduce

Here is how we call the API from JavaScript:

api.getLedger({
  ledgerVersion: number, // incremented on each call
  includeTransactions: true,
  includeAllData: true,
}).then((res) => {});
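
For anyone trying to reproduce this, the load pattern described above could be approximated with a sketch like the one below. It is not the script we actually ran: the helper name `hammerGetLedger` and its parameters are hypothetical, it assumes `api` is an already-connected client exposing the same `getLedger` call shown above, and it uses concurrent loops inside one process rather than separate processes.

```javascript
// Hypothetical helper (not from the original report): requests every ledger
// version in [start, end] using `workers` concurrent loops that share one
// counter, so several getLedger calls are in flight at the same time.
async function hammerGetLedger(api, start, end, workers) {
  let next = start;   // next ledger version to request
  let completed = 0;  // number of calls that resolved
  const errors = [];  // calls that rejected (for example, if the node went away)

  async function worker() {
    while (next <= end) {
      const ledgerVersion = next++;
      try {
        await api.getLedger({
          ledgerVersion,
          includeTransactions: true,
          includeAllData: true,
        });
        completed++;
      } catch (err) {
        errors.push({ ledgerVersion, message: String((err && err.message) || err) });
      }
    }
  }

  // Start all loops and wait until every one of them has drained the range.
  await Promise.all(Array.from({ length: workers }, () => worker()));
  return { completed, errors };
}
```

A call such as `hammerGetLedger(api, firstLedger, lastLedger, 8)` would keep up to eight requests in flight; the range and worker count here are placeholders, not the values from our setup.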

Expected Result

We expect the node not to crash.

Actual Result

Here is the complete backtrace obtained with gdb:

(gdb) bt
#0  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (__r=..., this=0x7ffd3ca78b00) at /usr/include/c++/7/bits/shared_ptr_base.h:691
#1  std::__shared_ptr<ripple::SHAMapAbstractNode, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (this=0x7ffd3ca78af8) at /usr/include/c++/7/bits/shared_ptr_base.h:1121
#2  std::shared_ptr<ripple::SHAMapAbstractNode>::shared_ptr (this=<optimized out>) at /usr/include/c++/7/bits/shared_ptr.h:119
#3  std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>::pair (this=<optimized out>) at /usr/include/c++/7/bits/stl_pair.h:303
#4  std::_Construct<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> const&> (__p=<optimized out>)
    at /usr/include/c++/7/bits/stl_construct.h:75
#5  std::__uninitialized_copy<false>::__uninit_copy<std::_Deque_iterator<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> const&, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> const*>, std::_Deque_iterator<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>&, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>*> > (__result=..., __first=..., __last=...)
    at /usr/include/c++/7/bits/stl_uninitialized.h:83
#6  std::uninitialized_copy<std::_Deque_iterator<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> const&, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> const*>, std::_Deque_iterator<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>&, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>*> > (__result=..., __first=..., __last=...) at /usr/include/c++/7/bits/stl_uninitialized.h:134
#7  std::__uninitialized_copy_a<std::_Deque_iterator<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> const&, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> const*>, std::_Deque_iterator<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>&, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>*>, std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> > (__result=...,
    __first=..., __last=...) at /usr/include/c++/7/bits/stl_uninitialized.h:289
#8  std::deque<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::allocator<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> > >::deque (this=0x7ffd3c3d4530,
    __x=...) at /usr/include/c++/7/bits/stl_deque.h:950
#9  0x00005555571156d5 in std::stack<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::deque<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID>, std::allocator<std::pair<std::shared_ptr<ripple::SHAMapAbstractNode>, ripple::SHAMapNodeID> > > >::stack (this=0x7ffd3c3d4530) at /usr/include/c++/7/bits/stl_stack.h:99
#10 ripple::SHAMap::const_iterator::const_iterator (this=0x7ffd3c3d4530) at /root/rippled/src/ripple/shamap/SHAMap.h:535
#11 ripple::Ledger::txs_iter_impl::txs_iter_impl (this=0x7ffd3c3d4520) at /root/rippled/src/ripple/app/ledger/Ledger.cpp:135

Environment

Ubuntu 18.04.5 LTS, Intel® Xeon® CPU E5-2620 v3, 80 GB RAM, 20 TB storage (SSDs)

Supporting Files

rippled.cfg

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (5 by maintainers)

Most upvoted comments

If you built the binary yourself, the size difference may just be a result of the difference between a Debug & Release build, or a static & non-static build. I don’t know offhand which options are used to build the packages.

@cjcobb23 After compiling 1.7-b8, I tested it intensively for more than 30 minutes with aggressive parameters and I can’t reproduce the bug anymore. Well done guys!

@madshell There should be more to this stack trace. I don’t think frame 11 could ever possibly be the bottom of the stack. There should be some mention of the JobQueue at the very least, since the JobQueue is what calls the functions to handle the RPC. Any chance you can get a full stack trace for the thread that segfaults?

crash_rippled.txt