jormungandr 0.7.0: 'stack overflow - fatal runtime error' when /utxo REST API is queried
Describe the bug
The node crashes instantly when jcli rest v0 utxo get -h http://127.0.0.1:3101/api is queried. Tested on the private testnet, where lots of transactions were made. At first it simply returned nothing, but after a few thousand transactions it crashed with the following error:
Oct 24 01:42:41.423 INFO Starting jormungandr 0.7.0-alpha.dev.1 (HEAD-3af63d09+, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)], task: init
Oct 24 01:42:41.423 WARN Node started without path to the stored secret keys (not a stake pool or a BFT leader), task: init
Oct 24 01:42:41.423 INFO Starting 1 workers, scope: global
Oct 24 01:42:41.424 INFO Starting server on 127.0.0.1:3101, scope: global
Oct 24 01:42:41.424 INFO storing blockchain in '"/home/consuman/CARDANO/testnet_alt/storage/blocks.sqlite"', task: init
Oct 24 01:42:41.426 INFO fetching block c63a07f3e0db52c9abf886453316a1698d41a77023972c708345acf6645a8c0c, peer_address: 45.32.188.204:4000, block: c63a07f3e0db52c9abf886453316a1698d41a77023972c708345acf6645a8c0c, task: init
Oct 24 01:42:42.054 INFO initial bootstrap completed, peer_address: 45.32.188.204:4000, block: c63a07f3e0db52c9abf886453316a1698d41a77023972c708345acf6645a8c0c, task: init
Oct 24 01:42:42.140 INFO connecting to bootstrap peer 51.91.96.237:3100, peer_addr: 51.91.96.237:3100, task: bootstrap
Oct 24 01:42:54.629 ERRO Error occurred during request handling, status: 503 Service Unavailable Full REST context not available yet, scope: global
Oct 24 01:43:00.967 ERRO Error occurred during request handling, status: 503 Service Unavailable Full REST context not available yet, scope: global
Oct 24 01:43:07.976 INFO switching to new candidate branch, peer_addr: 51.91.96.237:3100, task: bootstrap
Oct 24 01:43:07.988 INFO initial bootstrap completed, peer_addr: 51.91.96.237:3100, task: bootstrap
Oct 24 01:43:07.988 INFO starting task, task: client-query
Oct 24 01:43:07.989 INFO adding P2P Topology module: trusted-peers, task: network
Oct 24 01:43:07.989 INFO start listening and accepting gRPC connections on 0.0.0.0:8855, task: network
Oct 24 01:43:07.989 INFO connecting to peer, node_id: 2da766df9d16682e45119194cbe5ccb92e03020625558e821201d6e319a37673, peer_addr: 51.91.96.237:3100, task: network
Oct 24 01:43:07.989 INFO connecting to peer, node_id: 7ae7730a9f19b8398e415c199d22035e48c4e220f75ea0a12870ad2ab309b7a0, peer_addr: 45.32.188.204:4000, task: network
Oct 24 01:43:07.989 INFO connecting to peer, node_id: e277491e0dfc7c23e51258cd84f66fc77b15fe9e1b71b8b3cb0ddce0b6de54b4, peer_addr: 35.235.111.217:9000, task: network
Oct 24 01:43:11.112 INFO incoming P2P connection on 138.201.181.94:8855, peer_addr: 35.235.111.217:39440, task: network
Oct 24 01:43:11.265 INFO client P2P connection closed, node_id: e277491e0dfc7c23e51258cd84f66fc77b15fe9e1b71b8b3cb0ddce0b6de54b4, peer_addr: 35.235.111.217:9000, task: network
Oct 24 01:43:13.556 INFO incoming P2P connection on 138.201.181.94:8855, peer_addr: 54.66.255.167:50069, task: network
Oct 24 01:43:13.581 INFO incoming P2P connection on 138.201.181.94:8855, peer_addr: 54.66.255.167:50068, task: network
Oct 24 01:43:14.106 INFO incoming P2P connection on 138.201.181.94:8855, peer_addr: 176.25.248.202:42848, task: network
Oct 24 01:43:14.106 INFO incoming P2P connection on 138.201.181.94:8855, peer_addr: 176.25.248.202:42846, task: network
Oct 24 01:43:14.179 INFO incoming P2P connection closed, peer_addr: 176.25.248.202:42848, task: network
Oct 24 01:43:14.673 INFO incoming P2P HTTP/2 connection error, reason: connection error: Connection reset by peer (os error 104), peer_addr: 54.66.255.167:50069, task: network
Oct 24 01:43:15.131 INFO incoming P2P connection on 138.201.181.94:8855, peer_addr: 94.195.82.31:43388, task: network
Oct 24 01:43:15.157 INFO incoming P2P connection on 138.201.181.94:8855, peer_addr: 94.195.82.31:43390, task: network
Oct 24 01:43:15.225 INFO incoming P2P connection closed, peer_addr: 94.195.82.31:43388, task: network
thread 'arbiter:f73a75ad-6a0a-40a2-8dbb-dbcd7a3e347c:actix-net-worker-0' has overflowed its stack
fatal runtime error: stack overflow
Mandatory Information
- jcli 0.7.0-alpha.dev.1 (master-e8833bce, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]
- jormungandr 0.7.0-alpha.dev.1 (master-e8833bce, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]
To Reproduce
Query jcli rest v0 utxo get -h http://127.0.0.1:3101/api on a network with a large number of transactions.
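For reference, a minimal sketch of the reproduction. The jcli invocation is the one from the report; the direct /api/v0/utxo path is an assumption about the v0 REST layout.
```sh
# Assumes a local node whose REST interface listens on 127.0.0.1:3101 and a
# chain that already contains a few thousand transactions.
jcli rest v0 utxo get -h http://127.0.0.1:3101/api

# Hypothetical direct equivalent against the REST endpoint:
curl http://127.0.0.1:3101/api/v0/utxo
```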
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 16 (13 by maintainers)
@consuman this endpoint should have been removed a long time ago. It is the least reliable way to obtain the current UTxO list. Better to use the explorer mode if you wish to get the whole list of UTxOs.
The documentation for running the explorer mode is here: https://input-output-hk.github.io/jormungandr/quickstart/04_explorer.html
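For context, a minimal sketch of switching to the explorer mode instead of the /utxo endpoint; the configuration keys follow the linked quickstart page and the GraphQL path is an assumption for this release.
```sh
# Enable the explorer in the node configuration (keys assumed from the quickstart).
cat >> node-config.yaml <<'EOF'
explorer:
  enabled: true
EOF

# Restart the node, then point a GraphQL client at the explorer endpoint
# (hypothetical path) instead of the /utxo REST endpoint:
#   http://127.0.0.1:3101/explorer/graphql
```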
@CodeSandwich Under the same scenario as before, there is NO Segmentation Fault anymore.
Clean one:
gdb-info_threads.txt gdb-bt_full.txt gdb-thread_apply_all_bt_full.txt
under lldb
@CodeSandwich, I got this running under gdb (will try lldb shortly). While I check the backtrace further, does this tell you anything?
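A sketch of how the attached backtraces were presumably captured, using standard gdb commands; the binary and config paths are assumptions.
```sh
# Run the node under gdb and reproduce the crash by querying /utxo.
gdb -ex run \
    --args ./jormungandr --config node-config.yaml
# Once the node traps on the stack overflow, at the (gdb) prompt:
#   info threads                -> gdb-info_threads.txt
#   bt full                     -> gdb-bt_full.txt
#   thread apply all bt full    -> gdb-thread_apply_all_bt_full.txt
```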
@NicolasDP The bug is still there and we have a chance to track it down. I'm 99% confident that it's not in the UTxO REST endpoint, because it does nothing except form a JSON response. While looking for the bug I was able to generate a 500 MB UTxO JSON; it took some time, but it was handled perfectly. This bug is somewhere deeper, which means that it's probably much more severe.
@CodeSandwich Sorry, I didn't mean a run-time Rust backtrace, but a postmortem from a core dump. LLDB, anyone?
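A postmortem sketch along those lines, assuming core dumps are enabled on the host and the binary still has its debug symbols; paths are placeholders.
```sh
ulimit -c unlimited                      # let the crashing process write a core file
./jormungandr --config node-config.yaml  # reproduce the 'fatal runtime error: stack overflow'
gdb ./jormungandr core                   # or: lldb ./jormungandr -c core
# at the (gdb) prompt: 'thread apply all bt full' to inspect every thread postmortem
```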