core: Postgres Sync halts at block 46631

Using the latest backend and Postgres with Hypersync set to false causes the backend to restart every few seconds. The last processed block is 39120. No exceptions spotted so far.

Part of the log:

backend | I0508 10:10:03.886061 1 config.go:168] SnapshotBlockHeightPeriod: 1000
backend | I0508 10:10:03.886365 1 config.go:172] ArchivalMode: ON
backend | I0508 10:10:03.886526 1 config.go:178] DisableSlowSync: OFF
backend | I0508 10:10:03.886771 1 config.go:198] READ ONLY MODE
backend | I0508 10:10:03.887080 1 config.go:206] IGNORING INBOUND INVS
backend | I0508 10:10:03.887215 1 config.go:209] Max Inbound Peers: 125
backend | I0508 10:10:03.887326 1 config.go:210] Protocol listening on port 17000
backend | I0508 10:10:03.887417 1 config.go:216] Rate Limit Feerate: 0
backend | I0508 10:10:03.887529 1 config.go:217] Min Feerate: 1000
backend | I0508 10:10:03.903783 1 node.go:126] Looking for AddIPs: 0
backend | I0508 10:10:03.903805 1 node.go:131] Looking for DNSSeeds: 10
backend | badger 2022/05/08 10:10:22 INFO: All 0 tables opened in 0s
backend | badger 2022/05/08 10:10:22 INFO: Discard stats nextEmptySlot: 0
backend | badger 2022/05/08 10:10:22 INFO: Set nextTxnTs to 78238
backend | badger 2022/05/08 10:10:22 INFO: Deleting empty file: /db/v-00000/badgerdb/000258.vlog
backend | Migrations already up to date
backend | I0508 10:10:24.588252 1 mempool.go:2262] Calling StartReadOnlyUtxoViewRegenerator…
backend | I0508 10:10:24.588354 1 server.go:2304] Server.Start: Starting Server
backend | I0508 10:10:24.589516 1 connection_manager.go:828] Full node socket initialized
backend | I0508 10:10:24.590674 1 node.go:40] GlobalState BadgerDB Dir: /db/v-00000/badgerdb/global_state
backend | I0508 10:10:24.590744 1 node.go:41] GlobalState BadgerDB ValueDir: /db/v-00000/badgerdb/global_state/badgerdb
backend exited with code 137
backend | I0508 10:10:28.736972 1 config.go:151] Logging to directory /db/v-00000
backend | I0508 10:10:28.738364 1 config.go:152] Running node in MAINNET mode
backend | I0508 10:10:28.738384 1 config.go:153] Data Directory: /db/v-00000

Also seeing some SSL errors with https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD:

backend | E0508 10:09:20.123602 1 base.go:142] GetBlockchainDotComExchangeRate: Problem with HTTP request https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD: Get "https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD": net/http: TLS handshake timeout Problem with HTTP request https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD: Get "https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD": net/http: TLS handshake timeout

Not sure what "exited with code 137" means.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 51 (46 by maintainers)

Most upvoted comments

@tijno @mvanhalen - another update: so we got PAST block 130970 without the syntax issue - YAY! So the PR mentioned above does resolve SOME issues.

However, at block 132804, this transaction was causing postgres to throw a RuleErrorCoinTransferBalanceEntryDoesNotExist. This transaction is transferring zero DAO coins, which isn’t against the consensus rules (at the moment). After hypersync, when querying for a BalanceEntry from badger, if the balance entry does not exist, we return a BalanceEntry initialized with a balance of zero coins. However, Postgres was simply returning nil, thus causing the rule error. I’ve updated the PR above with this commit to make Postgres behave the same as badger.
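For illustration, here is a minimal Go sketch of the behavior difference described above; the type and helper names are made up, not the actual core API:

```go
// Hypothetical sketch of the badger-vs-Postgres difference described above.
package main

import "fmt"

// BalanceEntry stands in for a holder's DAO-coin balance record.
type BalanceEntry struct {
	HODLerPKID   []byte
	CreatorPKID  []byte
	BalanceNanos uint64
}

// getBalanceEntryBadgerStyle models badger's behavior: a missing entry is
// returned as a zero-balance BalanceEntry rather than nil.
func getBalanceEntryBadgerStyle(found *BalanceEntry, hodler, creator []byte) *BalanceEntry {
	if found == nil {
		return &BalanceEntry{HODLerPKID: hodler, CreatorPKID: creator, BalanceNanos: 0}
	}
	return found
}

// Before the fix, the Postgres path simply returned nil for a missing row,
// which made a zero-coin transfer trip RuleErrorCoinTransferBalanceEntryDoesNotExist.
func main() {
	entry := getBalanceEntryBadgerStyle(nil, []byte("hodler"), []byte("creator"))
	fmt.Println(entry.BalanceNanos) // 0, instead of a nil entry
}
```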

I’m syncing again now and will update you once this sync completes or hits another issue.

@tijno @mvanhalen - quick update: I was able to reproduce the syntax error running a regtest node locally and could trigger the same syntax issue when inserting a derived key with a transaction spending limit. I’ve put up this PR which will insert a byte slice representing the TransactionSpendingLimitTracker instead of the struct itself. I am currently syncing a prod node from scratch based off that PR and will report when it finishes syncing.
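A rough sketch of that kind of fix, assuming a hypothetical tracker struct and using gob purely for illustration (the actual encoding and schema in core may differ):

```go
// Sketch only: serialize the struct to bytes before handing it to Postgres,
// rather than letting the ORM try to render the struct itself.
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// TransactionSpendingLimitTracker is a stand-in for the real struct.
type TransactionSpendingLimitTracker struct {
	GlobalDESOLimit uint64
}

func main() {
	tracker := TransactionSpendingLimitTracker{GlobalDESOLimit: 1000000}

	// Encode the tracker to a byte slice and store that (e.g. in a bytea
	// column) instead of inserting the struct directly, which is what
	// produced the syntax error when the INSERT was built.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(tracker); err != nil {
		panic(err)
	}
	serialized := buf.Bytes()
	fmt.Printf("would insert %d bytes for the spending limit tracker\n", len(serialized))
}
```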

Hi @mvanhalen - the issue isn’t actually in block 130972, but rather 130970. I was syncing a postgres node again yesterday and managed to find this in the logs:

I0604 05:27:02.540805  649742 server.go:1731] Server._handleBlock: Received block ( 130970 / 135333 ) from Peer [ Remote Address: 34.123.41.111:17000 PeerID=4 ]
I0604 05:27:02.553050  649742 base.go:292] Refreshing USD to ETH exchange rate
E0604 05:27:02.670763  649742 server.go:1725] Server._handleBlock: Encountered an error processing block <Header: < 130970, 00000000000016027d8104a31980eff4e973c61d82ca8e7f13c843cac5603eca, 1 >, Signer Key: BC1YLh768bVj2R3QpSiduxcvn7ipxF3L3XHsabZYtCGtsinUnNrZvNN>. Disconnecting from peer [ Remote Address: 34.123.41.111:17000 PeerID=4 ]: Error while processing block: : ProcessBlock: Problem flushing view to db: ERROR #42601 syntax error at or near ":"
E0604 05:27:02.670897  649742 peer.go:1229] ReadDeSoMessage: : ReadMessage: Problem decoding NetworkType: read tcp 192.168.86.63:55678->34.123.41.111:17000: use of closed network connection
E0604 05:27:02.670917  649742 peer.go:1057] Peer.inHandler: Can't read message from peer [ Remote Address: 34.123.41.111:17000, DISCONNECTED PeerID=4 ]: ReadDeSoMessage: : ReadMessage: Problem decoding NetworkType: read tcp 192.168.86.63:55678->34.123.41.111:17000: use of closed network connection

It looks like it’s hitting some sort of syntax error when trying to dump the view into postgres. Unfortunately, it’s unclear which insert/delete caused this issue as the logs aren’t that helpful here. So I’ll be adding some better error messaging here and syncing again.
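The "better error messaging" would be roughly this kind of error wrapping, sketched here with made-up function names (the real flush code is structured differently):

```go
// Sketch only: annotate each insert/delete during the flush so a failure
// names the step that produced it instead of surfacing a bare Postgres error.
package main

import (
	"errors"
	"fmt"
)

// flushDerivedKeys stands in for one of the bulk inserts done during a flush.
func flushDerivedKeys() error {
	return errors.New(`ERROR #42601 syntax error at or near ":"`)
}

func flushView() error {
	if err := flushDerivedKeys(); err != nil {
		// %w keeps the original error available via errors.Is / errors.Unwrap.
		return fmt.Errorf("flushView: problem flushing derived key entries: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(flushView())
}
```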

@tijno - we are still investigating how the UTXO in that transaction is not being found.

Hey guys, wanted to give an update here. It took us a while, but we were able to find and patch the bug that was causing the sync to fail on block 46631. It was due to a slight difference in how Postgres was handling block rewards vs Badger.

My node, with the patch included, is currently at block 82732. We should wait until we hit a full sync before we declare victory, but at this point I would recommend you guys try to continue your sync with version 2.2.3. It should pick up right where it left off at block 46631 (but make it past this block this time).

I’ll report back once I get a full sync. Again sorry this took so long guys, but Hypersync was a big change, with a lot of moving parts, and the bright side is that it paves the way for a much faster Postgres sync that doesn’t take many hours to complete.

Looking into this today @mvanhalen !

Just to confirm: Are you running with --hypersync=false and --sync-type=blocksync? These two flags are needed for Postgres.
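As a sketch only (the real node's flag wiring and defaults may differ), the two flags amount to something like:

```go
// Illustrative flag handling; default values here are placeholders, not the
// node's actual defaults.
package main

import (
	"flag"
	"fmt"
)

func main() {
	hypersync := flag.Bool("hypersync", true, "sync using hypersync")
	syncType := flag.String("sync-type", "hypersync", "blocksync or hypersync")
	flag.Parse()

	// Per the comment above, a Postgres-backed node needs both
	// --hypersync=false and --sync-type=blocksync.
	if *hypersync || *syncType != "blocksync" {
		fmt.Println("warning: a Postgres node needs --hypersync=false and --sync-type=blocksync")
	}
}
```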

We are fully in sync again. Thanks @lazynina

@mvanhalen - so there were still some issues with how matching orders were pulled for DAO_COIN_LIMIT_ORDER and with storing MESSAGING_GROUP transactions in postgres. To get around this, I’ve written these to badger; we already write some other data to badger even when running a postgres node. The latest commit to main synced a postgres node for me, so you should be good to go now!

I’ll continue to look at getting both MESSAGING_GROUP and DAO_COIN_LIMIT_ORDER fully supported in postgres and will keep you apprised of updates on that front as I have them.
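A rough sketch of that workaround, with illustrative names rather than the real core types:

```go
// Sketch only: even when a node is backed by Postgres, certain transaction
// types' state is still written to badger for now.
package main

import "fmt"

type TxnType int

const (
	TxnTypeMessagingGroup TxnType = iota
	TxnTypeDAOCoinLimitOrder
	TxnTypeBasicTransfer
)

// storesInBadger reports whether a transaction type's state bypasses Postgres.
func storesInBadger(usePostgres bool, t TxnType) bool {
	if !usePostgres {
		return true
	}
	switch t {
	case TxnTypeMessagingGroup, TxnTypeDAOCoinLimitOrder:
		return true // temporarily written to badger even on a Postgres node
	default:
		return false
	}
}

func main() {
	fmt.Println(storesInBadger(true, TxnTypeMessagingGroup)) // true
	fmt.Println(storesInBadger(true, TxnTypeBasicTransfer))  // false
}
```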

@tijno - I believe it is 3JuEUd2owMuXAkMnvh2JzgigVKGor8W2Ws4aDrRGSMH8hsuVVJfq5z

The logs @diamondhands0 posted above state that the block was marked invalid because of a Creator Coin transaction that was spending non-existent UTXOs, and this is the only CREATOR_COIN transaction in block 130972.

@diamondhands0 any updates on this?

cc @mvanhalen

Hey guys, @lazynina and I have been debugging here. We’re hoping to have everything sorted out this weekend, with hopefully an update tonight!

lol 😃 that was a fun block

I’ve not retried as I’ve not seen a relevant fix. There was a Postgres fix for a DAO-trade-related bug.

It would be a shame if the hard fork goes live again without the Postgres fix in place.

Mine got stuck again with 16GB RAM around the same block. node issue

Seeing some IO reads spiking. Mem and CPU are well within limits. The odd thing is that the blocks that make it hang do not have a huge number of transactions.

Mine crashed at 39139

8GB server for the backend with a 16GB DB server. Previous postgres nodes did not have issues with this. No OOM errors in the logs.

No logs either

I0512 00:35:56.208076       1 server.go:1731] Server._handleBlock: Received block ( 39138 / 128763 ) from Peer [ Remote Address: 34.123.41.111:17000 PeerID=2 ]
I0512 00:35:56.318618       1 server.go:1731] Server._handleBlock: Received block ( 39139 / 128763 ) from Peer [ Remote Address: 34.123.41.111:17000 PeerID=2 ]
I0512 00:35:57.279873       1 base.go:228] Refreshing exchange rate...

<<< CRASHED >>>

I0512 00:35:59.832240       1 config.go:149] Logging to directory /deso/db/v-00000

There was a huge CPU spike on the backend server at the time of the crash, though.

[Screenshot: CleanShot 2022-05-12 at 11 31 06@2x]

Memory of that server peaked at 85%

DB Server no CPU spike.

Memory usage at 10%

[Screenshot: CleanShot 2022-05-12 at 11 32 54@2x]

Restarting the backend container just now (11 hours after the crash), it seemed to start up fine, but the server is now unavailable with memory maxed at 100%.

Looks like it was deleting “empty files” and crashing repeatedly.

Like this:

badger 2022/05/12 10:37:43 INFO: All 0 tables opened in 1ms
badger 2022/05/12 10:37:43 INFO: Discard stats nextEmptySlot: 0
badger 2022/05/12 10:37:43 INFO: Set nextTxnTs to 78236
badger 2022/05/12 10:37:43 INFO: Deleting empty file: /deso/db/v-00000/badgerdb/000069.vlog
Migrations already up to date
I0512 10:37:43.765678       1 mempool.go:2262] Calling StartReadOnlyUtxoViewRegenerator...
I0512 10:37:43.766803       1 server.go:2344] Server.Start: Starting Server
I0512 10:37:43.769170       1 node.go:40] GlobalState BadgerDB Dir: /deso/db/v-00000/badgerdb/global_state
I0512 10:37:43.769196       1 node.go:41] GlobalState BadgerDB ValueDir: /deso/db/v-00000/badgerdb/global_state/badgerdb
I0512 10:37:43.819180       1 connection_manager.go:833] Full node socket initialized

<<< CRASHED >>>

I0512 10:37:46.057936       1 config.go:149] Logging to directory /deso/db/v-00000
I0512 10:37:46.058947       1 config.go:150] Running node in MAINNET mode