core: Postgres Sync halts at block 46631

Using the latest backend and Postgres with Hypersync set to false causes the backend to restart every few seconds. The last processed block is 39120. No exceptions spotted so far.

Part of the log:

backend | I0508 10:10:03.886061 1 config.go:168] SnapshotBlockHeightPeriod: 1000
backend | I0508 10:10:03.886365 1 config.go:172] ArchivalMode: ON
backend | I0508 10:10:03.886526 1 config.go:178] DisableSlowSync: OFF
backend | I0508 10:10:03.886771 1 config.go:198] READ ONLY MODE
backend | I0508 10:10:03.887080 1 config.go:206] IGNORING INBOUND INVS
backend | I0508 10:10:03.887215 1 config.go:209] Max Inbound Peers: 125
backend | I0508 10:10:03.887326 1 config.go:210] Protocol listening on port 17000
backend | I0508 10:10:03.887417 1 config.go:216] Rate Limit Feerate: 0
backend | I0508 10:10:03.887529 1 config.go:217] Min Feerate: 1000
backend | I0508 10:10:03.903783 1 node.go:126] Looking for AddIPs: 0
backend | I0508 10:10:03.903805 1 node.go:131] Looking for DNSSeeds: 10
backend | badger 2022/05/08 10:10:22 INFO: All 0 tables opened in 0s
backend | badger 2022/05/08 10:10:22 INFO: Discard stats nextEmptySlot: 0
backend | badger 2022/05/08 10:10:22 INFO: Set nextTxnTs to 78238
backend | badger 2022/05/08 10:10:22 INFO: Deleting empty file: /db/v-00000/badgerdb/000258.vlog
backend | Migrations already up to date
backend | I0508 10:10:24.588252 1 mempool.go:2262] Calling StartReadOnlyUtxoViewRegenerator…
backend | I0508 10:10:24.588354 1 server.go:2304] Server.Start: Starting Server
backend | I0508 10:10:24.589516 1 connection_manager.go:828] Full node socket initialized
backend | I0508 10:10:24.590674 1 node.go:40] GlobalState BadgerDB Dir: /db/v-00000/badgerdb/global_state
backend | I0508 10:10:24.590744 1 node.go:41] GlobalState BadgerDB ValueDir: /db/v-00000/badgerdb/global_state/badgerdb
backend exited with code 137
backend | I0508 10:10:28.736972 1 config.go:151] Logging to directory /db/v-00000
backend | I0508 10:10:28.738364 1 config.go:152] Running node in MAINNET mode
backend | I0508 10:10:28.738384 1 config.go:153] Data Directory: /db/v-00000

Also seeing some SSL errors with https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD:

backend | E0508 10:09:20.123602 1 base.go:142] GetBlockchainDotComExchangeRate: Problem with HTTP request https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD: Get "https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD": net/http: TLS handshake timeout Problem with HTTP request https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD: Get "https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD": net/http: TLS handshake timeout

Not sure what "exited with code 137" means.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 51 (46 by maintainers)

Most upvoted comments

@tijno @mvanhalen - another update: so we got PAST block 130970 without the syntax issue - YAY! So the PR mentioned above does resolve SOME issues.

However, at block 132804, this transaction was causing postgres to throw a RuleErrorCoinTransferBalanceEntryDoesNotExist. This transaction is transferring zero DAO coins, which isn’t against the consensus rules (at the moment). After hypersync, when querying for a BalanceEntry from badger, if the balance entry does not exist, we return a BalanceEntry initialized with a balance of zero coins. However, Postgres was simply returning nil, thus causing the rule error. I’ve updated the PR above with this commit to make Postgres behave the same as badger.
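For illustration, here is a minimal Go sketch of the behavior difference described above; the type and helper names are made up, not the actual core API:

```go
// Hypothetical sketch of the badger-vs-Postgres difference described above.
package main

import "fmt"

// BalanceEntry stands in for a holder's DAO-coin balance record.
type BalanceEntry struct {
	HODLerPKID   []byte
	CreatorPKID  []byte
	BalanceNanos uint64
}

// getBalanceEntryBadgerStyle models badger's behavior: a missing entry is
// returned as a zero-balance BalanceEntry rather than nil.
func getBalanceEntryBadgerStyle(found *BalanceEntry, hodler, creator []byte) *BalanceEntry {
	if found == nil {
		return &BalanceEntry{HODLerPKID: hodler, CreatorPKID: creator, BalanceNanos: 0}
	}
	return found
}

// Before the fix, the Postgres path simply returned nil for a missing row,
// which made a zero-coin transfer trip RuleErrorCoinTransferBalanceEntryDoesNotExist.
func main() {
	entry := getBalanceEntryBadgerStyle(nil, []byte("hodler"), []byte("creator"))
	fmt.Println(entry.BalanceNanos) // 0, instead of a nil entry
}
```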

I’m syncing again now and will update you once this sync completes or hits another issue.

@tijno @mvanhalen - quick update: I was able to reproduce the syntax error running a regtest node locally and could trigger the same syntax issue when inserting a derived key with a transaction spending limit. I’ve put up this PR which will insert a byte slice representing the TransactionSpendingLimitTracker instead of the struct itself. I am currently syncing a prod node from scratch based off that PR and will report when it finishes syncing.
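A rough sketch of that kind of fix, assuming a hypothetical tracker struct and using gob purely for illustration (the actual encoding and schema in core may differ):

```go
// Sketch only: serialize the struct to bytes before handing it to Postgres,
// rather than letting the ORM try to render the struct itself.
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// TransactionSpendingLimitTracker is a stand-in for the real struct.
type TransactionSpendingLimitTracker struct {
	GlobalDESOLimit uint64
}

func main() {
	tracker := TransactionSpendingLimitTracker{GlobalDESOLimit: 1000000}

	// Encode the tracker to a byte slice and store that (e.g. in a bytea
	// column) instead of inserting the struct directly, which is what
	// produced the syntax error when the INSERT was built.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(tracker); err != nil {
		panic(err)
	}
	serialized := buf.Bytes()
	fmt.Printf("would insert %d bytes for the spending limit tracker\n", len(serialized))
}
```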

Hi @mvanhalen - the issue isn’t actually in block 130972, but rather 130970. I was syncing a postgres node again yesterday and managed to find this in the logs:

I0604 05:27:02.540805  649742 server.go:1731] Server._handleBlock: Received block ( 130970 / 135333 ) from Peer [ Remote Address: 34.123.41.111:17000 PeerID=4 ]
I0604 05:27:02.553050  649742 base.go:292] Refreshing USD to ETH exchange rate
E0604 05:27:02.670763  649742 server.go:1725] Server._handleBlock: Encountered an error processing block <Header: < 130970, 00000000000016027d8104a31980eff4e973c61d82ca8e7f13c843cac5603eca, 1 >, Signer Key: BC1YLh768bVj2R3QpSiduxcvn7ipxF3L3XHsabZYtCGtsinUnNrZvNN>. Disconnecting from peer [ Remote Address: 34.123.41.111:17000 PeerID=4 ]: Error while processing block: : ProcessBlock: Problem flushing view to db: ERROR #42601 syntax error at or near ":"
E0604 05:27:02.670897  649742 peer.go:1229] ReadDeSoMessage: : ReadMessage: Problem decoding NetworkType: read tcp 192.168.86.63:55678->34.123.41.111:17000: use of closed network connection
E0604 05:27:02.670917  649742 peer.go:1057] Peer.inHandler: Can't read message from peer [ Remote Address: 34.123.41.111:17000, DISCONNECTED PeerID=4 ]: ReadDeSoMessage: : ReadMessage: Problem decoding NetworkType: read tcp 192.168.86.63:55678->34.123.41.111:17000: use of closed network connection

It looks like it’s hitting some sort of syntax error when trying to dump the view into postgres. Unfortunately, it’s unclear which insert/delete caused this issue as the logs aren’t that helpful here. So I’ll be adding some better error messaging here and syncing again.
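The "better error messaging" would be roughly this kind of error wrapping, sketched here with made-up function names (the real flush code is structured differently):

```go
// Sketch only: annotate each insert/delete during the flush so a failure
// names the step that produced it instead of surfacing a bare Postgres error.
package main

import (
	"errors"
	"fmt"
)

// flushDerivedKeys stands in for one of the bulk inserts done during a flush.
func flushDerivedKeys() error {
	return errors.New(`ERROR #42601 syntax error at or near ":"`)
}

func flushView() error {
	if err := flushDerivedKeys(); err != nil {
		// %w keeps the original error available via errors.Is / errors.Unwrap.
		return fmt.Errorf("flushView: problem flushing derived key entries: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(flushView())
}
```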

@tijno - we are still investigating how the UTXO in that transaction is not being found.

Hey guys, wanted to give an update here. It took us a while, but we were able to find and patch the bug that was causing the sync to fail on block 46631. It was due to a slight difference in how Postgres was handling block rewards vs Badger.

My node, with the patch included, is currently at block 82732. We should wait until we hit a full sync before we declare victory, but at this point I would recommend you guys try to continue your sync with version 2.2.3. It should pick up right where it left off at block 46631 (but make it past this block this time).

I’ll report back once I get a full sync. Again sorry this took so long guys, but Hypersync was a big change, with a lot of moving parts, and the bright side is that it paves the way for a much faster Postgres sync that doesn’t take many hours to complete.

Looking into this today @mvanhalen !

Just to confirm: Are you running with --hypersync=false and --sync-type=blocksync? These two flags are needed for Postgres.
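As a sketch only (the real node's flag wiring and defaults may differ), the two flags amount to something like:

```go
// Illustrative flag handling; default values here are placeholders, not the
// node's actual defaults.
package main

import (
	"flag"
	"fmt"
)

func main() {
	hypersync := flag.Bool("hypersync", true, "sync using hypersync")
	syncType := flag.String("sync-type", "hypersync", "blocksync or hypersync")
	flag.Parse()

	// Per the comment above, a Postgres-backed node needs both
	// --hypersync=false and --sync-type=blocksync.
	if *hypersync || *syncType != "blocksync" {
		fmt.Println("warning: a Postgres node needs --hypersync=false and --sync-type=blocksync")
	}
}
```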

We are fully in sync again. Thanks @lazynina

@mvanhalen - so there were still some issues with how matching orders were pulled for DAO_COIN_LIMIT_ORDER and with storing MESSAGING_GROUP transactions in postgres. To get around this, I’ve written these to badger; we already write some other data to badger even when running a postgres node. The latest commit to main synced a postgres node for me, so you should be good to go now!

I’ll continue to look at getting both MESSAGING_GROUP and DAO_COIN_LIMIT_ORDER fully supported in postgres and will keep you apprised of updates on that front as I have them.
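A rough sketch of that workaround, with illustrative names rather than the real core types:

```go
// Sketch only: even when a node is backed by Postgres, certain transaction
// types' state is still written to badger for now.
package main

import "fmt"

type TxnType int

const (
	TxnTypeMessagingGroup TxnType = iota
	TxnTypeDAOCoinLimitOrder
	TxnTypeBasicTransfer
)

// storesInBadger reports whether a transaction type's state bypasses Postgres.
func storesInBadger(usePostgres bool, t TxnType) bool {
	if !usePostgres {
		return true
	}
	switch t {
	case TxnTypeMessagingGroup, TxnTypeDAOCoinLimitOrder:
		return true // temporarily written to badger even on a Postgres node
	default:
		return false
	}
}

func main() {
	fmt.Println(storesInBadger(true, TxnTypeMessagingGroup)) // true
	fmt.Println(storesInBadger(true, TxnTypeBasicTransfer))  // false
}
```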

@tijno - I believe it is 3JuEUd2owMuXAkMnvh2JzgigVKGor8W2Ws4aDrRGSMH8hsuVVJfq5z

The logs @diamondhands0 posted above state that the block was marked invalid because of a Creator Coin transaction that was spending non-existent UTXOs, and this is the only CREATOR_COIN transaction in block 130972.

@diamondhands0 any updates on this?

cc @mvanhalen

Hey guys, @lazynina and I have been debugging here. We’re hoping to have everything sorted out this weekend, with hopefully an update tonight!

lol 😃 that was a fun block

I’ve not retried as I’ve not seen a relevant fix. There was a Postgres fix for a DAO-trade-related bug.

It would be a shame if the hard fork goes live again without the Postgres fix in place.

Mine got stuck again with 16GB RAM around the same block. node issue

Seeing some IO reads spiking. Mem and CPU are well within limits. The odd thing is that the blocks that make it hang do not have a huge number of transactions.

Mine crashed at 39139

8GB server for the backend with a 16GB DB server. Previous postgres nodes did not have issues with this. No OOM errors in the logs.

No logs either

I0512 00:35:56.208076       1 server.go:1731] Server._handleBlock: Received block ( 39138 / 128763 ) from Peer [ Remote Address: 34.123.41.111:17000 PeerID=2 ]
I0512 00:35:56.318618       1 server.go:1731] Server._handleBlock: Received block ( 39139 / 128763 ) from Peer [ Remote Address: 34.123.41.111:17000 PeerID=2 ]
I0512 00:35:57.279873       1 base.go:228] Refreshing exchange rate...

<<< CRASHED >>>

I0512 00:35:59.832240       1 config.go:149] Logging to directory /deso/db/v-00000

There was a huge CPU spike on the backend server at the time of the crash, though.

[Screenshot: CleanShot 2022-05-12 at 11 31 06@2x]

Memory of that server peaked at 85%

DB Server no CPU spike.

Memory usage at 10%

[Screenshot: CleanShot 2022-05-12 at 11 32 54@2x]

Restarting the backend container just now (11 hours after the crash), it seemed to start up fine, but the server is now unavailable with memory maxed at 100%.

Looks like it was deleting “empty files” and crashing repeatedly.

Like this:

badger 2022/05/12 10:37:43 INFO: All 0 tables opened in 1ms
badger 2022/05/12 10:37:43 INFO: Discard stats nextEmptySlot: 0
badger 2022/05/12 10:37:43 INFO: Set nextTxnTs to 78236
badger 2022/05/12 10:37:43 INFO: Deleting empty file: /deso/db/v-00000/badgerdb/000069.vlog
Migrations already up to date
I0512 10:37:43.765678       1 mempool.go:2262] Calling StartReadOnlyUtxoViewRegenerator...
I0512 10:37:43.766803       1 server.go:2344] Server.Start: Starting Server
I0512 10:37:43.769170       1 node.go:40] GlobalState BadgerDB Dir: /deso/db/v-00000/badgerdb/global_state
I0512 10:37:43.769196       1 node.go:41] GlobalState BadgerDB ValueDir: /deso/db/v-00000/badgerdb/global_state/badgerdb
I0512 10:37:43.819180       1 connection_manager.go:833] Full node socket initialized

<<< CRASHED >>>

I0512 10:37:46.057936       1 config.go:149] Logging to directory /deso/db/v-00000
I0512 10:37:46.058947       1 config.go:150] Running node in MAINNET mode