core: Postgres Sync halts at block 46631
Using the latest backend and Postgres, with Hypersync set to false, causes the backend to restart every few seconds. The last processed block is 39120. No exceptions spotted so far.
Part of the log:

```
backend | I0508 10:10:03.886061 1 config.go:168] SnapshotBlockHeightPeriod: 1000
backend | I0508 10:10:03.886365 1 config.go:172] ArchivalMode: ON
backend | I0508 10:10:03.886526 1 config.go:178] DisableSlowSync: OFF
backend | I0508 10:10:03.886771 1 config.go:198] READ ONLY MODE
backend | I0508 10:10:03.887080 1 config.go:206] IGNORING INBOUND INVS
backend | I0508 10:10:03.887215 1 config.go:209] Max Inbound Peers: 125
backend | I0508 10:10:03.887326 1 config.go:210] Protocol listening on port 17000
backend | I0508 10:10:03.887417 1 config.go:216] Rate Limit Feerate: 0
backend | I0508 10:10:03.887529 1 config.go:217] Min Feerate: 1000
backend | I0508 10:10:03.903783 1 node.go:126] Looking for AddIPs: 0
backend | I0508 10:10:03.903805 1 node.go:131] Looking for DNSSeeds: 10
backend | badger 2022/05/08 10:10:22 INFO: All 0 tables opened in 0s
backend | badger 2022/05/08 10:10:22 INFO: Discard stats nextEmptySlot: 0
backend | badger 2022/05/08 10:10:22 INFO: Set nextTxnTs to 78238
backend | badger 2022/05/08 10:10:22 INFO: Deleting empty file: /db/v-00000/badgerdb/000258.vlog
backend | Migrations already up to date
backend | I0508 10:10:24.588252 1 mempool.go:2262] Calling StartReadOnlyUtxoViewRegenerator…
backend | I0508 10:10:24.588354 1 server.go:2304] Server.Start: Starting Server
backend | I0508 10:10:24.589516 1 connection_manager.go:828] Full node socket initialized
backend | I0508 10:10:24.590674 1 node.go:40] GlobalState BadgerDB Dir: /db/v-00000/badgerdb/global_state
backend | I0508 10:10:24.590744 1 node.go:41] GlobalState BadgerDB ValueDir: /db/v-00000/badgerdb/global_state/badgerdb
backend exited with code 137
backend | I0508 10:10:28.736972 1 config.go:151] Logging to directory /db/v-00000
backend | I0508 10:10:28.738364 1 config.go:152] Running node in MAINNET mode
backend | I0508 10:10:28.738384 1 config.go:153] Data Directory: /db/v-00000
```
Also seeing some SSL errors with https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD:

```
backend | E0508 10:09:20.123602 1 base.go:142] GetBlockchainDotComExchangeRate: Problem with HTTP request https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD: Get "https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD": net/http: TLS handshake timeout Problem with HTTP request https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD: Get "https://api.blockchain.com/v3/exchange/tickers/CLOUT-USD": net/http: TLS handshake timeout
```
Not sure what "exited with code 137" means.
About this issue
- State: closed
- Created 2 years ago
- Comments: 51 (46 by maintainers)
@tijno @mvanhalen - another update: so we got PAST block 130970 without the syntax issue - YAY! So the PR mentioned above does resolve SOME issues.
However, at block 132804, this transaction was causing postgres to throw a `RuleErrorCoinTransferBalanceEntryDoesNotExist`. This transaction is transferring zero DAO coins, which isn’t against the consensus rules (at the moment). After hypersync, when querying for a BalanceEntry from badger, if the balance entry does not exist, we return a BalanceEntry initialized with a balance of zero coins. However, Postgres was simply returning nil, thus causing the rule error. I’ve updated the PR above with this commit to make Postgres behave the same as badger. I’m syncing again now and will update you once this sync completes or hits another issue.
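For anyone following along, the gist of that change is roughly the following sketch (the types here are simplified stand-ins, not the actual core API):

```go
package main

import "fmt"

// Simplified stand-ins for the core structs; the real PKID and BalanceEntry
// types live in the core repo and carry more fields.
type PKID [33]byte

type BalanceEntry struct {
	HODLerPKID   PKID
	CreatorPKID  PKID
	BalanceNanos uint64
}

// getBalanceEntryOrZero mimics the Badger behavior described above: if the
// lookup finds no entry (as Postgres did, returning nil), hand back an entry
// initialized with a zero balance instead of nil, so the zero-coin transfer
// no longer trips RuleErrorCoinTransferBalanceEntryDoesNotExist.
func getBalanceEntryOrZero(lookup func(holder, creator PKID) *BalanceEntry, holder, creator PKID) *BalanceEntry {
	if entry := lookup(holder, creator); entry != nil {
		return entry
	}
	return &BalanceEntry{HODLerPKID: holder, CreatorPKID: creator, BalanceNanos: 0}
}

func main() {
	// A lookup that finds nothing, as Postgres did for this transaction.
	missing := func(holder, creator PKID) *BalanceEntry { return nil }
	entry := getBalanceEntryOrZero(missing, PKID{}, PKID{})
	fmt.Println(entry.BalanceNanos) // prints 0, matching Badger's behavior
}
```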
@tijno @mvanhalen - quick update: I was able to reproduce the syntax error running a regtest node locally and could trigger the same syntax issue when inserting a derived key with a transaction spending limit. I’ve put up this PR which will insert a byte slice representing the TransactionSpendingLimitTracker instead of the struct itself. I am currently syncing a prod node from scratch based off that PR and will report when it finishes syncing.
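Conceptually, the change in that PR amounts to something like the sketch below (illustrative only: the struct is a simplified stand-in, and gob is used here in place of whatever encoder the real PR uses):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Simplified stand-in for the real TransactionSpendingLimitTracker struct.
type TransactionSpendingLimitTracker struct {
	GlobalDESOLimit       uint64
	TransactionCountLimit map[string]uint64
}

// spendingLimitToBytes flattens the tracker into a byte slice so the derived
// key row stores an opaque blob, rather than handing the ORM a nested struct
// that ends up rendered as invalid SQL (the syntax error described above).
func spendingLimitToBytes(t *TransactionSpendingLimitTracker) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(t); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	tracker := &TransactionSpendingLimitTracker{
		GlobalDESOLimit:       1000000000,
		TransactionCountLimit: map[string]uint64{"SUBMIT_POST": 10},
	}
	blob, err := spendingLimitToBytes(tracker)
	if err != nil {
		panic(err)
	}
	fmt.Printf("derived-key row would store a %d-byte blob\n", len(blob))
}
```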
Hi @mvanhalen - the issue isn’t actually in block 130972, but rather 130970. I was syncing a postgres node again yesterday and managed to find this in the logs:
It looks like it’s hitting some sort of syntax error when trying to dump the view into postgres. Unfortunately, it’s unclear which insert/delete caused this issue as the logs aren’t that helpful here. So I’ll be adding some better error messaging here and syncing again.
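The "better error messaging" is presumably along these lines (a sketch with hypothetical table and function names, just to show the shape of the change):

```go
package main

import (
	"errors"
	"fmt"
)

// flushTable wraps a single per-table insert/delete performed while dumping
// the view into Postgres, so a bare driver error comes back annotated with
// the table and row count instead of an anonymous "syntax error".
func flushTable(tableName string, rowCount int, insert func() error) error {
	if err := insert(); err != nil {
		return fmt.Errorf("flushing %d rows to %s: %w", rowCount, tableName, err)
	}
	return nil
}

func main() {
	// Simulate an opaque driver error (42601 is Postgres's syntax_error code).
	driverErr := errors.New("ERROR #42601 syntax error")
	err := flushTable("pg_derived_keys", 3, func() error { return driverErr })
	fmt.Println(err) // flushing 3 rows to pg_derived_keys: ERROR #42601 syntax error
}
```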
@tijno - we are still investigating how the UTXO in that transaction is not being found.
Hey guys, wanted to give an update here. It took us a while, but we were able to find and patch the bug that was causing the sync to fail on block 46631. It was due to a slight difference in how Postgres was handling block rewards vs Badger.
My node, with the patch included, is currently at block 82732. We should wait until we hit a full sync before we declare victory-- but at this point I would recommend you guys try and continue your sync with version 2.2.3. It should pick up right where it left off at block 46631 (but make it past this block this time).
I’ll report back once I get a full sync. Again sorry this took so long guys, but Hypersync was a big change, with a lot of moving parts, and the bright side is that it paves the way for a much faster Postgres sync that doesn’t take many hours to complete.
Looking into this today @mvanhalen !
Just to confirm: Are you running with --hypersync=false and --sync-type=blocksync? These two flags are needed for Postgres.
We are fully in sync again. Thanks @lazynina
@mvanhalen - so there were still some issues with how matching orders were pulled for `DAO_COIN_LIMIT_ORDER` and with storing `MESSAGING_GROUP` transactions in postgres. To get around this, I’ve written these to badger - we already write some other data to badger even when running a postgres node. The latest commit to main synced a postgres node for me, so you should be good to go now! I’ll continue to look at getting both `MESSAGING_GROUP` and `DAO_COIN_LIMIT_ORDER` fully supported in postgres and will keep you apprised of updates on that front as I have them.

@tijno - I believe it is 3JuEUd2owMuXAkMnvh2JzgigVKGor8W2Ws4aDrRGSMH8hsuVVJfq5z
The logs @diamondhands0 posted above state that the block was marked invalid because of a Creator Coin transaction that was spending non-existent UTXOs, and this is the only `CREATOR_COIN` transaction in block 130972.

@diamondhands0 any updates on this?
cc @mvanhalen
Hey guys, @lazynina and I have been debugging here. We’re hoping to have everything sorted out this weekend, with hopefully an update tonight!
lol 😃 that was a fun block
I’ve not retried as I’ve not seen a relevant fix. There was a postgres fix for a DAO-trade-related bug.
It would be a shame if the hardfork went live again without the postgres fix in place.
Mine got stuck again with 16GB RAM around the same block.
Seeing some IO reads spiking. Memory and CPU are well within limits. The odd thing is that the blocks that make it hang do not have a huge number of transactions.
Mine crashed at 39139
8GB server for the backend with a 16GB DB server. Previous postgres nodes did not have issues with this. No OOM errors in the logs.
No logs either
There was a huge CPU spike at the crash though on the backend server.
Memory of that server peaked at 85%
DB Server no CPU spike.
Memory usage at 10%
Restarting the backend container just now (11 hours after the crash), it seemed to start up fine, but the server is now unavailable with memory maxing out at 100%.
Looks like it was deleting “empty files” and crashing repeatedly.
Like this: