chia-blockchain: [Bug] GRResult error occurring a couple of times a day farming a few PB of C2 compressed plots using an Nvidia P4 GPU - Bladebit
What happened?
When the system (ProLiant DL360 Gen9, dual E5-2620 v4, 32 GB RAM, Nvidia P4, 75k C2 plots) hits a high IO load on the same block device as the Chia full node DB, Chia's debug.log shortly afterward shows that GRResult is not OK. The plot count and lookup times all seem fine, but the harvester stops finding proofs until it is restarted. This happens 1-2 times per 24-hour period on Alpha 4 through Alpha 4.3.
Whenever the error occurs, block validation time and lookup time consistently increase in the period leading up to it.
Reproducible with Nvidia Unix GPU Driver versions 530.30.03, 530.41.03, and 535.43.02
Version
2.0.0b3.dev56
What platform are you using?
Ubuntu 22.04, Linux kernel 5.15.0-73-generic, ProLiant DL360 Gen9, dual E5-2620 v4, 32 GB RAM, Nvidia P4, 75k C2 plots
What ui mode are you using?
CLI
Relevant log output
2023-05-29T20:45:32.552 full_node chia.full_node.mempool_manager: WARNING pre_validate_spendbundle took 2.0414 seconds for xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
2023-05-29T20:45:42.620 full_node chia.full_node.mempool_manager: WARNING add_spendbundle xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx took 10.06 seconds. Cost: 2924758101 (26.589% of max block cost)
2023-05-29T20:45:56.840 full_node chia.full_node.full_node: WARNING Block validation time: 2.82 seconds, pre_validation time: 2.81 seconds, cost: None header_hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx height: 3732042
2023-05-29T20:46:57.239 full_node chia.full_node.full_node: WARNING Block validation time: 3.34 seconds, pre_validation time: 0.42 seconds, cost: 3165259860, percent full: 28.775% header_hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx height: 3732044
2023-05-29T20:49:26.913 full_node chia.full_node.full_node: WARNING Block validation time: 2.40 seconds, pre_validation time: 0.49 seconds, cost: 2041855544, percent full: 18.562% header_hash: 8d0ce076a3270a0c8c9c8d1f0e73c9b5b884618ee34020d2a4f3ffafa459cfd0 height: 3732055
2023-05-29T20:51:06.259 full_node full_node_server : WARNING Banning 89.58.33.71 for 10 seconds
2023-05-29T20:51:06.260 full_node full_node_server : WARNING Invalid handshake with peer. Maybe the peer is running old software.
2023-05-29T20:51:27.986 harvester chia.harvester.harvester: ERROR Exception fetching full proof for /media/chia/hdd23/plot-k32-c02-2023-04-23-someplot.plot. GRResult is not GRResult_OK.
2023-05-29T20:51:28.025 harvester chia.harvester.harvester: ERROR File: /media/chia/hdd23/someplot.plot Plot ID: someplotID, challenge: 7b5b6f11ec2a86a7298cb55b7db8a016a775efea221104b37905366b49f2e2bd, plot_info: PlotInfo(prover=<chiapos.DiskProver object at 0x7f3544998f30>, pool_public_key=None, pool_contract_puzzle_hash=<bytes32: contractHash>, plot_public_key=<G1Element PlotPubKey>, file_size=92374601728, time_modified=1682261996.8218756)
2023-05-29T20:51:57.482 full_node chia.full_node.full_node: WARNING Block validation time: 10.23 seconds, pre_validation time: 0.29 seconds, cost: 959315244, percent full: 8.721% header_hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx height: 3732059
2023-05-29T20:55:24.640 full_node chia.full_node.full_node: WARNING Block validation time: 3.18 seconds, pre_validation time: 0.26 seconds, cost: 2282149756, percent full: 20.747% header_hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx height: 3732067
2023-05-29T20:56:01.825 wallet wallet_server : WARNING Banning 95.54.100.118 for 10 seconds
2023-05-29T20:56:01.827 wallet wallet_server : ERROR Exception Invalid version: '1.6.2-sweet', exception Stack: Traceback (most recent call last):
File "chia/server/server.py", line 483, in start_client
File "chia/server/ws_connection.py", line 222, in perform_handshake
File "packaging/version.py", line 198, in __init__
packaging.version.InvalidVersion: Invalid version: '1.6.2-sweet'
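To check the observation that block validation time climbs before the error, the log can be scanned for "Block validation time" entries that precede each "GRResult is not GRResult_OK" entry. The sketch below is a minimal, hypothetical helper, not a Chia tool. It assumes debug.log lines keep the format quoted in this issue, and the function names and the 600-second look-back window are illustrative choices only.

```python
import re
from datetime import datetime

# Patterns modeled on the debug.log lines quoted in this issue.
# Assumption: other Chia versions may word these messages differently.
TS = r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})"
VALIDATION_RE = re.compile(TS + r" full_node .*Block validation time: (?P<secs>[\d.]+) seconds")
GRRESULT_RE = re.compile(TS + r" harvester .*GRResult is not GRResult_OK")


def parse_ts(ts: str) -> datetime:
    """Parse a debug.log timestamp such as 2023-05-29T20:51:27.986."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")


def validation_times_before_errors(lines, window_s=600.0):
    """For each GRResult error, collect the block validation times (seconds)
    logged in the preceding `window_s` seconds (hypothetical helper)."""
    validations = []  # (timestamp, validation seconds) seen so far
    report = []       # (error timestamp, [validation seconds in window])
    for line in lines:
        m = VALIDATION_RE.search(line)
        if m:
            validations.append((parse_ts(m["ts"]), float(m["secs"])))
            continue
        m = GRRESULT_RE.search(line)
        if m:
            err_ts = parse_ts(m["ts"])
            recent = [secs for ts, secs in validations
                      if 0 <= (err_ts - ts).total_seconds() <= window_s]
            report.append((err_ts, recent))
    return report


# Shortened copies of log lines from this issue, used as sample input.
SAMPLE = [
    "2023-05-29T20:45:56.840 full_node chia.full_node.full_node: WARNING Block validation time: 2.82 seconds, pre_validation time: 2.81 seconds",
    "2023-05-29T20:46:57.239 full_node chia.full_node.full_node: WARNING Block validation time: 3.34 seconds, pre_validation time: 0.42 seconds",
    "2023-05-29T20:49:26.913 full_node chia.full_node.full_node: WARNING Block validation time: 2.40 seconds, pre_validation time: 0.49 seconds",
    "2023-05-29T20:51:27.986 harvester chia.harvester.harvester: ERROR Exception fetching full proof for /media/chia/hdd23/plot-k32-c02-2023-04-23-someplot.plot. GRResult is not GRResult_OK.",
]

if __name__ == "__main__":
    for err_ts, times in validation_times_before_errors(SAMPLE):
        print(err_ts.isoformat(), times)
```

Pointing this at a full debug.log (for example by passing an open file object as `lines`) would show whether the validation times in the window before each error are consistently higher than usual.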
About this issue
- State: open
- Created a year ago
- Reactions: 2
- Comments: 83 (8 by maintainers)
How hard is it for the devs to get a 1060 and reproduce the problem locally? Just blindly upgrading the version with unrelated fixes and asking people to test it for you is not very professional. Chia is open source, and that is nice and we are grateful, but you are paid and this is your job, so please do it.
Same thing here; it can take days or minutes. Z840, dual E5-2699 v3, 512 GB DDR4, with a 3060 Ti. I can plot all day with this setup, but as soon as I finish plotting and try to farm: CRASH! As one person said, this is a sneaky bug. Everything looks like it is running fine and plots are passing the filter, but no proofs are found. You check the logs and discover it stopped working 12 hours ago, lol. Maddening. Is this even being looked at? I know they had a layoff. I guess Gigahorse is going to be the answer?
Thanks for the update. Too bad that didn’t fix it.
I read on the Discord to disable nvidia-persistenced.service, and this seems to have worked in my case. I used to have problems every couple of hours, and now I'm over a day without issues. Edit: just as I posted that, it happened again 😕