chia-blockchain: [Bug] plots beyond ~4400 = harvester 100.0 load, cache_hit: false, plots check hangs before challenges
What happened?
Noted that for the last few releases, chia_harvester was pegging a thread continuously while farming.
Info:
- System has >20k plots, direct-attached. Single harvester.
- `plot_refresh_callback` completes in 15 seconds and proof checks typically take 0.4-1 sec.
- Aside from chia_harvester constantly pegging its thread, all else appears to function normally.
Elaboration:
- Reinstalled chia-blockchain from scratch, importing only keys and the mainnet/wallet DBs. No change.
- Experimented with varying numbers of plots and noted that below ~4400 plots, chia_harvester no longer pegs a thread (load dropped to 0.0). Adding 200 plots back made the load jump back to 100.0 indefinitely.
- Experimented with various harvester config settings (`num_threads`, `parallel_reads`, `batch_size`). No change.
- Noted that upon startup, and with >4400 plots, the `found_plot` messages from the harvester transition from `cache_hit: True` to `cache_hit: False`.
- Also noted that attempting to run a `chia plots check` on any of the drives/plots with `cache_hit: False` results in an indefinite hang of that check before it issues a single challenge.
- Rewards are tracking for my total plot count (not 4400), so while `cache_hit: False` causes high harvester CPU usage and makes those plots impossible to check, they are still successfully farming.
Possible causes:
- This feels like high plot counts not playing nicely with `plot_refresh` / `chia.plotting.cache`: one of the harvester threads pegs indefinitely while attempting to cache the portion of plots beyond some maximum, and perhaps that same thread then fails to respond to a `chia plots check` of those same plots (see the sketch below).
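To make that suspicion concrete, here is a purely illustrative toy model (not chia's actual `plot_refresh` or `chia.plotting.cache` code; the ceiling, sizes, and helper names are invented) of how a cache that stops growing past some limit would leave the refresh thread re-parsing the same plots on every pass instead of going idle:

```python
import time

MAX_CACHE_BYTES = 1_000          # toy ceiling standing in for a uint32-sized limit (invented)
BLOB_SIZE = 100                  # toy per-plot serialized prover size (invented)
PLOT_COUNT = 20                  # 10 plots fit under the ceiling, 10 never will

cache: dict[int, bytes] = {}

def load_prover_from_disk(plot: int) -> bytes:
    """Stand-in for the expensive work of opening and parsing a plot (invented helper)."""
    time.sleep(0.01)
    return bytes(BLOB_SIZE)

def refresh() -> int:
    misses = 0
    for plot in range(PLOT_COUNT):
        if plot in cache:
            continue                              # cache_hit: True  -> cheap
        blob = load_prover_from_disk(plot)        # cache_hit: False -> expensive
        misses += 1
        if sum(map(len, cache.values())) + len(blob) <= MAX_CACHE_BYTES:
            cache[plot] = blob                    # cached: the next pass is a hit
        # else: over the ceiling -> never cached, re-parsed on every future pass
    return misses

for i in range(3):
    print(f"refresh pass {i}: {refresh()} cache misses")   # misses never drop to 0
```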
Version
1.5.0
What platform are you using?
Linux
What ui mode are you using?
CLI
Relevant log output
No response
Okay so… it turned out that the reason for all of this is plots created via the bladebit RAM plotter, where the `DiskProver` serializes into 524,659 bytes, which triggers `Value 5794656522 does not fit into uint32` while we serialize the length of the bytes as a `uint32`.
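For scale, a minimal sketch of that overflow, using the figures quoted above and assuming the length prefix is a 4-byte big-endian `uint32` (an assumption about the serialization format, not the actual chia code):

```python
import struct

REPORTED_TOTAL = 5_794_656_522   # the value from the error message above
PER_PROVER = 524_659             # serialized size of one affected DiskProver

print(round(REPORTED_TOTAL / PER_PROVER))   # ≈ 11045 provers' worth of cache
print(REPORTED_TOTAL > 2**32 - 1)           # True: beyond the 4,294,967,295 a uint32 can hold

try:
    struct.pack(">I", REPORTED_TOTAL)       # ">I" = big-endian uint32 length prefix (assumed format)
except struct.error as err:
    print(f"does not fit into uint32: {err}")
```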
The reason why the `DiskProver` serializes into such a huge blob is that those plots seem to have 65,536 `C2` entries. Table pointers from a plot in question give `table_begin_pointers[10] - table_begin_pointers[9]` -> 262,144, while a normally working plot gives `table_begin_pointers[10] - table_begin_pointers[9]` -> 176.

I'm going to talk with @harold-b about this and will post an update once we've figured this out.
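For anyone who wants to compare their own plots, the table-pointer difference above can be read straight from the plot header. This is a rough sketch assuming the chiapos v1 header layout (19-byte magic, 32-byte plot ID, 1-byte k, length-prefixed format description and memo, then ten 8-byte big-endian table pointers); that layout is my recollection of the format, not something stated in this thread.

```python
import struct
import sys

def c2_region_size(plot_path: str) -> int:
    """Return table_begin_pointers[10] - table_begin_pointers[9] for one plot file."""
    with open(plot_path, "rb") as f:
        assert f.read(19) == b"Proof of Space Plot", "not a chia plot file?"
        f.read(32)                                    # plot id
        f.read(1)                                     # k
        (fmt_len,) = struct.unpack(">H", f.read(2))   # format description length
        f.read(fmt_len)
        (memo_len,) = struct.unpack(">H", f.read(2))  # memo length
        f.read(memo_len)
        pointers = struct.unpack(">10Q", f.read(80))  # tables 1-7, C1, C2, C3
        # 0-based here: pointers[9] is the C3 start and pointers[8] the C2 start,
        # matching the 1-based table_begin_pointers[10] - [9] quoted above.
        return pointers[9] - pointers[8]

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, c2_region_size(path))   # affected plots reportedly show 262,144 vs 176
```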
It could still be a caching-related issue since it would create a new cache on the next startup (and the cache is then used while the harvester runs). Either way, we won’t know unless we can figure out a way to tell what those pegged harvester threads are doing.
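One way to see what a pegged thread is doing, assuming you can temporarily add a couple of lines near the harvester's startup (a debugging sketch, not part of chia):

```python
import faulthandler
import signal

# If this runs inside the chia_harvester process, sending it SIGUSR1
# (e.g. `kill -USR1 <pid>`) prints the current Python stack of every thread
# to stderr, showing where a thread pegged at 100% CPU is spending its time.
faulthandler.register(signal.SIGUSR1, all_threads=True)
```

Alternatively, py-spy can dump the same information from outside the process without modifying anything (`py-spy dump --pid <pid>`).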
Updated to 1.5.1 and cleared all settings, starting clean.
- `cache_hit: false` on a large portion of plots.
- `chia plots check` of previously troublesome ranges takes a long time to start challenges (with its process pegged at 100.0 during the delay of several minutes per 1k plots in the selected range to check), but does eventually begin, and completes without error.