chia-blockchain: Nothing tells you when you're timing out on 30 second harvester proof checks

The problem

I thought I was farming, but I wasn’t – because something about my network caused the proof check to take more than the hard-coded 30 second limit.

I had an average time to win of 8 or 9 hours for more than 120 hours without a single win. This seemed statistically implausible, so I researched the logs, and cleared any errors or warnings in the logs (well done, all the warnings and errors in debug.log were indeed things I should fix!). Still no wins for a long time.

How to reproduce

Have a bunch of plots on slow storage media; when the proof check happens, verifying the proofs takes longer than the hard-coded 30 seconds allowed. You will never win a single Chia, but there’s absolutely nothing in the GUI to inform you that this is happening. You can view the logs, but in the logs it is not even presented as a warning (!), but as an INFO message:

2021-04-24T16:03:29.039 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.74567 s. Total 3922 plots
2021-04-24T16:03:29.433 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.92976 s. Total 3922 plots
2021-04-24T16:03:29.635 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.01540 s. Total 3922 plots
2021-04-24T16:03:44.553 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.11716 s. Total 3922 plots
2021-04-24T16:04:10.882 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.51660 s. Total 3922 plots
2021-04-24T16:04:36.101 harvester chia.harvester.harvester: INFO     9 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 37.96922 s. Total 3922 plots
2021-04-24T16:05:13.959 harvester chia.harvester.harvester: INFO     11 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 63.07326 s. Total 3922 plots
2021-04-24T16:05:13.959 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 50.81979 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 54.92095 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 42.29693 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 29.67779 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 18.59361 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 6.10934 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     10 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.17564 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     13 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.82808 s. Total 3922 plots
2021-04-24T16:05:35.600 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.82815 s. Total 3922 plots
2021-04-24T16:05:35.600 harvester chia.harvester.harvester: INFO     12 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.56252 s. Total 3922 plots
2021-04-24T16:05:58.022 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.26593 s. Total 3922 plots
2021-04-24T16:06:11.887 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.42098 s. Total 3922 plots
2021-04-24T16:06:36.417 harvester chia.harvester.harvester: INFO     3 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.36354 s. Total 3922 plots

Of the above, the proofs that take longer than 30 seconds are not eligible to win, but this is not logged as an ERROR or WARNING or surfaced in the UI in any way.

Expected behavior

The GUI will tell you “hey, your proof checks are too slow, there’s absolutely no chance for you to win, even if you are farming infinity plots”

Screenshots

Imgur

Desktop

  • OS: Windows 10
  • OS Version/Flavor: latest 19042.928
  • CPU: i7-1165G7

Additional context

I followed up on the #support channel in Keybase, where I got the important advice to enable INFO level logging and check for the 30 second proof limit… and I wrote up a detailed account on the forum; if you need excruciating levels of detail, please check there 🙇‍♂️

Recommended solution

  • Allow the 30 second proof check threshold to be configurable to account for slower storage (or relax it a bit)
  • change the log level of beyond 30 second on harvester proof checks to WARN or ERROR
  • in the logs, provide details on which files are being checked for proofs, so you can identify which media or NAS is potentially the problem one slowing everything down

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 12
  • Comments: 22 (4 by maintainers)

Most upvoted comments

I think lots of people need to know their whole system just too slow to provide the a valid answer in time. So please mark the logs as WARNING as they takes longer a certain threshold.

Yeah this is a critical issue for the project IMO, since a LOT of people are probably “farming” absolutely nothing due to the 30s harvester timeout, and the logs aren’t WARN-ing or ERROR-ing them… the GUI isn’t telling them… the only way to know this is happening is to intentionally set log level to INFO and scan for 30s or longer in the INFO messages 😱

😭

Your NAS caused your task to time out. It is recommended to abandon this method.

I did a somewhat scalable workaround for this by running multiple harvesters in containers (on the RPi4), each with ONE plot directory with 100 plots. This gives arounds 2 seconds for proof checking per container (up to 100 plots).

This was quite a rough finding since I’ve been happily farming on my RPi on wifi to a remote storage with proof checking usually between 60-90 seconds.

We can’t change the consensus algorithm anymore, but we can change the log level, and show the files.

Please note that looking up qualities for plots passing the filter requires about 7 random reads in a plot, whereas actually looking up a proof requires 64 reads. It might not be feasible on a slow NAS, since these are sequential reads. Furthermore, you need to take into account network latency to propagate your proof and block to the network, so you should be under 5 seconds to reduce risk of losing rewards.

Actually, the proof of space library does them sequentially, but they could be done in parallel, since it’s a tree, so you could do 1 read, then 2, then 4, … etc, for a total of around 7 sequential phases (one for each table in the plot). We haven’t got around to doing this yet.

Moreover, I would like to know why your laptop can mount nearly 400TB disks, is your disk on nas?

We can’t change the consensus algorithm anymore, but we can change the log level, and show the files.

Please note that looking up qualities for plots passing the filter requires about 7 random reads in a plot, whereas actually looking up a proof requires 64 reads. It might not be feasible on a slow NAS, since these are sequential reads. Furthermore, you need to take into account network latency to propagate your proof and block to the network, so you should be under 5 seconds to reduce risk of losing rewards.

Actually, the proof of space library does them sequentially, but they could be done in parallel, since it’s a tree, so you could do 1 read, then 2, then 4, … etc, for a total of around 7 sequential phases (one for each table in the plot). We haven’t got around to doing this yet.

@mariano54 My disk array is not in the local but distributes in different places. I gather them as a huge disk and connect them with NFS. The latency of this solution for each random seek is between 200-1000ms. The average and the mean is about 400ms. I wondering whether this is able to farm. 400ms*(7+64)seeks will be 28.4s. Just a little bit below the 30s limitation. However, this solution support parallel read. If it is seeking 64 different places in the plot at the same time, it will be able to respond to all of them within a single seek time, which is about 400ms. Is parallel seeking in the todo list and is that work as what I am thinking? When it will be updated? Thanks a lot!

Yeah, this only started happening as I added multiple NAS devices to the network. With 1 or 2 NASes, it was all fine. Once you get to 5… not so much, especially if the algorithm picks plots on 6 different devices. At the very least

  • definitely change the log level to ERROR (or WARN)
  • surface this in the UI for sure, otherwise you’re silently “farming”… absolutely nothing. A silent killer.
  • tell us WHICH files were picked for proofs so we can look for “problem” devices

thanks @mariano54 !