chia-blockchain: [BUG] My Raspberry Pi 4 4GB currently misses / doesn't finish plenty of signage points in a row
Describe the bug
There seem to be phases when my Raspberry Pi 4 4 GB farmer misses several signage points in a row. As a consequence, the farmer does not participate in these challenges.
E.g. around height 113661:
$ chia farm challenges
Hash: 0x52aceefed2de9a7653456e3906dd7cebd8eac57edcbcea28140741bac4689860Index: 1
Hash: 0x52aceefed2de9a7653456e3906dd7cebd8eac57edcbcea28140741bac4689860Index: 0
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 63
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 62
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 61
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 60
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 37
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 36
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 35
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 34
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 33
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 32
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 31
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 30
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 29
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 28
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 27
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 26
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 25
Hash: 0xc847981b22e0ade25ad4e8bdeee45839876c82dda55d90d3a5472330c4ac5d32Index: 24
Missed signage points 38 through 59, inclusive. That’s 22 unfinished signage points in a row.
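The gap count above can be double-checked with a few lines of Python. This is only a sketch: the helper names `indices_by_hash` and `missing_runs` are made up for illustration, and it assumes the `Hash: … Index: …` line layout shown in the listing above. Because the listing only shows the most recent challenges, gaps are reported only between the lowest and highest index actually seen.

```python
import re

# Tolerates a missing space between the hash and "Index:".
LINE = re.compile(r"Hash:\s*(0x[0-9a-fA-F]+)\s*Index:\s*(\d+)")

def indices_by_hash(lines):
    """Collect the signage point indices listed for each challenge hash."""
    seen = {}
    for line in lines:
        m = LINE.search(line)
        if m:
            seen.setdefault(m.group(1), []).append(int(m.group(2)))
    return seen

def missing_runs(indices):
    """Return (first, last, count) for each run of indices absent between min and max seen."""
    present = sorted(set(indices))
    runs = []
    for lo, hi in zip(present, present[1:]):
        if hi - lo > 1:
            runs.append((lo + 1, hi - 1, hi - lo - 1))
    return runs

# Indices listed above for the sub-slot around height 113661: 24..37 and 60..63.
seen = list(range(24, 38)) + list(range(60, 64))
print(missing_runs(seen))  # [(38, 59, 22)]
```

Feeding it the real output of `chia farm challenges` (split into lines) via `indices_by_hash` gives one index list per hash, each of which can be passed to `missing_runs`.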
I have already limited the number of target_peer_count to 30, guessing my Raspi may be overloaded during some p2p network bursts, but this did not resolve the issue.
I have added logging statements to find out under which conditions signage points don’t get added.
A failing signage point always seems to happen when verify_n_wesolowski doesn’t manage to return True eventually, i.e., all wesolowski tries for this signage point return False.
Expected behavior
All signage points shall finish.
- OS: Ubuntu Server 20 LTS
- Machine: Raspberry Pi 4 4 GB
About this issue
- State: closed
- Created 3 years ago
- Reactions: 9
- Comments: 81 (9 by maintainers)
It seems I have found a solution to this issue for my setup.
My current theory for this issue is that there is too much filesystem IO on the Raspberry Pi’s SD card when Chia’s config directory ~/.chia is located on the SD card. After all, the SD card then handles not only OS-related IO but also all Chia-related filesystem IO, such as SQLite database IO and Chia logging. An SD card-based ~/.chia is what you get when following the current Raspberry Pi Chia installation documentation, so I expect many if not all Raspberry Pi Chia users with “normal” Raspberry Pi-recommended SD cards to face this problem.

To put this theory to the test, I moved Chia’s config directory ~/.chia from the SD card to a separate (and fast) USB 3.1 flash drive (thumb drive):

Monitor your logs. If everything goes well like it did for me, all signage points should now finish correctly:
From 1 to 64, all signage points within a sub slot now finish.
I have been using this solution for four hours now and so far haven’t missed finishing a single signage point. In two days, I will be able to report back full-day statistics.
In case this solution (i.e., moving ~/.chia onto a separate device) turns out to be viable, and other Raspberry Pi users facing this issue can resolve it the same way, I recommend updating the Raspberry Pi Chia installation documentation accordingly.

P.S.: In case you added an additional external drive because of this and this solution works well for you, in all euphoria don’t forget to mount this new drive on boot by adding a new entry to /etc/fstab.

P.P.S.: When things have worked well for some time, it will be safe to free space on the SD card by deleting ~/.chia-backup, though it’s always nice to have some backup of your SQLite databases around in case the live versions get corrupted for some reason.

@grocheireland:
If logging looks similar to this …:
… then everything looks okay: “Finished signage point” increases one by one, and even though there were “Signage point 28 not added” messages in between, they were recovered from, as you can tell by the line “Finished signage point 28/64”.

But if logging looks similar to this …:
… then things go wrong as reported in this issue. It missed finishing signage points 25 and 26, meaning your farmer didn’t check whether it was eligible to win Chia for those missed signage points.
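The distinction between “recovered” and “missed” signage points can be sketched in Python. This is an illustration under assumptions, not Chia code: it relies only on the two message fragments quoted above (“Signage point N not added” and “Finished signage point N/64”), the function name `classify` is made up, the sample lines are hypothetical, and it assumes signage points finish in order, so a “not added” index still pending when a different index finishes is counted as missed.

```python
import re

FINISHED = re.compile(r"Finished signage point (\d+)/64")
NOT_ADDED = re.compile(r"Signage point (\d+) not added")

def classify(lines):
    """Split 'not added' signage points into recovered and missed ones."""
    pending = set()
    recovered, missed = [], []
    for line in lines:
        not_added = NOT_ADDED.search(line)
        finished = FINISHED.search(line)
        if not_added:
            pending.add(int(not_added.group(1)))
        elif finished:
            idx = int(finished.group(1))
            if idx in pending:
                # The signage point failed at first but finished later.
                recovered.append(idx)
                pending.discard(idx)
            # Anything still pending was overtaken by a later signage point.
            missed.extend(sorted(pending))
            pending.clear()
    return recovered, missed

# Hypothetical log fragments, shortened to the relevant message text.
sample = [
    "Finished signage point 24/64",
    "Signage point 25 not added",
    "Signage point 26 not added",
    "Finished signage point 27/64",
    "Signage point 28 not added",
    "Signage point 28 not added",
    "Finished signage point 28/64",
]
print(classify(sample))  # ([28], [25, 26])
```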
There is an additional issue which affects everyone (not just RPI users). That is that if you receive some blocks late, due to networking issues, the signage points you receive might not be verified correctly, since they depend on the last block. I will make a fix for this that will hopefully make it into 1.1
I believe it’s 0 indexed, so there’s 64 of them and it stops at 63.
I think we’re a bit too slow to farm effectively on the Raspberry Pi right now. I’m working on optimizations to fix this and our goal is definitely to work well on the RPi.
I don’t have any hard numbers on farming specifically, but right after the transactions launch, the chain experienced a fair amount of traffic, which the RPi was just barely keeping up with.
Hmmm… I don’t think that it is only a performance problem. I have an RPi 4 with 8 GB RAM and a microSD card only (but it is a fast one), and I have no missed signage points at all on chia 1.1.5.
Adding my struggles with signage points here as well. My setup:
- .chia is on an NVMe raid1 pool (BTRFS, logs (INFO), mainnet config)
- peer_connect_interval: 30 and target_peer_count: 60

With plotting stopped to avoid any possible contention (shouldn’t exist in the above beyond CPU), I still appear to see dupes/out-of-order signage points in the logs:

Can see the dupes persist in challenges:

I can move the plotting to another machine (Ryzen 3600XT), but I don’t believe this is a solve since it also happens when plotting is stopped. Network interface is never more than 25% utilized. What else could those of us experiencing this provide to help narrow it down?
I’m on version 1.1.4 and still seeing missing signage points and RC. I am using a Windows 10 PC with the GUI.
Hi, with the latest version, on my Pi 4, I have not missed any signage points today, even when the Pi was busy copying a new plot file across the network from my harvester. 👍👏
@coderasm, in case your farmer misses hundreds of signage points per day, I recommend trying to move ~/.chia/ onto a separate drive. Since I did so, I’m experiencing only a few missed signage points per day with version 1.0.5.

At least with 1.0.5, I noticed that statistically fewer signage points 64/64 (== end of sub slots) finish than signage points 1 to 63:
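One way to gather such per-index counts from your own log is sketched below. It is only an illustration: it assumes that finished signage points show up as “Finished signage point N/64” in the log, as quoted elsewhere in this thread, and the function names `finished_counts` and `end_of_slot_ratio` are made up.

```python
import re
from collections import Counter

FINISHED = re.compile(r"Finished signage point (\d+)/64")

def finished_counts(lines):
    """Count how often each signage point index 1..64 finished."""
    counts = Counter()
    for line in lines:
        m = FINISHED.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

def end_of_slot_ratio(counts):
    """Compare how often 64/64 finished with the average for indices 1..63."""
    others = [counts[i] for i in range(1, 64)]
    mean_others = sum(others) / len(others)
    return counts[64] / mean_others if mean_others else None
```

A ratio close to 1.0 means 64/64 finishes about as often as the other indices; a clearly lower value would match the observation above. To try it on a real log, pass the lines of your debug.log to `finished_counts`.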
When a signage point 64/64 does not finish, logs look similar to this:
@henningrettenmaier (and @grocheireland), how many SPs did your Raspberry Pi miss before you moved ~/.chia to another device?

From your pasted log, your Raspberry Pi seems to be missing around one signage point per hour. That is already a lot better than missing around 25 signage points per hour, as my Raspberry Pi did before moving the directory.
From your pasted log (and assuming you are in the German time zone), only one missed/unfinished signage point coincided closely with one on my side:
I missed signage points 10 and 11, you missed signage point 12.
Also, in Keybase, a couple of days ago people in Europe reported missing identical signage points, e.g. I can confirm my farmer missed the same signage points that @martomi did (keybase://chat/chia_network.public#random/22366).

So far, we can already tell that there are at least two independent causes for missing signage points: problematic hardware (probably because of slow SD card IO), and P2P network “hiccups” (?) that several people experience.
Thanks flotti455. Yes, I am seeing the same problem on a Pi 4, but not all the time. My Pi does NOT rely on an SD card either. I also tried keeping the Pi busy/overloaded but couldn’t as of yet see any correlation between that and missing signage points. I’ll look again on Saturday.
I also have this issue on a Pi 4 (4 cores) with 8 GB RAM.
From htop, there is not much going on with the processor, RAM, or disk IO. The Internet connection is also stable on a 100 Mbps line.
Is this being looked at by any developer? What do I need to check in the logs to see this on my Pi 4?
So I reduced target_peer_count down to 15. Unfortunately, after five minutes or so my Raspi already missed finishing two signage points in a row. So I have the feeling that too high a peer count is not the reason for this issue.

A regular thumb drive will work great at first and slowly decay. You will actually want a real SSD connected via USB that supports trim well.
@n1ghtwish That’s a really good idea. I’ve uploaded mine here:
https://pastebin.com/raw/w9nnKdKM
Just a couple of notes:
I see issues at the following times (I’ve bolded the times which you have also had an issue at):
So 3 out of 5 issues match up timewise between our logs. I’d say that suggests this could well be a network-wide issue. It would be really interesting to see comparisons from others here.
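Comparing issue times between two farmers can be done by hand, but a small sketch like the following makes it repeatable. The first list reuses the times reported in this thread (12:04 to 13:59); the second list is purely hypothetical sample data, and the function names and the one-minute tolerance are assumptions for illustration. Both lists need to be in the same time zone.

```python
from datetime import datetime

def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    t = datetime.strptime(hhmm, "%H:%M")
    return t.hour * 60 + t.minute

def matching_times(mine, theirs, tolerance_min=1):
    """Return the entries of `mine` that have a counterpart in `theirs` within the tolerance."""
    theirs_min = [to_minutes(t) for t in theirs]
    return [
        t for t in mine
        if any(abs(to_minutes(t) - other) <= tolerance_min for other in theirs_min)
    ]

reported = ["12:04", "12:28", "12:42", "13:33", "13:59"]            # times from this thread
hypothetical_other = ["12:05", "12:20", "12:42", "13:10", "13:59"]  # made-up sample
print(matching_times(reported, hypothetical_other))  # ['12:04', '12:42', '13:59']
```

If many independent farmers show a high overlap, that points toward a network-wide cause rather than local hardware.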
Tip for fellow Linux users, a quick and dirty command I use to check if a particular logfile has any SP issues:
grep -Po '\d+/\d+' ~/.chia/mainnet/log/debug.log | awk -F/ '{ if ($1==prev+1 || (prev==64 && $1==1)) { if ($1==64) { print "OK" } } else if (prev) { printf "Skipped from %s to %s\n",prev,$1 } prev = $1 }'

Expected output: OK on each line for each sub-slot completed successfully. Any non-sequential SPs will otherwise be printed.

I’d be interested to see signage logs from people around the same time frames I have.
These are my logs from today 12:00 - 14:36 local time GMT.
https://pastebin.com/raw/HeAjCtTL
I have noted problems with signage at various points:
12:04 12:28 12:42 13:33 13:59
To query run something like this where you use your username, this is in PowerShell:
(gci "C:\Users\*username*\.chia\mainnet\log" | ? {$_.Extension -ne ".lock"} | Sort LastWriteTime | gc) -match "Finished signage point"
I’ve tried to rule out various things, so I’ve minimised network traffic (I was copying plots over the network) and also directly cabled my full node into the switch next to the router. I think it’s not a local issue here.
I’ve checked with a few people my end now and we’re all seeing out of order/duplicate signage points around the same times, seems very much like a central chain issue to me.
@arvidn Thanks, that’s good to hear.
However, it seems from the comments that people are having SP issues with more powerful hardware than the RPi. In #3507 there are lots of repeated SPs.
It seems that most people with the issue are using remote harvesters (mostly on the same LAN) and these are causing some duplicated late SPs somehow.
It may be that this is not an issue as all the SPs are there but it would be good if this could be confirmed. Right now, I’m 18 days without a block with an average of ~3,000 plots during that time. Maybe I’m very unlucky, or maybe this is causing a real issue.
Never had issues (but high response times due to a lot of SMB network shares) until I switched all plotters into harvesters.
6 Harvester/Plotter in the same network. Everything connected through 1Gbit/s. ISP connection is an old ADSL 6mbit/1mbit Connection with 24 hour reconnect.
Only Main Node ports 8444-8447 opened.
Sometimes it’s running just fine from 1-64 in the CLI, and then it’s completely freaking out.
all Harvester and Full Node are on 1.1.6dev0 and 1 Windows Harvester on 1.1.5.
CPU usage is up to 80-90%. Only process running is GUI.
@flotti455 That was painful. Yes, it’s better, but it still missed 2 of the last 30.
Quick update: I rebooted (I hadn’t rebooted after the upgrade to 1.1.1), and now I’m still getting the signage point not added errors, but typically those signage points now finish after a few misses. So I think restarting helped. But I’m slowly ramping my plotting back up, so I’m not sure whether the problem will come back.
Yes the fix will be in the 1.1 release
looks like improvements in #1978
If I miss 1 SP, will I lose the chance to win on all 64 SPs in that block, or on just that 1 SP?
I updated to 1.0.5 three hours ago, and so far no SPs have been lost. Hope it is somehow fixed in 1.0.5.
Thanks.