croc: Add flag to trigger hashing full file when synchronizing

On croc (version v8.6.12-c373b38), the check for whether a file is already present on the receiving side is not trustworthy. As I understand it, croc uses a constant-time hash: instead of reading the entire file, it inspects only a fixed number of samples. As a result, differences anywhere outside those samples go undetected. This is entirely unexpected behavior.

Example:

Create a file consisting of 10 MB of \0 bytes, followed by an A, followed by another 10 MB of \0 bytes, and transfer it. Then create the same file with a B instead of the A and transfer it again. The second time, the file is not transferred because croc considers the two files identical (the debug output states that the hashes match). The expected behavior is that the new file is transmitted.
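Under an assumed layout of 16 KiB windows at the start, at size/2, and at the end of the file, this Go sketch checks which bytes of the example file would actually be sampled. The file is 10,485,760 + 2 + 10,485,760 = 20,971,522 bytes (`echo` also appends a newline), and the A/B byte sits at offset 10,485,760. If the real implementation places its middle window at size/2 as assumed here, that window starts at offset 10,485,761, one byte past the letter, so the changed byte is never read. The helper names `sampleWindows` and `covered` are hypothetical.

```go
package main

import "fmt"

// Assumed window size (hypothetical, for illustration only).
const sampleSize = 16 * 1024

// window is a half-open byte range [start, end).
type window struct{ start, end int64 }

// sampleWindows returns the assumed sample positions: start, size/2, end.
func sampleWindows(size int64) []window {
	mid := size / 2
	return []window{
		{0, sampleSize},
		{mid, mid + sampleSize},
		{size - sampleSize, size},
	}
}

// covered reports whether offset falls inside any sample window.
func covered(offset, size int64) bool {
	for _, w := range sampleWindows(size) {
		if offset >= w.start && offset < w.end {
			return true
		}
	}
	return false
}

func main() {
	const zeros = 10 * 1024 * 1024   // each dd run writes 10 MiB of \0
	size := int64(zeros + 2 + zeros) // "A\n" from echo sits between the runs
	letterOffset := int64(zeros)     // position of the A/B byte

	fmt.Println("file size:", size)
	for _, w := range sampleWindows(size) {
		fmt.Printf("window [%d, %d)\n", w.start, w.end)
	}
	fmt.Println("A/B byte covered:", covered(letterOffset, size))
}
```

Under these assumptions it reports that the A/B byte is not covered by any window.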

On the sender side:

~/croctest/send $ dd if=/dev/zero bs=10M count=1 > A; echo A >> A; dd if=/dev/zero bs=10M count=1 >> A
1+0 records in
1+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0,0329896 s, 318 MB/s
1+0 records in
1+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0,0147199 s, 712 MB/s
~/croctest/send $ croc send A
Sending 'A' (20.0 MB)
Code is: herbert-mars-jump
On the other computer run

croc herbert-mars-jump

Sending (->192.168.101.108:53286)
 100% |████████████████████| (20/20 MB, 45.569 MB/s)
~/croctest/send $ dd if=/dev/zero bs=10M count=1 > A; echo B >> A; dd if=/dev/zero bs=10M count=1 >> A
1+0 records in
1+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0,0273334 s, 384 MB/s
1+0 records in
1+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0,011334 s, 925 MB/s
~/croctest/send $ croc send A
Sending 'A' (20.0 MB)
Code is: formula-nothing-virus
On the other computer run

croc formula-nothing-virus
~/croctest/send $ sum A
00043 20481
~/croctest/send $

And on the receiver side:

~/croctest/receive $ croc herbert-mars-jump 
Accept 'A' (20.0 MB)? (y/n) y

Receiving (<-[::1]:51542)
 100% |████████████████████| (20/20 MB, 45.438 MB/s)
~/croctest/receive $ croc formula-nothing-virus
Accept 'A' (20.0 MB)? (y/n) y

Receiving (<-[::1]:51596)
~/croctest/receive $ sum A
32810 20481
~/croctest/receive $

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 23 (12 by maintainers)

Most upvoted comments

Ooooh! Does hashing the whole file load it all into memory? I was under the impression it was streamed.

@schollz Currently it does, but only because that was the code path expected for very small files and that was a simple approach. The change to streaming is pretty easy to make and I will do so. I’ll drop a note when there is a new version you can pull in.

Hi @kalafut! Thanks so much for imohash! croc has probably used imohash on millions and millions of files by now (the public relay regularly goes through ~8 terabytes of bandwidth, and that's just on the public relay!) I'm so glad that you created it. That's great to know about the configurable mode - I can turn that on and see what kind of effect it has. From my vantage point though, I haven't seen a failure from imohash in my transfers (and I transfer a lot of files at work…) nor have I seen any bug reports that I could interpret as an imohash failure. An all-around success for imohash - it really does seem to handle real-world cases really well.