etcher: .gz doesn't return correct file size for content above 2^32 B = 4 GB
- 1.0.0-beta13
- Linux 64bit
Splitting this out from #629: apparently the gzip file format cannot accurately store the size of files above 4 GB (2^32 bytes); it only reports the size modulo 2^32.
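For reference, gzip only reserves a 4-byte ISIZE field in the trailer (RFC 1952), and that field is defined as the uncompressed size modulo 2^32, so the real size of a >4 GiB image simply isn't recoverable from the metadata alone. A rough Node sketch of what that field gives you (illustrative code, not Etcher's actual implementation):

```ts
// Minimal sketch (names are illustrative, not Etcher's code).
// RFC 1952 stores the uncompressed size in the 4-byte ISIZE trailer,
// so anything above 4 GiB wraps around modulo 2^32.
import { open } from 'fs/promises';

async function gzipReportedSize(path: string): Promise<number> {
  const handle = await open(path, 'r');
  try {
    const { size } = await handle.stat();
    const trailer = Buffer.alloc(4);
    // ISIZE is the last 4 bytes, little-endian (valid for single-member archives).
    await handle.read(trailer, 0, 4, size - 4);
    return trailer.readUInt32LE(0); // real uncompressed size mod 2^32
  } finally {
    await handle.close();
  }
}
```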
On the command line, people seem to recommend something like `zcat file.gz | wc -c` or `gzip -dc file.gz | wc -c`, which give the correct value, though that means decompressing the file twice. We might have to do that for gzip in the end, since >4 GB files are likely common for Etcher's use case.
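Programmatically, the equivalent of `zcat file.gz | wc -c` would be something like the sketch below: decompress as a stream and count bytes without writing anything out (function name made up for illustration):

```ts
// Sketch of the programmatic equivalent of `zcat file.gz | wc -c`:
// stream-decompress and count bytes, storing nothing on disk.
import { createReadStream } from 'fs';
import { createGunzip } from 'zlib';

function gzipUncompressedSize(path: string): Promise<number> {
  return new Promise((resolve, reject) => {
    let bytes = 0;
    createReadStream(path)
      .on('error', reject)
      .pipe(createGunzip())
      .on('error', reject)
      .on('data', (chunk: Buffer) => { bytes += chunk.length; })
      .on('end', () => resolve(bytes));
  });
}
```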
In the worst case this might let users start burning images onto cards that are too small; otherwise it affects the progress bar.
From testing with a 4100 MiB (> 4096 MiB) image, the .gz version indeed lets me select a 512 MB SD card, while the same file's .xz archive does not.
For the progress bar, the MB/s reading seems to be affected (it shows a very low speed, e.g. 0.01 MB/s), but the progress percentage is not (it displays correctly during the burning process), so it's not too bad.
About this issue
- State: open
- Created 8 years ago
- Comments: 42 (38 by maintainers)
I know. So you could say gzip is not the best choice for files > 4 GB? I prefer xz over gzip.
Beyond all this though, we should have a heuristic that basically says this:
I think an algorithm like this, used only for gzip files (maybe bzip2 too?), should fix the vast majority of cases. We should still fail well when we're wrong, but we should try hard to be right 😃
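Purely as an illustration of the kind of rule this could be (my assumption, not necessarily the heuristic intended here): since a disk image should essentially never be smaller than its own .gz archive, the reported ISIZE could be bumped by multiples of 2^32 until it is at least as large as the compressed file:

```ts
// One possible heuristic (an assumption, not necessarily the exact rule meant
// above): a disk image should essentially never be smaller than its own .gz
// archive, so bump the reported ISIZE by multiples of 2^32 until it is at
// least as large as the compressed file.
const FOUR_GIB = 2 ** 32;

function estimateUncompressedSize(isize: number, compressedSize: number): number {
  let estimate = isize;
  while (estimate < compressedSize) {
    estimate += FOUR_GIB;
  }
  return estimate;
}
```

With the 4100 MiB image mentioned above, ISIZE comes back as roughly 4 MiB; since that is smaller than the compressed archive, one step of the loop already lands on the right ~4100 MiB. It can still guess wrong, e.g. for a very compressible >4 GiB image whose wrapped-around ISIZE happens to exceed the archive size, which is why failing well still matters.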
@jviotti I think your description is still mixing up the two issues, which were broken out into two parts: this one and #629. That other issue (the reproducible `ENOSPC` described there) happens regardless of compression. In this issue:
- `ENOSPC` would happen if there's a `gz` image with `SIZE > 2^32` bytes and the user is trying to burn onto a card which has `CAPACITY < SIZE` but `CAPACITY > SIZE mod 2^32`. Then it would cause an issue, because the initial capacity check in Etcher couldn't figure out the correct size.
- If `CAPACITY > SIZE`, the only effect is that the "speed" bar is wrong, but everything else works properly (including the progress bar), and the user won't run into `ENOSPC`.

Decompressing things twice might be a bad solution, but judging by the comments, `gz` is simply not designed for these big files (nor to return a correct size estimate), so I'm curious whether there's any solution other than running through the file twice. It shouldn't be too bad, especially with some UI display such as "checking archive contents", so people know it hasn't all just hung. I think doing things correctly for `gz` is more important than taking a bit longer. Decompression itself, with no data storage and just byte counting, seems to be pretty fast.
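To make the "checking archive contents" idea concrete, a rough sketch (illustrative names, not Etcher's actual code) could report progress against the compressed byte count, which is known up front, while counting the decompressed bytes:

```ts
// Rough sketch (illustrative names, not Etcher's UI code): count decompressed
// bytes while reporting progress against the *compressed* size, which is
// known up front, so a "checking archive contents" indicator can move.
import { statSync, createReadStream } from 'fs';
import { createGunzip } from 'zlib';

function countWithProgress(
  path: string,
  onProgress: (fraction: number) => void
): Promise<number> {
  const compressedTotal = statSync(path).size;
  return new Promise((resolve, reject) => {
    let uncompressed = 0;
    const source = createReadStream(path);
    // Progress is measured on the compressed stream, since its total is known.
    source.on('data', () => onProgress(source.bytesRead / compressedTotal));
    source.on('error', reject);
    source
      .pipe(createGunzip())
      .on('error', reject)
      .on('data', (chunk: Buffer) => { uncompressed += chunk.length; })
      .on('end', () => resolve(uncompressed));
  });
}
```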