go: compress/flate: deflatefast produces corrupted output
What version of Go are you using (go version)?
$ go version
go version go1.15 linux/amd64
Does this issue reproduce with the latest release?
I’m able to repro this since go1.15, including the latest go1.15.2.
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE="off"
GOARCH="amd64"
GOBIN="/usr/local/google/home/yekuang/infra/go/bin"
GOCACHE="/usr/local/google/home/yekuang/infra/go/.cache"
GOENV="/usr/local/google/home/yekuang/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/usr/local/google/home/yekuang/infra/go/.vendor/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/google/home/yekuang/infra/go/.vendor:/usr/local/google/home/yekuang/infra/go"
GOPRIVATE=""
GOPROXY="off"
GOROOT="/usr/local/google/home/yekuang/golang/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/google/home/yekuang/golang/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build699870689=/tmp/go-build -gno-record-gcc-switches"
What did you do?
We compressed the same data using zlib.NewWriterLevel(writer, zlib.BestSpeed) and found that the output differed between go1.15 and go1.14.2. This difference eventually led to data corruption when we uploaded the compressed file to GCS.
Here’s the minimal reproducible example I have:
package main

import (
	"bufio"
	"compress/zlib"
	"flag"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

func main() {
	var filename string
	flag.StringVar(&filename, "f", "", "filename")
	flag.Parse()

	fi, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer fi.Close()

	outname := filepath.Base(filename) + "-gzip"
	fo, err := os.Create(outname)
	if err != nil {
		panic(err)
	}
	defer fo.Close()
	fmt.Printf("%s -> %s\n", filename, outname)

	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)
	compressor, err := zlib.NewWriterLevel(foWr, zlib.BestSpeed)
	if err != nil {
		panic(err)
	}

	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(compressor, fi, buf); err != nil {
		compressor.Close()
		panic(err)
	}
	// compressor needs to be closed first to flush the rest of the data
	// into the bufio.Writer
	if err := compressor.Close(); err != nil {
		panic(err)
	}
	if err := foWr.Flush(); err != nil {
		panic(err)
	}
}
The input data was too big to be shared (2.5G). But I can share the data internally (FYI, my LDAP is yekuang@).
What did you expect to see?
No difference in the compressed data between go1.14.2 and go1.15.
What did you see instead?
Compressed data were different.
Size of the compressed data:
go1.14.2: 728571269
go1.15: 728570333
I also did a cmp, and they started to differ at byte 363266597.
Let me know if you need more information, thanks!
About this issue
- State: closed
- Created 4 years ago
- Comments: 39 (26 by maintainers)
Commits related to this issue
- compress/flate: Fix corrupted output Since matches are allowed to be up to and including at maxMatchOffset we must offset the buffer by an additional element to prevent the first 4 bytes to match aft... — committed to klauspost/go by klauspost 4 years ago
- [release-branch.go1.15] compress/flate: fix corrupted output The fastest compression mode can pick up a false match for every 2GB of input data resulting in incorrectly decompressed data. Since matc... — committed to golang/go by klauspost 4 years ago
It is unclear to me whether this issue is about the compressed output simply being different or that the compressed output is actually invalid (i.e., cannot be decompressed). Can you please clarify?
@egonelbre No, that cannot be correct. The value may not matter; that will only fix 16383 out of 16384 cases, but it will still result in a false hit when hash(0)&16383 matches the index.
The offset must be enough to invalidate it completely. I will see if I can figure out what is causing the false match.
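The commit message for the fix describes the problem as an off-by-one: matches are allowed at distances up to and including maxMatchOffset, so the buffer must be offset by one more element. The following is a toy model of that reasoning only, not the code in deflatefast.go. inReach and the stale entry value are hypothetical, and maxMatchOffset is assumed to be 32768, the DEFLATE window size.

```go
package main

import "fmt"

// Assumed value: DEFLATE's 32 KiB window.
const maxMatchOffset = 1 << 15

// inReach (hypothetical) models an inclusive distance check: a table entry
// is accepted as a match candidate when it lies at most maxMatchOffset
// bytes behind the current position.
func inReach(pos, candidate int32) bool {
	return pos-candidate <= maxMatchOffset
}

func main() {
	// A table entry left at offset 0 after the offsets have been reset.
	const stale = int32(0)

	// If positions restart at maxMatchOffset, the stale entry is exactly
	// maxMatchOffset away and still passes the inclusive check.
	// Restarting one element further puts it out of reach.
	for _, start := range []int32{maxMatchOffset, maxMatchOffset + 1} {
		fmt.Printf("restart at %d: stale entry in reach = %v\n", start, inReach(start, stale))
	}
}
```

In this model, only the larger restart offset guarantees that a stale entry can never be picked up, whatever its hash slot happens to be.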
@klauspost should lines https://github.com/golang/go/blob/master/src/compress/flate/deflatefast.go#L296 read:
At least this seems to fix the tests for the smaller bufferReset value.
If the output is valid (e.g., can be decompressed), then this is working as expected. The compression libraries make no guarantees that the output remains stable for all time. While that guarantee can be useful in some contexts, it unfortunately means that we can never make changes to the compression algorithm either to improve the speed or to improve the compression ratio, both of which are properties that are generally considered more important than stability.
And maybe the CLs for https://github.com/golang/go/issues/34121 are related.