go: proposal: cmd/go: use the `go` version declared in the `go.mod` file to determine module boundaries and checksums
Summary
- Use the `go` version declared in the `go.mod` file to determine the boundaries of the module’s source code.
- Only store and verify checksums for the source code of modules that were extracted with known-good boundaries.
- Continue to compute, store, and verify a checksum for every `go.mod` loaded during a build, regardless of its version.
Background
In the fix for #27093, we changed the module loader to drop symlinks in repositories when converting them to modules.
That changed the contents of some modules, and therefore their hashes, and rendered the contents of some existing go.sum files invalid (#29278). In retrospect, that was a mistake: we should never give users a reason to delete or otherwise mistrust their go.sum files, because that undermines the very purpose of go.sum files: a checksum mismatch should be treated as a potential security threat, not just a bug in the go tool.
At some point, we will probably find another bug in module extraction, or decide to make a change in how we compute module boundaries (such as ignoring go.mod files in testdata directories for #27852, or pruning out vendor directories for #30240). If and when we do, we should be careful not to break existing go.sum files.
This proposal attempts to build on #28221 to provide a safe means to make such changes.
Detail
Just as the go version determines the semantics in effect for the compiler, it should also determine the semantics of the module loader. A given release of the go tool may understand how to load arbitrarily many versions, and patch releases for older versions may even support newer versions.
If the go version used to extract the module does not support the go version declared by that module, fetch the module according to the closest supported version instead. If we have an existing checksum for the module and it does not match, fail with an “unsupported go version” warning. If we do not have an existing checksum, mark the module as provisional and do not record the new checksum (per #28835).
- If adopted, we should also backport this behavior to the next patch releases of Go 1.11 and 1.12: we want to avoid ever storing another bad checksum.
- Alternately, we could change the checksum prefix from `h1:` to `h2:` (even though the checksum algorithm itself doesn’t need to change) to indicate the reliability of that checksum. (An `h1` checksum might indicate a correctly-computed sum for an incorrectly-extracted module.)
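The fallback and checksum rules in this section can be sketched roughly as follows. This is only an illustration under stated assumptions: `decide`, `clamp`, and `Outcome` are hypothetical names that do not exist in cmd/go, Go versions are reduced to their 1.x minor number, and the set of supported versions is modeled as a contiguous range.

```go
package main

import "fmt"

// Outcome is a hypothetical label for what the loader does with a checksum.
type Outcome string

const (
	Recorded             Outcome = "record new checksum"
	Verified             Outcome = "existing checksum verified"
	Provisional          Outcome = "provisional: do not record checksum"
	ChecksumMismatch     Outcome = "checksum mismatch"
	UnsupportedGoVersion Outcome = "unsupported go version"
)

// clamp returns the supported minor version closest to declared,
// assuming the tool supports every minor version in [lo, hi].
func clamp(declared, lo, hi int) int {
	if declared < lo {
		return lo
	}
	if declared > hi {
		return hi
	}
	return declared
}

// decide picks the extraction semantics and the checksum action.
// existingSum is the go.sum entry ("" if none); computedSum is the
// checksum of the module as extracted with the chosen semantics.
func decide(declared, lo, hi int, existingSum, computedSum string) (used int, out Outcome) {
	used = clamp(declared, lo, hi)
	supported := used == declared
	switch {
	case existingSum == "" && supported:
		return used, Recorded
	case existingSum == "":
		// Boundaries are not known-good: keep the module provisional.
		return used, Provisional
	case existingSum == computedSum:
		return used, Verified
	case supported:
		return used, ChecksumMismatch
	default:
		// A mismatch under fallback semantics is reported as a version problem.
		return used, UnsupportedGoVersion
	}
}

func main() {
	// A tool supporting go 1.13 through 1.15 loads a module declaring go 1.12.
	used, out := decide(12, 13, 15, "", "h1:example")
	fmt.Printf("extract as go 1.%d: %s\n", used, out)
}
```

The important property is that a mismatch is only reported as a true checksum mismatch when the module was extracted with exactly the semantics it declared.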
However, do continue to record and verify checksums for all go.mod files regardless of the go version in use.
- This detail is important, because it allows us to trust the `go` version declared in that `go.mod` file: otherwise, an attacker could inject a module using a known-unsupported `go` version in order to disable source verification.
In the .info files served by module proxies, include both the version of the go tool used to extract the module and the go language version actually selected by that tool. For example, if the cmd/go binary from Go 1.15.2 only supports the semantics of go 1.13 and above, and is used to extract a module that declares go 1.12, the .info file would indicate:
GoTool: "1.15.2",
GoVersion: "1.13",
This allows module proxies to serve up-to-date checksums even for older or newer clients: if the proxy indicates that the module was extracted using an appropriate go version, then the client can still verify that the zip file matches the recorded checksum, and can still add the checksum to its go.sum file, even though it cannot reproduce that zip file by re-extracting that module from the origin.
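As a rough sketch of how a client might consume the extended .info file, the following decodes the two proposed fields and checks whether the proxy extracted the module with the semantics the module declares. The `GoTool` and `GoVersion` fields come from the proposal text; `Version` and `Time` are the fields .info files already carry. The helper names `parseInfo` and `extractedAsDeclared`, and the sample values, are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Info models a proxy .info response with the two proposed fields added.
type Info struct {
	Version   string
	Time      time.Time
	GoTool    string `json:",omitempty"` // version of the go tool that extracted the module
	GoVersion string `json:",omitempty"` // go language version whose semantics were applied
}

// parseInfo decodes the JSON body of a .info file.
func parseInfo(data []byte) (Info, error) {
	var info Info
	err := json.Unmarshal(data, &info)
	return info, err
}

// extractedAsDeclared reports whether the proxy applied exactly the
// semantics of the go version declared in the module's go.mod file.
func extractedAsDeclared(info Info, declared string) bool {
	return info.GoVersion != "" && info.GoVersion == declared
}

func main() {
	body := []byte(`{"Version":"v1.0.0","Time":"2019-02-25T00:00:00Z","GoTool":"1.15.2","GoVersion":"1.13"}`)
	info, err := parseInfo(body)
	if err != nil {
		fmt.Println("bad .info file:", err)
		return
	}
	// The module declares go 1.12, but the proxy fell back to go 1.13 semantics.
	fmt.Println("known-good boundaries:", extractedAsDeclared(info, "1.12"))
}
```

A client could use a check like this to decide whether a checksum is safe to record, without being able to re-extract the module itself.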
Edits:
- Refined the validation logic per https://github.com/golang/go/issues/30369#issuecomment-467091575.
(CC @rsc @jayconrod @FiloSottile @hyangah @heschik @katiehockman)
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 17 (17 by maintainers)
@marwan-at-work, proxies should forever use `go mod download`. The reason it exists is precisely so all proxies and other downloaders agree on the bits and don’t have to reimplement all the different version control mechanisms themselves, or even have stale libraries linked in.
@kardianos Part of the point of this proposal is that older clients won’t need to upgrade, as long as they’re getting their modules from an up-to-date module proxy. If the hash function hasn’t changed and the zipfile format hasn’t changed, why should the client consuming that zipfile and hashing it with that function need to change?
But perhaps we could adjust the behavior a bit. Perhaps we should only fail to record checksums for newer versions, but still verify them: we could emit the “you need to upgrade” warning instead of “checksum mismatch” only if the checksum fails and the required version is newer (as in #28221).
Then, the cases would be:
- […] `go` version, verify that the zipfile matches the checksum and record the checksum, even if the client does not support that `go` version.
- […] `go` version but the proxy does not, fail with an explicit upgrade warning (or fall back to another proxy or fetch from the origin if so configured).
- […] `go` version, but the `go.sum` file has a checksum entry for the given module, fetch the module (from the proxy, if so configured) using the closest supported semantics.
The interesting case is:
- […] `go` version, and the client does not have a checksum entry for the module, what should we do?
  - […] `go` command is upgraded (or when `go mod verify` is run).
  - […] `go.sum` file.
On the other hand, it might be nice for older-version clients to be able to record the checksums for probably-incorrectly-extracted versions as well. (If we know that “the copy of module `m` when extracted with Go 1.12” has checksum `X`, then it should continue to have checksum `X` when extracted with Go 1.12, even if the copy extracted with Go 1.13 has a different checksum.)
However, I would argue that that should be part of the path information stored in the `go.sum` file, not the hash algorithm: any client that understands the algorithm, regardless of version, ought to be able to compute and verify the checksum for a given zipfile.
So perhaps we could add some sort of path suffix (like we do today for the `go.mod` file) to allow for multiple different extraction algorithms. I’m just not sure that extension would be worth the extra complexity, given that at least the public module proxies should always have the latest release available to use.
@kardianos
Note that the “input” in this case may be a `.zip` file provided by a proxy.
If a user with an older `go` toolchain obtains a zipfile from a proxy, and the proxy indicates that the file was extracted using the version from its module’s `go.mod` file, then the older `go` command ought to be able to verify that the contents match the checksum, even though the older toolchain cannot produce that zipfile itself.
If we change the name of the hash function, then that property will not hold: the older `go` toolchain would not know whether it has the correct hash function, so it could not flag a checksum error if the zipfile doesn’t match. That would allow a compromised proxy to serve arbitrary contents to older clients.
But the problem is even worse than that. Since the extraction algorithm changes the contents of the zipfile itself, a zipfile extracted using a newer Go version wouldn’t necessarily match the checksum computed by an older version, and the files needed to recompute the correct checksum wouldn’t necessarily even be present in the file (the change to the algorithm may have pruned them out). So we would still need some mechanism for older clients to determine whether to record their own checksum: changing the name of the hash function is still only a partial solution.
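To make the distinction between the hash function and its input concrete, here is a minimal sketch modeled on the `h1:` scheme: a SHA-256 over a sorted manifest of per-file SHA-256 sums, base64-encoded. The function name `hash1` and the in-memory file map are assumptions for illustration, and real module hashes use file names of the form `module@version/path`, which is simplified away here.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hash1 hashes a set of files (name -> contents) in the style of h1:
// each file contributes a "<hex sha256>  <name>\n" line, in sorted
// name order, and the lines themselves are hashed with SHA-256.
func hash1(files map[string]string) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256([]byte(files[name]))
		fmt.Fprintf(h, "%x  %s\n", sum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(h.Sum(nil))
}

func main() {
	// The same repository extracted with two different boundary rules.
	older := map[string]string{
		"go.mod":         "module example.com/m\n\ngo 1.12\n",
		"m.go":           "package m\n",
		"vendor/x/x.go":  "package x\n", // kept by the older extraction rules
	}
	newer := map[string]string{
		"go.mod": "module example.com/m\n\ngo 1.12\n",
		"m.go":   "package m\n",
		// vendor directory pruned by the newer extraction rules
	}
	fmt.Println("older extraction:", hash1(older))
	fmt.Println("newer extraction:", hash1(newer))
}
```

The algorithm is identical in both calls; only the extracted file set differs, which is why any client that understands the algorithm can verify a given zipfile but cannot necessarily reproduce it.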
@bcmills I think your alternative proposal is spot on:
I think the key insight is that a hash is only valid if you define both the hash algorithm and what the input is defined to be. In simple situations, such as a hash of a file or chunk, this is easy: you define the input to be the sequence of bytes of the file.
But in Go, this hash is computed over many files across many folders. The hash version should reflect the input, not just the algorithm.
I could easily see people changing an embedded go version in tools or manually for one reason or another, and most of the time nothing would break. But then in a corner case (a hash change), doing so would break go.sum. Don’t make this a hidden dependency; tie it directly to the hash version.