containerd: including git attributes in "vendor" makes archive checksum change

Description

Because the vendored tree includes a git attributes file, the generated archive gets different checksums over time:

vendor/k8s.io/client-go/pkg/version/.gitattributes

This is because the number of “significant digits” in the abbreviated git revision varies, and that revision gets substituted into:

vendor/k8s.io/client-go/pkg/version/base.go

Steps to reproduce the issue

  1. Download https://github.com/containerd/containerd/archive/refs/tags/v1.5.8.tar.gz and compare its sha256 against a previously recorded value

Describe the results you received and expected

ERROR: v1.5.8.tar.gz has wrong sha256 hash:
ERROR: expected: a41ab8d39393c9456941b477c33bb1b221a29b635f1c9a99523aab2f5e74f790
ERROR: got     : 0890f7b0ee8e20a279a617c60686874b3c7a99e064adb2b38d884499b5284c43
ERROR: Incomplete download, or man-in-the-middle (MITM) attack
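
For reference, the kind of check that produces an error like the one above can be sketched in a few lines of Python. This is only an illustration, not the downloader that printed those messages; the helper names `sha256_of_stream` and `check_archive` are hypothetical, and the demo hashes an in-memory byte string rather than the real tarball.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_archive(stream, expected_hex):
    """Return (matches, actual_hex) for a stream and a recorded hex digest."""
    actual = sha256_of_stream(stream)
    return actual == expected_hex.lower(), actual


# Demo with stand-in bytes (hypothetical data, not the containerd archive).
payload = b"stand-in archive bytes"
recorded = hashlib.sha256(payload).hexdigest()

ok, got = check_archive(io.BytesIO(payload), recorded)
print("unchanged archive matches:", ok)

# A single extra byte in the payload is enough to fail the check.
ok, got = check_archive(io.BytesIO(payload + b"a"), recorded)
print("changed archive matches:", ok)
```

The check itself cannot tell a regenerated archive from a tampered one, which is why the error message mentions both an incomplete download and a MITM attack.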

What version of containerd are you using?

v1.5.8

Any other relevant information

No response

Show configuration if it is related to CRI plugin.

No response

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 40 (23 by maintainers)

Most upvoted comments

Resolving. The main branch doesn’t have the issue anymore and we wouldn’t upgrade Kubernetes dependencies in 1.6 and/or 1.5.

Having git commits and build dates in source code and in binary releases is mostly useless, except for causing confusion.

The commit is not really needed when versions are tagged, as seen here by ending up with a commit from the wrong git repository.

And the build date makes it hard to do reproducible builds. It is also frequently wrong, making Go binaries live in the '70s.

To be useful, there would need to be some kind of “make dist” and distfiles, instead of exporting git archives on GitHub.

After Docker upgraded containerd to 1.5.10, this started happening again.

diff -ur containerd-1.5.10.orig/vendor/k8s.io/client-go/pkg/version/base.go containerd-1.5.10/vendor/k8s.io/client-go/pkg/version/base.go
--- containerd-1.5.10.orig/vendor/k8s.io/client-go/pkg/version/base.go	2022-03-02 19:35:48.000000000 +0100
+++ containerd-1.5.10/vendor/k8s.io/client-go/pkg/version/base.go	2022-03-02 19:35:48.000000000 +0100
@@ -55,7 +55,7 @@
 	// NOTE: The $Format strings are replaced during 'git archive' thanks to the
 	// companion .gitattributes file containing 'export-subst' in this same
 	// directory.  See also https://git-scm.com/docs/gitattributes
-	gitVersion   string = "v0.0.0-master+2a1d4dbdb2a"
+	gitVersion   string = "v0.0.0-master+2a1d4dbdb2"
 	gitCommit    string = "2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc" // sha1 from git, output of $(git rev-parse HEAD)
 	gitTreeState string = ""            // state of git tree, either "clean" or "dirty"
 

Note that the historic releases mentioned above (1.5.8, 1.5.9) also broke…

0890f7b0ee8e20a279a617c60686874b3c7a99e064adb2b38d884499b5284c43  containerd-1.5.8.tar.gz
1c8a6ecced95219af79425d3b6c0b540369c387778d227f610700bd003d3477c  containerd-1.5.9.tar.gz

So it is only “fixed” for containerd 1.6.

@thaJeztah @afbjorklund @jonyhy96 thanks for helping nail that down! I’ve created https://github.com/kubernetes/publishing-bot/pull/285 in Kubernetes to stop including .gitattributes files in k8s libraries. That should solve the issue for the future.

It was calculated the same way, just some time ago (the contents vary over time).

This is because the length of the git hash varies, due to random factors on GitHub.

It might be 1e5ef943e today, and then it could be 1e5ef943eb tomorrow.

The “long” hash remains at: 1e5ef943eb76627a6d3b6de8cd1ef6537f393a71


PS: for wget, the flag is called --content-disposition.

Saving to: ‘containerd-1.5.8.tar.gz’

The workarounds only last for “so long”, until the number of significant digits in the commit changes again.

They also flip back and forth, depending on which server the GitHub workload ends up running on, etc.

Which lessens the confidence in having checksums in the first place.


Minikube sorta made it worse by using the wrong file name (forgot the -J option to curl, or a make option).

And by not stating clearly that it was “computed locally”, like our OS upstream so carefully did (and we ignored).

So a much better checksum file looks like this (it wasn’t used because it is for the older version, 1.4.4 and not 1.5.8):

https://github.com/buildroot/buildroot/blob/2021.02.4/package/docker-containerd/docker-containerd.hash

I wonder where the sha256 hash a41ab8d39393c9456941b477c33bb1b221a29b635f1c9a99523aab2f5e74f790 comes from? The checksum of containerd’s source code is not listed on its release page. Did I miss something?

When upstream doesn’t publish the checksums of a tarball, it is normally computed locally at the time of import.

The same applies if upstream uses a different checksum algorithm, for example if you want sha512 but it only provides sha256.

But ultimately, it’s even signed.

Note that it is not the checksum of the source code; that would be contained in the git commit itself (via the tree etc.).

It is the checksum after first doing the dist transformations, and then applying compression (which may add another timestamp).
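
The compression point can be illustrated with a small Python sketch. It assumes Python 3.8 or newer for the `mtime` argument of `gzip.compress`, uses stand-in bytes instead of a real tar stream, and the helper name `gz_sha256` is hypothetical. The gzip header carries a timestamp, so identical content compressed at two different times hashes differently.

```python
import gzip
import hashlib


def gz_sha256(payload, mtime):
    """sha256 of the gzip-compressed payload, with an explicit header timestamp."""
    return hashlib.sha256(gzip.compress(payload, mtime=mtime)).hexdigest()


payload = b"identical source tree bytes"

first = gzip.compress(payload, mtime=0)
second = gzip.compress(payload, mtime=1)

# The decompressed content is the same in both cases...
print(gzip.decompress(first) == gzip.decompress(second))

# ...but the compressed files, and therefore their checksums, are not.
print(gz_sha256(payload, 0) == gz_sha256(payload, 1))

# Pinning the timestamp makes the compressed output repeatable.
print(gz_sha256(payload, 0) == gz_sha256(payload, 0))
```

So a tarball checksum covers the packaging step as well as the content, which is a different guarantee from the one the git tree hash gives.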

Debian uses “pristine-tar” for this.

did i miss something

https://git-scm.com/docs/gitattributes#_export_subst

https://github.com/containerd/containerd/blob/v1.5.8/vendor/k8s.io/client-go/pkg/version/base.go#L59

	// NOTE: The $Format strings are replaced during 'git archive' thanks to the
	// companion .gitattributes file containing 'export-subst' in this same
	// directory.  See also https://git-scm.com/docs/gitattributes
	gitVersion   string = "v0.0.0-master+$Format:%h$"
	gitCommit    string = "$Format:%H$" // sha1 from git, output of $(git rev-parse HEAD)
	gitTreeState string = ""            // state of git tree, either "clean" or "dirty"

This will potentially change the output every time that GitHub does a “git archive” for you.

The alternative would be to generate and attach a static tarball, which is not really practical (and is wasteful).
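
To make the effect concrete, here is a rough Python stand-in for the substitution step. It is not git's implementation: it only handles the two placeholders quoted above, the names `expand`, `TEMPLATE` and `COMMIT` are hypothetical, and the abbreviation lengths 11 and 10 are picked to match the two variants seen in the 1.5.10 diff.

```python
import hashlib
import re

# Only the two placeholders that appear in base.go are handled here.
PLACEHOLDER = re.compile(r"\$Format:(%[hH])\$")

TEMPLATE = (
    'gitVersion   string = "v0.0.0-master+$Format:%h$"\n'
    'gitCommit    string = "$Format:%H$"\n'
)

# Full commit hash as it appears in the 1.5.10 diff.
COMMIT = "2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc"


def expand(source, full_hash, abbrev_len):
    """Replace %H with the full hash and %h with its first abbrev_len characters."""
    values = {"%H": full_hash, "%h": full_hash[:abbrev_len]}
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], source)


for abbrev_len in (11, 10):
    text = expand(TEMPLATE, COMMIT, abbrev_len)
    print(text.splitlines()[0])
    print(hashlib.sha256(text.encode()).hexdigest())
```

The gitCommit line is identical in both expansions; only the short form differs, and that single character is enough to change the checksum of the whole archive.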

The killer here is using the “short” hash.