kops: Nodeup can't find container-selinux-2.68-1.el7.noarch.rpm when trying to bootstrap a new node to a cluster

1. What kops version are you running? The command kops version will display this information. Version 1.13.0

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. Version 1.13.0

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue? Adding a node to a cluster causes nodeup to download "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm", which no longer exists following the CentOS 7.7 release.

5. What happened after the commands executed? kops tries to bootstrap the node, but nodeup fails because it points to a nonexistent package.

6. What did you expect to happen? New node bootstrapped and joined to the cluster

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

Sep 17 19:59:07  nodeup: I0917 19:59:07.667801    3560 executor.go:103] Tasks: 40 done / 48 total; 1 can run
Sep 17 19:59:07  nodeup: I0917 19:59:07.667844    3560 executor.go:178] Executing task "Package/docker-ce": Package: docker-ce
Sep 17 19:59:07  nodeup: I0917 19:59:07.667883    3560 package.go:206] Listing installed packages: /usr/bin/rpm -q docker-ce --queryformat %{NAME} %{VERSION}
Sep 17 19:59:07 nodeup: I0917 19:59:07.693153    3560 package.go:267] Installing package "docker-ce" (dependencies: [Package: container-selinux])
Sep 17 19:59:07  nodeup: I0917 19:59:07.747296    3560 files.go:100] Hash matched for "/var/cache/nodeup/packages/docker-ce": sha1:5369602f88406d4fb9159dc1d3fd44e76fb4cab8
Sep 17 19:59:07 nodeup: I0917 19:59:07.747368    3560 files.go:103] Hash did not match for "/var/cache/nodeup/packages/container-selinux": actual=sha1:93fdc15d22645b17bb1b2cc652f5bf51924d00a7 vs expected=sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
Sep 17 19:59:07  nodeup: I0917 19:59:07.747458    3560 http.go:77] Downloading "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm"
Sep 17 19:59:07  nodeup: I0917 19:59:07.891339    3560 files.go:103] Hash did not match for "/var/cache/nodeup/packages/container-selinux": actual=sha1:93fdc15d22645b17bb1b2cc652f5bf51924d00a7 vs expected=sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
Sep 17 19:59:07  nodeup: W0917 19:59:07.891385    3560 executor.go:130] error running task "Package/docker-ce" (2m20s remaining to succeed): downloaded from "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm"
 but hash did not match expected "sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0"
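
The log above shows nodeup comparing the SHA-1 of the cached file with an expected digest and rejecting the file when they differ. The same comparison can be reproduced by hand on a node. The following is a minimal sketch, not nodeup's implementation: the function name `sha1_matches` is hypothetical, and it assumes `sha1sum` from coreutils is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nodeup): compare a file's SHA-1 with an
# expected hex digest. Accepts the "sha1:<hex>" form printed in the nodeup log.
sha1_matches() {
  local file="$1" expected="${2#sha1:}"
  local actual
  # sha1sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha1sum "$file")" || return 2
  actual="${actual%% *}"
  if [ "$actual" = "$expected" ]; then
    echo "match: sha1:$actual"
  else
    echo "mismatch: actual=sha1:$actual expected=sha1:$expected"
    return 1
  fi
}
```

On an affected node this could be called as `sha1_matches /var/cache/nodeup/packages/container-selinux sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0`, using the expected digest from the log.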

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 14
  • Comments: 24 (7 by maintainers)

Most upvoted comments

Below is an improved workaround, inspired by previous comments and pull requests. Kops supports arbitrary userdata. The snippet needs to be added to each instance group spec.

spec:
  additionalUserData:
  - content: |
      bootcmd:
        - mkdir -p /var/cache/nodeup/packages
        - curl --proxy http://my.proxy:3128 -o /var/cache/nodeup/packages/container-selinux http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
    name: workaround-container-selinux
    type: text/cloud-config

We are seeing this issue as well.

Looks like this package was removed from centos repo, returning a 404:

wget http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
--2019-09-17 15:10:16--  http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
Resolving mirror.centos.org (mirror.centos.org)... 23.254.0.226
Connecting to mirror.centos.org (mirror.centos.org)|23.254.0.226|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-09-17 15:10:17 ERROR 404: Not Found.

This is a major issue with autoscaling (cluster-autoscaler): nodes are taken down and new ones never join the cluster.

Ideally, for resiliency, kops should not fetch artifacts required for nodeup/bootstrapping from public repos at node runtime. I’m not sure this is the way to go, but one option would be to place such critical RPMs/binaries in the state store during init and fetch them from there at runtime. Also, if the package is already installed (some may choose to bake it into their AMI), nodeup should skip fetching it (not sure whether this is already the current behavior).

Can the packages be externalised into a yaml/json file that nodeup reads in instead of being compiled into the binary? That would enable people to source the rpm and store it locally (s3, cloud storage, etc).

I’ve opted to save the rpm in S3 and then add it into kops with this in the instance groups:

spec:
  additionalUserData:
  - content: |
      bootcmd:
        - mkdir -p /var/cache/nodeup/packages
        - aws s3 cp s3://<my-s3-bucket>/container-selinux /var/cache/nodeup/packages/container-selinux
    name: workaround-container-selinux
    type: text/cloud-config

Then you just need to sort out the bucket policy and IAM privileges so the nodes can read from the bucket. This is in an AWS environment, obviously; I’m sure there are similar approaches for the other cloud platforms.
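
For the IAM side of this workaround, the permission the nodes need is read access to the single object. As a rough sketch (the function name, bucket, and key are placeholders, and this is an assumption about what a minimal grant looks like rather than a tested kops configuration), the statement could be generated like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an IAM policy statement that allows reading one
# object from one bucket. Bucket and key are placeholders supplied by the caller.
s3_read_statement() {
  local bucket="$1" key="$2"
  cat <<EOF
{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": ["arn:aws:s3:::${bucket}/${key}"]
}
EOF
}

# Example (placeholder bucket name):
#   s3_read_statement my-s3-bucket container-selinux
```

One place such a statement might go is the node entry of `additionalPolicies` in the kops cluster spec; check the kops IAM documentation for your version before relying on that.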

OK so looks like we’ll be doing 1.13.2 this morning. I’d also really prefer to get away from the OS packaging (towards “tar.gz” installation) as it seems to be introducing more problems than it solves.

For 2.68.1 -> 2.107.3: We try not to make potentially breaking changes once we have released the 1.x.0 of kops. But we do so for security fixes etc. So we can look at getting it into 1.14.0 (which hasn’t quite released yet). But is it a security fix (in which case we would get it into 1.13.0)?

We’re working on getting a 1.13/1.14 cut with these fixes asap.

You’ll need to either build and deploy your own version of kops (including protokube and kubeup), use a workaround as suggested above (you can probably use a hook to automate it: https://github.com/kubernetes/kops/blob/master/docs/cluster_spec.md#hooks), or wait for a release, which we’re actively working on getting out asap!

Here’s the changelog. It looks like there’s no strict distinction between security fixes and features, so we probably shouldn’t introduce the new version in kops 1.13:

* Fri Aug 02 2019 Jindrich Novy <jnovy@redhat.com> - 2:2.107-3
- use 2.107 in RHEL7u7
- add build.sh script

* Thu Jul 11 2019 Lokesh Mandvekar <lsm5@redhat.com> - 2:2.107-2
- Resolves: #1626215

* Mon Jun 24 2019 Lokesh Mandvekar <lsm5@redhat.com> - 2:2.107-1
- bump to v2.107

* Tue Apr 23 2019 Lokesh Mandvekar <lsm5@redhat.com> - 2:2.99-1
- built commit b13d03b

* Tue Apr 02 2019 Frantisek Kluknavsky <fkluknav@redhat.com> - 2:2.95-2
- rebase

* Thu Feb 28 2019 Frantisek Kluknavsky <fkluknav@redhat.com> - 2:2.84-2
- rebase

* Tue Jan 08 2019 Frantisek Kluknavsky <fkluknav@redhat.com> - 2.77-1
- backported fixes from upstream

* Mon Nov 12 2018 Dan Walsh <dwalsh@fedoraproject.org> - 2.76-1
- Allow containers to use fuse file systems by default
- Allow containers to sendto dgram socket of container runtimes
- Needed to run container runtimes in notify socket unit files.

* Fri Oct 19 2018 Dan Walsh <dwalsh@fedoraproject.org> - 2.74-1
- Allow containers to setexec themselves

* Tue Sep 18 2018 Frantisek Kluknavsky <fkluknav@redhat.com> - 2:2.73-3
- tweak macro for fedora - applies to rhel8 as well

* Mon Sep 17 2018 Frantisek Kluknavsky <fkluknav@redhat.com> - 2:2.73-2
- moved changelog entries:
- Define spc_t as a container_domain, so that container_runtime will transition
to spc_t even when setup with nosuid.
- Allow container_runtimes to setattr on callers fifo_files
- Fix restorecon to not error on missing directory

* Thu Sep 06 2018 Dan Walsh <dwalsh@fedoraproject.org> - 2.69-3
- Make sure we pull in the latest selinux-policy

* Wed Jul 25 2018 Dan Walsh <dwalsh@fedoraproject.org> - 2.69-2
- Add map support to container-selinux for RHEL 7.5
- Dontudit attempts to write to kernel_sysctl_t

This workaround no longer works. As of today http://mirror.centos.org/centos/7.6.1810/ has been deprecated. This also breaks the fix that went in kops 1.13.1: https://github.com/kubernetes/kops/pull/7609

As a workaround you can use http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
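
Because the package has now moved twice, a pre-seeding script could try several URLs and keep only a download whose SHA-1 matches what nodeup expects. The following is a hedged sketch: `fetch` and `seed_package` are hypothetical names, `curl` and `sha1sum` are assumed to be present on the node, and the digest in the usage comment is the expected value from the nodeup log in this issue (assumed to correspond to the 2.68-1 RPM).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: try several URLs for the same RPM and keep the first
# download whose SHA-1 matches. Function names are illustrative, not kops APIs.

# Download URL ($1) to path ($2). Assumes curl is installed on the node.
fetch() {
  curl -fsSL -o "$2" "$1"
}

# Usage: seed_package DEST EXPECTED_SHA1 URL...
seed_package() {
  local dest="$1" expected="$2" url sum
  shift 2
  mkdir -p "$(dirname "$dest")"
  for url in "$@"; do
    if fetch "$url" "$dest.tmp"; then
      sum="$(sha1sum "$dest.tmp")"
      # sha1sum prints "<digest>  <filename>"; compare only the digest.
      if [ "${sum%% *}" = "$expected" ]; then
        mv "$dest.tmp" "$dest"
        echo "seeded $dest from $url"
        return 0
      fi
    fi
    rm -f "$dest.tmp"   # discard failed or mismatching downloads
  done
  echo "no candidate URL produced a file with sha1 $expected" >&2
  return 1
}

# Example call (not executed here):
#   seed_package /var/cache/nodeup/packages/container-selinux \
#     d9f87f7f4f2e8e611f556d873a17b8c0c580fec0 \
#     http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm \
#     http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
```

Downloading to a temporary name and renaming only after the digest matches means a 404 page or a partial file never lands at the path nodeup reads.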

But really container-selinux needs to be updated to http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.107-3.el7.noarch.rpm, along with its associated dependencies.

Indeed, hooks won’t work. We figured that out the exact same time as @alexinthesky 😂

Then we switched to the Debian AMI to avoid further damage from dying spot instances: kope.io/k8s-1.14-debian-stretch-amd64-hvm-ebs-2019-08-16

Now that #7609 is merged, how can I make use of this change? Do I have to wait for a new kops release, or is nodeup released separately?

@rdjy Thanks for the answer, it did the trick for us.