kubernetes: kubeadm 1.6.0 (only 1.6.0!!) is broken due to unconfigured CNI making kubelet NotReady
Initial report in https://github.com/kubernetes/kubeadm/issues/212.
I suspect that this was introduced in https://github.com/kubernetes/kubernetes/pull/43474.
What is going on (all on single master):
- kubeadm configures and starts a kubelet, then uses static pods to bring up a control plane
- kubeadm creates node object and waits for kubelet to join and be ready
- kubelet is never ready and so kubeadm waits forever
In the conditions list for the node:
Ready False Wed, 29 Mar 2017 15:54:04 +0000 Wed, 29 Mar 2017 15:32:33 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
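For anyone reproducing this, the condition above comes from standard node inspection (the node name below is a placeholder):

```bash
kubectl get nodes
kubectl describe node <master-node-name>   # see the Conditions section
```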
Previous behavior was for the kubelet to join the cluster even with unconfigured CNI. The user will then typically run a DaemonSet with host networking to bootstrap CNI on all nodes. The fact that the node never joins means that, fundamentally, DaemonSets cannot be used to bootstrap CNI.
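As a sketch of that bootstrap pattern (every name and the image below are placeholders, not from this issue; the API group matches the 1.6 era):

```bash
kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1   # DaemonSet API group in the 1.6 era
kind: DaemonSet
metadata:
  name: cni-bootstrap            # placeholder name
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        app: cni-bootstrap
    spec:
      hostNetwork: true          # must run before pod networking exists
      tolerations:
      - key: node-role.kubernetes.io/master   # also land on the master
        effect: NoSchedule
      containers:
      - name: install-cni
        image: example.org/cni-installer:0.1  # placeholder image
EOF
```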
Edit by @mikedanese: please test patched debian amd64 kubeadm https://github.com/kubernetes/kubernetes/issues/43815#issuecomment-290616036 with fix
About this issue
- State: closed
- Created 7 years ago
- Reactions: 5
- Comments: 211 (116 by maintainers)
Commits related to this issue
- WIP: Initial support for 1.6 1.6 final is out, but there's still issues with kubeadm (https://github.com/kubernetes/kubernetes/issues/43815), this has a patched version just for testing. Still to do... — committed to kensimon/aws-quickstart by deleted user 7 years ago
- Merge pull request #43835 from mikedanese/kubeadm-fix Automatic merge from submit-queue don't wait for first kubelet to be ready and drop dummy deploy Per https://github.com/kubernetes/kubernetes/i... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Merge pull request #43837 from mikedanese/automated-cherry-pick-of-#43835-release-1.6 Automatic merge from submit-queue Automated cherry pick of #43835 release 1.6 Automated cherry pick of #43835 r... — committed to kubernetes/kubernetes by deleted user 7 years ago
- vagrant: update to patched version of kubeadm 1.6.0 Fix for: https://github.com/kubernetes/kubernetes/issues/43815 from: https://github.com/kensimon/aws-quickstart/commit/9ae07f8d9de29c6cbca4624a61e7... — committed to obnoxxx/gluster-kubernetes by obnoxxx 7 years ago
- ATTEMPT: vagrant: update to patched version of kubeadm 1.6.0 Contains fix for: https://github.com/kubernetes/kubernetes/issues/43815 Signed-off-by: Michael Adam <obnox@redhat.com> — committed to obnoxxx/gluster-kubernetes by obnoxxx 7 years ago
- Merge pull request #43837 from mikedanese/automated-cherry-pick-of-#43835-release-1.6 Automatic merge from submit-queue Automated cherry pick of #43835 release 1.6 Automated cherry pick of #43835 r... — committed to mintzhao/kubernetes by deleted user 7 years ago
I’m trying to install kubernetes with kubeadm on Ubuntu 16.04. Is there a quick fix for this?
This is what I did:
kubeadm reset
Remove the ENV entries from:
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Reload systemd and the kube services:
systemctl daemon-reload
systemctl restart kubelet.service
Re-run init:
kubeadm init
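A scripted version of the same dance, as a sketch (this removes only $KUBELET_NETWORK_ARGS, the entry identified later in this thread; adjust if you need to drop others):

```bash
kubeadm reset
# strip the network args from the kubelet drop-in
sed -i 's/\$KUBELET_NETWORK_ARGS //' \
    /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
systemctl restart kubelet.service
kubeadm init
```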
“broken out of the box”, words learned today.
I am really surprised that the Kubernetes development community has not provided any ETA for an official fix. This is a horrible bug that should have been easily caught during testing. Since it wasn't, at the very least 1.6.1 should be pushed ASAP with the fix so people can stop hacking their clusters and get back to doing productive things 😉. Am I wrong here?
Any chance of building the same fix for CentOS as well? Our gating system mostly uses CentOS as the Kubernetes cluster base. If I have a CentOS version, I can guarantee roughly 100 runs of kubeadm init a day as testing.
@apsinha Are you aware of this thread? It might be good to have some product folks following, as I think there will be some important takeaways for the future.
Off the top of my head:
No disrespect intended to all the great people that make Kubernetes what it is. I’m just hoping there are some “teachable moments” here moving forward, as this looks bad in terms of the public perception of Kubernetes being reliable/stable. (Granted kubeadm is alpha/beta, but it’s still got lots of visibility.)
@overip you need to edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS
remove $KUBELET_NETWORK_ARGS
and then restart the kubelet. After that, kubeadm init should work.
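A sketch of the result plus the restart sequence (the ExecStart line is the one quoted above, minus the network args):

```bash
# Edited ExecStart line in 10-kubeadm.conf, with $KUBELET_NETWORK_ARGS removed:
#   ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS \
#       $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS
systemctl daemon-reload
systemctl restart kubelet.service
kubeadm init
```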
I successfully setup my Kubernetes cluster on centos-release-7-3.1611.el7.centos.x86_64 by taking the following steps (I assume Docker is already installed):
All the above steps are a result of combining suggestions from various issues around Kubernetes-1.6.0, especially kubeadm.
Hope this will save your time.
Is there any timeline for when this fix will be ported to the CentOS repository?
+1 to resolve it today as lots of efforts are wasted on dealing with collateral from the workaround.
@srzjulio you need to update RBAC rules, we used these to get us going:
apiVersion: rbac.authorization.k8s.io/v1alpha1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
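For illustration, an equivalent one-liner (the system:authenticated subject is an assumption, not from this thread; note the warning below about how permissive such a binding is):

```bash
# Grants cluster-admin to every authenticated user; effectively disables RBAC.
kubectl create clusterrolebinding permissive-binding \
  --clusterrole=cluster-admin \
  --group=system:authenticated
```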
This doesn’t seem to be resolved (Ubuntu LTS, kubeadm 1.6.1).
First, I also experienced kubeadm hanging on "Created API client, waiting for the control plane to become ready" when using the --apiserver-advertise-address flag (the journal logs showed errors). If I don't provide this flag, kubeadm passes, but even then the kubelet fails on startup: it refuses to start properly, and I cannot connect to the cluster with kubectl in any way.
Guys, what is the status quo for the fix? Is it going to move to the stable repository anytime soon?
1.6.1 is out.
v1.6.1 is in the process of being released. It will be done by EOD.
Not sure if it is generally useful, but I have an ansible playbook that does all the steps for CentOS 7
https://github.com/sjenning/kubeadm-playbook
YMMV, but it at least documents the process. I also do a few things like switch docker to use json-file logging and overlay storage.
Might be useful as a reference even if you don’t actually run the playbook.
I imagine your weave isn’t deploying properly because you are using the pre-1.6 yaml file.
Try “kubectl apply -f https://git.io/weave-kube-1.6”
On a side note, you can put back $KUBELET_NETWORK_ARGS after the init on the master passes. I actually did not remove it on the machine I joined, only the cgroup-driver setting; otherwise kubelet and Docker won't work together.
But you don’t have to kubeadm reset, just change /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and do the systemctl dance:
systemctl daemon-reload
systemctl restart kubelet.service
We worked around it by removing $KUBELET_NETWORK_ARGS from the kubelet command line. After that, kubeadm init worked fine and we were able to install the Canal CNI plugin.
@jbeda if you have a patched version happy to test it…
Be careful – The binding that @sbezverk has there is essentially turning off RBAC. You will have a super insecure cluster if you do that.
Here's what is seemingly working for me with the unstable repo (only tested the master itself). It does spit out error: taint "dedicated:" not found at one point, but it seems to carry on regardless.
For anyone still trying the temporary fix of removing the kubelet KUBELET_NETWORK_ARGS config line, @jc1arke found a simpler workaround: open two sessions to the new master and, while waiting for the first node to become ready, apply a node-network config in the second session. First session: run kubeadm init. Second session: apply the network config (using Calico here; your choice may of course vary). Then back to the first session to watch init complete.
I suggest that we drop both the node ready and the dummy deployment check altogether for 1.6 and move them to a validation phase for 1.7.
All correct, and while we're at it:
If you see this: kubelet: error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
you have to edit your /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, add the flag --cgroup-driver="systemd", and do as above:
kubeadm reset
systemctl daemon-reload
systemctl restart kubelet.service
kubeadm init
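One way to wire that flag in, as a sketch that follows the drop-in layout quoted earlier (the KUBELET_CGROUP_ARGS variable name is introduced here for illustration; it is not in the stock file):

```bash
# In /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, add:
#   Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
# and append $KUBELET_CGROUP_ARGS to the ExecStart line, then:
systemctl daemon-reload
systemctl restart kubelet.service
```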
TL;DR:
The "cni config uninitialized" error message is NOT necessarily bad.
That error message tells you that you have to plug in a third-party implementation of the CNI spec.
What is CNI and how does it integrate with Kubernetes?
CNI stands for Container Network Interface and defines a specification that the kubelet uses for creating a network for the cluster. See this page for more information on how Kubernetes uses the CNI spec to create a network for the cluster.
Kubernetes doesn't care how the network is created as long as it satisfies the CNI spec. The kubelet is in charge of connecting new Pods to the network (which can be an overlay network, for instance). The kubelet reads a configuration directory (often /etc/cni/net.d) for CNI networks to use. When a new Pod is created, the kubelet reads the files in the configuration directory and exec's out to the CNI binary specified in the config file (the binary is often in /opt/cni/bin). The binary that will be executed belongs to, and is installed by, a third party (like Weave, Flannel, Calico, etc.).
kubeadm is a generic tool to spin up Kubernetes clusters; it does not know what networking solution you want and doesn't favor any specific provider. After kubeadm init is run, there is no such CNI binary or configuration, so kubeadm init alone IS NOT ENOUGH to get a fully working cluster up and running. This means that after kubeadm init, the kubelet logs will say that the CNI config is uninitialized, and that is very much expected. If this weren't the case, we would have favored a specific network provider.
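For a concrete picture of what a provider drops into that directory, here is a minimal hand-rolled example using the reference bridge plugin (the file name, bridge name, subnet, and CNI version are made up for illustration; real providers ship their own):

```bash
cat <<'EOF' > /etc/cni/net.d/10-mynet.conf
{
  "cniVersion": "0.3.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/16"
  }
}
EOF
```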
So how do I "fix" this error? The next step in the kubeadm getting-started guide is "Installing a Pod network": kubectl apply a manifest from your preferred CNI network provider. Its DaemonSet will copy the needed CNI binaries out to /opt/cni/bin and the needed configuration to /etc/cni/net.d/. It will also run the actual daemon that sets up the network between the Nodes (by writing iptables rules, for instance).
After the CNI provider is installed, the kubelet will notice that "oh, I have some information about how to set up the network" and will use the third-party configuration and binaries. And when the network has been set up by the third-party provider (invoked by the kubelet), the Node will mark itself Ready.
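Concretely, with the Weave Net manifest mentioned elsewhere in this thread, that whole step is:

```bash
kubectl apply -f https://git.io/weave-kube-1.6
# Watch the node flip to Ready once the CNI provider is up:
kubectl get nodes -w
```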
How is this issue related to kubeadm?
Late in the v1.6 cycle, a PR was merged that changed the way the kubelet reported its Ready/NotReady status. In earlier releases, the kubelet had always reported Ready, regardless of whether the CNI network was set up or not. This was actually kind of wrong, and it was changed to respect the CNI network status: NotReady when CNI was uninitialized, Ready when initialized.
kubeadm in v1.6.0 wrongly waited for the master node to be in the Ready state before proceeding with the rest of the kubeadm init tasks. When the kubelet behavior changed to report NotReady while CNI was uninitialized, kubeadm would wait forever for the Node to become Ready.
THAT WAIT MISCONCEPTION ON THE KUBEADM SIDE IS WHAT THIS ISSUE IS ABOUT
However, we quickly fixed the regression in v1.6.1 and released it some days after v1.6.0.
Please read the retrospective for more information about this, and why v1.6.0 could be released with this flaw.
So, what do you do if you think you see this issue in kubeadm v1.6.1+?
Well, I really think you don't. This issue is about kubeadm init deadlocking, and no users or maintainers have seen that in v1.6.1+. What you WILL see, though, is the "runtime network not ready / cni config uninitialized" message after every kubeadm init in all versions above v1.6, but that IS NOT BAD.
Anyway, please open a new issue if you see something unexpected with kubeadm.
Please do not comment more on this issue. Instead open a new one.
@billmilligan So you only have to kubectl apply a CNI provider's manifest to get your cluster up and running, I think.
I'm pretty much summarizing what has been said above, but hopefully in a clearer and more detailed way. If you have questions about how CNI works, please refer to the normal support channels like StackOverflow, an issue, or Slack.
(Lastly, sorry for that much bold text, but I felt like it was needed to get people’s attention.)
@drajen No, this affected only v1.6.0. It's expected that the kubelet doesn't find a network, since you haven't installed any. For example, just run kubectl apply -f https://git.io/weave-kube-1.6 to install Weave Net, and those problems will go away. You can install Flannel, Calico, Canal, or whatever CNI network you'd like instead.
Can anybody tell me how to build a patched version of kubeadm for RHEL (RPM)?
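A sketch, not the official RPM pipeline: build just the kubeadm binary from the fixed branch and drop it over the RPM-installed one (the output path can vary by build setup):

```bash
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
git checkout release-1.6
make WHAT=cmd/kubeadm
# output path may differ, e.g. _output/local/bin/linux/amd64/kubeadm
cp _output/local/bin/linux/amd64/kubeadm /usr/bin/kubeadm
```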
@coeki I'd also add a request for N-1 versions to be kept in the rpm/deb repos. All major releases eventually end up with a problem or two, and operators have long avoided N.0 releases in production for that very reason. That works well if previous versions are left around for a while, but this time 1.5.x was removed entirely before 1.6 was made stable. That blocks operators who weren't well prepared (local repo mirroring, etc.) from making forward progress while the issue is sorted out. The pain of a bumpy N+1 release can often be dealt with by simply keeping N around for a while.
@mikedanese Do you have any plan to update the CentOS yum repo? Or is it already deployed there?
Thanks to @luxas for wrestling my particular problem to the ground: https://github.com/kubernetes/kubeadm/issues/302
Yeah, it might be non-obvious and we're sorry for that, but we can't have a single provider's name in there either.
Chatted with @drajen on Slack and the issue was cgroup related, the kubelet was unhealthy and wasn’t able to create any Pods, hence the issue.
@bostone thanks. I'll downgrade to that version to see if I can get a working setup. On my system the latest is a weird 17.03.1.ce version (evidently the latest and greatest).
Ok, I did all the steps from scratch and it seems to be better. Here are the steps that worked for me thus far; I'm running as root on CentOS 7.
Add --cgroup-driver=systemd to 10-kubeadm.conf and save. At this point I can run kubectl get nodes and see my master node in the list. Repeat all the steps for each minion, except instead of kubeadm init run the kubeadm join --token a21234.c7abc2f82e2219fd 12.34.567.89:6443 command generated by kubeadm init. This step completes and I can see the master and minion nodes.
And now, the problem: the nodes never become Ready. Any suggestions?
@bostone maybe you're missing these steps after kubeadm init? You also need to follow step 3 described here. That seems related to the CNI config error you're getting.
@gtirloni with your suggestion I got to the end of kubeadm init, however any attempt to run kubectl produces this error: The connection to the server localhost:8080 was refused - did you specify the right host or port? I'm not sure where or how to change that, or what the right port is at this point.
@bostone you need to adjust the .spec here.
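The localhost:8080 message usually just means kubectl has no kubeconfig yet. A sketch of the standard post-init steps (paths are the kubeadm defaults of that era):

```bash
sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf
kubectl get nodes
```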
@obnoxxx try the tip of the release-1.6 branch.
https://storage.googleapis.com/kubernetes-release-dev/ci/v1.6.1-beta.0.12+018a96913f57f9/bin/linux/amd64/kubeadm
PLEASE TEST THE PATCHED DEBS
The kubernetes-xenial-unstable channel now has a patched build, 1.6.1-beta.0.5+d8a384c1c5e35d-00, that @pipejakob and I have been testing today. The nodes remain NotReady until a pod network is created (e.g. by applying weave/flannel configs). Conformance tests pass. PTAL.
cc @luxas @jbeda
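To opt in to that channel, a sketch of the apt setup (repo layout as it was in 2017; the pinned version is the build quoted above):

```bash
cat <<'EOF' > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial-unstable main
EOF
apt-get update
apt-get install -y kubeadm=1.6.1-beta.0.5+d8a384c1c5e35d-00
```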
So, if I understand correctly: since https://github.com/kubernetes/features/issues/166 is taking longer to be able to taint network availability correctly, we have to go with a workaround for now. If we can push a fix ASAP for kubeadm, like #43835, with a comment that it will be properly fixed by https://github.com/kubernetes/features/issues/166, a lot of people are going to be happy.
I can’t believe nobody really tried kubeadm 1.6.0 before 1.6.0 was released…
And kubelet 1.5.6 + kubeadm 1.5.6 are also broken: /etc/systemd/system/kubelet.service.d/10-kubeadm.conf references /etc/kubernetes/pki/ca.crt, but kubeadm doesn't generate ca.crt; there is a ca.pem, though.
Currently 1.6.0 and 1.5.6 are the only releases left in the k8s apt repository…
It looks like DaemonSets will still get scheduled even if the node is not ready. In this case, kubeadm is really just being a little too paranoid.
The current plan that we are going to test out is to have kubeadm no longer wait for the master node to be ready, but instead just have it be registered. That should be good enough to let a CNI DaemonSet be scheduled to set up CNI. @kensimon is testing this out.