kind: overlay network cannot be applied when host is behind a proxy

Environment

  • Host OS: RHEL 7.4
  • Host Docker version: 18.09.0
  • Host go version: go1.11.2
  • Node Image: kindest/node:v1.12.2

kind create cluster

[root@localhost bin]# kind create cluster
Creating cluster 'kind-1' ...
 ✓ Ensuring node image (kindest/node:v1.12.2) 🖼
 ✓ [kind-1-control-plane] Creating node container 📦
 ✓ [kind-1-control-plane] Fixing mounts 🗻
 ✓ [kind-1-control-plane] Starting systemd 🖥
 ✓ [kind-1-control-plane] Waiting for docker to be ready 🐋
 ✗ [kind-1-control-plane] Starting Kubernetes (this may take a minute) ☸
FATA[07:20:43] Failed to create cluster: failed to apply overlay network: exit status 1

The code below in pkg/cluster/context.go extracts the Kubernetes version via the kubectl version command in order to download the version-specific Weave Net manifest. This is the step that fails behind a proxy:

        // TODO(bentheelder): support other overlay networks
        if err = node.Command(
                "/bin/sh", "-c",
                `kubectl apply --kubeconfig=/etc/kubernetes/admin.conf -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version --kubeconfig=/etc/kubernetes/admin.conf | base64 | tr -d '\n')"`,
        ).Run(); err != nil {
                return kubeadmConfig, errors.Wrap(err, "failed to apply overlay network")
        }

Why is the output of the kubectl version command base64 encoded?
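
For anyone debugging this, the failing step can be re-run by hand from inside the node container (a sketch; it assumes the control-plane container is still running after the failure, uses the container name from the log above, and reuses the exact commands from the snippet):

docker exec -it kind-1-control-plane /bin/sh
# inside the node: this is the same apply command kind runs
kubectl apply --kubeconfig=/etc/kubernetes/admin.conf \
  -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version --kubeconfig=/etc/kubernetes/admin.conf | base64 | tr -d '\n')"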

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 53 (29 by maintainers)

Most upvoted comments

Some updates on this. I have the privilege of working with extremely bright people here, and the problem seems to lie in TLS negotiation (although not TLS 1.3): our proxy policy hasn’t been updated in a while, and none of the algorithms proposed by the Go TLS client is supported at the moment…

We’re working with network and security to update this policy, and I will keep you posted if that solves our problem!

I think this is good now… looking at the logs before sending them, I noticed the following:

I0225 07:49:12.803064     726 checks.go:430] validating if the connectivity type is via proxy or direct
	[WARNING HTTPProxy]: Connection to "https://172.17.0.3" uses proxy "http://127.0.0.1:3129/". If that is not intended, adjust your proxy settings
I0225 07:49:12.803104     726 checks.go:466] validating http connectivity to first IP address in the CIDR
	[WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://127.0.0.1:3129/". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration

And so I decided to give it a try by unsetting all my *_proxy env variables, and suddenly it worked! I can finally enjoy kind on my workstation at work.

Thanks a lot @pablochacin and @BenTheElder !
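
For reference, a minimal sketch of that workaround (the exact set of *_proxy variables defined in your shell may differ):

# clear the proxy variables for the current shell only, then retry
unset http_proxy https_proxy no_proxy HTTP_PROXY HTTPS_PROXY NO_PROXY
kind create cluster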

So our actual podspec is roughly the contents of the pod_spec field in this ProwJob (a few things get added for the git checkout, environment variables…):

apiVersion: prow.k8s.io/v1
kind: ProwJob
metadata:
  annotations:
    prow.k8s.io/job: ci-kubernetes-kind-conformance
  creationTimestamp: null
  labels:
    created-by-prow: "true"
    preset-bazel-remote-cache-enabled: "true"
    preset-bazel-scratch-dir: "true"
    preset-dind-enabled: "true"
    preset-service-account: "true"
    prow.k8s.io/id: bc7c7a72-2b06-11e9-8fd7-0a580a6c037c
    prow.k8s.io/job: ci-kubernetes-kind-conformance
    prow.k8s.io/type: periodic
  name: f8f7ed86-2b0d-11e9-bfc2-0a580a6c0297
spec:
  agent: kubernetes
  cluster: default
  job: ci-kubernetes-kind-conformance
  namespace: test-pods
  pod_spec:
    containers:
    - args:
      - --job=$(JOB_NAME)
      - --root=/go/src
      - --repo=k8s.io/kubernetes=master
      - --repo=sigs.k8s.io/kind=master
      - --service-account=/etc/service-account/service-account.json
      - --upload=gs://kubernetes-jenkins/logs
      - --scenario=execute
      - --
      - ./../../sigs.k8s.io/kind/hack/ci/e2e.sh
      env:
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /etc/service-account/service-account.json
      - name: E2E_GOOGLE_APPLICATION_CREDENTIALS
        value: /etc/service-account/service-account.json
      - name: TEST_TMPDIR
        value: /bazel-scratch/.cache/bazel
      - name: BAZEL_REMOTE_CACHE_ENABLED
        value: "true"
      - name: DOCKER_IN_DOCKER_ENABLED
        value: "true"
      image: gcr.io/k8s-testimages/kubekins-e2e:v20190205-d83780367-master
      name: ""
      resources:
        requests:
          cpu: "2"
          memory: 9000Mi
      securityContext:
        privileged: true
      volumeMounts:
      - mountPath: /lib/modules
        name: modules
        readOnly: true
      - mountPath: /sys/fs/cgroup
        name: cgroup
      - mountPath: /etc/service-account
        name: service
        readOnly: true
      - mountPath: /bazel-scratch/.cache
        name: bazel-scratch
      - mountPath: /docker-graph
        name: docker-graph
    dnsConfig:
      options:
      - name: ndots
        value: "1"
    volumes:
    - hostPath:
        path: /lib/modules
        type: Directory
      name: modules
    - hostPath:
        path: /sys/fs/cgroup
        type: Directory
      name: cgroup
    - name: service
      secret:
        secretName: service-account
    - emptyDir: {}
      name: bazel-scratch
    - emptyDir: {}
      name: docker-graph
  type: periodic
status:
  startTime: "2019-02-07T19:24:22Z"
  state: triggered

@BenTheElder Thanks for the hint! After investigating more, I have seen that kubelet was constantly being killed with SIGKILL (9). I checked dstat --top-oom and it showed that the whole control plane is constantly being OOM-killed by the system.

EDIT: Unfortunately, nothing changed after increasing the available resources. The control plane keeps getting restarted for no apparent reason. What might be important: when I test kind locally inside a docker container (docker run --privileged -it --rm ... sh) and run kind create cluster, it works, but when I try the same thing inside a Kubernetes cluster, exec’d into a pod, the same kind create cluster fails with the above error.

So docker itself supports HTTP_PROXY / HTTPS_PROXY (https://docs.docker.com/network/proxy/); we could just blindly pass these values through from the host at node creation time…

I opened issue #270 for implementing this.
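
Roughly, the pass-through idea looks like this (illustrative only; these are standard docker run flags, not kind’s actual node-creation command, and the image tag is the one from this issue):

# forward the host's proxy settings into the node container at creation time
docker run -d --privileged \
  -e HTTP_PROXY="$HTTP_PROXY" \
  -e HTTPS_PROXY="$HTTPS_PROXY" \
  -e NO_PROXY="$NO_PROXY" \
  kindest/node:v1.12.2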

Huh. I can’t spot anything relevant in there 🤔 the plot thickens 🙃

I think this week I’ll take a stab at pre-loading the CNI images and using a fixed manifest which should help avoid this sort of issue entirely 🤞

On Mon, Jan 14, 2019, 05:25 Matthias Loibl <notifications@github.com> wrote:

Pulling the latest master now fixed KinD for me again. I’m not entirely sure what happened. I can’t see any changes related to my problem. I’m on the same machine and the same WiFi as when I first reported this. Additionally, my machine was suspended most of the weekend and I didn’t run any updates during that time (like updating Docker, for example). 302bb7d…4a348e0 https://github.com/kubernetes-sigs/kind/compare/302bb7d...4a348e0

Ah, that’s almost definitely it!

kind does nothing special regarding proxies; the rest of the bringup only works because everything else (besides the overlay network config and its images) is pre-packed into the node image and doesn’t need to go out to the internet.

We can either try to get these packed into the image ahead of time (which is probably quite doable, and possibly desirable, but maybe a little tricky), or we can try to make this step respect proxy information on the host machine.

It looks like http_proxy and HTTPS_PROXY are mostly a convention that curl and a few other tools happen to follow to varying degrees; we’d probably also need to configure the docker daemon on the “nodes” to respect this.

Both approaches are probably worth doing. I’ll update this issue to track.
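
For the second part (making the docker daemon on the “nodes” honor the proxy), a sketch of the standard systemd drop-in approach, run inside a node container; the proxy address and NO_PROXY values below are illustrative assumptions taken from the logs in this thread:

# create a systemd drop-in so dockerd inside the node picks up the proxy
mkdir -p /etc/systemd/system/docker.service.d
cat >/etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://172.17.0.1:3129/"
Environment="HTTPS_PROXY=http://172.17.0.1:3129/"
Environment="NO_PROXY=localhost,127.0.0.1,10.96.0.0/12"
EOF
systemctl daemon-reload
systemctl restart docker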

I will test on Monday since I don’t have our corporate proxy at home… thanks for the update!

the next release will contain this fix, but in the meantime it can be installed from the current source 😬
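
For example (a sketch, assuming a working Go toolchain with $GOPATH/bin on PATH; the exact command may vary with your Go version):

# build and install kind from the current source
go get sigs.k8s.io/kind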

Yes, this works:

$ docker run -ti --rm ubuntu bash
root@29cd8a005505:/# export http_proxy=http://172.17.0.1:3129/
root@29cd8a005505:/# apt-get update
Get:1 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
Get:2 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]      
Get:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:5 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [339 kB]
Get:6 http://archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages [186 kB]
Get:7 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [152 kB]
Get:8 http://archive.ubuntu.com/ubuntu bionic/universe amd64 Packages [11.3 MB]
Get:9 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [3451 B]
Get:10 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages [1344 kB]   
Get:11 http://archive.ubuntu.com/ubuntu bionic/restricted amd64 Packages [13.5 kB]
Get:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [679 kB]
Get:13 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [6955 B]
Get:14 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [10.7 kB]
Get:15 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [932 kB]
Get:16 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [3650 B]
Fetched 15.5 MB in 1s (10.8 MB/s)                          
Reading package lists... Done

However, I have reached out to IT, and it seems our corporate proxy (which requires a local cntlm for AD authentication) uses an old protocol for the man-in-the-middle… and for this reason we cannot upgrade our Docker past 18.06.1-ce. Do you think we could be hitting the same issue here?

@matthyx from the log I see that the proxy has been set to http://127.0.0.1:3129/ This is localhost on the host machine, but inside the kind node container this address is the container’s own loopback (not the host’s loopback). Therefore, you should set your proxy to an address which is reachable from the kind node container.
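
In practice that can mean pointing the proxy variables at an address reachable from inside containers, e.g. the docker0 bridge gateway used in the apt-get test above, instead of 127.0.0.1 (a sketch; the address and port are specific to that setup, and it assumes kind forwards the host’s proxy variables to the nodes as discussed earlier):

export http_proxy=http://172.17.0.1:3129/
export https_proxy=http://172.17.0.1:3129/
kind create cluster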

@floreks hmm, took a quick peek, nothing leapt out 😞 We do run kind extensively inside a docker-in-docker setup (not the standard image though) for k8s CI.

We have seen kubelet continually evicting the API server in a few cases due to low disk / memory, but I didn’t see that in the logs.

@endzyme I suspect some variant on #270 may help. I am also further exploring #200.

I was running the dind container docker:18.09-dind in Kubernetes. After I changed the image to docker:18.09.1-dind, the issue got resolved.

I wonder what was fixed.

Upgraded docker from 18.09 to 18.09.1 and the problem went away 🎉.