rke: Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

I have tried the workaround from rancher/rke#1295 and it didn’t work for me; it produces the same error.

I’m not sure #19189 is related, since this is a new cluster rather than an upgrade of an existing one.

This is my first time using rke and setting up a k8s cluster, so please let me know if I’m missing something obvious or if you need more information from me!

RKE version: rke version v0.2.4

Docker version: (docker version, docker info preferred)

Same for both node 1 and node 2.

Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        2d0083d
 Built:             Thu Jun 27 17:56:23 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       2d0083d
  Built:            Thu Jun 27 17:23:02 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Containers: 10
 Running: 7
 Paused: 0
 Stopped: 3
Images: 4
Server Version: 18.09.7
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-54-generic
Operating System: Ubuntu 18.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.789GiB
Name: k8s-node1
ID: ZC43:K3I7:HP2S:LFUA:JXMF:EXV2:V7UJ:H7QN:27IJ:S3DC:6XYW:CS2P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

Same for both node 1 and node 2.

NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
4.15.0-54-generic

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)

VM in Hyper-V

cluster.yml file:

# If you intend to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: 10.0.1.74
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  - worker
  hostname_override: k8s-node1
  user: k8s
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
- address: 10.0.1.75
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  - worker
  hostname_override: k8s-node2
  user: k8s
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 172.24.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 172.25.0.0/24
    service_cluster_ip_range: 172.24.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: k8s.adminarsenal.net
    infra_container_image: ""
    cluster_dns_server: 172.24.0.10
    fail_swap_on: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: flannel
  options:
    flannel_backend_type: vxlan
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.3.10-rancher1
  alpine: rancher/rke-tools:v0.1.34
  nginx_proxy: rancher/rke-tools:v0.1.34
  cert_downloader: rancher/rke-tools:v0.1.34
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.34
  kubedns: rancher/k8s-dns-kube-dns:1.15.0
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.0
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.0
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.3.0
  coredns: rancher/coredns-coredns:1.3.1
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.3.0
  kubernetes: rancher/hyperkube:v1.14.3-rancher1
  flannel: rancher/coreos-flannel:v0.10.0-rancher1
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher1
  calico_node: rancher/calico-node:v3.4.0
  calico_cni: rancher/calico-cni:v3.4.0
  calico_controllers: ""
  calico_ctl: rancher/calico-ctl:v2.0.0
  canal_node: rancher/calico-node:v3.4.0
  canal_cni: rancher/calico-cni:v3.4.0
  canal_flannel: rancher/coreos-flannel:v0.10.0
  weave_node: weaveworks/weave-kube:2.5.0
  weave_cni: weaveworks/weave-npc:2.5.0
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:0.21.0-rancher3
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.1
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: ""
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 30
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
restore:
  restore: false
  snapshot_name: ""
dns: null

Steps to Reproduce:

Save the cluster.yml config above and then run ./rke up

Results:

INFO[0075] [remove/rke-log-cleaner] Successfully removed container on host [10.0.1.74]
INFO[0075] [remove/rke-log-cleaner] Successfully removed container on host [10.0.1.75]
INFO[0075] [sync] Syncing nodes Labels and Taints
INFO[0075] [sync] Successfully synced nodes Labels and Taints
INFO[0075] [network] Setting up network plugin: flannel
INFO[0075] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0075] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0075] [addons] Executing deploy job rke-network-plugin
FATA[0105] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
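
For anyone debugging the same failure, a minimal first step (a sketch, assuming the default file names: RKE writes a kubeconfig named kube_config_cluster.yml next to cluster.yml) is to inspect the deploy job and its pod logs directly:

# Describe the stuck addon deploy job
kubectl --kubeconfig kube_config_cluster.yml -n kube-system describe job rke-network-plugin-deploy-job

# Jobs label their pods with job-name, so the pod logs can be pulled with:
kubectl --kubeconfig kube_config_cluster.yml -n kube-system logs -l job-name=rke-network-plugin-deploy-job

The describe output shows why the job’s pod is not completing (scheduling problems, image pull errors, crash loops), which narrows down which of the fixes in the comments below applies.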

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 16

Most upvoted comments

Same here on Vagrant CentOS 1905.1 (hello, @lucky-sideburn). Disabling SELinux is the key.
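
For reference, a quick way to try this on CentOS/RHEL (standard SELinux commands, nothing RKE-specific) is to switch to permissive mode and persist the change only if it helps:

# Switch SELinux to permissive mode for the current boot
sudo setenforce 0

# Confirm the current mode
getenforce

# Persist across reboots (assumes the config currently says SELINUX=enforcing)
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config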

I had the same issue, and these two steps solved it for me:

  1. Increase addon_job_timeout
  2. Check the nodes’ free disk space (keep at least 15% free)

In my case, one of the nodes was in a DiskPressure state; a sketch of both steps follows.
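
A minimal sketch of both steps (the timeout value here is illustrative; the default in the cluster.yml above is 30 seconds):

# In cluster.yml, give the addon deploy job more time to complete, e.g.:
#   addon_job_timeout: 120

# Check free space where Docker stores its data on each node
df -h /var/lib/docker

# Check node conditions for DiskPressure via the kubeconfig RKE generated
kubectl --kubeconfig kube_config_cluster.yml describe nodes | grep DiskPressure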

As @aijanai said, there can be firewall problems, so I opened these two ports, which fixed the issue:

sudo ufw allow 6443
sudo ufw allow 10250
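
Depending on the setup, more ports than these two may need to be open between the hosts; the following list is from memory, so double-check the RKE port requirements documentation for your version:

sudo ufw allow 2379/tcp   # etcd client requests
sudo ufw allow 2380/tcp   # etcd peer communication
sudo ufw allow 8472/udp   # flannel VXLAN overlay traffic
sudo ufw allow 10254/tcp  # nginx ingress controller health checks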