rke: RBAC authorization issue (403)

RKE version: rke version v0.1.6-rc1

Docker version: Docker version 17.03.2-ce, build f5ec1e2

Operating system and kernel: Ubuntu 16.04.4 LTS (Xenial Xerus), kernel 4.13.0-1012-azure

Type/provider of hosts: Azure

cluster.yml file:

# If you intend to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: "40.121.217.69"
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: ""
  user: ubuntu
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  labels: {}
- address: "52.226.17.133"
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: ubuntu
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  labels: {}
services:
  etcd:
    image: rancher/coreos-etcd:v3.0.17
  kube-api:
    image: rancher/hyperkube:v1.8.10
    service_cluster_ip_range: 10.43.0.0/16
    pod_security_policy: false
  kube-controller:
    image: rancher/hyperkube:v1.8.10
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: rancher/hyperkube:v1.8.10
  kubelet:
    image: rancher/hyperkube:v1.8.10
    cluster_domain: cluster.local
    infra_container_image: rancher/pause-amd64:3.0
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
  kubeproxy:
    image: rancher/hyperkube:v1.8.10
network:
  plugin: canal
authentication:
  strategy: x509
ssh_key_path: ~/.ssh/id_rsa
ssh_agent_auth: false
authorization:
  mode: rbac
ignore_docker_version: false

Steps to Reproduce: Run rke up

Results:

➜  Desktop ./rke_darwin-amd64 up
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [40.121.217.69]
INFO[0002] [dialer] Setup tunnel for host [52.226.17.133]
INFO[0004] [state] Found local kube config file, trying to get state from cluster
INFO[0004] [state] Fetching cluster state from Kubernetes
INFO[0034] Timed out waiting for kubernetes cluster to get state
INFO[0034] [network] Deploying port listener containers
INFO[0040] [network] Port listener containers deployed successfully
INFO[0040] [network] Running control plane -> etcd port checks
INFO[0042] [network] Successfully started [rke-port-checker] container on host [40.121.217.69]
INFO[0043] [network] Running control plane -> worker port checks
INFO[0044] [network] Successfully started [rke-port-checker] container on host [40.121.217.69]
INFO[0045] [network] Running workers -> control plane port checks
INFO[0046] [network] Successfully started [rke-port-checker] container on host [52.226.17.133]
INFO[0047] [network] Checking KubeAPI port Control Plane hosts
INFO[0047] [network] Removing port listener containers
INFO[0048] [remove/rke-etcd-port-listener] Successfully removed container on host [40.121.217.69]
INFO[0049] [remove/rke-cp-port-listener] Successfully removed container on host [40.121.217.69]
INFO[0050] [remove/rke-worker-port-listener] Successfully removed container on host [52.226.17.133]
INFO[0050] [network] Port listener containers removed successfully
INFO[0050] [certificates] Attempting to recover certificates from backup on host [40.121.217.69]
INFO[0052] [certificates] Successfully started [cert-fetcher] container on host [40.121.217.69]
INFO[0052] [certificates] No Certificate backup found on host [40.121.217.69]
INFO[0052] [certificates] Generating CA kubernetes certificates
INFO[0052] [certificates] Generating Kubernetes API server certificates
INFO[0052] [certificates] Generating Kube Controller certificates
INFO[0053] [certificates] Generating Kube Scheduler certificates
INFO[0053] [certificates] Generating Kube Proxy certificates
INFO[0054] [certificates] Generating Node certificate
INFO[0054] [certificates] Generating admin certificates and kubeconfig
INFO[0055] [certificates] Generating etcd-40.121.217.69 certificate and key
INFO[0055] [certificates] Temporarily saving certs to control host [40.121.217.69]
INFO[0057] [certificates] Saved certs to control host [40.121.217.69]
INFO[0057] [reconcile] Reconciling cluster state
INFO[0057] [reconcile] This is newly generated cluster
INFO[0057] [certificates] Deploying kubernetes certificates to Cluster nodes
INFO[0060] Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]
INFO[0060] [certificates] Successfully deployed kubernetes certificates to Cluster nodes
INFO[0060] Pre-pulling kubernetes images
INFO[0060] Kubernetes images pulled successfully
INFO[0060] [etcd] Building up etcd plane..
INFO[0060] [etcd] Pulling image [rancher/coreos-etcd:v3.0.17] on host [40.121.217.69]
INFO[0063] [etcd] Successfully pulled image [rancher/coreos-etcd:v3.0.17] on host [40.121.217.69]
INFO[0064] [etcd] Successfully started [etcd] container on host [40.121.217.69]
INFO[0066] [etcd] Successfully started [rke-log-linker] container on host [40.121.217.69]
INFO[0067] [remove/rke-log-linker] Successfully removed container on host [40.121.217.69]
INFO[0067] [etcd] Successfully started etcd plane..
INFO[0067] [controlplane] Building up Controller Plane..
INFO[0068] [sidekick] Sidekick container already created on host [40.121.217.69]
INFO[0070] [controlplane] Successfully updated [kube-apiserver] container on host [40.121.217.69]
INFO[0071] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [40.121.217.69]
FATA[0127] [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Service [kube-apiserver] is not healthy on host [40.121.217.69]. Response code: [403], response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/healthz\"","reason":"Forbidden","details":{},"code":403}
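
The probe that fails here can be reproduced by hand with curl. The certificate paths below are assumptions based on RKE's default layout under /etc/kubernetes/ssl on the control plane node, so adjust them to your setup:

# Run on the control plane host; paths are assumed, not confirmed
curl --cacert /etc/kubernetes/ssl/kube-ca.pem \
  --cert /etc/kubernetes/ssl/kube-apiserver.pem \
  --key /etc/kubernetes/ssl/kube-apiserver-key.pem \
  https://localhost:6443/healthz

Without --cert/--key the request is treated as system:anonymous and returns the same 403; with a valid client certificate the endpoint should answer ok.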

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 27 (10 by maintainers)

Most upvoted comments

Hit the same error with v0.1.6-rc3

INFO[0016] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.99.210] 
FATA[0067] [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Service [kube-apiserver] is not healthy on host [192.168.99.210]. Response code: [403], response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/healthz\"","reason":"Forbidden","details":{},"code":403}
#cluster.yml
nodes:
  - address: 192.168.99.210
    user: ubuntu
    ssh_key_path: <path to key>
    role: [controlplane,worker,etcd]

system_images:
  etcd: rancher/etcd:v3.0.17
  kubernetes: rancher/hyperkube:v1.10.1
  alpine: rancher/rke-tools:v0.1.4
  nginx_proxy: rancher/rke-tools:v0.1.4
  cert_downloader: rancher/rke-tools:v0.1.4
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.4
  kubedns: rancher/k8s-dns-kube-dns-amd64:1.14.5
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny-amd64:1.14.5
  kubedns_sidecar: rancher/k8s-dns-sidecar-amd64:1.14.5
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler-amd64:1.0.0
  flannel: rancher/coreos-flannel:v0.9.1
  flannel_cni: rancher/coreos-flannel-cni:v0.2.0

See the etcd container logs:

2018-04-27 05:16:39.477059 I | etcdmain: etcd Version: 3.0.17
......
2018-04-27 05:16:39.797472 I | etcdmain: serving client requests on 0.0.0.0:2379
2018-04-27 05:16:39.885383 I | v3rpc/grpc: Failed to dial [::]:2379: connection error: desc = "transport: remote error: bad certificate"; please retry.
2018-04-27 05:16:39.896081 I | v3rpc/grpc: Failed to dial [::]:2379: connection error: desc = "transport: remote error: bad certificate"; please retry.
2018-04-27 05:16:39.902119 I | v3rpc/grpc: Failed to dial [::]:2379: connection error: desc = "transport: remote error: bad certificate"; please retry.
2018-04-27 05:16:39.920072 I | v3rpc/grpc: Failed to dial [::]:2379: connection error: desc = "transport: remote error: bad certificate"; please retry.
2018-04-27 05:16:39.920354 I | v3rpc/grpc: Failed to dial [::]:2379: connection error: desc = "transport: remote error: bad certificate"; please retry.
2018-04-27 05:16:39.922229 I | v3rpc/grpc: Failed to dial [::]:2379: connection error: desc = "transport: remote error: bad certificate"; please retry.

The root cause is clock skew between this VM and my laptop: the certificates RKE generates on the laptop are not yet valid according to the VM's lagging clock, so TLS handshakes fail with "bad certificate". It can be solved by enabling ntpd time sync on the VM:

sudo apt-get install ntp   # installs and starts the ntpd daemon
sudo ntpq -p               # confirm the daemon is reaching its time peers
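
A quick way to confirm clock skew is the culprit is to compare the VM clock against the certificate's validity window. The certificate filename below is an assumption (RKE appears to name the etcd certificate after the node, e.g. kube-etcd-<ip>.pem under /etc/kubernetes/ssl), so substitute your own:

# Run on the affected node; the cert path is an assumed example
date -u
openssl x509 -noout -dates -in /etc/kubernetes/ssl/kube-etcd-<ip>.pem

If the notBefore timestamp from openssl is later than the date -u output, peers reject the certificate as not yet valid, which shows up as the "bad certificate" errors above.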