terraform-provider-rke: RC4 - Cannot provision cluster due to rke-network-plugin-deploy-job
So I just downloaded the new RC4 provider and attempted to deploy a new cluster with it. It is a single-node cluster, nothing fancy.
It runs through the "Still creating..." phase for about a minute and a half, and then errors out with the following:
time="2020-03-17T17:58:28Z" level=info msg="[network] Setting up network plugin: weave"
time="2020-03-17T17:58:28Z" level=info msg="[addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes"
time="2020-03-17T17:58:28Z" level=info msg="[addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes"
time="2020-03-17T17:58:28Z" level=info msg="[addons] Executing deploy job rke-network-plugin"
time="2020-03-17T17:58:28Z" level=debug msg="[k8s] waiting for job rke-network-plugin-deploy-job to complete.."
Failed running cluster err:Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
As the log states, I am using the weave CNI plugin, which I have always used and which has always worked great.
I apply some defaults to my Kubernetes components to harden the cluster itself; these worked in the provider version that allowed Kubernetes 1.15.3.
I am trying to use Kubernetes version v1.17.2-rancher1-2.
I did add debug = true and a log file path to the provider configuration, but the output doesn't add much in the way of obvious errors to troubleshoot.
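For context, the provider block looked roughly like this (a sketch; I believe debug and log_file are the relevant provider arguments, and the path is just illustrative):

provider "rke" {
  debug    = true
  log_file = "/var/log/rke-debug.log" # illustrative path
}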
Here are the contents of my rke.tf file:
# ---------------------------------------------------------------------
# RKE configuration
resource "rke_cluster" "cluster" {
  depends_on = [azurestack_public_ip.vmpip, azurestack_virtual_machine.vm]

  dynamic "nodes" {
    for_each = azurestack_public_ip.vmpip[*]
    iterator = nodes
    content {
      address = nodes.value.ip_address
      user    = "testuser"
      role    = ["controlplane", "etcd", "worker"]
      ssh_key = file("/opt/${var.deployment_name}/${var.deployment_name}")
    }
  }

  ignore_docker_version = true
  cluster_name          = "${var.deployment_name}-cluster"

  # Kubernetes version
  kubernetes_version = "v1.17.2-rancher1-2"

  private_registries {
    url = "myprivateregistry"
  }

  #########################################################
  # Network(CNI) - supported: flannel/calico/canal/weave
  #########################################################
  # There are several network plug-ins that work, but we default to canal
  network {
    plugin = "weave"
  }

  ingress {
    provider = "nginx"
    options = {
      proxy-buffer-size = "16k"
      http2             = "true"
    }
    extra_args = {
      default-ssl-certificate = "ingress-nginx/wildcard-ingress"
    }
  }

  services {
    kube_api {
      pod_security_policy = false
      extra_args = {
        anonymous-auth                = "false"
        admission-control-config-file = "/opt/kubernetes/admission.yaml"
        profiling                     = "false"
        service-account-lookup        = "true"
        audit-log-maxage              = "30"
        audit-log-maxbackup           = "10"
        audit-log-maxsize             = "100"
        audit-log-format              = "json"
        audit-policy-file             = "/opt/kubernetes/audit.yaml"
        audit-log-path                = "/var/log/kube-audit/audit-log.json"
        enable-admission-plugins      = "ServiceAccount,PodPreset,NamespaceLifecycle,LimitRanger,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds,AlwaysPullImages,SecurityContextDeny,PodSecurityPolicy,NodeRestriction,EventRateLimit"
        runtime-config                = "batch/v2alpha1,authentication.k8s.io/v1beta1=true,settings.k8s.io/v1alpha1=true"
        tls-cipher-suites             = "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256"
      }
      # Optionally, define extra binds for the api server
      extra_binds = [
        "/var/log/kube-audit:/var/log/kube-audit",
        "/opt/kubernetes:/opt/kubernetes"
      ]
    }

    scheduler {
      extra_args = {
        address = "127.0.0.1"
      }
    }

    kube_controller {
      extra_args = {
        profiling                   = "false"
        address                     = "127.0.0.1"
        terminated-pod-gc-threshold = "1000"
        feature-gates               = "RotateKubeletServerCertificate=true"
      }
    }

    kubelet {
      extra_args = {
        volume-plugin-dir = "/usr/libexec/kubernetes/kubelet-plugins/volume/exec"
        # protect-kernel-defaults = "true" # requires additional config
        streaming-connection-idle-timeout = "1800s"
        authorization-mode                = "Webhook"
        make-iptables-util-chains         = "true"
        event-qps                         = "0"
        anonymous-auth                    = "false"
        feature-gates                     = "RotateKubeletServerCertificate=true"
        tls-cipher-suites                 = "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256"
      }
      # Optionally define additional volume binds to a service
      extra_binds = [
        "/usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec",
      ]
    }
  }
}
About this issue
- State: closed
- Created 4 years ago
- Comments: 23 (10 by maintainers)
Yeah, the issue is definitely the debug argument. Deleting the debug argument or setting it to false addresses the issue. The problem is caused by rke's docker.pullImage function when debug is set (https://github.com/rancher/rke/blob/v1.0.4/docker/docker.go#L266). This breaks the rke execution on a non-TTY and leaves the process stuck.
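For anyone else hitting this, a minimal sketch of the workaround against the provider block reported above: drop the debug argument entirely or set it explicitly to false.

provider "rke" {
  # Omitting debug (or setting it to false) avoids the non-TTY hang
  # in rke's image-pull path described above.
  debug = false
}

With debug off, the rke-network-plugin-deploy-job should be able to complete and the apply proceed normally.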