consul-k8s: Add /quitquitquit endpoint to lifecycle sidecar/envoy so Jobs can self-terminate their sidecars
Overview of the Issue
When using a Job, the injected consul-connect-envoy-sidecar container keeps running forever, so the pod never completes even after the Job's own container has reached the Terminated state.
Reproduction Steps
- Create a cluster and install Consul Connect via Helm with this values override:

  ---
  global:
    enabled: true
  server:
    replicas: 1
    bootstrapExpect: 1
    connect: true
    storageClass: nfs-client
  client:
    grpc: true
  ui:
    enabled: true
  connectInject:
    enabled: true
    default: true
    centralConfig:
      enabled: true
      defaultProtocol: http

- Create a Job file, job.yml:

  ---
  apiVersion: batch/v1
  kind: Job
  metadata:
    name: pi
  spec:
    template:
      spec:
        containers:
        - name: pi
          image: perl
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        restartPolicy: Never
    backoffLimit: 4

- Apply it:

  kubectl apply -f job.yml

- Wait a little bit.

- Look at the Job's pod:

  kubectl describe po -ljob-name=pi
Name: pi-7znmn
Namespace: default
Priority: 0
Node: compute02-test-2/10.253.0.9
Start Time: Thu, 24 Oct 2019 06:39:05 +0000
Labels: controller-uid=a5b8429c-44ed-48ae-bb81-82ff12772ffe
job-name=pi
Annotations: consul.hashicorp.com/connect-inject-status: injected
consul.hashicorp.com/connect-service: pi
consul.hashicorp.com/connect-service-protocol: http
Status: Running
IP: 10.233.67.186
Controlled By: Job/pi
Init Containers:
consul-connect-inject-init:
Container ID: docker://a7ef1cf9a296912b66110691345e1872f86368caaab8b3a88e200f4a71b5b8eb
Image: consul:1.6.1
Image ID: docker-pullable://consul@sha256:94cdbd83f24ec406da2b5d300a112c14cf1091bed8d6abd49609e6fe3c23f181
Port: <none>
Host Port: <none>
Command:
/bin/sh
-ec
export CONSUL_HTTP_ADDR="${HOST_IP}:8500"
export CONSUL_GRPC_ADDR="${HOST_IP}:8502"
# Register the service. The HCL is stored in the volume so that
# the preStop hook can access it to deregister the service.
cat <<EOF >/consul/connect-inject/service.hcl
services {
id = "${POD_NAME}-pi-sidecar-proxy"
name = "pi-sidecar-proxy"
kind = "connect-proxy"
address = "${POD_IP}"
port = 20000
proxy {
destination_service_name = "pi"
destination_service_id = "pi"
}
checks {
name = "Proxy Public Listener"
tcp = "${POD_IP}:20000"
interval = "10s"
deregister_critical_service_after = "10m"
}
checks {
name = "Destination Alias"
alias_service = "pi"
}
}
services {
id = "${POD_NAME}-pi"
name = "pi"
address = "${POD_IP}"
port = 0
}
EOF
# Create the central config's service registration
cat <<EOF >/consul/connect-inject/central-config.hcl
kind = "service-defaults"
name = "pi"
protocol = "http"
EOF
/bin/consul config write -cas -modify-index 0 \
/consul/connect-inject/central-config.hcl || true
/bin/consul services register \
/consul/connect-inject/service.hcl
# Generate the envoy bootstrap code
/bin/consul connect envoy \
-proxy-id="${POD_NAME}-pi-sidecar-proxy" \
-bootstrap > /consul/connect-inject/envoy-bootstrap.yaml
# Copy the Consul binary
cp /bin/consul /consul/connect-inject/consul
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 24 Oct 2019 06:39:08 +0000
Finished: Thu, 24 Oct 2019 06:39:08 +0000
Ready: True
Restart Count: 0
Environment:
HOST_IP: (v1:status.hostIP)
POD_IP: (v1:status.podIP)
POD_NAME: pi-7znmn (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
Mounts:
/consul/connect-inject from consul-connect-inject-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-p4ntz (ro)
Containers:
pi:
Container ID: docker://f57e3a3abd0ef2cc35b563e65e047398eba8126e68ce6e6ddd4cfb7835d02733
Image: perl
Image ID: docker-pullable://perl@sha256:b3f356876d5615e91b808cbdcba0ff618a7ba0c167326bd013c15b2194db03c9
Port: <none>
Host Port: <none>
Command:
perl
-Mbignum=bpi
-wle
print bpi(2000)
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 24 Oct 2019 06:39:24 +0000
Finished: Thu, 24 Oct 2019 06:39:30 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-p4ntz (ro)
consul-connect-envoy-sidecar:
Container ID: docker://9d8ec302199ad5271196356140dd20b204e67c56f8036a6b51173a1395d81b77
Image: envoyproxy/envoy-alpine:v1.9.1
Image ID: docker-pullable://envoyproxy/envoy-alpine@sha256:04ed416733b49260db0a346565ab523c6d2a362cfd29a1ab23a926af77849ecb
Port: <none>
Host Port: <none>
Command:
envoy
--max-obj-name-len
256
--config-path
/consul/connect-inject/envoy-bootstrap.yaml
State: Running
Started: Thu, 24 Oct 2019 06:39:25 +0000
Ready: True
Restart Count: 0
Environment:
HOST_IP: (v1:status.hostIP)
Mounts:
/consul/connect-inject from consul-connect-inject-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-p4ntz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-p4ntz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-p4ntz
Optional: false
consul-connect-inject-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m44s default-scheduler Successfully assigned default/pi-7znmn to compute02-test-2
Normal Pulled 4m41s kubelet, compute02-test-2 Container image "consul:1.6.1" already present on machine
Normal Created 4m41s kubelet, compute02-test-2 Created container consul-connect-inject-init
Normal Started 4m41s kubelet, compute02-test-2 Started container consul-connect-inject-init
Normal Pulling 4m41s kubelet, compute02-test-2 Pulling image "perl"
Normal Pulled 4m25s kubelet, compute02-test-2 Successfully pulled image "perl"
Normal Created 4m25s kubelet, compute02-test-2 Created container pi
Normal Started 4m25s kubelet, compute02-test-2 Started container pi
Normal Pulled 4m25s kubelet, compute02-test-2 Container image "envoyproxy/envoy-alpine:v1.9.1" already present on machine
Normal Created 4m25s kubelet, compute02-test-2 Created container consul-connect-envoy-sidecar
Normal Started 4m24s kubelet, compute02-test-2 Started container consul-connect-envoy-sidecar
Consul info for both Client and Server
Client info
agent:
check_monitors = 0
check_ttls = 0
checks = 62
services = 62
build:
prerelease =
revision = 9be6dfc3
version = 1.6.1
consul:
acl = disabled
known_servers = 1
server = false
runtime:
arch = amd64
cpu_count = 32
goroutines = 590
max_procs = 32
os = linux
version = go1.12.1
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 2
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 5
members = 4
query_queue = 0
query_time = 1
Server info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 9be6dfc3
version = 1.6.1
consul:
acl = disabled
bootstrap = true
known_datacenters = 1
leader = true
leader_addr = 10.233.65.27:8300
server = true
raft:
applied_index = 10031
commit_index = 10031
fsm_pending = 0
last_contact = 0
last_log_index = 10031
last_log_term = 2
last_snapshot_index = 0
last_snapshot_term = 0
latest_configuration = [{Suffrage:Voter ID:53bfea2a-aa00-2ee2-13c5-c9b682e903f9 Address:10.233.65.27:8300}]
latest_configuration_index = 1
num_peers = 0
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 2
runtime:
arch = amd64
cpu_count = 32
goroutines = 345
max_procs = 32
os = linux
version = go1.12.1
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 2
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 5
members = 4
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
Operating system and Environment details
- OS: Debian 9
- Kubernetes: 1.15.3
- Helm: 2.14.3
- Installer: Kubespray
- CNI: cilium
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 20 (13 by maintainers)
We came across this issue and this is the (not ideal) way we’re dealing with it at the moment. Would definitely be curious what other ways there might be to solve the issue for sure!
We are sharing process namespaces and using the primary container to kill the sidecars before exit.
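For illustration, here is a minimal sketch of that workaround applied to the pi Job from the reproduction above. The shared process namespace, the shell wrapper, and the pkill call are assumptions about how one might do it, not the commenter's exact manifest; it also assumes pkill is available in the workload image and that the workload runs with enough privilege to signal the sidecar's process.

---
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      # Let all containers in the pod see each other's processes,
      # so the primary container can signal the injected Envoy sidecar.
      shareProcessNamespace: true
      containers:
      - name: pi
        image: perl
        command:
        - /bin/sh
        - -c
        # Run the workload, then terminate the sidecar's envoy process
        # so the pod (and therefore the Job) can complete.
        - perl -Mbignum=bpi -wle 'print bpi(2000)'; pkill -TERM envoy
      restartPolicy: Never
  backoffLimit: 4

One caveat of this approach: the wrapper shell should capture and re-emit the workload's exit status if the Job's failure handling matters.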
As part of consul-k8s releases 1.0.8 and 1.1.3 we added support for graceful shutdown in the proxy lifecycle. The next 1.2.x release that comes out will also have this feature enabled.
With this feature, you can call /graceful_shutdown on the proxy and it will terminate consul-dataplane. An example of how to use /graceful_shutdown can be seen in the Jobs example on our website. The most important piece is that you can curl the endpoint:
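As a rough sketch, assuming the proxy lifecycle endpoint is listening on its default port 20600, the call from the Job's primary container looks something like:

  # Hedged sketch: after the workload finishes, ask the sidecar to shut
  # down gracefully. Port 20600 is assumed to be the default proxy
  # lifecycle port; adjust it if the lifecycle port has been overridden.
  curl --max-time 2 -s -f -XPOST http://127.0.0.1:20600/graceful_shutdown

Once the endpoint returns, consul-dataplane exits and the pod can reach the Completed state.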
Hi, we actually don’t have the lifecycle sidecar anymore. There’s just the envoy sidecar so I don’t think this is required now.
Gotcha, yeah, this makes sense. Probably we should implement /quitquitquit in the lifecycle sidecar, and then that would also make the call to Envoy.
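For reference, Envoy itself already exposes /quitquitquit as a POST endpoint on its admin interface, so a sketch of what that call would ultimately boil down to, assuming the admin listener is still on the default 127.0.0.1:19000 used by the consul connect envoy bootstrap, is:

  # Hedged sketch: tell the Envoy sidecar to exit via its admin API.
  # 127.0.0.1:19000 is assumed to be the default admin bind address
  # from the `consul connect envoy -bootstrap` output.
  curl -s -f -XPOST http://127.0.0.1:19000/quitquitquit

A /quitquitquit endpoint on the lifecycle sidecar would essentially relay this call to Envoy, so the application would not have to reach into Envoy's admin port directly.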