rancher: Deployed pods missing workloadID label - no active endpoints - 503 error
Rancher versions:
rancher/server or rancher/rancher: 2.1.0
rancher/agent or rancher/rancher-agent: 2.1.0
Infrastructure Stack versions:
healthcheck:
ipsec:
network-services:
scheduler:
kubernetes (if applicable): 1.12.0
Docker version: (docker version, docker info preferred)
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Setup details: (single node rancher vs. HA rancher, internal DB vs. external DB)
Single node rancher.
Environment Template: (Cattle/Kubernetes/Swarm/Mesos)
Kubernetes
Steps to Reproduce:
(originally filed as https://github.com/kubernetes/kubernetes/issues/69563, but I now suspect that the missing label may be related to Rancher 2)
We deploy a new version of our app by changing the spec.template.spec.containers[0].image attribute of the Deployment YAML, as described in the documentation for Deployment controllers.
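For reference, that image change can be made with a single command; the tag here is just a placeholder for whatever our CI produces:
$ kubectl -n cms set image deployment/app app=registry.ourdomain.com:5000/namespace/app:<new-tag>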
The Deployment YAML looks like this:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "17"
    field.cattle.io/creatorId: user-5jgmc
  creationTimestamp: 2018-10-02T04:54:33Z
  generation: 45
  labels:
    workload.user.cattle.io/workloadselector: deployment-cms-app
  name: app
  namespace: cms
  resourceVersion: "175227"
  selfLink: /apis/apps/v1beta2/namespaces/cms/deployments/app
  uid: 40eeb300-c5ff-11e8-91dc-001b21dc82ba
spec:
  minReadySeconds: 5
  progressDeadlineSeconds: 60
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: deployment-cms-app
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        workload.cattle.io/state: '{"b3ZoLWJtLTE=":"c-gw6hx:m-190d5b1abdb1","b3ZoLWJtLTM=":"c-gw6hx:m-93ddef52ec17","b3ZoLWRiLTE=":"c-mjbqh:m-cfa61f40f7d7"}'
      creationTimestamp: null
      labels:
        workload.user.cattle.io/workloadselector: deployment-cms-app
    spec:
      affinity: {}
      containers:
      - env:
        - <redacted>
        image: registry.ourdomain.com:5000/namespace/app:31848589
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /robots.txt
            port: 80
            scheme: HTTP
          initialDelaySeconds: 2
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        name: app
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /robots.txt
            port: 80
            scheme: HTTP
          initialDelaySeconds: 2
          periodSeconds: 5
          successThreshold: 2
          timeoutSeconds: 5
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: false
        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
        volumeMounts:
        - <redacted>
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: registry-secret
      nodeName: ovh-app-1
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - <redacted>
status:
  availableReplicas: 3
  conditions:
  - lastTransitionTime: 2018-10-02T05:18:54Z
    lastUpdateTime: 2018-10-02T05:18:54Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2018-10-03T03:48:46Z
    lastUpdateTime: 2018-10-03T04:48:25Z
    message: ReplicaSet "app-5bf7dbc69f" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 45
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3
I can see the Deployments, ReplicaSets and Services as expected.
$ kubectl get deployment -n cms
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
app 3 3 3 3 18h
redis 1 1 1 1 19h
$ kubectl get replicaset -n cms
NAME DESIRED CURRENT READY AGE
app-5bf7dbc69f 3 3 3 9h
app-7dc677d665 0 0 0 17h
app-849cc7c58d 0 0 0 18h
app-dd6cf6698 0 0 0 17h
redis-66985bf6c 1 1 1 19h
$ kubectl get service -n cms
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
app ClusterIP None <none> 42/TCP 20h
ingress-e213c2b4c622329de7aa2c0c28dc37e5 ClusterIP 10.43.216.158 <none> 80/TCP 16s
redis ClusterIP None <none> 42/TCP 21h
$ kubectl get service -n cms ingress-e213c2b4c622329de7aa2c0c28dc37e5 -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    field.cattle.io/targetWorkloadIds: '["deployment:cms:app"]'
  creationTimestamp: 2018-10-09T08:39:00Z
  labels:
    cattle.io/creator: norman
  name: ingress-e213c2b4c622329de7aa2c0c28dc37e5
  namespace: cms
  ownerReferences:
  - apiVersion: v1beta1/extensions
    controller: true
    kind: Ingress
    name: cms
    uid: 6617bfe4-c63f-11e8-b01c-9e111c023110
  resourceVersion: "1479746"
  selfLink: /api/v1/namespaces/cms/services/ingress-e213c2b4c622329de7aa2c0c28dc37e5
  uid: c4d9594b-cb9e-11e8-a6e1-9e111c023110
spec:
  clusterIP: 10.43.216.158
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    workloadID_ingress-e213c2b4c622329de7aa2c0c28dc37e5: "true"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
$ kubectl get ingress -n cms cms -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    field.cattle.io/creatorId: user-5jgmc
    field.cattle.io/ingressState: '{"Y21zL2Ntcy9kZXYuY21zLmFnZW50ZGVzaWduLmNvLnVrLy8vODA=":"deployment:cms:app","Y21zLWRldi1sZQ==":"cms:cms-dev-le"}'
    field.cattle.io/publicEndpoints: '[{"addresses":["94.237.50.126"],"port":443,"protocol":"HTTPS","serviceName":"cms:ingress-e213c2b4c622329de7aa2c0c28dc37e5","ingressName":"cms:cms","hostname":"dev.cms.agentdesign.co.uk","path":"/","allNodes":true}]'
  creationTimestamp: 2018-10-02T12:33:43Z
  generation: 15
  name: cms
  namespace: cms
  resourceVersion: "1479727"
  selfLink: /apis/extensions/v1beta1/namespaces/cms/ingresses/cms
  uid: 6617bfe4-c63f-11e8-b01c-9e111c023110
spec:
  rules:
  - host: dev.cms.agentdesign.co.uk
    http:
      paths:
      - backend:
          serviceName: ingress-e213c2b4c622329de7aa2c0c28dc37e5
          servicePort: 80
        path: /
  tls:
  - hosts:
    - dev.cms.agentdesign.co.uk
    secretName: cms-dev-le
status:
  loadBalancer:
    ingress:
    - ip: 94.237.50.126
    - ip: 94.237.51.162
    - ip: 94.237.54.20
    - ip: 94.237.54.24
    - ip: 94.237.54.26
$ kubectl get endpoints -n cms ingress-e213c2b4c622329de7aa2c0c28dc37e5
NAME ENDPOINTS AGE
ingress-e213c2b4c622329de7aa2c0c28dc37e5 10.42.0.157:80,10.42.3.146:80,10.42.4.168:80 10m
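To pin down exactly when the endpoints disappear, the Endpoints object can be watched while the image change is applied (kubectl's -w flag streams updates):
$ kubectl get endpoints -n cms ingress-e213c2b4c622329de7aa2c0c28dc37e5 -w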
The Deployment scales down the ‘old’ ReplicaSet and scales up the ‘new’ one, and I can see this happening as expected.
$ kubectl get replicaset -n cms
NAME DESIRED CURRENT READY AGE
app-5bf7dbc69f 2 2 2 9h
app-7dc677d665 2 2 1 17h
app-849cc7c58d 0 0 0 18h
app-dd6cf6698 0 0 0 17h
redis-66985bf6c 1 1 1 19h
After a few seconds it’s fully scaled…
$ kubectl get replicaset -n cms
NAME DESIRED CURRENT READY AGE
app-5bf7dbc69f 0 0 0 9h
app-7dc677d665 3 3 3 17h
app-849cc7c58d 0 0 0 18h
app-dd6cf6698 0 0 0 17h
redis-66985bf6c 1 1 1 19h
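The rollout itself should also report success at this point; it can be checked with something like:
$ kubectl -n cms rollout status deployment/app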
However, at this point the site returns 503 Service Temporarily Unavailable (nginx/1.13.12). The nginx-ingress logs show why: the backend has been dropped from the generated nginx configuration…
W1009 07:01:57.954692 5 controller.go:769] Service "cms/ingress-e213c2b4c622329de7aa2c0c28dc37e5" does not have any active Endpoint.
I1009 07:01:57.955086 5 controller.go:169] Configuration changes detected, backend reload required.
I1009 07:01:57.955154 5 util.go:68] rlimit.max=1048576
I1009 07:01:57.955184 5 nginx.go:519] Maximum number of open file descriptors: 1047552
I1009 07:01:58.051641 5 nginx.go:626] NGINX configuration diff:
--- /etc/nginx/nginx.conf 2018-10-09 07:01:48.210045444 +0000
+++ /tmp/new-nginx-cfg780268064 2018-10-09 07:01:58.046093899 +0000
@@ -211,21 +211,12 @@
keepalive 32;
- server 10.42.3.138:80 max_fails=0 fail_timeout=0;
server 10.42.0.148:80 max_fails=0 fail_timeout=0;
+ server 10.42.3.138:80 max_fails=0 fail_timeout=0;
server 10.42.4.157:80 max_fails=0 fail_timeout=0;
}
- upstream cms-ingress-e213c2b4c622329de7aa2c0c28dc37e5-80 {
- least_conn;
-
- keepalive 32;
-
- server 10.42.4.167:80 max_fails=0 fail_timeout=0;
-
- }
-
upstream db-ingress-231cd0bcc1b631a6142a515c3a0858e8-80 {
least_conn;
@@ -657,7 +648,7 @@
port_in_redirect off;
- set $proxy_upstream_name "cms-ingress-e213c2b4c622329de7aa2c0c28dc37e5-80";
+ set $proxy_upstream_name "";
# enforce ssl on server side
if ($redirect_to_https) {
@@ -717,9 +708,8 @@
proxy_next_upstream error timeout;
proxy_next_upstream_tries 3;
- proxy_pass http://cms-ingress-e213c2b4c622329de7aa2c0c28dc37e5-80;
-
- proxy_redirect off;
+ # No endpoints available for the request
+ return 503;
}
I1009 07:01:58.117524 5 controller.go:179] Backend successfully reloaded.
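(The controller output above comes from the nginx-ingress controller pods; assuming the usual RKE placement, it can be tailed with something like the following, though the namespace and label may differ per install:)
$ kubectl -n ingress-nginx logs -l app=ingress-nginx --tail=100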
The reason for the ‘Service … does not have any active Endpoint’ warning is, according to the docs, that the ‘endpoints controller has [not] found the correct Pods for your Service’.
$ kubectl get endpoints -n cms ingress-e213c2b4c622329de7aa2c0c28dc37e5
NAME ENDPOINTS AGE
ingress-e213c2b4c622329de7aa2c0c28dc37e5 <none> 12m
The advice given assumes the spec.selector field of the Service is not matching the metadata.labels values on your Pods. The spec.selector is:
spec:
  selector:
    workloadID_ingress-e213c2b4c622329de7aa2c0c28dc37e5: "true"
The metadata.labels on the Pods are:
metadata:
  labels:
    pod-template-hash: "1693867259"
    workload.user.cattle.io/workloadselector: deployment-cms-app
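The mismatch can be confirmed directly by listing pods with each selector: the Service’s selector should match nothing, while the workload selector should return the three app pods.
$ kubectl get pods -n cms -l 'workloadID_ingress-e213c2b4c622329de7aa2c0c28dc37e5=true'
$ kubectl get pods -n cms -l 'workload.user.cattle.io/workloadselector=deployment-cms-app'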
So, the docs are correct. But I’m still not sure why the label is not being set correctly on the workload.
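As a temporary way of confirming the diagnosis (not a fix, and Rancher’s controllers may well reconcile it away), the missing label can be added to the pod template by hand, which triggers a new rollout and should let the endpoints controller match the new pods:
$ kubectl -n cms patch deployment app -p '{"spec":{"template":{"metadata":{"labels":{"workloadID_ingress-e213c2b4c622329de7aa2c0c28dc37e5":"true"}}}}}'
Because label maps are merged by the patch, the existing workload.user.cattle.io/workloadselector label is kept, so the Deployment’s own selector still matches its pods.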
What you expected to happen:
The endpoints controller to find active endpoints for the Service.
How to reproduce it (as minimally and precisely as possible):
The same issue occurs on both our production and development clusters and was still present after we rebuilt both as fresh clusters and migrated things across. I suspect it’s reproducible elsewhere.
Anything else we need to know?:
Not that I can think of currently.
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
Production - 3x Bare metal hosts with 64GB
Development - 5x Standard VMs with 8GB
- OS (e.g. from /etc/os-release):
VERSION="18.04.1 LTS (Bionic Beaver)"
- Kernel (e.g. uname -a):
Linux <hostnameredacted> 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
Clusters were deployed using Rancher2 (v2.0.8).
- Others:
N/A
Results:
The deployed service ends up with no active endpoints, causing nginx-ingress to return 503 Service Temporarily Unavailable.
Let’s keep this issue open
Hey guys. I’ve been kept from releasing and testing an important product for my company, and this is causing a big problem for my team. Could you please give me some advice on this issue?
Thanks
Hi @otreda
Can you provide the YAML for the ingress’s Service, please?