postgres-operator: The operator fails over every 30m when sidecar is specified
Following advice, I am cross-posting this from https://github.com/zalando/spilo/issues/536, since the fault seems to be with the operator and not Spilo.
- Which image of the operator are you using?
registry.opensource.zalan.do/acid/postgres-operator:v1.6.0
- Where do you run it - cloud or metal? Kubernetes or OpenShift? GCP
- Are you running Postgres Operator in production? yes
- Type of issue? bug report
For me it’s enough to define a sidecar like this, to trigger the behaviour:
sidecars:
  - name: exporter
    image: wrouesnel/postgres_exporter
    env:
      - name: "DATA_SOURCE_URI"
        value: app-analytics-db/analytics?sslmode=require
      - name: "DATA_SOURCE_USER"
        valueFrom:
          secretKeyRef:
            name: postgres.app-analytics-db.credentials
            key: username
      - name: "DATA_SOURCE_PASS"
        valueFrom:
          secretKeyRef:
            name: postgres.app-analytics-db.credentials
            key: password
      - name: PG_EXPORTER_WEB_LISTEN_ADDRESS
        value: ":9114"
      - name: PG_EXPORTER_CONSTANT_LABELS
        value: app=analytics-db,component=postgres
    ports:
      - name: http-prom
        containerPort: 9114
    resources:
      limits:
        cpu: 500m
        memory: 256M
      requests:
        cpu: 100m
        memory: 200M
This causes the cluster to fail over every thirty minutes.
Some logs:
time="2021-01-06T17:24:36Z" level=debug msg="metadata.annotation are different" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=debug msg="- zalando-postgres-operator-rolling-update-required: false" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=debug msg="+ zalando-postgres-operator-rolling-update-required: true" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=info msg="reason: new statefulset containers's exporter (index 1) ports do not match the current one" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=debug msg="updating statefulset" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=debug msg="patching statefulset annotations" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:37Z" level=debug msg="patching statefulset annotations" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:37Z" level=debug msg="calling Patroni API on a pod app/app-analytics-db-0 to set the following Postgres options: map[wal_level:logical]" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:37Z" level=debug msg="making PATCH http request: http://10.4.7.76:8008/config" cluster-name=app/app-analytics-db pkg=cluster worker=1
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (7 by maintainers)
Ok, if I add “protocol: TCP” to the manifest I have no more rolling update. Thank you.
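For reference, a minimal sketch of that workaround applied to the sidecar from the original report. Only the `protocol: TCP` line is new; the `env` and `resources` sections are left out for brevity and are assumed to stay as in that manifest.

```yaml
sidecars:
  - name: exporter
    image: wrouesnel/postgres_exporter
    ports:
      - name: http-prom
        containerPort: 9114
        # Stated explicitly so the port generated from the manifest matches
        # the one stored in the statefulset, which carries protocol: TCP.
        protocol: TCP
```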
I’ve hacked the operator to give me more details on why the ports are different and look what I’ve found:
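It seems that the TCP protocol is assigned by default by K8s, so the port stored in the statefulset has a protocol while the one generated from the manifest does not. If you add
protocol: TCP
in the manifest there will be no diff and rolling updates will stop.
Hello,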
I have the same problem (a sidecar container for a Prometheus exporter). The database is OK and the exporter is OK, but the operator performs a rolling update every 30 min, for the same reason: “reason: new statefulset containers’s metrics (index 1) ports do not match the current one”
The only workaround is to increase the resync_period.
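As a rough sketch, assuming the operator is configured through a ConfigMap, raising the period could look like the fragment below. The ConfigMap name `postgres-operator` and the value `2h` are assumptions for illustration, not taken from this setup. This only makes the rolling updates less frequent; it does not remove the port diff that triggers them.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator   # assumed name; use the ConfigMap your operator reads
data:
  # Period between consecutive sync runs; 2h is an illustrative value.
  resync_period: 2h
```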
Here is the log:
time="2021-01-26T12:47:15Z" level=debug msg="set statefulset's rolling update annotation to false: caller/reason from cache" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="set statefulset's rolling update annotation to true: caller/reason statefulset changes" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=info msg="statefulset testope/acid-testef is not in the desired state and needs to be updated" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- terminationMessagePath: /dev/termination-log," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- terminationMessagePolicy: File," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- containerPort: 9187," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- protocol: TCP" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ containerPort: 9187" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- terminationMessagePath: /dev/termination-log," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- terminationMessagePolicy: File," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- restartPolicy: Always," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- dnsPolicy: ClusterFirst," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- serviceAccount: postgres-pod," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- }," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- schedulerName: default-scheduler" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ }" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- kind: PersistentVolumeClaim," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- apiVersion: v1," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- status: {" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- phase: Pending" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- }" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ status: {}" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- }," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- revisionHistoryLimit: 10" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ }" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="metadata.annotation are different" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- zalando-postgres-operator-rolling-update-required: false" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ zalando-postgres-operator-rolling-update-required: true" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=info msg="reason: new statefulset containers's metrics (index 1) ports do not match the current one" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="updating statefulset" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="patching statefulset annotations" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="patching statefulset annotations" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="performing rolling update" cluster-name=testope/acid-testef pkg=cluster