postgres-operator: The operator fails over every 30m when sidecar is specified

On advice, I am cross-posting this from https://github.com/zalando/spilo/issues/536; the fault appears to lie with the operator rather than Spilo.

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.6.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? GCP
  • Are you running Postgres Operator in production? yes
  • Type of issue? bug report

For me it’s enough to define a sidecar like this, to trigger the behaviour:


  sidecars:
  - name: exporter
    image: wrouesnel/postgres_exporter

    env:
    - name: "DATA_SOURCE_URI"
      value: app-analytics-db/analytics?sslmode=require

    - name: "DATA_SOURCE_USER"
      valueFrom:
        secretKeyRef:
          name: postgres.app-analytics-db.credentials
          key: username

    - name: "DATA_SOURCE_PASS"
      valueFrom:
        secretKeyRef:
          name: postgres.app-analytics-db.credentials
          key: password

    - name: PG_EXPORTER_WEB_LISTEN_ADDRESS
      value: ":9114"

    - name: PG_EXPORTER_CONSTANT_LABELS
      value: app=analytics-db,component=postgres

    ports:
    - name: http-prom
      containerPort: 9114

    resources:
      limits:
        cpu: 500m
        memory: 256M
      requests:
        cpu: 100m
        memory: 200M

This causes the cluster to fail over every thirty minutes.

Some logs:

time="2021-01-06T17:24:36Z" level=debug msg="metadata.annotation are different" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=debug msg="-  zalando-postgres-operator-rolling-update-required: false" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=debug msg="+  zalando-postgres-operator-rolling-update-required: true" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=info msg="reason: new statefulset containers's exporter (index 1) ports do not match the current one" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=debug msg="updating statefulset" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:36Z" level=debug msg="patching statefulset annotations" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:37Z" level=debug msg="patching statefulset annotations" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:37Z" level=debug msg="calling Patroni API on a pod app/app-analytics-db-0 to set the following Postgres options: map[wal_level:logical]" cluster-name=app/app-analytics-db pkg=cluster worker=1
time="2021-01-06T17:24:37Z" level=debug msg="making PATCH http request: http://10.4.7.76:8008/config" cluster-name=app/app-analytics-db pkg=cluster worker=1

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

OK, if I add “protocol: TCP” to the manifest, the rolling updates stop. Thank you.

I’ve hacked the operator to give me more details on why the ports are different and look what I’ve found:

reason: new statefulset containers's postgres-exporter (index 1) ports do not match the current one: 
[]v1.ContainerPort{v1.ContainerPort{Name:\"metrics\", HostPort:0, ContainerPort:9187, Protocol:\"TCP\", HostIP:\"\"}} vs
[]v1.ContainerPort{v1.ContainerPort{Name:\"metrics\", HostPort:0, ContainerPort:9187, Protocol:\"\", HostIP:\"\"}}

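It seems that Kubernetes assigns TCP as the default protocol when none is given, so the stored statefulset never matches the one the operator generates from the manifest. If you add protocol: TCP to the port in the manifest, there is no diff and the rolling updates stop.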
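
As a minimal sketch of that workaround, the sidecar's ports section could spell out the protocol explicitly. The port name and number below are taken from the diff above; the rest of the sidecar definition is assumed to stay unchanged.

```yaml
    ports:
    - name: metrics
      containerPort: 9187
      # Stated explicitly so the generated statefulset matches what
      # Kubernetes stores after applying its TCP default.
      protocol: TCP
```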

Hello,

I have the same problem (a sidecar container for a Prometheus exporter). The database is OK and the exporter is OK, but the operator performs a rolling update every 30 minutes. Same reason: “reason: new statefulset containers’s metrics (index 1) ports do not match the current one”

The only workaround is to increase the resync_period.
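
For illustration, assuming the operator is configured through a ConfigMap, raising the period might look like the sketch below. The ConfigMap name and the 12h value are assumptions for this example, not values from this thread. This would only make the rolling updates less frequent, since the port diff is still detected on every sync.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator   # assumed name; use the ConfigMap your operator reads
data:
  # Period between consecutive sync requests; 12h is an arbitrary example value.
  resync_period: "12h"
```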

Here is the log:

time="2021-01-26T12:47:15Z" level=debug msg="set statefulset's rolling update annotation to false: caller/reason from cache" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="set statefulset's rolling update annotation to true: caller/reason statefulset changes" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=info msg="statefulset testope/acid-testef is not in the desired state and needs to be updated" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- terminationMessagePath: /dev/termination-log," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- terminationMessagePolicy: File," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- containerPort: 9187," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- protocol: TCP" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ containerPort: 9187" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- terminationMessagePath: /dev/termination-log," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- terminationMessagePolicy: File," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- restartPolicy: Always," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- dnsPolicy: ClusterFirst," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- serviceAccount: postgres-pod," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- }," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- schedulerName: default-scheduler" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ }" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- kind: PersistentVolumeClaim," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- apiVersion: v1," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- status: {" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- phase: Pending" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- }" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ status: {}" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- }," cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- revisionHistoryLimit: 10" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ }" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="metadata.annotation are different" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="- zalando-postgres-operator-rolling-update-required: false" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="+ zalando-postgres-operator-rolling-update-required: true" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=info msg="reason: new statefulset containers's metrics (index 1) ports do not match the current one" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="updating statefulset" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="patching statefulset annotations" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="patching statefulset annotations" cluster-name=testope/acid-testef pkg=cluster
time="2021-01-26T12:47:15Z" level=debug msg="performing rolling update" cluster-name=testope/acid-testef pkg=cluster