patroni: Kubernetes DCS read timed out

Hi, I’m trying to initialize a cluster with Kubernetes as the DCS for Patroni, but I get this error:

➜  patroni kubectl logs patroni-1           
decompressing spilo image...
2018-01-24 06:54:03,626 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2018-01-24 06:54:03,631 - bootstrapping - DEBUG - Starting new HTTP connection (1): 169.254.169.254
2018-01-24 06:54:05,636 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2018-01-24 06:54:05,637 - bootstrapping - INFO - No meta-data available for this provider
2018-01-24 06:54:05,637 - bootstrapping - INFO - Looks like your running local
2018-01-24 06:54:05,670 - bootstrapping - INFO - Configuring pgbouncer
2018-01-24 06:54:05,670 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2018-01-24 06:54:05,670 - bootstrapping - INFO - Configuring patroni
2018-01-24 06:54:05,684 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2018-01-24 06:54:05,684 - bootstrapping - INFO - Configuring bootstrap
2018-01-24 06:54:05,685 - bootstrapping - INFO - Configuring certificate
2018-01-24 06:54:05,685 - bootstrapping - INFO - Generating ssl certificate
2018-01-24 06:54:05,884 - bootstrapping - DEBUG - b"Generating a 2048 bit RSA private key\n............+++\n......................................................+++\nwriting new private key to '/home/postgres/server.key'\n-----\n"
2018-01-24 06:54:05,884 - bootstrapping - INFO - Configuring crontab
2018-01-24 06:54:05,885 - bootstrapping - INFO - Configuring wal-e
2018-01-24 06:54:05,885 - bootstrapping - INFO - Configuring pam-oauth2
2018-01-24 06:54:05,885 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2018-01-24 06:54:05,888 - bootstrapping - INFO - Configuring patronictl
2018-01-24 06:54:06,650 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/cron.conf" during parsing
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/patroni.conf" during parsing
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/pgq.conf" during parsing
2018-01-24 06:54:06,663 INFO RPC interface 'supervisor' initialized
2018-01-24 06:54:06,663 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-01-24 06:54:06,663 INFO supervisord started with pid 1
2018-01-24 06:54:07,669 INFO spawned: 'cron' with pid 24
2018-01-24 06:54:07,671 INFO spawned: 'patroni' with pid 25
2018-01-24 06:54:07,674 INFO spawned: 'pgq' with pid 26
2018-01-24 06:54:08,676 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:08,676 INFO success: patroni entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:08,677 INFO success: pgq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:12,576 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='10.233.0.1', port=443): Read timed out. (read timeout=3
.3333333333333335)",)': /api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
2018-01-24 06:54:12,576 WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='10.233.0.1', port=443): Read timed out. (read timeout=
3.3333333333333335)",)': /api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
2018-01-24 06:54:13,975 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:13,981 INFO: failed to acquire initialize lock
2018-01-24 06:54:25,519 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:25,525 INFO: failed to acquire initialize lock
2018-01-24 06:54:33,572 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:33,579 INFO: failed to acquire initialize lock
2018-01-24 06:54:43,672 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:43,679 INFO: failed to acquire initialize lock
2018-01-24 06:54:53,741 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:53,751 INFO: failed to acquire initialize lock
2018-01-24 06:55:03,798 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:55:03,807 INFO: failed to acquire initialize lock
2018-01-24 06:55:13,848 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:55:13,856 INFO: failed to acquire initialize lock
...

But when I shell into the pod, I don’t see any issue with the API server:

root@patroni-1:/home/postgres# KUBE_TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token)
root@patroni-1:/home/postgres# curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://10.233.0.1:443/api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
{
  "kind": "EndpointsList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/namespaces/default/endpoints",
    "resourceVersion": "10343553"
  },
  "items": [
    {
      "metadata": {
        "name": "patroni",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/endpoints/patroni",
        "uid": "5cf882de-ff82-11e7-9b4b-005056bb262b",
        "resourceVersion": "9831960",
        "creationTimestamp": "2018-01-22T14:41:42Z",
        "labels": {
          "app": "patroni",
          "application": "patroni",
          "cluster": "patroni",
          "release": "patroni"
        },
        "annotations": {
          "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Endpoints\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"patroni\",\"application\":\"patroni\",\"cluster\":\"patroni\",\"release\":\"patroni\"},\"name\":\"patroni\",\"namespace\":\"default\"},\"subsets\":[]}\n"
        }
      },
      "subsets": null
    }
  ]
}
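
One difference from the plain curl call is the client-side timeout: judging by the log above, Patroni is using a read timeout of roughly 3.3 s per attempt (which looks like retry_timeout / 3 with two retries), while my curl has no timeout at all. Repeating the same request with a comparable cap might be a fairer comparison (the --max-time value here is just my approximation of what Patroni uses):

curl -sSk --max-time 3 -H "Authorization: Bearer $KUBE_TOKEN" "https://10.233.0.1:443/api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni"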

I tried some other commands:

root@patroni-1:/home/postgres# patronictl -c postgres.yml reinit patroni
+---------+-----------+------+------+-------+-----------+
| Cluster | Member    | Host | Role | State | Lag in MB |
+---------+-----------+------+------+-------+-----------+
| patroni | patroni-0 | None |      |       |   unknown |
| patroni | patroni-1 | None |      |       |   unknown |
| patroni | patroni-2 | None |      |       |   unknown |
+---------+-----------+------+------+-------+-----------+
Which member do you want to reinitialize [patroni-2, patroni-0, patroni-1]? []: patroni-1
Are you sure you want to reinitialize members patroni-1? [y/N]: y
Traceback (most recent call last):
  File "/usr/local/bin/patronictl", line 11, in <module>
    sys.exit(ctl())
  File "/usr/lib/python3/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/patroni/ctl.py", line 530, in reinit
    r = request_patroni(member, 'post', 'reinitialize', body, auth_header(obj))
  File "/usr/local/lib/python3.5/dist-packages/patroni/ctl.py", line 141, in request_patroni
    data=json.dumps(content) if content else None, timeout=60)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 612, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 703, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'b''://b''/reinitialize'

No luck with them either.

Here is my manifest, adapted from https://github.com/unguiculus/charts/tree/feature/patroni/incubator/patroni:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: patroni 
  labels:
    app: patroni
    release: patroni 
    application: patroni
    cluster: patroni
spec:
  serviceName: patroni
  replicas: 3
  template:
    metadata:
      labels:
        app: patroni
        release: patroni
        application: patroni
        cluster: patroni
    spec:
      serviceAccountName: patroni-serviceaccount
      containers:
        - name: spilo
          image: registry.opensource.zalan.do/acid/spilo-10:latest
          imagePullPolicy: Always
          env:
            - name: DEBUG
              value: "true"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: DCS_ENABLE_KUBERNETES_API
              value: "true"
            - name: USE_ENDPOINTS
              value: "true"
            - name: PATRONI_KUBERNETES_USE_ENDPOINTS 
              value: "true"
            - name: PATRONI_USE_KUBERNETES
              value: "true"
            - name: PATRONI_KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: PATRONI_CONFIGURATION
              value: |
                postgresql:
                  bin_dir: /usr/lib/postgresql/10/bin
                kubernetes:
                  labels:
                    app: patroni
                    release: patroni
                    application: patroni
                    cluster: patroni
                  scope_label: cluster
            - name: SCOPE
              value: patroni
            - name: PGPASSWORD_SUPERUSER
              valueFrom:
                secretKeyRef:
                  name: patroni
                  key: password-superuser
            - name: PGPASSWORD_STANDBY
              valueFrom:
                secretKeyRef:
                  name: patroni
                  key: password-standby
            - name: PGROOT
              value: /home/postgres/pgdata
          ports:
            - containerPort: 8008
              name: patroni
              protocol: TCP
            - containerPort: 5432
              name: postgresql
              protocol: TCP
          volumeMounts:
            - name: pg-vol
              mountPath: /home/postgres/pgdata
            - mountPath: /etc/patroni
              name: patroni-config
              readOnly: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app: patroni
                  release: patroni
      volumes:
        - name: patroni-config
          secret:
            secretName: patroni
        - name: pg-vol
          hostPath:
            path: /pintapin/data/postgres
            type: Directory

There were some issues with RBAC, and I used the following Role to overcome them:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: patroni-role
  namespace: default
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - pods
      - secrets
      - namespaces
    verbs:
      - get
      - list
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - endpoints
    verbs:
      - get
      - list
      - watch
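
For completeness, the Role is bound to the StatefulSet’s service account with a RoleBinding roughly like the one below (I’m reconstructing it here; the binding name is my own):

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: patroni-rolebinding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: patroni-serviceaccount
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: patroni-role

To double-check what the service account is actually allowed to do against the API server, impersonation with kubectl auth can-i should work, e.g. kubectl auth can-i list endpoints -n default --as=system:serviceaccount:default:patroni-serviceaccount.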

And here is the generated Patroni config:

root@patroni-1:/home/postgres# cat postgres.yml 
bootstrap:
  dcs:
    loop_wait: 10
    maximum_lag_on_failover: 33554432
    postgresql:
      parameters:
        archive_mode: 'on'
        archive_timeout: 1800s
        autovacuum_analyze_scale_factor: 0.02
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.05
        checkpoint_completion_target: 0.9
        hot_standby: 'on'
        log_autovacuum_min_duration: 0
        log_checkpoints: 'on'
        log_connections: 'on'
        log_disconnections: 'on'
        log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h '
        log_lock_waits: 'on'
        log_min_duration_statement: 500
        log_statement: ddl
        log_temp_files: 0
        max_connections: 266
        max_replication_slots: 5
        max_wal_senders: 5
        tcp_keepalives_idle: 900
        tcp_keepalives_interval: 100
        track_functions: all
        wal_keep_segments: 8
        wal_level: hot_standby
        wal_log_hints: 'on'
      use_pg_rewind: true
      use_slots: true
    retry_timeout: 10
    ttl: 30
  initdb:
  - encoding: UTF8
  - locale: en_US.UTF-8
  - data-checksums
  post_init: /post_init.sh "zalandos"
kubernetes:
  labels:
    app: patroni
    application: patroni
    cluster: patroni
    release: patroni
  pod_ip: 10.233.87.10
  ports:
  - name: postgresql
    port: 5432
  role_label: spilo-role
  scope_label: cluster
  use_endpoints: true
postgresql:
  authentication:
    replication:
      password: '1234567890 '
      username: standby
    superuser:
      password: '1234567890 '
      username: postgres
  bin_dir: /usr/lib/postgresql/10/bin
  connect_address: 10.233.87.10:5432
  data_dir: /home/postgres/pgdata/pgroot/data
  listen: 0.0.0.0:5432
  name: patroni-1
  parameters:
    archive_command: /bin/true
    bg_mon.listen_address: 0.0.0.0
    extwlist.extensions: btree_gin,btree_gist,hstore,intarray,ltree,pgcrypto,pgq,pg_trgm,postgres_fdw,uuid-ossp,hypopg
    log_destination: csvlog
    log_directory: ../pg_log
    log_file_mode: '0644'
    log_filename: postgresql-%u.log
    log_rotation_age: 1d
    log_truncate_on_rotation: 'on'
    logging_collector: 'on'
    shared_buffers: 1995MB
    shared_preload_libraries: bg_mon,pg_stat_statements,pg_cron,set_user,pgextwlist
    ssl: 'on'
    ssl_cert_file: /home/postgres/server.crt
    ssl_key_file: /home/postgres/server.key
  pg_hba:
  - local   all             all                                   trust
  - hostssl all             +zalandos    127.0.0.1/32       pam
  - host    all             all                127.0.0.1/32       md5
  - hostssl all             +zalandos    ::1/128            pam
  - host    all             all                ::1/128            md5
  - hostssl replication     standby all                md5
  - hostnossl all           all                all                reject
  - hostssl all             +zalandos    all                pam
  - hostssl all             all                all                md5
  use_unix_socket: true
restapi:
  connect_address: 10.233.87.10:8008
  listen: 0.0.0.0:8008
scope: patroni

This happens on all pods. I’ve also tried running Patroni manually, but it jumps straight to the failed initialize lock and I don’t see the timeout any more:

root@patroni-1:/home/postgres# patroni postgres.yml 
2018-01-24 07:26:16,041 INFO: Lock owner: None; I am patroni-1
2018-01-24 07:26:16,058 INFO: failed to acquire initialize lock
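
If I understand the Kubernetes DCS correctly, the leader and initialize keys end up as annotations on Endpoints objects carrying the scope label, so listing them from outside the pod should show whether Patroni ever manages to write anything back:

kubectl get endpoints -l cluster=patroni -o yaml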

I also tried to mimic the normal startup, but I can’t do that on Kubernetes:

root@patroni-1:/home/postgres# /launch.sh 
ERROR: Supervisord is already running
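
Since supervisord is already running as PID 1 inside the container, I assume the closest thing to re-running the startup from inside the pod is going through supervisorctl rather than /launch.sh (assuming supervisorctl is present and picks up the default /etc/supervisor/supervisord.conf):

supervisorctl status
supervisorctl restart patroni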

The issue is the same on all pods. I also tried to dive into the code, but it was hard for me to follow the breadcrumbs. So what should I do now? Is there a misconfiguration somewhere?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (6 by maintainers)

Most upvoted comments

I have upgraded the Helm chart to use the Kubernetes DCS. It seems patronictl is not working anymore.

@k1-hedayati would you mind opening a PR adding that Role and RoleBinding to the k8s template? I must admit I’ve been struggling with this problem (failing to read pods, and silent endpoint updates) myself as well.