patroni: Kubernetes dcs read timed out
Hi, I’m trying to initialize a cluster with Kubernetes as the DCS for Patroni, but I get this error:
➜ patroni kubectl logs patroni-1
decompressing spilo image...
2018-01-24 06:54:03,626 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2018-01-24 06:54:03,631 - bootstrapping - DEBUG - Starting new HTTP connection (1): 169.254.169.254
2018-01-24 06:54:05,636 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2018-01-24 06:54:05,637 - bootstrapping - INFO - No meta-data available for this provider
2018-01-24 06:54:05,637 - bootstrapping - INFO - Looks like your running local
2018-01-24 06:54:05,670 - bootstrapping - INFO - Configuring pgbouncer
2018-01-24 06:54:05,670 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2018-01-24 06:54:05,670 - bootstrapping - INFO - Configuring patroni
2018-01-24 06:54:05,684 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2018-01-24 06:54:05,684 - bootstrapping - INFO - Configuring bootstrap
2018-01-24 06:54:05,685 - bootstrapping - INFO - Configuring certificate
2018-01-24 06:54:05,685 - bootstrapping - INFO - Generating ssl certificate
2018-01-24 06:54:05,884 - bootstrapping - DEBUG - b"Generating a 2048 bit RSA private key\n............+++\n......................................................+++\nwriting new private key to '/home/postgres/server.key'\n-----\n"
2018-01-24 06:54:05,884 - bootstrapping - INFO - Configuring crontab
2018-01-24 06:54:05,885 - bootstrapping - INFO - Configuring wal-e
2018-01-24 06:54:05,885 - bootstrapping - INFO - Configuring pam-oauth2
2018-01-24 06:54:05,885 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2018-01-24 06:54:05,888 - bootstrapping - INFO - Configuring patronictl
2018-01-24 06:54:06,650 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/cron.conf" during parsing
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/patroni.conf" during parsing
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/pgq.conf" during parsing
2018-01-24 06:54:06,663 INFO RPC interface 'supervisor' initialized
2018-01-24 06:54:06,663 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-01-24 06:54:06,663 INFO supervisord started with pid 1
2018-01-24 06:54:07,669 INFO spawned: 'cron' with pid 24
2018-01-24 06:54:07,671 INFO spawned: 'patroni' with pid 25
2018-01-24 06:54:07,674 INFO spawned: 'pgq' with pid 26
2018-01-24 06:54:08,676 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:08,676 INFO success: patroni entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:08,677 INFO success: pgq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:12,576 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='10.233.0.1', port=443): Read timed out. (read timeout=3.3333333333333335)",)': /api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
2018-01-24 06:54:12,576 WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='10.233.0.1', port=443): Read timed out. (read timeout=3.3333333333333335)",)': /api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
2018-01-24 06:54:13,975 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:13,981 INFO: failed to acquire initialize lock
2018-01-24 06:54:25,519 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:25,525 INFO: failed to acquire initialize lock
2018-01-24 06:54:33,572 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:33,579 INFO: failed to acquire initialize lock
2018-01-24 06:54:43,672 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:43,679 INFO: failed to acquire initialize lock
2018-01-24 06:54:53,741 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:53,751 INFO: failed to acquire initialize lock
2018-01-24 06:55:03,798 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:55:03,807 INFO: failed to acquire initialize lock
2018-01-24 06:55:13,848 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:55:13,856 INFO: failed to acquire initialize lock
...
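The read timeout in the warnings above, 3.3333333333333335 seconds, is exactly what Python produces for 10 / 3, and the generated config further down sets retry_timeout: 10. A minimal sketch of that arithmetic follows. That the client divides retry_timeout by 3 is an assumption inferred from these two numbers, not something confirmed in this thread, and `read_timeout_for` is a hypothetical helper name.

```python
# ASSUMPTION: the per-request read timeout is retry_timeout / 3.
# This is inferred from the log line and the config value, not from Patroni's source.
def read_timeout_for(retry_timeout: float) -> float:
    """Return the assumed per-request read timeout in seconds."""
    return retry_timeout / 3


# retry_timeout: 10 comes from the generated postgres.yml in this issue.
timeout = read_timeout_for(10)
print(timeout)
```

If that reading is right, the warning means a single API request took longer than a third of the retry budget, which may simply be a slow first request rather than the cause of the lock failures that follow.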
But when I open a shell in the pod, I don’t see any problem reaching the API server:
root@patroni-1:/home/postgres# KUBE_TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token)
root@patroni-1:/home/postgres# curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://10.233.0.1:443/api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
{
  "kind": "EndpointsList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/namespaces/default/endpoints",
    "resourceVersion": "10343553"
  },
  "items": [
    {
      "metadata": {
        "name": "patroni",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/endpoints/patroni",
        "uid": "5cf882de-ff82-11e7-9b4b-005056bb262b",
        "resourceVersion": "9831960",
        "creationTimestamp": "2018-01-22T14:41:42Z",
        "labels": {
          "app": "patroni",
          "application": "patroni",
          "cluster": "patroni",
          "release": "patroni"
        },
        "annotations": {
          "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Endpoints\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"patroni\",\"application\":\"patroni\",\"cluster\":\"patroni\",\"release\":\"patroni\"},\"name\":\"patroni\",\"namespace\":\"default\"},\"subsets\":[]}\n"
        }
      },
      "subsets": null
    }
  ]
}
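The query string in both the warning and the curl call is a URL-encoded label selector built from the four labels in the manifest. A small sketch using only the Python standard library reproduces it. `endpoints_path` is a hypothetical helper, and the label order is copied from the log line rather than derived from any documented behaviour.

```python
from urllib.parse import urlencode


def endpoints_path(namespace, labels):
    """Build the endpoints list path with a URL-encoded label selector."""
    # Join the labels as key=value pairs separated by commas.
    selector = ",".join("{0}={1}".format(key, value) for key, value in labels)
    query = urlencode({"labelSelector": selector})
    return "/api/v1/namespaces/{0}/endpoints?{1}".format(namespace, query)


# Label order copied from the request path shown in the log.
labels = [
    ("application", "patroni"),
    ("app", "patroni"),
    ("release", "patroni"),
    ("cluster", "patroni"),
]
path = endpoints_path("default", labels)
print(path)
```

A successful GET with this selector only shows that read access works. It says nothing about whether the service account may create or patch endpoints.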
I tried some other commands:
root@patroni-1:/home/postgres# patronictl -c postgres.yml reinit patroni
+---------+-----------+------+------+-------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+-----------+------+------+-------+-----------+
| patroni | patroni-0 | None | | | unknown |
| patroni | patroni-1 | None | | | unknown |
| patroni | patroni-2 | None | | | unknown |
+---------+-----------+------+------+-------+-----------+
Which member do you want to reinitialize [patroni-2, patroni-0, patroni-1]? []: patroni-1
Are you sure you want to reinitialize members patroni-1? [y/N]: y
Traceback (most recent call last):
  File "/usr/local/bin/patronictl", line 11, in <module>
    sys.exit(ctl())
  File "/usr/lib/python3/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/patroni/ctl.py", line 530, in reinit
    r = request_patroni(member, 'post', 'reinitialize', body, auth_header(obj))
  File "/usr/local/lib/python3.5/dist-packages/patroni/ctl.py", line 141, in request_patroni
    data=json.dumps(content) if content else None, timeout=60)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 612, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 703, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'b''://b''/reinitialize'
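The odd URL `b''://b''/reinitialize` in the last line is what Python 3 prints when empty bytes objects are formatted into a string. The sketch below reproduces that text under two assumptions: that patronictl builds the URL from the parsed scheme and host of a member's API address, and that this address was empty (the Host column above shows None for every member). `build_member_url` is a hypothetical stand-in, not patronictl's actual function.

```python
from urllib.parse import urlparse


def build_member_url(api_url, endpoint):
    """Format scheme and host of api_url into a REST URL (hypothetical helper)."""
    parts = urlparse(api_url)
    return "{0}://{1}/{2}".format(parts.scheme, parts.netloc, endpoint)


# A populated address; the host and port come from restapi.connect_address in
# the generated config, the "/patroni" path is an assumption for illustration.
good = build_member_url("http://10.233.87.10:8008/patroni", "reinitialize")

# ASSUMPTION: an empty bytes value stands in for the missing member address.
# urlparse() then returns empty bytes fields, and str.format() renders them as b''.
bad = build_member_url(b"", "reinitialize")

print(good)
print(bad)
```

Read this way, the traceback is a symptom rather than a separate bug: the members never published their API address, so patronictl had nothing to connect to.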
No luck with them either.
Here is my manifest, adapted from https://github.com/unguiculus/charts/tree/feature/patroni/incubator/patroni:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: patroni
  labels:
    app: patroni
    release: patroni
    application: patroni
    cluster: patroni
spec:
  serviceName: patroni
  replicas: 3
  template:
    metadata:
      labels:
        app: patroni
        release: patroni
        application: patroni
        cluster: patroni
    spec:
      serviceAccountName: patroni-serviceaccount
      containers:
      - name: spilo
        image: registry.opensource.zalan.do/acid/spilo-10:latest
        imagePullPolicy: Always
        env:
        - name: DEBUG
          value: "true"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: DCS_ENABLE_KUBERNETES_API
          value: "true"
        - name: USE_ENDPOINTS
          value: "true"
        - name: PATRONI_KUBERNETES_USE_ENDPOINTS
          value: "true"
        - name: PATRONI_USE_KUBERNETES
          value: "true"
        - name: PATRONI_KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: PATRONI_CONFIGURATION
          value: |
            postgresql:
              bin_dir: /usr/lib/postgresql/10/bin
            kubernetes:
              labels:
                app: patroni
                release: patroni
                application: patroni
                cluster: patroni
              scope_label: cluster
        - name: SCOPE
          value: patroni
        - name: PGPASSWORD_SUPERUSER
          valueFrom:
            secretKeyRef:
              name: patroni
              key: password-superuser
        - name: PGPASSWORD_STANDBY
          valueFrom:
            secretKeyRef:
              name: patroni
              key: password-standby
        - name: PGROOT
          value: /home/postgres/pgdata
        ports:
        - containerPort: 8008
          name: patroni
          protocol: TCP
        - containerPort: 5432
          name: postgresql
          protocol: TCP
        volumeMounts:
        - name: pg-vol
          mountPath: /home/postgres/pgdata
        - mountPath: /etc/patroni
          name: patroni-config
          readOnly: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                app: patroni
                release: patroni
      volumes:
      - name: patroni-config
        secret:
          secretName: patroni
      - name: pg-vol
        hostPath:
          path: /pintapin/data/postgres
          type: Directory
There were some issues with RBAC, and I used the following role to overcome them:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: patroni-role
  namespace: default
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - pods
  - secrets
  - namespaces
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - get
  - list
  - watch
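The role above grants only read verbs on endpoints and pods. One way to see what that leaves out is to evaluate its rules against the write operations a DCS client would need. In the sketch below the rules are copied from the role, while `ASSUMED_NEEDS` is an assumption about what Patroni requires when use_endpoints is true (creating and patching endpoints for locks, patching its own pod for member state); it is not taken from this thread. The matching logic is simplified and ignores resourceNames.

```python
# Rules copied from the Role shown in the issue.
ROLE_RULES = [
    {"apiGroups": [""],
     "resources": ["configmaps", "pods", "secrets", "namespaces"],
     "verbs": ["get", "list"]},
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["create"]},
    {"apiGroups": [""], "resources": ["endpoints"],
     "verbs": ["get", "list", "watch"]},
]

# ASSUMPTION: operations Patroni needs with use_endpoints enabled.
ASSUMED_NEEDS = [
    ("endpoints", "get"),
    ("endpoints", "create"),
    ("endpoints", "patch"),
    ("endpoints", "update"),
    ("pods", "get"),
    ("pods", "patch"),
]


def _matches(allowed, value):
    """True if value is listed explicitly or covered by the '*' wildcard."""
    return "*" in allowed or value in allowed


def is_allowed(rules, resource, verb, api_group=""):
    """Simplified RBAC check: any single rule must match group, resource and verb."""
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


missing = [need for need in ASSUMED_NEEDS if not is_allowed(ROLE_RULES, *need)]
print(missing)
```

If those verbs are indeed required, each write would be rejected with HTTP 403, which fits the related commit's note that a 403 means something is wrong with permissions.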
And here is the generated Patroni config:
root@patroni-1:/home/postgres# cat postgres.yml
bootstrap:
  dcs:
    loop_wait: 10
    maximum_lag_on_failover: 33554432
    postgresql:
      parameters:
        archive_mode: 'on'
        archive_timeout: 1800s
        autovacuum_analyze_scale_factor: 0.02
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.05
        checkpoint_completion_target: 0.9
        hot_standby: 'on'
        log_autovacuum_min_duration: 0
        log_checkpoints: 'on'
        log_connections: 'on'
        log_disconnections: 'on'
        log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h '
        log_lock_waits: 'on'
        log_min_duration_statement: 500
        log_statement: ddl
        log_temp_files: 0
        max_connections: 266
        max_replication_slots: 5
        max_wal_senders: 5
        tcp_keepalives_idle: 900
        tcp_keepalives_interval: 100
        track_functions: all
        wal_keep_segments: 8
        wal_level: hot_standby
        wal_log_hints: 'on'
      use_pg_rewind: true
      use_slots: true
    retry_timeout: 10
    ttl: 30
  initdb:
  initdb:
  - encoding: UTF8
  - locale: en_US.UTF-8
  - data-checksums
  post_init: /post_init.sh "zalandos"
kubernetes:
  labels:
    app: patroni
    application: patroni
    cluster: patroni
    release: patroni
  pod_ip: 10.233.87.10
  ports:
  - name: postgresql
    port: 5432
  role_label: spilo-role
  scope_label: cluster
  use_endpoints: true
postgresql:
  authentication:
    replication:
      password: '1234567890 '
      username: standby
    superuser:
      password: '1234567890 '
      username: postgres
  bin_dir: /usr/lib/postgresql/10/bin
  connect_address: 10.233.87.10:5432
  data_dir: /home/postgres/pgdata/pgroot/data
  listen: 0.0.0.0:5432
  name: patroni-1
  parameters:
    archive_command: /bin/true
    bg_mon.listen_address: 0.0.0.0
    extwlist.extensions: btree_gin,btree_gist,hstore,intarray,ltree,pgcrypto,pgq,pg_trgm,postgres_fdw,uuid-ossp,hypopg
    log_destination: csvlog
    log_directory: ../pg_log
    log_file_mode: '0644'
    log_filename: postgresql-%u.log
    log_rotation_age: 1d
    log_truncate_on_rotation: 'on'
    logging_collector: 'on'
    shared_buffers: 1995MB
    shared_preload_libraries: bg_mon,pg_stat_statements,pg_cron,set_user,pgextwlist
    ssl: 'on'
    ssl_cert_file: /home/postgres/server.crt
    ssl_key_file: /home/postgres/server.key
  pg_hba:
  - local all all trust
  - hostssl all +zalandos 127.0.0.1/32 pam
  - host all all 127.0.0.1/32 md5
  - hostssl all +zalandos ::1/128 pam
  - host all all ::1/128 md5
  - hostssl replication standby all md5
  - hostnossl all all all reject
  - hostssl all +zalandos all pam
  - hostssl all all all md5
  use_unix_socket: true
restapi:
  connect_address: 10.233.87.10:8008
  listen: 0.0.0.0:8008
scope: patroni
This happens on all pods. I’ve tried running Patroni manually, but it goes straight to the "failed to acquire initialize lock" message and I no longer see the timeout:
root@patroni-1:/home/postgres# patroni postgres.yml
2018-01-24 07:26:16,041 INFO: Lock owner: None; I am patroni-1
2018-01-24 07:26:16,058 INFO: failed to acquire initialize lock
I tried to mimic the startup, but I can’t do that on Kubernetes:
root@patroni-1:/home/postgres# /launch.sh
ERROR: Supervisord is already running
The issue is the same on all pods. I also tried to dive into the code, but it was hard for me to follow the breadcrumbs. So what should I do now? Is there a misconfiguration?
About this issue
- State: closed
- Created 6 years ago
- Comments: 18 (6 by maintainers)
Commits related to this issue
- Don't swallow silently all errors from k8s API Output exception trace to the logs when http status code == 403, something is wrong with permissions. when http status code == 409 -- such error could ... — committed to zalando/patroni by deleted user 6 years ago
- Don't swallow silently all errors from k8s API (#611) Output exception trace to the logs when http status code == 403, something is wrong with permissions. When http status code == 409 -- such err... — committed to zalando/patroni by CyberDem0n 6 years ago
I have upgraded Helm to use the Kubernetes DCS. It seems patronictl is not working anymore.
@k1-hedayati would you mind opening a PR that adds that role and rolebinding to the k8s template? I must admit I’ve been struggling with this problem (failed pod reads and silent endpoint updates) myself as well.