microk8s: etcd problem + snap refresh failure

My apiserver fell over all of a sudden after running pretty much flawlessly for about 2 months. It looks like etcd is the problem, possibly something to do with the recent upgrade to 3.4?

It’s a multi-node cluster on RPi 4B (Ubuntu 19.10, fully up to date). Inspection report attached.

From the apiserver logs:

Feb 26 15:15:46 io microk8s.daemon-apiserver[9125]: I0226 15:15:46.842986    9125 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://127.0.0.1:12379 0  <nil>}]
Feb 26 15:15:47 io microk8s.daemon-apiserver[9125]: W0226 15:15:47.777158    9125 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:12379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: read tcp 127.0.0.1:40640->127.0.0.1:12379: read: connection reset by peer". Reconnecting...
Feb 26 15:15:47 io microk8s.daemon-apiserver[9125]: W0226 15:15:47.777207    9125 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:12379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: read tcp 127.0.0.1:40644->127.0.0.1:12379: read: connection reset by peer". Reconnecting...
Feb 26 15:15:50 io microk8s.daemon-apiserver[9125]: W0226 15:15:50.234521    9125 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:12379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: read tcp 127.0.0.1:40650->127.0.0.1:12379: read: connection reset by peer". Reconnecting...
Feb 26 15:15:50 io microk8s.daemon-apiserver[9125]: W0226 15:15:50.235216    9125 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:12379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: read tcp 127.0.0.1:40648->127.0.0.1:12379: read: connection reset by peer". Reconnecting...
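All four warnings are the apiserver failing the TLS handshake with etcd's client endpoint at https://127.0.0.1:12379. Only the ephemeral client port changes from line to line. When the journal holds hundreds of these, a small summary can help. The sketch below is hypothetical (not part of microk8s). It assumes the grpc message format shown above and groups failures by endpoint and by the final low-level error:

```python
import re
from collections import Counter
from typing import Iterable

# Assumed format, copied from the microk8s.daemon-apiserver lines above;
# other Kubernetes/grpc versions may word this differently.
FAIL_RE = re.compile(
    r'failed to connect to \{(?P<endpoint>\S+) .*?desc = "(?P<desc>[^"]*)"'
)


def summarize_failures(lines: Iterable[str]) -> Counter:
    """Count etcd connection failures per (endpoint, final error reason)."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if not match:
            continue  # e.g. the ccResolverWrapper info line
        # desc ends with the low-level error ("read: connection reset by peer");
        # keep only the last part so ephemeral client ports don't split counts.
        reason = match.group("desc").rsplit(": ", 1)[-1]
        counts[(match.group("endpoint"), reason)] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: journalctl -u snap.microk8s.daemon-apiserver | python3 this_script.py
    for (endpoint, reason), n in summarize_failures(sys.stdin).most_common():
        print(n, endpoint, reason)
```

Fed the four warnings above, this would report a single bucket: 4 failures to https://127.0.0.1:12379 with "connection reset by peer".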

And then the etcd logs:

Feb 26 15:16:04 io etcd[9798]: enabled capabilities for version 3.3
Feb 26 15:16:04 io etcd[9798]: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32 from store
Feb 26 15:16:04 io etcd[9798]: set the cluster version to 3.3 from store
Feb 26 15:16:04 io etcd[9798]: restore compact to 13963548
Feb 26 15:16:04 io etcd[9798]: simple token is not cryptographically signed
Feb 26 15:16:04 io etcd[9798]: starting server... [version: 3.3.4, cluster version: 3.3]
Feb 26 15:16:04 io etcd[9798]: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
Feb 26 15:16:04 io etcd[9798]: ClientTLS: cert = /var/snap/microk8s/1174/certs/server.crt, key = /var/snap/microk8s/1174/certs/server.key, ca = , trusted-ca = /var/snap/microk8s/1174/certs/ca.crt, client-cert-auth = true, crl-file =
Feb 26 15:16:05 io etcd[9798]: updated the cluster version from 3.3 to 3.4
Feb 26 15:16:05 io etcd[9798]: cluster cannot be downgraded (current version: 3.3.4 is lower than determined cluster version: 3.4).
Feb 26 15:16:05 io systemd[1]: snap.microk8s.daemon-etcd.service: Main process exited, code=exited, status=1/FAILURE
Feb 26 15:16:05 io systemd[1]: snap.microk8s.daemon-etcd.service: Failed with result 'exit-code'.
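The fatal line is a version check. etcd 3.3.4 refuses to start because, right after the log shows the cluster version being updated from 3.3 to 3.4, the cluster version is higher than the member's own major.minor version. The sketch below mirrors that comparison and pulls both versions out of the log line. The function names are hypothetical, and the regex assumes the exact wording shown above:

```python
import re
from typing import Iterable, Optional, Tuple

# Wording copied from the etcd log above; other etcd releases may phrase it differently.
DOWNGRADE_RE = re.compile(
    r"current version: (?P<local>\d+\.\d+\.\d+) is lower than "
    r"determined cluster version: (?P<cluster>\d+\.\d+)"
)


def major_minor(version: str) -> Tuple[int, int]:
    """Turn "3.3.4" into (3, 3) and "3.4" into (3, 4)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_downgrade(local: str, cluster: str) -> bool:
    """True if this member's major.minor is below the cluster version."""
    return major_minor(local) < major_minor(cluster)


def find_downgrade(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (local, cluster) from the first downgrade error line, if any."""
    for line in lines:
        match = DOWNGRADE_RE.search(line)
        if match:
            return match.group("local"), match.group("cluster")
    return None


line = ("Feb 26 15:16:05 io etcd[9798]: cluster cannot be downgraded "
        "(current version: 3.3.4 is lower than determined cluster version: 3.4).")
found = find_downgrade([line])
print(found, is_downgrade(*found))  # ('3.3.4', '3.4') True
```

Comparing only major.minor matches the log, where the cluster version is reported without a patch number. Under this check, a member running any 3.4.x binary would be allowed to start against this data, while any 3.3.x binary would not.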

inspection-report-20200226_152214.tar.gz

EDIT: It also looks like the snap change failed. This occurred at about the time I noticed the cluster was down:

+ sudo -E /snap/microk8s/1216/usr/bin/python3 /snap/microk8s/1216/scripts/cluster/distributed_op.py remove_argument kubectl --kubeconfig
Removing argument --kubeconfig from nodes.
Applying to node triton.
Traceback (most recent call last):
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 549, in urlopen
    conn = self._get_conn(timeout=pool_timeout)
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 251, in _get_conn
    return conn or self._new_conn()
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 764, in _new_conn
    raise SSLError("Can't connect to HTTPS URL because the SSL "
requests.packages.urllib3.exceptions.SSLError: Can't connect to HTTPS URL because the SSL module is not available.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/snap/microk8s/1216/scripts/cluster/distributed_op.py", line 161, in <module>
    remove_argument(service, args[2])
  File "/snap/microk8s/1216/scripts/cluster/distributed_op.py", line 104, in remove_argument
    do_op(remote_op)
  File "/snap/microk8s/1216/scripts/cluster/distributed_op.py", line 38, in do_op
    verify=False)
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/requests/api.py", line 107, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in request
    resp = self.send(prep, **send_kwargs)
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/requests/sessions.py", line 588, in send
    r = adapter.send(request, **kwargs)
  File "/snap/microk8s/1216/usr/lib/python3/dist-packages/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: Can't connect to HTTPS URL because the SSL module is not available.
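This traceback comes from the snap's bundled interpreter (/snap/microk8s/1216/usr/bin/python3). urllib3 raises this particular SSLError when Python's standard `ssl` module cannot be imported. In that case, the request to node triton fails before any connection is made, and `verify=False` makes no difference. A minimal check, assuming you can run a script with that same interpreter, is to attempt the import directly. The script and its function name are hypothetical, and the code avoids f-strings so that older bundled interpreters can run it:

```python
from typing import Tuple


def check_ssl() -> Tuple[bool, str]:
    """Report whether this interpreter can import the standard ssl module.

    When this import fails, urllib3 falls back to a dummy HTTPS connection and
    raises "Can't connect to HTTPS URL because the SSL module is not available".
    """
    try:
        import ssl
    except ImportError as exc:
        # Typically the _ssl extension or the OpenSSL library it links against
        # could not be loaded; the message usually names the missing piece.
        return False, "ssl import failed: {}".format(exc)
    return True, ssl.OPENSSL_VERSION


if __name__ == "__main__":
    ok, detail = check_ssl()
    print("ssl available" if ok else "ssl NOT available", "-", detail)
```

Saved as, say, check_ssl.py, it would be run as `sudo /snap/microk8s/1216/usr/bin/python3 check_ssl.py`. An ImportError here would point at the snap's Python environment rather than at the cluster's certificates.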

About this issue

State: closed
Created 4 years ago
Comments: 21 (8 by maintainers)

Most upvoted comments

This is some interesting info you provided, @timwebster9, thank you. The problem is in the configure hook, not in the enable command. I do not have the bandwidth to work on this at the moment, so there is no point in keeping this cluster around. I would suggest you redeploy using the 1.17/stable channel.

ktsakalozos on Feb 28, 2020