traefik: Datastore sync error: Object lock value: expected b8e35510

Do you want to request a feature or report a bug?

Bug

What did you do?

Launched Traefik in cluster mode following this manual.

What did you expect to see?

No errors

What did you see instead?

traefik output:

infra_proxy.0.eloqzxy578jr@swarm-manager-0    | time="2018-06-13T13:38:32Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 3.353099079s"
infra_proxy.0.eloqzxy578jr@swarm-manager-0    | time="2018-06-13T13:38:35Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 7.734764651s"
infra_proxy.0.eloqzxy578jr@swarm-manager-0    | time="2018-06-13T13:38:43Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 18.253019683s"
infra_proxy.0.eloqzxy578jr@swarm-manager-0    | time="2018-06-13T13:39:01Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 20.274627269s"
infra_proxy.0.eloqzxy578jr@swarm-manager-0    | time="2018-06-13T13:39:22Z" level=error msg="Datastore sync error: Object lock value: expected b8e35510-1397-4396-9a94-a8ed41560195, got , retrying in 269.349437ms"
infra_proxy.0.eloqzxy578jr@swarm-manager-0    | legolog: 2018/06/13 13:39:30 [INFO] acme: Registering account for aaaa@aaaa.io
infra_proxy.0.fv9ni12zaup4@swarm-manager-1    | 10.255.0.4 - - [13/Jun/2018:13:42:12 +0000] "GET / HTTP/1.0" 404 19 "-" "-" 1 "backend not found" "/" 0ms
infra_proxy.0.eloqzxy578jr@swarm-manager-0    | 10.255.0.4 - - [13/Jun/2018:13:42:21 +0000] "GET / HTTP/1.0" 404 19 "-" "-" 1 "backend not found" "/" 0ms
infra_proxy.0.eloqzxy578jr@swarm-manager-0    | 10.255.0.3 - - [13/Jun/2018:13:42:58 +0000] "GET / HTTP/1.0" 404 19 "-" "-" 2 "backend not found" "/" 0ms
infra_proxy.0.fv9ni12zaup4@swarm-manager-1    | time="2018-06-13T13:51:24Z" level=error msg="Leadership election error failed to read lock: Get http://consul:8500/v1/kv/traefik/leader?index=5848&wait=15000ms: dial tcp 10.0.0.8:8500: connect: no route to host, retrying in 42.481718789s"
infra_proxy.0.fv9ni12zaup4@swarm-manager-1    | time="2018-06-13T13:51:24Z" level=error msg="KV connection error: watchtree channel closed, retrying in 552.821584ms"


traefik_init:

infra_proxy_init.1.jkwmgpocxokv@swarm-manager-1    | 2018/06/13 13:38:13 Storing configuration: <big json here>

Output of traefik version:

Version:      v1.6.3
Codename:     tetedemoine
Go version:   go1.10.2
Built:        2018-06-05_03:29:01PM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, …)?

    consul:
      hostname: consul
      image: consul
      command: agent -server -bootstrap-expect=1
      environment:
        - CONSUL_LOCAL_CONFIG={"datacenter":"ams3","server":true,"enable_debug":true}
        - CONSUL_BIND_INTERFACE=eth0
        - CONSUL_CLIENT_INTERFACE=eth0
      deploy:
        labels:
          - "traefik.enable=false"
        replicas: 1
        placement:
          constraints:
            - node.role == manager
        restart_policy:
         condition: on-failure
      networks:
        - traefik
      volumes:
        - consul-data:/consul/data

    proxy_init:
      image: traefik:1.6.3-alpine
      command:
         - "traefik"
         - "storeconfig"
         - "--api"
         - "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
         - "--entrypoints=Name:https Address::443 TLS"
         - "--defaultentrypoints=http,https"
         - "--acme"
         - "--acme.storage='traefik/acme/account'"
         - "--acme.entryPoint=https"
         - "--acme.httpChallenge.entryPoint=http"
         - "--acme.onHostRule=false"
         - "--acme.acmelogging=true"
         - "--acme.onDemand=false"
         - "--acme.email=roman@whatever.io"
         - "--docker"
         - "--docker.swarmMode"
         - "--docker.domain=swarm.whatever.io"
         - "--docker.watch"
         - "--consul"
         - "--consul.endpoint=consul:8500"
         - "--consul.prefix=traefik"
         - "--accesslogsfile=/dev/stdout"
      networks:
         - traefik
      deploy:
         placement:
            constraints:
              - node.role == manager
         restart_policy:
            condition: on-failure
      depends_on:
        - consul

    proxy:
      image: traefik:1.6.3-alpine
      depends_on:
        - proxy_init
        - consul
      command:
        - "traefik"
        - "--consul"
        - "--consul.watch"
        - "--consul.endpoint=consul:8500"
        - "--consul.prefix=traefik"
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
      networks:
        - traefik
      ports:
        - 80:80
        - 443:443
        - 8080:8080
      deploy:
        mode: global
        restart_policy:
          condition: on-failure
        placement:
          constraints:
            - node.role == manager
        update_config:
          parallelism: 1
          delay: 10s

networks:
  traefik:
      driver: overlay

volumes:
  consul-data:
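
(For completeness: the stack is deployed with a plain docker stack deploy; the stack name infra is inferred from the service names in the logs above, and docker-compose.yml is a placeholder for the file shown here.)

$ docker stack deploy --compose-file docker-compose.yml infra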

If applicable, please paste the log output in DEBUG level (--logLevel=DEBUG switch)

consul container

infra_consul.1.rxgmyuldfu6p@swarm-manager-1    |     2018/06/13 13:54:57 [ERR] http: Request PUT /v1/session/create?wait=30000ms, error: Missing node registration from=10.0.0.10:34330
infra_consul.1.rxgmyuldfu6p@swarm-manager-1    |     2018/06/13 13:55:01 [ERR] http: Request PUT /v1/session/create?wait=30000ms, error: Missing node registration from=10.0.0.11:46408
infra_consul.1.rxgmyuldfu6p@swarm-manager-1    |     2018/06/13 13:55:01 [WARN] consul.fsm: EnsureRegistration failed: failed inserting node: node ID "4400be7d-ecd1-c15c-594c-b8d763c65750" for node "consul" aliases existing node "ca84546beaf2"
infra_consul.1.rxgmyuldfu6p@swarm-manager-1    |     2018/06/13 13:55:01 [WARN] agent: Syncing node info failed. failed inserting node: node ID "4400be7d-ecd1-c15c-594c-b8d763c65750" for node "consul" aliases existing node "ca84546beaf2"
infra_consul.1.rxgmyuldfu6p@swarm-manager-1    |     2018/06/13 13:55:01 [ERR] agent: failed to sync remote state: failed inserting node: node ID "4400be7d-ecd1-c15c-594c-b8d763c65750" for node "consul" aliases existing node "ca84546beaf2"

If I launch Traefik 1.5 exactly as in that manual, without changing anything except the ACME email, the output is similar. Sorry, this is without debug logging, because it's really verbose.

$ docker --tls service logs -f infra_traefik
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:44Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 532.066212ms" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:45Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 458.005135ms" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:45Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 878.925167ms" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:44Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 467.819053ms" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:45Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 520.709512ms" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:45Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 761.341364ms" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:44Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 465.226492ms" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:44Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 463.286566ms" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:45Z" level=error msg="Load config error: Get http://consul:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp: lookup consul on 127.0.0.11:53: no such host, retrying in 1.056679773s" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:53Z" level=error msg="Load config error: Unexpected response code: 500, retrying in 1.251030901s" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:53Z" level=error msg="Load config error: Unexpected response code: 500, retrying in 1.135779476s" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:53Z" level=error msg="Load config error: Unexpected response code: 500, retrying in 1.688059111s" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:56Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing node registration), retrying in 580.210284ms" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:56Z" level=error msg="Cannot unmarshall private key []" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:56Z" level=error msg="Error building ACME client &{Email: Registration:<nil> PrivateKey:[] DomainsCertificate:{Certs:[] lock:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} ChallengeCerts:map[] HTTPChallenge:map[]}: private key was nil" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:57Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing node registration), retrying in 594.274959ms" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:57Z" level=error msg="Cannot unmarshall private key []" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:57Z" level=error msg="Error building ACME client &{Email: Registration:<nil> PrivateKey:[] DomainsCertificate:{Certs:[] lock:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} ChallengeCerts:map[] HTTPChallenge:map[]}: private key was nil" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:57Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing node registration), retrying in 742.791882ms" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:57Z" level=error msg="Cannot unmarshall private key []" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:57Z" level=error msg="Error building ACME client &{Email: Registration:<nil> PrivateKey:[] DomainsCertificate:{Certs:[] lock:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} ChallengeCerts:map[] HTTPChallenge:map[]}: private key was nil" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:57Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 840.089349ms" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:57Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 928.245698ms" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:58Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 917.49328ms" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:58Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 1.567695523s" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:58Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 1.013373382s" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:59Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 717.059863ms" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:15:59Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 1.489577248s" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:15:59Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 1.448180821s" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:15:59Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 2.23620903s" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:16:01Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 1.310668277s" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:16:01Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 3.450216199s" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:16:02Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 3.534395919s" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:16:02Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 2.211433481s" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:16:04Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 4.280841828s" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:16:04Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 5.420684251s" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:16:05Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 4.230201542s" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:16:08Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 7.971177426s" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:16:09Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 4.362036267s" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:16:10Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 8.672095649s" 
infra_traefik.0.71cmslvo7ul8@swarm-manager-1    | time="2018-06-13T14:16:14Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 12.08051947s" 
infra_traefik.0.5wlrzsjcy1j0@swarm-manager-0    | time="2018-06-13T14:16:16Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 6.720661168s" 
infra_traefik.0.3q211ndstidz@swarm-master    | time="2018-06-13T14:16:18Z" level=error msg="Leadership election error Unexpected response code: 500 (Missing check 'serfHealth' registration), retrying in 12.498967374s" 

Related to #3372

I asked about this issue in the Slack channel, and the only answer I got was that there is a problem with my swarm setup. It is a regular swarm setup on DigitalOcean: there are eth0 and eth1, one for the outside world and one for the internal network. I have no idea what's happening. The Consul agent/server prints tons of errors on startup: first the agent complains that the node isn't registered, then after a while it does register it, because the server comes up in about 15 seconds. Meanwhile Traefik bombards Consul and gets errors that it can't be registered. Then that datastore sync error pops up. So many things are happening here that I don't even know which exact issue I'm hitting in the first place, or whether they are even connected. This setup doesn't work. If someone could replicate the Traefik cluster setup and share their docker-compose file, we could compare output.
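
For what it's worth, the agent's registration state can be checked from inside the Consul container with something like this (the service name filter matches the stack in this issue; adjust to yours):

$ docker exec -it $(docker ps -q -f name=infra_consul) consul members
$ docker exec -it $(docker ps -q -f name=infra_consul) consul info | grep -E 'leader|num_peers'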

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 16
  • Comments: 53 (6 by maintainers)

Most upvoted comments

Half a year has passed and there is still no way to launch clustered Traefik. I ended up using Cloudflare for SSL termination. Traefik still does a great job as a reverse proxy.

I ran into the same issue with an HA config in Kubernetes. ACME was not working (same error, “Datastore sync error: object lock value: expected … got …”). It started working right after I switched to a single Traefik replica with acme.storage pointing to a flat file in a volume.
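
(Roughly what that single-replica fallback looks like in compose terms, as a sketch: the acme-data volume name and the /acme/acme.json path are placeholders, and the volume stays local to one node unless a shared volume driver is used.)

    proxy:
      image: traefik:1.6.3-alpine
      command:
        - "traefik"
        - "--acme"
        - "--acme.storage=/acme/acme.json"   # plain file instead of a KV key
        - "--acme.entryPoint=https"
        - "--acme.httpChallenge.entryPoint=http"
        - "--docker"
        - "--docker.swarmMode"
      volumes:
        - acme-data:/acme
      deploy:
        replicas: 1   # single instance, so no KV-backed cluster mode and no lock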

Still an issue in v1.7.24. I am stuck with certificate renewal for one of my domains. Luckily for me, it's “only” the Traefik dashboard. What do I do when it's a real production domain? Why doesn't this issue have a higher priority, given that Traefik 1.7 is still supported?

I’m getting the same error, running a Docker Swarm cluster and a Consul cluster (both with 4 manager nodes).

Datastore sync error: object lock value: expected 041c53fc-9c0f-4fbd-8e95-33884a577cbc, got aa4dd022-0dd2-4461-be1a-f87bd3fe2729, retrying in 576.592466ms

Looking into the consul key-value storage, the traefik/acme/account/lock element has 041c53fc-9c0f-4fbd-8e95-33884a577cbc… so I do not understand why Traefik is complaining.

Restarting Traefik, I get the same error, just with different UUIDs:

Datastore sync error: object lock value: expected 7df12a2f-a4cc-482b-9773-7374c5ca8b2a, got 041c53fc-9c0f-4fbd-8e95-33884a577cbc, retrying in 728.73302ms

And now the consul traefik/acme/account/lock key has been updated to 7df12a2f-a4cc-482b-9773-7374c5ca8b2a

It looks like Traefik writes a new lock ID and then expects the previous lock value.
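
(In case it helps anyone reproduce this observation, the lock key can be read back from Consul with something like the following; the container name filter comes from the compose file at the top of this issue, so adjust it to your stack:)

$ docker exec -it $(docker ps -q -f name=infra_consul) consul kv get -detailed traefik/acme/account/lock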

Our issue fixed itself…

It seems that when Consul is used in combination with the Let's Encrypt TLS certificate verification method, the lock error occurs because the transactions on Consul happen in too quick succession. After switching to DNS validation (which takes longer because of the writes needed at the DNS provider), the issue is gone. Maybe this information helps someone write a fix or patch for the problem.
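
(For reference, switching the storeconfig command above from the HTTP challenge to the DNS challenge looks roughly like this; Cloudflare is only an example provider, and the environment variable names depend on the provider you pick:)

    proxy_init:
      command:
        # replaces "--acme.httpChallenge.entryPoint=http"
        - "--acme.dnsChallenge.provider=cloudflare"
        - "--acme.dnsChallenge.delayBeforeCheck=0"
      environment:
        - CLOUDFLARE_EMAIL=acme@example.com
        - CLOUDFLARE_API_KEY=<api key>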

Maybe we can, in this particular case, show a more detailed message. I'll talk about this possibility with the team.

Once again, many thanks for your feedback, and I hope I answered your questions.

@nmengin, are you saying this message is harmless? Or are you saying that it matters, but because the feature is experimental we should expect problems? Is there a log level at which I can see successful writes to the key?

Hello @rosskevin, @holms , @patrick-motard , @jpsecher , @numkem , @dalsh , @schemen

Many thanks for your feedbacks.

The error Datastore sync error: is generated when Træfik tries to write to the ACME account key. This key carries a lock to manage concurrency, and Træfik cannot write to the key while the lock does not contain the expected value.

That's why you can see this log message many times when Træfik starts (it keeps trying to write information to the ACME account key).

This behavior comes from the cluster feature, which is still experimental. When Træfik is linked to a KV store, cluster mode is activated even if there is only one Træfik instance.

The team has discussed whether we should change the level of this message, but it is generated by the library we use to connect to the KV stores (valkeyrie), and it can also occur in other places where there is a real problem; that's why we have not changed it for the moment…

Maybe we can, in this particular case, show a more detailed message. I'll talk about this possibility with the team.

Once again, many thanks for your feedback, and I hope I answered your questions.
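
(For anyone who wants to watch this lock churn from the Consul side, a key watch shows the value changing; this is only an observation aid and assumes the official consul image, where cat is available as a watch handler:)

$ docker exec -it $(docker ps -q -f name=infra_consul) consul watch -type=key -key=traefik/acme/account/lock cat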

Our issue fixed itself…

It seems that when Consul is used in combination with the Let's Encrypt TLS certificate verification method, the lock error occurs because the transactions on Consul happen in too quick succession. After switching to DNS validation (which takes longer because of the writes needed at the DNS provider), the issue is gone. Maybe this information helps someone write a fix or patch for the problem.

I did this once and ended up with certificates for my domain locked out for two weeks. I tried to automate it with Cloudflare and Traefik, and it failed. This is risky because you only get two tries to validate before Let's Encrypt locks you out.

Same issue on a Docker Swarm cluster here as well. Force-updating the Traefik service solves it, but the error reappears after some time.
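
(The force update in question is just the standard one; the service name below is the one from the logs earlier in this issue:)

$ docker service update --force infra_traefik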

This should perhaps be a P1 priority. As it stands, in my scripted, reproducible bootstrap environment, killing the Traefik pod does not get me past it. At the moment I can blow everything away, but that is not workable in production, which is just a few weeks away for us.

related? #2581 (P2 open), #2546 (closed), #3372 (P1 closed)

I have tried the separation of services as well, sadly without luck.

I started investigating because I'm experiencing this issue as well and haven't had the opportunity to upgrade to 2.x now that Consul support is available there. I hadn't realized how hard it is to read Go (I primarily develop in Python) until I started trying to read the source code. In case anyone else is interested in picking up the investigation, it looks like this is the best place to start.

We also have this issue with Consul.

level=error msg=“Datastore sync error: object lock value: expected 9a7b70c4-5019-4cdc-96ec-fede5d1308a1, got e90c5c7d-ffe5-4162-8209-838e23373f86, retrying in 368.84116ms”

This makes it impossible to deploy Traefik in a Docker Swarm, even with a single Traefik instance, while using the Consul K/V store for the ACME certs.

I’m also experiencing this issue in Docker Swarm. I spent ages trying to understand how to set up Consul, and now this.

I have the same issue in pretty much the same configuration. ACME works for a while until it stops, with only that error message.