traefik: Do not attempt to renew certificate no longer used

Do you want to request a feature or report a bug?

Bug

What did you do?

I booted one service in one server, behind Traefik, configured with docker backend and correctly labeled.

Everything worked fine:

  • Traefik contacted LE
  • Traefik obtained cert
  • Traefik renewed cert after a while, when needed
  • Traefik 🎸 ❤️

But after some time, I moved the service to another, identically configured server.

What did you expect to see?

Traefik on the old server should stop renewing the cert if no container is actively using it.

What did you see instead?

These logs:

INFO[2018-05-23T16:30:05Z] Renewing certificate from LE : {Main:example.com SANs:[]} 
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] acme: Trying renewal with -1447 hours remaining
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] acme: Obtaining bundled SAN certificate
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] AuthURL: https://acme-v01.api.letsencrypt.org/acme/authz/******
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] acme: Could not find solver for: dns-01
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] acme: Trying to solve HTTP-01
ERRO[2018-05-23T16:31:35Z] Error renewing certificate from LE: acme: Error 403 - urn:acme:error:unauthorized - Invalid response from http://example.com/.well-known/acme-challenge/****** [83.48.28.165]: 404
Error Detail:
        Validation for example.com:80
        Resolved to:
                *.*.*.*
        Used: *.*.*.*

Output of traefik version: (What version of Traefik are you using?)

Version:      v1.5.3
Codename:     cancoillotte
Go version:   go1.9.4
Built:        2018-02-27_02:47:04PM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, …)?

Docker backend, CLI flags configuration… but I don’t think it is relevant for this issue, since everything else is working as expected.

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 46
  • Comments: 20 (3 by maintainers)

Most upvoted comments

This is especially annoying when the certificates are stored in a KV store (Consul in our case), which limits the size of the acme.json object. We spin up instances on demand and tear them down after a couple of days. But the certificates stay in the file and eventually prevent new certificates from being created.

The only workaround for me here is to stop traefik, semi-manually remove the obsolete certificates, push the new file to the KV store and start traefik again.

For a non-existing URL it does not make sense to renew the certificate, so it can be removed. Once the URL is used again, Traefik can request the certificate again.

Wow, that issue is still open after 2 years? With Traefik you can trigger the Let’s Encrypt rate limit pretty fast: just have 5 certs in acme.json that can’t be renewed because the domain’s DNS moved somewhere else. I believe a non-existent domain (NXDOMAIN) does not trigger it, but domains pointing to another HTTP endpoint will trigger the rate limit. Removing them from acme.json requires a Traefik restart, or at least kill -HUP, which causes downtime. Is this the “upsell reason” for buying Traefik EE?

I’m seeing this too. Given the ephemerality of the services that Traefik targets, it would make sense to not attempt to renew certificates that are not present on any services.

It would be even better if those certificates were removed, maybe after some time if a service is just momentarily offline.

Any news on this? I also ran into an issue where my traefik had requested a huge number of certs for non existing frontends.

Bash one-liner guy that I am, I hacked some commands together to clean those old certs out of the acme.json. I use the API endpoint of the dashboard for this. I leave this here for anyone, but please think before you do something. I only use the Kubernetes backend.

How I removed unused certs

I take no warranty for your copy paste job! What worked for me may fail in your environment.

Note: This could be achieved in many ways. I did not choose the shortest, nor did I play code golf. This is meant to be at least a little bit human readable.

I simply jumped directly into my Traefik container and did the following:

  • Install dependencies
apk update && apk add jq curl
  • Fetch existing frontends
curl -s "https://<USERNAME>:<PASSWORD>@<TRAEFIK_DASHBOARD_URL>/api" | jq ".kubernetes.frontends" | jq "keys" | jq -r ".[]" | sed "s/\/.*//" | uniq > existing_frontends
  • Fetch existing certs
cat /acme/acme.json | jq ".Certificates" | jq ".[]" | jq ".Domain"  | jq -r ".Main" | sort | uniq > existing_certs
  • now get the certs that need to be removed
diff existing_certs existing_frontends | tail -n +4 | grep "^-" | sed "s/^-//" > certs_to_remove
  • and let jq remove them
cp /acme/acme.json /acme/acme.json.new
cat certs_to_remove | xargs -i sh -c "jq 'del(.Certificates[]| select(.Domain.Main == \"{}\"))' /acme/acme.json.new > /acme/acme.json.new2; mv /acme/acme.json.new2 /acme/acme.json.new"
  • verify the new file
cat /acme/acme.json.new | jq ".Certificates[].Domain.Main"
  • change permission
chmod 600 /acme/acme.json.new
  • backup and overwrite
cp /acme/acme.json /acme/acme.json.bak
echo "remove next # - think before you do something"
#cp /acme/acme.json.new /acme/acme.json

After all that work I deleted my Traefik pod to clean up everything and enjoyed a cup of coffee as a reward.

I take no warranty for your copy paste job!
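The `diff | tail | grep "^-"` step above only works when diff prints unified output, so it can behave differently outside that container. As a minimal sketch of the same set difference, assuming `sort`, `comm` and `mktemp` are available (the function name `stale_certs` is hypothetical and not part of the original commands):

```shell
#!/bin/sh
# stale_certs CERTS_FILE FRONTENDS_FILE   (hypothetical helper name)
# Prints every domain listed in CERTS_FILE that is absent from FRONTENDS_FILE.
# Both files hold one domain per line; order and duplicates do not matter.
stale_certs() {
    certs_sorted=$(mktemp)
    frontends_sorted=$(mktemp)
    # comm needs both inputs sorted with the same collation, so force the C locale
    LC_ALL=C sort -u "$1" > "$certs_sorted"
    LC_ALL=C sort -u "$2" > "$frontends_sorted"
    # -23 suppresses lines unique to the second file and lines common to both,
    # leaving only the lines unique to the first file
    LC_ALL=C comm -23 "$certs_sorted" "$frontends_sorted"
    rm -f "$certs_sorted" "$frontends_sorted"
}
```

With the files produced by the earlier steps, `stale_certs existing_certs existing_frontends > certs_to_remove` would stand in for the diff line. Review the resulting list by hand before deleting anything.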

For what it’s worth, I created a Makefile that is tested with Traefik 2.2.x. This is mostly based on the comment of @MaxWinterstein.

acmefile = acme.json
traefik_dashboard = <TRAEFIK_DASHBOARD_URL>
auth_user = <USERNAME>
auth_password = <PASSWORD>

.SILENT: clean
.PHONY: clean
clean:
	curl -s "https://$(auth_user):$(auth_password)@$(traefik_dashboard)/api/http/routers" | jq -r ".[]" | jq ".rule" | sed "s/\"Host(\`//g;s/\`)\"//g" | uniq > existing_frontends;
	cat $(acmefile) | jq ".default.Certificates[].domain.main" | sort | uniq | sed "s/\"//g" > existing_certs;
	awk 'NR==FNR{a[$$0];next}!($$0 in a)' existing_frontends existing_certs > certs_to_remove;
	cp $(acmefile) $(acmefile).new;
	cat certs_to_remove | xargs -I'{}' sh -c "jq 'del(.default.Certificates[]| select(.domain.main == \"{}\"))' $(acmefile).new > $(acmefile).new2; mv $(acmefile).new2 $(acmefile).new";
	chmod 600 $(acmefile).new;
	chown traefik:docker $(acmefile).new;
	mv $(acmefile) $(acmefile).bak;
	mv $(acmefile).new $(acmefile);
	rm certs_to_remove existing_certs existing_frontends;
	docker-compose restart;

Figured I might as well just create a Gist

Can definitely recommend the cert-manager and acme-dns way. Using this to get wildcard Let’s Encrypt certs refreshed works nicely for me.

ditched traefik for ambassador a while ago and never looked back.

I can also confirm this. It would be nice to have an API call that can automatically delete the certs from the store (file/consul/other).

These are the logs from the traefik:v1.7.9 docker container:

{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"Renewing certificate from LE : {Main:My.Removed.Domain.TLD SANs:[]}\"\n","stream":"stdout","time":"2019-04-07T17:50:16.346915604Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: Trying renewal with 688 hours remaining\"\n","stream":"stdout","time":"2019-04-07T17:50:16.347232814Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: Obtaining bundled SAN certificate\"\n","stream":"stdout","time":"2019-04-07T17:50:16.34740912Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/AUTHZ_TOKEN_NOT_SHARED\"\n","stream":"stdout","time":"2019-04-07T17:50:16.659316163Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: Could not find solver for: tls-alpn-01\"\n","stream":"stdout","time":"2019-04-07T17:50:16.659442867Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: use http-01 solver\"\n","stream":"stdout","time":"2019-04-07T17:50:16.659449967Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: Trying to solve HTTP-01\"\n","stream":"stdout","time":"2019-04-07T17:50:16.659453967Z"}
{"log":"time=\"2019-04-07T17:50:21Z\" level=error msg=\"Error renewing certificate from LE: acme: Error -\u003e One or more domains had a problem:\\n[My.Removed.Domain.TLD] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://My.Removed.Domain.TLD/.well-known/acme-challenge/ACME_CHALLANGE_TOKEN_NOT_SHARED [1.3.3.7]: \\\"\u003chtml\u003e\\\\r\\\\n\u003chead\u003e\u003ctitle\u003e403 Forbidden\u003c/title\u003e\u003c/head\u003e\\\\r\\\\n\u003cbody bgcolor=\\\\\\\"white\\\\\\\"\u003e\\\\r\\\\n\u003ccenter\u003e\u003ch1\u003e403 Forbidden\u003c/h1\u003e\u003c/center\u003e\\\\r\\\\n\u003chr\u003e\u003ccenter\u003e\\\", url: \\n\"\n","stream":"stdout","time":"2019-04-07T17:50:21.45825789Z"}
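To see which domains Traefik keeps trying to renew, the "Renewing certificate from LE" messages can be pulled out of logs like the ones above. This is a minimal sketch that assumes the message format shown in this thread (Traefik v1.5 and v1.7); the function name `renewal_domains` is hypothetical:

```shell
#!/bin/sh
# renewal_domains   (hypothetical helper name)
# Reads Traefik log lines on stdin and prints each distinct main domain named
# in a "Renewing certificate from LE : {Main:<domain> SANs:[...]}" message.
# Works on the plain-text form and on the docker JSON-wrapped form, because it
# only matches the message text itself.
renewal_domains() {
    sed -n 's/.*Renewing certificate from LE : {Main:\([^ ]*\) SANs:.*/\1/p' |
        LC_ALL=C sort -u
}
```

For example, `docker logs <container> 2>&1 | renewal_domains` lists the renewal candidates, where `<container>` is a placeholder for your Traefik container name. The list shows renewal attempts, not failures, so it still has to be compared against the domains that are actually in use.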

This is a script I wrote a while back, with which I made manual changes in order to remove the old/bad/unused/migrated certs from the Consul acme.json: I manually removed all references to the targeted domains.

After this, a manual push of the cert is required. Ensure that you don’t corrupt your acme.json while editing (you’ll still have the original backup at FILE_ORIGINAL_BASE64).

You need to run this on one of your Consul servers (script tested on Ubuntu 16.04). In some cases I had to reboot the entire setup to make the changes visible; I still haven’t found a stable/easy/logical way of making the update. Take the script below as a guideline, and for the first run I would recommend going step by step so you understand what happens. Also look for variables that start with CHANGE_ME and make the appropriate changes.

#!/bin/bash

# 
# IF YOU RUN THIS SCRIPT ENSURE THAT NO CERTIFICATE UPDATES ARE MADE DURING THIS PERIOD
# YOU RISK TO DELETE/OVERRIDE NEWLY GENERATED CERTIFICATES
#

BASE_CERT_PATH=traefik
CERT_PATH=${BASE_CERT_PATH}/acme/account/object

BASE_FILE_NAME=/root/consul_dump.`date +%Y-%m-%d_%H-%M-%S`
FILE_ORIGINAL_BASE64=${BASE_FILE_NAME}.1.base64
FILE_ORIGINAL_JSON=${BASE_FILE_NAME}.2.json
FILE_MODIFIED_JSON=${BASE_FILE_NAME}.3.json.modified
FILE_MODIFIED_BASE64=${BASE_FILE_NAME}.4.base64.modified

# get data out of consul - in base64 gzip encoding

# use this for HTTPS authentication
CONSUL_AUTH_PARAMS="-ca-path=/consul/tls/ca.pem -client-cert=/consul/tls/consul.pem -client-key=/consul/tls/consul-key.pem -http-addr=https://127.0.0.1:8443 -tls-server-name=CHANGE_ME_CONSUL-HA-HOSTNAME.TLD"
echo `docker exec CHANGE_ME_CONSUL-DOCKER-CONTAINER-NAME consul kv get $CONSUL_AUTH_PARAMS -base64 $CERT_PATH` > $FILE_ORIGINAL_BASE64
cat $FILE_ORIGINAL_BASE64 | base64 --decode | gzip -dc | jq . > $FILE_ORIGINAL_JSON
cp $FILE_ORIGINAL_JSON $FILE_MODIFIED_JSON

# make manual changes
vim $FILE_MODIFIED_JSON

# convert back to base64 gzip and store for push upstream
cat $FILE_MODIFIED_JSON | gzip -c | base64 -w 0 > $FILE_MODIFIED_BASE64 

# put data back to consul store
echo 'THIS IS DONE'
echo 'MANUALLY RUN: '

echo "cat $FILE_MODIFIED_BASE64 | /var/lib/docker/overlay2/CHANGE_ME_CONSUL-LAYER-THAT-CONTAINS-THE-CONSUL-BINARY/diff/bin/consul kv put $CONSUL_AUTH_PARAMS -base64 $CERT_PATH -"
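Before running the manual `consul kv put`, it may be worth confirming that the re-encoded payload decodes back to exactly the edited JSON. This is a minimal sketch assuming the same GNU `base64` and `gzip` tools the script already uses; the function name `verify_payload` is hypothetical:

```shell
#!/bin/sh
# verify_payload MODIFIED_JSON MODIFIED_BASE64   (hypothetical helper name)
# Succeeds (exit status 0) only if decoding and decompressing MODIFIED_BASE64
# yields byte-for-byte the content of MODIFIED_JSON.
verify_payload() {
    # cmp -s is silent; "-" makes it read the decoded stream from stdin
    base64 --decode < "$2" | gzip -dc | cmp -s - "$1"
}
```

In the script above this would be called as `verify_payload "$FILE_MODIFIED_JSON" "$FILE_MODIFIED_BASE64" || echo 'payload mismatch, do not push'`. It only checks the encoding round trip, not whether the edited JSON is still valid for Traefik.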

I filed bug https://github.com/traefik/traefik/issues/9162. It was closed as a duplicate of this one. While the issue is similar, I do not feel it’s a duplicate: this issue relates to a whole certificate that is no longer used, whereas mine concerns a removed SAN in a certificate that is still in use.

@MaxWinterstein cert-manager is definitely the way to go, even when it is paired with Traefik; it is better than Traefik’s built-in resolver. Ambassador is on my list. I checked Contour and Gloo first because people compare them to Ambassador all the time. I got Contour running; it has nice docs, but it does not seem to be very feature-rich, though it has somewhat of a community. Gloo looks best on paper but I couldn’t get it to run, so yeah… you’re happy with Ambassador? Any drawbacks?