cert-manager: Propagation check failed, wrong service used by cm-acme-http-solver

Describe the bug: The propagation check fails because the cm-acme-http-solver Ingress references the wrong Service, so ACME cannot validate the domain or deliver the certificate.

Expected behaviour: The challenge succeeds and the certificate is delivered.

Steps to reproduce the bug: Follow https://docs.cert-manager.io/en/latest/tutorials/acme/http-validation.html, but with multiple subdomains at the same time.

Anything else we need to know?: I moved from cert-manager 0.8 to 0.11 a few weeks ago. Everything worked fine, including newly added subdomains. For the past few days, new subdomains have failed validation.

Environment details:

Kubernetes version: v1.13.10
Cloud provider: Azure AKS
cert-manager version: v0.11.0
Install method: Helm

/kind bug

kubectl describe challenge:

Name:         tls-secret-1495667673-716095195-908999738
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1alpha2
Kind:         Challenge
Metadata:
  Creation Timestamp:  2019-11-20T10:00:11Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
  Owner References:
    API Version:           cert-manager.io/v1alpha2
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  tls-secret-1495667673-716095195
    UID:                   81e087b8-0b7c-11ea-95a0-7e8b3f31c3c5
  Resource Version:        10962589
  Self Link:               /apis/acme.cert-manager.io/v1alpha2/namespaces/default/challenges/tls-secret-1495667673-716095195-908999738
  UID:                     8a97cf77-0b7c-11ea-95a0-7e8b3f31c3c5
Spec:
  Authz URL:  https://acme-v02.api.letsencrypt.org/acme/authz-v3/1322577887
  Dns Name:   domain20.convertigo.net
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   Issuer
    Name:   letsencrypt-prod
  Key:      BZiefIaNveMd0bwXbjywYExT6wGHdETnJLs5D6iZOAY.zIJBhOqgURIGfuNbqfatmAXt5je_GyTDV34tQ02Xqmw
  Solver:
    Http 01:
      Ingress:
        Class:  nginx
  Token:        BZiefIaNveMd0bwXbjywYExT6wGHdETnJLs5D6iZOAY
  Type:         http-01
  URL:          https://acme-v02.api.letsencrypt.org/acme/chall-v3/1322577887/aUEJmg
  Wildcard:     false
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for http-01 challenge propagation: wrong status code '503', expected '200'
  State:       pending
Events:        <none>
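
The status reason above says the challenge URL answered 503 instead of 200. A minimal sketch for repeating that check by hand, assuming curl is installed and the domain resolves to the ingress controller. `check_challenge` is a hypothetical helper name, not part of cert-manager:

```shell
# Hypothetical helper: fetch the HTTP-01 challenge URL and report the status code.
# Usage: check_challenge <domain> <token>
check_challenge() {
  domain="$1"
  token="$2"
  url="http://${domain}/.well-known/acme-challenge/${token}"
  # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  if [ "$code" = "200" ]; then
    echo "OK: $url returned 200"
  else
    echo "FAIL: $url returned $code (expected 200)"
    return 1
  fi
}

# Example with the values from this challenge:
# check_challenge domain20.convertigo.net BZiefIaNveMd0bwXbjywYExT6wGHdETnJLs5D6iZOAY
```

A 503 from an nginx ingress controller usually means the Ingress rule matched but no backend endpoint was available, which would be consistent with a backend Service that does not exist.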

I have a cm-acme-http-solver-wmpps Ingress with:

  "spec": {
    "rules": [
      {
        "host": "domain20.convertigo.net",
        "http": {
          "paths": [
            {
              "path": "/.well-known/acme-challenge/BZiefIaNveMd0bwXbjywYExT6wGHdETnJLs5D6iZOAY",
              "backend": {
                "serviceName": "cm-acme-http-solver-9mddf",
                "servicePort": 8089
              }
            }
          ]
        }
      }
    ]
  },

It refers to a Service, cm-acme-http-solver-9mddf, that doesn’t exist, but I do have a cm-acme-http-solver-6c7r2. Is this normal?
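
A rough way to spot this mismatch across a namespace, sketched under a few assumptions: kubectl is configured for the cluster, solver Ingresses keep the cm-acme-http-solver- name prefix, and the Ingress uses the older `backend.serviceName` field shown in the JSON above (newer networking.k8s.io/v1 Ingresses use `backend.service.name` instead). `find_dangling_solver_backends` is a hypothetical helper name:

```shell
# Hypothetical helper: list solver Ingress backends whose Service does not exist.
# Usage: find_dangling_solver_backends <namespace>
find_dangling_solver_backends() {
  ns="$1"
  # Print one line per Ingress: "<ingress-name> <serviceName> [<serviceName> ...]"
  kubectl get ingress -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.rules[*].http.paths[*].backend.serviceName}{"\n"}{end}' |
  while read -r ing svcs; do
    # Only look at Ingresses created for HTTP-01 solvers (assumed name prefix)
    case "$ing" in
      cm-acme-http-solver-*) ;;
      *) continue ;;
    esac
    for svc in $svcs; do
      # stdin is redirected so kubectl cannot consume the piped Ingress list
      if ! kubectl get service "$svc" -n "$ns" >/dev/null 2>&1 </dev/null; then
        echo "$ing -> missing Service $svc"
      fi
    done
  done
}

# Example:
# find_dangling_solver_backends default
```

For the state described here, this would be expected to report cm-acme-http-solver-wmpps as pointing at the missing cm-acme-http-solver-9mddf.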

Do you need any other information, or do you know a workaround?

Thanks!

About this issue

State: closed
Created 5 years ago
Reactions: 5
Comments: 29 (2 by maintainers)

Most upvoted comments

I deleted the Challenge’s Ingress that pointed to the wrong Service and … a new, valid Ingress was created! The certificate is now good. I’ll leave the ticket open for a few days in case it happens again.
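
A minimal sketch of that workaround, assuming kubectl access to the namespace and that the stale Ingress keeps the cm-acme-http-solver- name prefix. `reset_solver_ingress` is a hypothetical helper name; the prefix guard is only there to avoid deleting an application Ingress by mistake:

```shell
# Hypothetical helper: delete a stale solver Ingress so cert-manager can recreate it.
# Usage: reset_solver_ingress <namespace> <ingress-name>
reset_solver_ingress() {
  ns="$1"
  ing="$2"
  # Refuse anything that does not look like a solver Ingress
  case "$ing" in
    cm-acme-http-solver-*) ;;
    *) echo "refusing to delete non-solver Ingress: $ing" >&2; return 1 ;;
  esac
  kubectl delete ingress "$ing" -n "$ns"
}

# Example with the Ingress from this issue:
# reset_solver_ingress default cm-acme-http-solver-wmpps
```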

nicolas-albert on Nov 22, 2019

@nicolas-albert we are experiencing the exact same issue. The Ingress created by cert-manager points to an ACME solver Service that doesn’t exist in the namespace (another one exists with a different name).

This looks like a bug.

We tend to issue multiple requests like this (for many subdomains at a time), and most times they appear to succeed, but there’s always the odd one that gets stuck like this.

greywolve on Nov 30, 2019

@macevil I added a node so the pods could start, this fixed it for me!

benjick on May 24, 2020

How can I apply this patch now?

I am using the following command to deploy cert-manager:

helm install cert-manager \
  --namespace cert-manager \
  --version v0.12.0 \
  jetstack/cert-manager

debanjanbasu on Jan 4, 2020

@nicolas-albert We are currently running the patch by temporarily editing the cert-manager Deployment to use an image from https://hub.docker.com/r/oliverpowell84/cert-manager-controller/tags as the controller image.
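
One way such a temporary image swap could be done, sketched with assumptions stated up front: the Deployment, its container, and the namespace are all assumed to be named cert-manager (this depends on the Helm release name), and the image reference, including its tag, has to be supplied by you. `use_controller_image` is a hypothetical helper name:

```shell
# Hypothetical helper: point the controller Deployment at a different image.
# Usage: use_controller_image <image-ref>
use_controller_image() {
  image="$1"
  # Assumed names: Deployment "cert-manager", container "cert-manager", namespace "cert-manager"
  kubectl set image deployment/cert-manager "cert-manager=${image}" -n cert-manager &&
  # Wait until the new controller pod has rolled out
  kubectl rollout status deployment/cert-manager -n cert-manager
}

# Example (the tag is deliberately left for you to choose):
# use_controller_image oliverpowell84/cert-manager-controller:<tag>
```

A later helm upgrade would most likely set the image back to the chart's default, which fits the "temporarily" in the comment above.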

greywolve on Dec 19, 2019

Should be fixed by https://github.com/jetstack/cert-manager/pull/2460. I’m waiting for the v0.13 release to test it, unless there is a simple way to use this patch now.

nicolas-albert on Dec 17, 2019

I have the same problem, cert-manager v0.10.1, Helm v2.14.2

It’s really easy to reproduce when you have lots of domains on your certificate. Let’s Encrypt allows up to 100 alternate names, and some of our certs have more than 50. It’s trivial to reproduce this problem with that many domains.

andrew-cormick-dockery on Dec 13, 2019

@munnerz @nicolas-albert @schemen we had this bug pop up again over the weekend. I spent some time digging, and I now have a decent hypothesis on what’s happening. I created a new issue to describe the bug here: https://github.com/jetstack/cert-manager/issues/2442.

greywolve on Dec 11, 2019

We upgraded to Helm v3 yesterday, and since then, we haven’t been able to reproduce this bug. I’m not sure whether it was caused by our using a pre-v3 Helm. @nicolas-albert did you install cert-manager with Helm, and if so, which version?

In summary: cert-manager v0.12.0 with Helm v3 appears to fix this issue, but we’ll keep you updated if we spot this happening again.

greywolve on Dec 6, 2019

Hard to come up with a reproducible example because it doesn’t seem to happen every time, only sometimes.

I’ll do my best to give you as much alternative info as possible. Going to see if I can provoke it again, and record all the logs etc.

greywolve on Dec 5, 2019

Heyo! I got the same issue on a K3s cluster with Traefik as the Ingress controller. Once I edited the Ingress object to point to the correct Service, it resolved itself quickly. This was also on version 0.12.
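
A minimal sketch of that kind of edit as a one-shot patch rather than an interactive kubectl edit. It assumes the solver Ingress has a single rule with a single path and uses the older `backend.serviceName` field, as in the JSON from the original report. `repoint_solver_ingress` is a hypothetical helper name:

```shell
# Hypothetical helper: repoint the first path of a solver Ingress at an existing Service.
# Usage: repoint_solver_ingress <namespace> <ingress-name> <service-name>
repoint_solver_ingress() {
  ns="$1"
  ing="$2"
  svc="$3"
  # JSON Patch (RFC 6902) replacing the backend of rule 0, path 0
  patch=$(printf '[{"op":"replace","path":"/spec/rules/0/http/paths/0/backend/serviceName","value":"%s"}]' "$svc")
  kubectl patch ingress "$ing" -n "$ns" --type=json -p "$patch"
}

# Example with the names from the original report:
# repoint_solver_ingress default cm-acme-http-solver-wmpps cm-acme-http-solver-6c7r2
```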

schemen on Dec 4, 2019

Bad news 😕 Are you using AKS too?

nicolas-albert on Dec 3, 2019

0.12 seems out? https://github.com/jetstack/cert-manager/releases/tag/v0.12.0

We’ve just upgraded, so I’m holding thumbs that it will sort out this issue.

greywolve on Dec 2, 2019