cert-manager: Regression: HTTP-01 challenges fail with Istio, Traefik, ingress-gce and Azure AGIC (kubernetes.io/ingress.class not set anymore in 1.5 and 1.6)

Short description: if you use Traefik, Istio, ingress-gce, Azure AGIC or NGINX Inc’s NGINX Ingress, upgrading from cert-manager 1.4 or below to cert-manager 1.5 or 1.6 will silently break the HTTP-01 solver.

  • Which versions are affected? You will be affected if you are using Kubernetes 1.19 and above in conjunction with cert-manager 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.6.0, 1.6.1, 1.6.2, and 1.6.3 and that you use an Issuer or ClusterIssuer with an HTTP-01 solver configured (spec.acme.solvers[].http01).
  • Why? Since cert-manager 1.5 (https://github.com/jetstack/cert-manager/pull/4225), the solver does not set the annotation kubernetes.io/ingress.class anymore; instead, it sets the field ingressClassName, introduced with Kubernetes 1.18 (both in Ingress v1 and v1beta1) which requires the presence of an IngressClass (depending on the ingress controller, see below table).
  • A workaround is to create an IngressClass resource, but that comes as a surprise to the user. This breakage is “invisible” to the user, making it hard to debug. But the workaround does not work for Azure’ AGIC since it was using annotations with a slash, which aren’t compatible with ingressClassName (https://github.com/jetstack/cert-manager/issues/4547).

Who is affected?

Anyone using cert-manager 1.5 or 1.6 as well as one of the ingress controllers affected:

Ingress controller Is affected?
Istio yes
Traefik yes
Contour no
Gloo Edge no
ingress-nginx no
NGINX Ingress (NGINX Inc) yes but mitigated by the fact the Helm charts creates an IngressClass nginx for you
Azure AGIC yes
ingress-gce yes

Notes:

  • Istio and Traefik did not change the behaviour of the annotation ever, which means an IngressClass is never mandatory when using the annotation. On the other side, the ingressClassName requires you to have an IngressClass present on the cluster. This is why upgrading to 1.5 or 1.6 will break people’s HTTP-01 ACME solver.
  • Contour uses the “default controller” behaviour. The “default controller” is a controller which picks up Ingress resources that do not have the annotation kubernetes.io/ingress.class. With 1.18, Contour added --ingress-class-name (disabled by default) which allows to set a different string to be matched with the annotation or ingressClassName on the Ingress. With this flag, the controller changes from “default controller” to “only matching class” behaviour and ignores any ingress that does not have the annotation or the ingressClassName. Contour does not mind whether the IngressClass resource exists, and only matches ingresses with the same ingressClassName or annotation.
  • Gloo uses the “default controller” behaviour. With 1.18, Gloo added REQUIRE_INGRESS_CLASS=true which turns on the “only matching class” behaviour. Gloo does not mind whether the IngressClass resource exists.
  • ingress-nginx uses the “default controller” behaviour. ingress-nginx changed the behaviour of the annotation in their v1.0.0: before v1.0.0, the annotation did not require an IngressClass, but the ingressClassName field did. With v1.0.0, both the annotation and ingressClassName require an IngressClass to exist. The present issue would not appear anyways since the ingress-nginx Helm chart creates an IngressClass for you by default.
  • F5’s NGINX Ingress are the worst compatibility-wise: they broke people between two minor versions of their controller. They changed the behaviour of the annotation when Kubernetes 1.18 is detected. When 1.17 and below is detected (ingressClassName did not exist yet), the annotation does not require an IngressClass to exist. When 1.18 and above is detected, both the annotation and the ingressClassName require an IngressClass to exist. The present issue is mitigated by the fact the F5 NGINX Ingress Helm chart creates an IngressClass for you by default.
  • Azure AGIC did not change the behaviour associated with the annotation. It does not support the ingressClassName field. Another issue with AGIC is that its default class name is azure/application-gateway, which is fine for the annotation but not fine for ingressClassName (#4547).
  • ingress-gce does not support the ingressClassName field.

Example with Istio:

This issue stems from ~a thread~ (the Slack was discontinued, the link is dead) on the OpenFaaS Slack. A special thanks to @alexellis who was very helpful in finding out this issue.

In 1.5, the HTTP-01 solver behaviour changed with regard to the ingress.class annotation. Let us use the following ACME issuer:

apiVersion: cert-manager.io/v1
kind: Issuer
spec:
  acme:
    solvers:
    - http01:
        ingress:
          class: istio

Before 1.5, cert-manager was creating a temporary Ingress resource that had the annotation kubernetes.io/ingress.class:

apiVersion: networking.k8s.io/v1beta # ✨
kind: Ingress
metadata:
  name: cm-acme-http-solver-qq8hf
  annotations:
    kubernetes.io/ingress.class: istio
spec:
  rules:
  - host: test.dev.mydomain.net
    http:
      paths:
      - path: /.well-known/acme-challenge/5rVG6s5v4WJ_3WifvTzZtstttojFPRb76IwFMOHez9s
        backend:
          serviceName: cm-acme-http-solver-q4s47
          servicePort: 8089

After 1.5, with the support of Ingress v1, the temporary Ingress “lost” its annotation:

apiVersion: networking.k8s.io/v1 # ✨
kind: Ingress
metadata:
  name: cm-acme-http-solver-qq8hf
spec:
  ingressClassName: istio
  rules:
  - host: test.dev.mydomain.net
    http:
      paths:
      - path: /.well-known/acme-challenge/5rVG6s5v4WJ_3WifvTzZtstttojFPRb76IwFMOHez9s
        pathType: ImplementationSpecific
        backend:
          service
            name: cm-acme-http-solver-q4s47
            port:
              number: 8089

As detailed in the Istio page Configuring ingress using an Ingress resource, Istio has two different ways of operating:

  • The standalone annotation mode: when the annotation kubernetes.io/ingress.class is present, it knows that Istio has to take care of this Ingress.
  • The IngressClass mode: when the field ingressClassName is used, Istio expects an IngressClass to be present with the name istio. If it does not exist, nothing happens.

With cert-manager 1.5, cert-manager switched from the standalone annotation mode to the IngressClass mode, breaking anyone using Istio’s Ingress controller (on clusters after Kubernetes 1.18 since that only happens when Ingress v1 is available).

I think we made a wrong assumption when we wrote the v1 internal conversion here. I think the annotation, although “deprecated”, doesn’t have the same behaviour as the ingressClassName field:

% kubectl explain ingress.spec.ingressClassName
KIND:     Ingress
VERSION:  networking.k8s.io/v1
FIELD:    ingressClassName <string>
DESCRIPTION:
     IngressClassName is the name of the IngressClass cluster resource. The
     associated IngressClass defines which controller will implement the
     resource. This replaces the deprecated `kubernetes.io/ingress.class`
     annotation. For backwards compatibility, when that annotation is set, it
     must be given precedence over this field. The controller may emit a warning
     if the field and annotation have different values. Implementations of this
     API should ignore Ingresses without a class specified. An IngressClass
     resource may be marked as default, which can be used to set a default value
     for this field. For more information, refer to the IngressClass
     documentation.

I think that the two modes (using “deprecated” standalone annotation and using the IngressClass) serve different purposes, and cert-manager should not switch from the standalone annotation mode to the IngressClass mode.

Related:

Slack threads:

/kind bug /area acme/http01

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 26 (17 by maintainers)

Commits related to this issue

Most upvoted comments

That’s my current thinking, yep… on the premise that:

  • It is valid for an ingress controller to only pay attention to the annotation (e.g. all ingress controllers that were created before the field existed)
  • It isn’t valid for an ingress controller to ignore the annotation (as it must always take precedence over the field)

Therefore for maximum compatibility, it’d appear setting the annotation is the more broadly supported option.

I do also think some of the wording in the Kubernetes API types should be revised however - the annotation is denoted as deprecated, but there’s not actual any defined deprecation path since the annotation behaviour does exist in the v1 API (and I think it’s all been a little bit mixed up with the deprecation of the v1beta1 schema).

Plan:

  • ~[ ] istio.io: Mention that an IngressClass is needed when using Kubernetes 1.19 and above (since v1 Ingress objects are created by cert-manager) (https://istio.io/latest/docs/ops/integrations/certmanager/)~
  • ~[ ] cert-manager: Document on the website what happens when a user uses ingress.class on a v1 Ingress with the cert-manager.io/issuer annotation~
  • ~[ ] cert-manager: Document the API class field behavior (Kubernetes 1.18 and before versus and 1.19 and above)~
  • ~[ ] cert-manager: open a PR for adding a warning on the ClusterIssuer and Issuer when the ingressClassName does not match any IngressClass, with a “happy” message when the issuer is found after being not found (to avoid confusion when the IngressClass is created after the Issuer resource).~

Update 15 January 2022: we have determined that this is a regression (which we classify as critical bug on the Supported releases page). The plan is to revert to the pre-1.5 behavior and backport the fix to cert-manager 1.5 and cert-manager 1.6.

Hm, taking a look at the v1.5 release, it appears that this doesn’t apply to v1.5.0-v1.5.3, but v1.5.4 is where we actually changed this behaviour: https://github.com/jetstack/cert-manager/blob/v1.5.4/pkg/issuer/acme/http/ingress.go#L154

This is the PR that backported it: https://github.com/jetstack/cert-manager/pull/4472

This is quite surprising given it’s quite a significant change to be backported - we’ve also broken our patch release guarantee too (some users on older v1.5 versions may upgrade to v1.5.4 to receive security/critical fixes, which will unexpectedly break renewal) 😬

I think we should revert this cherry-pick and release a new v1.5 release, and advise users to not upgrade to v1.5.4 if they make use of the ACME issuer.

I just noticed the annotation is present in Ingress v1beta1 but not in Ingress v1. I don’t understand why it was removed. I thought “deprecated” meant “to be removed later”, so the annotation should still be present on the Ingress v1.

I will write up the ingress implementations and how they deal with the annotation/field at different version

Just looked at two of the most popular implementations to see if they have a version that supports kube 1.19, but does not look at the ingressClassName field. I realize that it may not be feasible to do it for all

  • ingress-nginx has supported both the annotation and the ingressClass field since 0.31.0 that came out in April 2020 so hopefully there wouldn’t be many users on older versions- I didn’t find clear info whether pre-0.31.0 can be used with Kubernetes below 1.19 . The field takes precedence (see https://github.com/kubernetes/ingress-nginx/issues/8103 - I am not sure whether that means that they might stop supporting the annotation)
  • Istio supports both the field and the annotation, the oldest Istio relase that supports Kubernetes 1.19 is Istio v1.9 and that supports ingressClassName field

I do wonder whether any of the controller implementation may stop looking at the annotation, see i.e https://github.com/kubernetes/ingress-nginx/issues/8103

There’s some good ideas about how we can roll forward to use the new field, but I’m conscious we’ve effectively broken users and we should really roll back before attempting to go forward again…

Also just digging into https://github.com/jetstack/cert-manager/blob/7f9fceef907d9ba15e700cd8836000f78548e907/internal/ingress/convert.go - I didn’t realise we’re doing this, but we need to also be certain we keep these conversions up to date as new fields are added to Ingress resources when we upgrade our k8s dependencies, else we’ll potentially even remove fields from user ingresses (if they are using edit-in-place).

You can actually already see an example of this that the generator picked up on itself here I think: https://github.com/jetstack/cert-manager/blob/7f9fceef907d9ba15e700cd8836000f78548e907/internal/ingress/convert.go#L386-L387