cilium: Ingress: incompatible with cert-manager ACME HTTP-01

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

With Cilium 1.14.3, enable the Cilium ingress controller as the default ingress class, with the load balancer mode set to shared. Create a basic ClusterIssuer for the Let’s Encrypt ACME HTTP01 challenge. Create an Ingress with the appropriate annotation and spec.tls set up properly.

Cilium Install

export CILIUM_VERSION="v1.14.3"
helm upgrade --install \
    cilium \
    cilium/cilium \
    --version ${CILIUM_VERSION} \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set kubeProxyReplacement=true \
    --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set bpf.autoMount.enabled=true \
    --set cgroup.autoMount.enabled=false \
    --set cgroup.hostRoot=/sys/fs/cgroup \
    --set k8sServiceHost=localhost \
    --set k8sServicePort=7445 \
    --set bpf.masquerade=true \
    --set bandwidthManager.enabled=true \
    --set bandwidthManager.bbr=true \
    --set prometheus.enabled=true \
    --set operator.prometheus.enabled=true \
    --set hubble.relay.enabled=true \
    --set hubble.ui.enabled=true \
    --set hubble.enabled=true \
    --set hubble.metrics.enableOpenMetrics=true \
    --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
    --set ingressController.enabled=true \
    --set ingressController.default=true \
    --set ingressController.loadbalancerMode=shared \
    --set ingressController.service.allocateLoadBalancerNodePorts="true" \
    --set ingressController.service.loadBalancerIP="10.0.0.153" \
    --set l2announcements.enabled=true \
    --set l2podAnnouncements.enabled=true \
    --set l2podAnnouncements.interface=eth0 \
    --set k8sClientRateLimit.qps=${QPS} \
    --set k8sClientRateLimit.burst=${BURST} \
    --set envoy.enabled=true \
    --set envoy.prometheus.enabled=true \
    --set routingMode=native \
    --set ipv4NativeRoutingCIDR=10.0.0.0/8 \
    --set policyEnforcementMode=default \
    --set debug.enabled=false \
    --set autoDirectNodeRoutes=true \
    --set hostFirewall.enabled=true \
    --set loadBalancer.algorithm=maglev \
    --set maglev.tableSize=65521 \
    --set maglev.hashSeed=${SEED} \
    --set loadBalancer.mode=dsr \
    --set loadBalancer.dsrDispatch=opt \
    --values <(cat <<EOF
ingressController:
  service:
    labels:
      cilium.loadbalancer.ips.service/name: ingress-gateway-pool
EOF
)

Cert-manager install

export CERT_MANAGER_VERSION="v1.13.1"
helm upgrade --install cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --set installCRDs=true \
    --version $CERT_MANAGER_VERSION  \
    --set "extraArgs={--feature-gates=ExperimentalGatewayAPISupport=true,--feature-gates=AdditionalCertificateOutputFormats=true}" \
    --set webhook.extraArgs={--feature-gates="AdditionalCertificateOutputFormats=true"}

ZeroSSL Cluster Issuer

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cilium-zerossl
spec:
  acme:
    # ZeroSSL ACME server
    server: https://acme.zerossl.com/v2/DV90
    email: ${CLUSTER_ISSUER_EMAIL}

    privateKeySecretRef:
      name: zerossl-private-key

    externalAccountBinding:
      keyID: ${ZERO_SSL_EAB_KEY_ID}
      keySecretRef:
        name: zerossl-eab-secret
        key: secret

    # ACME HTTP01 Ingress solver
    solvers:
    - http01:
        ingress:
          ingressClassName: cilium

LB Pool and Announcement Policy (No BGP)

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: ingress-gateway-pool
spec:
  disabled: false
  cidrs:
    - cidr: 10.0.0.152/30
  serviceSelector:
    matchLabels:
      cilium.loadbalancer.ips.service/name: ingress-gateway-pool
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: ingress-gateway-pool-announcement-policy
spec:
  loadBalancerIPs: true
  interfaces:
  - eth0
  serviceSelector:
    matchLabels:
      cilium.loadbalancer.ips.service/name: ingress-gateway-pool

Ingress Service

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${INGRESS_NAME}
  namespace: ${NAMESPACE}
  annotations:
    cert-manager.io/cluster-issuer: cilium-zerossl
spec:
  ingressClassName: cilium
  rules:
  - host: ${SERVICE_HOST}
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: ${SERVICE_NAME}
              port:
                number: 3001
  tls:
  - hosts:
    - ${SERVICE_HOST}
    secretName: cilium-zerossl-${INGRESS_NAME}-tls

Observe that cert-manager spins in a loop and creates hundreds of Ingresses.

Cilium Version

cilium-cli: v0.15.11 compiled with go1.21.3 on linux/amd64
cilium image (default): v1.14.2
cilium image (stable): v1.14.3
cilium image (running): 1.14.3

Kernel Version

Linux: 6.1.58

Kubernetes Version

Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created 8 months ago
  • Reactions: 1
  • Comments: 30 (10 by maintainers)

Most upvoted comments

Updating to 1.15.0-pre.3 seems to have resolved this for me.

I think the problem that @vincedihub is hitting is also that ImplementationSpecific paths are not sorted correctly: if there is a Prefix match for / and an ImplementationSpecific match for any path (including /.well-known/...), then the Prefix path match will take precedence.

So, that’s a separate problem that also breaks cert-manager Ingresses, but I think it’s distinct from the other problem, which is almost certainly related to ownerReferences not being handled properly.

A fix has been merged for the ownerReference problem (in #28452), but that’s only available in 1.15 prereleases at the moment. I’ll check in about marking that for backport.

That means that if we fix the ImplementationSpecific issue, then we should be good here, I think? I’m having a look at that at the moment.
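To make the ordering problem described above concrete, here is a minimal sketch of an Ingress that mixes both path types. The helper name print_mixed_pathtype_ingress, the Ingress name, the host, the token, and the backend service names (app, solver) are all hypothetical and chosen only for illustration; the ports mirror the ones that appear elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal Ingress that combines a Prefix match
# for "/" with an ImplementationSpecific match for an ACME challenge path.
# All names, the host, and the token below are placeholders.
print_mixed_pathtype_ingress() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mixed-pathtype-demo
spec:
  ingressClassName: cilium
  rules:
  - host: demo.example.test
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app
            port:
              number: 3001
      - path: /.well-known/acme-challenge/example-token
        pathType: ImplementationSpecific
        backend:
          service:
            name: solver
            port:
              number: 8089
EOF
}

# Usage (needs cluster access):
#   print_mixed_pathtype_ingress | kubectl apply --namespace "${NAMESPACE}" -f -
```

If the behaviour reported in this thread applies, requests for the challenge path would be answered by the app backend rather than the solver backend on affected versions.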

Is there any way to correct the problem between pathType: ImplementationSpecific and pathType: Prefix in the same ingress?

Cilium does not correctly handle two pathTypes, pathType: ImplementationSpecific and pathType: Prefix, in the same Ingress.

We ran a curl test against the .well-known URL and noticed a redirect. This redirect comes from the application’s pathType: Prefix rule. We found that pathType: Prefix took precedence over pathType: ImplementationSpecific, which is a problem.

@youngnick after a number of missteps for testing, I can confirm you’ve fixed the issue. Thank you!

Is there any way to correct the problem between pathType: ImplementationSpecific and pathType: Prefix in the same ingress?

I’ve been hitting this conflict between the two pathTypes as well on v1.15.0-pre.2, both in a single ingress and in separate ones (with a default TLS secret set). Finding this report helped me work around the issue. I have almost the same cilium config but without the L2 announcements, maglev and DSR.

Actually, no, sorry.

Does cert-manager set the ingressClassName as directed in the config correctly? Do all of the Ingresses that get created have an ingressClassName set? And do they have an ownerReference? If they don’t have an ownerReference, then this will hit the same issue as #22340: if something strips the owner reference from the Ingress, then cert-manager has no way to know that it owns the Ingress, and so will create a new one, which will have the same problem, and so on. However, it looks like #28452 will fix that exact problem. That fix did not make 1.14.3, so could you test with main?
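One way to answer the questions above is to list each Ingress alongside its class and the kinds of its owner references. This is only a sketch: the helper name list_ingress_owners is hypothetical, and it assumes kubectl is configured against the affected cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show namespace, name, ingressClassName, and the kinds
# of any ownerReferences for every Ingress in the cluster.
# Rows with no owner kind in the OWNER column lack an ownerReference.
list_ingress_owners() {
  kubectl get ingress --all-namespaces \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CLASS:.spec.ingressClassName,OWNER:.metadata.ownerReferences[*].kind'
}

# Usage (needs cluster access):
#   list_ingress_owners
```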

We have updated the Ingress with this annotation: acme.cert-manager.io/http01-edit-in-place: "true" (https://cert-manager.io/docs/usage/ingress/#supported-annotations)
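For an Ingress that already exists, the same annotation could be applied from the command line instead of editing the manifest. A minimal sketch, assuming kubectl access to the cluster; the helper name enable_edit_in_place is hypothetical, and its two arguments are the Ingress name and namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set cert-manager's edit-in-place annotation on one
# Ingress, so the HTTP-01 solver path is added to that Ingress.
enable_edit_in_place() {
  local name="$1" namespace="$2"
  kubectl annotate ingress "$name" --namespace "$namespace" --overwrite \
    'acme.cert-manager.io/http01-edit-in-place=true'
}

# Usage (needs cluster access):
#   enable_edit_in_place "${INGRESS_NAME}" "${NAMESPACE}"
```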

Now we no longer get many .well-known Ingresses like before.
Cert-manager adds a .well-known path to the original Ingress instead:

....
    http:
      paths:
      - backend:
          service:
            name: cm-acme-http-solver-69fj5
            port:
              number: 8089
        path: /.well-known/acme-challenge/j_fnZrscaPRqlNTcLh3KmaaE1JPK_tekuM1ZtrLgav0
        pathType: ImplementationSpecific
.....

There’s a small change in the .well-known URL, which goes from HTTP to HTTPS. This is not a problem, as cert-manager anticipates this by generating a temporary certificate (https://cert-manager.io/docs/usage/certificate/#temporary-certificates-whilst-issuing)

But we have discovered another problem: Cilium does not correctly handle two pathTypes, pathType: ImplementationSpecific and pathType: Prefix, in the same Ingress.

We ran a curl test against the .well-known URL and noticed a redirect. This redirect comes from the application’s pathType: Prefix rule. We found that pathType: Prefix took precedence over pathType: ImplementationSpecific, which is a problem.
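A curl check of this kind might look like the following sketch. The helper name check_challenge_path is hypothetical; it takes the host and the challenge token as arguments and prints the HTTP status code followed by the redirect target, if any.

```shell
#!/usr/bin/env bash
# Hypothetical helper: request an ACME challenge URL over plain HTTP and
# print "<status code> <redirect URL>" without following the redirect.
check_challenge_path() {
  local host="$1" token="$2"
  curl --silent --show-error --output /dev/null \
    --write-out '%{http_code} %{redirect_url}\n' \
    "http://${host}/.well-known/acme-challenge/${token}"
}

# Usage (token copied from the solver path in the Ingress):
#   check_challenge_path "${SERVICE_HOST}" "<token>"
```

A redirect that matches the application's own behaviour would suggest the request reached the Prefix backend rather than the solver.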