envoy: Wildcard Let's Encrypt certs do not validate after DST Root CA expiration

Description: After DST Root CA X3, the root that originally cross-signed the Let’s Encrypt chain, expired, wildcard certs issued by LE no longer validate with envoy. I have not been able to reproduce this with non-wildcard certs.

I believe this is due to BoringSSL behavior, though in that case I’d expect all LE certs to have this problem.

Repro steps:

To make a self-contained reproduction, I’ve created an envoy configuration with an http listener (http://localhost:10000) that proxies to a local https listener (https://test-upstream.letsencrypt.localhost.pomerium.io:10443), which in turn proxies the request to httpbin.org. The https listener uses a wildcard Let’s Encrypt cert.

  1. Download gist. The included certificates were generated for this repro and not used anywhere.
  2. Run with envoy -c config.yaml
  3. Verify system curl sees https://test-upstream.letsencrypt.localhost.pomerium.io:10443 as valid:

     curl -I https://test-upstream.letsencrypt.localhost.pomerium.io:10443
     HTTP/1.1 200 OK
     date: Thu, 07 Oct 2021 20:56:19 GMT
     content-type: text/html; charset=utf-8
     content-length: 9593
     server: envoy
     access-control-allow-origin: *
     access-control-allow-credentials: true
     x-envoy-upstream-service-time: 21

  4. Curl through envoy:

     curl localhost:10000
     upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED

You’ll note curl has no issue but envoy itself does.
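When comparing what curl and envoy see, it can help to list the certificates that the https listener serves from fullchain.pem. The following is a minimal sketch, assuming the openssl CLI is installed; print_chain is a hypothetical helper name and is not part of the gist.

```shell
# Print the subject and issuer of every certificate in a PEM bundle,
# in the order they appear in the file.
# Usage: print_chain <bundle.pem>
print_chain() {
  # crl2pkcs7 wraps all certificates from the file into one PKCS#7 structure;
  # pkcs7 -print_certs then prints subject/issuer for each of them.
  openssl crl2pkcs7 -nocrl -certfile "$1" \
    | openssl pkcs7 -print_certs -noout
}
```

Calling print_chain ./fullchain.pem lists the chain from the leaf upward. If the last certificate's issuer is DST Root CA X3, the bundle probably includes the cross-signed certificate that leads to the expired root.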

Admin and Stats Output: n/a

Config:

#
# http://localhost:10000 -> [envoy] -> https://test-upstream.letsencrypt.localhost.pomerium.io:10443 -> [envoy] -> https://httpbin.org
#
# be sure to uncomment your operating system's cert store.

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
    - name: http
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                access_log:
                - name: envoy.access_loggers.stdout
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                route_config:
                  name: test_upstream
                  virtual_hosts:
                    - name: test_upstream
                      domains: ["*"]
                      routes:
                        - match: 
                            prefix: "/"
                          route:
                            cluster: test_upstream
                            auto_host_rewrite: true
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

    - name: https-upstream
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10443
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: upstream_https
                access_log:
                - name: envoy.access_loggers.stdout
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                route_config:
                  name: httpbin.org
                  virtual_hosts:
                    - name: httpbin.org
                      domains: ["*"]
                      routes:
                        - match: 
                            prefix: "/"
                          route:
                            cluster: httpbin.org
                            auto_host_rewrite: true
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              common_tls_context:
                tls_certificates:
                  certificate_chain: 
                    filename: ./fullchain.pem
                  private_key:
                    filename: ./privkey.pem

  clusters:
    - name: test_upstream
      connect_timeout: 1s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            validation_context:
              trusted_ca:
                filename: /etc/ssl/cert.pem # osx
                # filename: /etc/ssl/certs/ca-certificates.crt # linux
      load_assignment:
        cluster_name: test_upstream
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: test-upstream.letsencrypt.localhost.pomerium.io
                      port_value: 10443

    - name: httpbin.org
      connect_timeout: 1s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
      load_assignment:
        cluster_name: httpbin.org
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: httpbin.org
                      port_value: 443


Logs:

[2021-10-07T20:49:18.754Z] "GET / HTTP/1.1" 503 UF 0 195 2 - "-" "curl/7.79.0" "7889b267-29b4-4680-930f-f20807690799" "localhost:10000" "127.0.0.1:10443"
[2021-10-07 16:48:39.667][12679903][debug][pool] [source/common/conn_pool/conn_pool_base.cc:407] [C1] client disconnected, failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2021-10-07 16:48:39.667][12679903][debug][router] [source/common/router/router.cc:1075] [C0][S10352822734306019930] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2021-10-07 16:48:39.668][12679903][debug][http] [source/common/http/filter_manager.cc:935] [C0][S10352822734306019930] Sending local reply with details upstream_reset_before_response_started{connection failure,TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED}

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 19 (8 by maintainers)

Most upvoted comments

yep, understood. I’ve raised it with other maintainers - backporting to old versions is non-trivial - but, as you say, it may well be necessary

@phlax the way the trust chain is built and evaluated is the problem. On the same system, other tools (such as curl) work fine with the expired CA present.

You can see some discussion of the behavior in the BoringSSL thread. I’m not sure whether just pulling in the fix from BoringSSL is sufficient, but I’d expect the verifier to follow the shortest CA chain and validate that. What’s happening now is that the longer of two trust chains is followed and its root is expired, even though a shorter chain ending in a different, still-trusted root is available.
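The longer chain can only end at the expired root if that root is still in the bundle envoy is told to trust. A quick way to check is to search the bundle's subject lines. This is a minimal sketch, assuming the openssl CLI is available; find_in_bundle is a hypothetical helper name.

```shell
# Print the subject lines in a PEM bundle that match a pattern
# (case-insensitive). Exits non-zero when nothing matches.
# Usage: find_in_bundle <bundle.pem> <pattern>
find_in_bundle() {
  openssl crl2pkcs7 -nocrl -certfile "$1" \
    | openssl pkcs7 -print_certs -noout \
    | grep '^subject' \
    | grep -i -- "$2"
}
```

For the config above, that would be find_in_bundle /etc/ssl/cert.pem 'DST Root CA X3' on macOS, or the same call against /etc/ssl/certs/ca-certificates.crt on Linux.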

(you updated as I was responding)

Yes, the expired root cert will likely get removed over time, but a fix within envoy/BoringSSL would be dramatically more timely. Since most software isn’t impacted, I don’t think operating system vendors are under much pressure to remove the old CA. OpenSSL, for instance, has the shorter chain evaluation behavior.

If envoy doesn’t address this, folks will likely need to rebuild their system trust stores when deploying envoy outside of the official container, which doesn’t scale well and will lead to a lot of user frustration.
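For anyone who does need to rebuild a trust store in the meantime, one possible approach is to copy the system bundle and drop every certificate that has already expired. The following is a sketch under that assumption, not a tested fix for this issue; strip_expired and the output path are hypothetical names.

```shell
# Write a copy of a PEM bundle that omits certificates which have expired.
# Usage: strip_expired <input-bundle> <output-bundle>
# The body runs in a subshell so its variables do not leak to the caller.
strip_expired() (
  in="$1"
  out="$2"
  tmp="$(mktemp -d)" || exit 1

  # Split the bundle into one file per certificate; any text outside the
  # BEGIN/END markers (comments, labels) is skipped.
  awk -v dir="$tmp" '
    /-----BEGIN CERTIFICATE-----/ { n++; f = sprintf("%s/cert-%04d.pem", dir, n) }
    f != "" { print > f }
    /-----END CERTIFICATE-----/ { close(f); f = "" }
  ' "$in"

  : > "$out"
  for c in "$tmp"/cert-*.pem; do
    [ -f "$c" ] || continue
    # -checkend 0 exits 0 only if the certificate has not expired yet.
    if openssl x509 -in "$c" -noout -checkend 0 >/dev/null 2>&1; then
      cat "$c" >> "$out"
    fi
  done

  rm -rf "$tmp"
)
```

For example, strip_expired /etc/ssl/certs/ca-certificates.crt ./ca-no-expired.pem, then point trusted_ca in the test_upstream cluster at ./ca-no-expired.pem. The copy would have to be regenerated whenever the system bundle changes, which is the scaling problem described above.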