grpc: After Let's Encrypt root expiry, SSL validation fails, even though the certificate is fully valid and validates everywhere else.

What version of gRPC and what language are you using?

2.41.0 C#

What operating system (Linux, Windows,…) and version?

Windows 10

What runtime / compiler are you using (e.g. python version or version of gcc)

.NET Core 3.1 LTS

What did you do?

Any client request to a server secured with a Let's Encrypt certificate fails.

What did you see instead?

I0930 18:16:43.268989 0 ..\..\..\src\core\tsi\ssl_transport_security.cc:223:                 LOOP - TLS 1.3 client read_server_cer  - !!!!!!
I0930 18:16:43.269429 0 ..\..\..\src\core\tsi\ssl_transport_security.cc:223:                 LOOP - TLS 1.3 client read_server_cer  - !!!!!!
E0930 18:16:43.270021 0 ..\..\..\src\core\tsi\ssl_transport_security.cc:1469: Handshake failed with fatal error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED.
D0930 18:16:43.270113 0 ..\..\..\src\core\lib\security\transport\security_handshaker.cc:184: Security handshake failed: {"created":"@1633015003.270000000","description":"Handshake failed","file":"..\..\..\src\core\lib\security\transport\security_handshaker.cc","file_line":336,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"}
I0930 18:16:43.273169 0 ..\..\..\src\core\lib\channel\handshaker.cc:89: handshake_manager 0755AFA8: error={"created":"@1633015003.270000000","description":"Handshake failed","file":"..\..\..\src\core\lib\security\transport\security_handshaker.cc","file_line":336,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"} shutdown=0 index=2, args={endpoint=(nil), args=(nil) {size=0: }, read_buffer=(nil) (length=0), exit_early=0}
I0930 18:16:43.273723 0 ..\..\..\src\core\lib\channel\handshaker.cc:122: handshake_manager 0755AFA8: handshaking complete -- scheduling on_handshake_done with error={"created":"@1633015003.270000000","description":"Handshake failed","file":"..\..\..\src\core\lib\security\transport\security_handshaker.cc","file_line":336,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"}
I0930 18:16:43.274327 0 ..\..\..\src\core\lib\iomgr\timer_generic.cc:450: TIMER 0755AFE0: CANCEL pending=true
I0930 18:16:43.274816 0 ..\..\..\src\core\lib\iomgr\resource_quota.cc:840: RU '89.163.144.187:443' (07519D98) unreffing: 1 -> 0
I0930 18:16:43.274999 0 ..\..\..\src\core\ext\filters\client_channel\subchannel.cc:1012: Connect failed: {"created":"@1633015003.270000000","description":"Handshake failed","file":"..\..\..\src\core\lib\security\transport\security_handshaker.cc","file_line":336,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"}
I0930 18:16:43.275627 0 ..\..\..\src\core\ext\filters\client_channel\client_channel.cc:626: chand=06C2A85C: connectivity change for subchannel wrapper 075BC8D8 subchannel 005BDFD0; hopping into work_serializer
I0930 18:16:43.276086 0 ..\..\..\src\core\ext\filters\client_channel\client_channel.cc:661: chand=06C2A85C: processing connectivity change in work serializer for subchannel wrapper 075BC8D8 subchannel 005BDFD0 watcher=075D4C30

Anything else we should know about your project / environment?

nginx reverse proxy with the gRPC module in front of an internal backend.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 13
  • Comments: 55 (7 by maintainers)

Most upvoted comments

Per the https://github.com/grpc/grpc/releases schedule, the next release (pre-release) will happen in a week or so. I will also try cherry-picking #27539 into v1.41.0 later this week.

FWIW: I was able to bring our systems back to life by re-generating the certs using the following command, then restarting the hosts:

certbot certonly --dns-route53 -d "$DOMAIN_NAME" -m "$EMAIL" --preferred-chain "ISRG Root X1" --agree-tos --no-eff-email

The --preferred-chain "ISRG Root X1" piece is the critical bit here, should anyone need to adjust this.

I think there are the following ways to resolve the issue:

  1. Filter out DST Root CA X3 from the trust store (and make sure ISRG Root X1 exists in the trust store), and set the env var GRPC_DEFAULT_SSL_ROOTS_FILE_PATH to its file path; see the sketch after this list. The gRPC team also updated its own root store to remove the DST Root CA X3 cert (https://github.com/grpc/grpc/pull/27539). This approach is suggested by OpenSSL (https://www.openssl.org/blog/blog/2021/09/13/LetsEncryptRootCertExpire/), and should work regardless of which crypto library (OpenSSL/BoringSSL) you use with gRPC.
  2. You can rely on OpenSSL or BoringSSL to ignore a cert path (chain) rooted at an expired CA cert. The feature is only available in OpenSSL 1.1.0 or later, and BoringSSL recently added support as well (https://bugs.chromium.org/p/boringssl/issues/detail?id=439&sort=-modified). gRPC interacts with those crypto libraries as follows: 1) if you build from source, gRPC will first check whether the installed OpenSSL version is >= 1.0.2, and if it is not, it will use the BoringSSL shipped with the package; this applies to all wrapped languages; 2) if you use pre-built gRPC binaries, they will always use the shipped BoringSSL, except for gRPC Node and iOS.
  3. Get (and use) a new Let's Encrypt cert chained from ISRG Root X1.
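For option 1, a minimal C# sketch of wiring up the env var, assuming a filtered bundle already exists (the file path, host, and port below are placeholders, not anything gRPC prescribes):

    using System;
    using Grpc.Core;

    class Program
    {
        static void Main()
        {
            // Point gRPC core at a root bundle with the expired DST Root CA X3
            // removed. The path is a placeholder for wherever your filtered
            // roots.pem actually lives.
            Environment.SetEnvironmentVariable(
                "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH",
                @"C:\certs\roots-filtered.pem");

            // Set the variable before the first channel is created: gRPC core
            // reads it when it loads its default SSL roots. Setting it in the
            // process environment (launch script, systemd unit) works as well.
            var channel = new Channel("example.com", 443, new SslCredentials());
        }
    }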

On the gRPC side, we will update the version of BoringSSL that gRPC depends on and make sure the next release includes both the update and https://github.com/grpc/grpc/pull/27539. Besides that, I do not think any other change is needed on the gRPC side.

I have found a temporary fix for this, as follows:

  • download the roots.pem file from https://github.com/grpc/grpc/blob/master/etc/roots.pem
  • add the downloaded roots.pem to your project
  • set the environment variable GRPC_DEFAULT_SSL_ROOTS_FILE_PATH to point to the downloaded roots.pem
  • we are running in AWS Lambda (Python), so we are using the following: GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/var/task/roots.pem

Another workaround: if you need to use the Grpc and/or Grpc.Core .NET libraries, you can build the certificate bundle from the system CAs. It fixed it for us. Have fun deploying 😕

public static string GetRootCertificates()

Confirmed that this fixes the Windows C# client for me.

Are there any other workarounds, e.g. ENV variables that could be set on a machine (might be easier than updating the app binary)?

Also, isn't this the proper approach for getting the certs that the library itself should use? I mean, for the future: is it by design that the client library hardcodes the set of root certs as a static file, so that any expirations or changes in those require updating the client to a new binary?

The problem is that gRPC core is C-based and does not have access to OS-level certificate stores. This functionality here is provided by .NET.

They did everything correctly by either bundling the roots OR giving the user an option to specify their own roots. However, there is an SSL component that is broken somewhere in the library, which causes the certificate validation to fail unless the X3 cert is manually removed from the roots.

So library-wise this is the only thing they can do, no problem there; however, the SSL component needs looking at.

Also, I think this issue should have maximum priority. It actively brings/brought down production systems, and because the bug is somewhere in the SSL chain, it could easily happen with some other cert provider as well.

Another workaround: if you need to use the Grpc and/or Grpc.Core .NET libraries, you can build the certificate bundle from the system CAs. It fixed it for us. Have fun deploying 😕

using System;
using System.Collections.Generic;
using System.Security.Cryptography.X509Certificates;
using System.Text;
using Grpc.Core;

public static string GetRootCertificates()
{
    var builder = new StringBuilder();
    var store = new X509Store(StoreName.Root, StoreLocation.LocalMachine);
    store.Open(OpenFlags.ReadOnly);

    // Skip duplicates: the same root can appear more than once in the store.
    var seen = new HashSet<string>();
    foreach (X509Certificate2 mCert in store.Certificates)
    {
        if (!seen.Add(mCert.Thumbprint))
        {
            continue;
        }
        builder.AppendLine(
            "# Issuer: " + mCert.Issuer + "\n" +
            "# Subject: " + mCert.Subject + "\n" +
            "# Label: " + mCert.FriendlyName + "\n" +
            "# Serial: " + mCert.SerialNumber + "\n" +
            "# SHA1 Fingerprint: " + mCert.GetCertHashString() + "\n" +
            ExportToPEM(mCert) + "\n");
    }
    store.Close();
    return builder.ToString();
}

public static string ExportToPEM(X509Certificate cert)
{
    var builder = new StringBuilder();

    builder.AppendLine("-----BEGIN CERTIFICATE-----");
    builder.AppendLine(Convert.ToBase64String(
        cert.Export(X509ContentType.Cert),
        Base64FormattingOptions.InsertLineBreaks));
    builder.AppendLine("-----END CERTIFICATE-----");

    return builder.ToString();
}

// ...
var channel = new Channel(uri.Host, uri.Port, new SslCredentials(GetRootCertificates()));

Whilst that fixes the immediate issue, does this highlight a possible issue with the way the TLS client performs chain building?

My understanding of the LE certificate arrangement was that there should still be a trusted chain it can find back to ISRG Root X1.

FWIW: I was able to bring our systems back to life by re-generating the certs using the following command, then restarting the hosts:

certbot certonly --dns-route53 -d "$DOMAIN_NAME" -m "$EMAIL" --preferred-chain "ISRG Root X1" --agree-tos --no-eff-email

Edit: as mentioned by @DeanBrunt below, the critical part was adding the --preferred-chain "ISRG Root X1" parameter.

v1.41.1 was released with the fix.

Hi @YifeiZhuang. Can you comment on what needs to happen now for PyPI to get updated? Do you know if it will update automatically at some point? As of now, I still see the expired root certs in grpcio, and the date on the package hosted on PyPI indicates that it was built prior to this cherry-pick.

Thank you!

Update: sorry, I hadn't seen that the workaround was already mentioned before; so many comments 😅 But for us it works as well to set the env var.

----------

We had the same issue yesterday.

It seems that the gRPC .NET lib loads different certs depending on the OS, and it seems to load an invalid root cert.

We found the following workaround:

Set the environment variable GRPC_DEFAULT_SSL_ROOTS_FILE_PATH. The lib needs to know where the root certificate is located; for example, on Arch Linux that is /etc/ssl/certs/. When I did it locally, the name of the certificate was ISRG_Root_X1.pem, which is the new Let's Encrypt root certificate.

So try export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/pathtocerts/rootcert.pem and then run the application.

  1. The gRPC .NET lib is not loading different certs; it's loading the certs that are bundled inside the library unless you specify otherwise.
  2. The problem is not with any specific certificate or anything being invalid. The problem is that somewhere in the SSL chain of the gRPC library there is a bug that does not handle cert expiry and chain switching properly.

Literally nothing you said in the comment is correct. Please read the issue before making absolute claims.

I just want to call out that the --preferred-chain "ISRG Root X1" fix worked for us and is supported in cert-manager as part of the Issuer and ClusterIssuer resources; see the release notes. Also, one can install the kubectl cert-manager plugin in order to trigger a renewal.

Unfortunately it works only with "certonly", so it means updating all the scripts if they use "renew".

Found a workaround: embed the .pem file in your app (as a string), apply the fix by @nickbabkin, and then create the SslCredentials using that file. That way the current library version can still be used, and you don't need to set a system-specific path to the root certs…
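A minimal sketch of that approach, assuming a filtered roots.pem has been compiled into the assembly as an embedded resource; the resource name "MyApp.roots.pem" and the endpoint are illustrative, not anything the library prescribes:

    using System.IO;
    using System.Reflection;
    using Grpc.Core;

    static SslCredentials LoadEmbeddedRoots()
    {
        // The PEM bundle ships inside the assembly, so no system-specific
        // file path is needed at runtime. "MyApp.roots.pem" is a placeholder.
        var assembly = Assembly.GetExecutingAssembly();
        using var stream = assembly.GetManifestResourceStream("MyApp.roots.pem");
        using var reader = new StreamReader(stream);
        return new SslCredentials(reader.ReadToEnd());
    }

    // Usage: the channel now validates against the embedded roots only.
    var channel = new Channel("example.com", 443, LoadEmbeddedRoots());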

Seeing this on the Python and C++ gRPC clients.

Our testing shows that the latest version of grpc/grpc's etc/roots.pem as of 253d707 does not work. When we use the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH environment variable to replace it with the system certificate bundle, we are able to get our Python gRPC-based clients working.

To replicate the fix on an Ubuntu/Debian system:

export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/etc/ssl/certs/ca-certificates.crt
<your command here>

That does not work if you do it on the server. The client is still screwed.

Yes, I think the grpc-dotnet client is purely .NET HttpClient-based, so it is indeed using the system certs… I did re-request the certs with certbot, but I didn't try to ask for ISRG Root X1 as the preferred cert.

This needs a fix from the gRPC side; this is totally not cool though.

The fix is to remove the following certificate:

SHA1 Fingerprint: da:c9:02:4f:54:d8:f6:df:94:93:5f:b1:73:26:38:ca:6a:d7:7c:13

from the roots.pem.
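A sketch of doing that removal programmatically in .NET, keyed on the fingerprint above; the input and output file names are illustrative, and the "#" comment headers in the bundle are dropped (gRPC only needs the PEM blocks):

    using System;
    using System.IO;
    using System.Security.Cryptography.X509Certificates;
    using System.Text;

    static class RootsFilter
    {
        // SHA1 thumbprint of the expired DST Root CA X3, hex without colons.
        const string ExpiredThumbprint = "DAC9024F54D8F6DF94935FB1732638CA6AD77C13";

        static void Main()
        {
            var output = new StringBuilder();
            string bundle = File.ReadAllText("roots.pem"); // illustrative path

            // roots.pem is a concatenation of PEM blocks with "#" comment
            // headers; split on the END marker and re-parse each block.
            foreach (var block in bundle.Split("-----END CERTIFICATE-----",
                         StringSplitOptions.RemoveEmptyEntries))
            {
                int begin = block.IndexOf("-----BEGIN CERTIFICATE-----",
                                          StringComparison.Ordinal);
                if (begin < 0) continue; // trailing whitespace/comments

                string pem = block.Substring(begin) + "-----END CERTIFICATE-----\n";
                using var cert = new X509Certificate2(Encoding.ASCII.GetBytes(pem));
                if (!string.Equals(cert.Thumbprint, ExpiredThumbprint,
                                   StringComparison.OrdinalIgnoreCase))
                {
                    output.Append(pem);
                }
            }
            File.WriteAllText("roots-filtered.pem", output.ToString());
        }
    }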

If you're in control of the servers here, requesting a Let's Encrypt cert with ISRG Root X1 as the preferred chain may fix this issue.

Seeing this on the Python and C++ gRPC clients.

Our testing shows that the latest version of grpc/grpc's etc/roots.pem as of 253d7076fc19c7380b3f58b598eaca1b076bec74 does not work. When we use the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH environment variable to replace it with the system certificate bundle, we are able to get our Python gRPC-based clients working.

To replicate the fix on an Ubuntu/Debian system:

export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/etc/ssl/certs/ca-certificates.crt
<your command here>

EDIT - Our C++ clients ended up using the system certificates, which did not have the problem the built-in certs had.

For Python at least, it looks like BoringSSL is embedded as the SSL lib, and it has this bug that was fairly well documented for a bunch of other SSL libs (e.g. OpenSSL) but not for BoringSSL.

https://github.com/grpc/grpc/blob/9978223a26dbb0cfa51881090929ff88ff430351/setup.py#L70

https://bugs.chromium.org/p/boringssl/issues/detail?id=439&sort=-modified

We fixed this in our client by just hardcoding the ISRG roots in our package: https://github.com/Couchers-org/couchers/pull/2034/files (in case someone is looking for a quick fix for their Python code).

Doing this for testing:

using System;
using System.Security.Cryptography.X509Certificates;
using System.Text;
using Serilog; // assumption: Log below is Serilog's static logger

private static string GetRootsFromWindowsStore()
{
    var sb    = new StringBuilder();
    var roots = new X509Store(StoreName.Root, StoreLocation.CurrentUser);
    var cas   = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser);

    try
    {
        roots.Open(OpenFlags.ReadOnly);
        cas.Open(OpenFlags.ReadOnly);

        // Place all certificates in one X509Certificate2Collection object
        var certs = roots.Certificates;
        certs.AddRange(cas.Certificates);

        foreach (var c in certs)
        {
            // We don't want the expired X3 cert. GenerateBase64Cert (not shown
            // here) PEM-encodes the cert, like ExportToPEM earlier in the thread.
            if (!c.Issuer.Contains("X3")) sb.AppendLine(GenerateBase64Cert(c));
        }
    }
    catch (Exception e)
    {
        Log.Warning(e, "Failed to extract certs from Windows certificate store");
    }
    finally
    {
        roots.Close();
        cas.Close();
    }

    return sb.ToString();
}
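A usage sketch under the same assumptions (host and port are placeholders):

    // Hypothetical endpoint; the harvested roots replace the bundled roots.pem.
    var channel = new Channel("example.com", 443,
        new SslCredentials(GetRootsFromWindowsStore()));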

Yep, that's the cert that has now expired and is still in the chain for LE certs due to a cross-signing agreement.

Details here: https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/