grpc: After the Let's Encrypt root certificate expiry, SSL validation fails, even though the certificate is fully valid and validates everywhere else.
What version of gRPC and what language are you using?
2.41.0 C#
What operating system (Linux, Windows,…) and version?
Windows 10
What runtime / compiler are you using (e.g. python version or version of gcc)
.NET Core 3.1 LTS
What did you do?
Any client request to a server secured with a Let's Encrypt certificate fails.
What did you see instead?
I0930 18:16:43.268989 0 ..\..\..\src\core\tsi\ssl_transport_security.cc:223: LOOP - TLS 1.3 client read_server_cer - !!!!!!
I0930 18:16:43.269429 0 ..\..\..\src\core\tsi\ssl_transport_security.cc:223: LOOP - TLS 1.3 client read_server_cer - !!!!!!
E0930 18:16:43.270021 0 ..\..\..\src\core\tsi\ssl_transport_security.cc:1469: Handshake failed with fatal error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED.
D0930 18:16:43.270113 0 ..\..\..\src\core\lib\security\transport\security_handshaker.cc:184: Security handshake failed: {"created":"@1633015003.270000000","description":"Handshake failed","file":"..\..\..\src\core\lib\security\transport\security_handshaker.cc","file_line":336,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"}
I0930 18:16:43.273169 0 ..\..\..\src\core\lib\channel\handshaker.cc:89: handshake_manager 0755AFA8: error={"created":"@1633015003.270000000","description":"Handshake failed","file":"..\..\..\src\core\lib\security\transport\security_handshaker.cc","file_line":336,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"} shutdown=0 index=2, args={endpoint=(nil), args=(nil) {size=0: }, read_buffer=(nil) (length=0), exit_early=0}
I0930 18:16:43.273723 0 ..\..\..\src\core\lib\channel\handshaker.cc:122: handshake_manager 0755AFA8: handshaking complete -- scheduling on_handshake_done with error={"created":"@1633015003.270000000","description":"Handshake failed","file":"..\..\..\src\core\lib\security\transport\security_handshaker.cc","file_line":336,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"}
I0930 18:16:43.274327 0 ..\..\..\src\core\lib\iomgr\timer_generic.cc:450: TIMER 0755AFE0: CANCEL pending=true
I0930 18:16:43.274816 0 ..\..\..\src\core\lib\iomgr\resource_quota.cc:840: RU '89.163.144.187:443' (07519D98) unreffing: 1 -> 0
I0930 18:16:43.274999 0 ..\..\..\src\core\ext\filters\client_channel\subchannel.cc:1012: Connect failed: {"created":"@1633015003.270000000","description":"Handshake failed","file":"..\..\..\src\core\lib\security\transport\security_handshaker.cc","file_line":336,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"}
I0930 18:16:43.275627 0 ..\..\..\src\core\ext\filters\client_channel\client_channel.cc:626: chand=06C2A85C: connectivity change for subchannel wrapper 075BC8D8 subchannel 005BDFD0; hopping into work_serializer
I0930 18:16:43.276086 0 ..\..\..\src\core\ext\filters\client_channel\client_channel.cc:661: chand=06C2A85C: processing connectivity change in work serializer for subchannel wrapper 075BC8D8 subchannel 005BDFD0 watcher=075D4C30
Anything else we should know about your project / environment?
nginx reverse proxy with grpc module against an internal backend.
About this issue
- State: closed
- Created 3 years ago
- Reactions: 13
- Comments: 55 (7 by maintainers)
Commits related to this issue
- Reconfigure cert-manager to solve https://github.com/grpc/grpc/issues/27532 — committed to beneath-hq/beneath by begelundmuller 3 years ago
- fix(email-tagger-kre-bridge): use custom root certificates when creating secure gRPC channel to avoid issues with Let's Encrypt certificates (see https://github.com/grpc/grpc/issues/27532) — committed to Bruin-Dev/Intelygenz by danielfernandez-igz 3 years ago
- fix(t7-bridge): use custom root certificates when creating secure gRPC channel to avoid issues with Let's Encrypt certificates (see https://github.com/grpc/grpc/issues/27532) — committed to Bruin-Dev/Intelygenz by danielfernandez-igz 3 years ago
Per https://github.com/grpc/grpc/releases schedule, the next release (pre-release) will happen in a week or so. I will also try cherry-picking #27539 into v.1.41.0 later this week.
The --preferred-chain "ISRG Root X1" piece is the critical bit here should anyone need to adjust this.
I think there are the following ways to resolve the issue:
GRPC_DEFAULT_SSL_ROOTS_FILE_PATH with its file path. The gRPC team also updated its own root store to remove the DST Root CA X3 cert (https://github.com/grpc/grpc/pull/27539). This approach is suggested by OpenSSL (https://www.openssl.org/blog/blog/2021/09/13/LetsEncryptRootCertExpire/), and should work regardless of which crypto library (OpenSSL/BoringSSL) you use with gRPC.
From the gRPC side, we will update the version of BoringSSL that gRPC depends on, and make sure the next release includes both the update and https://github.com/grpc/grpc/pull/27539. Besides that, I do not think any other change is needed on the gRPC side.
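For Python clients, the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH approach can be sketched roughly as below. This is only a sketch: the helper names are made up for illustration, and it assumes the variable is set before the process creates its first secure channel.

```python
import os
import ssl


def find_system_ca_bundle(candidates=()):
    """Return the first CA bundle file that exists, or None.

    `candidates` are extra paths to try first (hypothetical, caller supplied);
    after that we fall back to what Python's ssl module reports for the system.
    """
    paths = ssl.get_default_verify_paths()
    for path in (*candidates, paths.cafile, paths.openssl_cafile):
        if path and os.path.isfile(path):
            return path
    return None


def use_system_roots_for_grpc(candidates=()):
    """Point gRPC at a system CA bundle instead of its bundled roots.pem."""
    bundle = find_system_ca_bundle(candidates)
    if bundle is not None:
        # Assumption: this runs before any secure channel is created.
        os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = bundle
    return bundle
```

Call use_system_roots_for_grpc() at the top of the program's entry point. If it returns None, no bundle was found and gRPC keeps using its built-in roots.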
I have found a temporary fix for this, as follows:
The problem is that gRPC core is C-based and does not have access to OS-level certificate stores. That functionality is provided by .NET.
They did everything correctly by either bundling the roots or giving the user an option to specify their own roots. However, there is an SSL component broken somewhere in the library, which causes certificate validation to fail unless the X3 cert is manually removed from the roots.
So library-wise this is the only thing they can do, no problem there; however, the SSL component needs looking at.
Also, I think this issue should have maximum priority. It actively brings (and brought) down production systems, and because the bug is somewhere in the SSL chain, it could easily happen with some other cert provider as well.
Another workaround: if you need to use the Grpc and/or Grpc.Core .NET libraries, you can build the certificate from system CAs. It fixed it for us. Have fun deploying 😕
Whilst that fixes the immediate issue, does this highlight a possible issue with the way the TLS client is performing chain building?
My understanding of the LE certificate arrangement was that there should still be a trusted chain it can find back to ISRG X1
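To make the chain-building question concrete, here is a toy model in Python. It is not BoringSSL's or OpenSSL's actual algorithm. It only contrasts a verifier that commits to the longest served path with one that stops at the first unexpired trusted issuer. The function names, the (subject, issuer) representation and the expiry dates are all illustrative assumptions.

```python
from datetime import date

DST_X3 = "DST Root CA X3"
ISRG_X1 = "ISRG Root X1"

# (subject, issuer) pairs as served, leaf first. The last entry stands for
# the cross-signed ISRG Root X1 certificate issued by DST Root CA X3.
SERVED_CHAIN = [("leaf", "R3"), ("R3", ISRG_X1), (ISRG_X1, DST_X3)]


def legacy_verify(chain, trust_store, today):
    """Try the longest served path first and commit to the first root found."""
    for depth in range(len(chain), 0, -1):
        issuer = chain[depth - 1][1]
        if issuer in trust_store:
            # No retry with a shorter path if this root turns out expired.
            return trust_store[issuer] >= today
    return False


def trusted_first_verify(chain, trust_store, today):
    """Walk from the leaf and accept the first unexpired trusted issuer."""
    for _subject, issuer in chain:
        expiry = trust_store.get(issuer)
        if expiry is not None and expiry >= today:
            return True
    return False


# Illustrative expiry dates only.
roots = {DST_X3: date(2021, 9, 30), ISRG_X1: date(2035, 6, 4)}
today = date(2021, 10, 1)
without_x3 = {ISRG_X1: roots[ISRG_X1]}

print(legacy_verify(SERVED_CHAIN, roots, today))         # expired root wins
print(legacy_verify(SERVED_CHAIN, without_x3, today))    # falls back to X1
print(trusted_first_verify(SERVED_CHAIN, roots, today))  # stops at X1
```

In this model, removing the expired root from the trust store makes the first verifier fall back to the shorter path. That matches the workaround reported in this thread, but the real cause in the TLS library may differ in detail.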
FWIW: I was able to bring our systems back to life by re-generating the certs using the following command, then restarting the hosts:
Edit: as mentioned by @DeanBrunt below, the critical part was adding the --preferred-chain "ISRG Root X1" parameter.
v1.41.1 was released with the fix.
Hi @YifeiZhuang. Can you comment on what needs to happen now for PyPI to get updated? Do you know if it will automatically update at some point? As of now, I still see the expired root certs in grpcio, and the date on the package hosted on PyPI indicates that it was built prior to this cherry-pick.
Thank you!
Literally nothing you said in the comment is correct. Please read the issue before making absolute claims.
I just want to call out that the --preferred-chain "ISRG Root X1" fix worked for us and is supported in cert-manager as part of the Issuer and ClusterIssuer resources; see release notes. Also, one can install the kubectl cert-manager plugin in order to trigger a renewal.
Update: sorry, I hadn't seen that the workaround was already mentioned before, so many comments 😅 But for us it works as well to set the env var.
----------
We had the same issue yesterday.
It seems that the gRPC .NET library loads different certs depending on the OS, and it seems to load an invalid root cert.
We found the following workaround:
Set the environment variable GRPC_DEFAULT_SSL_ROOTS_FILE_PATH. The library needs to know where the root certificate is located; on Arch Linux, for example, it is under /etc/ssl/certs/. When I did it locally, the certificate file was named ISRG_Root_X1.pem, which is the new Let's Encrypt root certificate.
So try with
export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/pathtocerts/rootcert.pem
and then run the application.
Unfortunately it works only with “certonly”, so it means updating all the scripts if they use “renew”.
Found a workaround: Embed the .pem file in your app (as a string), apply the fix by @nickbabkin and then create the SSLCredentials using that file. That way the current library version can still be used, and you don’t need to set a system-specific path to the root certs…
Does not work if doing it on server. The client is still screwed.
Yes, I think the grpc-dotnet client is purely .NET HttpClient based, so it is indeed using the system cert… I did re-request the certs with certbot, but I didn’t try to ask for ISRG Root X1 as the preferred chain.
This needs a fix from GRPC side, this is totally not cool though.
The fix is to remove the following certificate:
SHA1 Fingerprint: da:c9:02:4f:54:d8:f6:df:94:93:5f:b1:73:26:38:ca:6a:d7:7c:13
From the roots.pem
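A minimal sketch of that removal in Python, using only the standard library. The function names are made up for illustration, and it assumes roots.pem is an ordinary concatenation of PEM certificate blocks. Any comment lines that precede the removed block in the file are left in place.

```python
import base64
import hashlib
import re

# SHA1 fingerprint quoted above for the certificate to remove.
EXPIRED_SHA1 = "da:c9:02:4f:54:d8:f6:df:94:93:5f:b1:73:26:38:ca:6a:d7:7c:13"

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----\n?",
    re.DOTALL,
)


def sha1_fingerprint(b64_body):
    """Colon-separated SHA1 of the DER bytes inside one PEM block."""
    der = base64.b64decode("".join(b64_body.split()))
    digest = hashlib.sha1(der).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def drop_certificate(pem_text, fingerprint=EXPIRED_SHA1):
    """Return pem_text without the block matching the given fingerprint."""
    wanted = fingerprint.lower()

    def keep_or_drop(match):
        if sha1_fingerprint(match.group(1)) == wanted:
            return ""
        return match.group(0)

    return PEM_BLOCK.sub(keep_or_drop, pem_text)
```

Typical use would be to read roots.pem as text, pass it through drop_certificate(), write the result to a new file, and hand that file to the client as its root bundle.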
If you’re in control of the servers here, requesting Let’s Encrypt cert with ISRG Root X1 as the preferred chain may fix this issue.
Seeing this on the Python and C++ gRPC clients.
Our testing shows that the latest version of grpc/grpc etc/roots.pem as of 253d7076fc19c7380b3f58b598eaca1b076bec74 does not work. When we use the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH environment variable to replace it with the system certificate bundle, we are able to get our Python gRPC-based clients working.
To replicate the fix on an Ubuntu/Debian system:
EDIT - Our C++ clients ended up using the system certificates which did not have the problem the built in certs had.
For Python at least, it looks like BoringSSL is embedded as the SSL lib, and it has this bug, which was fairly well documented for a bunch of other SSL libs (OpenSSL) but not for BoringSSL.
https://github.com/grpc/grpc/blob/9978223a26dbb0cfa51881090929ff88ff430351/setup.py#L70
https://bugs.chromium.org/p/boringssl/issues/detail?id=439&sort=-modified
We fixed this in our client by just hardcoding the ISRG roots in our package: https://github.com/Couchers-org/couchers/pull/2034/files (in case someone is looking for a quick fix on their python code)
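For anyone wanting the general shape of that kind of fix without reading the PR, a rough sketch follows. It is not the code from the linked PR. The helper names and the idea of shipping the PEM as a file next to the package are assumptions; grpc.ssl_channel_credentials and grpc.secure_channel are the standard grpcio calls.

```python
from pathlib import Path


def load_bundled_roots(pem_path):
    """Read a PEM bundle shipped with the application and sanity-check it."""
    data = Path(pem_path).read_bytes()
    if b"-----BEGIN CERTIFICATE-----" not in data:
        raise ValueError(f"{pem_path} does not look like a PEM bundle")
    return data


def open_secure_channel(target, pem_path):
    """Create a channel that trusts only the roots in pem_path."""
    # Imported here so the loader above can be used without grpcio installed.
    import grpc

    credentials = grpc.ssl_channel_credentials(
        root_certificates=load_bundled_roots(pem_path)
    )
    return grpc.secure_channel(target, credentials)
```

Passing root_certificates explicitly means the client no longer depends on the roots.pem bundled with the installed gRPC version, at the cost of having to update the shipped PEM yourself.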
Doing this for testing;
Here’s a PR with fix: https://github.com/grpc/grpc/pull/27533/files
Yep that’s the cert that has now expired and is still in the chain for LE certs due to a cross-signing agreement
Details here: https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/