blackbox_exporter: Wrong probe_ssl_earliest_cert_expiry value for certificates with multiple trust chains
We are trying to monitor the certificate expiration date for tvonline.swb-gruppe.de. With openssl and with the Google Chrome browser we get the expected value (Oct 26 23:59:59 2019).
echo | openssl s_client -servername tvonline.swb-gruppe.de -connect tvonline.swb-gruppe.de:443 2>/dev/null | openssl x509 -noout -dates
notBefore=Oct 26 00:00:00 2017 GMT
notAfter=Oct 26 23:59:59 2019 GMT
But with blackbox_exporter we get:
probe_ssl_earliest_cert_expiry 1.534824e+09
which is GMT: Tuesday, 21 August 2018 04:00:00.
It seems a similar problem is described here: https://security.stackexchange.com/questions/66487/what-happens-when-certificates-further-up-the-chain-expires-before-mine-equifa
The recommendation there is to update openssl to v1.0.2, which has a fix for this issue, but I guess golang/blackbox_exporter uses some other mechanism to work with SSL. Is there any workaround or fix for this issue?
About this issue
- State: closed
- Created 6 years ago
- Reactions: 20
- Comments: 50 (18 by maintainers)
Commits related to this issue
- Merge pull request #1 from greg-solutions/feature/extend_cert_expire_data Extend information about last cert of chain — committed to greg-solutions/blackbox_exporter by vadimDidenko 4 years ago
- Add new probe_ssl_earliest_verified_chain_expiry metric Resolves https://github.com/prometheus/blackbox_exporter/issues/340 Based on disscution in the issue above, this metric will help determine wh... — committed to itkq/blackbox_exporter by itkq 4 years ago
- Add new probe_ssl_earliest_verified_chain_expiry metric Resolves https://github.com/prometheus/blackbox_exporter/issues/340 Based on discussion in the issue above, this metric will help determine wh... — committed to itkq/blackbox_exporter by itkq 4 years ago
- Add new probe_ssl_earliest_verified_chain_expiry metric Resolves https://github.com/prometheus/blackbox_exporter/issues/340 Based on the discussion in the issue above, this metric will help determin... — committed to itkq/blackbox_exporter by itkq 4 years ago
- Add new probe_ssl_latest_verified_chain_expiry metric Resolves https://github.com/prometheus/blackbox_exporter/issues/340 Based on the discussion in the issue above, this metric will help determine ... — committed to itkq/blackbox_exporter by itkq 4 years ago
- Add new probe_ssl_latest_verified_chain_expiry metric Resolves https://github.com/prometheus/blackbox_exporter/issues/340 Based on the discussion in the issue above, this metric will help determine ... — committed to itkq/blackbox_exporter by itkq 4 years ago
- Add new probe_ssl_last_chain_expiry_timestamp_seconds metric (#636) * Add new probe_ssl_latest_verified_chain_expiry metric Resolves https://github.com/prometheus/blackbox_exporter/issues/340 B... — committed to prometheus/blackbox_exporter by itkq 4 years ago
It isn’t just going to be a month, though, and it still means that other certs in the chain could be missed. I just don’t understand your resistance to just adding some functionality to only report on the end of the chain, which a lot of people have asked for, provided patches for, etc. If this many people are asking, it’s clearly sought after and would be helpful. As a sysadmin, I can tell you, this is a HUGE limiting factor in needing to run blackbox vs. having to export our own SSL metrics. It’s not possible to always control certs in the way you describe. They are issued to us, often on a per-cert basis, and we aren’t looking to renew 200 certs just because blackbox is going to yell at us for another year about them. That isn’t a good use of money.
https://thesslonline.com/blog/sectigo-addtrust-external-ca-root-expiring-may-30-2020
A lot of certificates use Sectigo AddTrust External CA in their chain and it’s going to expire next month on May 30th 2020. Is there a way to disable the checks on these kinds of intermediate certs?
From the blackbox exporter standpoint that is a true positive, and you should look at either updating that cert or removing it if it’s no longer relevant.
That isn’t always an option, though. It even says in the documentation:
That doesn’t mean we should have to upgrade our (still valid) certs. This is a shortcoming, and frankly an issue, with blackbox at this point.
Right, and in 90% of cases, that’s what we want to monitor as sysadmins - the last cert in the file or at the end of the trust chain. I don’t care if another cert in the chain is going to expire, so I would disable alerting for that entirely, because they’re outside of my purview or control. Certs expire in certain parts of chains, especially when bridging is occurring, just like in this scenario, all the time when talking about certs that come from huge places like Sectigo/Comodo. Some people would still find it useful, so I don’t propose getting rid of the functionality at all. Manual cert manipulation isn’t worth it to save the kilobytes of bandwidth, load time, etc. It’s more risky that you will break something or return something a client will find invalid. We get certs delivered to us as a bundle that we stick into place and call it good until it’s time to renew. That’s the industry behavior.
Edit: it just seems unreasonable to not ADD functionality to an already great tool that would make it better for a lot of users and completely not impact people who don’t want to use it vs. asking people to change their entire config management and cert management paradigms. Truly, not trying to be argumentative. I want to find a solution to this, but I don’t understand your opposition, and I would like to.
For those of you interested in only alerting on specific certificates in the chain, that can be achieved with: https://github.com/ribbybibby/ssl_exporter.
I’m planning on getting #635 in and then releasing, so hopefully in the next week or so.
I was affected by this issue. What I did as a workaround was to edit the PEM file to remove the intermediate certificates expiring on 2020-05-30.
(Using the BEGIN CERTIFICATE / END CERTIFICATE separators and using “openssl x509 -in foo.pem -text -noout” to know the expiry date of each).
(Just in case somebody finds this useful after a google search).
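For anyone who would rather script the inspection step of that workaround, here is a small Go sketch that lists the expiry of every certificate in a PEM bundle, assuming Go 1.16 or later for os.ReadFile. The helper name parseBundle and the bundle path argument are illustrative. It only reports dates and leaves the decision about which block to remove to you.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
	"time"
)

// parseBundle returns every certificate found in a PEM bundle, in file
// order. Non-certificate blocks (keys, etc.) are skipped.
func parseBundle(pemData []byte) ([]*x509.Certificate, error) {
	var certs []*x509.Certificate
	rest := pemData
	for {
		var block *pem.Block
		block, rest = pem.Decode(rest)
		if block == nil {
			break // no more PEM blocks
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, err
		}
		certs = append(certs, cert)
	}
	return certs, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: certdates bundle.pem")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	certs, err := parseBundle(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for i, c := range certs {
		fmt.Printf("%d: subject=%q notAfter=%s\n",
			i, c.Subject.CommonName, c.NotAfter.UTC().Format(time.RFC3339))
	}
}
```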
I’m not willing to accept a feature that enables users to purposefully ignore a definitive upcoming breakage of their application.
To be clear, I have rejected PRs that only look at the last cert in the list. What no one has sent me is a PR that provides the time when the last chain will expire.
If you’re sure it’s not a problem then I suggest silencing the alert for a month, and then removing it at that point.
I think an important topic is missing from this discussion: availability of trust anchors. The kind of certificate chain we’re looking at here is:
A -> B -> C -> AddTrust CA
The thing is that the AddTrust CA certificate will be expiring by the end of May 2020 and blackbox, correctly, reports this. However, on most systems, this CA is not a trust anchor on the system, while C is (because the C certificate is in the OS store or available as a builtin object in the browser CA store).
With C as a valid trust anchor, an SSL client that has to investigate the validity of the certificate chain will look at A, then B and then C, stopping there because C is a trust anchor. There is no need for the client to investigate AddTrust CA, since C has explicitly been trusted on the system.
The fact that AddTrust CA will expire by the end of May is not an issue for clients that have an updated trust anchor for C, and this is exactly what is the case for most clients. Things will work as intended (not by accident) with this certificate chain. There is no need to replace the certificate with a new one that does not have the AddTrust CA dangling beneath the certificate tree.
So what would be correct behavior?
What could be done is to only check the certificate tree up to and including the level of a trust root that is available on the system. In the example case from above: only check A, B and C. This might arguably be the wrong thing to do, since the perspective that blackbox has is not guaranteed to be the same perspective as the regular client that visits the TLS service. E.g. a web browser visiting an https service might work when it has a builtin object C, while blackbox might report an issue when it doesn’t have C but only the expiring/expired AddTrust CA.
Because it might be hard to make sure that blackbox has the same perspective on valid trust anchors as the visiting clients have, a better option might be some exclude list to flag those certificates that should not be considered in the check. Blackbox could even provide this list for well-known cases like the mentioned AddTrust CA.
My $0.02, I hope they can help this discussion forward.
Yes.
In the case of the presented issue this morning, it will NOT break almost anyone using the certs.
Are you saying if I submit a PR that actually looks at the time when the last chain will expire you will consider it? I am more than willing to put in the time. It is worth noting, though, that a majority of server certs are the last in the list because most software requires that to be the order. Root -> intermediates -> server cert.
So turns out this is definitely still an issue - has there been any discussion in terms of how to move forward? The behavior as-is is basically useless.
Edit: I understand the use case for maybe a private CA, but for people trying to monitor externally issued certs, this probably isn’t helpful. I’m very interested in a solution that just adds a new value for the end of the chain, and would be willing to work on a patch if one isn’t started and that seems a reasonable path forward that would be accepted.
Moved to SSL exporter to validate SSL connection. Thank you @ribbybibby for the link.
I think what needs to happen is that we report on expiration of certificate paths, rather than the certificate itself, and have an option whether the expiration of a path triggers an alert.
Right now I have to remove the alert because the legacy chain cert is expiring, but the 2nd path is still going to be fine. Here’s some more info for the Sectigo certs: https://support.sectigo.com/Com_KnowledgeDetailPage?Id=kA03l00000117LT
I’m surprised this hasn’t been a larger issue for people.