aws-cdk: (certificatemanager): DnsValidatedCertificate timeout while waiting for certificate approval

Describe the bug: Creating certificates via Certificate Manager with Route 53 DNS validation fails with a timeout. Error message:

Failed to create resource. Resource is not in the state certificateValidated

Expected behavior: The Lambda function that waits for approval should wait longer than the 5 minutes that are currently hardcoded.

Version:

  • OS: linux
  • Programming Language: typescript
  • CDK Version: 0.33.x

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 29
  • Comments: 34 (7 by maintainers)

Most upvoted comments

I had this same issue happen, and it turned out that my domain had a different set of name servers than the created hosted zone.

To fix it manually, update the domain's name servers to match those of the hosted zone. In the R53 console, click “registered domains” in the left menu, select your domain, and edit the name servers shown at the top right of the domain information.

AWS docs for updating name servers here: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/domain-name-servers-glue-records.html

As for the CDK, the HostedZone construct should probably be updated to use the name servers that the domain is configured for so that multiple hosted zones can be created for the same domain.

It is also worth noting that I had transferred the domain from a different AWS account, and had no existing hosted zones. Not sure how the existing implementation determines what name servers to use for a hosted zone, but maybe this is why it is failing to use the correct ones?

Still a problem: requested at 2020-01-16T10:33:04 UTC, issued at 2020-01-16T10:46:21 UTC. Could the delay duration be made a variable so we can specify a value?
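The gap between those two timestamps can be checked with a few lines of TypeScript. This only restates the commenter's numbers, with the timestamps rewritten in ISO 8601 form ("Z" for UTC) so that Date.parse accepts them:

```typescript
// Timestamps from the comment above, in ISO 8601 form so Date.parse can read them.
const requestedAt = Date.parse("2020-01-16T10:33:04Z");
const issuedAt = Date.parse("2020-01-16T10:46:21Z");

// The 5-minute hardcoded wait mentioned in the issue description.
const HARDCODED_WAIT_SECONDS = 5 * 60;

// Date.parse returns milliseconds since the epoch.
const elapsedSeconds = (issuedAt - requestedAt) / 1000;

console.log(`validation took ${elapsedSeconds} s`); // → validation took 797 s (13 min 17 s)
console.log(`exceeds the 5-minute wait: ${elapsedSeconds > HARDCODED_WAIT_SECONDS}`); // → true
```

At a little over 13 minutes, this single issuance already takes more than twice as long as the hardcoded wait.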

I’m hitting this again, and interestingly it seems to depend on the region. We use eu-central-1 for everything except the Cognito certificates (they must be issued in us-east-1 for custom domains). In eu-central-1 the approval goes through within seconds to minutes; in us-east-1 it takes hours. I don’t know what follow-up problems it could cause, but could we add an option to skip checking whether the certificate has been issued? Is that possible at all?

Lately, certificate validation often takes more than 10 minutes. In the worst case it took about 42 minutes, as far as I tested. It would be better if the waiter params could be specified in DnsValidatedCertificateProps.
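A minimal sketch of what configurable waiter parameters could look like. WaiterOptions, waitForIssued and getStatus are hypothetical names, not fields of DnsValidatedCertificateProps or any CDK API, and CertStatus is a simplified subset of ACM's certificate statuses. In a real custom resource, getStatus would wrap a call to ACM's DescribeCertificate:

```typescript
// Hypothetical options, only illustrating what a configurable waiter could accept.
interface WaiterOptions {
  intervalMs: number; // delay between status checks
  timeoutMs: number;  // give up once this much time has passed
}

// Simplified subset of ACM certificate statuses.
type CertStatus = "PENDING_VALIDATION" | "ISSUED" | "FAILED";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls getStatus until the certificate is ISSUED, fails fast on FAILED,
// and rejects if the next check would start after timeoutMs.
// Resolves with the number of checks that were made.
async function waitForIssued(
  getStatus: () => Promise<CertStatus>,
  { intervalMs, timeoutMs }: WaiterOptions,
): Promise<number> {
  const start = Date.now();
  let attempts = 0;
  for (;;) {
    attempts += 1;
    const status = await getStatus();
    if (status === "ISSUED") return attempts;
    if (status === "FAILED") throw new Error("certificate validation failed");
    if (Date.now() - start + intervalMs > timeoutMs) {
      throw new Error(`still ${status} after ${attempts} checks`);
    }
    await sleep(intervalMs);
  }
}
```

Since the DnsValidatedCertificate waiter runs inside a Lambda function, any timeoutMs chosen this way would still have to stay below Lambda's 15-minute (900,000 ms) maximum, which is the limitation raised later in this thread.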

Screen Shot 2020-01-16 at 19 57 50

Here is a potential solution: make sure that your hosted zone (the one you are writing the CNAME record set to) is registered. That is, when you look up the hosted zone's zone name (e.g. vincent.subdomain.domain.co.za) with nslookup, it should return the 4 name servers. If it does not, you cannot validate a certificate for that domain name (hosted zone).
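The same lookup can be scripted with Node's built-in dns module. This is a sketch under stated assumptions: sameNameServers and zoneIsDelegated are hypothetical helpers, and hostedZoneNs is expected to hold the four name servers that the Route 53 console shows for the hosted zone:

```typescript
import { promises as dns } from "dns";

// DNS names are case-insensitive and may carry a trailing dot; normalize both away.
function normalize(ns: string): string {
  return ns.trim().toLowerCase().replace(/\.$/, "");
}

// True when both lists contain the same name servers, in any order.
function sameNameServers(expected: string[], actual: string[]): boolean {
  const a = expected.map(normalize).sort();
  const b = actual.map(normalize).sort();
  return a.length === b.length && a.every((ns, i) => ns === b[i]);
}

// Resolves the public NS records for zoneName and compares them with the
// name servers assigned to the hosted zone.
async function zoneIsDelegated(zoneName: string, hostedZoneNs: string[]): Promise<boolean> {
  try {
    const published = await dns.resolveNs(zoneName);
    return sameNameServers(hostedZoneNs, published);
  } catch {
    // e.g. ENOTFOUND or ENODATA: no NS records are published for this name.
    return false;
  }
}
```

For example, zoneIsDelegated("vincent.subdomain.domain.co.za", hostedZoneNs) should resolve to true before the certificate is added to the deployment; a false result means public DNS does not delegate that name to the hosted zone, so DNS validation cannot succeed.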

I’m sorry but I believe this can only be properly fixed by Amazon internal team.

The problem is that DnsValidatedCertificate works by creating a custom resource backed by a Lambda function that adds those records and then waits for validation. But since this is a Lambda function, it has a maximum run time of 15 minutes. Yet based on the comments above, validating certificates may take hours in us-east-1. I’ve currently been waiting on validation for 49 minutes and it’s still not validated.

As to why we have to use DnsValidatedCertificate: we are a team in Europe, and our main region is Ireland (eu-west-1). Many certificates have to be placed in N. Virginia (us-east-1). That rules out the regular acm.Certificate class, because that class only deploys to the stack's main region.

We also don’t want a separate stack that deploys into us-east-1, because then you cannot export the certificate ARN and import it into another stack: Fn::ImportValue only works within the same region.

Workarounds: The only workaround right now is to deploy it in a separate stack into us-east-1, then have a second stack that exports certificate values which are hard-coded as strings (manual step) and then have a third stack which actually uses those values.
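If you go the hard-coded route, a small guard can at least catch an ARN from the wrong region before deployment. This is a hypothetical helper, not a CDK API; it relies only on the standard ACM ARN layout (arn:partition:acm:region:account-id:certificate/certificate-id) and on CloudFront accepting only ACM certificates from us-east-1:

```typescript
// Fields of an ACM certificate ARN:
// arn:<partition>:acm:<region>:<account-id>:certificate/<certificate-id>
interface AcmArn {
  partition: string;
  region: string;
  accountId: string;
  certificateId: string;
}

const CERT_PREFIX = "certificate/";

function parseAcmCertificateArn(arn: string): AcmArn {
  const parts = arn.split(":");
  if (parts.length !== 6 || parts[0] !== "arn" || parts[2] !== "acm" || !parts[5].startsWith(CERT_PREFIX)) {
    throw new Error(`not an ACM certificate ARN: ${arn}`);
  }
  return {
    partition: parts[1],
    region: parts[3],
    accountId: parts[4],
    certificateId: parts[5].slice(CERT_PREFIX.length),
  };
}

// Fails fast if a hard-coded ARN was copied from a region CloudFront cannot use.
function assertUsableByCloudFront(arn: string): string {
  const { region } = parseAcmCertificateArn(arn);
  if (region !== "us-east-1") {
    throw new Error(`CloudFront requires a us-east-1 certificate, got ${region}`);
  }
  return arn;
}
```

The checked string could then be passed to acm.Certificate.fromCertificateArn in the stack that actually uses the certificate.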

Another workaround is to retry the stack deployment early in the morning, when the certificate seems to get validated in time, but that is highly unreliable.

Solutions: Well ideally you could internally push for making certificate validations faster in that region and guarantee validations under 15 minutes. Or implement an API to do cross-region certificate creations, so CloudFormation would support this scenario natively (without the lambda). Or don’t force us to deploy certificates to a specific region (us-east-1), then we could all happily use the acm.Certificate class.

I’ve never really used CustomResource, so don’t know much about that. But is there a way to run something else than a lambda that might run for longer?

If you can’t do any of that, you could at least make stack deployments idempotent. The problem is that the custom resource Lambda fails and triggers a rollback, which orphans the certificate, and a new deployment doesn’t reuse the original certificate, which might already be validated. There would be no problem if I could deploy a stack, wait for it to fail due to the Lambda timeout, wait until the certificate is validated, and re-deploy, with the new deployment picking up the original certificate and completing successfully.
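The reuse behaviour being asked for could look roughly like the selection logic below. This is a hypothetical sketch, not how DnsValidatedCertificate behaves: CertSummary is a simplified shape loosely modeled on the certificate summaries returned by ACM's ListCertificates, and findReusableCertificate is an invented helper:

```typescript
// Simplified, hypothetical view of a certificate left in the account.
interface CertSummary {
  arn: string;
  domainName: string;
  status: "PENDING_VALIDATION" | "ISSUED" | "FAILED" | "EXPIRED";
  createdAt: number; // epoch milliseconds
}

// Picks a certificate orphaned by an earlier rolled-back deployment:
// ISSUED ones are preferred over PENDING_VALIDATION, newest first within each group.
function findReusableCertificate(certs: CertSummary[], domainName: string): CertSummary | undefined {
  const candidates = certs
    .filter((c) => c.domainName === domainName)
    .filter((c) => c.status === "ISSUED" || c.status === "PENDING_VALIDATION")
    .sort((a, b) => {
      if (a.status !== b.status) return a.status === "ISSUED" ? -1 : 1;
      return b.createdAt - a.createdAt;
    });
  return candidates[0];
}
```

Preferring ISSUED means a re-deployment could complete immediately, while falling back to a PENDING_VALIDATION certificate avoids requesting yet another one and restarting validation from scratch.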

Does it really need to fail and trigger rollback? How come the main acm.Certificate within one region works?

At the very least this issue should be documented on the cdk page for DnsValidatedCertificate construct.

What is the solution for this issue now? ACM timing out causes a rollback of the whole CDK stack I’m deploying, and I need to add a cert to the CloudFrontConfig.

I don’t understand how increasing the wait time to 9 minutes was a valid solution. That does not solve the problem at all.

@BillyBunn, this might be a long shot, but I switched to Certificate and my deploy started hanging as well. I never let it time out, but I noticed a bunch of emails from AWS about certificate approval in my Gmail spam folder, each with a link I had to click to approve the certificate. I marked them as not spam and tried again; clicking the approval link seemed to do the trick.

I switched back to the DNS validated cert afterward, and that one seems to work if I wait for the hostedZone to get created, then use its name servers to update the name servers section under registered domains via the UI. The deploy hangs while I do that but then seems to finish up.

@njlynch Unfortunately I’m experiencing the same timeout issue, even with the Certificate construct. I’ve tried using both.

DnsValidatedCertificate timed out after a few minutes with:

    CREATE_FAILED | AWS::CloudFormation::CustomResource
    Received response status [FAILED] from custom resource. Message returned: Resource is not in the state certificateValidated
    ... stacktrace

Certificate timed out after a few hours with:

    CREATE_FAILED | AWS::CertificateManager::Certificate
    Certificate is in PENDING_VALIDATION status
    ... stacktrace

Also, both ways are unable to delete the failed stack because of DNS record sets created in the same deployment that pointed at a CloudFront alias (probably should be a separate issue).

    DELETE_FAILED | AWS::Route53::HostedZone
    The specified hosted zone contains non-required resource record sets and so cannot be deleted.
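Route 53 only deletes a hosted zone once it holds nothing but the SOA and NS record sets at the zone apex, so records such as the CloudFront alias A/AAAA records have to be removed first. The sketch below lists which record sets block deletion; RecordSet and recordsBlockingZoneDeletion are hypothetical names, and the records are assumed to have already been fetched (for example with Route 53's ListResourceRecordSets) into this simplified shape:

```typescript
// Simplified, hypothetical view of a Route 53 record set.
interface RecordSet {
  name: string; // e.g. "domain-name.com." (Route 53 returns fully qualified names)
  type: string; // e.g. "A", "AAAA", "CNAME", "NS", "SOA"
}

function normalizeName(name: string): string {
  return name.toLowerCase().replace(/\.$/, "");
}

// Everything except the apex SOA and apex NS record sets must be deleted
// before the hosted zone itself can be deleted.
function recordsBlockingZoneDeletion(zoneName: string, records: RecordSet[]): RecordSet[] {
  const apex = normalizeName(zoneName);
  return records.filter((r) => {
    const isApex = normalizeName(r.name) === apex;
    return !(isApex && (r.type === "SOA" || r.type === "NS"));
  });
}
```

Note that NS records for a delegated subdomain are not apex records, so they also count as blocking.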

Ran into this trying to deploy a static site (S3 bucket, CloudFront distribution, Route53 hosted zone, ACM certificate) with a domain registered already with Route53. I have noticed also what @acdoussan mentioned—the name servers for the registered domain do not match the hosted zone NS records made by PublicHostedZone.

Anything obvious that is causing this? My code:

    const websiteBucket = new s3.Bucket(this, "WebsiteBucket", {
      autoDeleteObjects: true,
      publicReadAccess: false,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    const websiteHostedZone = new route53.PublicHostedZone(this, "WebsiteHostedZone", {
      zoneName: 'domain-name.com',
    });

    // Have also tried `DnsValidatedCertificate`
    const websiteCertificate = new certificateManager.Certificate(this, "WebsiteCertificate", {
      domainName: 'domain-name.com',
      subjectAlternativeNames: ['www.domain-name.com'],
      validation: certificateManager.CertificateValidation.fromDns(websiteHostedZone),
    });

    const websiteBucketDistribution = new cloudfront.Distribution(this, "WebsiteBucketDistribution", {
      certificate: websiteCertificate,
      defaultBehavior: {
        origin: new origins.S3Origin(websiteBucket),
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      },
      defaultRootObject: "index.html",
      domainNames: ['domain-name.com'],
    });

    new route53.ARecord(this, "WebsiteARecord", {
      target: route53.RecordTarget.fromAlias(new targets.CloudFrontTarget(websiteBucketDistribution)),
      recordName: 'domain-name.com',
      zone: websiteHostedZone,
    });

    new route53.AaaaRecord(this, "WebsiteAAAARecord", {
      target: route53.RecordTarget.fromAlias(new targets.CloudFrontTarget(websiteBucketDistribution)),
      recordName: 'domain-name.com',
      zone: websiteHostedZone,
    });

Edit: I can reproduce it with just this:

    const websiteHostedZone = new route53.PublicHostedZone(this, "WebsiteHostedZone", {
      zoneName: 'domain-name.com',
    });

    // Have also tried `DnsValidatedCertificate`
    const websiteCertificate = new certificateManager.Certificate(this, "WebsiteCertificate", {
      domainName: 'domain-name.com',
      subjectAlternativeNames: ['www.domain-name.com'],
      validation: certificateManager.CertificateValidation.fromDns(websiteHostedZone),
    });

For those experiencing this issue:

Unless you absolutely need cross-region certificate issuance (e.g., requesting a us-east-1 certificate from another region for CloudFront), then converting to use the Certificate construct (as @AbendGithub notes above) is your best bet. The Certificate construct does not have the same time-out constraints as DnsValidatedCertificate and uses CloudFormation’s internal workflow system for provisioning and validating.

If you must use DnsValidatedCertificate, give yourself the best possible chance of success by creating and deploying your Route53 HostedZone first, validating the domain with tools like dig, nslookup, etc., and only then adding the certificate to the deployment. See https://docs.aws.amazon.com/acm/latest/userguide/troubleshooting-DNS-validation.html for a list of common DNS validation troubleshooting tips. In particular, if something like % dig NS yourhostname.example.com does not return the 4 name servers associated with your hosted zone before you start the deployment, your certificate will never validate.

For those running into this problem, use the Certificate construct instead. It lets you achieve the very same thing without the time limit. Something like this:

        const certificate = new acm.Certificate(this, `${PREFIX}LandingPageAcmCertificate`, {
            domainName: SITE_DOMAIN,
            subjectAlternativeNames: [`www.${SITE_DOMAIN}`],
            validation: acm.CertificateValidation.fromDns(rootHostedZone)
        });

@papiro This AWS page is how.

The answer you want is in there.