karpenter-provider-aws: Error when creating provisioner - failed calling webhook

Version

Karpenter: v0.5.4

Kubernetes: v1.21.5

Actual Behavior

I can’t install the provisioner. When I try to apply it, I get this error:

Error from server (InternalError): error when creating "provisioner.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.provisioners.karpenter.sh": Post "https://karpenter-webhook.karpenter.svc:443/default-resource?timeout=10s": context deadline exceeded

I don’t see any ERROR logs in either the controller pod or the webhook pod.
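
A sketch of the basic checks behind that statement (the karpenter namespace and karpenter-webhook service name come from the error above; the deployment name is an assumption):

  # Confirm both pods are running and the webhook service has endpoints:
  kubectl get pods -n karpenter
  kubectl get svc -n karpenter karpenter-webhook
  kubectl get endpoints -n karpenter karpenter-webhook

  # Look for errors (deployment name assumed to match the service name):
  kubectl logs -n karpenter deployment/karpenter-webhook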

Steps to Reproduce the Problem

I am following this documentation -> https://karpenter.sh/docs/getting-started-with-terraform/

Resource Specs and Logs

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 7
  • Comments: 29 (13 by maintainers)

Most upvoted comments

@prashantbhrgv

Try adding these:

  cluster_security_group_additional_rules = {
    ingress_nodes_karpenter_ports_tcp = {
      description                = "Karpenter readiness"
      protocol                   = "tcp"
      from_port                  = 8443
      to_port                    = 8443
      type                       = "ingress"
      source_node_security_group = true
    }
  }
  
  node_security_group_additional_rules = {
    aws_lb_controller_webhook = {
      description                   = "Cluster API to AWS LB Controller webhook"
      protocol                      = "all"
      from_port                     = 9443
      to_port                       = 9443
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }  

Hello. First of all, thanks, this helped me figure out my problem, but an important warning here: when you specify protocol = "all", according to the security group rule docs:

Setting protocol = "all" or protocol = -1 with from_port and to_port will result in the EC2 API creating a security group rule with ALL PORTS OPEN.

It works for @prashantbhrgv because it opened all ports. Without specifying "all", you have to check your webhook endpoints, as @alekc said, and open only the right ports. With the ALB controller and Karpenter, everything works fine for me with this config:

  cluster_security_group_additional_rules = {
    ingress_nodes_karpenter_ports_tcp = {
      description                = "Karpenter readiness"
      protocol                   = "tcp"
      from_port                  = 8443
      to_port                    = 8443
      type                       = "ingress"
      source_node_security_group = true
    }
  }

  node_security_group_additional_rules = {
    ingress_allow_alb_webhook_access_from_control_plane = {
      description                   = "Allow access from control plane to webhook port of AWS load balancer controller"
      protocol                      = "tcp"
      from_port                     = 9443
      to_port                       = 9443
      type                          = "ingress"
      source_cluster_security_group = true
    }
    ingress_allow_karpenter_webhook_access_from_control_plane = {
      description                   = "Allow access from control plane to webhook port of karpenter"
      protocol                      = "tcp"
      from_port                     = 8443
      to_port                       = 8443
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }
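
A minimal way to verify the fix, assuming the rules above sit in your terraform-aws-eks module block: apply them, then retry the command from the original report.

  # Push the new security group rules, then re-create the provisioner;
  # the webhook call should no longer hit "context deadline exceeded":
  terraform apply
  kubectl apply -f provisioner.yaml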

Are you sure that it’s not a security group issue? One of the most annoying “improvements” the terraform-aws-eks module brought in v18 is that, by default, nodes cannot communicate with one another because the security groups are very restrictive (see https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1748, especially https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1748#issuecomment-1020226274).

I would suggest trying v17 first, as suggested in the docs, if possible.
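
If you want to see how restrictive the v18 defaults actually are on your cluster, you can dump the node security group rules (a sketch; sg-xxxxxxxx is a placeholder for your node security group ID):

  # List the ingress rules the module created on the node security group:
  aws ec2 describe-security-groups --group-ids sg-xxxxxxxx \
    --query 'SecurityGroups[0].IpPermissions'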

Still relevant and still appears on both v0.19.3 and v0.20.0.

Thanks for sharing this. It worked 👍🏻. Just one question, which might be naive.

How do you determine which ports/rules need to be added, in case I face a similar issue in the future?
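
One rough way to find them: the failing webhook call names a service and port, and mapping that service port to its targetPort gives the node port the control plane must be allowed to reach (a sketch; the AWS LB controller service name and namespace below are assumed defaults):

  # List every admission webhook and the service/port it calls:
  kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations \
    -o custom-columns='NAME:.metadata.name,SERVICE:.webhooks[*].clientConfig.service.name,PORT:.webhooks[*].clientConfig.service.port'

  # Map each service port to the pod targetPort -- that is the port to open
  # from the control plane to the nodes (8443 for Karpenter, 9443 for the LB controller):
  kubectl get svc -n karpenter karpenter-webhook -o jsonpath='{.spec.ports}'
  kubectl get svc -n kube-system aws-load-balancer-webhook-service -o jsonpath='{.spec.ports}'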

@alekc That was it! Thanks a lot!