karpenter-provider-aws: ERROR controller.provisioning Provisioning failed, launching node, creating cloud provider instance, with fleet error(s), UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: XZX0joS

Version

Karpenter Version: v0.16.1

kubectl version: client (1.25) and server (1.22)

Kubernetes Version: v1.22

Expected Behavior

Karpenter is active and ready to provision nodes: after creating some pods with a deployment, Karpenter should provision nodes in response.

Actual Behavior

Created some pods using a deployment, but Karpenter failed to provision nodes:

INFO controller.provisioning Launching node with 5 pods requesting {"cpu":"5125m","pods":"7"} from types inf1.2xlarge, t3a.2xlarge, c5d.2xlarge, t3.2xlarge, m5.2xlarge and 308 other(s) {"commit": "b157d45", "provisioner": "default"}

ERROR controller.provisioning Provisioning failed, launching node, creating cloud provider instance, with fleet error(s), UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: XZX0joSxj6TJ98

Steps to Reproduce the Problem

Terraform code to provision an EKS cluster with the Karpenter IRSA role and instance profile.

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.14.4"

  name = "vpc-${local.cluster_name}"
  cidr = var.cidr

  azs                    = data.aws_availability_zones.available.names
  private_subnets        = var.private_subnets
  public_subnets         = var.public_subnets
  elasticache_subnets    = var.elasticache_subnets

  enable_nat_gateway     = true
  single_nat_gateway     = true
  one_nat_gateway_per_az = false
  enable_dns_hostnames   = true
  enable_dns_support     = true

  # VPC Flow Logs (Cloudwatch log group and IAM role will be created)
  enable_flow_log                      = true
  create_flow_log_cloudwatch_log_group = true
  create_flow_log_cloudwatch_iam_role  = true
  flow_log_max_aggregation_interval    = 60

  public_subnet_tags = {
    "kubernetes.io/cluster/${local.cluster_name}"      = "shared"
    "kubernetes.io/role/elb"                           = 1
    "karpenter.sh/discovery/${local.cluster_name}"    = local.cluster_name # for Karpenter auto-discovery
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${local.cluster_name}"     = "shared"
    "kubernetes.io/role/internal-elb"                 = 1
  }

  tags = local.tags
}
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.29.0"

  cluster_name    = local.cluster_name
  cluster_version = "1.22"

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  vpc_id                          = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids                      = data.terraform_remote_state.vpc.outputs.public_subnets

  cluster_enabled_log_types       = var.log_types

  manage_aws_auth_configmap       = true
  aws_auth_roles                  = var.aws_auth_roles
  aws_auth_users                  = var.aws_auth_users
  aws_auth_accounts               = var.aws_auth_accounts

  #Required for Karpenter role below
  enable_irsa                     = true

  create_cloudwatch_log_group            = false
  cloudwatch_log_group_retention_in_days = 3

  node_security_group_additional_rules = {
    ingress_nodes_karpenter_port = {
      description                   = "Cluster API to Node group for Karpenter webhook"
      protocol                      = "tcp"
      from_port                     = 8443
      to_port                       = 8443
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }

  node_security_group_tags = {
    # NOTE - if creating multiple security groups with this module, only tag the
    # security group that Karpenter should utilize with the following tag
    # (i.e. - at most, only one security group should have this tag in your account)
    "karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
  }

  # Only need one node to get Karpenter up and running.
  # This ensures core services such as VPC CNI, CoreDNS, etc. are up and running
  # so that Karpenter can be deployed and start managing compute capacity as required
  eks_managed_node_groups = {
    "${local.cluster_name}" = {
      #attach_cluster_primary_security_group = true
      capacity_type  = "ON_DEMAND"

      instance_types = ["m5.large"]
      # Not required nor used - avoid tagging two security groups with same tag as well
      create_security_group = false

      # Ensure enough capacity to run 2 Karpenter pods
      min_size     = 2
      max_size     = 3
      desired_size = 2

      iam_role_additional_policies = [
        "arn:${local.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore", # Required by Karpenter
        "arn:${local.partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
        "arn:${local.partition}:iam::aws:policy/AmazonEKS_CNI_Policy",
        "arn:${local.partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly", #for access to ECR images
        "arn:${local.partition}:iam::aws:policy/CloudWatchAgentServerPolicy"
      ]

      tags = {
        # This will tag the launch template created for use by Karpenter
        "karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
      }
    }
  }
}

resource "aws_iam_instance_profile" "karpenter" {
  name = "KarpenterNodeInstanceProfile-${local.cluster_name}"
  role = module.eks.eks_managed_node_groups["${local.cluster_name}"].iam_role_name
}

module "karpenter_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "5.3.3"

  role_name                               = "${local.cluster_name}-karpenter"
  attach_karpenter_controller_policy      = true

  karpenter_tag_key                       = "karpenter.sh/discovery/${local.cluster_name}"
  karpenter_controller_cluster_id         = module.eks.cluster_id

  karpenter_controller_ssm_parameter_arns = [
    "arn:${local.partition}:ssm:*:*:parameter/aws/service/*"
  ]

  karpenter_controller_node_iam_role_arns = [
    module.eks.eks_managed_node_groups["${local.cluster_name}"].iam_role_arn
  ]

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["karpenter:karpenter"]
    }
  }
}
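
Before installing the chart, a couple of hedged sanity checks can confirm the IAM pieces the Terraform above is supposed to create; the profile and role names below follow the code above, and CLUSTER_NAME is assumed to be exported with the cluster name:

# Instance profile Karpenter hands to EC2; its role must be covered by iam:PassRole.
aws iam get-instance-profile \
  --instance-profile-name "KarpenterNodeInstanceProfile-${CLUSTER_NAME}" \
  --query 'InstanceProfile.Roles[].Arn' --output text

# IRSA role trust policy; the cluster's OIDC provider must be allowed to assume it
# for system:serviceaccount:karpenter:karpenter.
aws iam get-role \
  --role-name "${CLUSTER_NAME}-karpenter" \
  --query 'Role.AssumeRolePolicyDocument' --output json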

output.tf

output "cluster_arn" {
  description = "The Amazon Resource Name (ARN) of the cluster"
  value       = module.eks.cluster_arn
}

output "cluster_certificate_authority_data" {
  description = "Base64 encoded certificate data required to communicate with the cluster"
  value       = module.eks.cluster_certificate_authority_data
}

output "cluster_endpoint" {
  description = "Endpoint for EKS control plane."
  value       = module.eks.cluster_endpoint
}

output "cluster_id" {
  description = "The name/id of the EKS cluster. Will block on cluster creation until the cluster is really ready"
  value       = module.eks.cluster_id
}

output "cluster_oidc_issuer_url" {
  description = "The URL on the EKS cluster for the OpenID Connect identity provider"
  value       = module.eks.cluster_oidc_issuer_url
}

output "cluster_platform_version" {
  description = "Platform version for the cluster"
  value       = module.eks.cluster_platform_version
}

output "cluster_status" {
  description = "Status of the EKS cluster. One of `CREATING`, `ACTIVE`, `DELETING`, `FAILED`"
  value       = module.eks.cluster_status
}

output "cluster_primary_security_group_id" {
  description = "Cluster security group that was created by Amazon EKS for the cluster. Managed node groups use this security group for control-plane-to-data-plane communication."
  value       = module.eks.cluster_primary_security_group_id
}

output "cluster_region" {
  description = "The AWS region the cluster has been depoyed to"
  value       = var.region
}


output "eks_managed_node_groups" {
  description = "Map of attribute maps for all EKS managed node groups created."
  value       = module.eks.eks_managed_node_groups
}

output "cluster_iam_role_arn" {
  description = "IAM role ARN of the EKS cluster."
  value       = module.eks.cluster_iam_role_arn
}

output "cluster_iam_role_name" {
  description = "IAM role name of the EKS cluster."
  value       = module.eks.cluster_iam_role_name
}

output "cluster_iam_role_unique_id" {
  description = "Stable and unique string identifying the IAM role."
  value       = module.eks.cluster_iam_role_unique_id
}

output "oidc_provider_arn" {
  description = "The ARN of the OIDC Provider"
  value       = module.eks.oidc_provider_arn
}

output "karpenter_irsa_iam_role_arn" {
  description = "ARN of IAM role"
  value       = module.karpenter_irsa.iam_role_arn
}

output "karpenter_irsa_iam_role_name" {
  description = "Name of IAM role"
  value       = module.karpenter_irsa.iam_role_name
}

output "karpenter_irsa_iam_role_path" {
  description = "Path of IAM role"
  value       = module.karpenter_irsa.iam_role_path
}

output "karpenter_irsa_iam_role_unique_id" {
  description = "Unique ID of IAM role"
  value       = module.karpenter_irsa.iam_role_unique_id
}

output "aws_iam_instance_profile" {
  description = "Karpenter discovers the InstanceProfile using the name KarpenterNodeRole-ClusterName."
  value       = aws_iam_instance_profile.karpenter.name
}

output "vpc_ic" {
  description = "VPC ID"
  value       = data.terraform_remote_state.vpc.outputs.vpc_id
}

helm repo add karpenter https://charts.karpenter.sh/
helm repo update
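
The variables used by the helm command below are assumed to come from the Terraform outputs above; a minimal sketch of exporting them (the output names and the Karpenter version are taken from this issue, the mapping itself is an assumption):

# Run from the directory holding the Terraform state for the cluster.
export CLUSTER_NAME=$(terraform output -raw cluster_id)
export CLUSTER_ENDPOINT=$(terraform output -raw cluster_endpoint)
export KARPENTER_IAM_ROLE_ARN=$(terraform output -raw karpenter_irsa_iam_role_arn)
export KARPENTER_VERSION=v0.16.1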

helm upgrade --install --namespace karpenter --create-namespace \
  karpenter karpenter/karpenter \
  --version ${KARPENTER_VERSION} \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
  --set clusterName=${CLUSTER_NAME} \
  --set clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --wait # for the defaulting webhook to install before creating a Provisioner
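
Once the release is installed, a quick check that the controller is up and that IRSA made it onto the service account (the deployment and service account names are the chart defaults used in this issue):

kubectl rollout status deployment/karpenter -n karpenter --timeout=120s
# The annotation should match karpenter_irsa_iam_role_arn from the Terraform outputs.
kubectl get serviceaccount karpenter -n karpenter \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}{"\n"}'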

Resource Specs and Logs

Provisioner specs

cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery/${CLUSTER_NAME}: ${CLUSTER_NAME}
  securityGroupSelector:
    karpenter.sh/discovery/${CLUSTER_NAME}: ${CLUSTER_NAME}
EOF
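
The subnetSelector and securityGroupSelector above match on the discovery tag, so it is worth confirming (region and credentials assumed) that the tag from the Terraform code actually landed on the subnets and on exactly one security group:

aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery/${CLUSTER_NAME},Values=${CLUSTER_NAME}" \
  --query 'Subnets[].SubnetId' --output text

aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery/${CLUSTER_NAME},Values=${CLUSTER_NAME}" \
  --query 'SecurityGroups[].GroupId' --output text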

Pod spec: This deployment uses the pause image

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1
EOF

kubectl scale deployment inflate --replicas 5
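
While the deployment scales up, the pending inflate pods that should trigger provisioning can be watched directly (assuming the deployment above):

kubectl get pods -l app=inflate -o wide --watch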

kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller

ERROR controller.provisioning Provisioning failed, launching node, creating cloud provider instance, with fleet error(s), UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: C7p8L0t12N16ndxkKcGjcONv8J49w9BbgBmdY

kubectl logs karpenter-5c77486564-jvdm7 -n karpenter

Defaulted container "controller" out of: controller, webhook
{"level":"info","ts":1663558880.1839774,"logger":"fallback","caller":"injection/injection.go:61","msg":"Starting informers..."}
2022-09-19T03:41:20.178Z	INFO	Successfully created the logger.
2022-09-19T03:41:20.178Z	INFO	Logging level set to: debug
2022-09-19T03:41:20.184Z	INFO	controller	Initializing with version v0.16.1	{"commit": "b157d45"}
2022-09-19T03:41:20.184Z	INFO	controller	Setting GC memory limit to 966367641, container limit = 1073741824	{"commit": "b157d45"}
2022-09-19T03:41:20.203Z	DEBUG	controller.aws	Using AWS region us-east-1	{"commit": "b157d45"}
2022-09-19T03:41:20.403Z	DEBUG	controller.aws	Discovered caBundle, length 1099	{"commit": "b157d45"}
2022-09-19T03:41:20.403Z	INFO	controller	loading config from karpenter/karpenter-global-settings	{"commit": "b157d45"}
I0919 03:41:20.516867       1 leaderelection.go:243] attempting to acquire leader lease karpenter/karpenter-leader-election...
2022-09-19T03:41:20.517Z	INFO	controller	starting metrics server	{"commit": "b157d45", "path": "/metrics"}
E0919 03:41:20.559780       1 leaderelection.go:329] error initially creating leader election record: leases.coordination.k8s.io "karpenter-leader-election" already exists
2022-09-19T03:41:21.074Z	INFO	controller.aws.pricing	updated spot pricing with 558 instance types and 2629 offerings	{"commit": "b157d45"}
2022-09-19T03:41:22.060Z	INFO	controller.aws.pricing	updated on-demand pricing with 558 instance types	{"commit": "b157d45"}

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 27 (12 by maintainers)

Most upvoted comments

It is still happening on v0.27.0, and I am not using Terraform at all.

@FernandoMiguel, it is not a bug in Karpenter; it is a bug in the module https://github.com/terraform-aws-modules/terraform-aws-iam/tree/v5.5.0/modules/iam-role-for-service-accounts-eks. I filed a request at https://github.com/terraform-aws-modules/terraform-aws-iam/issues/284. For now, I downloaded the iam-role-for-service-accounts-eks module and made changes in policies.tf. The code is as follows.

data "aws_iam_policy_document" "karpenter_controller" {
  count = var.create_role && var.attach_karpenter_controller_policy ? 1 : 0
  statement {
    sid       = "Karpenter"
    effect    = "Allow"
    resources = ["*"]

    actions = [
      "ec2:CreateLaunchTemplate",
      "ec2:CreateFleet",
      "ec2:RunInstances",
      "ec2:CreateTags",
      "ec2:TerminateInstances",
      "ec2:DeleteLaunchTemplate",
      "ec2:DescribeLaunchTemplates",
      "ec2:DescribeInstances",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeSubnets",
      "ec2:DescribeImages",
      "ec2:DescribeInstanceTypes",
      "ec2:DescribeInstanceTypeOfferings",
      "ec2:DescribeAvailabilityZones",
      "ec2:DescribeSpotPriceHistory",
      "iam:PassRole",
      "ssm:GetParameter",
      "pricing:GetProducts"
    ]
  }
}

I added iam_instance_profile to the launch template and can now successfully bring up Ubuntu EC2 instances as EKS nodes. However, I have to use the policy AmazonEKS_Karpenter_Controller_Policy-karpenter-eks-dev that I created in the AWS console instead of the policy AmazonEKS_Karpenter_Controller_Policy-20220922191658668300000010 that is created by my Terraform script. I get "UnauthorizedOperation" if I use the policy AmazonEKS_Karpenter_Controller_Policy-20220922191658668300000010 (for the complete errors, please refer to the messages I posted earlier). The difference between these two policies is the tags.

Per @FernandoMiguel, I may use the terraform-aws-eks-blueprints module; I will look into it. Does anyone have an idea why the policy created by the Terraform code above does not work? I referenced https://karpenter.sh/v0.16.2/getting-started/getting-started-with-terraform/ for that code. Or how could "Getting Started with Terraform" be improved so that the generated policy works?
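
Since the suspected difference is how the two policies scope actions by tag, one way to pin it down is to pull both policy documents and diff them. A sketch, with placeholder ARNs to replace with the real ones from your account:

WORKING_ARN="arn:aws:iam::<account-id>:policy/AmazonEKS_Karpenter_Controller_Policy-karpenter-eks-dev"
FAILING_ARN="arn:aws:iam::<account-id>:policy/AmazonEKS_Karpenter_Controller_Policy-20220922191658668300000010"

# Fetch the default version of each policy document.
aws iam get-policy-version --policy-arn "$WORKING_ARN" \
  --version-id "$(aws iam get-policy --policy-arn "$WORKING_ARN" --query 'Policy.DefaultVersionId' --output text)" \
  --query 'PolicyVersion.Document' > working.json

aws iam get-policy-version --policy-arn "$FAILING_ARN" \
  --version-id "$(aws iam get-policy --policy-arn "$FAILING_ARN" --query 'Policy.DefaultVersionId' --output text)" \
  --query 'PolicyVersion.Document' > failing.json

# Per the comment above, the Condition/tag blocks are where the two differ.
diff working.json failing.json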

You can decode the authorization failure message to understand what the issue is about.

https://aws.amazon.com/premiumsupport/knowledge-center/ec2-not-auth-launch/
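
A concrete way to do that with the AWS CLI (the caller needs sts:DecodeAuthorizationMessage; the encoded string below is a placeholder for the full value from the Karpenter ERROR line):

aws sts decode-authorization-message \
  --encoded-message "<encoded-message-from-the-error>" \
  --query DecodedMessage --output text

The decoded output is JSON and names the denied action, resource, and any failing condition, which in this case would likely point at the tag conditions discussed above.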