kops: Route53 mapper: pod can't be scheduled

I bumped into this:

No nodes are available that match all of the following predicates:: MatchNodeSelector (2), PodToleratesNodeTaints (1).

I’m trying to deploy the Route53 mapper 1.3.0 using kubectl apply -f <url/to/github/master>. Any ideas on how I could debug this?

I’ve been staring at my master node config and at my deployments and their differences, but I don’t see anything unusual about them.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 3
  • Comments: 27 (8 by maintainers)

Most upvoted comments

Adding this correctly scheduled it on the master node for me:

      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
        - key: "node-role.kubernetes.io/master"
          effect: NoSchedule

I pilfered this from the cluster autoscaler addon.

I’m actually not sure what the <no value> is about; I’m sure a Kubernetes expert could explain it.

As to whether or not the taint is applied by default, I am pretty darn sure it is. There are probably a lot of reasons, especially ensuring master reliability and capacity, but there is one that is possibly AWS-specific: IAM roles.

When you create a cluster via kops, it applies a separate IAM role to masters versus other nodes. This separate IAM role needs extra permissions in some cases, like this one, where it needs write access to Route53. You can see why you wouldn’t want ordinary applications to have access to this: they could effectively take over all domains that you grant the Route53 mapper access to. Maybe your code wouldn’t take advantage of that, but keep in mind that if anyone manages to gain RCE on any app running in the cluster, they would inherit this access if the IAM role for nodes had access to Route53.

For this reason I think it would be unwise to remove the NoSchedule taint from the master node; you definitely want to limit what’s running on the master.

Anyway, I’m again no Kubernetes expert, but here’s my understanding:

node-role.kubernetes.io/master=<no value>:NoSchedule

This taint has three parts:

  • Key: node-role.kubernetes.io/master
  • Value: <no value> (oddly enough…)
  • Effect: NoSchedule

The first two (key and value) seem to be largely inconsequential here, and are probably there so that your deployments can be very specific about the types of taints they can tolerate.

I’m not sure how you’d specify that it should match no value, but maybe specifying an empty string would do the trick.
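
For what it’s worth, you can see the taint directly on the master’s Node object with kubectl get node <master-name> -o yaml. Here’s a rough sketch of the relevant part (the node name is a placeholder, and I’m only showing the label and taint fields); note that there’s simply no value field on the taint, which I suspect is why it gets printed as <no value>:

apiVersion: v1
kind: Node
metadata:
  name: ip-172-20-xx-xx.ec2.internal    # placeholder master node name
  labels:
    kubernetes.io/role: master          # the label the route53-mapper nodeSelector matches on
spec:
  taints:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule                # keeps pods without a matching toleration off the master
      # no value field here, hence the <no value> in the taint listing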

The error message actually tells you everything you need to know about why things aren’t working. Yours will probably differ at least a little, but let’s look at the one in the OP.

No nodes are available that match all of the following predicates:: MatchNodeSelector (2), PodToleratesNodeTaints (1).
  • MatchNodeSelector (2) tells us that the node selector in the deployment does not match 2 nodes.
  • PodToleratesNodeTaints (1) tells us that the tolerations in the deployment do not tolerate the taints on 1 node.

What’s happening here is that Kubernetes is deciding which nodes the deployment’s pod can be scheduled on and determining that, based on these predicates, none of the nodes match.

That’s because this node selector excludes non-master nodes:

      nodeSelector:
        kubernetes.io/role: master

We know that this selector is working because presumably the OP has 3 nodes and MatchNodeSelector is only rejecting 2 of them, so it is probably matching the master only. But the problem is that the master has a taint preventing normal applications from being scheduled, and the deployment does not tolerate that taint.


OK, I hope my explanation of what’s going on here is adequate. But you’re probably wondering what to do next. I think what you want to do is download the Route53-mapper deployment and add a toleration matching your master node’s taint.

As an example, I just tried to take the v1.3.0 YAML and adjust it. It might not work; now would be a good time to emphasize one final time that I’m no expert.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: route53-mapper
  namespace: kube-system
  labels:
    app: route53-mapper
    k8s-addon: route53-mapper.addons.k8s.io
spec:
  replicas: 1
  selector:
    matchLabels:
      app: route53-mapper
  template:
    metadata:
      labels:
        app: route53-mapper
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"node-role.kubernetes.io/master", "effect":"NoSchedule"}]'
    spec:
      nodeSelector:
        kubernetes.io/role: master
      containers:
        - image: quay.io/molecule/route53-kubernetes:v1.3.0
          name: route53-mapper

I believe there’s a more ‘first-class’ way of specifying tolerations in the YAML in newer versions of Kubernetes, but I don’t know where in the pod spec it belongs. More information is in the documentation here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
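
For what it’s worth, here’s a sketch of where I think that first-class field goes: under the pod template’s spec, next to nodeSelector. This is the Kubernetes 1.6+ form, so treat it as a guess rather than something I’ve verified against this addon:

spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/role: master          # keep scheduling the pod onto the master
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule                # tolerate the master taint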

SSL termination has worked for me in Kubernetes since at least 1.5 without any cluster addons. SSL certificate annotations have no impact on the DNS settings and vice versa.

  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:xxxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: '443'
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: 'http'
    dns.alpha.kubernetes.io/external: "domain.name"
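
For context, those are Service annotations; a minimal sketch of the kind of LoadBalancer Service I attach them to looks roughly like this (the name, selector, and ports are placeholder assumptions, not something from the route53-mapper repo):

apiVersion: v1
kind: Service
metadata:
  name: my-app                     # placeholder
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:xxxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: '443'
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: 'http'
    dns.alpha.kubernetes.io/external: "domain.name"
spec:
  type: LoadBalancer               # provisions an AWS ELB
  selector:
    app: my-app                    # placeholder pod selector
  ports:
    - port: 443                    # TLS is terminated at the ELB with the ACM cert
      targetPort: 80               # plain HTTP from the ELB to the pods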

The documentation or the YAML should probably get an update so this works properly for new users.

Ahah, I had scheduler.alpha.kubernetes.io/tolerations and used the annotation when I should have changed it to tolerations and moved it into the pod spec. I also had some spurious ' characters mixed in, which made the list object a string (left over from the annotation).

I can confirm that on 1.6.x, with a cluster deployed by kops, this is working for me, in either form:

tolerations: [{"key":"dedicated", "value":"master"}, {"key":"node-role.kubernetes.io/master", "effect":"NoSchedule"}]

or:

spec:
  ...
  template:
    ...
    spec:
      ...
      tolerations:
        - key: dedicated
          value: master
        - key: node-role.kubernetes.io/master
          effect: NoSchedule

Though what isn’t clear is: should we update the spec on GitHub, even if it only works for 1.6.x (perhaps comment it out with a message to explain)?

Adding this to my deployment’s pod specification fixed it for me:

      tolerations: [{"key":"dedicated", "value":"master"}, {"key":"node-role.kubernetes.io/master", "effect": "NoSchedule"}]

You’re welcome, although to be honest I have abandoned this approach because

  annotations:
    dns.alpha.kubernetes.io/external: "xxx.example.com"

is a lot easier to use and requires no setup.
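
For anyone who wants the comparison, that approach reduces to a Service like this (name, selector, and port are placeholders); as I understand it the dns-controller that kops already runs picks the annotation up, so nothing extra needs to be installed:

apiVersion: v1
kind: Service
metadata:
  name: my-app                                     # placeholder
  annotations:
    dns.alpha.kubernetes.io/external: "xxx.example.com"
spec:
  type: LoadBalancer
  selector:
    app: my-app                                    # placeholder pod selector
  ports:
    - port: 80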