amazon-vpc-cni-k8s: Using Security Groups per pod with NodeLocal DNSCache doesn't work
What happened: I tried to attach a security group to a pod using the official guide. Everything works as expected, but when I try to use NodeLocal DNSCache, I can't connect to the CoreDNS IP (172.20.0.10) from pods to which I attached a Security Group (I can connect from other pods). I used this file as a template for my installation. Here are my NodeLocal DNSCache DaemonSet and ConfigMap manifests:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: node-local-dns
  name: node-local-dns
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: node-local-dns
  template:
    metadata:
      annotations:
        prometheus.io/port: "9253"
        prometheus.io/scrape: "true"
      labels:
        k8s-app: node-local-dns
    spec:
      containers:
      - args:
        - -localip
        - 169.254.20.10,172.20.0.10
        - -conf
        - /etc/Corefile
        - -upstreamsvc
        - kube-dns-upstream
        image: k8s.gcr.io/dns/k8s-dns-node-cache:1.16.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            host: 169.254.20.10
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: node-cache
        ports:
        - containerPort: 53
          hostPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          hostPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9253
          hostPort: 9253
          name: metrics
          protocol: TCP
        resources:
          requests:
            cpu: 25m
            memory: 5Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /run/xtables.lock
          name: xtables-lock
        - mountPath: /etc/coredns
          name: config-volume
        - mountPath: /etc/kube-dns
          name: kube-dns-config
      dnsPolicy: Default
      hostNetwork: true
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: node-local-dns
      serviceAccountName: node-local-dns
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
      - configMap:
          defaultMode: 420
          name: kube-dns
          optional: true
        name: kube-dns-config
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile.base
          name: node-local-dns
        name: config-volume
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 10%
    type: RollingUpdate
---
apiVersion: v1
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
            success 9984 30
            denial 9984 5
        }
        reload
        loop
        bind 169.254.20.10 172.20.0.10
        forward . __PILLAR__CLUSTER__DNS__
        prometheus :9253
        health 169.254.20.10:8080
    }
    in-addr.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 172.20.0.10
        forward . __PILLAR__CLUSTER__DNS__
        prometheus :9253
    }
    ip6.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 172.20.0.10
        forward . __PILLAR__CLUSTER__DNS__
        prometheus :9253
    }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 172.20.0.10
        forward . __PILLAR__UPSTREAM__SERVERS__
        prometheus :9253
    }
kind: ConfigMap
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: node-local-dns
  namespace: kube-system
Here is the final /etc/Corefile:
cluster.local:53 {
    errors
    cache {
        success 9984 30
        denial 9984 5
    }
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    forward . 172.20.209.13
    prometheus :9253
    health 169.254.20.10:8080
}
in-addr.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    forward . 172.20.209.13
    prometheus :9253
}
ip6.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    forward . 172.20.209.13
    prometheus :9253
}
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    forward . /etc/resolv.conf
    prometheus :9253
}
Note that I don't use the force_tcp option in the CoreDNS configuration, following the official recommendation.
Environment:
- Kubernetes version (use kubectl version): v1.18.9-eks-d1db3c
- CNI Version: 1.7.8
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a):
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 15 (9 by maintainers)
The following workaround allows DNS resolution to work for pods that use Security Groups.
In your node-local-dns DaemonSet, we can stop the DaemonSet from applying the iptables rules that block DNS resolution for pods using Security Groups by passing an additional argument (sketched at the end of this comment). We then have to create a custom dnsPolicy for all pods that use the Security Groups for Pods feature. For that we need the kube-dns service IP, found with
KUBE_DNS_IP=$(kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP})
and the region of the cluster. After substituting those values into the dnsPolicy and using it in the pods that use SGP, DNS resolution should go through. You can verify this by running a DNS lookup, for example nslookup kubernetes.default, from inside one of those pods; it should resolve successfully.
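For concreteness, here is a minimal sketch of both pieces. The -setupiptables=false flag is an assumption about the elided argument (check the flags supported by your k8s-dns-node-cache image version), and the namespace and region values in the dnsConfig are placeholders:

  # node-local-dns container args, with the assumed extra flag that stops the
  # DaemonSet from installing its iptables rules:
  - args:
    - -localip
    - 169.254.20.10,172.20.0.10
    - -conf
    - /etc/Corefile
    - -upstreamsvc
    - kube-dns-upstream
    - -setupiptables=false

And the custom DNS settings for the pods that use Security Groups for Pods:

  # pod-spec snippet: point the pod straight at the kube-dns service IP
  spec:
    dnsPolicy: "None"
    dnsConfig:
      nameservers:
      - 172.20.0.10                  # KUBE_DNS_IP from the command above
      searches:
      - default.svc.cluster.local    # replace "default" with the pod's namespace
      - svc.cluster.local
      - cluster.local
      - eu-west-1.compute.internal   # <region>.compute.internal for your cluster
      options:
      - name: ndots
        value: "5"

With both changes applied, an nslookup from inside an SGP pod should resolve against 172.20.0.10.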
To provide more context around this issue, I'm documenting a few behaviors below.
Pods using security groups don't use the local route table, so their connections always go out of the branch ENI (via a vlan interface) and out through the trunk ENI, even when these pods want to communicate with other pods on the same host. This ensures the egress rules of the security group are applied. Therefore, pods using security groups will not be able to communicate with pods using host networking on that host if the security group doesn't allow such communication.
NodeLocalDNS pod setup: the NodeLocalDNS pods perform two operations. They install NOTRACK iptables rules for the DNS addresses they serve, and they make those addresses (169.254.20.10 and the cluster DNS IP) reachable through the host's local routing table. Since pods using security groups go out of the branch ENI (vlan) device and don't consult the local route table, their packets never reach NodeLocalDNS. Also, due to the NOTRACK iptables rules, these pods can't communicate with the actual cluster DNS service IP either.
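For illustration, the NOTRACK rules are roughly of the following shape; this is a hedged sketch, and the exact chains, ports, and addresses depend on the node-cache version, so inspect iptables-save on a node to see what is actually installed:

  # One NOTRACK rule per bound address (169.254.20.10 and the kube-dns
  # ClusterIP) and per protocol; shown here for UDP/53 towards 172.20.0.10:
  iptables -t raw -A PREROUTING -d 172.20.0.10/32 -p udp --dport 53 -j NOTRACK
  iptables -t raw -A OUTPUT -d 172.20.0.10/32 -p udp --dport 53 -j NOTRACK
  # With conntrack skipped, kube-proxy's DNAT for the kube-dns ClusterIP no
  # longer applies to this traffic, which is why pods behind a branch ENI
  # cannot fall back to the real cluster DNS service IP.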
There are a few workarounds for this. @abhipth is looking into whether we can use the cluster dnsPolicy and the cluster DNS nameserver on the pod spec to avoid this, and configure NodeLocalDNS to not add the NOTRACK iptables rules.
The other option we have is to add a flag in aws-node ipamd indicating that special ip rules should be added on the host to enable NodeLocalDNS traffic within the host (the CNI plugin can add "from all vlan interfaces use route table x", "from the NodeLocalDNS address use route table x", and "allow traffic to the NodeLocalDNS address to look up the local table"). This might be the better and more correct approach, but we are open to suggestions. Sorry for the inconvenience this has caused; we will update our docs with whatever we decide is the right path forward, since NodeLocal DNSCache is a super useful feature and we want to support it for all pods.
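Purely as an illustrative sketch of that idea (the vlan and nodelocaldns interface names, the table number, and the priorities are assumptions, not something the CNI programs today):

  # "from all vlan use route table x": steer traffic arriving from a
  # branch-ENI vlan interface through a dedicated table
  ip rule add iif vlan.eth.1 lookup 101 priority 10
  # "from NodeLocalDNS use route table x"
  ip rule add from 169.254.20.10/32 lookup 101 priority 10
  # "allow traffic to NodeLocalDNS to lookup local"
  ip rule add to 169.254.20.10/32 lookup local priority 9
  # table x needs a route to the NodeLocalDNS address, e.g. via its interface
  ip route add 169.254.20.10/32 dev nodelocaldns table 101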