kops: DNS mostly fails inside application pods on brand new cluster

Kops version: 1.8.0 (git-5099bc5)
Kubernetes version: v1.8.6 (6260bb08c46c31eea6cb538b34a9ceb3e406689c)
Cloud: AWS

The issue

When I create a brand new cluster, everything appears to be working fine: all the masters and workers are ready and I can deploy application pods. However, most of the time the pods cannot resolve public DNS names like www.google.com, or even internal names like myservice.default. When I run ping www.google.com, the command either takes a long time (over 10 seconds) and eventually reports that the name could not be resolved, or it takes a long time and eventually starts pinging Google. It’s as if kube-dns is failing most of the time, but not always.

Things I have noticed:

  • Only application pods have this problem; all system pods (on masters or workers) appear to be able to resolve names.
  • All nodes (masters and workers) can resolve names.
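
A quick way to reproduce this from a throwaway pod (the image and pod name are arbitrary):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- sh
# then, inside the pod, these hang or time out most of the time:
nslookup kubernetes.default
nslookup www.google.com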

Steps used to create the cluster:

  1. Created cluster configuration:
kops create cluster \
  --api-loadbalancer-type internal \
  --associate-public-ip=false \
  --cloud=aws \
  --dns private \
  --image "595879546273/CoreOS-stable-1632.2.1-hvm" \
  --master-count 3 \
  --master-size t2.small \
  --master-zones "us-east-1b,us-east-1c,us-east-1d" \
  --name=stg-us-east-1.k8s.local \
  --network-cidr 10.0.64.0/22 \
  --networking flannel \
  --node-count 5 \
  --node-size t2.small \
  --out . \
  --output json \
  --ssh-public-key ~/.ssh/mykey.pub \
  --state s3://mybucket \
  --target=terraform \
  --topology private \
  --vpc vpc-3153eb2e \
  --zones "us-east-1b,us-east-1c,us-east-1d"
  2. Modified subnets (kops edit cluster) as per https://github.com/kubernetes/kops/blob/master/docs/run_in_existing_vpc.md

  3. Updated cluster config (kops update cluster) and deployed everything (terraform apply), roughly as shown below.
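
For reference, step 3 amounted to roughly the following (state bucket and output directory as in the kops create command above; exact flags may differ):

kops update cluster stg-us-east-1.k8s.local \
  --state s3://mybucket \
  --target=terraform \
  --out .
terraform init
terraform apply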

Cluster manifest:

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-02-05T16:07:43Z
  name: stg-us-east-1.k8s.local
spec:
  api:
    loadBalancer:
      type: Internal
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://mybucket/stg-us-east-1.k8s.local
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1c
      name: c
    - instanceGroup: master-us-east-1d
      name: d
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1c
      name: c
    - instanceGroup: master-us-east-1d
      name: d
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.8.6
  masterInternalName: api.internal.stg-us-east-1.k8s.local
  masterPublicName: api.stg-us-east-1.k8s.local
  networkCIDR: 10.0.64.0/22
  networkID: vpc-73cfbb0a
  networking:
    flannel:
      backend: vxlan
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.0.65.0/24
    egress: nat-012ee02a09a7830d2
    id: subnet-de86e6f2
    name: us-east-1b
    type: Private
    zone: us-east-1b
  - cidr: 10.0.66.0/24
    egress: nat-012ee02a09a7830d2
    id: subnet-5fb5ef17
    name: us-east-1c
    type: Private
    zone: us-east-1c
  - cidr: 10.0.67.0/24
    egress: nat-012ee02a09a7830d2
    id: subnet-b13da5eb
    name: us-east-1d
    type: Private
    zone: us-east-1d
  - cidr: 10.0.64.32/27
    id: subnet-d68bebfa
    name: utility-us-east-1b
    type: Utility
    zone: us-east-1b
  - cidr: 10.0.64.96/27
    id: subnet-cbb0ea83
    name: utility-us-east-1c
    type: Utility
    zone: us-east-1c
  - cidr: 10.0.64.160/27
    id: subnet-f23ea6a8
    name: utility-us-east-1d
    type: Utility
    zone: us-east-1d
  topology:
    dns:
      type: Private
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-02-05T16:07:43Z
  labels:
    kops.k8s.io/cluster: stg-us-east-1.k8s.local
  name: master-us-east-1b
spec:
  associatePublicIp: false
  image: 595879546273/CoreOS-stable-1632.2.1-hvm
  machineType: t2.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1b
  role: Master
  subnets:
  - us-east-1b

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-02-05T16:07:43Z
  labels:
    kops.k8s.io/cluster: stg-us-east-1.k8s.local
  name: master-us-east-1c
spec:
  associatePublicIp: false
  image: 595879546273/CoreOS-stable-1632.2.1-hvm
  machineType: t2.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1c
  role: Master
  subnets:
  - us-east-1c

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-02-05T16:07:43Z
  labels:
    kops.k8s.io/cluster: stg-us-east-1.k8s.local
  name: master-us-east-1d
spec:
  associatePublicIp: false
  image: 595879546273/CoreOS-stable-1632.2.1-hvm
  machineType: t2.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1d
  role: Master
  subnets:
  - us-east-1d

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-02-05T16:07:43Z
  labels:
    kops.k8s.io/cluster: stg-us-east-1.k8s.local
  name: nodes
spec:
  associatePublicIp: false
  image: 595879546273/CoreOS-stable-1632.2.1-hvm
  machineType: t2.small
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-1b
  - us-east-1c
  - us-east-1d

Contents of resolv.conf on a node:

# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known DNS servers.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.0.64.2
search ec2.internal

Contents of resolv.conf on a system pod:

nameserver 10.0.64.2
search ec2.internal

Contents of resolv.conf on an application pod:

nameserver 100.64.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
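
Note that with ndots:5 and that search list, a lookup for www.google.com (fewer than five dots) is expanded through every search domain before the bare name is tried, so each failing lookup turns into several slow queries against kube-dns:

www.google.com.default.svc.cluster.local
www.google.com.svc.cluster.local
www.google.com.cluster.local
www.google.com.ec2.internal
www.google.com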

Things I have tried

  1. I’ve tried destroying the cluster and recreating it; the problem is always present.
  2. I’ve tried adding options single-request-reopen to resolv.conf on the application pods (see the snippet after this list), as discussed in https://github.com/kubernetes/kubernetes/issues/56903, but this made no difference.
  3. I’ve tried removing options ndots:5 from the application pod resolv.conf, as suggested elsewhere, but this made no difference.
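
One way to make the temporary resolv.conf change from item 2 inside a running pod (the pod name is just an example):

kubectl exec -it my-app-pod -- sh -c 'echo "options single-request-reopen" >> /etc/resolv.conf'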

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 27 (5 by maintainers)

Most upvoted comments

Not sure where in the kops provisioning process this should go… but it could probably be solved by dropping a file into /etc/modules-load.d/ to load this kernel module.

echo br_netfilter > /etc/modules-load.d/br_netfilter.conf

If using cloud-init or Ignition, there are equivalent options.
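
For example, a minimal cloud-config sketch along those lines (Ignition has an equivalent storage/files section):

#cloud-config
write_files:
  - path: /etc/modules-load.d/br_netfilter.conf
    content: |
      br_netfilter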

I fixed this in my own kops cluster by editing the cluster:

kops edit cluster stg-us-east-1.k8s.local --state s3://mybucket

and adding a hook:

  - manifest: |
      Type=oneshot
      ExecStart=/usr/sbin/modprobe br_netfilter
    name: fix-dns.service

@joelittlejohn :

I think this has to do with kubernetes/kubernetes#21613

The fix that appears to work for us is to run sudo modprobe br_netfilter on all cluster nodes.
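
To check whether a node is affected, something like this on the node shows whether the module is loaded and whether bridged traffic is visible to iptables:

lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables   # this key only exists once br_netfilter is loaded
sudo modprobe br_netfilter                  # load it for the current boot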

This affected our clusters using the CoreOS AMI as well!

The symptom is that DNS responses arrive from an unexpected source IP. When a DNS lookup is made, the client sends the query to the kube-dns Service IP but receives the response from the Pod IP, and that response is dropped because the sender doesn’t know it’s talking to the pod… only to the Service IP.

This is what it looks like when doing a lookup with dig against the Service IP:

# Symptoms: dig svc-name.svc.cluster.local  returns "reply from unexpected source" error such as:
root@example-pod-64598c547d-z9vb4:/# dig @100.64.0.12 svc-name.svc.cluster.local
;; reply from unexpected source: 100.96.3.3#53, expected 100.64.0.12#53
;; reply from unexpected source: 100.96.3.3#53, expected 100.64.0.12#53
;; reply from unexpected source: 100.96.3.3#53, expected 100.64.0.12#53
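
One way to watch the mis-addressed replies on a node, assuming tcpdump is available (flannel.1 is the default flannel VXLAN interface name):

sudo tcpdump -ni flannel.1 udp port 53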

Like @joelittlejohn, we were able to fix this at cluster creation by adding the following hook via kops edit cluster:

Under spec:

hooks:
- name: fix-dns.service
  roles:
  - Node
  - Master
  before:
  - network-pre.target
  - kubelet.service
  manifest: |
    Type=oneshot
    ExecStart=/usr/sbin/modprobe br_netfilter
    [Unit]
    Wants=network-pre.target
    [Install]
    WantedBy=multi-user.target
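
A rough sketch of rolling that out and checking it, assuming kops installs the hook as a systemd unit named after the hook:

kops update cluster stg-us-east-1.k8s.local --state s3://mybucket --yes
kops rolling-update cluster stg-us-east-1.k8s.local --state s3://mybucket --yes
# then, on any node:
systemctl status fix-dns.service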

@trinitronx Thanks, loading this module completely fixed the problem! I had found a bunch of potential solutions in the kubernetes issues list, but not this one 😂

So it looks like kops should add echo br_netfilter > /etc/modules-load.d/br_netfilter.conf (or something else that loads this module) when provisioning a CoreOS cluster, because right now the CoreOS clusters that kops creates are broken 🤔

/remove-lifecycle stale

Does anyone who has commented here know whether this problem would still affect newly built clusters using all the latest versions (kops 1.9, CoreOS, Kubernetes 1.9, flannel)?

I’m loath to just close this because it seems like such a massive bug: “DNS broken on a brand new cluster”. It’s not yet clear to me whether this should be fixed in flannel, kubernetes, or kops.

I noticed the following in the logs of CoreOS-stable-1632.2.1-hvm:

Feb 08 11:26:15 ip-10-250-101-49.eu-west-2.compute.internal kernel: bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this

It might be worth raising this on the flannel repo to get an official response.