foundryvtt-docker: Container doesn't populate resolv.conf properly

šŸ› Bug Report

I’d like to supply my own DNS server to the container using the --dns flag, but it is not picked up and written into /etc/resolv.conf. This makes it impossible to run the container in bridge networking mode.

To Reproduce

Steps to reproduce the behavior:

  • Run the container using --dns 8.8.8.8
  • The container will not run properly because it cannot resolve foundryvtt.com

Expected behavior

I’m expecting the DNS server to be written into /etc/resolv.conf. I don’t know why it isn’t; this works fine for all the other containers I’m running.

Any helpful log output

I use docker-compose

version: "3.3"

secrets:
  config_json:
    file: /share/Container/foundryvtt-secrets.json

services:
  foundry:
    image: felddy/foundryvtt:0.7.8
    hostname: foundryvtt
    mac_address: 24:5E:BE:00:00:F6
    dns:
    - 192.168.1.2
    - 8.8.8.8
    - 8.8.4.4
    networks:
        qnet-static-eth0-79e6cc:
            ipv4_address: 192.168.1.246
    volumes:
      - type: bind
        source: /share/Container/foundryvtt
        target: /data
    environment:
      - FOUNDRY_LICENSE_KEY=*
      - CONTAINER_CACHE=/data/container_cache
      - CONTAINER_PATCHES=/data/container_patches
    secrets:
      - source: config_json
        target: config.json
        
networks:
  qnet-static-eth0-79e6cc:
    external: true

Paste the results here:

Entrypoint | 2020-12-25 09:30:07 | [info] Starting felddy/foundryvtt container v0.7.8                                                                        
Entrypoint | 2020-12-25 09:30:07 | [info] Reading configured secrets from: /run/secrets/config.json                                                          
Entrypoint | 2020-12-25 09:30:09 | [info] No Foundry Virtual Tabletop installation detected.                                                                 
Entrypoint | 2020-12-25 09:30:09 | [info] Using FOUNDRY_USERNAME and FOUNDRY_PASSWORD to authenticate.                                                       
Authenticate | 2020-12-25 09:30:14 | [info] Requesting CSRF tokens from https://foundryvtt.com                                                               
Authenticate | 2020-12-25 09:30:19 | [error] Unable to authenticate: request to https://foundryvtt.com/ failed, reason: getaddrinfo EAI_AGAIN foundryvtt.com

The container’s /etc/resolv.conf contains only Docker’s embedded DNS resolver (on user-defined networks Docker always installs 127.0.0.11 here and forwards queries to the configured DNS servers itself):

nameserver 127.0.0.11
options ndots:0 

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 24 (8 by maintainers)


Most upvoted comments

I’ve resolved the DNS issue I’ve been having while running this and other Alpine based images in Kubernetes clusters on my network.

Short answer: I turned off DNSSEC for my domain name managed by Cloudflare and everything started working.

Read on for details.

Some information about my setup:

  • I use Cloudflare DNS to set up DNS TXT entries for letsencrypt so that my internal-only servers can have browser-trusted certificates.
  • I don’t use Cloudflare DNS for normal (A, AAAA, etc…) DNS records for my internal domain. I have an internal, Unbound DNS service for that.
  • Crucially, I had DNSSEC enabled for my internal domain in the Cloudflare DNS settings. I must have enabled it when I had different plans for that domain.

Some general information about what causes the problem for me (and possibly for you):

  • When Kubernetes starts a container, it adds search domains and options ndots:5 to /etc/resolv.conf inside the container
    • It copies the search domains from the host (my local domain, say, mylocaldomain.tld in my case) and adds a bunch of Kubernetes specific ones like cluster.local and svc.cluster.local.
    • This resolv.conf configuration has to do with looking up local services inside the cluster.
    • Aside: you can also override ndots to be "1" in each pod spec to solve the problem in another way
  • Now, when a DNS lookup for, say, foundryvtt.com is performed inside of a container, all of those search domains are checked first. For example, foundryvtt.com.svc.cluster.local then foundryvtt.com.cluster.local and foundryvtt.com.mylocaldomain.tld. Finally, if none of those other domains "resolve", then foundryvtt.com is checked.
    • The ...cluster.local domains are rejected by CoreDNS inside of the cluster, I guess. No beef with those.
    • foundryvtt.com.mylocaldomain.tld escapes the cluster and gets to my internal Unbound DNS server.
    • Unbound doesn’t recognize it, so passes it, transparently, to another DNS server (8.8.8.8, Google’s public DNS in my case).
      • Maybe I should configure Unbound to reject anything with that base domain that it doesn’t recognize?
    • That DNS server recognizes the mylocaldomain.tld part and asks Cloudflare how to resolve it because Cloudflare is the authority on that particular domain.
    • Cloudflare would normally respond with NXDOMAIN, which, I guess (not a DNS expert here) means "doesn't exist". Instead, because I had DNSSEC enabled, it responds with NOERROR, but doesn't respond with an actual IP address. This is something like "I can neither confirm nor deny the existence of that or related domains". Read here about how Cloudflare justifies that response.
    • That "no comment" response winds its way back to the original requestor. Any non-musl-based DNS client library would then shrug and continue looking through the search domains until it got to the implied '.' and tried 'foundryvtt.com', with a happy ending. musl will stop looking after receiving a NOERROR. Read here about how musl justifies that response.
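The ndots override mentioned in the aside above can be sketched directly in a pod spec. This is a minimal illustration (the pod name is made up, not the actual manifest from this thread):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: foundry-dns-test   # illustrative name
spec:
  containers:
    - name: foundry
      image: felddy/foundryvtt:0.7.8
  dnsConfig:
    options:
      - name: ndots
        value: "1"   # names containing at least one dot are tried as absolute first
```

With ndots:1, a lookup for foundryvtt.com is attempted as an absolute name before any search suffixes are appended, so the Cloudflare/DNSSEC interaction described above never comes into play.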

Here are some links that helped me figure this out:

I could verify that this was a problem and that my fix worked using alpine/git and dig.

Before fix:

[jdmarble@jdmarble-desktop ~]$ kubectl run alpine-git --image=alpine/git --restart=Never -it --rm clone https://github.com/octocat/Spoon-Knife.git
fatal: unable to access 'https://github.com/octocat/Spoon-Knife.git/': Could not resolve host: github.com
...

(note that github.com did not resolve inside an Alpine based container inside of the cluster)

[jdmarble@jdmarble-desktop ~]$ dig github.com.mylocaldomain.tld
...
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26637
...
;; AUTHORITY SECTION:
mylocaldomain.tld.		1720	IN	SOA	cleo.ns.cloudflare.com. dns.cloudflare.com. ...
...

(note the NOERROR response)

After fix:

[jdmarble@jdmarble-desktop ~]$ kubectl run alpine-git --image=alpine/git --restart=Never -it --rm clone https://github.com/octocat/Spoon-Knife.git
Cloning into 'Spoon-Knife'...
remote: Enumerating objects: 16, done.
remote: Total 16 (delta 0), reused 0 (delta 0), pack-reused 16
Receiving objects: 100% (16/16), done.
Resolving deltas: 100% (3/3), done.

(note that github.com resolved inside an Alpine based container inside of the cluster)

[jdmarble@jdmarble-desktop ~]$ dig github.com.mylocaldomain.tld
...
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 56469
...
;; AUTHORITY SECTION:
mylocaldomain.tld.		1044	IN	SOA	cleo.ns.cloudflare.com. dns.cloudflare.com. ...
...

(note the NXDOMAIN response)

In my case, it was an easy decision to disable DNSSEC because the domain is only used internally and I’m not using Cloudflare for normal records. If you want to keep DNSSEC on, you may have to get creative or switch away from Cloudflare.

I ported to node:12-slim to work around the problem, successfully. I’m running into a lot of DNS issues with Alpine-based images; not sure if it’s my k8s cluster’s configuration or what.

I’m seeing the same issue running in Kubernetes. Might be related to this bug in Alpine. Edit: scratch that. I rebuilt using node:12-alpine3.10 and still had the problem.

I’ll test this again on my 3 k8s clusters with the Alpine image (my default) and update here and in the other thread too. I still have 8.8.8.8 on my CoreDNS, so I’ll try both and edit this post.

My 3 clusters currently run this K8s version:

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T20:01:24Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

Running CoreDNS k8s.gcr.io/coredns/coredns:v1.8.4:

➜ k describe replicaset coredns-78fcd69978 -n kube-system
Name:           coredns-78fcd69978
Namespace:      kube-system
Selector:       k8s-app=kube-dns,pod-template-hash=78fcd69978
Labels:         k8s-app=kube-dns
                pod-template-hash=78fcd69978
Annotations:    deployment.kubernetes.io/desired-replicas: 2
                deployment.kubernetes.io/max-replicas: 3
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/coredns
Replicas:       2 current / 2 desired
Pods Status:    2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=kube-dns
                    pod-template-hash=78fcd69978
  Service Account:  coredns
  Containers:
   coredns:
    Image:       k8s.gcr.io/coredns/coredns:v1.8.4
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:               ConfigMap (a volume populated by a ConfigMap)
    Name:               coredns
    Optional:           false
  Priority Class Name:  system-cluster-critical
Events:                 <none>

Confirmed with the same error:

Authenticate | 2022-01-24 19:52:07 | [error] Unable to authenticate: request to https://foundryvtt.com/auth/login/ failed, reason: getaddrinfo EAI_AGAIN foundryvtt.com

I have found something interesting that may solve the issue.

"Though the call to dns.lookup() will be asynchronous from JavaScript’s perspective, it is implemented as a synchronous call to getaddrinfo(3) that runs on libuv’s threadpool. This can have surprising negative performance implications for some applications, see the UV_THREADPOOL_SIZE documentation for more information."

From: https://nodejs.org/api/cli.html#cli_uv_threadpool_size_size
More here: https://medium.com/@amirilovic/how-to-fix-node-dns-issues-5d4ec2e12e95

This solved my issue running 200 deployments.
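If you want to try this with a compose setup like the one in the original report, the variable can be passed through the environment section. A sketch, under the assumption that the container’s entrypoint passes the environment through to the Node process; the value 128 is just an example:

```yaml
services:
  foundry:
    image: felddy/foundryvtt:0.7.8
    environment:
      - UV_THREADPOOL_SIZE=128  # enlarge libuv's threadpool so dns.lookup() calls queue less
```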

I have not been able to fix this yet, but I suspect it may be an issue with CoreDNS.

Lookups for foundryvtt.com appear to be failing because passthrough does not seem to be working

From the CoreDNS logs:

[INFO] 10.1.182.28:51321 - 64102 "A IN foundryvtt.com.svc.cluster.local. udp 50 false 512" NXDOMAIN qr,aa,rd 143 0.000390493s
[INFO] 10.1.182.28:51321 - 41623 "A IN foundryvtt.com.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000535954s
[INFO] 10.1.182.28:51321 - 17998 "A IN foundryvtt.com.local. udp 38 false 512" SERVFAIL qr,rd,ra 113 0.03611267s

No lookups for the bare foundryvtt.com, though.
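If forwarding is the suspect, one thing to check is whether the cluster’s Corefile actually has a forward rule covering non-cluster names. A minimal sketch of the CoreDNS ConfigMap (the upstream addresses are examples, and a real Corefile will carry more plugins):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        kubernetes cluster.local in-addr.arpa ip6.arpa
        forward . 8.8.8.8 8.8.4.4   # send anything outside the cluster zones upstream
        cache 30
    }
```

If the forward line is missing or points at an unreachable resolver, cluster-internal names still resolve while external names like foundryvtt.com fail, which matches the logs above.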

In case this is helpful: I noticed that felddy/foundryvtt:improvement-debian worked fine; however, felddy/foundryvtt:latest produced the following errors:

Entrypoint | 2021-03-16 16:15:16 | [debug] Timezone set to: UTC
Entrypoint | 2021-03-16 16:15:16 | [info] Starting felddy/foundryvtt container v0.7.9
Entrypoint | 2021-03-16 16:15:16 | [debug] CONTAINER_VERBOSE set.  Debug logging enabled.
Entrypoint | 2021-03-16 16:15:16 | [info] No Foundry Virtual Tabletop installation detected.
Entrypoint | 2021-03-16 16:15:16 | [info] Using FOUNDRY_USERNAME and FOUNDRY_PASSWORD to authenticate.
Authenticate | 2021-03-16 16:15:16 | [debug] Saving cookies to: cookiejar.json
Authenticate | 2021-03-16 16:15:16 | [info] Requesting CSRF tokens from https://foundryvtt.com
Authenticate | 2021-03-16 16:15:16 | [debug] Fetching: https://foundryvtt.com
Authenticate | 2021-03-16 16:15:16 | [error] Unable to authenticate: request to https://foundryvtt.com/ failed, reason: getaddrinfo ENOTFOUND foundryvtt.com

Results Locally

Unable to find image 'node:14-alpine' locally
14-alpine: Pulling from library/node
e95f33c60a64: Pull complete 
0f691a8bb887: Pull complete 
daf9b71c0a0d: Pull complete 
d92a928c7b7d: Pull complete 
Digest: sha256:a75f7cc536062f9266f602d49047bc249826581406f8bc5a6605c76f9ed18e98
Status: Downloaded newer image for node:14-alpine
Server:         8.8.8.8
Address:        8.8.8.8:53

Non-authoritative answer:
Name:   foundryvtt.com
Address: 44.234.61.225

Non-authoritative answer:

Inside k3s (YAML included; this also worked when setting the DNS server to 8.8.8.8):

apiVersion: batch/v1
kind: Job
metadata:
  name: hello
spec:
  template:
    # This is the pod template
    spec:
      containers:
      - name: dns-test
        image: node:14-alpine
        command: ['nslookup', 'foundryvtt.com']
      restartPolicy: OnFailure
---

Server:         10.43.0.10
Address:        10.43.0.10:53

Non-authoritative answer:

Non-authoritative answer:
Name:   foundryvtt.com
Address: 44.234.61.225

Sure thing. I’ll test it this evening (or possibly tomorrow if I run out of time) and I’ll post back here

I also have this networking issue in my k3s cluster. @jdmarble’s repo worked 😄