concourse: DNS fails to resolve - missing iptables rule

Bug Report

DNS resolution fails for check/get steps, and all builds fail (yes, fail, not error) because hosts such as github.com do not resolve.

Temporarily disabling enforcement of all iptables rules on the worker makes the pipelines work again:

systemctl stop firewalld
systemctl restart concourse-worker
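
For diagnosis, the FORWARD chain is the usual suspect when container traffic is blocked. A minimal inspection sketch, assuming iptables is in PATH; the w-- prefix is Garden's usual chain prefix, but the exact chain names vary by deployment:

# Show the FORWARD policy and rules; a DROP policy with no ACCEPT
# rule covering container traffic would explain the failed resolution.
iptables -L FORWARD -n -v

# List any Garden-managed chains (names are deployment-specific).
iptables -S | grep 'w--'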

where the systemd unit file is as follows:

[Unit]
Description=concourse worker

After=suspend.target
After=hibernate.target
After=hybrid-sleep.target
After=network.target

[Service]
Type=simple
Restart=always
RestartSec=60s
Environment=CONCOURSE_WORKER_NAME=abyss
Environment=CONCOURSE_TSA_HOST=ci.spearow.io
Environment=CONCOURSE_TSA_PORT=...
Environment=CONCOURSE_TSA_PUBLIC_KEY=...
Environment=CONCOURSE_WORKER_PRIVATE_KEY=...
Environment=CONCOURSE_WORK_DIR=...
Environment=CONCOURSE_KEY_DIR=...
#Environment=CONCOURSE_GARDEN_DNS_SERVER=...
ExecStart=/usr/local/bin/concourse \
		worker \
		--name=${CONCOURSE_WORKER_NAME} \
		--work-dir=${CONCOURSE_WORK_DIR} \
		--tsa-host=${CONCOURSE_TSA_HOST} \
		--tsa-port=${CONCOURSE_TSA_PORT} \
		--tsa-public-key=${CONCOURSE_KEY_DIR}/${CONCOURSE_TSA_PUBLIC_KEY} \
		--tsa-worker-private-key=${CONCOURSE_KEY_DIR}/${CONCOURSE_WORKER_PRIVATE_KEY}
#		--garden-dns-server=${CONCOURSE_GARDEN_DNS_SERVER} # no change
#		--garden-dns-proxy-enable # causes :53 address already in use

ExecStop=-/usr/local/bin/concourse \
		land-worker \
		--name=${CONCOURSE_WORKER_NAME} \
		--tsa-host=${CONCOURSE_TSA_HOST} \
		--tsa-port=${CONCOURSE_TSA_PORT} \
		--tsa-public-key=${CONCOURSE_KEY_DIR}/${CONCOURSE_TSA_PUBLIC_KEY} \
		--tsa-worker-private-key=${CONCOURSE_KEY_DIR}/${CONCOURSE_WORKER_PRIVATE_KEY}

[Install]
WantedBy=multi-user.target
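
For reference, enabling the commented-out DNS override means uncommenting both halves, roughly as below. The 8.8.8.8 value is only an example resolver, and note that the comments in the unit above record that this flag produced no change for the original reporter:

Environment=CONCOURSE_GARDEN_DNS_SERVER=8.8.8.8
ExecStart=/usr/local/bin/concourse \
		worker \
		--name=${CONCOURSE_WORKER_NAME} \
		--work-dir=${CONCOURSE_WORK_DIR} \
		--tsa-host=${CONCOURSE_TSA_HOST} \
		--tsa-port=${CONCOURSE_TSA_PORT} \
		--tsa-public-key=${CONCOURSE_KEY_DIR}/${CONCOURSE_TSA_PUBLIC_KEY} \
		--tsa-worker-private-key=${CONCOURSE_KEY_DIR}/${CONCOURSE_WORKER_PRIVATE_KEY} \
		--garden-dns-server=${CONCOURSE_GARDEN_DNS_SERVER}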

One symptom is that the resource script run by guardian exits with status 128, which can be seen in the logs.

The worker logs, reduced to the relevant handle (a filtering sketch follows the excerpt):

Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.835707664","source":"guardian","message":"guardian.api.garden-server.get-properties.got-properties","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","session":"3.1.1651"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.871015787","source":"guardian","message":"guardian.run.started","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.871315002","source":"guardian","message":"guardian.run.exec.start","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535.2"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.871395826","source":"guardian","message":"guardian.run.exec.prepare.start","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535.2.1"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.885740280","source":"guardian","message":"guardian.run.exec.prepare.finished","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535.2.1"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.885795832","source":"guardian","message":"guardian.run.exec.execrunner.start","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535.2.2"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.891973972","source":"guardian","message":"guardian.run.exec.execrunner.read-exit-fd","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535.2.2"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.947968721","source":"guardian","message":"guardian.run.exec.execrunner.runc-exit-status","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535.2.2","status":0}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.948189020","source":"guardian","message":"guardian.run.exec.execrunner.done","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535.2.2"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.948215723","source":"guardian","message":"guardian.run.exec.finished","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535.2"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.948238611","source":"guardian","message":"guardian.run.finished","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","path":"/opt/resource/check","session":"1535"}}
Sep 27 10:13:49 redbeard concourse[3619]: {"timestamp":"1506500029.948260307","source":"guardian","message":"guardian.api.garden-server.run.spawned","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"d8e0ac2a-a542-4b91-45be-2b0d357c2f78","session":"3.1.1652","spec":{"Path":"/opt/resource/check","Dir":"","User":"root","Limits":{},"TTY":null}}}
Sep 27 10:13:55 redbeard concourse[3619]: {"timestamp":"1506500035.085594416","source":"guardian","message":"guardian.api.garden-server.run.exited","log_level":1,"data":{"handle":"bdbaab09-b183-4838-7719-4f3666896a62","id":"d8e0ac2a-a542-4b91-45be-2b0d357c2f78","session":"3.1.1652","status":128}}
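
A minimal sketch of one way to produce such a reduction, assuming the worker runs under the concourse-worker unit, as the restart command above suggests; the handle is the one from this report:

# Pull only the journal lines for the failing container's handle.
journalctl -u concourse-worker --no-pager \
	| grep -F 'bdbaab09-b183-4838-7719-4f3666896a62'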

The worker still shows up, although with a much lower container count than expected (~50):

fly -t spearow workers
name   containers  platform  tags  team  state    version
abyss  5           linux     none  none  running  1.2 

The following details may also be handy:

  • Concourse version: 3.5.0
  • Deployment type (BOSH/Docker/binary): binary
  • Infrastructure/IaaS: Fedora / CentOS / RedHat
  • Browser (if applicable): firefox
  • Did this used to work? Broken at least since 3.4.0, not tested before.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 5
  • Comments: 35 (8 by maintainers)

Most upvoted comments

I was able to get this working on Ubuntu 17.10 by running the following before starting the Concourse worker: /sbin/iptables -P FORWARD ACCEPT. That worked from v3.8.0 through v3.14.1, which we’re currently running.
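
One way to make that workaround survive restarts, sketched as a systemd drop-in; the unit name concourse-worker.service is an assumption based on the setup described above:

# /etc/systemd/system/concourse-worker.service.d/forward-accept.conf
[Service]
# Open the FORWARD policy before the worker starts.
ExecStartPre=/sbin/iptables -P FORWARD ACCEPT

Followed by systemctl daemon-reload and a restart of the worker unit.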

Is there any update on this? It is still broken on Ubuntu 18.04, even with the workaround mentioned above.

EDIT: I have been able to get it working by specifying --garden-dns-server manually on the Concourse worker binary. On AWS it seems to work by specifying the AWS-provided local DNS: --garden-dns-server 169.254.169.253

I can confirm that adding CONCOURSE_GARDEN_DNS_SERVER=8.8.8.8 to /etc/worker_environment worked for me on an Ubuntu 18.04 server with Concourse 5.1 (binary + systemd installation).
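
A sketch of how that environment file plugs into a unit like the one earlier in this issue; the file path is the commenter's own, and EnvironmentFile is the standard systemd mechanism for loading it:

# /etc/worker_environment
CONCOURSE_GARDEN_DNS_SERVER=8.8.8.8

# In the [Service] section of the worker unit:
EnvironmentFile=/etc/worker_environment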

We’ll have docs covering this soon. They’re written already but we need to wait on 5.1 before publishing them since they depend on a fix to setting garden config. 👍

On Fri, Apr 5, 2019, 10:05 AM Vasco Figueira notifications@github.com wrote:

If you’re on Ubuntu 18.04 LTS and the above-mentioned --garden-dns-server option is no longer available (Concourse 5.0.0+, perhaps earlier), you can set CONCOURSE_GARDEN_DNS_SERVER=1.1.1.1 in the shell launching your worker.

Of course, DNS servers other than 1.1.1.1 will also do, depending on the case.
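
As a shell sketch of the above, assuming the worker is launched by hand rather than via systemd; the remaining flags and variables are the same as in the unit file earlier in this issue:

export CONCOURSE_GARDEN_DNS_SERVER=1.1.1.1
# Remaining flags (name, TSA host/port, keys) as in the unit file above.
/usr/local/bin/concourse worker --work-dir="${CONCOURSE_WORK_DIR}"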


@clintjedwards no. But the standard iptables FORWARD policy is ACCEPT, so I just didn’t change it.

Ah yes, sorry. I’m using the binary distribution and managing it via systemd.

@drahnr did you find a solution to this? We are seeing what seems to be the same problem.

resource script '/opt/resource/check []' failed: exit status 128

stderr:
Cloning into '/tmp/git-resource-repo-cache'...
fatal: unable to access 'https://github.dev.mycompany.com/Repo/Name.git/': Could not resolve proxy: proxy.tpz.mycompany.com