rancher: Web service ip not resolving in nginx container

Rancher Versions: 1.3.0
Docker Version: 1.13.0
OS and where are the hosts located? cloud
Setup Details: digital ocean: 1 rancher server + 1 host
Environment Type: Cattle

Steps to Reproduce: When I deploy with the rancher-compose command, I get the following error from my nginx container:

2017/02/05 02:59:59 [emerg] 9#9: host not found in upstream "web" in /etc/nginx/conf.d/default.conf:13

This is similar to issue: https://github.com/rancher/rancher/issues/2628

The weird thing is that when I push images to a host created using docker-machine, everything works fine:

docker-compose -f staging.yml up -d # see staging.yml below

I’m not sure what difference between the Rancher-created host and my docker-machine-created host would cause this issue.

This is my rancher-compose.yml

version: '2'
services:
  web:
    scale: 1
    start_on_create: true
  nginx:
    scale: 1
    start_on_create: true
  postgres:
    scale: 1
    start_on_create: true

This is my staging.yml Docker Compose file. Note that I’m using envsubst so that I can use environment variables in my nginx conf files, as recommended by the official library/nginx documentation on Docker Hub.

version: "2"

services:
  web:
    image: myacct/web_staging:0.0.13
    stdin_open: true
    tty: true
    restart: always
    expose:
      - "8000"
    networks:
      - backend
    volumes:
      - django-static:/usr/src/collectstatic 
      - backup:/backup
    env_file: .env
    environment:
      DEBUG: 'false'
      DB_PASS: secretepw
      EMAIL_ENABLE_NOTIFICATION: 'true'
    entrypoint: /usr/src/app/backend/docker-entrypoint.sh postgres 5432
    command: /bin/bash /usr/src/app/backend/start.sh 
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.affinity: myhostname=host01

  nginx:
    image: myacct/nginx_staging:0.0.18
    restart: always
    ports:
      - "80:80"
    volumes_from:
      - web
    env_file: .env
    networks:
      - backend
    labels:
      io.rancher.sidekicks: web
      io.rancher.container.pull_image: always
      io.rancher.scheduler.affinity: myhostname=host01
    command: /bin/sh -c "envsubst < /etc/nginx/conf.d/django_project.template > /etc/nginx/conf.d/default.conf && nginx -g 'daemon off;'"

  postgres:
    restart: always
    image: myacct/postgres_staging:0.0.6
    stdin_open: true
    tty: true
    volumes:
      - pgdata:/var/lib/postgresql/data/
    networks:
      - backend
    environment:
      POSTGRES_PASSWORD: secretpw
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.affinity: myhostname=host01

volumes:
  django-static:
  pgdata:
  # used to backup and restore django data
  backup: 

networks:
  backend:

This is my nginx.conf:

user  nginx;
worker_processes  4;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

And this is my /etc/nginx/conf.d/default.conf

server {

    listen 80;
    server_name example.org;
    charset utf-8;
    client_max_body_size 1000M;

    location /static {
        alias /usr/src/collectstatic;
    }

    location / {
        proxy_pass http://web:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

}

When I look at /etc/resolv.conf in my nginx container, I get:

# cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 8.8.8.8
nameserver 8.8.4.4

Is there any more information I can provide?

Thanks. -Paul

About this issue

State: closed
Created 7 years ago
Comments: 17 (5 by maintainers)

Most upvoted comments

Just a general hint with Docker and nginx: nginx makes just one DNS lookup at service start, so if your backend container gets a new IP, you have to restart the nginx container.

You can fix this by setting a DNS resolver: resolver 169.254.169.250 valid=5s ipv6=off;

and assigning your DNS name to a variable: set $backendweb web; proxy_pass http://$backendweb:8000;

With this configuration, nginx repeats the DNS lookup every 5 seconds and does not fail on startup if your web container is not yet started or has no IP assigned.
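
For illustration, here is a minimal sketch of how those two directives might be folded into the default.conf from the question. It assumes the upstream service is still named web and listens on port 8000, and that 169.254.169.250 is the DNS address reachable from the container, as suggested above. It has not been tested against this setup.

```nginx
server {
    listen 80;
    server_name example.org;
    charset utf-8;
    client_max_body_size 1000M;

    # DNS server address taken from the comment above (assumed reachable
    # from the nginx container); answers are cached for 5 seconds.
    resolver 169.254.169.250 valid=5s ipv6=off;

    location /static {
        alias /usr/src/collectstatic;
    }

    location / {
        # Holding the host name in a variable makes nginx resolve it at
        # request time through the resolver above, not once at startup.
        set $backendweb web;
        proxy_pass http://$backendweb:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Note that the resolver directive does not use the search domains from /etc/resolv.conf, so the name may need to be fully qualified if the short form does not resolve. If the file is generated with envsubst as in the question, nginx variables such as $backendweb would also need to be protected from substitution, for example by passing envsubst an explicit list of the variables to replace.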

+10

HenryTheSir on Feb 6, 2017

I can confirm that the problem still exists on Rancher 1.6.2.

rbhaddon on Jun 28, 2017

@janeczku I know about those. The problem is Rancher not returning A records for some load balancers (<serviceName>). We think we can replicate it by removing the service (two Rancher LBs) and then redeploying it. After this is done, Rancher stops returning A records for <serviceName>; only <serviceName>.<stackName> etc. are successfully queried. Furthermore, restarting Rancher’s DNS container fixes it, but only in a single-host setup; it breaks the DNS resolution in our two-host setup, i.e. if we restart DNS on host1, host2 can’t query and vice versa.

jurajseffer on Nov 16, 2017

@jurajseffer

There are known issues with Alpine not playing nice with domain search, because it uses musl rather than glibc as its standard C library: https://github.com/gliderlabs/docker-alpine/issues/8#issuecomment-255600445. In short: <serviceName> does not resolve, but <serviceName>.<stackName> does.
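
If the nginx image is Alpine-based (an assumption; the question does not say), one way to sidestep the search-domain issue might be to use the longer name in proxy_pass. In this sketch, mystack is a hypothetical placeholder for the actual Rancher stack name:

```nginx
location / {
    # "mystack" is a hypothetical placeholder; substitute the real stack name.
    proxy_pass http://web.mystack:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```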

janeczku on Feb 14, 2017