rancher: Web service IP not resolving in nginx container

Rancher Versions: 1.3.0
Docker Version: 1.13.0
OS and where are the hosts located? cloud
Setup Details: digital ocean: 1 rancher server + 1 host
Environment Type: Cattle

Steps to Reproduce: When I deploy using the rancher-compose command, I get the following error from my nginx container:

2017/02/05 02:59:59 [emerg] 9#9: host not found in upstream "web" in /etc/nginx/conf.d/default.conf:13

This is similar to issue: https://github.com/rancher/rancher/issues/2628

The weird thing is that when I push the images to a host created using docker-machine, everything works fine:

docker-compose -f staging.yml up -d # see staging.yml below

I’m not sure what difference between the Rancher-created host and my docker-machine-created host would cause this issue.

This is my rancher-compose.yml

version: '2'
services:
  web:
    scale: 1
    start_on_create: true
  nginx:
    scale: 1
    start_on_create: true
  postgres:
    scale: 1
    start_on_create: true

This is my staging.yml Docker Compose file. Note that I’m using envsubst so that I can use environment variables in my nginx conf files, as recommended by the official library/nginx documentation on Docker Hub.

version: "2"

services:
  web:
    image: myacct/web_staging:0.0.13
    stdin_open: true
    tty: true
    restart: always
    expose:
      - "8000"
    networks:
      - backend
    volumes:
      - django-static:/usr/src/collectstatic 
      - backup:/backup
    env_file: .env
    environment:
      DEBUG: 'false'
      DB_PASS: secretepw
      EMAIL_ENABLE_NOTIFICATION: 'true'
    entrypoint: /usr/src/app/backend/docker-entrypoint.sh postgres 5432
    command: /bin/bash /usr/src/app/backend/start.sh 
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.affinity: myhostname=host01

  nginx:
    image: myacct/nginx_staging:0.0.18
    restart: always
    ports:
      - "80:80"
    volumes_from:
      - web
    env_file: .env
    networks:
      - backend
    labels:
      io.rancher.sidekicks: web
      io.rancher.container.pull_image: always
      io.rancher.scheduler.affinity: myhostname=host01
    command: /bin/sh -c "envsubst < /etc/nginx/conf.d/django_project.template > /etc/nginx/conf.d/default.conf && nginx -g 'daemon off;'"

  postgres:
    restart: always
    image: myacct/postgres_staging:0.0.6
    stdin_open: true
    tty: true
    volumes:
      - pgdata:/var/lib/postgresql/data/
    networks:
      - backend
    environment:
      POSTGRES_PASSWORD: secretpw
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.affinity: myhostname=host01

volumes:
  django-static:
  pgdata:
  # used to backup and restore django data
  backup: 

networks:
  backend:

This is my nginx.conf:

user  nginx;
worker_processes  4;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

And this is my /etc/nginx/conf.d/default.conf

server {

    listen 80;
    server_name example.org;
    charset utf-8;
    client_max_body_size 1000M;

    location /static {
        alias /usr/src/collectstatic;
    }

    location / {
        proxy_pass http://web:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

}

When I look at /etc/resolv.conf in my nginx container, I get:

# cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 8.8.8.8
nameserver 8.8.4.4

Is there any more information I can provide?

Thanks. -Paul

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 17 (5 by maintainers)

Most upvoted comments

Just a general hint with Docker and nginx: nginx makes just one DNS lookup at service start, so if your backend container gets a new IP you have to restart the nginx container.

You can fix this by setting a DNS resolver: resolver 169.254.169.250 valid=5s ipv6=off;

and putting your DNS name in a variable: set $backendweb web; proxy_pass http://$backendweb:8000;

With this configuration, nginx caches the lookup for only 5 seconds before resolving the name again, and it does not fail on startup if your web container has not started yet or has no IP assigned.
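For reference, here is a minimal sketch of how those two directives might sit in the default.conf from the original post. The resolver address and the $backendweb variable come from the comment above; the rest mirrors the poster's server block. One caveat: nginx's resolver directive does not consult the search list in /etc/resolv.conf, so whether the bare name web resolves depends on the DNS server answering for it.

```nginx
server {
    listen 80;
    server_name example.org;

    # DNS server address taken from the comment above; cache answers for 5s.
    resolver 169.254.169.250 valid=5s ipv6=off;

    location / {
        # Because proxy_pass contains a variable, nginx resolves the name
        # at request time instead of once at startup.
        set $backendweb web;
        proxy_pass http://$backendweb:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Since the lookup is deferred, nginx can start even if the web container is not up yet. Requests made before the name resolves will get an error response rather than blocking startup.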

I can confirm that the problem still exists on Rancher 1.6.2.

@janeczku I know about those. The problem is Rancher not returning A records for some load balancers (<serviceName>). We think we can replicate it by removing the service (two Rancher LBs) and then redeploying it. After this is done, Rancher stops returning A records for <serviceName>; only <serviceName>.<stackName> etc. are successfully queried. Furthermore, restarting Rancher’s DNS container fixes it, but only in a single-host setup; in our two-host setup it breaks DNS resolution. I.e. if we restart DNS on host1, host2 can’t query, and vice versa.

@jurajseffer

There are known issues with Alpine not playing nice with domain search, because it does not use glibc as its standard C library. https://github.com/gliderlabs/docker-alpine/issues/8#issuecomment-255600445 In short: <serviceName> does not resolve, but <serviceName>.<stackName> does.
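If that is the cause, one possible workaround is to point nginx at the longer name. The fragment below is only a sketch: "mystack" is a hypothetical stack name standing in for <stackName>, and it assumes the web service is reachable as <serviceName>.<stackName> as described above.

```nginx
location / {
    # "mystack" is a placeholder; substitute the actual Rancher stack name.
    proxy_pass http://web.mystack:8000;
    proxy_set_header Host $host;
}
```

This avoids relying on the domain search list inside the container, which is the part reported to misbehave on Alpine-based images.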