containerpilot: Interface missing/ignored when running with a Weave overlay network

ContainerPilot can't seem to find one of the interfaces, so it registers the wrong IP.

This is happening in Docker Cloud with a Weave overlay network. The service's interfaces list is set to ["ethwe","eth0"], and ContainerPilot always chooses eth0 even though the ethwe interface exists when I run ip addr. If I remove eth0 from the interfaces list, ContainerPilot complains that it cannot find any interfaces.

Here is an example containerpilot.json:

{
  "consul": "consul:8500",
  "preStart": "/opt/containerpilot/containerconfig.sh",
  "services": [
    {
      "name": "wo",
      "port": 56789,
      "health": "/usr/bin/curl -o /dev/null --fail -s http://localhost:56789",
      "poll": 10,
      "ttl": 25,
      "tags": [ "{{.APPNAME}}"],
      "interfaces": [
        "ethwe",
        "eth0"
      ]
    }
  ],
  "backends": [
  ]
}

And the interface list:

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
12: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:06 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.6/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:6/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
22: ethwe: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:c4:d0:37:9f:48 brd ff:ff:ff:ff:ff:ff
    inet 10.7.0.17/16 scope global ethwe
       valid_lft forever preferred_lft forever
    inet6 fe80::f8c4:d0ff:fe37:9f48/64 scope link
       valid_lft forever preferred_lft forever

Notably, if I place the command ip addr in the preStart script, it prints output identical to the above, but the service still gets registered with the eth0 inet address. If I remove eth0 from the interfaces list, I sometimes get a message that no available interfaces match, and sometimes it works.

Maybe we could add some debug logging around the interface search, sorting, and selection so that this can be tracked down.
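
Until such logging exists, something similar can be approximated from the preStart script. The sketch below is not ContainerPilot's own selection code; pick_interface is a hypothetical helper that only mimics "first name in the list that has an IPv4 address" and logs what it saw to stderr. It assumes the container has the iproute2 ip command and awk available.

```shell
#!/bin/bash

# pick_interface: hypothetical helper, not part of ContainerPilot.
# Reads `ip -o -4 addr show` output on stdin and takes the preferred
# interface names as arguments, in priority order. Logs what it sees to
# stderr and prints "<name> <ip>" for the first preferred name that has
# an IPv4 address. Returns 1 if none of the names match.
pick_interface() {
    local seen want hit
    # Reduce each line to "<name> <ip>", dropping the /prefix length.
    seen=$(awk '{ split($4, a, "/"); print $2, a[1] }')
    echo "interfaces with IPv4 addresses:" >&2
    echo "$seen" | sed 's/^/  /' >&2
    for want in "$@"; do
        # Exact match on the interface name column.
        hit=$(echo "$seen" | awk -v n="$want" '$1 == n { print; exit }')
        if [ -n "$hit" ]; then
            echo "matched preferred interface: $want" >&2
            echo "$hit"
            return 0
        fi
        echo "preferred interface not found: $want" >&2
    done
    return 1
}

# Example use from a preStart script (uncomment inside the container):
# ip -o -4 addr show | pick_interface ethwe eth0
```

If the stderr log shows ethwe missing at preStart time but present later, that would point at the interface being attached after ContainerPilot has already made its choice.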

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 16 (11 by maintainers)

Most upvoted comments

Ok, you might want to report that to the Docker project if you can figure out where to report the problem and get suggestions from them. In the meantime, I'd recommend leaving ethwe out of your containerpilot.json and then using a preStart to add it back in once you've ensured that the ethwe interface is available. Something like this:

{
  "consul": "consul:8500",
  "preStart": "/opt/containerpilot/containerconfig.sh",
  "services": [
    {
      "name": "wo",
      "port": 56789,
      "health": "/usr/bin/curl -o /dev/null --fail -s http://localhost:56789",
      "poll": 10,
      "ttl": 25,
      "tags": [ "{{.APPNAME}}"],
      "interfaces": [ "eth0" ]
    }
  ],
  "backends": [
  ]
}

With a preStart something like (probably needs some tweaking / testing!):

#!/bin/bash

# take a lock first so the rewrite only happens once,
# in case this script runs again after the reload
mkdir /tmp/lock || exit 0

while true; do
    # if we have an ethwe interface, rewrite the ContainerPilot config and SIGHUP ourselves
    ip addr | grep -q ethwe \
        && sed -i 's/"eth0"/"ethwe"/' /etc/containerpilot.json \
        && kill -SIGHUP 1 \
        && break
    # otherwise wait and retry
    sleep 1
done
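
Since this needs testing anyway, one cheap check is to try the substitution on a scratch copy before pointing it at /etc/containerpilot.json. A minimal sketch, assuming GNU sed (for -i without a suffix); rewrite_interfaces is a hypothetical wrapper name, and the s command needs its closing slash or sed rejects the expression:

```shell
#!/bin/bash

# rewrite_interfaces: hypothetical wrapper around the substitution.
# Replaces the "eth0" entry with "ethwe" in the given config file.
rewrite_interfaces() {
    sed -i 's/"eth0"/"ethwe"/' "$1"
}

# Dry run against a scratch file rather than the real config.
tmp=$(mktemp)
printf '{ "interfaces": [ "eth0" ] }\n' > "$tmp"
rewrite_interfaces "$tmp"
cat "$tmp"
rm -f "$tmp"
```

Running this outside the container shows whether the expression parses and what the interfaces list looks like afterwards, without sending any signal to PID 1.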