moby: --cap-add missing from swarm mode

Some form of --cap-add or an optional elevated-privilege system would be required for accessing GPIO pins on ARM devices. Since ARM is becoming better supported by the Docker engine, I would like to bring this to attention.

We tend to need to write to /dev/mem, and there is currently a capability for that in “regular flavoured swarm”.

I would like to build out some IoT PoCs with Docker and swarm mode, and support for this would really help. CC @DieterReuter @StefanScherer
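
For context, this is roughly what plain docker run can already express and what swarm services currently cannot. A hedged sketch; the image name gpio-app is hypothetical, and reading/writing /dev/mem generally needs CAP_SYS_RAWIO on top of access to the device node itself:

# grant the capability and pass the memory-mapped GPIO device through;
# "gpio-app" is a hypothetical image name
docker run --cap-add SYS_RAWIO --device /dev/mem gpio-app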

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 121
  • Comments: 104 (38 by maintainers)

Most upvoted comments

FYI: this is very unofficial information, but I'll try to share what I know because people are very eagerly asking about this feature.

Because Mirantis acquired Docker Enterprise and some Docker Inc employees moved there, it is currently very unclear when they will be able to get the release process working again, which is why at least I don't know what the next Docker version will be or when it will be released.

However, the whole feature is implemented and works as far as I can see, so whoever wants to test it can do so by downloading the latest nightly build of the Docker engine (dockerd) from https://master.dockerproject.org and my custom build of the Docker CLI from https://github.com/olljanat/cli/releases/tag/beta1. You can find usage examples for the CLI in https://github.com/docker/cli/pull/2199 and for Stack in https://github.com/docker/cli/pull/1940. If you find bugs in those, please leave a comment on the corresponding PR. Also note that the syntax might still change during review.
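
For anyone testing, here is a sketch of the syntax based on the linked PRs; the flag and capability spelling may still change during review:

# proposed service-level flags from docker/cli#2199 (subject to change)
docker service create --name vpn --cap-add CAP_NET_ADMIN kylemanna/openvpn
docker service update --cap-drop CAP_NET_ADMIN vpn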

FYI: I got tired of following this stalled discussion, so I started implementing it myself.

It will need multiple PRs:

  • Allow specifying an exact list of capabilities instead of adding/dropping from the default set: #38380
  • Moby bump to Swarmkit
  • Swarmkit side implementation docker/swarmkit#2795
  • Swarmkit bump to Moby
  • Another PR to Moby to support the Swarmkit-side capabilities setting: #39173
  • Swarmkit and Moby bump to docker/cli
  • Client side implementation with stack docker/cli#1940
  • Client side implementation without stack docker/cli#2199
  • Docs update.

So these will not be ready anytime soon, but maybe before summer…

+1 for at least being able to add the NET_ADMIN capability.

FWIW, Elasticsearch requires the IPC_LOCK capability, making it impossible to deploy a swarm mode stack with Elasticsearch until this is resolved…

@megastef @redhog the status and schedule in https://github.com/moby/moby/issues/25885#issuecomment-447657852 are still valid. The first part of the solution (#38380) will ship as part of API version 1.40 (which is released as part of Docker 19.03), and the rest of the solution will be part of API version 1.41 (whatever Docker version ends up containing it).

It needed two versions, as the old solution required quite a big refactor (you can see the whole discussion on #38380 if you are interested).

Need cap_add NET_ADMIN for kylemanna/openvpn on docker stack

Status update: this has been a bit on hold during the summer, but I now have the cap_add/cap_drop/privileged settings working with stack using https://github.com/docker/cli/pull/1940. PTAL and leave comments on that PR. I will create a separate PR to provide the docker service command flags.
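
For reference, a minimal stack-file sketch of what that PR enables; the compose schema version shown is an assumption and may differ by the time it is released:

# write a stack file using the proposed cap_add support and deploy it
cat > stack.yml <<'EOF'
version: "3.8"
services:
  vpn:
    image: kylemanna/openvpn
    cap_add:
      - CAP_NET_ADMIN
EOF
docker stack deploy -c stack.yml vpn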

+1

+1

@kidfrom the full implementation is already included in the Docker engine codebase and will be part of the next version. CLI support is also implemented, but unfortunately those PRs have been open for almost a year, waiting for one of the maintainers to finalize the review. The last status message is at https://github.com/moby/moby/issues/25885#issuecomment-557790402

@tconrado I have no idea who @tjmehta is or why you pinged him/her here, but from my side the ETA is still the next version. 19.03 looks to be delayed, so I assume there will not be a 19.06; most probably it is 19.09 (whose code freeze is in September), and hopefully that is still released during this year.

One of the principles of keeping Docker Swarm simple is that everything is available in service create, service update, and stack YAML. You can expect that a feature is implemented in all three. Teams have reasons for going with services-only or stacks-only, so I’d prefer not to see this diverge from the original goals. Any real-world service command is “ugly” in that it’s hundreds of characters and not easily typed, but not every use case can/will use stacks.

My vote is it’s in all the commands, or it’ll be less useful.

Any news on adding capabilities to swarm services?

Please add! Needed for Vault.

@albers Sorry for that. I just pasted the URI which I use in the compose file to pull the image from elastic.co. Their installation docs use cap_add: with - IPC_LOCK in the example file.

I would like to use keepalived in a swarm, which requires the NET_ADMIN capability.

Building on the answer from @akomelj (thank you so much for this!), I’ve expanded it slightly to better mimic privileged mode.

Looking at https://github.com/docker/swarmkit/issues/1030#issuecomment-231144514, there are more things to do, specifically regarding device mounts and applying every capability in existence. See the code below.

#!/usr/bin/python3
import json
import os
import pathlib
import sys
from typing import List

# default runc binary
NEXT_RUNC = "/usr/bin/runc"

# capabilities to add to every container
# http://man7.org/linux/man-pages/man7/capabilities.7.html
ADDITIONAL_CAPABILITIES = [
    "CAP_AUDIT_CONTROL", "CAP_AUDIT_READ", "CAP_AUDIT_WRITE", "CAP_BLOCK_SUSPEND",
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER", "CAP_FSETID",
    "CAP_IPC_LOCK", "CAP_IPC_OWNER", "CAP_KILL", "CAP_LEASE", "CAP_LINUX_IMMUTABLE",
    "CAP_MAC_ADMIN", "CAP_MAC_OVERRIDE", "CAP_MKNOD", "CAP_NET_ADMIN",
    "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST", "CAP_NET_RAW", "CAP_SETGID",
    "CAP_SETFCAP", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_CHROOT", "CAP_SYS_MODULE", "CAP_SYS_NICE", "CAP_SYS_PACCT",
    "CAP_SYS_PTRACE", "CAP_SYS_RAWIO", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
    "CAP_SYS_TTY_CONFIG", "CAP_SYSLOG", "CAP_WAKE_ALARM"
]


# mimics GetDevices in
# https://github.com/opencontainers/runc/blob/master/libcontainer/devices/devices.go
def get_devices(path: pathlib.Path) -> List[pathlib.Path]:
    result = []
    children = list(path.iterdir())
    for c in children:
        if c.is_dir():
            if c.name not in ["pts", "shm", "fd", "mqueue",
                              ".lxc", ".lxd-mounts", ".udev"]:
                result.extend(get_devices(c))
        elif c.name == "console" or c.name.startswith("video"):
            continue
        else:
            result.append(c)

    result = [d for d in result
              if d.exists() and (d.is_block_device() or d.is_char_device())]

    return result


# adds capabilities and devices to a bundle by extending its config.json
def add_capabilities(bundle, capabilities):
    with open(bundle + "/config.json") as config_file:
        config = json.load(config_file)

    config["process"]["capabilities"]["bounding"].extend(capabilities)
    config["process"]["capabilities"]["effective"].extend(capabilities)
    config["process"]["capabilities"]["inheritable"].extend(capabilities)
    config["process"]["capabilities"]["permitted"].extend(capabilities)

    for c in config["linux"]["resources"]["devices"]:
        c["allow"] = True

    # mimics WithDevices in
    # https://github.com/moby/moby/blob/master/daemon/oci_linux.go
    device_paths = get_devices(pathlib.Path("/dev/"))
    config["linux"]["devices"] = [
        {
            "type": "c",
            "path": str(d),
            "minor": os.minor(os.stat(str(d.resolve())).st_rdev),
            "access": "rwm",
            "allow": True,
            "major": os.major(os.stat(str(d.resolve())).st_rdev),
            "uid": 0,
            "gid": 0,
            "filemode": 777
        }
        for d in device_paths
    ]

    with open(bundle + "/config.json", "w") as config_file:
        json.dump(config, config_file)

    with open("/tmp/runcdebug.json", "w") as debug_file:
        json.dump(config, debug_file)


def main():
    for i in range(len(sys.argv)):
        if sys.argv[i] == "--bundle":
            bundle_filename = sys.argv[i + 1]
            add_capabilities(bundle_filename, ADDITIONAL_CAPABILITIES)
            break

    os.execv(NEXT_RUNC, sys.argv)


if __name__ == '__main__':
    main()

To apply changes, do the following:

#!/bin/sh

set -e
set -u

# runc-hack.py is the above spaghetti
cp runc-hack.py /root/runc-hack
chmod u+x /root/runc-hack

cp /etc/docker/daemon.json /etc/docker/daemon.json.old || true
if [ -f /etc/docker/daemon.json ];
    then cat /etc/docker/daemon.json
    else echo "{}"
fi \
    | jq '.+ {"runtimes": {"runc-hack": {"path": "/root/runc-hack"}},
"default-runtime": "runc-hack"}' \
    | tee /etc/docker/daemon.json.new
mv /etc/docker/daemon.json.new /etc/docker/daemon.json

systemctl daemon-reload
systemctl restart docker
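
After the restart, new tasks start under the wrapper. One way to check the result is the debug copy of the rewritten spec that the wrapper leaves behind (it is overwritten on each container start):

# inspect the capabilities injected into the most recently started container
jq '.process.capabilities.effective' /tmp/runcdebug.json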

Verified on a Swarm worker with Engine 19.03.1 on Debian 9; the master did not have this fix applied.

This is still a huge hack and it Works For Me™. Don’t use it irresponsibly. It was the least bad solution to my problem and I feel very dirty using it. But hey, it’s up to everyone to decide for themselves.

edit@2019-12-30: fixed a slight misscripting in the deployment section
edit@2020-01-10: added failing on error to the deployment script

Is the NET_ADMIN capability in swarm mode going to be a thing?

I’m trying to run a container that redirects traffic to a non-containerised destination using netfilter (iptables), and this container should be reachable through a Traefik swarm deployment, configured using just variables and stack definitions.

Scenario with NET_ADMIN caps:

Traefik → “host1 match” → container1_running_apache_service
Traefik → “host2 match” → container2_running_nextcloud_service
Traefik → “host3 match” → container3_with_net_admin_caps (redir to) → Non-containerised-destination

Scenario without them:

Traefik → “host1 match” → container1_running_apache_service
Traefik → “host2 match” → container2_running_nextcloud_service
Traefik → “host3 match” → container3_ANOTHER_PROXY → Non-containerised-destination

Any update?

+1

You can also expand the entrypoint.sh file and add the following before the # pull latest image version comment:

# does a docker login first
if [ -n "${LOGIN_USER}" ] && [ -n "${LOGIN_PASSWORD}" ]; then
  echo "Logging in"
  echo "${LOGIN_PASSWORD}" | docker login -u "${LOGIN_USER}" --password-stdin ${LOGIN_REGISTRY}
fi

For convenience, I’ve packed everything in a docker repository here: ixdotai/swarm-launcher

@arseniybanayev thanks for replying and excellent solution to this problem.

I actually had to test this, as I’m dying to get rid of the hacked runc provisioning on my Swarm, and it works flawlessly! I created a general-purpose lightweight image from docker:latest - this image simply spins up a new container based on passed-in environment variables.

In case anyone tries the same route - here are the Dockerfile and entrypoint.sh script for building your own launcher image. Admittedly, the launch could be done with a single environment variable, but I wanted to split the configuration of child containers across multiple variables just for clarity. Both files should be self-explanatory.

Dockerfile:

# official Docker (CLI) image
FROM docker:latest

# launch parameters
ENV LAUNCH_IMAGE=hello-world
ENV LAUNCH_PULL=false
ENV LAUNCH_CONTAINER_NAME=
ENV LAUNCH_PRIVILEGED=false
ENV LAUNCH_INTERACTIVE=false
ENV LAUNCH_TTY=false
ENV LAUNCH_HOST_NETWORK=false
ENV LAUNCH_ENVIRONMENT=
ENV LAUNCH_VOLUMES=
ENV LAUNCH_EXTRA_ARGS=

# add entrypoint.sh launcher script
ADD entrypoint.sh   /

# run the image
ENTRYPOINT ["/entrypoint.sh"]

entrypoint.sh:

#!/bin/sh
# pull latest image version
if [ "$LAUNCH_PULL" = true ]; then
    echo "Pulling $LAUNCH_IMAGE: docker pull $LAUNCH_IMAGE"
    docker pull $LAUNCH_IMAGE
fi

# build launch parameters
DOCKER_ARGS="run --rm"
[ -n "$LAUNCH_CONTAINER_NAME" ] && DOCKER_ARGS="$DOCKER_ARGS --name $LAUNCH_CONTAINER_NAME"
[ "$LAUNCH_PRIVILEGED" = true ] && DOCKER_ARGS="$DOCKER_ARGS --privileged"
[ "$LAUNCH_INTERACTIVE" = true ] && DOCKER_ARGS="$DOCKER_ARGS -i"
[ "$LAUNCH_TTY" = true ] && DOCKER_ARGS="$DOCKER_ARGS -t"
[ "$LAUNCH_HOST_NETWORK" = true ] && DOCKER_ARGS="$DOCKER_ARGS --net host"
[ "$LAUNCH_PRIVILEGED" = true ] && DOCKER_ARGS="$DOCKER_ARGS --privileged"
DOCKER_ARGS="$DOCKER_ARGS $LAUNCH_ENVIRONMENT $LAUNCH_VOLUMES $LAUNCH_EXTRA_ARGS $LAUNCH_IMAGE"

echo "Running $LAUNCH_IMAGE: exec docker $DOCKER_ARGS"
exec docker $DOCKER_ARGS

And here are the relevant stack parts, using the launcher image from above to launch another container.

version: "3.5"

services:
  gate:
    image: registry.aember.com:5000/aember/swarm-launcher:latest

    environment:
      LAUNCH_IMAGE: registry.aember.com:5000/sh-btq-gate:latest
      LAUNCH_PULL: "true"
      LAUNCH_PRIVILEGED: "true"
      LAUNCH_HOST_NETWORK: "true"
      LAUNCH_ENVIRONMENT: "--env INSTANCE={{.Node.Hostname}}"
      LAUNCH_VOLUMES: "-v /var/run/btq.json:/btq.json -v /docker/data/btq:/var/run/btq -v /etc/localtime:/etc/localtime:ro"

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
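
Deploying is then just a normal stack deploy; the launcher task starts the privileged child container on whichever node it lands on (the file and stack names here are placeholders):

# deploy the launcher stack; entrypoint.sh then `docker run`s the child
docker stack deploy -c docker-compose.yml btq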

openjdk (jmap -heap) needs SYS_PTRACE

+1. Cannot run the million12/haproxy service in swarm mode when cap_add NET_ADMIN is missing.

Please add! Needed for FUSE.

+1

Depending on the use case, a workaround is to bind-mount /var/run/docker.sock from the swarm host(s) into the service, then run docker run --privileged ... or docker run --cap-add ... from within the service to execute your actual privileged commands. (You’ll have to install the Docker CLI in the service’s image.) The innermost container that you docker run this way will have the privileges/capabilities of the swarm host rather than of the service, and the service just becomes a thin container layer.
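
For example, a minimal sketch of this pattern; the image names my-tools and my-vpn-image are hypothetical, and my-tools just needs the Docker CLI installed:

# service with the host's Docker socket mounted in
docker service create --name runner \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  my-tools sleep infinity

# then, from a process inside the service task, the privileged part runs
# against the host daemon rather than through swarm
docker run --rm --cap-add NET_ADMIN --device /dev/net/tun my-vpn-image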

My use case was a Jenkins agent swarm cloud (see https://github.com/jenkinsci/docker-swarm-plugin/issues/58), and I already had the host’s /var/run/docker.sock bind-mounted onto the service for doing things like docker stack deploy ..., so this was a natural workaround for running commands in a Jenkins build that required capabilities (like mounting an NFS drive for deployment).

+1 for NET_ADMIN

3 years and counting…

+1 needed for headless chrome/puppeteer

is there an ETA yet?

@albers check this gist for the config of both keepalived instances, and also the CMD of the Dockerfile: https://gist.github.com/joaquin386/44293cc729f1715601b18b5c8e6fdfda

What I saw as important was:

- Create a macvlan network (10.100.11.0) (from Puppet):

docker_network { "external-10.100.11.0":
  ensure  => present,
  driver  => 'macvlan',
  subnet  => "10.100.11.107/24",
  gateway => "10.100.11.1",
  options => ["macvlan_mode=bridge", "parent=ens160"],
}

- In the compose file, add the network:

networks:
  frontend:
    external:
      name: external-10.100.11.0

- In the compose file, on the services, add:

networks:
  frontend:
    ipv4_address: 10.100.11.107
sysctls:
  - net.ipv4.ip_nonlocal_bind=1

I have this value also because I use it for OpenVPN (I do not know if this one is needed for keepalived, but it is definitely needed for OpenVPN):

cap_add:
  - NET_ADMIN

Your container will have an eth0 interface which will be used for keepalived.

Guys, I am really trying to follow you on this, but I’m unable to, so I am asking if you could help, please; maybe @tlex or @akomelj.

What I have, as probably most of us discussed here, is a container that I need to run with cap-add=NET_ADMIN and devices=/dev/net/tun:/dev/net/tun (this is required for bringing up an OpenVPN connection from the docker worker container), OR it also works without these flags but with --privileged.

My ready-to-work images are laid down on the nodes under the name “dvv”.

This is what worked when I was establishing the OpenVPN connection externally:

sudo docker service create -e access_token=something --mode global --name "DVV" dvv

Now I want to move the connection inside the container. I’ve done it; but as I need to run all this in swarm, and swarm obviously does not support the higher privileges, I am trying to understand how to actually do this with either a YAML docker-compose file or a single command. I don’t have a lot of experience with docker service creation. I am trying the following:

sudo docker service create -e LAUNCH_IMAGE=dvv -e LAUNCH_PRIVILEGED="true" -e LAUNCH_ENVIRONMENTS="access_token=something" ixdotai/swarm-launcher:dev-master

But it does not seem to work… I think it works if I run it manually with docker run -v /var/run/docker.sock:/var/run/docker.sock ... ... but -v yet again is not supported by services… Can you please guide me through this situation, on how exactly to run a privileged container via this wrapper? Consider that I didn’t build a lot of docker services 😄

Thanks
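
For what it’s worth, the -v flag from docker run maps to --mount on docker service create; a hedged sketch reusing the variables from the command above:

# service-level equivalent of `-v /var/run/docker.sock:/var/run/docker.sock`,
# which swarm-launcher needs in order to start the privileged child container
sudo docker service create \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  -e LAUNCH_IMAGE=dvv \
  -e LAUNCH_PRIVILEGED="true" \
  -e LAUNCH_ENVIRONMENTS="access_token=something" \
  ixdotai/swarm-launcher:dev-master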

@information-security completion is not part of the PRs yet. @olljanat I can take care of bash completion when your PRs are merged.

Hello, in which version of Docker will it be available as parameters in docker compose? Thanks!

That sounds like the best option, and could be implemented by the manager explicitly sending the set of capabilities along with any task, even when the default set is requested (by whatever means that is expressed).

That’s a bit of a grey area; IIRC, there have been some discussions in the past about “altering” the create/update requests server-side. Those boiled down to: an API call to create a service, followed by an API call to inspect that service, should produce the same information (barring current ‘state’, etc.).

I commented similar things on a couple of other PRs; what would (likely) be needed is a way for the client to get the defaults from the manager/daemon, so the sequence of events would be something like:

Create a service:

  • fetch defaults
  • apply config set by user to the defaults
  • send create request to the daemon/manager

Update a service:

  • fetch current service-spec
  • apply changes set by user
  • send update request to the daemon/manager

Why not --cap-add, like it is for containers?

@prologic because then the switches on service update would be --cap-add-add and --cap-add-rm, which is ugly. It is mentioned in old comments/PRs and was the biggest reason why the original implementation was not approved a couple of years ago.

EDIT: link to original comment https://github.com/moby/moby/pull/26849#discussion_r80228719

@trajano there is a proposal at docker/swarmkit#2682

Comment there if it fits your needs?

EDIT: there now looks to be a suggested solution in this message: https://github.com/moby/moby/issues/24862#issuecomment-428308152

Any update on this? Are there any plans to include the feature in swarm mode?

I’m currently using swarm standalone to deploy my containers with cap_add in a cloud, but I’m encountering many issues… swarm mode would ease the pain.

Please give us an ETA on this.

thanks

It would be worth trying. I checked with the guys at Pimoroni, and they advised against using these interfaces, citing high latency.