moby: --cap-add missing from swarm mode

Some form of --cap-add or an optional elevated-privilege system would be required for accessing GPIO pins on ARM devices. Since ARM is becoming better supported by the Docker engine, I would like to bring this to attention.

We tend to need to write to /dev/mem, and there is currently a capability for that in “regular flavoured swarm”.

I would like to build out some IoT PoCs with Docker and swarm mode, and support for this would really help. CC @DieterReuter @StefanScherer
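
For context, this is roughly what plain docker run can already express and what swarm services currently cannot. A hedged sketch; the image name gpio-app is hypothetical, and reading/writing /dev/mem generally needs CAP_SYS_RAWIO on top of access to the device node itself:

# grant the capability and pass the memory-mapped GPIO device through;
# "gpio-app" is a hypothetical image name
docker run --cap-add SYS_RAWIO --device /dev/mem gpio-app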

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 121
  • Comments: 104 (38 by maintainers)

Most upvoted comments

FYI: this is very unofficial information, but I'll try to share what I know because people are very eagerly asking about this feature.

Because Mirantis acquired Docker Enterprise and some Docker Inc employees moved there, it is currently very unclear when they will be able to get the release process working again, which is why at least I don't know what the next Docker version will be or when it will be released.

However, the whole feature is implemented and works as far as I can see, so whoever wants to test it can do so by downloading the latest nightly build of the Docker engine (dockerd) from https://master.dockerproject.org and my custom build of the Docker CLI from https://github.com/olljanat/cli/releases/tag/beta1. You can find usage examples for the CLI in https://github.com/docker/cli/pull/2199 and for Stack in https://github.com/docker/cli/pull/1940. If you find bugs in those, please leave a comment on the corresponding PR. Also note that the syntax might still change during review.
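
For anyone testing, here is a sketch of the syntax based on the linked PRs; the flag and capability spelling may still change during review:

# proposed service-level flags from docker/cli#2199 (subject to change)
docker service create --name vpn --cap-add CAP_NET_ADMIN kylemanna/openvpn
docker service update --cap-drop CAP_NET_ADMIN vpn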

FYI: I got tired of following this stalled discussion, so I started implementing it myself.

It will need multiple PRs:

  • Allow specifying an exact list of capabilities instead of adding/dropping from the default set: #38380
  • Moby bump to Swarmkit
  • Swarmkit side implementation docker/swarmkit#2795
  • Swarmkit bump to Moby
  • Another PR to Moby to support the Swarmkit-side capabilities setting: #39173
  • Swarmkit and Moby bump to docker/cli
  • Client side implementation with stack docker/cli#1940
  • Client side implementation without stack docker/cli#2199
  • Docs update.

So these will not be ready anytime soon, but maybe before summer…

+1 for at least being able to add the NET_ADMIN capability.

FWIW, Elasticsearch requires the IPC_LOCK capability, making it impossible to deploy a swarm mode stack with Elasticsearch until this is resolved…

@megastef @redhog the status and schedule in https://github.com/moby/moby/issues/25885#issuecomment-447657852 are still valid. The first part of the solution (#38380) will ship as part of API version 1.40 (which is released as part of Docker 19.03), and the rest of the solution will be part of API version 1.41 (whatever Docker version ends up containing it).

It needed two versions, as the old solution required quite a big refactor (you can see the whole discussion on #38380 if you are interested).

Need cap_add NET_ADMIN for kylemanna/openvpn on docker stack

Status update: this has been a bit on hold during the summer, but I now have the cap_add/cap_drop/privileged settings working with stack using https://github.com/docker/cli/pull/1940. PTAL and leave comments on that PR. I will create a separate PR to provide the docker service command flags.
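
For reference, a minimal stack-file sketch of what that PR enables; the compose schema version shown is an assumption and may differ by the time it is released:

# write a stack file using the proposed cap_add support and deploy it
cat > stack.yml <<'EOF'
version: "3.8"
services:
  vpn:
    image: kylemanna/openvpn
    cap_add:
      - CAP_NET_ADMIN
EOF
docker stack deploy -c stack.yml vpn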

+1

+1

@kidfrom the full implementation is already included in the Docker engine codebase and will be part of the next version. CLI support is also implemented, but unfortunately those PRs have been open for almost a year, waiting for one of the maintainers to finalize the review. The last status message is at https://github.com/moby/moby/issues/25885#issuecomment-557790402

@tconrado I have no idea who @tjmehta is or why you pinged him/her here, but from my side the ETA is still the next version. 19.03 looks to be delayed, so I assume there will not be a 19.06; most probably it is 19.09 (whose code freeze is in September), and hopefully that is still released during this year.

One of the principles of keeping Docker Swarm simple is that everything is available in service create, service update, and stack YAML. You can expect that a feature is implemented in all three. Teams have reasons for going with services-only or stacks-only, so I’d prefer not to see this diverge from the original goals. Any real-world service command is “ugly” in that it’s hundreds of characters and not easily typed, but not every use case can/will use stacks.

My vote is it’s in all the commands, or it’ll be less useful.

Any news on adding capabilities to swarm services?

Please add! Needed for Vault.

@albers Sorry for that. I just pasted the URI which I use in the compose file to pull the image from elastic.co. Their installation docs use cap_add: with - IPC_LOCK in the example file.

I would like to use keepalived in a swarm, which requires the NET_ADMIN capability.

Building on the answer from @akomelj (thank you so much for this!), I’ve expanded it slightly to better mimic privileged mode.

Looking at https://github.com/docker/swarmkit/issues/1030#issuecomment-231144514, there are more things to do, specifically regarding device mounts and applying every capability in existence. See the code below.

#!/usr/bin/python3
import json
import os
import pathlib
import sys
from typing import List

# default runc binary
NEXT_RUNC = "/usr/bin/runc"

# capabilities to add to every container
# http://man7.org/linux/man-pages/man7/capabilities.7.html
ADDITIONAL_CAPABILITIES = [
    "CAP_AUDIT_CONTROL", "CAP_AUDIT_READ", "CAP_AUDIT_WRITE", "CAP_BLOCK_SUSPEND",
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER", "CAP_FSETID",
    "CAP_IPC_LOCK", "CAP_IPC_OWNER", "CAP_KILL", "CAP_LEASE", "CAP_LINUX_IMMUTABLE",
    "CAP_MAC_ADMIN", "CAP_MAC_OVERRIDE", "CAP_MKNOD", "CAP_NET_ADMIN",
    "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST", "CAP_NET_RAW", "CAP_SETGID",
    "CAP_SETFCAP", "CAP_SETPCAP", "CAP_SETUID", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_CHROOT", "CAP_SYS_MODULE", "CAP_SYS_NICE", "CAP_SYS_PACCT",
    "CAP_SYS_PTRACE", "CAP_SYS_RAWIO", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
    "CAP_SYS_TTY_CONFIG", "CAP_SYSLOG", "CAP_WAKE_ALARM"
]


# mimics GetDevices in
# https://github.com/opencontainers/runc/blob/master/libcontainer/devices/devices.go
def get_devices(path: pathlib.Path) -> List[pathlib.Path]:
    result = []
    children = list(path.iterdir())
    for c in children:
        if c.is_dir():
            if c.name not in ["pts", "shm", "fd", "mqueue",
                              ".lxc", ".lxd-mounts", ".udev"]:
                result.extend(get_devices(c))
        elif c.name == "console" or c.name.startswith("video"):
            continue
        else:
            result.append(c)

    result = [d for d in result
              if d.exists() and (d.is_block_device() or d.is_char_device())]

    return result


# adds capabilities and devices to a bundle by extending its config.json
def add_capabilities(bundle, capabilities):
    with open(bundle + "/config.json") as config_file:
        config = json.load(config_file)

    config["process"]["capabilities"]["bounding"].extend(capabilities)
    config["process"]["capabilities"]["effective"].extend(capabilities)
    config["process"]["capabilities"]["inheritable"].extend(capabilities)
    config["process"]["capabilities"]["permitted"].extend(capabilities)

    for c in config["linux"]["resources"]["devices"]:
        c["allow"] = True

    # mimics WithDevices in
    # https://github.com/moby/moby/blob/master/daemon/oci_linux.go
    device_paths = get_devices(pathlib.Path("/dev/"))
    config["linux"]["devices"] = [
        {
            "type": "c",
            "path": str(d),
            "minor": os.minor(os.stat(str(d.resolve())).st_rdev),
            "access": "rwm",
            "allow": True,
            "major": os.major(os.stat(str(d.resolve())).st_rdev),
            "uid": 0,
            "gid": 0,
            "filemode": 777
        }
        for d in device_paths
    ]

    with open(bundle + "/config.json", "w") as config_file:
        json.dump(config, config_file)

    with open("/tmp/runcdebug.json", "w") as debug_file:
        json.dump(config, debug_file)


def main():
    for i in range(len(sys.argv)):
        if sys.argv[i] == "--bundle":
            bundle_filename = sys.argv[i + 1]
            add_capabilities(bundle_filename, ADDITIONAL_CAPABILITIES)
            break

    os.execv(NEXT_RUNC, sys.argv)


if __name__ == '__main__':
    main()

To apply changes, do the following:

#!/bin/sh

set -e
set -u

# runc-hack.py is the above spaghetti
cp runc-hack.py /root/runc-hack
chmod u+x /root/runc-hack

cp /etc/docker/daemon.json /etc/docker/daemon.json.old || true
if [ -f /etc/docker/daemon.json ];
    then cat /etc/docker/daemon.json
    else echo "{}"
fi \
    | jq '.+ {"runtimes": {"runc-hack": {"path": "/root/runc-hack"}},
"default-runtime": "runc-hack"}' \
    | tee /etc/docker/daemon.json.new
mv /etc/docker/daemon.json.new /etc/docker/daemon.json

systemctl daemon-reload
systemctl restart docker
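
After the restart, new tasks start under the wrapper. One way to check the result is the debug copy of the rewritten spec that the wrapper leaves behind (it is overwritten on each container start):

# inspect the capabilities injected into the most recently started container
jq '.process.capabilities.effective' /tmp/runcdebug.json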

Verified on a Swarm worker with Engine 19.03.1 on Debian 9; the master did not have this fix applied.

This is still a huge hack and it Works For Me™. Don’t use it irresponsibly. It was the least bad solution to my problem and I feel very dirty using it. But hey, it’s up to everyone to decide for themselves.

edit@2019-12-30: fixed a slight misscripting in the deployment section
edit@2020-01-10: added failing on error to the deployment script

Is the NET_ADMIN capability in swarm mode going to be a thing?

I’m trying to run a container that redirects traffic to a non-containerised destination using netfilter (iptables), and this container should be reachable through a Traefik swarm deployment, configured using just variables and stack definitions.

Scenario with NET_ADMIN caps:

Traefik → “host1 match” → container1_running_apache_service
Traefik → “host2 match” → container2_running_nextcloud_service
Traefik → “host3 match” → container3_with_net_admin_caps (redir to) → Non-containerised-destination

Scenario without them:

Traefik → “host1 match” → container1_running_apache_service
Traefik → “host2 match” → container2_running_nextcloud_service
Traefik → “host3 match” → container3_ANOTHER_PROXY → Non-containerised-destination

Any update?

+1

You can also expand the entrypoint.sh file and add the following before the # pull latest image version comment:

# does a docker login first
if [ -n "${LOGIN_USER}" ] && [ -n "${LOGIN_PASSWORD}" ]; then
  echo "Logging in"
  echo "${LOGIN_PASSWORD}" | docker login -u "${LOGIN_USER}" --password-stdin ${LOGIN_REGISTRY}
fi

For convenience, I’ve packed everything in a docker repository here: ixdotai/swarm-launcher

@arseniybanayev thanks for replying and excellent solution to this problem.

I actually had to test this, as I’m dying to get rid of the hacked runc provisioning on my Swarm, and it works flawlessly! I created a general-purpose lightweight image from docker:latest - this image simply spins up a new container based on passed-in environment variables.

In case anyone tries the same route - here are the Dockerfile and entrypoint.sh script for building your own launcher image. Admittedly, the launch could be done with a single environment variable, but I wanted to split the configuration of child containers across multiple variables just for clarity. Both files should be self-explanatory.

Dockerfile:

# official Docker (CLI) image
FROM docker:latest

# launch parameters
ENV LAUNCH_IMAGE=hello-world
ENV LAUNCH_PULL=false
ENV LAUNCH_CONTAINER_NAME=
ENV LAUNCH_PRIVILEGED=false
ENV LAUNCH_INTERACTIVE=false
ENV LAUNCH_TTY=false
ENV LAUNCH_HOST_NETWORK=false
ENV LAUNCH_ENVIRONMENT=
ENV LAUNCH_VOLUMES=
ENV LAUNCH_EXTRA_ARGS=

# add entrypoint.sh launcher script
ADD entrypoint.sh   /

# run the image
ENTRYPOINT ["/entrypoint.sh"]

entrypoint.sh:

#!/bin/sh
# pull latest image version
if [ "$LAUNCH_PULL" = true ]; then
    echo "Pulling $LAUNCH_IMAGE: docker pull $LAUNCH_IMAGE"
    docker pull $LAUNCH_IMAGE
fi

# build launch parameters
DOCKER_ARGS="run --rm"
[ -n "$LAUNCH_CONTAINER_NAME" ] && DOCKER_ARGS="$DOCKER_ARGS --name $LAUNCH_CONTAINER_NAME"
[ "$LAUNCH_PRIVILEGED" = true ] && DOCKER_ARGS="$DOCKER_ARGS --privileged"
[ "$LAUNCH_INTERACTIVE" = true ] && DOCKER_ARGS="$DOCKER_ARGS -i"
[ "$LAUNCH_TTY" = true ] && DOCKER_ARGS="$DOCKER_ARGS -t"
[ "$LAUNCH_HOST_NETWORK" = true ] && DOCKER_ARGS="$DOCKER_ARGS --net host"
[ "$LAUNCH_PRIVILEGED" = true ] && DOCKER_ARGS="$DOCKER_ARGS --privileged"
DOCKER_ARGS="$DOCKER_ARGS $LAUNCH_ENVIRONMENT $LAUNCH_VOLUMES $LAUNCH_EXTRA_ARGS $LAUNCH_IMAGE"

echo "Running $LAUNCH_IMAGE: exec docker $DOCKER_ARGS"
exec docker $DOCKER_ARGS

And here are the relevant stack parts, using the launcher image from above to launch another container.

version: "3.5"

services:
  gate:
    image: registry.aember.com:5000/aember/swarm-launcher:latest

    environment:
      LAUNCH_IMAGE: registry.aember.com:5000/sh-btq-gate:latest
      LAUNCH_PULL: "true"
      LAUNCH_PRIVILEGED: "true"
      LAUNCH_HOST_NETWORK: "true"
      LAUNCH_ENVIRONMENT: "--env INSTANCE={{.Node.Hostname}}"
      LAUNCH_VOLUMES: "-v /var/run/btq.json:/btq.json -v /docker/data/btq:/var/run/btq -v /etc/localtime:/etc/localtime:ro"

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
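
Deploying is then just a normal stack deploy; the launcher task starts the privileged child container on whichever node it lands on (the file and stack names here are placeholders):

# deploy the launcher stack; entrypoint.sh then `docker run`s the child
docker stack deploy -c docker-compose.yml btq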

openjdk (jmap -heap) needs SYS_PTRACE

+1. Cannot run the million12/haproxy service in swarm mode when cap_add NET_ADMIN is missing.

Please add! Needed for FUSE.

+1

Depending on the use case, a workaround is to bind-mount /var/run/docker.sock from the swarm host(s) into the service, then run docker run --privileged ... or docker run --cap-add ... from within the service to execute your actual privileged commands. (You’ll have to install the Docker CLI in the service’s image.) The innermost container that you docker run this way will have the privileges/capabilities of the swarm host rather than of the service, and the service just becomes a thin container layer.
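
For example, a minimal sketch of this pattern; the image names my-tools and my-vpn-image are hypothetical, and my-tools just needs the Docker CLI installed:

# service with the host's Docker socket mounted in
docker service create --name runner \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  my-tools sleep infinity

# then, from a process inside the service task, the privileged part runs
# against the host daemon rather than through swarm
docker run --rm --cap-add NET_ADMIN --device /dev/net/tun my-vpn-image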

My use case was a Jenkins agent swarm cloud (see https://github.com/jenkinsci/docker-swarm-plugin/issues/58), and I already had the host’s /var/run/docker.sock bind-mounted onto the service for doing things like docker stack deploy ..., so this was a natural workaround for running commands in a Jenkins build that required capabilities (like mounting an NFS drive for deployment).

+1 for NET_ADMIN

3 years and counting…

+1 needed for headless chrome/puppeteer

is there an ETA yet?

@albers check this gist for the config of both keepalived instances, and also the CMD of the Dockerfile: https://gist.github.com/joaquin386/44293cc729f1715601b18b5c8e6fdfda

What I saw as important was:

- Create a macvlan network (10.100.11.0) (from Puppet):

docker_network { "external-10.100.11.0":
  ensure  => present,
  driver  => 'macvlan',
  subnet  => "10.100.11.107/24",
  gateway => "10.100.11.1",
  options => ["macvlan_mode=bridge", "parent=ens160"],
}

- In the compose file, add the network:

networks:
  frontend:
    external:
      name: external-10.100.11.0

- In the compose file, on the services, add:

networks:
  frontend:
    ipv4_address: 10.100.11.107
sysctls:
  - net.ipv4.ip_nonlocal_bind=1

I have this value also because I use it for OpenVPN (I do not know if this one is needed for keepalived, but it is definitely needed for OpenVPN):

cap_add:
  - NET_ADMIN

Your container will have an eth0 interface which will be used for keepalived.

Guys, I am really trying to follow you on this, but I’m unable to, so I am asking if you could help, please; maybe @tlex or @akomelj.

What I have, as probably most of us discussed here, is a container that I need to run with cap-add=NET_ADMIN and devices=/dev/net/tun:/dev/net/tun (this is required for bringing up an OpenVPN connection from the docker worker container), OR it also works without these flags but with --privileged.

My ready-to-work images are laid down on the nodes under the name “dvv”.

This is what worked when I was establishing the OpenVPN connection externally:

sudo docker service create -e access_token=something --mode global --name "DVV" dvv

Now I want to move the connection inside the container. I’ve done it; but as I need to run all this in swarm, and swarm obviously does not support the higher privileges, I am trying to understand how to actually do this with either a YAML docker-compose file or a single command. I don’t have a lot of experience with docker service creation. I am trying the following:

sudo docker service create -e LAUNCH_IMAGE=dvv -e LAUNCH_PRIVILEGED="true" -e LAUNCH_ENVIRONMENTS="access_token=something" ixdotai/swarm-launcher:dev-master

But it does not seem to work… I think it works if I run it manually with docker run -v /var/run/docker.sock:/var/run/docker.sock ... ... but -v yet again is not supported by services… Can you please guide me through this situation, on how exactly to run a privileged container via this wrapper? Consider that I didn’t build a lot of docker services 😄

Thanks
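
For what it’s worth, the -v flag from docker run maps to --mount on docker service create; a hedged sketch reusing the variables from the command above:

# service-level equivalent of `-v /var/run/docker.sock:/var/run/docker.sock`,
# which swarm-launcher needs in order to start the privileged child container
sudo docker service create \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  -e LAUNCH_IMAGE=dvv \
  -e LAUNCH_PRIVILEGED="true" \
  -e LAUNCH_ENVIRONMENTS="access_token=something" \
  ixdotai/swarm-launcher:dev-master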

@information-security completion is not part of the PRs yet. @olljanat I can take care of bash completion when your PRs are merged.

Hello, in which version of Docker will it be available as parameters in docker compose? Thanks!

That sounds like the best option, and could be implemented by the manager explicitly sending the set of capabilities along with any task, even when the default set is requested (by whatever means that is expressed).

That’s a bit of a grey area; IIRC, there have been some discussions in the past about “altering” the create/update requests server-side. Those boiled down to: an API call to create a service, followed by an API call to inspect that service, should produce the same information (barring current ‘state’, etc.).

I commented similar things on a couple of other PRs; what would (likely) be needed is a way for the client to get the defaults from the manager/daemon, so the sequence of events would be something like:

Create a service:

  • fetch defaults
  • apply config set by user to the defaults
  • send create request to the daemon/manager

Update a service:

  • fetch current service-spec
  • apply changes set by user
  • send update request to the daemon/manager

Why not --cap-add, like it is for containers?

@prologic because then the switches on service update would be --cap-add-add and --cap-add-rm, which is ugly. It is mentioned in old comments/PRs and was the biggest reason why the original implementation was not approved a couple of years ago.

EDIT: link to original comment https://github.com/moby/moby/pull/26849#discussion_r80228719

@trajano there is a proposal at docker/swarmkit#2682

Comment there if it fits your needs?

EDIT: there now looks to be a suggested solution in this message: https://github.com/moby/moby/issues/24862#issuecomment-428308152

Any update on this? Are there any plans to include the feature in swarm mode?

I’m currently using swarm standalone to deploy my containers with cap_add in a cloud, but I’m encountering many issues… swarm mode would ease the pain.

Please give us an ETA on this.

thanks

It would be worth trying. I checked with the guys at Pimoroni, and they advised against using these interfaces, citing high latency.