libnetwork: Cannot start container: subnet sandbox join failed for "10.0.0.0/24": error creating vxlan interface: file exists

(Similar, but closed, issues: #562, #751)

Very occasionally, I see this error message when starting a container in my swarm:

Error response from daemon: Cannot start container <container hash>: subnet sandbox join failed for "10.0.0.0/24": error creating vxlan interface: file exists

This error persists until I reboot the Docker hosts. A comment on #751 suggested that restarting iptables would suffice; I have not tried this yet. I have also previously tried the solution mentioned in #562, and I believe it worked, but I cannot remember for sure.
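
For anyone who hits this before rebooting: "file exists" is how Go prints the kernel's EEXIST error, so it may be useful to record which VXLAN interfaces already exist on the affected host and attach that to a report. The following is a minimal sketch, assuming iproute2 is installed. The helper names `vxlan_names` and `list_host_vxlan` are hypothetical, and the interface name in the comment is a made-up sample, not output from my hosts.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Docker or libnetwork.

# Print interface names from `ip -o link show` style output read on stdin.
# Each input line looks like (made-up sample):
#   12: vx-000100-abcde: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 ...
vxlan_names() {
    # Field 2 is the interface name; strip any "@peer" suffix.
    awk -F': ' '{ sub(/@.*/, "", $2); print $2 }'
}

# List VXLAN interfaces visible in the host's network namespace.
list_host_vxlan() {
    ip -o link show type vxlan | vxlan_names
}
```

Running `list_host_vxlan` as root on the failing host prints one interface name per line. Whether any interface listed there is actually the one causing the conflict is not something this sketch can determine.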

docker version:

Client:
 Version:      1.10.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   590d5108
 Built:        Thu Feb  4 18:18:11 2016
 OS/Arch:      darwin/amd64

Server:
 Version:      swarm/1.1.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   a0fd82b
 Built:        Thu Feb  4 08:55:18 UTC 2016
 OS/Arch:      linux/amd64

docker info:

Note: this was not captured when the error was occurring. If it happens again, I will comment with the info.

Containers: 32
 Running: 29
 Paused: 0
 Stopped: 3
Images: 222
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 4
 druid-vms.historical1: <host3>:2376
  └ Status: Healthy
  └ Containers: 4
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 8.188 GiB
  └ Labels: executiondriver=native-0.2, historical=yes, kernelversion=3.16.0-30-generic, operatingsystem=Ubuntu 14.04.2 LTS, provider=generic, storagedriver=aufs
  └ Error: (none)
  └ UpdatedAt: 2016-02-11T15:00:03Z
 druid-vms.historical2: <host4>:2376
  └ Status: Healthy
  └ Containers: 4
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 8.188 GiB
  └ Labels: executiondriver=native-0.2, historical=yes, kernelversion=3.16.0-30-generic, operatingsystem=Ubuntu 14.04.2 LTS, provider=generic, storagedriver=aufs
  └ Error: (none)
  └ UpdatedAt: 2016-02-11T15:00:24Z
 druid-vms.ingestion1: <host2>:2376
  └ Status: Healthy
  └ Containers: 5
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 8.188 GiB
  └ Labels: executiondriver=native-0.2, ingestion=yes, kernelversion=3.16.0-30-generic, operatingsystem=Ubuntu 14.04.2 LTS, provider=generic, storagedriver=aufs
  └ Error: (none)
  └ UpdatedAt: 2016-02-11T14:59:45Z
 druid-vms.queen: <host1>:2376
  └ Status: Healthy
  └ Containers: 19
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 8.188 GiB
  └ Labels: broker=yes, consul=yes, deployment-master=yes, executiondriver=native-0.2, kernelversion=3.16.0-30-generic, metadata=yes, operatingsystem=Ubuntu 14.04.2 LTS, provider=generic, storagedriver=aufs, swarm-master=yes
  └ Error: (none)
  └ UpdatedAt: 2016-02-11T15:00:22Z
Plugins: 
 Volume: 
 Network: 
Kernel Version: 3.16.0-30-generic
Operating System: linux
Architecture: amd64
CPUs: 8
Total Memory: 32.75 GiB
Name: druid-vms.queen
Http Proxy: <snip>
Https Proxy: <snip>
No Proxy: <snip>

uname -a:

Local: Darwin <hostname> 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64
Docker hosts: Linux <hostname> 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.):

Docker Swarm, 4 pre-existing VMs, generic driver

How reproducible:

Intermittent; not reliably reproducible (and I wouldn’t begin to know how to try). Seems to occur sometime after destroying and re-creating the swarm using docker-machine.

About this issue

State: closed
Created 8 years ago
Reactions: 3
Comments: 34 (7 by maintainers)

Commits related to this issue

changelog for 0.7.0-rc.1 - Fixes https://github.com/docker/libnetwork/issues/985 - Fixes https://github.com/docker/libnetwork/issues/945 - Log time taken to set sandbox key - Limit number of concurre... — committed to mavenugo/libnetwork by mavenugo 8 years ago
Vendor Libnetwork v0.7.0-rc.1 - Fixes https://github.com/docker/libnetwork/issues/1051 - Fixes https://github.com/docker/libnetwork/issues/985 - Fixes https://github.com/docker/libnetwork/issues/945 ... — committed to mavenugo/docker by mavenugo 8 years ago
Vendor Libnetwork v0.7.0-rc.1 - Fixes https://github.com/docker/libnetwork/issues/1051 - Fixes https://github.com/docker/libnetwork/issues/985 - Fixes https://github.com/docker/libnetwork/issues/945 ... — committed to tiborvass/docker by mavenugo 8 years ago

Most upvoted comments

This is still occurring on 1.11.1 (the "file exists" problem, specifically).

+11

brettdh on May 16, 2016

I just ran into this.

Docker version 17.05.0-ce, build 89658be

Restarting the docker daemon is not fixing this.

starting container failed: subnet sandbox join failed for "10.0.2.0/24": error creating vxlan interface: file exists

alexanderkjeldaas on May 10, 2017

Still happening on Docker Swarm on Ubuntu 18.04. The solution that worked for me was to remove the stack and redeploy it:

docker stack rm jenkins
docker stack deploy -c docker-compose.yml jenkins
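
The remove-and-redeploy workaround above can be wrapped in a small helper. This is a minimal sketch, not a confirmed fix: `redeploy_stack` is a hypothetical function name, and the wait between the two commands is an assumption on my part (as far as I know `docker stack rm` returns before the stack's networks are fully cleaned up), not something the comment describes. It relies on the `com.docker.stack.namespace` label that stack deployments put on their networks.

```shell
#!/bin/sh
# Sketch only: redeploy_stack is a hypothetical helper, not a Docker command.

# Remove a stack, wait for its networks to disappear, then deploy it again.
# usage: redeploy_stack <stack-name> <compose-file>
redeploy_stack() {
    stack=$1
    compose=$2

    docker stack rm "$stack" || return 1

    # Poll (up to about 60 s) until no network labelled with this stack's
    # namespace remains. The wait is an assumption, see the note above.
    tries=0
    while [ -n "$(docker network ls -q --filter "label=com.docker.stack.namespace=$stack")" ]; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && return 1
        sleep 2
    done

    docker stack deploy -c "$compose" "$stack"
}
```

With the names from the comment, the call would be `redeploy_stack jenkins docker-compose.yml`. The function returns non-zero if removal fails or the networks are still present after the timeout, so the deploy is skipped in those cases.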

 Debug Mode: false

Server:
 Containers: 50
  Running: 30
  Paused: 0
  Stopped: 20
 Images: 114
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: sywquid85sgsma3puf4gw6u68
  Is Manager: true
  ClusterID: zh6x2htbsuevss97ytz2kbkn5
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: true
  Root Rotation In Progress: false
  Node Address: 51.77.42.145
  Manager Addresses:
   51.77.42.145:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-74-generic
 Operating System: Ubuntu 18.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.2GiB
 Name: ns3145583
 ID: O3MT:IN6V:IFUN:MMQ4:77FH:H7A2:CUUP:3ZIU:3FSS:JKBW:JADU:SKQ3
 Docker Root Dir: /home/dockerd
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

ProteanCode on Jan 16, 2020

@Michael-Hamburger / all: this will be fixed by https://github.com/docker/libnetwork/pull/1574

I’ve applied this patch manually and haven’t had any issues so far.

marcosnils on Nov 29, 2016

I’ve just experienced it in a Swarm cluster running 1.12.3.

subnet sandbox join failed for \"10.0.2.0/24\": error creating vxlan interface: file exists

It happened when I ran:

docker service rm logspout
docker service create --mode global --name logspout ...

bvis on Nov 4, 2016

Closed via https://github.com/docker/libnetwork/pull/1065 and is vendored into docker/docker.

Can someone try docker/docker master and confirm the fix?

mavenugo on Mar 31, 2016

This happened to us again. Restarting Docker didn’t help, and neither did stopping Docker, flushing iptables, and starting Docker again. We had to reboot the machine.

tomashejatko on Mar 2, 2016