libnetwork: Cannot start container: subnet sandbox join failed for "10.0.0.0/24": error creating vxlan interface: file exists

(Similar, but closed, issues: #562, #751)

Very occasionally, I see this error message when starting a container in my swarm:

Error response from daemon: Cannot start container <container hash>: subnet sandbox join failed for "10.0.0.0/24": error creating vxlan interface: file exists

This error persists until I reboot the Docker hosts. A comment on #751 suggested that restarting iptables would suffice; I have not tried this yet. I have also previously tried the solution mentioned in #562, and I believe it worked, but I cannot remember for sure.
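
As a diagnostic aid, here is a minimal sketch for spotting leftover vxlan interfaces on a Docker host. It rests on an assumption that is not stated in this report: that "file exists" means the kernel refused to create a vxlan device because one with the same name or VXLAN ID is still present. The helper name `list_vxlan_ifaces` is hypothetical; it only parses the one-line-per-interface output of `ip -o -d link show` and changes nothing on the host.

```shell
# list_vxlan_ifaces: hypothetical helper, not part of Docker or iproute2.
# Reads `ip -o -d link show` output on stdin and prints the name of every
# interface whose details mention a vxlan device.
list_vxlan_ifaces() {
  awk '/ vxlan / {
    name = $2
    sub(/:$/, "", name)    # drop the trailing colon after the name
    sub(/@.*$/, "", name)  # drop any "@parent" suffix
    print name
  }'
}

# Example (run on a Docker host, typically as root):
#   ip -o -d link show | list_vxlan_ifaces
```

If a stale interface turns up while no container is attached to that overlay network, removing it with `ip link delete <name>` is a workaround sometimes reported for this error. Treat that as an assumption to verify on a test host first, since deleting an interface that is still in use will break connectivity.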

docker version:

Client:
 Version:      1.10.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   590d5108
 Built:        Thu Feb  4 18:18:11 2016
 OS/Arch:      darwin/amd64

Server:
 Version:      swarm/1.1.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   a0fd82b
 Built:        Thu Feb  4 08:55:18 UTC 2016
 OS/Arch:      linux/amd64

docker info:

Note: this was not captured when the error was occurring. If it happens again, I will comment with the info.

Containers: 32
 Running: 29
 Paused: 0
 Stopped: 3
Images: 222
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 4
 druid-vms.historical1: <host3>:2376
  └ Status: Healthy
  └ Containers: 4
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 8.188 GiB
  └ Labels: executiondriver=native-0.2, historical=yes, kernelversion=3.16.0-30-generic, operatingsystem=Ubuntu 14.04.2 LTS, provider=generic, storagedriver=aufs
  └ Error: (none)
  └ UpdatedAt: 2016-02-11T15:00:03Z
 druid-vms.historical2: <host4>:2376
  └ Status: Healthy
  └ Containers: 4
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 8.188 GiB
  └ Labels: executiondriver=native-0.2, historical=yes, kernelversion=3.16.0-30-generic, operatingsystem=Ubuntu 14.04.2 LTS, provider=generic, storagedriver=aufs
  └ Error: (none)
  └ UpdatedAt: 2016-02-11T15:00:24Z
 druid-vms.ingestion1: <host2>:2376
  └ Status: Healthy
  └ Containers: 5
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 8.188 GiB
  └ Labels: executiondriver=native-0.2, ingestion=yes, kernelversion=3.16.0-30-generic, operatingsystem=Ubuntu 14.04.2 LTS, provider=generic, storagedriver=aufs
  └ Error: (none)
  └ UpdatedAt: 2016-02-11T14:59:45Z
 druid-vms.queen: <host1>:2376
  └ Status: Healthy
  └ Containers: 19
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 8.188 GiB
  └ Labels: broker=yes, consul=yes, deployment-master=yes, executiondriver=native-0.2, kernelversion=3.16.0-30-generic, metadata=yes, operatingsystem=Ubuntu 14.04.2 LTS, provider=generic, storagedriver=aufs, swarm-master=yes
  └ Error: (none)
  └ UpdatedAt: 2016-02-11T15:00:22Z
Plugins: 
 Volume: 
 Network: 
Kernel Version: 3.16.0-30-generic
Operating System: linux
Architecture: amd64
CPUs: 8
Total Memory: 32.75 GiB
Name: druid-vms.queen
Http Proxy: <snip>
Https Proxy: <snip>
No Proxy: <snip>

uname -a:

  • Local: Darwin <hostname> 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64
  • Docker hosts: Linux <hostname> 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.):

  • Docker Swarm, 4 pre-existing VMs, generic driver

How reproducible:

  • Intermittent; not reliably reproducible (and I wouldn’t begin to know how to try). Seems to occur sometime after destroying and re-creating the swarm using docker-machine.

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 3
  • Comments: 34 (7 by maintainers)

Most upvoted comments

This is still occurring on 1.11.1. (The "file exists" problem, specifically.)

I saw this now.

Docker version 17.05.0-ce, build 89658be

Restarting the docker daemon is not fixing this.

starting container failed: subnet sandbox join failed for "10.0.2.0/24": error creating vxlan interface: file exists

Still happening on Docker Swarm on Ubuntu 18.04. The solution that worked for me was to remove the stack and redeploy it:

docker stack rm jenkins
docker stack deploy -c docker-compose.yml jenkins
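
`docker stack rm` can return before the stack's networks have been fully removed, so redeploying straight away may race with that cleanup. The sketch below polls until a command prints nothing. `wait_until_empty` is a hypothetical helper, and the `jenkins_` name filter assumes the default `<stack>_<network>` naming for stack networks.

```shell
# wait_until_empty: hypothetical helper, not a Docker command.
# Usage: wait_until_empty TRIES CMD [ARGS...]
# Runs CMD once per second, up to TRIES times, and returns 0 as soon as CMD
# prints nothing on stdout. Returns 1 if output is still present at the end.
# Caveat: a CMD that fails silently also counts as "empty".
wait_until_empty() {
  tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    if [ -z "$("$@" 2>/dev/null)" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Example, assuming a stack named "jenkins":
#   docker stack rm jenkins
#   wait_until_empty 30 docker network ls --filter name=jenkins_ --quiet
#   docker stack deploy -c docker-compose.yml jenkins
```

Whether waiting like this avoids the vxlan error has not been confirmed in this thread; it only removes one possible source of overlap between teardown and redeploy.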

Client:
 Debug Mode: false

Server:
 Containers: 50
  Running: 30
  Paused: 0
  Stopped: 20
 Images: 114
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: sywquid85sgsma3puf4gw6u68
  Is Manager: true
  ClusterID: zh6x2htbsuevss97ytz2kbkn5
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: true
  Root Rotation In Progress: false
  Node Address: 51.77.42.145
  Manager Addresses:
   51.77.42.145:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-74-generic
 Operating System: Ubuntu 18.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.2GiB
 Name: ns3145583
 ID: O3MT:IN6V:IFUN:MMQ4:77FH:H7A2:CUUP:3ZIU:3FSS:JKBW:JADU:SKQ3
 Docker Root Dir: /home/dockerd
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

@Michael-Hamburger / all: this will be fixed by https://github.com/docker/libnetwork/pull/1574

I’ve applied this patch manually and haven’t had any issues so far.

I’ve just experienced it in a Swarm cluster running 1.12.3.

subnet sandbox join failed for \"10.0.2.0/24\": error creating vxlan interface: file exists

It occurred when I ran:

docker service rm logspout
docker service create --mode global --name logspout ...

Closed via https://github.com/docker/libnetwork/pull/1065; the fix is vendored into docker/docker.

Can someone try docker/docker master and confirm the fix?

This happened to us again. Restarting Docker didn’t help, and neither did stopping Docker, flushing iptables, and starting Docker again. We had to reboot the machine.