moby: IPv6 address pool subnet smaller than /80 causes dockerd to consume all available RAM
Description
It is documented that the IPv6 pool “should” be at least /80 so that a MAC address can fit in the last 48 bits.
Using a default-address-pools size larger than 80 causes dockerd to consume too much RAM - the longer the prefix (smaller subnet), the more RAM dockerd will use:
- In the /81 to /90 range, the RAM usage increase is negligible, in the range of a few GB.
- In the /94 to /96 range, the RAM usage is in the tens to hundreds of GB.
The pool prefix length could be set to a value larger than 80 by a typing error or other mistake. If that leads to dockerd consuming copious amounts of RAM, the administrator is likely to lose time troubleshooting the situation.
It looks like a prefix length such as /96 is totally unusable, and dockerd should refuse to start instead of allocating ridiculous amounts of RAM.
At minimum a warning message should be printed.
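As a rough illustration of why memory grows with the prefix length: assuming dockerd enumerates every subnet of a pool up front (the related commits, which switch to generating subnets on demand, suggest this), the number of subnets doubles with each extra bit of prefix length. The helper name subnetCount below is made up for this sketch and is not part of dockerd.

```go
package main

import (
	"fmt"
	"math/big"
)

// subnetCount returns how many subnets of prefix length size fit into a pool
// whose base prefix has length baseLen, i.e. 2^(size-baseLen).
// Hypothetical helper for illustration only; not dockerd code.
func subnetCount(baseLen, size int) *big.Int {
	if size < baseLen {
		return big.NewInt(0)
	}
	return new(big.Int).Lsh(big.NewInt(1), uint(size-baseLen))
}

func main() {
	// Base prefix length 64, as in the /64 pool from this report.
	for _, size := range []int{80, 81, 90, 94, 96} {
		fmt.Printf("/64 split into /%d subnets: %s\n", size, subnetCount(64, size))
	}
}
```

With a /64 base, a size of 80 gives 65,536 subnets, while a size of 96 gives 2^32 (about 4.3 billion), which would be consistent with the memory growth described above if each subnet is held in memory.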
Steps to reproduce the issue:
- Set IPv6 address pool prefix length longer than 80:
"default-address-pools": [
{ "base": "192.0.2.0/16", "size": 24 },
{ "base": "2001:db8:1:1f00::/64", "size": 96 }
],
- Start Docker.
- Watch the server grind to a halt and the kernel OOM killer being invoked.
Describe the results you received: dockerd consumes very large amounts of RAM (tens of GB).
Describe the results you expected: Either IPv6 pool prefix lengths longer than 80 should work, or dockerd should refuse to start with a configuration that cannot be used.
At minimum a warning message should be printed for prefix lengths longer than 80.
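The kind of check requested here could look roughly like the following. This is a minimal sketch, not dockerd's actual validation: the pool type, its field names, and checkPool are hypothetical, and the /80 threshold is taken from the documentation quoted in this report. It uses net/netip from the Go standard library (Go 1.18 or later).

```go
package main

import (
	"fmt"
	"net/netip"
)

// pool mirrors one entry of "default-address-pools".
// Hypothetical type for illustration; not dockerd's own struct.
type pool struct {
	Base string
	Size int
}

// checkPool reports a problem for an unparsable base, a size outside the
// valid range for the base, or an IPv6 pool whose subnet size is longer
// than /80. A caller could log the error as a warning or refuse to start.
func checkPool(p pool) error {
	prefix, err := netip.ParsePrefix(p.Base)
	if err != nil {
		return fmt.Errorf("invalid base %q: %w", p.Base, err)
	}
	if p.Size < prefix.Bits() || p.Size > prefix.Addr().BitLen() {
		return fmt.Errorf("size %d is out of range for base %s", p.Size, p.Base)
	}
	if prefix.Addr().Is6() && p.Size > 80 {
		return fmt.Errorf("IPv6 pool %s: size /%d is longer than /80", p.Base, p.Size)
	}
	return nil
}

func main() {
	pools := []pool{
		{Base: "192.0.2.0/16", Size: 24},
		{Base: "2001:db8:1:1f00::/64", Size: 96},
	}
	for _, p := range pools {
		if err := checkPool(p); err != nil {
			fmt.Println("warning:", err)
		}
	}
}
```

Whether such a check should warn or abort startup is a design choice; a hard failure would rule out the smaller delegated prefixes mentioned later in the discussion.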
The documentation does not mention the RAM usage effect either:
The subnet for Docker containers should at least have a size of /80, so that an IPv6 address can end with the container’s MAC address and you prevent NDP neighbor cache invalidation issues in the Docker layer.
Additional information you deem important (e.g. issue happens only occasionally): 100% reproducible.
Output of docker version:
Docker version 19.03.5, build 633a0ea
Output of docker info:
Client:
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 19.03.5
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: fluentd
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Security Options:
seccomp
Profile: default
selinux
Kernel Version: 3.10.0-1062.4.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 47.15GiB
Name: docker.domain
ID: xxx
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 5
- Comments: 21 (7 by maintainers)
Commits related to this issue
- Fixes #40275: Generate split subnets on-demand This commit resolves #40275 by implementing a custom iterator named NetworkSplitter. It splits a set of NetworkToSplit into smaller subnets on-demand by... — committed to akerouanton/docker by akerouanton 3 years ago
- fixup! Fixes #40275: Generate split subnets on-demand — committed to akerouanton/docker by akerouanton a year ago
- libnet/ipam: Lazily sub-divide pools into subnets A new Subnetter structure is added to lazily sub-divide an address pool into subnets. This fixes #40275. Prior to this change, the list of NetworkTo... — committed to akerouanton/docker by akerouanton a year ago
- too small network doesnt work for docker https://github.com/moby/moby/issues/40275 — committed to Enucatl/puppet-control-repo by Enucatl 9 months ago
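The commits listed above describe generating subnets on demand instead of up front. The idea can be sketched as follows; this is not the code from those commits, and lazySubnets, newLazySubnets, and nextSubnetStart are hypothetical names. It assumes 1 <= size <= address bit length and keeps only the start of the next subnet in memory, so memory use does not depend on how many subnets the pool contains.

```go
package main

import (
	"fmt"
	"net/netip"
)

// lazySubnets yields the subnets of a pool one at a time.
// Hypothetical type for illustration; not the implementation in the commits.
type lazySubnets struct {
	base netip.Prefix
	next netip.Addr // first address of the next subnet to hand out
	size int
	done bool
}

func newLazySubnets(base string, size int) (*lazySubnets, error) {
	p, err := netip.ParsePrefix(base)
	if err != nil {
		return nil, err
	}
	p = p.Masked() // drop any host bits set in the base
	if size < 1 || size < p.Bits() || size > p.Addr().BitLen() {
		return nil, fmt.Errorf("size %d is out of range for %s", size, p)
	}
	return &lazySubnets{base: p, next: p.Addr(), size: size}, nil
}

// nextSubnetStart adds 2^(bitlen-size) to a, i.e. it increments the address
// at bit position size. It reports false if the address space overflows.
func nextSubnetStart(a netip.Addr, size int) (netip.Addr, bool) {
	b := a.AsSlice()
	carry := uint(1) << (7 - uint(size-1)%8)
	for i := (size - 1) / 8; i >= 0 && carry > 0; i-- {
		sum := uint(b[i]) + carry
		b[i] = byte(sum)
		carry = sum >> 8
	}
	if carry > 0 {
		return netip.Addr{}, false
	}
	return netip.AddrFromSlice(b)
}

// Next returns the next subnet, or false once the pool is exhausted.
func (s *lazySubnets) Next() (netip.Prefix, bool) {
	if s.done {
		return netip.Prefix{}, false
	}
	sub := netip.PrefixFrom(s.next, s.size)
	next, ok := nextSubnetStart(s.next, s.size)
	if !ok || !s.base.Contains(next) {
		s.done = true
	} else {
		s.next = next
	}
	return sub, true
}

func main() {
	it, err := newLazySubnets("2001:db8:1:1f00::/64", 96)
	if err != nil {
		panic(err)
	}
	// Only the first few of the 2^32 possible subnets are ever computed.
	for i := 0; i < 3; i++ {
		sub, _ := it.Next()
		fmt.Println(sub)
	}
}
```

A real allocator would also need to track which subnets are in use and skip overlapping ones; that bookkeeping is left out of this sketch.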
I can’t advise on swarm, and my experience with GUA networks involved various gotchas that I didn’t find time to document better, but you may find these IPv6 with Docker docs I wrote helpful?
They show how to set up IPv6 with the Docker CLI or Docker Compose. The official Docker IPv6 docs were in worse shape until recently (May), when they received a big revision (I provided some review feedback). My unofficial docs might provide a helpful resource though 😅
You can definitely create an IPv6 network via the CLI and reference it via compose.yaml. My linked docs should mention that IIRC (NOTE: the link is not entirely stable as it’s waiting on a v13 release of the project, while the linked edge version in future will probably break when the docs are moved around).
Here’s a preview:
If you’re using IPv4 NAT (default), IPv6 ULA works well at providing IPv6 networking that works in the same way between containers and the default enabled userland-proxy.
IPv6 ULA benefit over IPv6 GUA:
[…] iptables rules).
AWS is delegating a prefix up to a /80 per instance: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-prefix-eni.html#ec2-prefix-basics
GCP is delegating a prefix of /96 per instance: https://cloud.google.com/compute/docs/ip-addresses/configure-ipv6-address#ipv6-assignment
There are other cloud providers that are offering even smaller sizes for their prefix delegations.
Docker should work with smaller allocations too.