moby: Default Docker NAT network gateway is missing on Windows Server 2019

Description

On Windows Server 2019, the gateway of Docker's default nat network is missing after startup. Restarting the dockerd service fixes the issue, but it comes back after a reboot. Is there any way to make sure the gateway of the dockerd nat network is populated even after a reboot?

Steps to reproduce the issue:

  1. Install Docker 18.09.0 per the steps https://docs.docker.com/install/windows/docker-ee/#use-a-script-to-install-docker-ee
  2. After starting dockerd for the first time, the subnet of the nat network is 0.0.0.0/0 and the gateway is missing.
  3. Restarting dockerd fixes the issue.
  4. However, the issue comes back after a reboot.

Describe the results you received:

PS C:\Users\jesun> docker inspect nat
[
    {
        "Name": "nat",
        "Id": "b7b85981ae2d7f843a9c5fe9535a0bf9f2053dc7de84d825c66e699427b7a275",
        "Created": "2018-12-12T14:20:06.3060877Z",
        "Scope": "local",
        "Driver": "nat",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "windows",
            "Options": null,
            "Config": [
                {
                    "Subnet": "0.0.0.0/0"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.windowsshim.hnsid": "B145BBB5-14C3-4DE2-9309-73FF64067014",
            "com.docker.network.windowsshim.networkname": "nat"
        },
        "Labels": {}
    }
]

Describe the results you expected:

PS C:\Users\jesun> Restart-Service docker
PS C:\Users\jesun> docker inspect nat
[
    {
        "Name": "nat",
        "Id": "1490d59f1065f4aa1986425f25953bd8128a6c159a7c4b0d57f52af724253c5c",
        "Created": "2018-12-12T14:59:28.8669915Z",
        "Scope": "local",
        "Driver": "nat",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "windows",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.22.48.0/20",
                    "Gateway": "172.22.48.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.windowsshim.hnsid": "B145BBB5-14C3-4DE2-9309-73FF64067014",
            "com.docker.network.windowsshim.networkname": "nat"
        },
        "Labels": {}
    }
]

The subnet and gateway information should persist across reboots. Ideally, the gateway should never be missing.
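The difference between the two outputs above can be checked mechanically. The sketch below is a hypothetical helper (nat_ipam_is_healthy is not part of Docker) that parses the JSON array printed by docker inspect nat and treats the network as healthy only when some IPAM config entry has a real subnet (not the 0.0.0.0/0 placeholder) and a gateway inside that subnet.

```python
import ipaddress
import json


def nat_ipam_is_healthy(inspect_output: str) -> bool:
    """Return True if `docker inspect nat` output shows a usable subnet and gateway.

    Hypothetical helper; assumes the JSON array format printed by `docker inspect`.
    """
    networks = json.loads(inspect_output)
    if not networks:
        return False
    ipam = networks[0].get("IPAM") or {}
    for cfg in ipam.get("Config") or []:
        subnet = cfg.get("Subnet", "")
        gateway = cfg.get("Gateway", "")
        if not subnet or not gateway:
            continue  # e.g. the post-boot state with no Gateway key at all
        net = ipaddress.ip_network(subnet, strict=False)
        if net.prefixlen == 0:
            continue  # 0.0.0.0/0 is the placeholder seen after a reboot
        if ipaddress.ip_address(gateway) in net:
            return True
    return False
```

For example, it could be fed the stdout of subprocess.run(["docker", "inspect", "nat"], capture_output=True, text=True).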

Additional information you deem important (e.g. issue happens only occasionally): I ran into the same missing-gateway issue on Windows Server 2016 (10.0.14393.0), and restarting dockerd fixed it there as well. Most importantly, on Windows Server 2016 the restored gateway and subnet appear to persist and survive a reboot. Also, on Windows Server 2019 Get-NetNat returns nothing even when the Docker nat network is healthy.

Output of docker version:

PS C:\Users\jesun> docker version
Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.3
 Git commit:        33a45cd0a2
 Built:             unknown-buildtime
 OS/Arch:           windows/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.24)
  Go version:       go1.10.3
  Git commit:       33a45cd0a2
  Built:            11/07/2018 00:24:12
  OS/Arch:          windows/amd64
  Experimental:     false

Output of docker info:

PS C:\Users\jesun> docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.09.0
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
Operating System: Windows Server 2019 Datacenter Version 1809 (OS Build 17763.134)
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 4GiB
Name: jesun-19
ID: FTGG:C47Y:IXRL:7QZ7:5XS7:HWXO:QSM5:U6DV:BLRN:63VL:XALV:UQEN
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): Azure VMs and physical machines

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 33 (9 by maintainers)

Most upvoted comments

FYI: since there appear to be many issues preventing users from running Windows Server 2019 in production, I created an epic for them: #38498. Please share any Microsoft internal tracking IDs when available so I can add them to the list and people can refer to them when contacting support.

I believe two issues with very similar symptoms are conflated in this thread. One is that the network ID isn't updated after a reboot on WS2019, which changed so that NAT networks are no longer persisted. This should be fixed in the Docker EE 18.09.3 release: https://github.com/docker/engine/pull/149

The other issue is that transparent networks are not being persisted after reboot, which is a Windows platform bug. The RegKey I posted earlier is a workaround for this. I am still confirming which KB will include that fix but will post it here once I know for sure.

@marknitek The fix for this is being backported. Can you set the following regkey in the meantime?

EDIT: The fix for transparent networks not being persisted has been released here: https://support.microsoft.com/en-us/help/4482887/windows-10-update-kb4482887

Thanks @olljanat. I got confirmation from the Windows HNS team that, starting with RS5, they clean up all internal NAT and ICS networks, and their suggested fix is to create the network before starting dockerd.

I suggested adding back the flag so that NAT and ICS networks persist across reboots, but they declined. They also said this is a well-known zero-day issue on the Docker side and that fixing it is currently low priority.

I tried adding a Group Policy startup script that creates the network and then starts dockerd manually. It works, but it is ugly, and if the script ever stops working, dockerd will not start.
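A different startup-script approach, which automates the restart workaround from the original report instead of creating the network through HNS as the HNS team suggested, could look like the untested sketch below. It assumes the Windows service is named docker (as used with Restart-Service in the report), that docker and powershell are on PATH, and that a restart restores the gateway until the next reboot, as described at the top of this issue. The helper names nat_gateway, restart_docker_service and ensure_nat_gateway are hypothetical; the Go template passed to docker network inspect --format prints each IPAM config entry's Gateway, so it prints nothing when the gateway is missing.

```python
import subprocess
import time
from typing import Callable

# Go template for `docker network inspect --format`: prints the Gateway of
# every IPAM config entry, so the output is empty when the gateway is missing.
GATEWAY_TEMPLATE = "{{range .IPAM.Config}}{{.Gateway}}{{end}}"


def nat_gateway() -> str:
    """Return the gateway of the default nat network, or "" if unavailable."""
    result = subprocess.run(
        ["docker", "network", "inspect", "--format", GATEWAY_TEMPLATE, "nat"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Daemon still starting, or the nat network does not exist yet.
        return ""
    return result.stdout.strip()


def restart_docker_service() -> None:
    """Restart the docker Windows service, as in the manual workaround."""
    subprocess.run(
        ["powershell", "-NoProfile", "-Command", "Restart-Service docker"],
        check=True,
    )


def ensure_nat_gateway(
    get_gateway: Callable[[], str] = nat_gateway,
    restart: Callable[[], None] = restart_docker_service,
    max_restarts: int = 3,
    settle_seconds: float = 15.0,
) -> bool:
    """Restart docker until the nat gateway is reported.

    Returns True once a gateway is present, False after max_restarts restarts.
    """
    for attempt in range(max_restarts + 1):
        if get_gateway():
            return True
        if attempt == max_restarts:
            break
        restart()
        # Give dockerd time to come back and re-create the nat network.
        time.sleep(settle_seconds)
    return False
```

The check and restart steps are injectable so the retry logic can be exercised without a real daemon. A startup task would call ensure_nat_gateway() and report a failure when it returns False, which at least makes the "dockerd will not get started" case visible instead of silent.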

Now that restarting dockerd can no longer serve as a workaround for this missing-gateway bug starting with RS5, could you please help prioritize a fix? Thanks a lot.

Thanks @olljanat, I think you are right, although I haven't tried it. But the key issue here is that the nat network's gateway field is missing. One of our partner services depends on it: it binds to a port on this gateway address so that services running in containers can communicate with the service running on the local host machine.
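To illustrate why the missing field breaks such a partner service: a host-side service that listens on the nat gateway address cannot start at all when the gateway is empty. A minimal sketch, assuming the gateway string has already been read from docker inspect nat and that the host owns that address; listen_on_gateway and the port number are hypothetical, not the partner service's actual code.

```python
import socket


def listen_on_gateway(gateway: str, port: int) -> socket.socket:
    """Open a TCP listener on the nat gateway so containers can reach the host.

    Raises ValueError when the gateway is empty, which is what a service sees
    when `docker inspect nat` reports no Gateway.
    """
    if not gateway:
        raise ValueError("nat network has no gateway; nothing to bind to")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((gateway, port))
        sock.listen()
    except OSError:
        sock.close()  # do not leak the socket if the address is unavailable
        raise
    return sock
```

With the expected output above, a container would then connect to 172.22.48.1 on the chosen port; with the post-reboot output there is no address to hand to bind().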