moby: Docker Windows Swarm nodes do not show all CPU cores
Description: I searched for similar issues but could not find one.
We have a few Windows Server 2016 nodes connected to our Swarm (a mixed Swarm with Linux and Windows nodes). 2 of the 4 Windows nodes do not show all available CPU cores in the Swarm. The hardware for the nodes is:
2x Intel Xeon(R) E5-2667 v4 (32 logical cores in total), 128 GB RAM -> this node shows all cores in the Swarm
2x Intel Xeon(R) E5-2667 v4 (32 logical cores in total), 128 GB RAM -> this node shows all cores in the Swarm
2x Intel Xeon Gold 6154 3.0 GHz (72 logical cores in total), 128 GB RAM -> this node shows only 36 cores in the Swarm
2x Intel Xeon Gold 6154 3.0 GHz (72 logical cores in total), 128 GB RAM -> this node shows only 36 cores in the Swarm
These are all physical (bare-metal) nodes with Hyper-Threading enabled. All of the nodes have the same Docker versions installed, identical across the Swarm:
Docker Engine 18.09.1
Compose 1.23.2
PS C:\Program Files\Docker\Docker> .\DockerCli.exe -Version
Docker Desktop
Version: 2.0.0.2 (30215)
Steps to reproduce the issue:
1. Join a Windows machine with a lot of cores to the Swarm
2. Log in to a Swarm manager node
3. Run docker node inspect "node_hash"
4. Or docker info on the Windows host
Describe the results you received: The node shows only half of the available cores on the 72-core nodes:
"Resources": {
"NanoCPUs": 36000000000,
"MemoryBytes": 137070596096
},
Describe the results you expected: It should show:
"Resources": {
"NanoCPUs": 72000000000,
"MemoryBytes": 137070596096
},
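For context, Swarm reports CPU capacity in "NanoCPUs", where one core equals 10^9 NanoCPUs, so the reported values map directly back to core counts. A minimal sketch of that conversion (the helper name `nanoToCores` is mine, not from Docker's code):

```go
package main

import "fmt"

// nanoToCores converts a Swarm "NanoCPUs" resource value back to a
// whole-core count (1 core == 1e9 NanoCPUs).
func nanoToCores(nano int64) int64 {
	return nano / 1e9
}

func main() {
	fmt.Println(nanoToCores(36000000000)) // reported value -> 36 cores
	fmt.Println(nanoToCores(72000000000)) // expected value -> 72 cores
}
```

This confirms the node is advertising exactly half of its 72 logical cores.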
Output of docker version:
Client: Docker Engine - Community
Version: 18.09.1
API version: 1.39
Go version: go1.10.6
Git commit: 4c52b90
Built: Wed Jan 9 19:34:26 2019
OS/Arch: windows/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.1
API version: 1.39 (minimum version 1.24)
Go version: go1.10.6
Git commit: 4c52b90
Built: Wed Jan 9 19:50:10 2019
OS/Arch: windows/amd64
Experimental: false
Output of docker info:
Containers: 16
Running: 15
Paused: 0
Stopped: 1
Images: 248
Server Version: 18.09.1
Storage Driver: windowsfilter
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: ics l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: active
NodeID: 5udtuv3xkblmx9uwf8ilduvhb
Is Manager: false
Node Address: X.X.X.X
Manager Addresses:
X.X.X.X:2377
X.X.X.X:2377
X.X.X.X:2377
Default Isolation: process
Kernel Version: 10.0 14393 (14393.2724.amd64fre.rs1_release.181231-1751)
Operating System: Windows Server 2016 Standard Version 1607 (OS Build 14393.2724)
OSType: windows
Architecture: x86_64
**CPUs: 36**
Total Memory: 127.7GiB
Name: xxxxxxxxx
ID: XXXXX
Docker Root Dir: D:\ProgramData\Docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: -1
Goroutines: 401
System Time: 2019-03-26T14:10:53.9629383+01:00
EventsListeners: 17
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
Additional environment details (AWS, VirtualBox, physical, etc.): Physical server: Intel Xeon Gold 6154 3.0 GHz (2 processors, 18 cores / 36 threads each), 128 GB RAM, 256 GB SSD (C:), 4.2 TB (D:)
BR, Ranchester
About this issue
- Original URL
- State: open
- Created 5 years ago
- Comments: 17 (8 by maintainers)
I don’t have access to a Windows machine, but I built the following code with an online compiler (https://rextester.com/l/cpp_online_compiler_visual):
The output seems promising:
If the above code could be validated on a machine with more than 64 cores, then we could port the above C code to Go for numCPU(), I think.
+1
Is there any update on this? I am still experiencing this issue.
As was mentioned above, many of the Windows processor-affinity functions only return information for a single processor group. Since a single group can hold at most 64 logical processors, systems with more LPs have them divided into multiple groups (as seen above, where a 72-LP system has 2 groups of 36 LPs).
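The group arithmetic can be sketched in Go. This is a simplified model of Windows' behavior (the real assignment also considers NUMA topology, and `splitIntoGroups` is a hypothetical name, not a Windows API): LPs are capped at 64 per group and divided roughly evenly when more exist, so any API that only consults the current group sees 36 on a 72-LP host.

```go
package main

import "fmt"

// splitIntoGroups models how Windows assigns logical processors (LPs)
// to processor groups: at most 64 LPs per group, divided evenly across
// the minimum number of groups needed. Simplified; real Windows also
// aligns groups with NUMA nodes.
func splitIntoGroups(totalLPs int) []int {
	numGroups := (totalLPs + 63) / 64 // ceil(totalLPs / 64)
	groups := make([]int, numGroups)
	for i := range groups {
		groups[i] = totalLPs / numGroups
		if i < totalLPs%numGroups {
			groups[i]++ // distribute the remainder one LP at a time
		}
	}
	return groups
}

func main() {
	fmt.Println(splitIntoGroups(72)) // [36 36] -- single-group APIs report 36
	fmt.Println(splitIntoGroups(32)) // [32]    -- fits in one group, no issue
}
```

This matches the observations in the report: the 32-core nodes fit in one group and show all cores, while the 72-core nodes are split into two groups of 36.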
There are a few different ways to handle this. In hcsshim, we use GetActiveProcessorCount, as can be seen here. This returns the number of host LPs regardless of processor group divisions, though I don’t think it takes the process’s affinity into account.
/cc @kevpar @jterry75
We have the same issue on GCP. Without the proper CPU information, it’s a bit hard to schedule workloads to the right machine with k8s. Probably also related to #33743.
@ddebroy ^^