moby: Docker Windows Swarm nodes do not show all CPU cores

Description: I tried to search for similar issues, but could not find one.

We have a few Windows Server 2016 nodes connected to our Swarm (a mixed Swarm with Linux and Windows nodes). 2 of our 4 Windows nodes do not show all of the CPU cores available in the Swarm. The hardware for the nodes is:

2x Intel Xeon(R) E5-2667 v4 (32 logical cores in total), 128 GB RAM -> This node shows all cores in the Swarm
2x Intel Xeon(R) E5-2667 v4 (32 logical cores in total), 128 GB RAM -> This node shows all cores in the Swarm
2x Intel Xeon Gold 6154 3.0 GHz (72 logical cores in total), 128 GB RAM -> This node shows only 36 cores in the Swarm
2x Intel Xeon Gold 6154 3.0 GHz (72 logical cores in total), 128 GB RAM -> This node shows only 36 cores in the Swarm

These are all physical (bare-metal) nodes with Hyper-Threading enabled. All nodes in the Swarm have the same Docker versions installed:

Docker Engine 18.09.1
Compose 1.23.2
PS C:\Program Files\Docker\Docker> .\DockerCli.exe -Version
 
Docker Desktop
Version: 2.0.0.2 (30215)

Steps to reproduce the issue:

1. Join a Windows machine with a lot of cores to the Swarm
2. Log in to a Swarm manager node
3. Run docker node inspect "node_hash" on the manager (see the example below)
4. Or run docker info on the Windows host
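
For a quick check, the advertised CPU resources can also be read with a Go template; the .Description.Resources.NanoCPUs path below matches the node inspect output shown further down, and "node_hash" is the placeholder node ID from step 3:

docker node inspect --format '{{ .Description.Resources.NanoCPUs }}' "node_hash"

On the affected nodes this should print 36000000000 instead of the expected 72000000000.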

Describe the results you received: The node shows only half of the cores available on the 72-core nodes (NanoCPUs is the CPU count multiplied by 10^9, so 36000000000 corresponds to 36 CPUs):

"Resources": {
                "NanoCPUs": 36000000000,
                "MemoryBytes": 137070596096
            },

Describe the results you expected: It should show all 72 cores:

"Resources": {
                "NanoCPUs": 72000000000,
                "MemoryBytes": 137070596096
            },

Output of docker version:

Client: Docker Engine - Community
 Version:           18.09.1
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        4c52b90
 Built:             Wed Jan  9 19:34:26 2019
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.1
  API version:      1.39 (minimum version 1.24)
  Go version:       go1.10.6
  Git commit:       4c52b90
  Built:            Wed Jan  9 19:50:10 2019
  OS/Arch:          windows/amd64
  Experimental:     false

Output of docker info:

Containers: 16
 Running: 15
 Paused: 0
 Stopped: 1
Images: 248
Server Version: 18.09.1
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: active
 NodeID: 5udtuv3xkblmx9uwf8ilduvhb
 Is Manager: false
 Node Address: X.X.X.X
 Manager Addresses:
  X.X.X.X:2377
  X.X.X.X:2377
  X.X.X.X:2377
Default Isolation: process
Kernel Version: 10.0 14393 (14393.2724.amd64fre.rs1_release.181231-1751)
Operating System: Windows Server 2016 Standard Version 1607 (OS Build 14393.2724)
OSType: windows
Architecture: x86_64
**CPUs: 36**
Total Memory: 127.7GiB
Name: xxxxxxxxx
ID: XXXXX
Docker Root Dir: D:\ProgramData\Docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 401
 System Time: 2019-03-26T14:10:53.9629383+01:00
 EventsListeners: 17
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Additional environment details (AWS, VirtualBox, physical, etc.): Physical server: Intel Xeon Gold 6154 3.0 GHz (2 processors, 18 cores / 36 threads each), 128 GB RAM, 256 GB SSD (C:), 4.2 TB (D:)

br. Ranchester

Most upvoted comments

I don’t have access to a Windows machine, but I built the following code with an online compiler (https://rextester.com/l/cpp_online_compiler_visual):

#include <iostream>
#include <Windows.h>

int main()
{
    std::cout << "Hello, world!\n";

    // First call with a NULL buffer only to learn the required buffer size;
    // it fails with ERROR_INSUFFICIENT_BUFFER and fills ReturnedLength.
    DWORD ReturnedLength = 0;
    GetLogicalProcessorInformationEx(RelationGroup, NULL, &ReturnedLength);
    if (GetLastError() == ERROR_INSUFFICIENT_BUFFER && ReturnedLength != 0) {
        PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX Buffer = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)malloc(ReturnedLength);
        if (Buffer != NULL) {
            // Second call retrieves the RelationGroup record describing all processor groups.
            if (GetLogicalProcessorInformationEx(RelationGroup, Buffer, &ReturnedLength)) {
                DWORD TotalActiveProcessorCount = 0;
                std::cout << "Active Group Count:" << Buffer->Group.ActiveGroupCount << std::endl;
                // Sum the active logical processors of every group, not just group 0.
                for (DWORD i = 0; i < Buffer->Group.ActiveGroupCount; i++) {
                    std::cout << "Active Processor for Group " << i << ": " << (DWORD)Buffer->Group.GroupInfo[i].ActiveProcessorCount << std::endl;
                    TotalActiveProcessorCount += Buffer->Group.GroupInfo[i].ActiveProcessorCount;
                }
                std::cout << "Total Active Processor Count: " << TotalActiveProcessorCount << std::endl;
            }
            free(Buffer);
        }
    }
}

The output seems to be promising:

Hello, world!
Active Group Count:1
Active Processor for Group 0: 4
Total Active Processor Count: 4

If the above code could be validated on a machine with more than 64 cores, then I think we could port it to Go for numCPU().
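
For reference, here is a rough sketch of what such a Go port could look like. This is not code from moby or hcsshim; the struct layouts are hand-translated from winnt.h for 64-bit Windows and the API is loaded dynamically from kernel32.dll, so treat it as an illustration that still needs validation on a >64-core machine:

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

const relationGroup = 4 // LOGICAL_PROCESSOR_RELATIONSHIP value for RelationGroup

// Layouts hand-translated from winnt.h (64-bit).
type processorGroupInfo struct {
	MaximumProcessorCount byte
	ActiveProcessorCount  byte
	_                     [38]byte
	ActiveProcessorMask   uintptr
}

type groupRelationship struct {
	MaximumGroupCount uint16
	ActiveGroupCount  uint16
	_                 [20]byte
	GroupInfo         [1]processorGroupInfo // variable-length array in the real struct
}

type slpiEx struct {
	Relationship uint32
	Size         uint32
	Group        groupRelationship
}

func main() {
	proc := syscall.NewLazyDLL("kernel32.dll").NewProc("GetLogicalProcessorInformationEx")

	// First call with a nil buffer only to learn the required buffer size.
	var length uint32
	proc.Call(relationGroup, 0, uintptr(unsafe.Pointer(&length)))
	if length == 0 {
		fmt.Println("could not determine buffer size")
		return
	}

	buf := make([]byte, length)
	ok, _, err := proc.Call(relationGroup, uintptr(unsafe.Pointer(&buf[0])), uintptr(unsafe.Pointer(&length)))
	if ok == 0 {
		fmt.Println("GetLogicalProcessorInformationEx failed:", err)
		return
	}

	// For RelationGroup a single record describes all processor groups.
	info := (*slpiEx)(unsafe.Pointer(&buf[0]))
	total := 0
	for i := 0; i < int(info.Group.ActiveGroupCount); i++ {
		// Index into the variable-length GroupInfo array manually.
		gi := (*processorGroupInfo)(unsafe.Pointer(
			uintptr(unsafe.Pointer(&info.Group.GroupInfo[0])) + uintptr(i)*unsafe.Sizeof(processorGroupInfo{})))
		fmt.Printf("Active processors in group %d: %d\n", i, gi.ActiveProcessorCount)
		total += int(gi.ActiveProcessorCount)
	}
	fmt.Println("Total active processor count:", total)
}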

Is there any update on this? Still experiencing this issue.

As was mentioned above, many of the Windows processor-affinity functions only return information for a single processor group. Since a single group can hold at most 64 logical processors, systems with more than that have their LPs divided across multiple groups (as seen above, where the 72-LP system has 2 groups of 36 LPs each).

There are a few different ways to handle this. In hcsshim, we use GetActiveProcessorCount, as can be seen here. This returns the number of host LPs regardless of processor group divisions, though I don’t think it takes the process’s affinity into account.
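
For illustration, a minimal sketch of that approach in Go (not the actual hcsshim code; it just calls GetActiveProcessorCount with ALL_PROCESSOR_GROUPS through kernel32.dll):

package main

import (
	"fmt"
	"syscall"
)

const allProcessorGroups = 0xFFFF // ALL_PROCESSOR_GROUPS

func main() {
	proc := syscall.NewLazyDLL("kernel32.dll").NewProc("GetActiveProcessorCount")

	// GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) returns the number of
	// active logical processors across every processor group, so it is not
	// capped at 64 the way the single-group affinity APIs are.
	count, _, _ := proc.Call(uintptr(allProcessorGroups))
	fmt.Println("Active logical processors:", uint32(count))
}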

We have the same issue on GCP. Without the proper CPU information, it’s a bit hard to schedule workloads onto the right machine with k8s. Probably also related to #33743.