CloudStack: Custom Constrained Service Offering vs QEMU - CPU topology doesn't match maximum vcpu count
ISSUE TYPE
- Bug Report
COMPONENT NAME
CPU definition for qemu
CLOUDSTACK VERSION
4.17.0.1
CONFIGURATION
We are running a CloudStack platform based on Ubuntu hosts and KVM. We have set the guest CPU mode to host-model (guest.cpu.mode=host-model).
Physical host CPU configuration:
lscpu | egrep "Thread\(s\)|Core\(s\)|Socket\(s\)"
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
OS / ENVIRONMENT
Ubuntu 20.04.4, CloudStack version 4.17.0.1, QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23)
SUMMARY
When trying to deploy a VM with a “Custom - Constrained” service offering and defining more than 2 vCPUs (for example, 4), we get the following error message:
2022-10-05 08:12:12,660 INFO [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-65:ctx-395eb8b5 job-95370/job-95371 ctx-9e930d7d) (logid:45597e1f) Unable to start VM on Host {"id": "46", "name": " xxxxx", "uuid": "c361ef51-a8ab-4ec2-9a32-e5a7ce7886f5", "type"="Routing"} due to unsupported configuration: CPU topology doesn't match maximum vcpu count
It may be important to note that we are deploying the VM via the UI. We will try to reproduce the issue via the API.
Using a fixed service offering that provides exactly the same resources, with the same host and storage tags, we can deploy the VM without issues.
Looking at the generated XML, we find the following CPU-related differences. This example is for a VM with 2 vCPUs and 4 GB of memory, because I couldn't find a way to obtain the XML of a failed VM with 4 vCPUs.
Custom - Constrained Service Offering (max 16 vCPUs in this example):
...
<currentMemory unit='KiB'>4194304</currentMemory>
<vcpu placement='static' current='2'>16</vcpu>
<cputune>
<shares>800</shares>
</cputune>
...
and
...
<feature policy='disable' name='mpx'/>
<feature policy='disable' name='intel-pt'/>
<numa>
<cell id='0' cpus='0-15' memory='4194304' unit='KiB'/>
</numa>
</cpu>
...
Compared to the XML of the Fixed Service Offering:
...
<currentMemory unit='KiB'>4194304</currentMemory>
<vcpu placement='static'>2</vcpu>
<cputune>
<shares>800</shares>
</cputune>
...
and no NUMA definition:
...
<feature policy='disable' name='mpx'/>
<feature policy='disable' name='intel-pt'/>
</cpu>
...
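The error text comes from libvirt's domain validation which, as far as we understand it, rejects a definition whose <cpu><topology> product (sockets × dies × cores × threads) differs from the maximum vCPU count in <vcpu>, not the current one. Neither snippet above contains a <topology> element, so the following is only a minimal sketch for checking a captured definition (for example, output of virsh dumpxml) against that invariant. The sample XML and the "bad" topology are hypothetical illustrations, not taken from the failing VM.

```python
import xml.etree.ElementTree as ET


def topology_matches_max_vcpus(domain_xml: str) -> bool:
    """Return True if <cpu><topology> (when present) multiplies out to the
    maximum vCPU count declared in <vcpu>."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")
    if vcpu is None or not (vcpu.text or "").strip():
        raise ValueError("domain XML has no <vcpu> element")
    # The element text is the maximum; the optional 'current' attribute is
    # only the number of vCPUs online at boot.
    max_vcpus = int(vcpu.text.strip())

    topology = root.find("cpu/topology")
    if topology is None:
        # Without an explicit topology there is nothing to compare.
        return True

    # Missing attributes are treated as 1 (a simplification for this sketch).
    product = 1
    for attr in ("sockets", "dies", "cores", "threads"):
        product *= int(topology.get(attr, "1"))
    return product == max_vcpus


# Hypothetical sample shaped like the custom-offering XML above.
custom_ok = """<domain type='kvm'>
  <vcpu placement='static' current='2'>16</vcpu>
  <cpu mode='host-model'>
    <topology sockets='16' cores='1' threads='1'/>
  </cpu>
</domain>"""

# Hypothetical: a topology sized for 4 vCPUs while the maximum is 16.
custom_bad = custom_ok.replace("sockets='16'", "sockets='1'").replace("cores='1'", "cores='4'")

print(topology_matches_max_vcpus(custom_ok))   # True
print(topology_matches_max_vcpus(custom_bad))  # False
```

Where a definition can be captured, this would show whether the generated topology covers all maximum vCPUs or only the currently requested ones.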
Looking at the QEMU command line, we see the following difference:
Custom Constrained Service Offering:
... -overcommit mem-lock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-15,mem=4096 -uuid 54f79107-bfbc-4320-a845-ef07f896a1fc ...
versus the Fixed Service Offering:
... -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 2e3aba64-e6d8-4b37-a470-5dd075fcc14f ...
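Both -smp values above are internally consistent: 16 sockets × 1 core × 1 thread = 16 = maxcpus, and 2 × 1 × 1 = 2 for the fixed offering. That fits the observation that the 2-vCPU VM starts, and suggests the mismatch only appears in the definition generated for more than 2 vCPUs, which we could not capture. A minimal sketch of that arithmetic check follows; treating omitted topology keys as 1 is a simplifying assumption (QEMU derives omitted values itself), and the failing -smp string in the test is hypothetical.

```python
def parse_smp(smp: str) -> dict:
    """Parse a QEMU -smp value such as '2,maxcpus=16,sockets=16,cores=1,threads=1'."""
    opts = {}
    for i, part in enumerate(smp.split(",")):
        if "=" in part:
            key, value = part.split("=", 1)
        elif i == 0:
            key, value = "cpus", part  # leading bare number is the boot vCPU count
        else:
            raise ValueError(f"unexpected -smp token: {part!r}")
        opts[key] = int(value)
    return opts


def smp_topology_consistent(smp: str) -> bool:
    """True if sockets*dies*cores*threads equals maxcpus (maxcpus defaults to cpus)."""
    opts = parse_smp(smp)
    cpus = opts.get("cpus", 1)
    maxcpus = opts.get("maxcpus", cpus)
    product = 1
    for key in ("sockets", "dies", "cores", "threads"):
        product *= opts.get(key, 1)  # simplification: omitted keys count as 1
    return product == maxcpus


custom = "2,maxcpus=16,sockets=16,cores=1,threads=1"
fixed = "2,sockets=2,cores=1,threads=1"
print(smp_topology_consistent(custom))  # True
print(smp_topology_consistent(fixed))   # True
```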
STEPS TO REPRODUCE
As I am not yet sure exactly where the root cause of this issue lies, it's hard to say exactly how to reproduce it in other environments.
In our environment, we encounter the issue only when deploying a VM with:
- a Custom Constrained Service Offering
- more than 2 vCPUs specified (tested with 4, 8, 16)
We don’t encounter the issue when deploying a VM with:
- a Fixed Service Offering
- any valid number of vCPUs (tested with 2, 4, 8)
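To reproduce without the UI, as planned in the summary, the deployment can be sent straight to the API. The sketch below builds a signed deployVirtualMachine GET request following CloudStack's documented HMAC-SHA1 signing scheme (sort the parameters, lower-case the query string, sign it with the secret key, then base64- and URL-encode the digest). The endpoint, UUIDs and keys are placeholders, and passing the requested size through the details[0].cpuNumber / details[0].memory keys is an assumption about how a custom offering is sized; encoding of unusual characters may need adjusting against a real management server. CloudMonkey (cmk) handles the signing for you if it is available.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def build_signed_url(endpoint: str, params: dict, api_key: str, secret_key: str) -> str:
    """Sketch of CloudStack's HMAC-SHA1 request signing for a GET API call."""
    all_params = dict(params, apiKey=api_key, response="json")
    # Sort parameters by name (case-insensitively) and URL-encode the values.
    items = sorted(all_params.items(), key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in items)
    # Sign the lower-cased query string, then base64- and URL-encode the digest.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"


# Placeholder IDs, keys and endpoint; replace with values from your environment.
deploy_params = {
    "command": "deployVirtualMachine",
    "zoneid": "ZONE_UUID",
    "templateid": "TEMPLATE_UUID",
    "serviceofferingid": "CUSTOM_CONSTRAINED_OFFERING_UUID",
    # Assumed: the requested size for a custom offering goes in the details map.
    "details[0].cpuNumber": "4",
    "details[0].memory": "4096",
}

url = build_signed_url("http://management-server:8080/client/api", deploy_params, "API_KEY", "SECRET_KEY")
print(url)
```

Fetching the URL (for example with urllib.request.urlopen) and comparing the result for a 2-vCPU and a 4-vCPU request would show whether the API path fails the same way as the UI.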
EXPECTED RESULTS
I would expect to be able to start a VM with the same resource configuration regardless of whether it is deployed via a custom constrained or a fixed service offering. The result should be the same.
ACTUAL RESULTS
A VM with a custom constrained service offering that defines more than 2 vCPUs does not start and fails with the error message:
Unable to start VM on Host {"id": "46", "name": "xxxxx", "uuid": "c361ef51-a8ab-4ec2-9a32-e5a7ce7886f5", "type"="Routing"} due to unsupported configuration: CPU topology doesn't match maximum vcpu count
Deploying a VM with the same set of resources (vCPU and memory) using a fixed service offering works without issues.
About this issue
- State: closed
- Created 2 years ago
- Comments: 35 (17 by maintainers)
I can confirm the patch is working.
Many thanks for your efforts on this one!
Finally, I had to dig into building CloudStack from source, a valuable learning experience 😉
@DaanHoogland thanks for the hint, I didn’t see the most obvious possibility 😉 The documentation is quite clear, so I’ll give the build a go in the next few days.
Ah, that is a good addition. I am working on getting more capacity in my virtual lab. I’ll try to post an update over the weekend.
@DaanHoogland just to avoid confusion: with dynamic scaling enabled, the issue is the same in production and testing.
While troubleshooting, I discovered that I hadn't seen the issue in my test environment because dynamic scaling was disabled there.
We are using Ubuntu 20.04.4, CloudStack version 4.17.0.1, QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23).