cloudstack: Custom Constrained Service Offering vs Qemu - CPU topology doesn't match maximum vcpu count

ISSUE TYPE
  • Bug Report
COMPONENT NAME
CPU definition for qemu
CLOUDSTACK VERSION
4.17.0.1
CONFIGURATION

We are running a CloudStack platform based on Ubuntu hosts and KVM. We have set the guest CPU mode to host-model (guest.cpu.mode=host-model).

Physical host CPU config

lscpu | egrep "Thread\(s\)|Core\(s\)|Socket\(s\)"
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
OS / ENVIRONMENT

Ubuntu 20.04.4, CloudStack version 4.17.0.1, QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23)

SUMMARY

When trying to deploy a VM with a “Custom - Constrained” service offering and more than 2 vCPUs (for example 4), we get the following error message:

2022-10-05 08:12:12,660 INFO [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-65:ctx-395eb8b5 job-95370/job-95371 ctx-9e930d7d) (logid:45597e1f) Unable to start VM on Host {"id": "46", "name": " xxxxx", "uuid": "c361ef51-a8ab-4ec2-9a32-e5a7ce7886f5", "type"="Routing"} due to unsupported configuration: CPU topology doesn't match maximum vcpu count

Maybe it’s important to note we are trying to deploy the VM via the UI. We will try to replicate the issue via API.

Using a fixed service offering, providing exactly the same resources, the same host and storage tags, we are able to deploy the VM without issues.

Looking at the generated XML, we find the following CPU-related differences. This example is for a VM with 2 vCPUs and 4 GB of memory, because I couldn't find a way to predict the XML of a failed VM with 4 vCPUs.

Custom - Constrained Service Offering (max 16 vCPUs in this example)

...
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static' current='2'>16</vcpu>
  <cputune>
    <shares>800</shares>
  </cputune>
...

and

...
    <feature policy='disable' name='mpx'/>
    <feature policy='disable' name='intel-pt'/>
    <numa>
      <cell id='0' cpus='0-15' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
...

Compared to the XML of the Fixed Service Offering:

...
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <shares>800</shares>
  </cputune>
...

and no NUMA definition:

...
    <feature policy='disable' name='mpx'/>
    <feature policy='disable' name='intel-pt'/>
  </cpu>
...
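To make the comparison between the two XML variants easier to repeat, here is a minimal Python sketch that pulls the maximum and current vCPU counts and the NUMA cell CPU ranges out of a domain XML. The `<domain>` wrappers below are trimmed-down stand-ins built from the fragments above, not full dumps, and `summarize_vcpus` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def summarize_vcpus(domain_xml: str) -> dict:
    """Return max/current vCPU counts and NUMA cell CPU ranges of a domain XML."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")
    maximum = int(vcpu.text)
    # Without a 'current' attribute, all vCPUs are active from the start.
    current = int(vcpu.get("current", maximum))
    cells = [cell.get("cpus") for cell in root.findall("./cpu/numa/cell")]
    return {"max": maximum, "current": current, "numa_cells": cells}


# Trimmed-down stand-ins for the two XML variants quoted in this report.
CUSTOM_CONSTRAINED = """<domain>
  <vcpu placement='static' current='2'>16</vcpu>
  <cpu>
    <numa>
      <cell id='0' cpus='0-15' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
</domain>"""

FIXED = """<domain>
  <vcpu placement='static'>2</vcpu>
  <cpu/>
</domain>"""

if __name__ == "__main__":
    print("custom constrained:", summarize_vcpus(CUSTOM_CONSTRAINED))
    print("fixed:", summarize_vcpus(FIXED))
```

With the custom constrained offering the maximum (16) comes from the offering's upper limit while only 2 vCPUs are active; with the fixed offering both values are 2 and there is no NUMA cell.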

Looking at the QEMU command line, we see the following difference:

Custom constrained Service Offering

... -overcommit mem-lock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-15,mem=4096  -uuid 54f79107-bfbc-4320-a845-ef07f896a1fc ...

versus the fixed Service Offering

... -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 2e3aba64-e6d8-4b37-a470-5dd075fcc14f ...
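As far as I understand the error message, the topology is expected to satisfy sockets × cores × threads = maximum vCPU count. The following Python sketch checks that rule against an `-smp` argument. It is only an illustration of the arithmetic, not libvirt's actual code; `parse_smp` and `topology_matches` are hypothetical helper names, and the mismatching example is made up, since no command line exists for the failed 4-vCPU VM.

```python
def parse_smp(arg: str) -> dict:
    """Parse a QEMU -smp argument such as '2,maxcpus=16,sockets=16,cores=1,threads=1'."""
    first, *rest = arg.split(",")
    values = {"cpus": int(first)}
    for item in rest:
        key, value = item.split("=", 1)
        values[key] = int(value)
    # Without an explicit maxcpus, the maximum equals the initial CPU count.
    values.setdefault("maxcpus", values["cpus"])
    return values


def topology_matches(smp: dict) -> bool:
    """True if sockets * cores * threads equals the maximum vCPU count."""
    return smp["sockets"] * smp["cores"] * smp["threads"] == smp["maxcpus"]


if __name__ == "__main__":
    examples = {
        # Taken from the two working 2-vCPU VMs in this report.
        "custom constrained": "2,maxcpus=16,sockets=16,cores=1,threads=1",
        "fixed": "2,sockets=2,cores=1,threads=1",
        # Hypothetical mismatch for illustration only, NOT from the report.
        "hypothetical": "4,maxcpus=16,sockets=4,cores=1,threads=1",
    }
    for name, arg in examples.items():
        print(name, topology_matches(parse_smp(arg)))
```

Both command lines from the working 2-vCPU VMs satisfy the rule, so the failing case presumably ends up with a topology whose product differs from the maximum of 16.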
STEPS TO REPRODUCE

As I am not yet sure where exactly the root cause of this issue lies, it’s hard to say how to reproduce it in other environments.

In our environment, we encounter the issue only when deploying a VM with

  • Custom constrained Service Offering
  • specify more than 2 vCPUs (tested 4,8,16)

We don’t encounter the issue when deploying a VM with

  • Fixed Service Offering
  • any valid number of vCPU is possible (tested with 2,4,8)
EXPECTED RESULTS
I would expect to be able to start a VM with the same resource configuration no matter if deployed via custom constrained or fixed Service offering. The result should be the same.
ACTUAL RESULTS

A VM with a custom constrained service offering and more than 2 vCPUs fails to start with the error message:

Unable to start VM on Host {"id": "46", "name": " xxxxx", "uuid": "c361ef51-a8ab-4ec2-9a32-e5a7ce7886f5", "type"="Routing"} due to unsupported configuration: CPU topology doesn't match maximum vcpu count

Deploying a VM with the same set of resources (vCPU and memory) using a fixed service offering works without issues.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 35 (17 by maintainers)

Most upvoted comments

I can confirm the patch is working.

Many thanks for your efforts on this one!

In the end I had to dig into building CloudStack from source, which was a valuable learning experience 😉

@DaanHoogland thanks for the hint. I didn’t see the most obvious possibility 😉 The documentation is quite clear, so I’ll give the build a go in the next few days.

ah, that is a good addition. I am working to get more capacity in my virtual lab. I’ll update over the weekend?

@DaanHoogland just to avoid confusion: with dynamic scaling enabled, the issue is the same in production and testing.

While troubleshooting, I discovered that I hadn’t seen the issue in my test environment because dynamic scaling was disabled there.

We are using Ubuntu 20.04.4, CloudStack version 4.17.0.1, QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23)