Typically, administrators assume that in order to increase the performance of a workload, the VM’s virtual hardware capacity needs to be increased, for example by adding more RAM or CPU.
While this may be true in some cases, the performance needs of the workload deserve closer inspection.
In this article, I will discuss the performance impact of Symmetric Multiprocessor (SMP) VMs and the significance of the CSTP metric.
Note: CSTP is displayed in the default view of esxtop (refer to the screenshot).
What is CSTP, and what is its reference value?
CSTP (Co-Stop) is an indicator of lag, or latency, when the ESX scheduler schedules the vCPUs of an SMP virtual machine onto physical CPUs/cores.
The reference value for CSTP is 3 or lower.
Workload profile types
Applications use either synchronous threads or asynchronous threads.
Synchronously threaded applications follow linear scheduling: each thread starts only after the previous one ends.
time: 0:00:00 -> Start thread A (will run for 5 seconds)
time: 0:00:05 -> Start thread B (will run for 10 seconds)
time: 0:00:15 ----> Start thread B1 (will run for 5 seconds)
time: 0:00:20 -> Start thread C (will run for 15 seconds)
end application: total runtime will be 35 seconds
Asynchronously threaded applications run multiple threads at the same time:
time: 0:00:00 -> Start thread A (will run for 5 seconds) / Start thread B (will run for 10 seconds) / Start thread C (will run for 15 seconds)
time: 0:00:10 ----> Start thread B1 (will run for 5 seconds)
end application: total runtime will be 15 seconds (the longest paths, thread C alone and thread B followed by B1, each take 15 seconds)
In a scenario where the provisioned CPU is constrained or highly utilized, will increasing the number of vCPUs help?
- If an application is synchronous by design, increasing the number of vCPUs will not have a significant performance impact.
- If the application is asynchronous, providing more vCPUs could increase performance.
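To make the two bullets above concrete, here is a minimal sketch that reuses the thread timelines from the previous section and computes total runtime as the number of vCPUs grows. It assumes a simple greedy scheduler that starts any ready thread on any free vCPU; it is not a model of the guest OS or ESXi scheduler, and the names `SYNC`, `ASYNC` and `runtime_with_vcpus` are hypothetical.

```python
import heapq

# Hypothetical thread models of the two timelines above:
# name -> (duration in seconds, thread that must finish before this one starts)
SYNC = {"A": (5, None), "B": (10, "A"), "B1": (5, "B"), "C": (15, "B1")}
ASYNC = {"A": (5, None), "B": (10, None), "B1": (5, "B"), "C": (15, None)}


def runtime_with_vcpus(threads, vcpus):
    """Greedy list scheduling of `threads` onto `vcpus` vCPUs.

    Returns the total runtime in seconds. A thread starts as soon as its
    prerequisite has finished and a vCPU is free (threads are considered
    in dictionary order)."""
    pending = dict(threads)  # threads not started yet
    done = set()             # threads that have finished
    running = []             # min-heap of (finish_time, name)
    now = 0
    while pending or running:
        # Fill free vCPUs with every thread whose prerequisite is done.
        for name, (duration, parent) in list(pending.items()):
            if len(running) >= vcpus:
                break
            if parent is None or parent in done:
                heapq.heappush(running, (now + duration, name))
                del pending[name]
        # Jump ahead to the next thread completion.
        now, finished = heapq.heappop(running)
        done.add(finished)
    return now


for n in (1, 2, 3, 4):
    print(n, runtime_with_vcpus(SYNC, n), runtime_with_vcpus(ASYNC, n))
```

With these inputs the synchronous chain stays at 35 seconds for any vCPU count, while the asynchronous workload drops from 35 seconds on 1 vCPU to 20 seconds on 2 and 15 seconds on 3; a fourth vCPU adds nothing because at most three threads are ever runnable at once.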
Under the hood
With several VMs competing for the underlying hardware, the CPU scheduler has the following goals:
When a VM is ready to be scheduled, all of its allocated vCPUs need to be scheduled together, or with a very small but acceptable delay between them. In other words, the CPU scheduler cannot let 2 out of 4 vCPUs of a VM run ahead executing threads while the other 2 fall behind.
This minimal but acceptable delay is defined by (relaxed) co-scheduling.
A guest operating system requires synchronous progress on all its CPUs; otherwise, the OS and application can crash or fail. One can imagine this as similar to a CPU being removed, or one of the allocated CPUs failing.
To ensure that such workloads are not impacted even when the underlying cores are severely constrained, the CPU scheduler places the VM's worlds in a Co-Stop (CSTP) state.
In simpler words, when the scheduler is unable to provide all the CPUs allocated to the VM, it pauses the VM temporarily.
This, of course, can have an adverse effect on performance, which is reflected as a high CSTP value.
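The effect described above can be illustrated with a deliberately simplified model of strict co-scheduling. This is an assumption for illustration only: ESXi uses relaxed co-scheduling, which tolerates some skew between vCPUs. Given hypothetical counts of free physical cores per time slice, the sketch estimates how often a VM of a given size would be co-stopped. The function name and sample data are hypothetical.

```python
# Toy model of STRICT co-scheduling (an assumption for illustration only):
# in each time slice an SMP VM runs only if at least `vcpus` physical cores
# are free; otherwise all of its vCPUs are co-stopped for that slice.
# ESXi's relaxed co-scheduling is more lenient than this.


def costop_percent(free_cores_per_slice, vcpus):
    """Percentage of time slices in which the VM could not get all of its
    vCPUs scheduled together."""
    if vcpus == 1:
        # A single-vCPU VM has no sibling vCPUs to wait for; any waiting
        # it does shows up as ready time (RDY), not co-stop.
        return 0.0
    stopped = sum(1 for free in free_cores_per_slice if free < vcpus)
    return 100.0 * stopped / len(free_cores_per_slice)


# Hypothetical sample: free physical cores in 10 time slices on a busy host.
free_cores = [4, 2, 1, 3, 2, 4, 1, 2, 3, 2]

for vcpus in (1, 2, 4):
    print(vcpus, costop_percent(free_cores, vcpus))
```

In this toy model, the same host conditions that co-stop a 2-vCPU VM in 20% of slices keep a 4-vCPU VM co-stopped in 80% of them, which is the intuition behind right-sizing vCPUs.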
Now that we understand what CSTP is, assess whether the extra vCPUs are really needed or whether some of the VMs can be scaled down. If this is not possible, you will need to increase the underlying hardware resources or provision more hosts in the cluster.
Additional reference: http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
Note: This metric needs to be read together with the esxtop RDY metric, which is fairly simple to understand: the RDY percentage indicates how long a VM was ready to run but not yet scheduled. The more time a VM spends in the ready (RDY) state, the more it is contending for resources; assessing the CSTP metric, as described in this article, shows whether that contention is due to the multiple vCPUs allocated to the VM.
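As a rough aid for this assessment, the sketch below combines the two readings. The CSTP threshold of 3 comes from this article; the RDY threshold of 10 is an assumption, not a value given here, and should be tuned for your environment. The function name and returned labels are hypothetical.

```python
CSTP_THRESHOLD = 3.0  # reference value for CSTP given in this article
RDY_THRESHOLD = 10.0  # ASSUMPTION: not from this article; tune as needed


def triage(rdy, cstp, rdy_threshold=RDY_THRESHOLD, cstp_threshold=CSTP_THRESHOLD):
    """Return a rough interpretation of a VM's esxtop RDY and CSTP readings."""
    if cstp > cstp_threshold:
        # vCPUs are waiting on each other: review whether the VM needs
        # all of its vCPUs before adding hardware.
        return "right-size vCPUs"
    if rdy > rdy_threshold:
        # Contention that co-stop does not explain: the host or cluster
        # lacks CPU capacity for its workloads.
        return "add host capacity"
    return "no CPU contention indicated"


print(triage(rdy=12.0, cstp=8.5))  # right-size vCPUs
print(triage(rdy=12.0, cstp=0.5))  # add host capacity
print(triage(rdy=2.0, cstp=0.1))   # no CPU contention indicated
```

Checking CSTP first mirrors the order suggested above: scale down vCPUs where possible, and add hardware or hosts only when contention remains.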