Best Practices for Running SQL Server on VMware vSphere & vCenter (Part 1: CPU and NUMA Considerations)

Running Microsoft SQL Server on VMware vSphere offers flexibility, high availability, and scalability, but only when the environment is configured correctly. Misconfigurations at the cluster, host, or VM level can create unnecessary bottlenecks. Below are best practices for tuning your vCenter and vSphere cluster to support SQL Server workloads effectively.


1. vCenter Statistics Collection

By default, vCenter collects statistics at Level 1 and quickly rolls samples up into coarser intervals. This is too coarse for SQL Server troubleshooting: many detailed counters are missing, and short-lived performance spikes may be lost.

Best Practice

  • Adjust the statistics collection levels to:
    • 5 minutes → save 1 day, Level 4
    • 30 minutes → save 1 week, Level 4
    • 2 hours → save 1 month, Level 3
    • 1 day → save 1 year, Level 2

This ensures administrators have granular data for both real-time troubleshooting and long-term trend analysis. Make sure the vCenter appliance has enough database space to retain this history (use the built-in calculator to estimate storage).


2. vCenter Runtime Settings

Always configure a primary managed IP address for vCenter. Many VMware tools and integrations depend on this setting for communication. Leaving it blank or incorrectly set can cause failures with monitoring, backups, or disaster recovery solutions.


3. Enhanced vMotion Compatibility (EVC)

SQL Server VMs are highly sensitive to CPU model and instruction set differences between hosts. Even if your hosts are identical today, future upgrades might introduce newer processors.

Best Practice

  • Enable EVC mode at the highest baseline supported across all hosts.
  • This ensures vMotion compatibility for future CPU generations without downtime.
  • Enabling EVC requires powering down all VMs once during setup but pays off with smooth migrations later.

4. High Availability (HA) Configuration

vSphere HA protects SQL Server VMs from host failures by automatically restarting them on surviving hosts.

Recommendations:

  • Host Failure Response → Restart VMs.
  • Admission Control → Reserve capacity for at least one host failure.
  • Performance Degradation Tolerance → Do not use this for SQL workloads (it relies on “active memory,” which does not reflect true SQL Server usage).
  • Heartbeat Datastores → Configure at least two datastores on different storage arrays for maximum resiliency.

5. Distributed Resource Scheduler (DRS)

DRS balances workloads across hosts. For SQL Server, the default “active memory” metric is misleading.

Best Practice

  • Enable DRS.
  • Change Memory Metric for Load Balancing to “consumed memory” rather than “active memory.”
  • Monitor vCPU Ready Time instead of relying only on CPU overcommit ratios. This gives a more accurate picture of SQL Server performance pressure.
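vCenter charts expose CPU Ready as a summation in milliseconds per sampling interval; converting it to a percentage (divide by the interval length) makes the numbers comparable to the thresholds used later in this article. A quick sketch of that arithmetic (function and variable names here are illustrative):

```python
def cpu_ready_percent(ready_ms: float, interval_s: int, vcpus: int = 1) -> float:
    """Convert a vCenter CPU Ready summation (ms) to a percentage.

    ready_ms:   the 'CPU Ready' summation value from the performance chart
    interval_s: the chart's sampling interval in seconds (20 for real-time,
                300 for the 5-minute rollup, and so on)
    vcpus:      divide by vCPU count to get the per-vCPU average
    """
    return ready_ms / (interval_s * 1000 * vcpus) * 100

# A real-time sample of 400 ms over a 20 s interval on a 4-vCPU VM:
print(round(cpu_ready_percent(400, 20, 4), 2))  # 0.5 (% per vCPU)
```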

6. Affinity & Anti-Affinity Rules

For SQL Server Always On Availability Groups or Failover Cluster Instances, never allow all replicas to reside on the same host. A single host failure could take the entire HA solution offline.

Best Practice

  • Use anti-affinity rules to keep replicas on different hosts.
  • Group hosts with VM/Host Groups and define rules so SQL Server HA members are evenly distributed.

7. Resource Pools

Resource pools allow you to prioritize critical workloads during contention.

Best Practice

  • Create three tiers of resource pools:
    • Tier 1 → High priority (mission-critical SQL workloads)
    • Tier 2 → Normal priority
    • Tier 3 → Low priority
  • Use “Shares” (High, Normal, Low) instead of hard reservations to avoid skewing allocations as the cluster grows.
  • Do not use resource pools as “folders” for organization; this can lead to unnecessary CPU scheduling overhead.
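Under contention, shares divide host resources proportionally among sibling pools. Assuming the usual 4:2:1 ratio behind High/Normal/Low (the absolute share values below are illustrative), the split works out like this:

```python
def contention_split(pools: dict[str, int], total_mhz: int) -> dict[str, float]:
    """Divide host CPU among resource pools in proportion to their shares.
    Shares only matter under contention; idle cycles flow to whoever needs them."""
    total_shares = sum(pools.values())
    return {name: total_mhz * s / total_shares for name, s in pools.items()}

# High / Normal / Low follow a 4:2:1 share ratio:
pools = {"Tier1": 8000, "Tier2": 4000, "Tier3": 2000}
split = contention_split(pools, total_mhz=70000)
print(split)  # Tier1 gets 4/7, Tier2 2/7, Tier3 1/7 of the 70 GHz
```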

8. ESXi Host-Level Settings

At the host level, a few tuning steps can significantly improve SQL Server performance:

  1. ESXi Version → Stay on a current, supported release (7.0 U1 or later recommended).
  2. Hyper-Threading → Leave enabled but don’t count logical cores as full physical cores when sizing SQL Server.
  3. CPU Power Policy → Set to High Performance to avoid CPU frequency throttling.
  4. Jumbo Frames → Enable 9000 MTU for vMotion, iSCSI, and heavy SQL backup traffic (end-to-end validation required).
  5. Multi-NIC vMotion → Configure at least two NICs for vMotion to speed up SQL VM migrations.
  6. Time Sync → Configure NTP on hosts to prevent time drift (which can cause SQL HA outages).
  7. Storage Multipathing → Use Round Robin with IOPS=1 for aggressive load balancing.
  8. Adapter Queue Depths → Increase from defaults (32) to at least 64–128 for flash-based arrays.
  9. DSNRO (Disk.SchedNumReqOutstanding) → Adjust if multiple VMs on the same datastore show high I/O queueing.

9. VM-Level CPU, NUMA, and Memory Settings

SQL Server is NUMA-aware, and VMware handles vNUMA automatically—but there are pitfalls.

vCPU & NUMA

  • Since vSphere 6.5, vNUMA sizing ignores the “cores per socket” UI setting by default. Instead, the scheduler builds the vNUMA topology from the host’s physical NUMA nodes.
  • Large VMs can span NUMA nodes without you realizing it.
  • Validate NUMA mapping using ESXTOP (check local vs remote memory usage).

Best Practice

  • If needed, override with advanced parameters:
    • numa.vcpu.followcorespersocket = 1
    • cpuid.coresPerSocket = [desired cores]
    • numa.autosize.vcpu.maxPerVirtualNode = [value]
  • Consider using PreferHT to improve NUMA locality on large VMs.
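As a rule of thumb, a VM stays inside one NUMA node when its vCPU count and memory fit within one physical node's cores and RAM (hyperthreads don't count toward vNUMA sizing unless PreferHT is set). A rough sanity-check sketch, with illustrative inputs:

```python
def numa_fit(vcpus: int, vm_mem_gb: int,
             cores_per_node: int, mem_per_node_gb: int) -> str:
    """Rough check of whether a VM fits inside one physical NUMA node.
    vNUMA is sized from physical cores per node, not logical processors."""
    if vcpus <= cores_per_node and vm_mem_gb <= mem_per_node_gb:
        return "fits in one NUMA node"
    return "wide VM: spans NUMA nodes, expect some remote memory access"

# Example host: 2 sockets x 16 cores, 256 GB per node
print(numa_fit(12, 96, cores_per_node=16, mem_per_node_gb=256))   # fits in one NUMA node
print(numa_fit(24, 384, cores_per_node=16, mem_per_node_gb=256))  # wide VM
```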

CPU Hot Plug

Avoid enabling CPU Hot Plug for SQL Server VMs. It disables vNUMA, which can severely degrade performance.

Memory Settings

  • Never size SQL Server VMs based on VMware Active Memory.
  • For critical SQL Servers, enable Reserve All Guest Memory to prevent ballooning or swapping.
  • Memory Hot Plug is supported (vSphere 6.5+), but use sparingly.
  • Always validate NUMA memory balance using ESXTOP.

10. VM Disk Controllers & Layout

SQL Server performance often bottlenecks at the disk controller layer.

Best Practice

  • Use Paravirtual SCSI (PVSCSI) controllers, not LSI SAS (default).
  • PVSCSI has higher queue depth and lower CPU overhead.
  • Use up to 4 PVSCSI controllers to spread I/O.
  • Separate major SQL Server components (system DBs, user DBs, logs, TempDB, backups) onto different virtual disks.
  • This improves concurrency and flexibility.
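One simple way to apply the last two points is to deal the virtual disks out across the four PVSCSI controllers round-robin, so no single controller's queue serves all of the I/O-heavy files. The mapping below is an illustrative sketch, not a prescriptive layout:

```python
from itertools import cycle

def spread_disks(disks: list[str], controllers: int = 4) -> dict[str, int]:
    """Assign virtual disks to PVSCSI controllers (0..3) round-robin."""
    ctrl = cycle(range(controllers))
    return {disk: next(ctrl) for disk in disks}

layout = spread_disks(["OS + system DBs", "user data files",
                       "transaction logs", "TempDB", "backups"])
print(layout)
# {'OS + system DBs': 0, 'user data files': 1, 'transaction logs': 2,
#  'TempDB': 3, 'backups': 0}
```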

11. VM Time Synchronization

VMware Tools can unexpectedly reset guest time during vMotion, suspend/resume, or snapshots—even if time sync is disabled.

Best Practice

Explicitly disable VMware Tools time sync with advanced parameters:

tools.syncTime = 0
time.synchronize.continue = 0
time.synchronize.restore = 0
time.synchronize.resume.disk = 0
time.synchronize.shrink = 0
time.synchronize.tools.startup = 0
time.synchronize.tools.enable = 0
time.synchronize.resume.host = 0

  • Always rely on Windows Time Service with domain NTP for SQL Server HA/AG consistency.

12. VMware Tools

Keep VMware Tools updated on SQL Server VMs. Bug fixes and performance improvements are often delivered through VMware Tools updates. Plan updates during maintenance windows since network drivers may briefly disconnect during upgrade.


Final Thoughts (Part 1)

This first part focused on vCenter, host cluster, and VM-level best practices for running SQL Server on VMware. Correct NUMA awareness, HA/DRS configuration, proper statistics retention, and optimized VM disk and memory settings ensure the platform delivers predictable performance.

👉 Part 2 will cover SQL Server–specific tuning inside the VM to complement these VMware infrastructure optimizations.

Workshop: Validating CPU, NUMA, and vSphere Configuration for SQL Server VMs

Objective
This hands-on workshop walks you through verifying VMware and SQL Server settings that directly affect CPU scheduling, NUMA balance, and VM performance. By the end, you’ll know how to confirm proper NUMA alignment, detect vCPU pressure, and document VMware host configurations for SQL optimization.


Step 1: Prepare the Lab

  • Use a non-production SQL Server VM running on vSphere 7.0 U1 or later.
  • Ensure vCenter access and ESXTOP privileges.
  • Install VMware Tools and make sure Windows NTP is configured and running.

Step 2: Check NUMA Node Mapping

Inside vSphere Client:

  1. Select your SQL VM → Monitor → Hardware → NUMA.
  2. Confirm the VM is contained within one physical NUMA node (ideal for VMs ≤ half the host’s physical cores).
  3. If you see “Wide VM” (spanning nodes), note it for later tuning.

From the guest OS (SQL Server):

SELECT
  node_id, memory_node_id, online_scheduler_mask, cpu_count
FROM sys.dm_os_nodes
WHERE node_state_desc = 'ONLINE';

  • Compare the number of NUMA nodes SQL Server detects to what vSphere reports.
  • If there’s a mismatch, vNUMA alignment might be off.

Step 3: Review vCPU and Core Allocation

vSphere Client → VM → Edit Settings:

  • Note the number of vCPUs and Cores per Socket.
  • Record them in your baseline sheet.
  • Verify you’re not over-allocating: ideally, keep CPU overcommit ratio under 3:1 for SQL VMs.
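The 3:1 guideline is easy to check: sum the vCPUs of the powered-on VMs on the host and divide by physical cores (not hyperthreaded logical processors). A minimal sketch with example numbers:

```python
def overcommit_ratio(total_vcpus: int, physical_cores: int) -> float:
    """vCPU:pCore overcommit ratio for a host. Count physical cores only,
    not hyperthreaded logical processors."""
    return total_vcpus / physical_cores

ratio = overcommit_ratio(total_vcpus=80, physical_cores=32)
print(f"{ratio:.2f}:1", "OK" if ratio <= 3 else "over the 3:1 guideline")
# 2.50:1 OK
```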

Check CPU Reservation (optional):

Get-VM "SQLVM01" | Get-VMResourceConfiguration

Confirm reservations are either blank (using shares) or proportionate to workload priority.


Step 4: Verify NUMA Settings in Advanced Parameters

In vSphere Client → VM → Edit Settings → VM Options → Advanced → Configuration Parameters:
Add or verify these keys:

numa.vcpu.followcorespersocket = 1
numa.autosize.vcpu.maxPerVirtualNode = 8

⚙️ This keeps NUMA boundaries consistent and avoids cross-node memory access latency.

If testing hyperthreading locality, optionally add:

numa.vcpu.preferHT = TRUE

Step 5: Monitor CPU Ready and NUMA Load with ESXTOP

  1. SSH into the ESXi host.
  2. Run esxtop.
  3. Press c for CPU view.
  4. Press V to filter for your SQL VM.
  5. Check:
    • %RDY → Should stay under 5% (CPU contention).
    • %CSTP → Should stay under 3% (co-stop scheduling delay).
    • %MLMTD → 0 means no CPU limit applied.
  6. Press m for Memory view and note NUMA locality (the N%L column).
    • Values below 80% mean remote memory access — investigate NUMA misalignment.
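The thresholds above can be folded into a small checker, which is handy when reviewing esxtop captures from several hosts. The thresholds match the ones listed in this step; the function itself is an illustrative helper:

```python
def esxtop_check(rdy: float, cstp: float, numa_local: float) -> list[str]:
    """Flag esxtop counters that exceed this workshop's thresholds."""
    findings = []
    if rdy >= 5:
        findings.append(f"%RDY {rdy}% >= 5%: CPU contention")
    if cstp >= 3:
        findings.append(f"%CSTP {cstp}% >= 3%: co-stop delay (too many vCPUs?)")
    if numa_local < 80:
        findings.append(f"N%L {numa_local}% < 80%: remote memory access")
    return findings or ["all counters within thresholds"]

print(esxtop_check(rdy=2.1, cstp=0.4, numa_local=97))
print(esxtop_check(rdy=8.5, cstp=4.2, numa_local=62))
```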

Step 6: Validate SQL Server’s NUMA Awareness

Run inside SQL Server:

SELECT 
  scheduler_id, parent_node_id, cpu_id, status, is_online, is_idle
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';

  • Each parent_node_id should map evenly to your vNUMA nodes.
  • If you see all schedulers under a single node, verify the VM configuration.
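To check the "maps evenly" condition programmatically, count schedulers per parent_node_id from the query results. A small sketch (the expected node count comes from the VM's vNUMA topology):

```python
from collections import Counter

def node_balance(parent_node_ids: list[int], expected_nodes: int) -> bool:
    """True if schedulers (one parent_node_id per row of sys.dm_os_schedulers)
    are spread evenly across the expected number of NUMA nodes."""
    counts = Counter(parent_node_ids)
    if len(counts) != expected_nodes:
        return False  # e.g. everything landed under node 0
    return max(counts.values()) - min(counts.values()) <= 1

# 8 schedulers across 2 vNUMA nodes, 4 each -> balanced
print(node_balance([0, 0, 0, 0, 1, 1, 1, 1], expected_nodes=2))  # True
# All schedulers under node 0 -> check the VM's vNUMA topology
print(node_balance([0] * 8, expected_nodes=2))                   # False
```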

Step 7: Adjust vSphere HA and DRS Behavior

  • Under Cluster → Configure → vSphere DRS → Edit Settings, ensure “Load balancing based on consumed memory” is active.
  • For Always On or clustered SQL VMs, add anti-affinity rules to ensure replicas never share a host.
  • Rebalance manually if necessary, then recheck %RDY and NUMA Locality via ESXTOP.

Step 8: Document and Benchmark

Record results:

Test → Observation → Action Needed
vNUMA alignment → Within single node → ✅ OK
CPU Ready Time → < 5% → ✅ OK
NUMA Locality → > 85% → ✅ OK
Hyperthreading Enabled → Yes → ✅ OK
Power Policy → High Performance → ✅ OK

Then benchmark SQL Server with a workload (e.g., HammerDB or custom query load) and record CPU utilization, batch requests/sec, and page life expectancy.


Outcome
After completing this workshop, you’ll have verified NUMA alignment, CPU scheduling efficiency, and HA/DRS compatibility for SQL Server on VMware. These checks ensure predictable CPU behavior and minimize cross-node memory latency — foundational before applying in-guest SQL tuning in Part 2.

