A Complete Playbook for Modern DBAs
SQL Server health checks are one of the few things that never go away in a DBA’s career. It does not matter if you inherit a system, support a system that has been running for years, or walk into a new environment where the last DBA left suddenly. Someone will eventually ask the same question:
Is our SQL Server healthy, and what should we fix first?
Most DBAs know how to run DMV queries. That part is easy. What most DBAs struggle with is delivering a health check that actually answers the question. Management does not want screenshots, random graphs, or a spreadsheet full of numbers. They want a clear diagnosis, evidence, and a clean recommendation that reduces risk and improves stability.
This guide pulls together the full SQLYard way of running a modern health check. Everything here is safe to run on production servers, including real-time OLTP workloads, and everything includes an explanation of why the check matters and how to explain your findings in plain English.
This is not a quick list of DMVs. This is a full professional health assessment that walks through instance-level configuration, storage, TempDB, MAXDOP, parallelism, memory, backups, corruption checks, statistics health, index health, security posture, job health, waits, latch contention, I/O diagnostics, and Availability Group state. All scripts are inline and all Microsoft Docs references are included for further reading.
Run this entire playbook and you will have everything you need to deliver a full health assessment the same day, even on servers you have never touched before.
Part 1 – Why This Health Check Exists
(And What Management Actually Wants)
Sooner or later someone will ask you for a health check. It usually comes in one of these forms:
- “The system feels slow. Can you check SQL Server?”
- “We upgraded hardware. Is SQL using it correctly?”
- “A consultant said our settings were wrong. Can you confirm?”
- “Can you validate the server before we migrate it?”
What management is really asking for is much simpler:
- A clear status
- Evidence to back it up
- What to fix first
This playbook includes all three. You do not need any paid tools. Everything runs on SQL Server 2019 and 2022 and is safe to run in production.
When you finish, you should be able to summarize the instance in one page:
- Overall health: Healthy, Needs Attention, or At Risk
- High-priority findings
- Medium-priority findings
- Low-priority clean-up items
- Next recommended actions
Everything in this guide supports that one-page summary.
Part 2 – Instance Overview, Memory, Settings, Trace Flags, and Startup Health
2.1 Instance Overview
Always start with the basics. This is the header for every health check.
SELECT
SERVERPROPERTY('MachineName') AS MachineName,
SERVERPROPERTY('ServerName') AS ServerName,
SERVERPROPERTY('Edition') AS Edition,
SERVERPROPERTY('ProductVersion') AS ProductVersion,
SERVERPROPERTY('ProductLevel') AS ProductLevel,
SERVERPROPERTY('EngineEdition') AS EngineEdition,
sqlserver_start_time AS SqlServerStartTime
FROM sys.dm_os_sys_info;
Reference:
https://learn.microsoft.com/sql/t-sql/functions/serverproperty-transact-sql
What you report:
SQL Server version, edition, and uptime. Note whether it is patched, outdated, or running a discontinued servicing branch.
2.2 Memory Configuration and Pressure
2.2.1 Max Server Memory vs Physical RAM
WITH mem AS
(
SELECT
total_physical_memory_kb / 1024.0 AS physical_memory_MB,
available_physical_memory_kb / 1024.0 AS available_memory_MB
FROM sys.dm_os_sys_memory
),
cfg AS
(
SELECT value_in_use AS max_server_memory_MB
FROM sys.configurations
WHERE name = 'max server memory (MB)'
)
SELECT
m.physical_memory_MB,
c.max_server_memory_MB,
CAST(m.physical_memory_MB * 0.50 AS int) AS recommended_min_MB,
CAST(m.physical_memory_MB * 0.80 AS int) AS recommended_max_MB,
m.available_memory_MB AS current_free_OS_memory_MB,
CASE
WHEN c.max_server_memory_MB IS NULL THEN 'Not configured'
WHEN c.max_server_memory_MB > m.physical_memory_MB * 0.85 THEN 'Likely too high'
WHEN c.max_server_memory_MB < m.physical_memory_MB * 0.40 THEN 'Likely too low'
ELSE 'Within reasonable range'
END AS recommendation
FROM mem m
CROSS JOIN cfg c;
Reference:
https://learn.microsoft.com/sql/database-engine/configure-windows/server-memory-server-configuration-options
What you report:
SQL Server should get about 50 to 80 percent of RAM depending on workload.
2.2.2 Memory Pressure Indicators
SELECT
'OS memory' AS source,
total_physical_memory_kb / 1024.0 AS total_physical_MB,
available_physical_memory_kb / 1024.0 AS available_physical_MB,
system_memory_state_desc
FROM sys.dm_os_sys_memory
UNION ALL
SELECT
'SQL process' AS source,
physical_memory_in_use_kb / 1024.0 AS total_physical_MB,
large_page_allocations_kb / 1024.0 AS available_physical_MB,
CASE
WHEN process_physical_memory_low = 1 THEN 'SQL under memory pressure'
ELSE 'SQL memory state OK'
END AS system_memory_state_desc
FROM sys.dm_os_process_memory;
Reference:
https://learn.microsoft.com/sql/relational-databases/system-dynamic-management-views/sys-dm-os-process-memory-transact-sql
2.3 Dangerous Instance-Level Settings
SELECT
c.name,
c.value_in_use,
CASE
WHEN c.name IN (
'priority boost',
'lightweight pooling',
'affinity mask',
'affinity64 mask',
'affinity I/O mask',
'affinity64 I/O mask'
) THEN 'HIGH risk'
WHEN c.name IN ('max worker threads', 'min memory per query (KB)') THEN 'MEDIUM risk'
ELSE 'NORMAL'
END AS risk_level
FROM sys.configurations AS c
WHERE c.name IN (
'affinity mask',
'affinity64 mask',
'affinity I/O mask',
'affinity64 I/O mask',
'lightweight pooling',
'priority boost',
'max worker threads',
'min memory per query (KB)'
)
ORDER BY risk_level DESC;
Reference:
https://learn.microsoft.com/sql/database-engine/configure-windows/server-configuration-options-sql-server
2.4 Trace Flag Inventory and Risk Assessment
DBCC TRACESTATUS(-1);
Reference:
https://learn.microsoft.com/sql/t-sql/database-console-commands/dbcc-tracestatus-transact-sql
Classify trace flags into:
- Common: 3226
- Legacy on modern builds: 1117, 1118, 2371
- Diagnostic spam: 3605, 1204, 1222
- Advanced tuning: 8048, 7457, 4199, 834
Do not remove flags without documented reasoning.
2.5 OS-Level Best Practices
2.5.1 NTFS Allocation Unit Size
SQL Server data, log, TempDB drives should use 64 KB allocation units.
Reference:
https://learn.microsoft.com/sql/sql-server/install/hardware-and-software-requirements-for-installing-sql-server
2.5.2 Windows Indexing
Ensure “Allow files on this drive to have contents indexed” is disabled unless FILESTREAM is used.
2.5.3 Instant File Initialization (IFI)
SQL confirms IFI at startup. Verify using:
EXEC xp_readerrorlog 0, 1, 'Database Instant File Initialization';
Reference:
https://learn.microsoft.com/sql/relational-databases/databases/instant-file-initialization
IFI must be enabled at Windows level using the privilege
Perform volume maintenance tasks.
2.5.4 Optimize for Ad Hoc Workloads
EXEC sp_configure 'optimize for ad hoc workloads';
Reference:
https://learn.microsoft.com/sql/relational-databases/performance/optimize-for-ad-hoc-workloads
2.5.5 Soft NUMA
Check Soft NUMA with:
SELECT node_id, memory_node_id, cpu_count, online_scheduler_count
FROM sys.dm_os_nodes
WHERE node_state_desc = 'ONLINE';
Reference:
https://learn.microsoft.com/sql/database-engine/configure-windows/soft-numa
2.5.6 TLS, Certificates, and Drivers
Check the clients:
SELECT
session_id,
client_interface_name,
client_version,
program_name,
host_name
FROM sys.dm_exec_connections;
If you see ODBC Driver 18 or SqlClient 5.x errors:
Search error log:
EXEC xp_readerrorlog 0, 1, 'TLS';
EXEC xp_readerrorlog 0, 1, 'certificate';
EXEC xp_readerrorlog 0, 1, 'failed to initialize encryption';
Reference:
https://learn.microsoft.com/sql/connect/odbc/download-odbc-driver-for-sql-server
Part 3 – CPU, MAXDOP, Parallelism, and TempDB Health
3.1 NUMA and Scheduler Layout
SELECT
parent_node_id,
scheduler_id,
cpu_id,
is_online,
is_idle
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
ORDER BY parent_node_id, scheduler_id;
3.2 MAXDOP and Cost Threshold for Parallelism
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism';
EXEC sp_configure 'cost threshold for parallelism';
Reference:
https://learn.microsoft.com/sql/database-engine/configure-windows/configure-the-max-degree-of-parallelism-server-configuration-option
Use Query Store or cached plan analysis to validate real parallelism usage:
SELECT
COUNT(*) AS total_plans,
SUM(CASE WHEN qp.query_plan.exist('//RelOp[@Parallel = 1]') = 1 THEN 1 ELSE 0 END) AS parallel_plans
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE cp.cacheobjtype = 'Compiled Plan';
3.3 TempDB Configuration
3.3.1 File Layout
USE tempdb;
SELECT
name,
file_id,
size * 8 / 1024 AS size_MB,
growth * 8 / 1024 AS growth_MB,
physical_name
FROM sys.database_files;
Reference:
https://learn.microsoft.com/sql/relational-databases/databases/tempdb-database
3.3.2 Version Store
SELECT
SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_MB
FROM sys.dm_db_file_space_usage;
3.3.3 Latch Contention
SELECT TOP 20
wait_type,
wait_time_ms,
waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH_%'
ORDER BY wait_time_ms DESC;
Reference:
https://learn.microsoft.com/sql/relational-databases/diagnose/tempdb-contention
Part 4 – Database Safety, Backups, Integrity, VLFs, Growth, Recovery
4.1 Database Safety Configuration
SELECT
name,
state_desc,
recovery_model_desc,
compatibility_level,
page_verify_option_desc,
is_auto_close_on,
is_auto_shrink_on,
is_encrypted,
is_read_committed_snapshot_on
FROM sys.databases
WHERE database_id > 4;
4.2 Backup Health
SELECT
bs.database_name,
MAX(CASE WHEN bs.type = 'D' THEN backup_finish_date END) AS last_full_backup,
MAX(CASE WHEN bs.type = 'I' THEN backup_finish_date END) AS last_diff_backup,
MAX(CASE WHEN bs.type = 'L' THEN backup_finish_date END) AS last_log_backup
FROM msdb.dbo.backupset bs
GROUP BY bs.database_name;
4.2.1 DBCC CHECKDB History
EXEC xp_readerrorlog 0, 1, 'DBCC CHECKDB';
Reference:
https://learn.microsoft.com/sql/t-sql/database-console-commands/dbcc-checkdb-transact-sql
4.3 VLF Health
SELECT
DB_NAME(database_id) AS database_name,
COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(NULL)
GROUP BY database_id
ORDER BY vlf_count DESC;
Reference:
https://learn.microsoft.com/sql/relational-databases/troubleshoot/troubleshoot-transaction-log-vlfs
Part 5 – Index Health, Fragmentation, Stats, Heaps, Design Issues
5.1 Heaps
SELECT
s.name AS schema_name,
t.name AS table_name,
p.rows
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.partitions p ON t.object_id = p.object_id AND p.index_id = 0
WHERE t.is_ms_shipped = 0;
5.2 Heaps with Forwarded Records
SELECT
OBJECT_NAME(object_id) AS table_name,
forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
WHERE index_id = 0 AND forwarded_record_count > 0;
5.3 Fragmentation Snapshot (All Databases)
IF OBJECT_ID('tempdb..#frag') IS NOT NULL DROP TABLE #frag;
CREATE TABLE #frag (
database_name sysname,
object_name sysname,
index_name sysname,
index_type nvarchar(60),
avg_fragmentation_in_percent float,
page_count int
);
DECLARE @db sysname, @sql nvarchar(max);
DECLARE dbs CURSOR FAST_FORWARD FOR
SELECT name FROM sys.databases
WHERE database_id > 4 AND state = 0; -- online user dbs
OPEN dbs;
FETCH NEXT FROM dbs INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = N'
USE ' + QUOTENAME(@db) + ';
INSERT INTO #frag
SELECT
DB_NAME(),
OBJECT_NAME(ips.object_id),
i.name,
i.type_desc,
ips.avg_fragmentation_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, ''SAMPLED'') ips
JOIN sys.indexes i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
WHERE ips.index_id > 0
AND ips.page_count >= 1000;';
EXEC (@sql);
FETCH NEXT FROM dbs INTO @db;
END
CLOSE dbs; DEALLOCATE dbs;
SELECT *
FROM #frag
ORDER BY avg_fragmentation_in_percent DESC;
5.4 Missing Indexes
SELECT TOP 25
user_seeks, user_scans, avg_total_user_cost, avg_user_impact,
DB_NAME(database_id) AS dbname,
OBJECT_NAME(object_id) AS table_name,
equality_columns, inequality_columns, included_columns
FROM sys.dm_db_missing_index_details;
Reference:
https://learn.microsoft.com/sql/relational-databases/system-dynamic-management-views/sys-dm-db-missing-index-details-transact-sql
5.5 Overlapping Indexes
/* Overlapping / duplicate index detection */
;WITH index_cols AS
(
SELECT
i.object_id,
i.index_id,
i.name AS index_name,
i.is_unique,
i.type_desc,
ic.key_ordinal,
c.name AS column_name,
ic.is_included_column,
ic.is_descending_key
FROM sys.indexes i
JOIN sys.index_columns ic
ON i.object_id = ic.object_id
AND i.index_id = ic.index_id
JOIN sys.columns c
ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE i.is_hypothetical = 0
AND i.type_desc = 'NONCLUSTERED'
),
agg AS
(
SELECT
c1.object_id,
c1.index_id,
c1.index_name,
c1.is_unique,
c1.type_desc,
key_columns =
STUFF((SELECT ',' + c2.column_name
FROM index_cols c2
WHERE c2.object_id = c1.object_id
AND c2.index_id = c1.index_id
AND c2.is_included_column = 0
ORDER BY c2.key_ordinal
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, ''),
include_columns =
STUFF((SELECT ',' + c2.column_name
FROM index_cols c2
WHERE c2.object_id = c1.object_id
AND c2.index_id = c1.index_id
AND c2.is_included_column = 1
ORDER BY c2.column_name
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '')
FROM index_cols c1
GROUP BY
c1.object_id,
c1.index_id,
c1.index_name,
c1.is_unique,
c1.type_desc
)
SELECT
s.name AS schema_name,
t.name AS table_name,
a1.index_name AS index_name_1,
a1.key_columns AS key_columns_1,
a1.include_columns AS include_columns_1,
a2.index_name AS index_name_2,
a2.key_columns AS key_columns_2,
a2.include_columns AS include_columns_2
FROM agg a1
JOIN agg a2
ON a1.object_id = a2.object_id
AND a1.index_id < a2.index_id
AND a1.key_columns = a2.key_columns
JOIN sys.tables t
ON a1.object_id = t.object_id
JOIN sys.schemas s
ON t.schema_id = s.schema_id
WHERE t.is_ms_shipped = 0
ORDER BY
s.name,
t.name,
a1.key_columns,
a1.index_name,
a2.index_name;
5.6 Statistics Health
SELECT
OBJECT_NAME(sp.object_id) AS table_name,
s.name AS stat_name,
sp.last_updated,
sp.modification_counter
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE sp.last_updated IS NOT NULL
ORDER BY sp.last_updated ASC;
5.7 Top 10 Largest Tables
SELECT TOP 10
s.name AS schema_name,
t.name AS table_name,
SUM(p.rows) AS row_count,
SUM(a.total_pages) * 8 / 1024 AS total_size_MB
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.partitions p ON t.object_id = p.object_id
JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY s.name, t.name
ORDER BY total_size_MB DESC;
Part 6 – Security, Jobs, Waits, I/O, Latches, Blocking, Top Queries
6.1 Server Roles
SELECT
l.name AS principal_name,
r.name AS role_name
FROM sys.server_principals l
JOIN sys.server_role_members rm ON l.principal_id = rm.member_principal_id
JOIN sys.server_principals r ON rm.role_principal_id = r.principal_id;
6.2 Database Roles
/* Database-level powerful role membership across all databases */
IF OBJECT_ID('tempdb..#db_roles') IS NOT NULL
DROP TABLE #db_roles;
CREATE TABLE #db_roles
(
database_name sysname,
principal_name sysname,
principal_type nvarchar(60),
role_name sysname
);
DECLARE @db sysname;
DECLARE @sql nvarchar(max);
DECLARE db_cursor CURSOR FAST_FORWARD FOR
SELECT name
FROM sys.databases
WHERE database_id > 4 -- skip system DBs
AND state = 0; -- ONLINE
OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = N'
INSERT INTO #db_roles (database_name, principal_name, principal_type, role_name)
SELECT
N''' + @db + ''' AS database_name,
dp.name AS principal_name,
dp.type_desc AS principal_type,
dr.name AS role_name
FROM ' + QUOTENAME(@db) + '.sys.database_principals dp
JOIN ' + QUOTENAME(@db) + '.sys.database_role_members drm
ON dp.principal_id = drm.member_principal_id
JOIN ' + QUOTENAME(@db) + '.sys.database_principals dr
ON drm.role_principal_id = dr.principal_id
WHERE dr.name IN (''db_owner'', ''db_securityadmin'', ''db_ddladmin'');
';
EXEC (@sql);
FETCH NEXT FROM db_cursor INTO @db;
END
CLOSE db_cursor;
DEALLOCATE db_cursor;
SELECT *
FROM #db_roles
ORDER BY database_name, role_name, principal_name;
6.3 SQL Agent Job Health
SELECT
j.name,
MAX(h.run_date) AS last_run_date,
SUM(CASE WHEN h.run_status = 0 THEN 1 ELSE 0 END) AS failure_count
FROM msdb.dbo.sysjobs j
LEFT JOIN msdb.dbo.sysjobhistory h ON j.job_id = h.job_id
GROUP BY j.name;
6.4 Wait Stats
SELECT TOP 20
wait_type, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE 'SLEEP%'
ORDER BY wait_time_ms DESC;
Reference:
https://learn.microsoft.com/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql
6.5 I/O Health
SELECT
DB_NAME(vfs.database_id) AS dbname,
vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS read_latency_ms,
vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf ON vfs.database_id = mf.database_id
ORDER BY read_latency_ms DESC;
Reference:
https://learn.microsoft.com/sql/relational-databases/system-dynamic-management-views/sys-dm-io-virtual-file-stats-transact-sql
6.6 Latch Diagnostics
SELECT TOP 20
wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH%'
ORDER BY wait_time_ms DESC;
6.7 Top CPU Queries
SELECT TOP 20
total_worker_time / 1000 AS total_cpu_ms,
execution_count,
total_worker_time / execution_count / 1000 AS avg_cpu_ms,
text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle);
Part 7 – Availability Groups and HADR Health
7.1 Replica State
SELECT
ag.name,
ar.replica_server_name,
rs.role_desc,
rs.synchronization_health_desc,
rs.connected_state_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states rs ON ar.replica_id = rs.replica_id;
7.2 Database Replica State
SELECT
ag.name,
dbcs.database_name,
dbcs.is_failover_ready,
dbcs.synchronization_state_desc,
dbcs.synchronization_health_desc,
dbcs.log_send_queue_size,
dbcs.redo_queue_size
FROM sys.dm_hadr_database_replica_cluster_states dbcs
JOIN sys.availability_groups ag ON dbcs.group_id = ag.group_id;
7.3 AG Redo / Send Queue Diagnostics
SELECT
ar.replica_server_name,
drs.database_id,
DB_NAME(drs.database_id) AS database_name,
drs.redo_queue_size,
drs.redo_rate,
drs.log_send_queue_size,
drs.log_send_rate
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id;
Reference:
https://learn.microsoft.com/sql/database-engine/availability-groups/windows/monitor-availability-groups
Part 8 – Workshop Flow and Final Executive Summary
Run Order for a Same-Day Health Check
- Instance overview
- Memory / CPU / settings
- TempDB
- Database safety
- Backup and CHECKDB
- Index and statistics health
- Security
- SQL Agent jobs
- Waits, latches, I/O
- AG state
Executive Summary Template
SQL Server Health Summary – <Instance>
Overall Status: Needs Attention
High Priority
- Issues that impact stability or recoverability
Medium Priority
- Issues that impact performance
Low Priority
- Cleanup or tuning improvements
Next Steps
- Recommended actions with timelines
Discover more from SQLYARD
Subscribe to get the latest posts sent to your email.


