SQL Server Health Monitoring with AI: DMVs, Alerts, and Automated Collection

Building a Complete SQL Server Health Monitoring System with AI – SQLYARD

Building a Complete SQL Server Health Monitoring System with AI


SQL Server environments rarely fail suddenly. Most performance problems develop slowly through patterns such as increasing CPU usage, growing I/O pressure, blocking chains, or poorly performing queries.

Experienced database administrators understand that the key to stable systems is proactive monitoring, not reactive troubleshooting. Modern monitoring systems collect telemetry continuously and provide early warning signals before users experience downtime.

Artificial intelligence tools such as Amazon Q, Claude, and other AI assistants can now help engineers generate monitoring scripts, interpret metrics, and design monitoring architectures faster than ever before. However, AI should be used as an acceleration tool, not a replacement for database expertise.

In this article you will learn:

  • How SQL Server monitoring architecture works
  • Which performance metrics matter most
  • How to build a monitoring framework from scratch
  • How to use Dynamic Management Views (DMVs) for telemetry
  • How AI can accelerate script generation and diagnostics

Why SQL Server Monitoring Matters

SQL Server workloads constantly change. Queries evolve, indexes grow, data volumes increase, and new applications introduce unpredictable workloads.

Without monitoring, problems appear as slow application response, blocked transactions, database timeouts, excessive CPU utilization, and high disk latency.

Microsoft documents that DMVs provide real-time insight into SQL Server health and performance and are one of the most important tools available to DBAs. These views allow engineers to observe active queries, resource utilization, wait statistics, I/O activity, and session blocking. Monitoring systems collect this information continuously and store it for historical analysis.

Core Metrics Every Monitoring System Should Track

A complete monitoring system tracks several categories of performance signals.

CPU Utilization

CPU pressure occurs when SQL Server processes large query workloads or inefficient execution plans. Monitoring CPU usage helps identify poorly optimized queries, missing indexes, expensive joins, and parallelism issues. Microsoft recommends monitoring sys.dm_os_schedulers and wait statistics to identify CPU pressure patterns.

Memory Pressure

SQL Server uses memory aggressively to cache data pages. Memory pressure can cause Page Life Expectancy drops, excessive disk reads, and query slowdown. Key indicators include Page Life Expectancy (PLE), memory grants pending, and buffer pool pressure.

Blocking Sessions

Blocking occurs when one session holds locks that prevent another session from executing. This often causes application timeouts and user complaints. SQL Server exposes blocking details through sys.dm_exec_requests. Blocking detection is one of the most important monitoring checks for transactional systems.

I/O Performance

Disk I/O latency can significantly degrade database performance. Monitoring disk activity helps detect slow storage subsystems, excessive log writes, large scans, and tempdb pressure. SQL Server exposes disk activity through sys.dm_io_virtual_file_stats.

Long Running Queries

Queries that run too long often indicate missing indexes, inefficient joins, large table scans, or parameter sniffing issues. SQL Server stores query performance statistics inside sys.dm_exec_query_stats, which provides historical execution metrics.

SQL Server Monitoring Architecture

A typical monitoring architecture follows this layered structure, where each layer has a specific responsibility:

SQL Server Instance Dynamic Management Views (DMVs) Monitoring Collection Scripts Data Collection Tables Reporting Layer Dashboards / Alerts

Enterprise monitoring platforms such as SQL Server Management Data Warehouse, Redgate SQL Monitor, and Azure Monitor all follow this same architecture.

The AI Layer in SQL Server Monitoring

Traditional monitoring systems rely on static thresholds and manual analysis — alert if CPU > 80%, alert if blocking exists, alert if a query runs longer than a set value. This approach works, but it produces too many false alerts, lacks pattern recognition, and is entirely reactive.

This is where AI changes the system. The updated architecture places an AI Analysis Layer between data collection and alerting:

SQL Server Instance → DMVs → Collection Scripts → Data Tables AI Analysis Layer Alerts / Dashboards / Automation

Instead of fixed rules, AI analyzes patterns across historical data. It can detect anomalies in query performance, identify unusual blocking behavior, recognize execution time regressions, and correlate CPU, memory, and I/O spikes.

Without AI

  • DBAs manually investigate alerts
  • Root cause analysis takes time
  • Issues are purely reactive
  • Alert: “query > 2 seconds”

With AI

  • Anomalies identified automatically
  • Insights generated instantly
  • Systems become predictive
  • “Query normally 50ms, now 800ms”

Practical AI Integration Options

  • External AI Analysis (Fastest) — Export monitoring data, send to Claude or Amazon Q, receive insights.
  • API-Based Pipeline — SQL → API → AI model → results stored back in SQL.
  • Automated Alert Enrichment — Send the query plus execution stats plus an AI-generated explanation. Your alert becomes immediately actionable.

Detecting Anomalies in SQL Before Sending to AI

You can flag anomalies directly in SQL to prepare an AI-ready dataset:

WITH Baseline AS (
    SELECT
        QueryText,
        AVG(AvgExecutionTime) AS BaselineTime
    FROM DBA_Monitoring.dbo.LongRunningQueries
    GROUP BY QueryText
)
SELECT
    l.QueryText,
    l.AvgExecutionTime,
    b.BaselineTime,
    CASE
        WHEN l.AvgExecutionTime > b.BaselineTime * 3 THEN 'ANOMALY'
        ELSE 'NORMAL'
    END AS Status
FROM DBA_Monitoring.dbo.LongRunningQueries l
JOIN Baseline b ON l.QueryText = b.QueryText;

Key DMV Queries Explained

Detecting Blocking Sessions

Blocking detection is often the first script deployed in a monitoring system. The following query identifies sessions currently blocked by other sessions:

SELECT
    blocking_session_id,
    session_id,
    wait_type,
    wait_time,
    wait_resource
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;

This returns the blocking session ID, the blocked session ID, the wait type, and wait duration — helping DBAs quickly identify which query is causing a blocking chain.

Detecting Long Running Queries

The following query analyzes historical execution statistics to surface inefficient plans or missing indexes:

SELECT
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS avg_execution_time,
    qt.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY avg_execution_time DESC;

This returns the number of executions, average execution time, and the full query text.

Monitoring Disk I/O Activity

The following query summarizes read and write operations per database to reveal storage bottlenecks:

SELECT
    DB_NAME(database_id) AS DatabaseName,
    SUM(num_of_reads)  AS Reads,
    SUM(num_of_writes) AS Writes
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GROUP BY database_id;

Workshop: Build the Full Monitoring System

This hands-on lab walks through building a basic but functional monitoring framework. The goal is to create a system that collects telemetry, stores historical metrics, runs automatically, and supports dashboard visualization.

1

Create a Monitoring Database

Create a dedicated database for telemetry storage. Keeping monitoring data separate prevents interference with production workloads.

CREATE DATABASE DBA_Monitoring;
GO
USE DBA_Monitoring;
GO
2

Create Monitoring Tables

-- Blocking sessions history
CREATE TABLE BlockingHistory (
    CaptureTime       DATETIME DEFAULT GETDATE(),
    BlockingSessionID INT,
    BlockedSessionID  INT,
    WaitType          NVARCHAR(120),
    WaitTime          INT
);

-- Long running queries history
CREATE TABLE LongRunningQueries (
    CaptureTime      DATETIME DEFAULT GETDATE(),
    ExecutionCount   INT,
    AvgExecutionTime BIGINT,
    QueryText        NVARCHAR(MAX)
);

-- IO statistics history
CREATE TABLE IOStats (
    CaptureTime  DATETIME DEFAULT GETDATE(),
    DatabaseName NVARCHAR(128),
    Reads        BIGINT,
    Writes       BIGINT
);
3

Create Collection Scripts

-- Blocking collector
INSERT INTO DBA_Monitoring.dbo.BlockingHistory
    (BlockingSessionID, BlockedSessionID, WaitType, WaitTime)
SELECT blocking_session_id, session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;

-- Long running query collector
INSERT INTO DBA_Monitoring.dbo.LongRunningQueries
    (ExecutionCount, AvgExecutionTime, QueryText)
SELECT
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count,
    qt.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt;

-- IO statistics collector
INSERT INTO DBA_Monitoring.dbo.IOStats (DatabaseName, Reads, Writes)
SELECT
    DB_NAME(database_id),
    SUM(num_of_reads),
    SUM(num_of_writes)
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GROUP BY database_id;
4

Schedule Monitoring Jobs with SQL Server Agent

Use SQL Server Agent to automate collection at the following recommended frequencies:

MetricFrequency
Blocking detectionEvery 1 minute
Long running queriesEvery 5 minutes
IO metricsEvery 5 minutes
5

Generate Sample Data for Testing

Simulate load by running the following query across multiple sessions simultaneously. Your monitoring tables should begin collecting telemetry.

SELECT TOP 100000 *
FROM sys.objects a
CROSS JOIN sys.objects b;
6

Visualize Monitoring Data

Connect collected metrics to a visualization tool:

  • Power BI — Native SQL Server connector, rich dashboarding
  • Grafana — Open source, excellent for time-series displays
  • SQL Server Reporting Services — Built into the Microsoft stack

Summary and Final Thoughts

A well-designed monitoring system allows DBAs to detect performance problems early and maintain database stability. AI tools can accelerate development of monitoring scripts but should always be validated by experienced engineers.

The future belongs to database professionals who combine deep SQL expertise with AI-assisted development. Those engineers will build faster systems, troubleshoot issues more efficiently, and design more scalable data platforms.

References


Discover more from SQLYARD

Subscribe to get the latest posts sent to your email.

Leave a Reply

Discover more from SQLYARD

Subscribe now to keep reading and get access to the full archive.

Continue reading