SQL Server system_health Alerts: Deadlock Capture, Long Waits, I/O Warnings, and More
Compatibility: SQL Server 2012 and later. The system_health session has been running on every SQL Server instance since 2012 and already captures deadlock events. No new Extended Events session required. All scripts deploy into a DBA utility database.
Companion article: This article focuses on system_health-driven alerts and structured deadlock capture. For broad coverage of severity-level and error-specific Agent alerts (825, 9002, severities 16–25), see SQL Server Severity Alerts: How to Set Up Proactive Error Notifications. The two systems are designed to work together — deploy both for complete coverage.
- What system_health Is and Why You Should Use It
- The Approach: Purely Event-Driven, Nothing Polling
- Prerequisites
- Install Script: All Four Components
- Why We Use the File Target, Not the Ring Buffer
- Testing the Setup
- Verifying It Worked
- Querying the Deadlock Log
- Going Further: Other system_health Alerts You Can Add
- References
What system_health Is and Why You Should Use It
Every SQL Server instance since 2012 runs a built-in Extended Events session called system_health. It is always on, costs virtually nothing, and has been quietly collecting data about your server since the day it was installed. Most DBAs are aware of it but few take full advantage of it.
According to Microsoft’s documentation, system_health captures all of the following automatically:
- All deadlocks — including the full deadlock graph as XML
- Sessions waiting on latches for more than 15 seconds — with callstack, sql_text, and session_id
- Sessions waiting on locks for more than 30 seconds — with callstack, sql_text, and session_id
- Sessions in long preemptive waits — where SQL Server is waiting on external API calls
- CLR allocation and virtual allocation failures
- Memory broker and out-of-memory events
- Security and connectivity events
- Scheduler monitor and instance health data from sp_server_diagnostics
If you are not using system_health as part of your daily troubleshooting toolkit, start today. It is on every server you manage right now, collecting data that answers questions you have not even asked yet. The deadlock capture system in this article is built entirely on top of it — no additional session, no additional overhead.
File target vs ring buffer: system_health writes to both a ring buffer (in memory) and a file target (.xel files in the SQL log directory). On SQL Server 2022, the ring buffer target returns no rows for xml_deadlock_report events despite them being fully present in the file target. This is a known issue with how the ring buffer serializes events on newer versions. All scripts in this article use the file target exclusively via sys.fn_xe_file_target_read_file.
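If you want to confirm system_health is running on a given server and see where its .xel files live, the Extended Events DMVs will tell you. A quick sanity check, assuming the standard event_file target that system_health has shipped with since 2012:

-- Confirm system_health is running and locate its current .xel file
SELECT s.name AS session_name,
       CAST(t.target_data AS XML).value('(EventFileTarget/File/@name)[1]', 'NVARCHAR(260)') AS current_file
FROM sys.dm_xe_sessions s
JOIN sys.dm_xe_session_targets t
    ON s.address = t.event_session_address
WHERE s.name = N'system_health'
  AND t.target_name = N'event_file';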
The Approach: Purely Event-Driven, Nothing Polling
The goal is to capture every deadlock into a permanent table and send an immediate email notification — without adding any scheduled polling to the server. Nothing runs between deadlocks. The system sits completely idle until SQL Server itself fires the trigger.
The complete deployment has exactly four components:
| Object | Purpose |
|---|---|
| dbo.DeadlockEvents | Permanent landing table — stores every captured deadlock with victim and survivor detail, query text, and the full deadlock graph as XML |
| dbo.usp_CaptureDeadlock | Reads system_health for new deadlock events, parses victim and survivor from the XML, inserts new rows, and sends an HTML email notification |
| DBA_CaptureDeadlock | Unscheduled Agent job that calls the procedure. No schedule — it only runs when fired by the alert |
| Deadlock_Alert | SQL Server Agent alert on error message 1205. Fires the job the instant a deadlock victim is chosen |
Here is the exact chain of events every time a deadlock occurs:
1. A deadlock occurs; SQL Server chooses a victim and raises error 1205.
2. Because 1205 is configured to log, the message is written to the Error Log.
3. SQL Server Agent sees the logged message and fires Deadlock_Alert.
4. The alert starts the unscheduled DBA_CaptureDeadlock job.
5. The job runs dbo.usp_CaptureDeadlock, which reads the system_health file target, inserts any new deadlock rows into dbo.DeadlockEvents, and sends the notification email.
Prerequisites
- A DBA utility database to hold the table, procedure, and Agent job (create one if it does not exist)
- Database Mail configured with a valid mail profile
- SQL Server Agent running
- Error 1205 must be configured to write to the Error Log — Step 1 of the install script checks and fixes this automatically
If is_event_logged = 0 on message 1205, the Agent alert will never fire because the error never reaches the Error Log. The install script checks this first and corrects it if needed. You cannot skip this step.
Install Script: All Four Components
Run all steps in sequence in your DBA utility database. Before running, update two values in the procedure: your Database Mail profile name and your DBA team email address.
Step 1: Confirm Error 1205 Is Logged to the Error Log
This is the prerequisite that makes the entire chain work. If 1205 is not logged, the alert cannot fire.
IF NOT EXISTS
(
SELECT 1 FROM sys.messages
WHERE message_id = 1205
AND language_id = 1033
AND is_event_logged = 1
)
BEGIN
EXEC master.sys.sp_altermessage
@message_id = 1205,
@parameter = 'WITH_LOG',
@parameter_value = 'true';
PRINT 'Error 1205 is now set to log to the Error Log.';
END
ELSE
PRINT 'Error 1205 already logs to the Error Log. No change needed.';
GO
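To confirm the change took effect (or to check the flag on any other server), query sys.messages directly:

-- Verify: is_event_logged should now be 1
SELECT message_id, language_id, is_event_logged
FROM sys.messages
WHERE message_id = 1205 AND language_id = 1033;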
Step 2: Create the Deadlock Landing Table
Stores each deadlock event with both victim and survivor details, the query text from each side, and the complete deadlock graph as XML for deep analysis.
USE [DBA];
GO
IF OBJECT_ID('dbo.DeadlockEvents', 'U') IS NOT NULL
DROP TABLE dbo.DeadlockEvents;
GO
CREATE TABLE dbo.DeadlockEvents
(
DeadlockID INT IDENTITY(1,1) NOT NULL,
EventTime DATETIME2(3) NOT NULL,
DatabaseName NVARCHAR(128) NULL,
-- Victim details (the session SQL Server killed)
VictimSPID INT NULL,
VictimLogin NVARCHAR(128) NULL,
VictimHost NVARCHAR(128) NULL,
VictimApp NVARCHAR(256) NULL,
VictimStatement NVARCHAR(MAX) NULL,
VictimLockMode NVARCHAR(20) NULL,
VictimWaitResource NVARCHAR(256) NULL,
-- Survivor details (the session that continued)
SurvivorSPID INT NULL,
SurvivorLogin NVARCHAR(128) NULL,
SurvivorHost NVARCHAR(128) NULL,
SurvivorApp NVARCHAR(256) NULL,
SurvivorStatement NVARCHAR(MAX) NULL,
SurvivorLockMode NVARCHAR(20) NULL,
-- Full deadlock graph for deep analysis
DeadlockGraph XML NOT NULL,
DateRecorded DATETIME2(3) NOT NULL
CONSTRAINT df_DeadlockEvents_DateRecorded DEFAULT (SYSDATETIME()),
CONSTRAINT pkc_DeadlockEvents_DeadlockID
PRIMARY KEY CLUSTERED (DeadlockID)
);
GO
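Not one of the four core components, but worth adding: the capture procedure filters and sorts on EventTime, so a small supporting index keeps those lookups cheap as the table grows.

-- Optional: supports the MAX(EventTime) lookup and EventTime range filters
CREATE NONCLUSTERED INDEX ix_DeadlockEvents_EventTime
    ON dbo.DeadlockEvents (EventTime);
GO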
Step 3: Create the Capture Procedure
This procedure does all the real work. It uses the most recent EventTime in the table as a cutoff — anything newer than that in system_health is a new event to process. It parses victim and survivor from the deadlock XML, inserts the rows, and sends a concise HTML email with a frequency line so you can tell immediately whether this is a one-off or part of a larger pattern.
Update these two values before creating the procedure:
- 'SQLMailProfile' — change to your actual Database Mail profile name
- 'DBATeam@YourCompany.com' — change to your DBA team email address
USE [DBA];
GO
CREATE OR ALTER PROCEDURE dbo.usp_CaptureDeadlock
AS
SET NOCOUNT ON;
/*
usp_CaptureDeadlock
-------------------
Reads system_health's .xel file target for xml_deadlock_report events
newer than the most recent row in dbo.DeadlockEvents.
Parses victim and survivor detail, inserts new rows, emails a notification.
Called by DBA_CaptureDeadlock Agent job, which is fired by Deadlock_Alert
on SQL Server error 1205. Nothing polls this procedure -- it runs only
when a deadlock actually occurs.
*/
BEGIN
-- Brief pause to allow system_health to flush the new event to the file target
WAITFOR DELAY '00:00:05';
DECLARE
@LastEventTime DATETIME2(3),
@Subject NVARCHAR(255),
@Body NVARCHAR(MAX),
@FrequencyLine NVARCHAR(200),
@CountLastHour INT,
@CountToday INT;
-- Use the most recent captured event as the cutoff for what is new
SELECT @LastEventTime = ISNULL(MAX(EventTime), '19000101')
FROM dbo.DeadlockEvents;
-- -------------------------------------------------------
-- Read new deadlock events from system_health file target
-- system_health*.xel resolves to the instance log directory
-- automatically -- no hardcoded paths, works on any server
-- -------------------------------------------------------
;WITH FileDeadlocks AS
(
SELECT CAST(event_data AS XML) AS EventXML
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE object_name = 'xml_deadlock_report'
),
Parsed AS
(
SELECT
DATEADD(MINUTE,
DATEDIFF(MINUTE, GETUTCDATE(), GETDATE()),
EventXML.value('(/event/@timestamp)[1]', 'DATETIME2(3)')) AS EventTime,
EventXML AS FullEvent,
EventXML.value(
'(/event/data[@name="xml_report"]/value/deadlock/victim-list/victimProcess/@id)[1]',
'NVARCHAR(50)') AS VictimID
FROM FileDeadlocks
)
INSERT dbo.DeadlockEvents
(
EventTime, DatabaseName,
VictimSPID, VictimLogin, VictimHost, VictimApp,
VictimStatement, VictimLockMode, VictimWaitResource,
SurvivorSPID, SurvivorLogin, SurvivorHost, SurvivorApp,
SurvivorStatement, SurvivorLockMode,
DeadlockGraph
)
SELECT
p.EventTime,
v.n.value('@currentdbname', 'NVARCHAR(128)'),
v.n.value('@spid', 'INT'),
v.n.value('@loginname', 'NVARCHAR(128)'),
v.n.value('@hostname', 'NVARCHAR(128)'),
v.n.value('@clientapp', 'NVARCHAR(256)'),
v.n.value('(inputbuf)[1]', 'NVARCHAR(MAX)'),
v.n.value('@lockMode', 'NVARCHAR(20)'),
v.n.value('@waitresource', 'NVARCHAR(256)'),
s.n.value('@spid', 'INT'),
s.n.value('@loginname', 'NVARCHAR(128)'),
s.n.value('@hostname', 'NVARCHAR(128)'),
s.n.value('@clientapp', 'NVARCHAR(256)'),
s.n.value('(inputbuf)[1]', 'NVARCHAR(MAX)'),
s.n.value('@lockMode', 'NVARCHAR(20)'),
p.FullEvent
FROM Parsed p
CROSS APPLY p.FullEvent.nodes(
'/event/data[@name="xml_report"]/value/deadlock/process-list/process[@id=sql:column("p.VictimID")]'
) v(n)
CROSS APPLY p.FullEvent.nodes(
'/event/data[@name="xml_report"]/value/deadlock/process-list/process[not(@id=sql:column("p.VictimID"))]'
) s(n)
WHERE p.EventTime > @LastEventTime;
-- Nothing new to report -- exit cleanly
IF @@ROWCOUNT = 0 RETURN;
-- -------------------------------------------------------
-- Build frequency context for the email
-- Tells the reader at a glance if this is isolated or a pattern
-- -------------------------------------------------------
SELECT
@CountLastHour = SUM(CASE WHEN EventTime >= DATEADD(HOUR, -1, SYSDATETIME()) THEN 1 ELSE 0 END),
@CountToday = SUM(CASE WHEN EventTime >= CAST(CAST(SYSDATETIME() AS DATE) AS DATETIME2(3)) THEN 1 ELSE 0 END)
FROM dbo.DeadlockEvents;
SET @FrequencyLine =
CAST(@CountLastHour AS NVARCHAR(10)) + ' in the last hour, '
+ CAST(@CountToday AS NVARCHAR(10)) + ' today.';
-- -------------------------------------------------------
-- Build and send the HTML email notification
-- -------------------------------------------------------
    SET @Body =
        N'<html><body>'
        + N'<p><b>Deadlock captured on ' + @@SERVERNAME + N'</b><br/>'
        + N'Frequency: ' + @FrequencyLine + N'</p>';
    SELECT @Body = @Body
        + N'<p>Event time: ' + CONVERT(NVARCHAR(30), EventTime, 120) + N'<br/>'
        + N'Database: ' + ISNULL(DatabaseName, '-') + N'<br/>'
        + N'Victim SPID: ' + ISNULL(CAST(VictimSPID AS NVARCHAR(10)), '-')
        + N' (' + ISNULL(VictimLogin, '-') + N' / ' + ISNULL(VictimHost, '-') + N')<br/>'
        + N'Victim statement:</p>'
        + N'<pre>' + ISNULL(VictimStatement, '-') + N'</pre>'
        + N'<p>Survivor SPID: ' + ISNULL(CAST(SurvivorSPID AS NVARCHAR(10)), '-')
        + N' (' + ISNULL(SurvivorLogin, '-') + N')<br/>'
        + N'Survivor statement:</p>'
        + N'<pre>' + ISNULL(SurvivorStatement, '-') + N'</pre>'
    FROM dbo.DeadlockEvents
    WHERE EventTime > @LastEventTime
    ORDER BY EventTime DESC;
    SET @Body = @Body
        + N'<p>Full deadlock graph stored in DBA.dbo.DeadlockEvents.</p>'
        + N'</body></html>';
SET @Subject = @@SERVERNAME + N' - Deadlock captured';
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'SQLMailProfile', -- UPDATE THIS
@recipients = 'DBATeam@YourCompany.com', -- UPDATE THIS
@subject = @Subject,
@body = @Body,
@body_format = 'HTML',
@importance = 'High';
END;
GO
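You can smoke-test the procedure by hand before wiring up the job and alert. Note that on a server whose system_health files already contain recent deadlocks, this first run will capture and email all of them, which is usually what you want:

USE DBA;
GO
-- Manual run: captures any deadlocks newer than the latest row in the table
EXEC dbo.usp_CaptureDeadlock;
GO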
Step 4: Create the Unscheduled Agent Job
This job has no schedule. It sits idle until fired by the alert in Step 5. The output file is written to the SQL log directory alongside the system_health files for easy access.
USE [msdb];
GO
-- Clean up if a previous version exists
IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE name = N'DBA_CaptureDeadlock')
EXEC msdb.dbo.sp_delete_job @job_name = N'DBA_CaptureDeadlock';
GO
DECLARE @jobId BINARY(16);
DECLARE @LogPath VARCHAR(255);
-- Build output log path in the SQL log directory
SELECT @LogPath = REPLACE(CONVERT(VARCHAR(1000), SERVERPROPERTY('ErrorLogFileName')), '\ERRORLOG', '\');
SET @LogPath = @LogPath + 'DBA_CaptureDeadlock_$(ESCAPE_SQUOTE(STRTDT)).txt';
EXEC msdb.dbo.sp_add_job
@job_name = N'DBA_CaptureDeadlock',
@enabled = 0, -- enabled in Step 6
@description = N'Fired by Deadlock_Alert on error 1205. Reads system_health for new deadlock events, logs to DBA.dbo.DeadlockEvents, and emails a summary. Not scheduled.',
@category_name = N'[Uncategorized (Local)]',
@owner_login_name = N'sa',
@job_id = @jobId OUTPUT;
EXEC msdb.dbo.sp_add_jobstep
@job_id = @jobId,
@step_name = N'Capture',
@step_id = 1,
@subsystem = N'TSQL',
@command = N'EXEC dbo.usp_CaptureDeadlock;',
@database_name = N'DBA',
@output_file_name = @LogPath,
@on_success_action = 1, -- Quit with success
@on_fail_action = 2; -- Quit with failure
EXEC msdb.dbo.sp_add_jobserver
@job_id = @jobId,
@server_name = N'(local)';
GO
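A quick check that the job landed, still disabled until Step 6:

-- The job should exist and show enabled = 0 at this point
SELECT name, enabled, date_created
FROM msdb.dbo.sysjobs
WHERE name = N'DBA_CaptureDeadlock';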
Step 5: Create the Alert on Error 1205
SQL Server Agent allows only one alert per message_id. The script removes any pre-existing alert on 1205 before creating this one — regardless of what it was named.
USE [msdb];
GO
-- Remove any existing alert on message 1205 (regardless of name)
DECLARE @ExistingAlert NVARCHAR(128);
SELECT @ExistingAlert = name FROM msdb.dbo.sysalerts WHERE message_id = 1205;
IF @ExistingAlert IS NOT NULL
EXEC msdb.dbo.sp_delete_alert @name = @ExistingAlert;
GO
EXEC msdb.dbo.sp_add_alert
@name = N'Deadlock_Alert',
@message_id = 1205,
@severity = 0,
@enabled = 0, -- enabled in Step 6
@delay_between_responses = 0,
@include_event_description_in = 0,
@job_name = N'DBA_CaptureDeadlock';
GO
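And confirm the alert is wired to the job:

-- The alert should reference the DBA_CaptureDeadlock job
SELECT a.name AS alert_name, a.message_id, a.enabled, j.name AS job_name
FROM msdb.dbo.sysalerts a
LEFT JOIN msdb.dbo.sysjobs j ON a.job_id = j.job_id
WHERE a.message_id = 1205;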
Step 6: Enable the Alert and Job
EXEC msdb.dbo.sp_update_job @job_name = N'DBA_CaptureDeadlock', @enabled = 1;
EXEC msdb.dbo.sp_update_alert @name = N'Deadlock_Alert', @enabled = 1;
GO
Why We Use the File Target, Not the Ring Buffer
Many published examples — including some in Microsoft’s own documentation — query system_health’s ring buffer target to retrieve deadlock events. The query looks clean and simple. The problem is that on SQL Server 2022, the ring buffer returns no rows for xml_deadlock_report events, even when those events are clearly visible in the file target.
This is a known issue with how the ring buffer serializes certain event types on newer versions. All procedures in this article use sys.fn_xe_file_target_read_file with the system_health*.xel pattern, which resolves relative to the instance log directory automatically. No path hardcoding — it works on any server regardless of where SQL Server was installed.
-- This is what the procedure uses -- portable, no hardcoded paths
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE object_name = 'xml_deadlock_report'
-- Avoid this pattern -- ring buffer does not work reliably on SQL Server 2022
-- SELECT CAST(target_data AS XML)
-- FROM sys.dm_xe_session_targets
-- WHERE target_name = 'ring_buffer'
Testing the Setup
Create a small test table and reproduce a classic deadlock in two query windows. One session will be chosen as the victim and receive error 1205.
Create the Test Table
USE DBA;
IF OBJECT_ID('dbo.DeadlockTest', 'U') IS NOT NULL
DROP TABLE dbo.DeadlockTest;
CREATE TABLE dbo.DeadlockTest (ID INT PRIMARY KEY, Val INT);
INSERT INTO dbo.DeadlockTest (ID, Val) VALUES (1, 0), (2, 0);
GO
Session 1 — Run First
USE DBA;
BEGIN TRAN;
UPDATE dbo.DeadlockTest WITH (HOLDLOCK) SET Val = 1 WHERE ID = 1;
WAITFOR DELAY '00:00:05';
UPDATE dbo.DeadlockTest SET Val = 1 WHERE ID = 2;
COMMIT;
Session 2 — Run Within the 5-Second Window
USE DBA;
BEGIN TRAN;
UPDATE dbo.DeadlockTest WITH (HOLDLOCK) SET Val = 2 WHERE ID = 2;
WAITFOR DELAY '00:00:05';
UPDATE dbo.DeadlockTest SET Val = 2 WHERE ID = 1;
COMMIT;
One session will be killed with the familiar message:
Msg 1205, Level 13, State 51, Line 4
Transaction (Process ID 79) was deadlocked on lock resources with another process
and has been chosen as the deadlock victim. Rerun the transaction.
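Once you have confirmed the capture worked (next section), drop the scratch table:

USE DBA;
IF OBJECT_ID('dbo.DeadlockTest', 'U') IS NOT NULL
    DROP TABLE dbo.DeadlockTest;
GO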
Verifying It Worked
-- Did the alert fire?
SELECT name, last_occurrence_date, last_occurrence_time, occurrence_count
FROM msdb.dbo.sysalerts
WHERE name = 'Deadlock_Alert';
-- Did the job run successfully?
SELECT TOP 5
h.run_date, h.run_time, h.run_status,
h.message
FROM msdb.dbo.sysjobhistory h
JOIN msdb.dbo.sysjobs j ON h.job_id = j.job_id
WHERE j.name = 'DBA_CaptureDeadlock'
ORDER BY h.run_date DESC, h.run_time DESC;
-- Was the deadlock captured in our table?
SELECT TOP 5 *
FROM DBA.dbo.DeadlockEvents
ORDER BY EventTime DESC;
Querying the Deadlock Log
Use these queries for ongoing monitoring and trend analysis from the permanent log table:
-- Deadlocks by day — spot frequency trends
SELECT
CAST(EventTime AS DATE) AS deadlock_date,
COUNT(*) AS deadlock_count
FROM DBA.dbo.DeadlockEvents
WHERE EventTime >= DATEADD(DAY, -30, GETDATE())
GROUP BY CAST(EventTime AS DATE)
ORDER BY deadlock_date DESC;
-- Most frequently victimized logins
SELECT
VictimLogin,
COUNT(*) AS times_victim,
MIN(EventTime) AS first_seen,
MAX(EventTime) AS last_seen
FROM DBA.dbo.DeadlockEvents
GROUP BY VictimLogin
ORDER BY times_victim DESC;
-- Most common databases involved
SELECT
DatabaseName,
COUNT(*) AS deadlock_count
FROM DBA.dbo.DeadlockEvents
GROUP BY DatabaseName
ORDER BY deadlock_count DESC;
-- Retrieve the full XML graph for a specific event (for detailed analysis in SSMS)
SELECT DeadlockGraph
FROM DBA.dbo.DeadlockEvents
WHERE DeadlockID = 1;
-- Paste the XML into a .xdl file and open in SSMS for the graphical deadlock view
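One row per deadlock means the table grows slowly, but if you want a retention policy, a periodic purge is a one-liner. The 90-day window here is an arbitrary choice; adjust to taste:

-- Optional retention: remove events older than 90 days
DELETE FROM DBA.dbo.DeadlockEvents
WHERE EventTime < DATEADD(DAY, -90, SYSDATETIME());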
Going Further: Other system_health Alerts You Can Add
The deadlock capture system above uses exactly the same pattern — Agent alert on an error message fires an Agent job that reads system_health. That same architecture works for several other categories of events that system_health already captures. Here are the most useful ones to add.
Long Lock Waits (system_health captures sessions waiting on locks > 30 seconds)
system_health captures the callstack, sql_text, and session_id for any session that has waited on a lock resource for more than 30 seconds; it records these through its wait_info event. You cannot hang an Agent alert directly off this event, but you can query it on demand or via a scheduled job:
-- Query system_health for long lock wait events
SELECT
DATEADD(MINUTE, DATEDIFF(MINUTE, GETUTCDATE(), GETDATE()),
CAST(event_data AS XML).value('(/event/@timestamp)[1]', 'DATETIME2(3)'))
AS EventTime,
CAST(event_data AS XML).value('(/event/data[@name="wait_type"]/text)[1]', 'NVARCHAR(100)') AS wait_type,
CAST(event_data AS XML).value('(/event/data[@name="duration"]/value)[1]', 'BIGINT') AS duration_ms,
CAST(event_data AS XML).value('(/event/action[@name="sql_text"]/value)[1]','NVARCHAR(MAX)') AS sql_text,
CAST(event_data AS XML).value('(/event/action[@name="session_id"]/value)[1]','INT') AS session_id
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE object_name = 'wait_info'  -- system_health records lock and latch waits via the wait_info event
AND CAST(event_data AS XML).value('(/event/data[@name="wait_type"]/text)[1]', 'NVARCHAR(100)') LIKE N'LCK%'
AND CAST(event_data AS XML).value('(/event/data[@name="duration"]/value)[1]', 'BIGINT') > 30000
ORDER BY EventTime DESC;
Error 825 — Read-Retry Required (I/O Warning)
Error 825 fires when SQL Server had to retry a page read due to an I/O error. A single occurrence can be a transient storage blip. Multiple occurrences in a short period indicate storage problems that need investigation before they progress to corruption. Set up the alert exactly the same way as the deadlock alert — on message_id 825:
-- Ensure 825 is logged to the Error Log
EXEC master.sys.sp_altermessage
@message_id = 825, @parameter = 'WITH_LOG', @parameter_value = 'true';
-- Create the alert
EXEC msdb.dbo.sp_add_alert
@name = N'IO_ReadRetry_825',
@message_id = 825,
@severity = 0,
@enabled = 1,
@delay_between_responses = 300, -- 5 minutes between repeated alerts
@include_event_description_in = 1;
EXEC msdb.dbo.sp_add_notification
@alert_name = N'IO_ReadRetry_825',
@operator_name = N'DBA Team', -- your operator name
@notification_method = 7;
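These sp_add_notification calls (here and in the following sections) assume an Agent operator named DBA Team already exists. If it does not, create it first; the name and address are placeholders to replace with your own:

-- One-time setup: create the operator if missing (name and address are placeholders)
IF NOT EXISTS (SELECT 1 FROM msdb.dbo.sysoperators WHERE name = N'DBA Team')
    EXEC msdb.dbo.sp_add_operator
        @name = N'DBA Team',
        @enabled = 1,
        @email_address = N'DBATeam@YourCompany.com';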
Error 9002 — Transaction Log Full
When the transaction log fills completely, all writes to that database stop immediately. Error 9002 fires at that moment. An immediate alert gives the DBA time to respond before application errors cascade. The response is typically to grow the log file, back up the log, or investigate what is holding an open transaction.
EXEC msdb.dbo.sp_add_alert
@name = N'TransactionLog_Full_9002',
@message_id = 9002,
@severity = 0,
@enabled = 1,
@delay_between_responses = 60,
@include_event_description_in = 5; -- 5 = include in both email and net send
EXEC msdb.dbo.sp_add_notification
@alert_name = N'TransactionLog_Full_9002',
@operator_name = N'DBA Team',
@notification_method = 7;
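When this alert fires, the first diagnostic is usually log_reuse_wait_desc, which names exactly what is preventing log truncation (ACTIVE_TRANSACTION, LOG_BACKUP, REPLICATION, and so on):

-- What is preventing log truncation in each database?
SELECT name, log_reuse_wait_desc
FROM sys.databases
ORDER BY name;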
Severity 22–25 — Corruption and System Errors
Severities 22 and 23 indicate table or database corruption. Severity 24 indicates a hardware problem. Severity 25 is a fatal system error that may result in SQL Server shutting down. These should always be alerted on with zero delay between responses:
-- Severity 22: table or index damaged
EXEC msdb.dbo.sp_add_alert @name = N'Severity 22',
@severity = 22, @enabled = 1, @delay_between_responses = 0,
@include_event_description_in = 1;
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 22',
@operator_name = N'DBA Team', @notification_method = 7;
-- Severity 23: suspect database
EXEC msdb.dbo.sp_add_alert @name = N'Severity 23',
@severity = 23, @enabled = 1, @delay_between_responses = 0,
@include_event_description_in = 1;
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 23',
@operator_name = N'DBA Team', @notification_method = 7;
-- Severity 24: hardware problem
EXEC msdb.dbo.sp_add_alert @name = N'Severity 24',
@severity = 24, @enabled = 1, @delay_between_responses = 0,
@include_event_description_in = 1;
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 24',
@operator_name = N'DBA Team', @notification_method = 7;
-- Severity 25: fatal system error
EXEC msdb.dbo.sp_add_alert @name = N'Severity 25',
@severity = 25, @enabled = 1, @delay_between_responses = 0,
@include_event_description_in = 1;
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 25',
@operator_name = N'DBA Team', @notification_method = 7;
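If you prefer one script to four near-identical copies, a small loop produces the same alerts. This is just a compaction of the blocks above, with an existence check added so it is safe to rerun:

-- Equivalent loop form for severities 22 through 25
DECLARE @sev INT = 22,
        @AlertName SYSNAME;
WHILE @sev <= 25
BEGIN
    SET @AlertName = N'Severity ' + CAST(@sev AS NVARCHAR(2));
    IF NOT EXISTS (SELECT 1 FROM msdb.dbo.sysalerts WHERE name = @AlertName)
    BEGIN
        EXEC msdb.dbo.sp_add_alert @name = @AlertName,
            @severity = @sev, @enabled = 1, @delay_between_responses = 0,
            @include_event_description_in = 1;
        EXEC msdb.dbo.sp_add_notification @alert_name = @AlertName,
            @operator_name = N'DBA Team', @notification_method = 7;
    END
    SET @sev += 1;
END
GO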
Long Latch Waits (system_health captures sessions waiting on latches > 15 seconds)
system_health also records sessions that have waited on latches for more than 15 seconds, again through the wait_info event. Latch waits this long usually indicate I/O pressure, NUMA imbalance, or hot pages. Query the file target on demand:
-- Query system_health for long latch wait events
SELECT
DATEADD(MINUTE, DATEDIFF(MINUTE, GETUTCDATE(), GETDATE()),
CAST(event_data AS XML).value('(/event/@timestamp)[1]', 'DATETIME2(3)'))
AS EventTime,
CAST(event_data AS XML).value('(/event/data[@name="wait_type"]/text)[1]', 'NVARCHAR(100)') AS wait_type,
CAST(event_data AS XML).value('(/event/data[@name="duration"]/value)[1]', 'BIGINT') AS duration_ms,
CAST(event_data AS XML).value('(/event/action[@name="sql_text"]/value)[1]', 'NVARCHAR(MAX)') AS sql_text,
CAST(event_data AS XML).value('(/event/action[@name="session_id"]/value)[1]','INT') AS session_id
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE object_name = 'wait_info'  -- latch waits are recorded by the same wait_info event
AND CAST(event_data AS XML).value('(/event/data[@name="wait_type"]/text)[1]', 'NVARCHAR(100)') LIKE N'%LATCH%'
AND CAST(event_data AS XML).value('(/event/data[@name="duration"]/value)[1]', 'BIGINT') > 15000
ORDER BY EventTime DESC;
The pattern is always the same: if an error message number exists for the event, create a SQL Server Agent alert on that message_id. If there is no error message — like long latch waits — query the system_health file target directly. The data is already being collected. You just need to read it.
References
- Microsoft Learn – Use the system_health Session
- Microsoft Learn – MSSQLSERVER_1205 (Deadlock Victim)
- Microsoft Learn – MSSQLSERVER_825 (Read-Retry Required)
- Microsoft Learn – MSSQLSERVER_9002 (Transaction Log Full)
- Microsoft Learn – Database Engine Error Severities
- Microsoft Learn – SQL Server Deadlock Guide
- Microsoft Learn – sys.fn_xe_file_target_read_file
- Microsoft Learn – SQL Server Agent Alerts
This implementation is based on an approach published by sqlfingers.com. The additional alert patterns, system_health event inventory, query log analysis, and SQLYARD formatting were added by David Yard.