Introduction
SQL Server replication relies heavily on background jobs managed by SQL Server Agent. These jobs move data, track progress, clean up history, and keep your monitoring tools up to date. If they’re missing, disabled, or failing, replication performance and reliability suffer.
This guide walks through:
- The key SQL Agent jobs installed with replication
- What each one does and how critical it is
- Which settings you should consider adjusting — and which to leave alone
- A full troubleshooting checklist
- A hands-on workshop for junior DBAs
1. Core Replication Jobs: What They Are and What They Do
When you configure replication, SQL Server automatically creates several Agent jobs behind the scenes. Here’s a reference you can keep:
| Job Name | Purpose | Should You Modify It? | Notes |
|---|---|---|---|
| Snapshot Agent | Generates the initial snapshot for snapshot or transactional replication. | Rarely. Usually controlled through replication UI or sp_addpublication. | Only rerun or reschedule if initial snapshot performance is an issue. |
| Log Reader Agent | Reads changes from the publisher’s transaction log and writes them to the distribution database. | Rarely. Use agent profiles to tune performance if latency is high. | High CPU or blocking here affects the entire replication chain. |
| Distribution Agent | Moves transactions from the distribution database to subscribers. | Rarely. Use agent profiles or parallel agents for performance tuning. | Latency here means commands are waiting in MSrepl_commands. |
| Merge Agent | Applies and reconciles changes between publisher and subscriber in merge replication. | Rarely. Tuning merge replication is more complex — test thoroughly before changes. | |
| Distribution Cleanup: Distribution | Deletes old commands and transactions from the distributor. | Yes (with care). Adjust frequency and retention settings to keep distribution tables small. | Defaults: runs every 10 minutes, deletes commands older than retention. |
| Agent History Cleanup: Distribution | Purges old replication agent history records. | Yes (with care). Adjust retention (default 48h) if tables grow too large. | Can significantly reduce bloat in MSdistribution_history. |
| Replication Monitoring Refresher for Distribution | Updates replication monitoring tables. Used by Replication Monitor. | No. Leave it enabled and scheduled. | If disabled, Replication Monitor will show stale data. |
| Replication Agents Checkup | Checks that replication agents are running and updates monitoring tables. | No. Usually leave this job alone. | Failure here doesn’t break replication but affects monitoring accuracy. |
| Reinitialize Subscriptions Having Data Validation Failures | Reinitializes subscriptions automatically when validation errors occur. | Usually disabled unless auto-reinitialization is desired. | Many DBAs prefer to reinitialize manually to control timing. |
| syspolicy_purge_history | Purges policy-based management history. | Not replication-related. | Safe to ignore in replication context. |
2. Jobs You Should Tune (and How)
While most replication jobs should be left alone, a few require periodic review and tuning:
Distribution Cleanup: Distribution
- Default: Runs every 10 minutes, deletes transactions older than retention period.
- Why it matters: If this job falls behind,
MSrepl_commandsandMSrepl_transactionsballoon in size, causing disk pressure and slower queries. - What you can tune:
- Schedule frequency (more often if high volume)
- Batch size (
@min_distretention,@max_distretention) - Retention period for commands
Example to change retention:
USE distribution;
EXEC sp_changedistributiondb @min_distretention = 0, @max_distretention = 24; -- 24 hours
Agent History Cleanup: Distribution
- Default: Runs every 10 minutes, deletes agent history older than 48 hours.
- Why it matters: History tables (
MSdistribution_history,MSlogreader_history) grow quickly and cause slow queries if not purged. - What you can tune:
- History retention (
@history_retention) - Job schedule
- History retention (
Example to change history retention:
USE distribution;
EXEC sp_changedistributiondb @history_retention = 24; -- 24 hours
Immediate Sync and Snapshot Options
- If
immediate_syncis enabled, commands and snapshots are retained longer to allow new subscriptions without regenerating a snapshot. - Turn this off unless you need it:
EXEC sp_changepublication
@publication = 'YourPublication',
@property = 'immediate_sync',
@value = 'false';
3. Jobs You Should Leave Alone
Some replication jobs should almost never be modified directly:
- Replication Monitoring Refresher for Distribution – Essential for keeping replication monitoring accurate. Leave enabled and on its default schedule.
- Replication Agents Checkup – Supports monitoring health. Adjust only if directed by Microsoft.
- Reinitialize Subscriptions Having Data Validation Failures – Typically disabled by default. Enable only if you want auto-reinitialization.
- Log Reader, Snapshot, Distribution, Merge Agents – Controlled via replication configuration and agent profiles, not by altering the job itself.
4. Troubleshooting and Best Practices
When replication performance drops or jobs fail, follow this process:
- Check job history
- Identify failed or long-running jobs (
msdb.dbo.sysjobhistory).
- Identify failed or long-running jobs (
- Look at replication latency
- Use Replication Monitor or query
msdistribution_statusfor delivery times.
- Use Replication Monitor or query
- Check undistributed commands
EXEC sp_replmonitorsubscriptionpendingcmds
@publisher = 'YourPublisher',
@publisher_db = 'YourDatabase',
@publication = 'YourPublication',
@subscriber = 'YourSubscriber',
@subscriber_db = 'YourSubDB';
4. Investigate cleanup job status
If tables like MSrepl_commands are large, cleanup may not be keeping up.
Monitor blocking
Use sys.dm_exec_requests and sys.dm_tran_locks to check for locks affecting cleanup jobs.
Review agent profiles
Adjust polling intervals, batch sizes, or logging levels if agents are struggling under load.
Don’t blindly apply index suggestions on system tables
Tools may recommend indexing replication tables. Always consult Microsoft support before altering them.
5. Hands-On Workshop for Junior DBAs
Here’s a practical flow to follow in a lab or staging environment:
View all replication jobs:
SELECT job_id, name, enabled
FROM msdb.dbo.sysjobs
WHERE name LIKE '%replication%' OR name LIKE '%distribution%';
2. Check table sizes:
SELECT t.name, SUM(p.rows) AS RowCount
FROM sys.tables t
JOIN sys.partitions p ON t.object_id = p.object_id
WHERE t.name LIKE 'MS%'
GROUP BY t.name;
3. Force cleanup manually:
EXEC dbo.sp_MSdistribution_cleanup;
4. Adjust retention and re-run cleanup:
USE distribution;
EXEC sp_changedistributiondb @history_retention = 24;
5. Monitor undistributed commands and latency:
Use sp_replmonitorsubscriptionpendingcmds and Replication Monitor.
Document everything:
Keep a log of default settings, changes, and test outcomes.
Note which jobs should remain untouched and why.
Final Thoughts
Most replication issues trace back to poor cleanup, excessive history retention, or unchecked table growth — not missing indexes or broken agents. While tools like SolarWinds DPA or Redgate Monitor provide helpful insights, they can’t know which objects are off-limits. Use them as advisors, not authorities.
Key takeaways:
- Tune only the cleanup and retention-related jobs.
- Leave monitoring and system-owned jobs untouched unless directed by Microsoft.
- Monitor and document all changes.
- Always test changes in a non-production environment first.
A well-maintained distributor database is quiet, fast, and invisible — and that’s exactly how replication should feel.
References
Performance Tuning Replication Checklist – Brent Ozar
Performance Statistics for Log Reader and Distribution Agents – Microsoft Docs
Replication Agent History Cleanup Job – ReplTalk
Replication Enhancement – Improved Distribution Database Cleanup – Microsoft TechCommunity
/* ==============================================================
Quick Replication Health Check (resilient to catalog differences)
Focus: sp_MSget_repl_commands slowdowns
Usage: connect to the Distributor and run as one batch
==============================================================*/
SET NOCOUNT ON;
-- CONFIG
DECLARE @distribution_db sysname = N'distribution'; -- change if different
PRINT 'Server time: ' + CONVERT(varchar(30), GETDATE(), 120);
PRINT 'Distribution DB: ' + @distribution_db;
---------------------------------------------------------------
-- 1) Distribution cleanup job status and recent history
---------------------------------------------------------------
PRINT '--- 1) Distribution cleanup job status and last 10 runs ---';
;WITH j AS (
SELECT job_id, name, enabled
FROM msdb.dbo.sysjobs
WHERE name LIKE 'Distribution clean up:%'
)
SELECT TOP (10)
j.name AS job_name,
CASE j.enabled WHEN 1 THEN 'Enabled' ELSE 'Disabled' END AS job_enabled,
h.run_date, h.run_time, h.run_duration,
CASE h.run_status WHEN 0 THEN 'Failed' WHEN 1 THEN 'Succeeded'
WHEN 2 THEN 'Retry' WHEN 3 THEN 'Canceled'
ELSE 'Unknown' END AS run_status,
h.message
FROM msdb.dbo.sysjobhistory h
JOIN j ON h.job_id = j.job_id
ORDER BY h.instance_id DESC;
---------------------------------------------------------------
-- 2) Retention settings on the distribution database
---------------------------------------------------------------
PRINT '--- 2) Distribution retention settings ---';
EXEC sp_helpdistributiondb @database = @distribution_db;
---------------------------------------------------------------
-- 3) Size and row counts of key distribution tables
---------------------------------------------------------------
DECLARE @sql nvarchar(max) =
N'SET NOCOUNT ON;
USE ' + QUOTENAME(@distribution_db) + N';
PRINT ''--- 3) Table sizes and row counts ---'';
IF OBJECT_ID(''dbo.msrepl_commands'') IS NOT NULL EXEC sp_spaceused ''dbo.msrepl_commands'';
IF OBJECT_ID(''dbo.msrepl_transactions'') IS NOT NULL EXEC sp_spaceused ''dbo.msrepl_transactions'';
';
EXEC (@sql);
---------------------------------------------------------------
-- 4) Pending commands per subscription and estimated latency
-- Robust: uses publisher_id if present, else publisher name.
---------------------------------------------------------------
PRINT '--- 4) Pending commands per subscription (all publications) ---';
IF OBJECT_ID('tempdb..#subs') IS NOT NULL DROP TABLE #subs;
IF OBJECT_ID('tempdb..#pending') IS NOT NULL DROP TABLE #pending;
CREATE TABLE #subs (
publisher_id int NULL,
publisher sysname NULL,
publisher_db sysname NOT NULL,
publication sysname NOT NULL
);
-- Detect which columns exist and load #subs accordingly
DECLARE @load_subs nvarchar(max) =
N'
USE ' + QUOTENAME(@distribution_db) + N';
DECLARE @has_pubid bit = CASE WHEN EXISTS (
SELECT 1 FROM sys.columns
WHERE object_id = OBJECT_ID(''' + QUOTENAME(@distribution_db) + N'.dbo.MSdistribution_agents'')
AND name = ''publisher_id'') THEN 1 ELSE 0 END;
DECLARE @has_pubname bit = CASE WHEN EXISTS (
SELECT 1 FROM sys.columns
WHERE object_id = OBJECT_ID(''' + QUOTENAME(@distribution_db) + N'.dbo.MSdistribution_agents'')
AND name = ''publisher'') THEN 1 ELSE 0 END;
IF @has_pubid = 1
BEGIN
INSERT #subs (publisher_id, publisher, publisher_db, publication)
SELECT DISTINCT a.publisher_id, NULL, a.publisher_db, a.publication
FROM dbo.MSdistribution_agents AS a
JOIN dbo.MSpublications AS p
ON a.publisher_db = p.publisher_db
AND a.publication = p.publication
WHERE p.publication IS NOT NULL;
END
ELSE IF @has_pubname = 1
BEGIN
INSERT #subs (publisher_id, publisher, publisher_db, publication)
SELECT DISTINCT NULL, a.publisher, a.publisher_db, a.publication
FROM dbo.MSdistribution_agents AS a
JOIN dbo.MSpublications AS p
ON a.publisher_db = p.publisher_db
AND a.publication = p.publication
WHERE p.publication IS NOT NULL;
END
ELSE
BEGIN
RAISERROR(''MSdistribution_agents has neither publisher_id nor publisher column.'', 16, 1);
END
';
EXEC (@load_subs);
CREATE TABLE #pending(
publisher_id int NULL,
publisher sysname NULL,
publisher_db sysname NOT NULL,
publication sysname NOT NULL,
pending_commands int NULL,
est_completion_time_sec int NULL
);
DECLARE @publisher_id int, @publisher sysname, @publisher_db sysname, @publication sysname;
DECLARE c CURSOR FAST_FORWARD FOR
SELECT publisher_id, publisher, publisher_db, publication FROM #subs;
OPEN c;
FETCH NEXT FROM c INTO @publisher_id, @publisher, @publisher_db, @publication;
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRY
IF @publisher_id IS NOT NULL
BEGIN
INSERT #pending(publisher_id, publisher, publisher_db, publication, pending_commands, est_completion_time_sec)
EXEC sp_replmonitorsubscriptionpendingcmds
@publisher_id = @publisher_id,
@publisher_db = @publisher_db,
@publication = @publication;
END
ELSE
BEGIN
INSERT #pending(publisher_id, publisher, publisher_db, publication, pending_commands, est_completion_time_sec)
EXEC sp_replmonitorsubscriptionpendingcmds
@publisher = @publisher,
@publisher_db = @publisher_db,
@publication = @publication;
END
END TRY
BEGIN CATCH
PRINT 'PendingCmds failed for '
+ COALESCE('publisher_id=' + CAST(@publisher_id AS nvarchar(20)),'publisher=' + ISNULL(@publisher,'?'))
+ ' | db=' + ISNULL(@publisher_db,'?')
+ ' | pub=' + ISNULL(@publication,'?')
+ ' : ' + ERROR_MESSAGE();
END CATCH;
FETCH NEXT FROM c INTO @publisher_id, @publisher, @publisher_db, @publication;
END
CLOSE c; DEALLOCATE c;
SELECT *
FROM #pending
ORDER BY pending_commands DESC;
---------------------------------------------------------------
-- 5) Active requests and blocking touching msrepl_* or sp_MSget_repl_commands
---------------------------------------------------------------
PRINT '--- 5) Active requests and blocking in the distributor ---';
;WITH R AS (
SELECT r.session_id,
r.blocking_session_id,
r.wait_type,
r.wait_time,
r.wait_resource,
DB_NAME(r.database_id) AS dbname,
r.cpu_time,
r.logical_reads,
r.status,
r.command,
r.start_time,
r.sql_handle,
r.plan_handle,
r.statement_start_offset,
r.statement_end_offset
FROM sys.dm_exec_requests r
)
SELECT
R.session_id, R.blocking_session_id, R.dbname, R.status, R.command,
R.wait_type, R.wait_time, R.wait_resource, R.cpu_time, R.logical_reads, R.start_time,
SUBSTRING(T.text,
(R.statement_start_offset/2)+1,
CASE WHEN R.statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(max), T.text)) - (R.statement_start_offset/2) + 1
ELSE (R.statement_end_offset - R.statement_start_offset)/2 + 1 END) AS running_stmt,
T.text AS full_batch_text
FROM R
CROSS APPLY sys.dm_exec_sql_text(R.sql_handle) AS T
WHERE (R.dbname = @distribution_db
OR T.text LIKE '%msrepl_%'
OR T.text LIKE '%sp_MSget_repl_commands%')
ORDER BY R.wait_time DESC;
---------------------------------------------------------------
-- 6) Index fragmentation on msrepl_commands / msrepl_transactions
---------------------------------------------------------------
PRINT '--- 6) Index fragmentation on msrepl_commands / msrepl_transactions ---';
SET @sql =
N'USE ' + QUOTENAME(@distribution_db) + N';
SELECT
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
ips.index_type_desc,
ips.avg_fragmentation_in_percent,
ips.page_count
FROM sys.indexes i
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), i.object_id, i.index_id, NULL, ''SAMPLED'') ips
WHERE OBJECT_NAME(i.object_id) IN (''msrepl_commands'',''msrepl_transactions'')
ORDER BY avg_fragmentation_in_percent DESC, page_count DESC;';
EXEC (@sql);
---------------------------------------------------------------
-- 7) Top waits snapshot (server-wide) for context
---------------------------------------------------------------
PRINT '--- 7) Top waits snapshot (server-wide) ---';
SELECT TOP (15)
wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE 'SLEEP%'
AND wait_type NOT LIKE 'BROKER_TASK_STOP%'
ORDER BY wait_time_ms DESC;
PRINT '--- Done ---';
Discover more from SQLyard
Subscribe to get the latest posts sent to your email.


