SQL Server Replication Jobs Explained: What They Do, How to Tune Them, and When to Leave Them Alone

Introduction

SQL Server replication relies heavily on background jobs managed by SQL Server Agent. These jobs move data, track progress, clean up history, and keep your monitoring tools up to date. If they’re missing, disabled, or failing, replication performance and reliability suffer.

This guide walks through:

  • The key SQL Agent jobs installed with replication
  • What each one does and how critical it is
  • Which settings you should consider adjusting — and which to leave alone
  • A full troubleshooting checklist
  • A hands-on workshop for junior DBAs

1. Core Replication Jobs: What They Are and What They Do

When you configure replication, SQL Server automatically creates several Agent jobs behind the scenes. Here’s a reference you can keep:

Job NamePurposeShould You Modify It?Notes
Snapshot AgentGenerates the initial snapshot for snapshot or transactional replication.Rarely. Usually controlled through replication UI or sp_addpublication.Only rerun or reschedule if initial snapshot performance is an issue.
Log Reader AgentReads changes from the publisher’s transaction log and writes them to the distribution database.Rarely. Use agent profiles to tune performance if latency is high.High CPU or blocking here affects the entire replication chain.
Distribution AgentMoves transactions from the distribution database to subscribers.Rarely. Use agent profiles or parallel agents for performance tuning.Latency here means commands are waiting in MSrepl_commands.
Merge AgentApplies and reconciles changes between publisher and subscriber in merge replication.Rarely. Tuning merge replication is more complex — test thoroughly before changes.
Distribution Cleanup: DistributionDeletes old commands and transactions from the distributor.Yes (with care). Adjust frequency and retention settings to keep distribution tables small.Defaults: runs every 10 minutes, deletes commands older than retention.
Agent History Cleanup: DistributionPurges old replication agent history records.Yes (with care). Adjust retention (default 48h) if tables grow too large.Can significantly reduce bloat in MSdistribution_history.
Replication Monitoring Refresher for DistributionUpdates replication monitoring tables. Used by Replication Monitor.No. Leave it enabled and scheduled.If disabled, Replication Monitor will show stale data.
Replication Agents CheckupChecks that replication agents are running and updates monitoring tables.No. Usually leave this job alone.Failure here doesn’t break replication but affects monitoring accuracy.
Reinitialize Subscriptions Having Data Validation FailuresReinitializes subscriptions automatically when validation errors occur.Usually disabled unless auto-reinitialization is desired.Many DBAs prefer to reinitialize manually to control timing.
syspolicy_purge_historyPurges policy-based management history.Not replication-related.Safe to ignore in replication context.

2. Jobs You Should Tune (and How)

While most replication jobs should be left alone, a few require periodic review and tuning:

Distribution Cleanup: Distribution

  • Default: Runs every 10 minutes, deletes transactions older than retention period.
  • Why it matters: If this job falls behind, MSrepl_commands and MSrepl_transactions balloon in size, causing disk pressure and slower queries.
  • What you can tune:
    • Schedule frequency (more often if high volume)
    • Batch size (@min_distretention, @max_distretention)
    • Retention period for commands

Example to change retention:

USE distribution;
EXEC sp_changedistributiondb @min_distretention = 0, @max_distretention = 24;  -- 24 hours

Agent History Cleanup: Distribution

  • Default: Runs every 10 minutes, deletes agent history older than 48 hours.
  • Why it matters: History tables (MSdistribution_history, MSlogreader_history) grow quickly and cause slow queries if not purged.
  • What you can tune:
    • History retention (@history_retention)
    • Job schedule

Example to change history retention:

USE distribution;
EXEC sp_changedistributiondb @history_retention = 24;  -- 24 hours

Immediate Sync and Snapshot Options

  • If immediate_sync is enabled, commands and snapshots are retained longer to allow new subscriptions without regenerating a snapshot.
  • Turn this off unless you need it:
EXEC sp_changepublication 
  @publication = 'YourPublication', 
  @property = 'immediate_sync', 
  @value = 'false';

3. Jobs You Should Leave Alone

Some replication jobs should almost never be modified directly:

  • Replication Monitoring Refresher for Distribution – Essential for keeping replication monitoring accurate. Leave enabled and on its default schedule.
  • Replication Agents Checkup – Supports monitoring health. Adjust only if directed by Microsoft.
  • Reinitialize Subscriptions Having Data Validation Failures – Typically disabled by default. Enable only if you want auto-reinitialization.
  • Log Reader, Snapshot, Distribution, Merge Agents – Controlled via replication configuration and agent profiles, not by altering the job itself.

4. Troubleshooting and Best Practices

When replication performance drops or jobs fail, follow this process:

  1. Check job history
    • Identify failed or long-running jobs (msdb.dbo.sysjobhistory).
  2. Look at replication latency
    • Use Replication Monitor or query msdistribution_status for delivery times.
  3. Check undistributed commands
EXEC sp_replmonitorsubscriptionpendingcmds 
  @publisher = 'YourPublisher', 
  @publisher_db = 'YourDatabase', 
  @publication = 'YourPublication', 
  @subscriber = 'YourSubscriber', 
  @subscriber_db = 'YourSubDB';

4. Investigate cleanup job status

If tables like MSrepl_commands are large, cleanup may not be keeping up.

Monitor blocking

Use sys.dm_exec_requests and sys.dm_tran_locks to check for locks affecting cleanup jobs.

Review agent profiles

Adjust polling intervals, batch sizes, or logging levels if agents are struggling under load.

Don’t blindly apply index suggestions on system tables

Tools may recommend indexing replication tables. Always consult Microsoft support before altering them.


5. Hands-On Workshop for Junior DBAs

Here’s a practical flow to follow in a lab or staging environment:

View all replication jobs:

SELECT job_id, name, enabled 
FROM msdb.dbo.sysjobs 
WHERE name LIKE '%replication%' OR name LIKE '%distribution%';

2. Check table sizes:

SELECT t.name, SUM(p.rows) AS RowCount
FROM sys.tables t
JOIN sys.partitions p ON t.object_id = p.object_id
WHERE t.name LIKE 'MS%'
GROUP BY t.name;

3. Force cleanup manually:

EXEC dbo.sp_MSdistribution_cleanup;

4. Adjust retention and re-run cleanup:

USE distribution;
EXEC sp_changedistributiondb @history_retention = 24;

5. Monitor undistributed commands and latency:

Use sp_replmonitorsubscriptionpendingcmds and Replication Monitor.

Document everything:

Keep a log of default settings, changes, and test outcomes.

Note which jobs should remain untouched and why.


Final Thoughts

Most replication issues trace back to poor cleanup, excessive history retention, or unchecked table growth — not missing indexes or broken agents. While tools like SolarWinds DPA or Redgate Monitor provide helpful insights, they can’t know which objects are off-limits. Use them as advisors, not authorities.

Key takeaways:

  • Tune only the cleanup and retention-related jobs.
  • Leave monitoring and system-owned jobs untouched unless directed by Microsoft.
  • Monitor and document all changes.
  • Always test changes in a non-production environment first.

A well-maintained distributor database is quiet, fast, and invisible — and that’s exactly how replication should feel.


References

Performance Tuning Replication Checklist – Brent Ozar

Performance Statistics for Log Reader and Distribution Agents – Microsoft Docs

Replication Agent History Cleanup Job – ReplTalk

Replication Enhancement – Improved Distribution Database Cleanup – Microsoft TechCommunity

/* ==============================================================
   Quick Replication Health Check (resilient to catalog differences)
   Focus: sp_MSget_repl_commands slowdowns
   Usage: connect to the Distributor and run as one batch
   ==============================================================*/
SET NOCOUNT ON;

-- CONFIG
DECLARE @distribution_db sysname = N'distribution'; -- change if different

PRINT 'Server time: ' + CONVERT(varchar(30), GETDATE(), 120);
PRINT 'Distribution DB: ' + @distribution_db;

---------------------------------------------------------------
-- 1) Distribution cleanup job status and recent history
---------------------------------------------------------------
PRINT '--- 1) Distribution cleanup job status and last 10 runs ---';
;WITH j AS (
    SELECT job_id, name, enabled
    FROM msdb.dbo.sysjobs
    WHERE name LIKE 'Distribution clean up:%'
)
SELECT TOP (10)
       j.name AS job_name,
       CASE j.enabled WHEN 1 THEN 'Enabled' ELSE 'Disabled' END AS job_enabled,
       h.run_date, h.run_time, h.run_duration,
       CASE h.run_status WHEN 0 THEN 'Failed' WHEN 1 THEN 'Succeeded'
                         WHEN 2 THEN 'Retry'  WHEN 3 THEN 'Canceled'
                         ELSE 'Unknown' END AS run_status,
       h.message
FROM msdb.dbo.sysjobhistory h
JOIN j ON h.job_id = j.job_id
ORDER BY h.instance_id DESC;

---------------------------------------------------------------
-- 2) Retention settings on the distribution database
---------------------------------------------------------------
PRINT '--- 2) Distribution retention settings ---';
EXEC sp_helpdistributiondb @database = @distribution_db;

---------------------------------------------------------------
-- 3) Size and row counts of key distribution tables
---------------------------------------------------------------
DECLARE @sql nvarchar(max) =
N'SET NOCOUNT ON;
USE ' + QUOTENAME(@distribution_db) + N';
PRINT ''--- 3) Table sizes and row counts ---'';
IF OBJECT_ID(''dbo.msrepl_commands'') IS NOT NULL EXEC sp_spaceused ''dbo.msrepl_commands'';
IF OBJECT_ID(''dbo.msrepl_transactions'') IS NOT NULL EXEC sp_spaceused ''dbo.msrepl_transactions'';
';
EXEC (@sql);

---------------------------------------------------------------
-- 4) Pending commands per subscription and estimated latency
--    Robust: uses publisher_id if present, else publisher name.
---------------------------------------------------------------
PRINT '--- 4) Pending commands per subscription (all publications) ---';

IF OBJECT_ID('tempdb..#subs') IS NOT NULL DROP TABLE #subs;
IF OBJECT_ID('tempdb..#pending') IS NOT NULL DROP TABLE #pending;

CREATE TABLE #subs (
    publisher_id   int         NULL,
    publisher      sysname     NULL,
    publisher_db   sysname     NOT NULL,
    publication    sysname     NOT NULL
);

-- Detect which columns exist and load #subs accordingly
DECLARE @load_subs nvarchar(max) =
N'
USE ' + QUOTENAME(@distribution_db) + N';

DECLARE @has_pubid bit = CASE WHEN EXISTS (
    SELECT 1 FROM sys.columns
    WHERE object_id = OBJECT_ID(''' + QUOTENAME(@distribution_db) + N'.dbo.MSdistribution_agents'')
      AND name = ''publisher_id'') THEN 1 ELSE 0 END;

DECLARE @has_pubname bit = CASE WHEN EXISTS (
    SELECT 1 FROM sys.columns
    WHERE object_id = OBJECT_ID(''' + QUOTENAME(@distribution_db) + N'.dbo.MSdistribution_agents'')
      AND name = ''publisher'') THEN 1 ELSE 0 END;

IF @has_pubid = 1
BEGIN
    INSERT #subs (publisher_id, publisher, publisher_db, publication)
    SELECT DISTINCT a.publisher_id, NULL, a.publisher_db, a.publication
    FROM dbo.MSdistribution_agents AS a
    JOIN dbo.MSpublications       AS p
      ON a.publisher_db = p.publisher_db
     AND a.publication  = p.publication
    WHERE p.publication IS NOT NULL;
END
ELSE IF @has_pubname = 1
BEGIN
    INSERT #subs (publisher_id, publisher, publisher_db, publication)
    SELECT DISTINCT NULL, a.publisher, a.publisher_db, a.publication
    FROM dbo.MSdistribution_agents AS a
    JOIN dbo.MSpublications       AS p
      ON a.publisher_db = p.publisher_db
     AND a.publication  = p.publication
    WHERE p.publication IS NOT NULL;
END
ELSE
BEGIN
    RAISERROR(''MSdistribution_agents has neither publisher_id nor publisher column.'', 16, 1);
END
';
EXEC (@load_subs);

CREATE TABLE #pending(
    publisher_id            int         NULL,
    publisher               sysname     NULL,
    publisher_db            sysname     NOT NULL,
    publication             sysname     NOT NULL,
    pending_commands        int         NULL,
    est_completion_time_sec int         NULL
);

DECLARE @publisher_id int, @publisher sysname, @publisher_db sysname, @publication sysname;

DECLARE c CURSOR FAST_FORWARD FOR
    SELECT publisher_id, publisher, publisher_db, publication FROM #subs;

OPEN c;
FETCH NEXT FROM c INTO @publisher_id, @publisher, @publisher_db, @publication;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        IF @publisher_id IS NOT NULL
        BEGIN
            INSERT #pending(publisher_id, publisher, publisher_db, publication, pending_commands, est_completion_time_sec)
            EXEC sp_replmonitorsubscriptionpendingcmds
                 @publisher_id = @publisher_id,
                 @publisher_db = @publisher_db,
                 @publication  = @publication;
        END
        ELSE
        BEGIN
            INSERT #pending(publisher_id, publisher, publisher_db, publication, pending_commands, est_completion_time_sec)
            EXEC sp_replmonitorsubscriptionpendingcmds
                 @publisher    = @publisher,
                 @publisher_db = @publisher_db,
                 @publication  = @publication;
        END
    END TRY
    BEGIN CATCH
        PRINT 'PendingCmds failed for ' 
              + COALESCE('publisher_id=' + CAST(@publisher_id AS nvarchar(20)),'publisher=' + ISNULL(@publisher,'?'))
              + ' | db=' + ISNULL(@publisher_db,'?')
              + ' | pub=' + ISNULL(@publication,'?')
              + ' : ' + ERROR_MESSAGE();
    END CATCH;

    FETCH NEXT FROM c INTO @publisher_id, @publisher, @publisher_db, @publication;
END
CLOSE c; DEALLOCATE c;

SELECT *
FROM #pending
ORDER BY pending_commands DESC;

---------------------------------------------------------------
-- 5) Active requests and blocking touching msrepl_* or sp_MSget_repl_commands
---------------------------------------------------------------
PRINT '--- 5) Active requests and blocking in the distributor ---';
;WITH R AS (
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time,
           r.wait_resource,
           DB_NAME(r.database_id) AS dbname,
           r.cpu_time,
           r.logical_reads,
           r.status,
           r.command,
           r.start_time,
           r.sql_handle,
           r.plan_handle,
           r.statement_start_offset,
           r.statement_end_offset
    FROM sys.dm_exec_requests r
)
SELECT
    R.session_id, R.blocking_session_id, R.dbname, R.status, R.command,
    R.wait_type, R.wait_time, R.wait_resource, R.cpu_time, R.logical_reads, R.start_time,
    SUBSTRING(T.text,
              (R.statement_start_offset/2)+1,
              CASE WHEN R.statement_end_offset = -1
                   THEN LEN(CONVERT(nvarchar(max), T.text)) - (R.statement_start_offset/2) + 1
                   ELSE (R.statement_end_offset - R.statement_start_offset)/2 + 1 END) AS running_stmt,
    T.text AS full_batch_text
FROM R
CROSS APPLY sys.dm_exec_sql_text(R.sql_handle) AS T
WHERE (R.dbname = @distribution_db
       OR T.text LIKE '%msrepl_%'
       OR T.text LIKE '%sp_MSget_repl_commands%')
ORDER BY R.wait_time DESC;

---------------------------------------------------------------
-- 6) Index fragmentation on msrepl_commands / msrepl_transactions
---------------------------------------------------------------
PRINT '--- 6) Index fragmentation on msrepl_commands / msrepl_transactions ---';
SET @sql =
N'USE ' + QUOTENAME(@distribution_db) + N';
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.indexes i
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), i.object_id, i.index_id, NULL, ''SAMPLED'') ips
WHERE OBJECT_NAME(i.object_id) IN (''msrepl_commands'',''msrepl_transactions'')
ORDER BY avg_fragmentation_in_percent DESC, page_count DESC;';
EXEC (@sql);

---------------------------------------------------------------
-- 7) Top waits snapshot (server-wide) for context
---------------------------------------------------------------
PRINT '--- 7) Top waits snapshot (server-wide) ---';
SELECT TOP (15)
       wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE 'SLEEP%'
  AND wait_type NOT LIKE 'BROKER_TASK_STOP%'
ORDER BY wait_time_ms DESC;

PRINT '--- Done ---';


Discover more from SQLyard

Subscribe to get the latest posts sent to your email.

Leave a Reply

Discover more from SQLyard

Subscribe now to keep reading and get access to the full archive.

Continue reading