Parallel Redo Tasks in SQL Server: What They Are, When To Worry, and How To Troubleshoot

Understanding PARALLEL REDO TASK in SQL Server

If you run Always On availability groups, or you have a database coming online after a crash or restart, you’ll eventually see background sessions with the command PARALLEL REDO TASK. They often show up as “BACKGROUND” and “sleeping” in DMVs. That is normal. These workers replay log records on a secondary replica or during crash recovery so the database can catch up quickly. (Sources: Microsoft Learn, Microsoft Tech Community)

Starting with SQL Server 2016, redo can run in parallel. SQL Server 2022 improved this further with a ParallelRedoThreadPool so more databases benefit and the old 100-thread instance limit is no longer the ceiling. (Sources: Microsoft Tech Community, Microsoft Learn)

Quick take

  • Seeing PARALLEL REDO TASK by itself is not a problem.
  • It becomes a problem when redo queue size grows or stays high, redo rate is low, or user reads on the secondary are blocked or very slow. (Sources: Microsoft Learn, SQLPerformance.com)

How to identify what you’re seeing

1) Confirm you’re looking at redo workers

SELECT 
  r.session_id, 
  r.command, 
  r.status, 
  r.wait_type, 
  r.cpu_time, 
  r.reads, 
  r.writes, 
  DB_NAME(r.database_id) AS database_name
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE '%REDO%';  -- PARALLEL REDO TASK, REDO THREAD, etc.

You should see one or more PARALLEL REDO TASK rows. These are system tasks. Leave them alone. (Source: Microsoft Learn, https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-requests-transact-sql?view=sql-server-ver17)

2) Check redo health on the replicas

SELECT
  DB_NAME(drs.database_id)        AS db_name,
  rs.role_desc                    AS replica_role,
  drs.synchronization_state_desc,
  drs.is_primary_replica,
  drs.redo_queue_size,            -- KB waiting to be redone
  drs.redo_rate,                  -- KB/sec actually being redone
  drs.log_send_queue_size,        -- KB waiting to be sent
  drs.log_send_rate               -- KB/sec being sent
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.dm_hadr_availability_replica_states AS rs
  ON drs.replica_id = rs.replica_id
ORDER BY db_name, replica_role;
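
A single reading of redo_queue_size does not tell you whether the backlog is growing or draining. One way to check the trend is to sample the local replica twice and compare. This is a minimal sketch meant to run on the secondary; the 30-second interval and the temp table name are assumptions, not values from any official guidance.

```sql
-- First sample of the local replica's queues (values are in KB).
SELECT database_id, redo_queue_size, log_send_queue_size
INTO #hadr_sample_1                      -- hypothetical temp table name
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;

WAITFOR DELAY '00:00:30';                -- assumed sampling interval

-- Second sample, compared with the first.
SELECT
  DB_NAME(s2.database_id)                   AS database_name,
  s1.redo_queue_size                        AS redo_queue_kb_before,
  s2.redo_queue_size                        AS redo_queue_kb_after,
  s2.redo_queue_size - s1.redo_queue_size   AS redo_queue_kb_change
FROM sys.dm_hadr_database_replica_states AS s2
JOIN #hadr_sample_1 AS s1
  ON s1.database_id = s2.database_id
WHERE s2.is_local = 1
ORDER BY redo_queue_kb_change DESC;

DROP TABLE #hadr_sample_1;
```

A positive redo_queue_kb_change means log is arriving faster than it is being redone during the window. A negative value means redo is catching up.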

3) Look at waits that point to the bottleneck

On the secondary:

SELECT TOP (20) * 
FROM sys.dm_os_wait_stats 
WHERE wait_type LIKE 'PARALLEL_REDO%' OR wait_type LIKE 'HADR_%'
ORDER BY wait_time_ms DESC;

On the primary:

SELECT TOP (20) * 
FROM sys.dm_os_wait_stats 
WHERE wait_type LIKE 'HADR_%'
ORDER BY wait_time_ms DESC;

Examples to watch:

  • PARALLEL_REDO_FLOW_CONTROL, PARALLEL_REDO_DRAIN_WORKER are common and usually benign unless they dominate.
  • HADR_SYNC_COMMIT stacking on the primary points at synchronous commit acknowledgment delays. (Sources: SQLskills, Taryn Pivots, Microsoft Tech Community)
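
The totals in sys.dm_os_wait_stats accumulate from the last restart or manual clear, so an old incident can dominate the top of the list. A rough way to see what is happening right now is to snapshot the counters, wait, and take the difference. The one-minute window and the temp table name below are assumptions; adjust them to your issue window.

```sql
-- Snapshot the redo and HADR waits.
SELECT wait_type, waiting_tasks_count, wait_time_ms
INTO #waits_before                       -- hypothetical temp table name
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PARALLEL[_]REDO%'
   OR wait_type LIKE 'HADR[_]%';

WAITFOR DELAY '00:01:00';                -- assumed sampling window

-- Report only the wait time accumulated during the window.
SELECT TOP (20)
  w.wait_type,
  w.waiting_tasks_count - b.waiting_tasks_count AS waits_in_window,
  w.wait_time_ms - b.wait_time_ms               AS wait_ms_in_window
FROM sys.dm_os_wait_stats AS w
JOIN #waits_before AS b
  ON b.wait_type = w.wait_type
WHERE w.wait_time_ms > b.wait_time_ms
ORDER BY wait_ms_in_window DESC;

DROP TABLE #waits_before;
```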

4) Grab perf counters for a second opinion

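SELECT counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Replica%'
  AND counter_name IN ('Recovery Queue', 'Redone Bytes/sec');  -- Recovery Queue is the redo backlog counter

Use this alongside the DMV values. The Database Replica object calls the redo backlog Recovery Queue, and this DMV returns Redone Bytes/sec as a cumulative raw value, so a rate needs two samples. The DMV’s redo_rate is calculated based on active redo time and can differ from the counter. (Source: Microsoft Learn)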
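
One way to turn the raw Redone Bytes/sec value into an actual rate is to read it twice and divide the difference by the elapsed time. This is a sketch under assumptions: the 10-second interval and the temp table name are arbitrary choices.

```sql
DECLARE @start datetime2(3) = SYSDATETIME();

-- First reading of the cumulative counter, one row per database.
SELECT instance_name, cntr_value
INTO #redone_before                      -- hypothetical temp table name
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Replica%'
  AND counter_name = 'Redone Bytes/sec';

WAITFOR DELAY '00:00:10';                -- assumed sampling interval

-- Second reading: bytes redone in the window, converted to KB per second.
SELECT
  RTRIM(a.instance_name) AS database_name,
  (a.cntr_value - b.cntr_value) / 1024.0
    / NULLIF(DATEDIFF(MILLISECOND, @start, SYSDATETIME()) / 1000.0, 0)
    AS redone_kb_per_sec
FROM sys.dm_os_performance_counters AS a
JOIN #redone_before AS b
  ON b.instance_name = a.instance_name
WHERE a.object_name LIKE '%Database Replica%'
  AND a.counter_name = 'Redone Bytes/sec';

DROP TABLE #redone_before;
```

Because this figure covers wall-clock time and the DMV's redo_rate covers only active redo time, expect the two to disagree when redo is idle for part of the window.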

5) Check the error log for context

EXEC xp_readerrorlog 0, 1, N'parallel redo';  -- current log, SQL Server error log, Unicode search string

You will often see “parallel redo is started” or “parallel redo is shutdown” when a database starts or a backup kicks off. That is expected. (Sources: Database Administrators Stack Exchange, Server Fault)

6) If you need deeper detail, sample waits with Extended Events

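-- Captures completed waits for all sessions to a ring buffer.
-- There is no wait-type filter here, so expect high event volume; keep the window short.
CREATE EVENT SESSION redo_waits ON SERVER
ADD EVENT sqlos.wait_info(
    ACTION(sqlserver.session_id, sqlserver.database_id)
    WHERE (opcode = 1 AND sqlserver.session_id > 0))  -- opcode 1 = wait ended
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY=10MB, TRACK_CAUSALITY=ON);
ALTER EVENT SESSION redo_waits ON SERVER STATE = START;
-- Let it run during the issue window and read the ring buffer while the
-- session is still running (its contents are discarded on stop), then:
ALTER EVENT SESSION redo_waits ON SERVER STATE = STOP;
DROP EVENT SESSION redo_waits ON SERVER;  -- remove the definition when finished

Filter the results for PARALLEL_REDO% waits to see what blocks the redo workers. (Source: Microsoft Tech Community)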
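
The filtering step can be sketched as a query over the ring buffer target. It has to run while the redo_waits session is still started, because a ring buffer lives only in memory and is gone once the session stops. The column aliases are my own, and the XML paths assume the usual ring buffer layout for sqlos.wait_info.

```sql
-- Summarize PARALLEL_REDO waits captured by the running redo_waits session.
SELECT
  w.wait_type,
  SUM(w.duration_ms) AS total_wait_ms,
  MAX(w.duration_ms) AS longest_wait_ms
FROM (
  SELECT
    ev.value('(data[@name="wait_type"]/text)[1]', 'nvarchar(120)') AS wait_type,
    ev.value('(data[@name="duration"]/value)[1]', 'bigint')        AS duration_ms
  FROM (
    -- The ring buffer is exposed as one XML document per target.
    SELECT CAST(t.target_data AS xml) AS target_xml
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS t
      ON t.event_session_address = s.address
    WHERE s.name = N'redo_waits'
      AND t.target_name = N'ring_buffer'
  ) AS rb
  CROSS APPLY rb.target_xml.nodes('/RingBufferTarget/event') AS n(ev)
) AS w
WHERE w.wait_type LIKE N'PARALLEL[_]REDO%'
GROUP BY w.wait_type
ORDER BY total_wait_ms DESC;
```

On a busy server the ring buffer wraps quickly, so treat the totals as a sample of the window, not a complete record.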

Is it a send problem or a redo problem?

A fast way to decide:

  • High log_send_queue_size and low redo queue
    → The primary cannot send fast enough. Investigate primary CPU, log generation rate, network, or synchronous commit acks. You will often see HADR_SYNC_COMMIT on the primary in synchronous mode. (Source: Microsoft Tech Community)
  • Low log_send_queue_size and high redo queue
    → The secondary is receiving logs but cannot apply them fast enough. Focus on redo rate, secondary I/O and CPU, and read workloads on the secondary that compete with redo. (Source: SQLServerCentral)
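
The same decision can be sketched as a query run on the primary, which sees the queues for every secondary. The 100 MB threshold is a made-up starting point, not a documented limit, so tune it to what counts as "high" for your workload.

```sql
DECLARE @backlog_kb bigint = 102400;   -- hypothetical threshold: 100 MB

SELECT
  DB_NAME(drs.database_id) AS database_name,
  ar.replica_server_name,
  drs.log_send_queue_size,
  drs.redo_queue_size,
  CASE
    WHEN drs.log_send_queue_size >= @backlog_kb
     AND drs.redo_queue_size     >= @backlog_kb THEN 'both queues high'
    WHEN drs.log_send_queue_size >= @backlog_kb THEN 'send bottleneck'
    WHEN drs.redo_queue_size     >= @backlog_kb THEN 'redo bottleneck'
    ELSE 'no significant backlog'
  END AS likely_bottleneck
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id
WHERE drs.is_primary_replica = 0       -- secondaries only
ORDER BY database_name, ar.replica_server_name;
```

Treat the label as a hint for where to look first. A NULL queue value falls through to the last branch, so check the raw columns before trusting the label.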

Common causes and what to do

  1. Heavy read workload on the readable secondary
    Queries on a readable secondary run under snapshot isolation, so the engine must keep row versions for them, and a long-running read can block redo when redo needs a schema modification lock to apply a DDL change.
    What to try: temporarily switch the secondary to not readable, move the reporting workload, or reduce concurrency to let redo catch up. Watch for waits whose names start with HADR_DATABASE_WAIT_FOR. (Source: Microsoft Tech Community)
  2. Slow I/O on the secondary
    Redo writes data pages. Slow data or log storage on the secondary directly slows redo.
    What to try: check file latencies, move to faster storage, separate data and log, verify instant file init for data files where applicable. (General HA guidance; source: Microsoft Learn)
  3. Network latency or bandwidth issues
    If the primary is synchronous and network is slow, the primary will stack HADR_SYNC_COMMIT.
    What to try: validate throughput and latency, consider Async for noncritical DBs, or place replicas closer. (Source: Microsoft Tech Community)
  4. Large or long-running transactions creating huge redo backlogs
    Big index rebuilds, mass updates, and long open transactions can balloon the redo queue.
    What to try: break operations into chunks, schedule maintenance off-peak, or change the affected DB to Async temporarily during bulk ops. (General HA guidance; source: Microsoft Learn)
  5. Thread limits and version differences
    Older versions could hit the 100-thread instance limit for parallel redo and leave some databases on single-threaded redo. SQL Server 2022 introduces a shared ParallelRedoThreadPool that improves fairness.
    What to try: plan upgrades where practical. (Sources: Microsoft Tech Community, Microsoft Learn)
  6. Bugs fixed in CUs
    There have been fixes related to parallel redo in secondary replicas.
    What to try: patch to the latest CU for your major version. (Source: Microsoft Support)
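
For the slow I/O case (cause 2), file latencies on the secondary can be approximated from sys.dm_io_virtual_file_stats. This is a rough sketch: the figures are averages since the instance started, so they smooth over short spikes, and the column aliases are my own.

```sql
-- Average read and write latency per database file, run on the secondary.
SELECT
  DB_NAME(vfs.database_id)                                  AS database_name,
  mf.type_desc                                              AS file_type,   -- ROWS or LOG
  mf.physical_name,
  vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
  vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_ms DESC;
```

High avg_write_ms on the data files of the lagging database is the signal most relevant to redo, since redo writes data pages.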

End-to-end troubleshooting checklist

  1. Confirm you are looking at redo workers with sys.dm_exec_requests. (Source: Microsoft Learn)
  2. Measure backlog and throughput with sys.dm_hadr_database_replica_states. Note redo queue, redo rate, log send queue, and role. (Source: Microsoft Learn)
  3. Classify the bottleneck using waits on the primary and secondary. Look for HADR_* and PARALLEL_REDO_*. (Sources: Microsoft Tech Community, SQLskills)
  4. Corroborate with perf counters: Database Replica: Recovery Queue and Redone Bytes/sec. (Source: Microsoft Learn)
  5. Check the error log for start or shutdown messages around the same time window. (Source: Database Administrators Stack Exchange)
  6. Test mitigations: reduce reads on the secondary, verify secondary I/O, validate network, break big transactions, or consider Async for noisy databases. (Source: Microsoft Tech Community)
  7. Consider versioning: apply latest CU and read the release notes for redo fixes. Plan for SQL Server 2022 improvements when feasible. (Sources: Microsoft Support, Microsoft Learn)
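
For the last step, a quick way to see which build and update level a replica is running is SERVERPROPERTY. The column aliases are my own; compare the output against the release notes for your major version.

```sql
SELECT
  SERVERPROPERTY('ProductVersion')      AS product_version,  -- full build number
  SERVERPROPERTY('ProductLevel')        AS product_level,    -- RTM or service pack
  SERVERPROPERTY('ProductUpdateLevel')  AS update_level,     -- CU level, NULL if not reported
  SERVERPROPERTY('ProductMajorVersion') AS major_version;    -- 13 = 2016, 14 = 2017, 15 = 2019, 16 = 2022
```

Run it on every replica, since the primary and secondaries can sit on different builds during a rolling upgrade.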

Example: walking the numbers

You spot multiple PARALLEL REDO TASK rows that are mostly sleeping.

  • DMV shows redo_queue_size = 3,200,000 KB and redo_rate = 4,000 KB/sec. That is about 800 seconds of backlog if the rate holds, a bit over 13 minutes.
  • log_send_queue_size is near zero.
  • Secondary shows PARALLEL_REDO_FLOW_CONTROL as a top wait, and the server’s disk latency for data files is high.

This points to redo being the bottleneck on the secondary, likely due to slow I/O. Move the database to faster storage or relieve competing read workload. Re-check the redo rate after the change. (Source: Taryn Pivots)
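
The backlog arithmetic above can be pulled straight from the DMV on the secondary. This is a minimal sketch with column aliases of my own. It assumes the current redo_rate holds, which is optimistic because that column is an average over active redo time, not an instantaneous rate.

```sql
-- Estimated time to drain the redo queue at the current average redo rate.
SELECT
  DB_NAME(database_id)                          AS database_name,
  redo_queue_size                               AS redo_queue_kb,
  redo_rate                                     AS redo_rate_kb_per_sec,
  redo_queue_size / NULLIF(redo_rate, 0)        AS est_catchup_seconds,
  redo_queue_size / NULLIF(redo_rate, 0) / 60.0 AS est_catchup_minutes
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1
  AND is_primary_replica = 0;
```

With the example figures, 3,200,000 KB at 4,000 KB/sec works out to 800 seconds, or roughly 13.3 minutes.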


FAQs

Should I kill a PARALLEL REDO TASK session?
No. These are system workers, and KILL only works against user sessions, so the attempt fails and fixes nothing. Use the steps above to locate the real bottleneck. (General HA guidance referenced throughout; source: Microsoft Learn)
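
If you want to confirm that these really are system sessions before ruling out KILL, one way is to join the request to its session and check is_user_process. This is a small sketch; I would expect the flag to be 0 for redo workers.

```sql
-- is_user_process = 0 marks a system session.
SELECT
  r.session_id,
  r.command,
  s.is_user_process,
  s.status
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = r.session_id
WHERE r.command LIKE '%REDO%';
```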

Why do I see “parallel redo started” and “parallel redo shutdown” messages around backups or startup?
That is expected. The engine initializes the workers, then shuts them down when there is nothing to redo. (Sources: Database Administrators Stack Exchange, Server Fault)

Where can I see how many redo threads are in play?
Use the HADR thread DMVs to see AG and per-database thread usage, including redo and parallel redo. (Source: Microsoft Learn)

