Understanding PARALLEL REDO TASK in SQL Server
Parallel Redo Tasks in SQL Server: What They Are, When To Worry, and How To Troubleshoot

If you run Always On Availability Groups or a database is coming online after a crash or restart, you’ll eventually see background sessions with the command PARALLEL REDO TASK. They often show up as “BACKGROUND” and “sleeping” in DMVs. That is normal. These workers replay log records on a secondary replica or during crash recovery so the database can catch up quickly. Microsoft Learn TECHCOMMUNITY.MICROSOFT.COM

Starting with SQL Server 2016, redo can run in parallel. SQL Server 2022 improved this further with a ParallelRedoThreadPool so more databases benefit and the old 100-thread instance limit is no longer the ceiling. TECHCOMMUNITY.MICROSOFT.COM Microsoft Learn

Quick take

Seeing PARALLEL REDO TASK by itself is not a problem.
It becomes a problem when redo queue size grows or stays high, redo rate is low, or user reads on the secondary are blocked or very slow. Microsoft Learn SQLPerformance.com

How to identify what you’re seeing

1) Confirm you’re looking at redo workers

SELECT 
  r.session_id, 
  r.command, 
  r.status, 
  r.wait_type, 
  r.cpu_time, 
  r.reads, 
  r.writes, 
  DB_NAME(r.database_id) AS database_name
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE '%REDO%';  -- matches PARALLEL REDO TASK and PARALLEL REDO HELP TASK

You should see one or more PARALLEL REDO TASK rows. These are system tasks. Leave them alone. Microsoft Learn
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-requests-transact-sql?view=sql-server-ver17

2) Check redo health on the replicas

SELECT
  DB_NAME(drs.database_id)        AS db_name,
  rs.role_desc                    AS replica_role,
  drs.synchronization_state_desc,
  drs.is_primary_replica,
  drs.redo_queue_size,            -- KB waiting to be redone
  drs.redo_rate,                  -- KB/sec actually being redone
  drs.log_send_queue_size,        -- KB waiting to be sent
  drs.log_send_rate               -- KB/sec being sent
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.dm_hadr_availability_replica_states AS rs
  ON drs.replica_id = rs.replica_id
ORDER BY db_name, replica_role;

redo_queue_size tells you backlog on the secondary.
redo_rate shows throughput.
Compare it with log_send_queue_size to see whether the bottleneck is sending from the primary or redoing on the secondary. Microsoft Learn Database Administrators Stack Exchange

3) Look at waits that point to the bottleneck

On the secondary:

SELECT TOP (20) * 
FROM sys.dm_os_wait_stats 
WHERE wait_type LIKE 'PARALLEL_REDO%' OR wait_type LIKE 'HADR_%'
ORDER BY wait_time_ms DESC;

On the primary:

SELECT TOP (20) * 
FROM sys.dm_os_wait_stats 
WHERE wait_type LIKE 'HADR_%'
ORDER BY wait_time_ms DESC;

Examples to watch:

PARALLEL_REDO_FLOW_CONTROL, PARALLEL_REDO_DRAIN_WORKER are common and usually benign unless they dominate.
HADR_SYNC_COMMIT stacking on the primary points at synchronous commit acknowledgment delays. SQLSkills Taryn Pivots TECHCOMMUNITY.MICROSOFT.COM

4) Grab perf counters for a second opinion

SELECT counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Replica%'
  AND counter_name IN ('Recovery Queue', 'Redone Bytes/sec');  -- the redo backlog counter is named Recovery Queue

Use this alongside the DMV values. The DMV’s redo_rate is calculated based on active redo time and can differ from the counter. Microsoft Learn
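
In sys.dm_os_performance_counters, per-second counters such as Redone Bytes/sec are generally exposed as cumulative values, so a single reading is not a rate. Below is a minimal sketch that turns two samples into an average rate. The 10-second window and the #redo_sample temp table name are illustrative choices, not part of the guidance above; check cntr_type on your build if the numbers look off.

```sql
-- First sample of the cumulative counter, one row per database replica.
SELECT instance_name, cntr_value
INTO #redo_sample
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Replica%'
  AND counter_name = 'Redone Bytes/sec'
  AND instance_name <> '_Total';

-- Illustrative 10-second sampling window; keep it in sync with the divisor below.
WAITFOR DELAY '00:00:10';

-- Second sample: the delta divided by the window is the average bytes redone per second.
SELECT RTRIM(s2.instance_name) AS database_name,
       (s2.cntr_value - s1.cntr_value) / 10.0 AS avg_redone_bytes_per_sec
FROM sys.dm_os_performance_counters AS s2
JOIN #redo_sample AS s1
  ON s1.instance_name = s2.instance_name
WHERE s2.object_name LIKE '%Database Replica%'
  AND s2.counter_name = 'Redone Bytes/sec';

DROP TABLE #redo_sample;
```

Run it on the secondary during the problem window. A longer window smooths out bursts at the cost of a slower answer.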

5) Check the error log for context

EXEC xp_readerrorlog 0, 1, N'parallel redo';  -- the search string must be Unicode (N'...')

You will often see “parallel redo is started” or “parallel redo is shutdown” when a database starts or a backup kicks off. That is expected. Database Administrators Stack Exchange Server Fault

6) If you need deeper detail, sample waits with Extended Events

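-- Captures completed waits for all sessions to a ring buffer;
-- filter for the redo waits when you read the results
CREATE EVENT SESSION redo_waits ON SERVER
ADD EVENT sqlos.wait_info(
    ACTION(sqlserver.session_id, sqlserver.database_id)
    WHERE (opcode = 1 AND sqlserver.session_id > 0))  -- opcode 1 = wait ended
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY=10MB, TRACK_CAUSALITY=ON);
ALTER EVENT SESSION redo_waits ON SERVER STATE = START;
-- Let it run during the issue window and read the ring buffer while the
-- session is still running: stopping it discards the ring_buffer contents.
ALTER EVENT SESSION redo_waits ON SERVER STATE = STOP;
DROP EVENT SESSION redo_waits ON SERVER;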

Filter the results for PARALLEL_REDO% waits to see what blocks the redo workers. TECHCOMMUNITY.MICROSOFT.COM
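
One way to do that filtering is sketched below. It assumes the redo_waits session is still running when you execute it, because a ring_buffer target only holds data while its session is started. The column aliases are illustrative.

```sql
-- Shred the ring buffer XML and keep only the parallel redo waits.
SELECT
    ev.value('(@timestamp)[1]', 'datetime2')                        AS event_time_utc,
    ev.value('(data[@name="wait_type"]/text)[1]', 'nvarchar(120)')  AS wait_type,
    ev.value('(data[@name="duration"]/value)[1]', 'bigint')         AS duration_ms,
    ev.value('(action[@name="session_id"]/value)[1]', 'int')        AS session_id,
    ev.value('(action[@name="database_id"]/value)[1]', 'int')       AS database_id
FROM (
    SELECT CAST(t.target_data AS xml) AS target_xml
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS t
      ON t.event_session_address = s.address
    WHERE s.name = N'redo_waits'
      AND t.target_name = N'ring_buffer'
) AS rb
CROSS APPLY rb.target_xml.nodes('/RingBufferTarget/event') AS x(ev)
-- [_] escapes the underscore, which is otherwise a single-character wildcard in LIKE.
WHERE ev.value('(data[@name="wait_type"]/text)[1]', 'nvarchar(120)') LIKE N'PARALLEL[_]REDO%'
ORDER BY duration_ms DESC;
```

The ring buffer is capped by MAX_MEMORY, so on a busy server older events roll off quickly. For longer captures, an event_file target may be a better fit.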

Is it a send problem or a redo problem?

A fast way to decide:

High log_send_queue_size and low redo queue
→ The primary cannot send fast enough. Investigate primary CPU, log generation rate, network, or synchronous commit acks. You will often see HADR_SYNC_COMMIT on the primary in synchronous mode. TECHCOMMUNITY.MICROSOFT.COM
Low log_send_queue_size and high redo queue
→ The secondary is receiving logs but cannot apply them fast enough. Focus on redo rate, secondary I/O and CPU, and read workloads on the secondary that compete with redo. SQLServerCentral
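
The same decision can be sketched as a query. The 100 MB threshold is a hypothetical cut-off for "high", chosen only for illustration, so tune it to your own log generation rate. Running it on the primary is assumed here, since that is where rows for every replica are visible.

```sql
DECLARE @queue_threshold_kb bigint = 102400;  -- hypothetical: 100 MB counts as "high"

SELECT
  DB_NAME(drs.database_id) AS database_name,
  drs.log_send_queue_size,
  drs.redo_queue_size,
  CASE
    WHEN drs.log_send_queue_size IS NULL OR drs.redo_queue_size IS NULL
      THEN 'unknown (queue size not reported)'
    WHEN drs.log_send_queue_size >= @queue_threshold_kb
     AND drs.redo_queue_size     <  @queue_threshold_kb
      THEN 'send side: primary or network'
    WHEN drs.log_send_queue_size <  @queue_threshold_kb
     AND drs.redo_queue_size     >= @queue_threshold_kb
      THEN 'redo side: secondary'
    WHEN drs.log_send_queue_size >= @queue_threshold_kb
      THEN 'both queues high'
    ELSE 'no significant backlog'
  END AS likely_bottleneck
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_primary_replica = 0;  -- secondary databases only
```

A single snapshot can mislead during bursts. Sampling a few times and looking at the trend gives a more reliable classification.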

Common causes and what to do

Heavy read workload on the readable secondary
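Queries on a readable secondary run under snapshot isolation, so the engine has to keep row versions, and the schema locks they hold can block redo when it needs to apply DDL changes. They also compete with redo for I/O and CPU.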
What to try: temporarily switch the secondary to not readable, move the reporting workload, or reduce concurrency to let redo catch up. Watch for waits like HADR_DATABASE_WAIT_FOR. TECHCOMMUNITY.MICROSOFT.COM
Slow I/O on the secondary
Redo writes data pages. Slow data or log storage on the secondary directly slows redo.
What to try: check file latencies, move to faster storage, separate data and log, verify instant file init for data files where applicable. (General HA guidance.) Microsoft Learn
Network latency or bandwidth issues
If the primary is synchronous and network is slow, the primary will stack HADR_SYNC_COMMIT.
What to try: validate throughput and latency, consider Async for noncritical DBs, or place replicas closer. TECHCOMMUNITY.MICROSOFT.COM
Large or long-running transactions creating huge redo backlogs
Big index rebuilds, mass updates, and long open transactions can balloon the redo queue.
What to try: break operations into chunks, schedule maintenance off-peak, or change the affected DB to Async temporarily during bulk ops. (General HA guidance.) Microsoft Learn
Thread limits and version differences
Older versions could hit the 100-thread instance limit for parallel redo and leave some databases on single-threaded redo. SQL Server 2022 introduces a shared ParallelRedoThreadPool that improves fairness.
What to try: plan upgrades where practical. TECHCOMMUNITY.MICROSOFT.COM Microsoft Learn
Bugs fixed in CUs
There have been fixes related to parallel redo in secondary replicas.
What to try: patch to the latest CU for your major version. Microsoft Support
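
For the slow I/O case above, a rough latency check per file might look like the sketch below. It uses sys.dm_io_virtual_file_stats, whose numbers are cumulative since the instance started, so treat the averages as a long-run view, not a live reading.

```sql
-- Average read and write latency per database file, worst writers first.
SELECT
  DB_NAME(vfs.database_id) AS database_name,
  mf.type_desc,                       -- ROWS (data) or LOG
  mf.physical_name,
  vfs.num_of_reads,
  vfs.num_of_writes,
  vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
  vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_latency_ms DESC;
```

Run it on the secondary and focus on the files of the databases with a growing redo queue.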

End-to-end troubleshooting checklist

Confirm you are looking at redo workers with sys.dm_exec_requests. Microsoft Learn
Measure backlog and throughput with sys.dm_hadr_database_replica_states. Note redo queue, redo rate, log send queue, and role. Microsoft Learn
Classify the bottleneck using waits on the primary and secondary. Look for HADR_* and PARALLEL_REDO_*. TECHCOMMUNITY.MICROSOFT.COM SQLSkills
Corroborate with perf counters: Database Replica: Recovery Queue and Redone Bytes/sec. Microsoft Learn
Check the error log for start or shutdown messages around the same time window. Database Administrators Stack Exchange
Test mitigations: reduce reads on the secondary, verify secondary I/O, validate network, break big transactions, or consider Async for noisy databases. TECHCOMMUNITY.MICROSOFT.COM
Consider versioning: apply latest CU and read the release notes for redo fixes. Plan for SQL Server 2022 improvements when feasible. Microsoft Support Microsoft Learn

Example: walking the numbers

You spot multiple PARALLEL REDO TASK rows that are mostly sleeping.

DMV shows redo_queue_size = 3,200,000 KB and redo_rate = 4,000 KB/sec. That is about 800 seconds of backlog if the rate holds, a bit over 13 minutes.
log_send_queue_size is near zero.
Secondary shows PARALLEL_REDO_FLOW_CONTROL as a top wait, and the server’s disk latency for data files is high.

This points to redo being the bottleneck on the secondary, likely due to slow I/O. Move the database to faster storage or relieve competing read workload. Re-check the redo rate after the change. Taryn Pivots
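
The backlog arithmetic in this example can be reproduced directly, and the same division can be applied to live DMV values. This is a sketch: the estimate only holds if the redo rate stays constant and no new log arrives, and the est_catchup_seconds alias is illustrative.

```sql
-- Worked numbers from the example: 3,200,000 KB of backlog at 4,000 KB/sec.
SELECT 3200000 / 4000                                   AS backlog_seconds,   -- 800
       CAST(3200000 / 4000 / 60.0 AS decimal(9, 1))     AS backlog_minutes;   -- 13.3

-- The same estimate per database on the local secondary replica.
SELECT
  DB_NAME(drs.database_id)                       AS database_name,
  drs.redo_queue_size,                           -- KB waiting to be redone
  drs.redo_rate,                                 -- KB/sec being redone
  drs.redo_queue_size / NULLIF(drs.redo_rate, 0) AS est_catchup_seconds
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1
  AND drs.is_primary_replica = 0;
```

NULLIF avoids a divide-by-zero error when redo is idle. In that case the estimate comes back as NULL.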

FAQs

Should I kill a PARALLEL REDO TASK session?
No. These are system workers, and KILL only works on user sessions, so the attempt fails with an error and fixes nothing. Use the steps above to locate the real bottleneck. (General HA guidance referenced throughout.) Microsoft Learn

Why do I see “parallel redo started” and “parallel redo shutdown” messages around backups or startup?
That is expected. The engine initializes the workers, then shuts them down when there is nothing to redo. Database Administrators Stack Exchange Server Fault

Where can I see how many redo threads are in play?
Use the HADR thread DMVs, sys.dm_hadr_ag_threads and sys.dm_hadr_db_threads, to see AG and per-database thread usage, including redo and parallel redo. Microsoft Learn

References

Microsoft Learn: Overview of Always On Availability Groups; Availability modes.
Microsoft Learn: sys.dm_hadr_database_replica_states and redo metrics.
Microsoft Learn: sys.dm_exec_requests details.
Microsoft TechCommunity: AG secondary replica redo model; Troubleshooting REDO queue build-up; Common causes for AG data latency.
Microsoft Learn: SQL Server 2022 ParallelRedoThreadPool improvements.
SQLSkills wait library: PARALLEL_REDO_DRAIN_WORKER.
Microsoft KB: Parallel redo fix in secondaries.
