If you run Always On availability groups, or a database is coming online after a crash or restart, you’ll eventually see background sessions whose command is PARALLEL REDO TASK. They often show up as “BACKGROUND” and “sleeping” in the DMVs. That is normal. These workers replay log records, either on a secondary replica or during crash recovery, so the database can catch up quickly. (Microsoft Learn; Microsoft TechCommunity)
Starting with SQL Server 2016, redo can run in parallel. SQL Server 2022 improved this further with a ParallelRedoThreadPool, so more databases benefit and the old 100-thread per-instance limit is no longer the ceiling. (Microsoft TechCommunity; Microsoft Learn)
Quick take
- Seeing PARALLEL REDO TASK by itself is not a problem.
- It becomes a problem when the redo queue grows or stays high, the redo rate is low, or user reads on the secondary are blocked or very slow. (Microsoft Learn; SQLPerformance.com)
How to identify what you’re seeing
1) Confirm you’re looking at redo workers
SELECT
    r.session_id,
    r.command,
    r.status,
    r.wait_type,
    r.cpu_time,
    r.reads,
    r.writes,
    DB_NAME(r.database_id) AS database_name
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE '%REDO%'      -- e.g. PARALLEL REDO TASK
   OR r.command = 'DB STARTUP'     -- the main redo thread on a secondary usually reports this
ORDER BY database_name, r.command;
You should see one or more PARALLEL REDO TASK rows. These are system tasks; leave them alone. (Microsoft Learn)
2) Check redo health on the replicas
-- On the primary this returns a row per database per replica;
-- on a secondary it returns only that replica's local databases.
SELECT
    DB_NAME(drs.database_id) AS db_name,
    ar.replica_server_name,
    rs.role_desc AS replica_role,
    drs.synchronization_state_desc,
    drs.is_primary_replica,
    drs.redo_queue_size,       -- KB of received log not yet redone
    drs.redo_rate,             -- KB/sec being redone
    drs.log_send_queue_size,   -- KB of log not yet sent
    drs.log_send_rate          -- KB/sec being sent
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.dm_hadr_availability_replica_states AS rs
    ON drs.replica_id = rs.replica_id
JOIN sys.availability_replicas AS ar
    ON drs.replica_id = ar.replica_id
ORDER BY db_name, replica_role;
- redo_queue_size shows the backlog on the secondary.
- redo_rate shows redo throughput.
- Compare redo_queue_size with log_send_queue_size to tell whether the bottleneck is sending log from the primary or redoing it on the secondary. (Microsoft Learn; Database Administrators Stack Exchange)
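The two redo columns combine into a rough catch-up estimate: backlog divided by throughput. Below is a minimal Python sketch. estimate_catchup_seconds is a hypothetical helper name, the inputs are values you have copied out of the DMV (KB and KB/sec), and the optional incoming rate is an assumption you supply for log that keeps arriving while redo drains. It also assumes the rates hold steady, which they rarely do exactly.

```python
from typing import Optional


def estimate_catchup_seconds(
    redo_queue_kb: float,
    redo_rate_kb_per_sec: float,
    incoming_kb_per_sec: float = 0.0,
) -> Optional[float]:
    """Rough time to drain the redo queue, in seconds.

    redo_queue_kb        -- redo_queue_size from sys.dm_hadr_database_replica_states (KB)
    redo_rate_kb_per_sec -- redo_rate from the same DMV (KB/sec)
    incoming_kb_per_sec  -- assumed rate at which new log keeps arriving (KB/sec);
                            0 means "no new log", matching the simple estimate.
    Returns None when redo is not outpacing incoming log, i.e. the queue
    will not shrink at the current rates.
    """
    if redo_queue_kb <= 0:
        return 0.0
    net_drain = redo_rate_kb_per_sec - incoming_kb_per_sec
    if net_drain <= 0:
        return None
    return redo_queue_kb / net_drain


# Same numbers as the worked example later in the post.
print(estimate_catchup_seconds(3_200_000, 4_000))         # 800.0
print(estimate_catchup_seconds(3_200_000, 4_000, 3_000))  # 3200.0
print(estimate_catchup_seconds(3_200_000, 4_000, 5_000))  # None
```

Returning None instead of a number is deliberate: if redo cannot outpace incoming log, the queue only grows and there is no finish time at current rates.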
3) Look at waits that point to the bottleneck
On the secondary:
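-- Some HADR_ waits (e.g. HADR_WORK_QUEUE, HADR_TIMER_TASK) are idle background
-- waits that can top this list harmlessly; judge them by trend, not raw size.
SELECT TOP (20) *
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PARALLEL[_]REDO%' OR wait_type LIKE 'HADR[_]%'  -- [_] = literal underscore
ORDER BY wait_time_ms DESC;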
On the primary:
SELECT TOP (20) *
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'HADR[_]%'  -- [_] = literal underscore
ORDER BY wait_time_ms DESC;
Examples to watch:
- PARALLEL_REDO_FLOW_CONTROL and PARALLEL_REDO_DRAIN_WORKER are common and usually benign unless they dominate.
- HADR_SYNC_COMMIT piling up on the primary points to delays in synchronous-commit acknowledgment. (SQLSkills; Taryn Pivots; Microsoft TechCommunity)
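sys.dm_os_wait_stats is cumulative since the last restart (or since it was cleared with DBCC SQLPERF), so whether a wait "dominates" is easier to judge from the difference between two snapshots taken a few minutes apart. Here is a minimal Python sketch, assuming each snapshot has been exported as a mapping of wait_type to wait_time_ms. The function names, the sample numbers, and the 50% threshold are illustrative choices, not documented guidance.

```python
from typing import Dict, List, Tuple


def wait_deltas(
    before: Dict[str, int], after: Dict[str, int]
) -> List[Tuple[str, int, float]]:
    """Return (wait_type, delta_ms, share_of_total), largest delta first.

    before/after map wait_type -> wait_time_ms from two samples of
    sys.dm_os_wait_stats. Wait types missing from `before` count as 0.
    """
    deltas = {w: after[w] - before.get(w, 0) for w in after}
    deltas = {w: d for w, d in deltas.items() if d > 0}
    total = sum(deltas.values())
    rows = [(w, d, d / total) for w, d in deltas.items()] if total else []
    return sorted(rows, key=lambda r: r[1], reverse=True)


def dominant_waits(rows, prefixes=("PARALLEL_REDO", "HADR_"), threshold=0.5):
    """Wait types matching a prefix whose share of the interval exceeds threshold."""
    return [w for w, _, share in rows if w.startswith(prefixes) and share > threshold]


# Made-up sample numbers, not captured from a server.
before = {"PARALLEL_REDO_FLOW_CONTROL": 10_000, "HADR_SYNC_COMMIT": 5_000, "PAGEIOLATCH_SH": 2_000}
after = {"PARALLEL_REDO_FLOW_CONTROL": 70_000, "HADR_SYNC_COMMIT": 15_000, "PAGEIOLATCH_SH": 32_000}
rows = wait_deltas(before, after)
print(rows[0])               # ('PARALLEL_REDO_FLOW_CONTROL', 60000, 0.6)
print(dominant_waits(rows))  # ['PARALLEL_REDO_FLOW_CONTROL']
```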
4) Grab perf counters for a second opinion
SELECT counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Replica%'
  AND counter_name IN ('Recovery Queue',     -- log on the secondary not yet redone
                       'Redone Bytes/sec');  -- cumulative; sample twice to get a rate
Use these alongside the DMV values. The DMV’s redo_rate is calculated over active redo time, so it can differ from the counter. (Microsoft Learn)
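Per-second counters such as Redone Bytes/sec are exposed in sys.dm_os_performance_counters as cumulative totals, so a single cntr_value is not a rate. Take two samples and divide the difference by the elapsed seconds. A small Python sketch; per_second_rate is a hypothetical name and the sample values are made up.

```python
def per_second_rate(first_value: int, second_value: int, elapsed_seconds: float) -> float:
    """Turn two samples of a cumulative '/sec' counter into an average rate.

    first_value/second_value -- cntr_value from two reads of
    sys.dm_os_performance_counters for the same counter and instance.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = second_value - first_value
    if delta < 0:
        # The cumulative value went backwards (e.g. instance restart); the pair is unusable.
        raise ValueError("counter reset between samples")
    return delta / elapsed_seconds


# Redone Bytes/sec sampled 10 seconds apart (illustrative numbers).
bytes_per_sec = per_second_rate(1_000_000_000, 1_040_960_000, 10)
print(bytes_per_sec)         # 4096000.0
print(bytes_per_sec / 1024)  # 4000.0 (KB/sec, comparable to redo_rate)
```

Dividing by 1024 puts the counter in the same KB/sec unit as redo_rate, which makes the side-by-side comparison easier.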
5) Check the error log for context
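-- Arguments: log number (0 = current), log type (1 = SQL Server error log), search text.
-- xp_readerrorlog is undocumented; the search text must be Unicode (N'...').
EXEC xp_readerrorlog 0, 1, N'parallel redo';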
You will often see “parallel redo is started” or “parallel redo is shutdown” when a database starts or a backup kicks off. That is expected. (Database Administrators Stack Exchange; Server Fault)
6) If you need deeper detail, sample waits with Extended Events
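-- Captures completed waits (duration > 0) into an in-memory ring buffer.
-- This is noisy on a busy server, so keep the capture window short.
CREATE EVENT SESSION redo_waits ON SERVER
ADD EVENT sqlos.wait_info (
    ACTION (sqlserver.session_id, sqlserver.database_id)
    WHERE (duration > 0))
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY = 10 MB, TRACK_CAUSALITY = ON);
ALTER EVENT SESSION redo_waits ON SERVER STATE = START;
-- Let it run during the issue window, then read the ring buffer BEFORE
-- stopping: a ring_buffer target's contents are discarded when the session stops.
SELECT CAST(t.target_data AS xml) AS ring_buffer_xml
FROM sys.dm_xe_session_targets AS t
JOIN sys.dm_xe_sessions AS s
    ON s.address = t.event_session_address
WHERE s.name = N'redo_waits'
  AND t.target_name = N'ring_buffer';
ALTER EVENT SESSION redo_waits ON SERVER STATE = STOP;
DROP EVENT SESSION redo_waits ON SERVER;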
Filter the results for PARALLEL_REDO% waits to see what is holding up the redo workers. (Microsoft TechCommunity)
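The ring buffer comes back as one XML document. One way to do the PARALLEL_REDO% filtering outside SQL Server is a short script using Python's standard xml.etree.ElementTree. The sketch assumes the usual ring_buffer layout: each event element has data children, where map fields such as wait_type carry a readable name in a text child and numeric fields such as duration (milliseconds) carry a value child. The sample XML is hand-written to show that shape, not captured output, and redo_wait_totals is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict


def redo_wait_totals(ring_buffer_xml: str, prefix: str = "PARALLEL_REDO") -> Dict[str, int]:
    """Sum wait_info durations (ms) per wait type whose name starts with prefix."""
    totals: Dict[str, int] = defaultdict(int)
    root = ET.fromstring(ring_buffer_xml.strip())
    for event in root.iter("event"):
        if event.get("name") != "wait_info":
            continue
        fields = {}
        for data in event.findall("data"):
            # Map fields (like wait_type) carry a readable <text>; numeric fields use <value>.
            fields[data.get("name")] = data.findtext("text") or data.findtext("value")
        wait_type = fields.get("wait_type") or ""
        if wait_type.startswith(prefix):
            totals[wait_type] += int(fields.get("duration") or 0)
    return dict(totals)


sample = """
<RingBufferTarget>
  <event name="wait_info" package="sqlos">
    <data name="wait_type"><value>1</value><text>PARALLEL_REDO_FLOW_CONTROL</text></data>
    <data name="duration"><value>12</value></data>
  </event>
  <event name="wait_info" package="sqlos">
    <data name="wait_type"><value>1</value><text>PARALLEL_REDO_FLOW_CONTROL</text></data>
    <data name="duration"><value>8</value></data>
  </event>
  <event name="wait_info" package="sqlos">
    <data name="wait_type"><value>2</value><text>WRITELOG</text></data>
    <data name="duration"><value>30</value></data>
  </event>
</RingBufferTarget>
"""
print(redo_wait_totals(sample))  # {'PARALLEL_REDO_FLOW_CONTROL': 20}
```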
Is it a send problem or a redo problem?
A fast way to decide:
- High log_send_queue_size and low redo queue
→ The primary cannot send fast enough. Investigate primary CPU, log generation rate, the network, or synchronous-commit acknowledgments. In synchronous-commit mode you will often see HADR_SYNC_COMMIT on the primary. (Microsoft TechCommunity)
- Low log_send_queue_size and high redo queue
→ The secondary is receiving log but cannot apply it fast enough. Focus on redo rate, secondary I/O and CPU, and read workloads on the secondary that compete with redo. (SQLServerCentral)
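The rule of thumb above is easy to encode if you want to script it against DMV snapshots. This is a sketch only: classify_bottleneck and its 100,000 KB cut-off for "high" are hypothetical, and a sensible threshold depends on your log volume and recovery targets.

```python
def classify_bottleneck(
    log_send_queue_kb: float,
    redo_queue_kb: float,
    high_kb: float = 100_000,
) -> str:
    """Apply the send-vs-redo rule of thumb to two DMV values (both in KB).

    high_kb is a hypothetical cut-off for "high"; tune it for your workload.
    """
    send_high = log_send_queue_kb >= high_kb
    redo_high = redo_queue_kb >= high_kb
    if send_high and not redo_high:
        return "send"     # primary/network side: CPU, log rate, network, sync commit
    if redo_high and not send_high:
        return "redo"     # secondary side: redo rate, I/O, CPU, competing readers
    if send_high and redo_high:
        return "both"
    return "healthy"


print(classify_bottleneck(500_000, 2_000))    # send
print(classify_bottleneck(0, 3_200_000))      # redo
```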
Common causes and what to do
- Heavy read workload on the readable secondary
Queries on a readable secondary run under snapshot isolation and compete with redo for CPU and I/O. A long-running reader can also block redo outright, for example when its schema-stability lock blocks redo of a DDL change from the primary.
What to try: temporarily make the secondary non-readable, move the reporting workload, or reduce reader concurrency so redo can catch up. Watch for waits whose names start with HADR_DATABASE_WAIT_FOR. (Microsoft TechCommunity)
- Slow I/O on the secondary
Redo reads and modifies data pages, so slow data or log storage on the secondary directly slows redo.
What to try: check file latencies, move to faster storage, separate data and log files, and verify instant file initialization for data files where applicable. (General HA guidance; Microsoft Learn)
- Network latency or bandwidth issues
If the secondary is in synchronous-commit mode and the network is slow, HADR_SYNC_COMMIT waits pile up on the primary.
What to try: validate throughput and latency, consider asynchronous commit for replicas that host noncritical databases (availability mode is set per replica, not per database), or place replicas closer together. (Microsoft TechCommunity)
- Large or long-running transactions creating huge redo backlogs
Big index rebuilds, mass updates, and long open transactions can balloon the redo queue.
What to try: break operations into chunks, schedule maintenance off-peak, or temporarily switch the secondary replica to asynchronous commit during bulk operations (this affects every database in that availability group). (General HA guidance; Microsoft Learn)
- Thread limits and version differences
Older versions could hit the 100-thread instance limit for parallel redo, leaving some databases on single-threaded redo. SQL Server 2022 introduces a shared ParallelRedoThreadPool that improves fairness.
What to try: plan upgrades where practical. (Microsoft TechCommunity; Microsoft Learn)
- Bugs fixed in CUs
There have been fixes related to parallel redo on secondary replicas.
What to try: patch to the latest CU for your major version. (Microsoft Support)
End-to-end troubleshooting checklist
- Confirm you are looking at redo workers with sys.dm_exec_requests. (Microsoft Learn)
- Measure backlog and throughput with sys.dm_hadr_database_replica_states. Note the redo queue, redo rate, log send queue, and role. (Microsoft Learn)
- Classify the bottleneck using waits on the primary and secondary. Look for HADR_* and PARALLEL_REDO_* waits. (Microsoft TechCommunity; SQLSkills)
- Corroborate with the perf counters Database Replica: Recovery Queue and Database Replica: Redone Bytes/sec. (Microsoft Learn)
- Check the error log for parallel redo start or shutdown messages in the same time window. (Database Administrators Stack Exchange)
- Test mitigations: reduce reads on the secondary, verify secondary I/O, validate the network, break up big transactions, or consider asynchronous commit for replicas hosting noisy databases. (Microsoft TechCommunity)
- Consider versioning: apply the latest CU and read its release notes for redo fixes. Plan for SQL Server 2022 improvements when feasible. (Microsoft Support; Microsoft Learn)
Example: walking the numbers
You spot multiple PARALLEL REDO TASK rows that are mostly sleeping.
- The DMV shows redo_queue_size = 3,200,000 KB and redo_rate = 4,000 KB/sec. That is about 800 seconds of backlog if the rate holds, a bit over 13 minutes. log_send_queue_size is near zero.
- The secondary shows PARALLEL_REDO_FLOW_CONTROL as a top wait, and disk latency for the data files is high.
This points to redo on the secondary as the bottleneck, likely due to slow I/O. Move the database to faster storage or relieve the competing read workload, then re-check the redo rate. (Taryn Pivots)
FAQs
Should I kill a PARALLEL REDO TASK session?
No. These are system workers. Killing them does not fix anything and can make recovery slower. Use the steps above to locate the real bottleneck. (General HA guidance; Microsoft Learn)
Why do I see “parallel redo started” and “parallel redo shutdown” messages around backups or startup?
That is expected. The engine initializes the workers, then shuts them down when there is nothing to redo. (Database Administrators Stack Exchange; Server Fault)
Where can I see how many redo threads are in play?
Use the HADR thread DMVs (sys.dm_hadr_ag_threads and sys.dm_hadr_db_threads in recent versions) to see AG and per-database thread usage, including redo and parallel redo. (Microsoft Learn)
References
- Microsoft Learn: Overview of Always On Availability Groups; Availability modes.
- Microsoft Learn: sys.dm_hadr_database_replica_states and redo metrics.
- Microsoft Learn: sys.dm_exec_requests details.
- Microsoft TechCommunity: AG secondary replica redo model; Troubleshooting REDO queue build-up; Common causes for AG data latency.
- Microsoft Learn: SQL Server 2022 ParallelRedoThreadPool improvements.
- SQLSkills wait library: PARALLEL_REDO_DRAIN_WORKER.
- Microsoft Support (KB): Parallel redo fix in secondaries.
Discover more from SQLyard