🧠 Understanding PARALLEL_REDO_FLOW_CONTROL in SQL Server Always On: What It Means and How to Monitor It

If you’re running SQL Server with Always On Availability Groups, you’ve likely come across the PARALLEL_REDO_FLOW_CONTROL wait type. It shows up when your secondary replica’s redo thread is being throttled — essentially pausing to keep things balanced and prevent overloading the system.

But when should you worry about it? And what can you do about it? Let’s walk through it.


šŸ” What is PARALLEL_REDO_FLOW_CONTROL?

This wait type happens when SQL Server’s redo thread on a secondary replica is intentionally paused to avoid overwhelming I/O or CPU resources. While it’s normal to see some of this, high or growing wait times could point to:

  • Slow disk on the secondary
  • CPU pressure
  • A redo queue backlog
  • Poor Always On architecture design (e.g., misaligned replica roles or slow replication networks)

šŸ›  Query to Monitor It

Run this to check your current wait stats:

SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    CAST(signal_wait_time_ms * 100.0 / NULLIF(wait_time_ms,0) AS DECIMAL(5,2)) AS signal_wait_percent
FROM sys.dm_os_wait_stats
WHERE wait_type = 'PARALLEL_REDO_FLOW_CONTROL'
ORDER BY wait_time_ms DESC;

What to look at:

  • wait_time_ms: Total accumulated wait
  • signal_wait_time_ms: Time spent waiting for CPU
  • signal_wait_percent: A high percentage (over 30%) suggests CPU pressure

āœ… How to Interpret the Results

MetricGoodConcerning
wait_time_ms< 10,000> 100,000 over time
max_wait_time_ms< 1000> 5000
signal_wait_percent< 5%> 30% = look into CPU

šŸ“ˆ How to Check Redo Queue Size

This query checks the redo queue size in KB:

SELECT 
    ag.name AS ag_name,
    ar.replica_server_name,
    db.database_name,
    drs.redo_queue_size,
    drs.redo_rate,
    drs.redo_queue_size / NULLIF(drs.redo_rate, 0) AS estimated_seconds_to_clear
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_databases_cluster db ON db.group_id = drs.group_id AND db.database_id = drs.database_id
JOIN sys.availability_groups ag ON ag.group_id = drs.group_id
JOIN sys.availability_replicas ar ON ar.replica_id = drs.replica_id
WHERE drs.is_primary_replica = 0;

āš™ļø Recommended Trace Flags

If you’re seeing consistently high redo waits, these trace flags can help enable more aggressive parallel redo:

Trace FlagDescription
9567Enables parallel redo
7451Improves redo parallelism by relaxing blocking rules
3449Ensures redo starts immediately after crash recovery (SQL 2016+)
To enable:

DBCC TRACEON (9567, -1);
DBCC TRACEON (7451, -1);
DBCC TRACEON (3449, -1);

Caution: Test in a non-production environment first, and confirm with your DBA or Microsoft support if you’re on a critical system.


🧰 Extra Tips

  • Ensure secondary replicas have fast storage (ideally SSDs or Azure Premium Disks)
  • Monitor CPU usage — especially if signal waits are creeping up
  • Check that replicas are connected with low-latency networking
  • Don’t overcommit the replica’s resources — Always On is not “free”

🧾 Final Thoughts

A little PARALLEL_REDO_FLOW_CONTROL is normal. But if it’s consistently high, it’s a sign your secondary can’t keep up. The good news is: with the right monitoring, tuning, and maybe a few trace flags, you can bring things under control.


Discover more from SQLYARD

Subscribe to get the latest posts sent to your email.

Leave a Reply

Discover more from SQLYARD

Subscribe now to keep reading and get access to the full archive.

Continue reading