SQL Server Always On Availability Groups: The Complete DBA Guide



Always On Availability Groups is the most capable and most complex high availability feature SQL Server offers. Done right, it gives you zero-data-loss automatic failover, readable secondaries for offloading reporting and backups, disaster recovery replicas across data centers or cloud regions, and — since SQL Server 2022 — automatic replication of logins, jobs, and permissions through Contained AGs.

Done wrong, it gives you plan cache issues from index maintenance, read routing that silently falls back to the primary, DNS misconfigurations that break failover, and redo queues that grow under reporting load until your RPO is measured in hours not seconds.

This guide covers every layer: topology planning and the sync vs async decision, step-by-step configuration from WSFC through listener and read-only routing, the real-world challenges of the 1-sync-2-async topology, index maintenance strategy for readable secondaries, CNAME vs listener architecture, and current deployment patterns for on-premises, cloud, and hybrid environments.

Compatibility: Core AG functionality covered here applies to SQL Server 2014 and later. Features introduced later (load-balanced read routing in SQL Server 2016, WSFC-free read-scale AGs in SQL Server 2017) and SQL Server 2022+ features (Contained AGs, parallel redo pool improvements, TDS 8.0) are called out with version badges inline. SQL Server 2025 additions are noted where research confirmed them.

1 What Always On AGs Do — and What They Don’t

An Availability Group is a database-level HA and DR solution. You group one or more user databases together, and SQL Server continuously ships transaction log records from the primary replica to up to eight secondary replicas. The secondaries apply those records through a redo thread, staying synchronized with the primary.

The listener — a virtual network name backed by a cluster IP resource — provides a single connection endpoint for applications. When a failover occurs, the listener’s IP moves to the new primary node and client reconnections are automatically directed to the correct server. Applications do not need to know which physical server is the current primary.

What AGs Protect

  • User databases that are explicitly added to the AG
  • Database-level failover — all databases in the AG fail over as a unit
  • Data in those databases, through continuous log-record transport and redo (not the separate Log Shipping feature)
  • With Contained AGs (SQL Server 2022+): logins, SQL Agent jobs, and permissions stored in the contained master and msdb

What AGs Do Not Protect

  • System databases (master, msdb, model, tempdb) remain per-instance, not per-AG; Contained AGs (SQL Server 2022+) add AG-scoped copies of master and msdb only
  • SQL Agent jobs on non-contained AGs — must be manually scripted to all replicas and made AG-aware
  • Logins on non-contained AGs — must be manually synchronized with matching SIDs
  • Linked servers, credentials, and server-level configurations — must be manually replicated
  • Instance-level settings — each replica is a separate, independent SQL Server instance

Enterprise Edition requirement: Full AG functionality — multiple databases per AG, more than two replicas, readable secondaries — requires Enterprise Edition. Standard Edition supports Basic Availability Groups (1 database, 2 replicas, no readable secondary). SQL Server 2025 raises the Standard Edition hardware ceiling to 32 cores and 256 GB RAM, making AGs on Standard Edition more viable for mid-sized workloads, but the feature restrictions remain.
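
A quick pre-flight check of edition and Always On status, worth running on each instance before planning a topology:

-- Check edition, version, and whether Always On is already enabled (1 = enabled)
SELECT
    SERVERPROPERTY('Edition')        AS [Edition],
    SERVERPROPERTY('ProductVersion') AS [Product Version],
    SERVERPROPERTY('IsHadrEnabled')  AS [Always On Enabled];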

2 Prerequisites: WSFC, Networking, Editions, and Quorum

Windows Server Failover Cluster (WSFC)

Every node hosting an AG replica must be a member of the same WSFC. The cluster provides quorum voting, failover detection, and the network name infrastructure that the listener depends on. WSFC does not require shared storage for AGs — each replica has its own independent storage — but it does require:

  • All nodes in the same Active Directory domain
  • The cluster service account having Create Computer Objects permission in AD (for creating the listener computer object)
  • Reliable low-latency networking between nodes — dedicated heartbeat network recommended
  • Firewall rules open for the mirroring endpoint port (default 5022), cluster port (3343), and listener port (typically 1433)
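
Once the nodes are joined to the cluster and Always On is enabled, you can confirm WSFC membership and cluster networking from inside SQL Server using the HADR cluster DMVs:

-- Confirm WSFC membership and networks from SQL Server
SELECT cluster_name FROM sys.dm_hadr_cluster;

SELECT member_name, member_type_desc, member_state_desc
FROM sys.dm_hadr_cluster_members;

SELECT member_name, network_subnet_ip, network_subnet_prefix_length, is_public
FROM sys.dm_hadr_cluster_networks;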

Quorum Configuration

Quorum determines whether the cluster has enough healthy members to remain operational. The fundamental rule: always design for an odd number of votes. Two-node clusters without a witness have a split-brain risk — if the two nodes lose communication, neither can be certain the other is down, so neither will take primary ownership. Add a file share witness or cloud witness as a third vote.

Cluster Size | Recommended Quorum | Notes
2 nodes | Node + File Share Witness or Cloud Witness | Essential: without a witness, loss of one node loses quorum
3 nodes | Node Majority (no witness needed) | Any two nodes can maintain quorum
4 nodes | Node + File Share Witness or Node Majority | Witness gives 3-of-5 votes if one node is in the DR site
Cross-site (2+2 nodes) | Node Majority + File Share Witness at primary site | DR-site nodes should have 0 votes to prevent a split-site scenario

Remove votes from geographically remote nodes. If a DR site node participates in quorum voting and the WAN link between sites goes down, the DR site node may believe the primary site is down and refuse to participate — causing your primary site cluster to lose quorum even though the primary site is perfectly healthy. Set the vote weight of geographically remote nodes to zero using PowerShell: (Get-ClusterNode "DRNode").NodeWeight = 0
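
After changing a node's vote weight, verify the effective votes. The same information is visible from T-SQL:

-- Verify quorum votes per cluster member (DR nodes should show 0)
SELECT member_name, member_type_desc, number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;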

Mirroring Endpoints

Each replica must have a database mirroring endpoint — the TCP listener that AG uses to ship log records between replicas. This is separate from the SQL Server listener port.

-- Create mirroring endpoint on each replica (run on each instance)
-- Use a port not in use -- 5022 is the default convention
CREATE ENDPOINT [Hadr_Endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
GO

-- Grant connect permission to the SQL Server service account
-- (replace with your actual service account)
GRANT CONNECT ON ENDPOINT::Hadr_Endpoint TO [DOMAIN\SQLServiceAccount];
GO

-- Verify endpoint is running
SELECT name, state_desc, type_desc
FROM sys.endpoints
WHERE type_desc = 'DATABASE_MIRRORING';

3 Synchronous vs Asynchronous Commit — Making the Right Decision

This single decision has more impact on your AG’s behavior, performance, and protection level than any other configuration choice.

Synchronous Commit

The primary replica waits for the secondary to confirm it has hardened (written to disk) the log record before acknowledging the transaction to the client. Zero data loss on failover is guaranteed — the secondary has everything the primary had at the moment of failure. Automatic failover is only possible with synchronous replicas. The cost: every write transaction must round-trip to the synchronous secondary before the client gets a commit acknowledgment. On a well-designed network with low latency between replicas, this overhead is typically 1–5 ms per transaction — acceptable for most OLTP workloads. On a high-latency WAN link, it can add tens of milliseconds per transaction, which is why synchronous commit is only appropriate for replicas within the same data center or connected by a dedicated low-latency network.

Asynchronous Commit

The primary commits the transaction immediately after writing to its own local log, without waiting for the secondary to acknowledge receipt. Maximum performance with no commit overhead — the primary is not affected by secondary latency, secondary performance, or secondary availability. The cost: if the primary fails before the secondary has received and hardened recent log records, those transactions are lost. Forced manual failover to an async replica always has potential data loss. Asynchronous replicas cannot participate in automatic failover.

Attribute | Synchronous | Asynchronous
Data loss on failover | Zero (guaranteed) | Possible; governed by unsent log on the secondary
Automatic failover | Yes | No; manual only
Write latency impact | Round-trip to secondary per commit | None; primary not affected
Network requirement | Low latency required (<5 ms ideal) | High latency tolerated
Best use case | Local HA, same data center | DR, remote data center or cloud

Maximum synchronous replicas: up to 5 (including the primary) per AG on SQL Server 2019 and later; 3 on earlier versions.

SQL Server 2025 addition: Microsoft introduced a configurable AG Commit Time setting — previously the primary waited a hardcoded 10 ms for synchronous acknowledgment before timing out. In SQL Server 2025 this value is configurable at the server level, allowing fine-tuning for environments with consistent sub-10ms network latency where the default timeout was unnecessarily generous.
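
To quantify what synchronous commit actually costs on your network, watch the HADR_SYNC_COMMIT wait type on the primary; its average wait per waiting task approximates the per-commit round-trip overhead:

-- Average HADR_SYNC_COMMIT wait = approximate per-commit cost of sync replicas
-- (sys.dm_os_wait_stats is cumulative since the last restart or stats clear)
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    CASE WHEN waiting_tasks_count > 0
         THEN CAST(wait_time_ms * 1.0 / waiting_tasks_count AS DECIMAL(10,2))
         ELSE 0 END AS [Avg Wait per Commit (ms)]
FROM sys.dm_os_wait_stats
WHERE wait_type = N'HADR_SYNC_COMMIT';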

4 The 1-Sync / 2-Async Topology — Pros, Cons, and Real-World Lessons

A common production topology that SQLYARD has encountered is: one synchronous secondary in the same data center for local HA with automatic failover, plus two asynchronous secondaries, one in a remote DR data center and one dedicated to read workloads. This is a powerful setup, but it comes with specific operational challenges that are not obvious until you have lived with it.

┌──────────────────────────────────────────────────────────────────┐
│                        AVAILABILITY GROUP                         │
│                                                                   │
│   PRIMARY            SYNC SECONDARY       ASYNC-1      ASYNC-2    │
│  ┌──────────┐       ┌──────────┐      ┌──────────┐   ┌────────┐   │
│  │ PROD-DB1 │◄─────►│ PROD-DB2 │─────►│ DR-DB1   │   │ RPT-DB │   │
│  │ (R/W)    │ sync  │ (standby)│async │ (DR site)│   │ (reads)│   │
│  └──────────┘       └──────────┘      └──────────┘   └────────┘   │
│                                                                   │
│   Listener: AG-LISTENER (always follows primary)                  │
└──────────────────────────────────────────────────────────────────┘

Commit flow (synchronous):
    Primary ──log──► Sync Secondary ──ACK──► Primary commits ──► Client

Commit flow (asynchronous):
    Primary commits ──► Client
    Primary ──log──► Async-1, Async-2 (independently, no wait)

✓ Advantages

  • Local HA with automatic failover and zero data loss via sync secondary
  • DR protection in a geographically separate location via async secondary
  • Read workloads offloaded to dedicated async read replica — primary unaffected
  • Backups can be offloaded to any secondary, eliminating backup I/O from primary
  • Three independent copies of the data for corruption protection
  • Async replicas do not add commit latency to the primary

✗ Challenges

  • Index maintenance is log-intensive and must run on primary — creates massive redo work on ALL secondaries simultaneously
  • The read-only replica (async) may have stale indexes because it applies whatever the primary had, not what reporting queries need
  • Indexes added for reporting queries on the secondary must be added to the primary first — whether or not they make sense there
  • Long-running reporting queries on the async secondary can block the redo thread, causing redo lag
  • Async replicas can fall behind during maintenance windows — forced failover may have significant data loss
  • Four SQL Server instances to license, patch, and maintain
Real-world lesson: index maintenance in the 1-sync/2-async topology

In a previous environment with exactly this topology, index maintenance was the most significant operational challenge. An index rebuild on a large table on the primary generates hundreds of GB of log records that must be replicated to all three secondaries. The primary cannot truncate log that any secondary has not yet received and hardened, so a large index rebuild against a lagging secondary holds the log open on the primary for the duration. The async read replica would fall behind during maintenance windows, sometimes by hours.

More critically: indexes added specifically to support reporting queries on the read replica had to be created on the primary first, even if they had no benefit there, and in some cases actively degraded primary write performance by adding more index maintenance overhead.

The solution was a careful audit of every index on the primary: classify each as "required by OLTP queries," "required by reporting queries only," or "required by both." Indexes required only for reporting were documented for impact review. Some were created as filtered indexes or with low fill factors to minimize OLTP impact. Others were deferred until a maintenance window where the async read replica could absorb the redo work during off-peak hours.

Index Strategy for a Read-Only Secondary

Every index that exists on the primary is automatically available on the secondary. But the secondary cannot have indexes that the primary does not have. This creates a fundamental constraint: the index design for your read workload is bounded by what your write workload will tolerate.

  • Before adding an index for a reporting query: test the impact on the primary’s write performance, the log generation rate, and the redo time on the sync secondary.
  • Consider the redo thread’s capacity. A busy read secondary under heavy reporting load can slow its own redo thread — reporting queries can block DDL redo operations. If the redo thread falls behind, the RPO for DR failover degrades silently.
  • For synchronous commit environments: Microsoft recommends temporarily switching synchronous replicas to asynchronous before running large index rebuilds, then switching back. This keeps the rebuild's flood of log records from inflating commit latency for every transaction on the primary while the sync secondary struggles to harden them.
  • Use Ola Hallengren’s IndexOptimize. It supports AG-aware execution, automatically skipping operations that have no benefit (rebuild vs reorganize based on fragmentation threshold) and can limit log generation per maintenance window.
-- Temporarily convert sync secondary to async before a large index rebuild
-- Run on primary
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'SYNC-SECONDARY-NODE'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);

-- Run the index rebuild
ALTER INDEX ALL ON dbo.LargeTable REBUILD
WITH (ONLINE = ON, MAXDOP = 4, DATA_COMPRESSION = PAGE);

-- Verify redo has caught up on the secondary BEFORE switching back
SELECT
    ar.replica_server_name,
    drs.synchronization_state_desc,
    drs.redo_queue_size,
    drs.redo_rate
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_database_replica_states drs
    ON ar.replica_id = drs.replica_id
WHERE ar.group_id IN (
    SELECT group_id FROM sys.availability_groups WHERE name = 'YourAG'
);

-- Once redo_queue_size is near zero, switch back to synchronous
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'SYNC-SECONDARY-NODE'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

5 Enabling Always On and Creating the AG

Step 1: Enable Always On on Each Instance

-- Method 1: SQL Server Configuration Manager (recommended; requires service restart)
-- SQL Server Services → SQL Server (MSSQLSERVER) → Properties → Always On High Availability
-- Check "Enable Always On Availability Groups"

-- Method 2: PowerShell (run from an elevated prompt)
-- Enable-SqlAlwaysOn -ServerInstance "PROD-DB1" -Force
-- The -Force flag restarts the SQL Server service immediately

-- Method 3: T-SQL -- not available; enabling Always On requires SSCM or PowerShell

-- Verify Always On is enabled (1 = enabled)
SELECT SERVERPROPERTY('IsHadrEnabled') AS [Always On Enabled];

Step 2: Prepare the Databases

-- Requirements before adding a database to an AG:
-- 1. Full recovery model
-- 2. At least one full backup taken after setting full recovery

-- Set full recovery
ALTER DATABASE [YourDatabase] SET RECOVERY FULL;

-- Take a full backup (required before adding to AG)
BACKUP DATABASE [YourDatabase]
TO DISK = N'\\BackupShare\YourDatabase_FULL.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;

-- If using automatic seeding (SQL Server 2016+), no manual backup/restore to secondary needed
-- If using manual seeding: restore WITH NORECOVERY on all secondary replicas before joining
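
If you go the manual seeding route, the restore sequence on each secondary looks like this sketch (backup paths are placeholders):

-- Manual seeding sketch (run on each SECONDARY); paths are illustrative
RESTORE DATABASE [YourDatabase]
FROM DISK = N'\\BackupShare\YourDatabase_FULL.bak'
WITH NORECOVERY, STATS = 10;

-- Restore any subsequent log backups, also WITH NORECOVERY
RESTORE LOG [YourDatabase]
FROM DISK = N'\\BackupShare\YourDatabase_LOG.trn'
WITH NORECOVERY;

-- After the AG is created and this replica has joined, attach the database
ALTER DATABASE [YourDatabase] SET HADR AVAILABILITY GROUP = [YourAG];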

Step 3: Create the Availability Group (T-SQL)

-- Run on PRIMARY replica
-- Adjust replica names, endpoint URLs, and IP addresses for your environment
CREATE AVAILABILITY GROUP [YourAG]
WITH (
    AUTOMATED_BACKUP_PREFERENCE = SECONDARY,  -- prefer backups on secondary
    DB_FAILOVER = ON,                          -- fail over if any AG DB goes unhealthy
    DTC_SUPPORT = NONE                         -- set to PER_DB if distributed transactions needed
)
FOR DATABASE [YourDatabase]           -- add more databases: [DB1], [DB2]
REPLICA ON
    -- PRIMARY replica
    N'PROD-DB1' WITH (
        ENDPOINT_URL          = N'TCP://PROD-DB1.domain.com:5022',
        AVAILABILITY_MODE     = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE         = AUTOMATIC,
        SEEDING_MODE          = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    ),
    -- SYNCHRONOUS SECONDARY (local HA)
    N'PROD-DB2' WITH (
        ENDPOINT_URL          = N'TCP://PROD-DB2.domain.com:5022',
        AVAILABILITY_MODE     = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE         = AUTOMATIC,
        SEEDING_MODE          = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    ),
    -- ASYNCHRONOUS SECONDARY #1 (DR data center)
    N'DR-DB1' WITH (
        ENDPOINT_URL          = N'TCP://DR-DB1.domain.com:5022',
        AVAILABILITY_MODE     = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE         = MANUAL,
        SEEDING_MODE          = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    ),
    -- ASYNCHRONOUS SECONDARY #2 (read scale / reporting)
    N'RPT-DB1' WITH (
        ENDPOINT_URL          = N'TCP://RPT-DB1.domain.com:5022',
        AVAILABILITY_MODE     = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE         = MANUAL,
        SEEDING_MODE          = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    );
GO

-- Run on each SECONDARY replica to join
ALTER AVAILABILITY GROUP [YourAG] JOIN;
ALTER AVAILABILITY GROUP [YourAG] GRANT CREATE ANY DATABASE;  -- for automatic seeding
GO

Step 4: Verify Synchronization

-- Check AG health and synchronization state
SELECT
    ag.name                             AS [AG Name],
    ar.replica_server_name              AS [Replica],
    ar.availability_mode_desc           AS [Commit Mode],
    ar.failover_mode_desc               AS [Failover Mode],
    ars.role_desc                       AS [Role],
    ars.synchronization_health_desc     AS [Sync Health],
    ars.connected_state_desc            AS [Connected],
    drs.synchronization_state_desc      AS [DB Sync State],
    drs.redo_queue_size                 AS [Redo Queue (KB)],
    drs.log_send_queue_size             AS [Send Queue (KB)]
FROM sys.availability_groups              ag
JOIN sys.availability_replicas            ar  ON ar.group_id   = ag.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ars.replica_id = ar.replica_id
JOIN sys.dm_hadr_database_replica_states  drs ON drs.replica_id = ar.replica_id
ORDER BY ag.name, ars.role_desc, ar.replica_server_name;

6 Configuring the AG Listener

The listener is a virtual network name (VNN) registered in DNS and owned as a Windows cluster resource. It provides the single connection endpoint that clients use. When a failover occurs, the cluster moves the IP resource to the new primary node and updates the DNS registration — clients reconnecting after a failover automatically reach the new primary.

15-character DNS name limit. NetBIOS recognizes only the first 15 characters of the listener DNS name. If you have multiple WSFC clusters in the same domain, names with the same first 15 characters will conflict. Plan listener names carefully in multi-AG environments. SSMS limits the name to 15 characters in the GUI. T-SQL allows up to 63 characters but NetBIOS still sees only the first 15.

-- Create listener via T-SQL (run on primary replica)
-- Specify a static IP address -- never use DHCP for production listeners
ALTER AVAILABILITY GROUP [YourAG]
ADD LISTENER N'YourAGListener' (
    WITH IP (
        (N'192.168.1.100', N'255.255.255.0')  -- primary site IP
        -- Add a second IP for DR/multi-subnet:
        -- ,(N'10.0.2.100', N'255.255.255.0')
    ),
    PORT = 1433
);
GO

-- For multi-subnet: specify one IP per subnet
-- The cluster registers all IPs in DNS; MultiSubnetFailover=True in connection string
-- allows the client driver to try all IPs in parallel during failover

-- Verify listener
SELECT dns_name, port, ip_configuration_string
FROM sys.availability_group_listeners
WHERE group_id IN (
    SELECT group_id FROM sys.availability_groups WHERE name = 'YourAG'
);

RegisterAllProvidersIP

When a listener is created via SSMS or T-SQL (not Failover Cluster Manager), the RegisterAllProvidersIP property is set to 1, which registers all listener IP addresses in DNS simultaneously. This is required for MultiSubnetFailover=True to work correctly — clients try all IPs in parallel on failover. If you have legacy clients that cannot use MultiSubnetFailover=True, set RegisterAllProvidersIP = 0 so only the active IP is registered.

# Check and set RegisterAllProvidersIP (run in PowerShell on any cluster node)
Import-Module FailoverClusters
Get-ClusterResource "YourAGListener" | Get-ClusterParameter

# Set to 0 for legacy clients without MultiSubnetFailover support
Get-ClusterResource "YourAGListener" |
    Set-ClusterParameter RegisterAllProvidersIP 0

# Restart the listener resource to apply
Stop-ClusterResource "YourAGListener"
Start-ClusterResource "YourAGListener"

# Also consider reducing HostRecordTTL for faster DNS failover for legacy clients:
Get-ClusterResource "YourAGListener" |
    Set-ClusterParameter HostRecordTTL 30  # 30 seconds (default is 1200 = 20 minutes)

7 Configuring Read-Only Routing and ApplicationIntent

Read-only routing (ROR) enables the AG listener to automatically redirect connections that specify ApplicationIntent=ReadOnly in their connection string to a configured readable secondary replica. Without ROR configured, ApplicationIntent=ReadOnly connections through the listener go to the primary by default.

ROR requires two things configured on each replica: a READ_ONLY_ROUTING_URL (the TCP address ROR uses to redirect to this replica) and a READ_ONLY_ROUTING_LIST (the ordered priority list of where to send read-intent connections when this replica is primary).

Read-Only Routing URL Calculation

The routing URL must include the replica’s fully qualified domain name and port — exactly as the client driver will use to make the redirected connection. Use the FQDN, not just the machine name, to avoid DNS resolution issues across subnets.

-- Configure read-only routing (run on primary replica)
-- Set routing URL for each replica (the address ROR redirects TO for that replica)
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'PROD-DB1' WITH
(SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://PROD-DB1.domain.com:1433'));

ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'PROD-DB2' WITH
(SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://PROD-DB2.domain.com:1433'));

ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'RPT-DB1' WITH
(SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://RPT-DB1.domain.com:1433'));

-- Set the routing LIST for each replica when it is acting as PRIMARY
-- This is an ordered priority list: first available replica in the list gets the connection
-- Configure for BOTH replicas that could become primary (after any failover)

-- When PROD-DB1 is primary: prefer RPT-DB1 for reads, fall back to PROD-DB2, then PROD-DB1 itself
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'PROD-DB1' WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('RPT-DB1','PROD-DB2','PROD-DB1')));

-- When PROD-DB2 is primary (after failover): prefer RPT-DB1, fall back to PROD-DB1, then PROD-DB2
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'PROD-DB2' WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('RPT-DB1','PROD-DB1','PROD-DB2')));
GO

Load-Balanced Routing Across Multiple Secondaries SQL Server 2016+

SQL Server 2016 introduced load-balanced read routing. Instead of always sending read-intent connections to the first available replica in the list, you can specify a set of replicas in parentheses — SQL Server distributes connections across them in round-robin order.

-- Load-balanced routing across two read replicas (SQL Server 2016+)
-- When PROD-DB1 is primary: distribute reads between RPT-DB1 and PROD-DB2 evenly
-- Fall back to PROD-DB1 itself if both are unavailable
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'PROD-DB1' WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (('RPT-DB1','PROD-DB2'), 'PROD-DB1')));
-- The parenthesized set = load-balanced pool; outer list = failover order
GO

Verify Read-Only Routing Is Working

-- Test from SQLCMD: connect to listener with -K ReadOnly flag
-- sqlcmd -S YourAGListener -d YourDatabase -K ReadOnly -Q "SELECT @@SERVERNAME"
-- Result should be the secondary server name, not the primary

-- Verify routing configuration in T-SQL
SELECT
    src.replica_server_name  AS [When This Replica Is Primary],
    dest.replica_server_name AS [Routes Reads To],
    dest.read_only_routing_url,
    rl.routing_priority
FROM sys.availability_read_only_routing_lists rl
JOIN sys.availability_replicas src  ON rl.replica_id          = src.replica_id
JOIN sys.availability_replicas dest ON rl.read_only_replica_id = dest.replica_id
JOIN sys.availability_groups   ag   ON ag.group_id             = src.group_id
WHERE ag.name = 'YourAG'
ORDER BY src.replica_server_name, rl.routing_priority;

-- Confirm your current connection is on the secondary after connecting with ReadOnly intent
SELECT
    @@SERVERNAME                           AS [Connected Server],
    sys.fn_hadr_is_primary_replica(DB_NAME()) AS [Is Primary (0=secondary)],
    DB_NAME()                              AS [Database];

8 Readable Secondary Settings — Read-Intent Only vs Yes

When configuring a secondary replica’s readable access, there are three options:

Setting | Who Can Read | Best For
ALLOW_CONNECTIONS = NONE | No connections allowed; failover standby only | Pure HA standby where you do not want any read activity
ALLOW_CONNECTIONS = READ_ONLY | Only connections specifying ApplicationIntent=ReadOnly | Read-intent only; recommended for production. Enforces that only intentional read connections reach the secondary and protects against accidental write attempts and SSMS ad-hoc queries consuming resources.
ALLOW_CONNECTIONS = ALL | Any connection, including those without ApplicationIntent=ReadOnly | Applications that cannot easily be updated to use ApplicationIntent, or direct SSMS connections to the secondary replica's server name (not through the listener)

Recommendation: Use READ_ONLY (Read-Intent Only) for all production readable secondaries. This ensures that only connections explicitly declaring read-only intent reach the secondary. It also protects against accidental DML and ad-hoc load from developers who connect directly to the secondary's server name without declaring intent: they receive a clear connection error rather than an open session competing with redo for resources. (Listener connections without read intent are simply routed to the primary; the error surfaces only on direct connections to the secondary.)

-- Set readable secondary access
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'RPT-DB1' WITH
(SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Verify
SELECT
    ar.replica_server_name,
    ar.secondary_role_allow_connections_desc
FROM sys.availability_replicas ar
JOIN sys.availability_groups ag ON ag.group_id = ar.group_id
WHERE ag.name = 'YourAG';

9 Contained Availability Groups SQL Server 2022+

One of the most significant operational improvements in SQL Server 2022. Contained AGs solve the longstanding pain point of having to manually synchronize logins, SQL Agent jobs, permissions, and linked servers to every replica every time something changes. With a Contained AG, those objects — created in the context of the AG — are automatically replicated as part of the AG itself.

How It Works

A Contained AG creates its own copies of master and msdb — named AgName_master and AgName_msdb — that are replicated alongside user databases. When you connect to the Contained AG’s listener and create a login or an Agent job, those objects are created in the AG-scoped system databases and automatically synchronized to all replicas. A failover carries those objects with it — no manual scripting, no SID mismatch, no missed logins.

-- Create a Contained AG (SQL Server 2022+)
-- The WITH (CONTAINED) clause enables this feature
CREATE AVAILABILITY GROUP [YourContainedAG]
WITH (
    CONTAINED,
    AUTOMATED_BACKUP_PREFERENCE = SECONDARY,
    DB_FAILOVER = ON
)
FOR DATABASE [YourDatabase]
REPLICA ON
    N'PROD-DB1' WITH (
        ENDPOINT_URL      = N'TCP://PROD-DB1.domain.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SEEDING_MODE      = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    ),
    N'PROD-DB2' WITH (
        ENDPOINT_URL      = N'TCP://PROD-DB2.domain.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SEEDING_MODE      = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    );
GO

-- After creation, connect via the AG LISTENER to create contained objects
-- Objects created via the listener are AG-scoped and replicate automatically

-- Example: Create a login via the listener (AG-scoped login -- will replicate)
-- Connect to the listener first, then:
CREATE LOGIN [ReportingUser] WITH PASSWORD = N'StrongPassword!2026';
-- This login now exists on all replicas and survives failover

-- Verify the AG is contained and its AG-scoped system databases exist
SELECT name, is_contained
FROM sys.availability_groups
WHERE name = N'YourContainedAG';

SELECT name, database_id
FROM sys.databases
WHERE name IN (N'YourContainedAG_master', N'YourContainedAG_msdb');

Contained AG Limitations to Know

  • Enterprise Edition only — not available in Standard Edition
  • No job offloading to secondaries — jobs inside the contained msdb only run on the primary. If you were using secondaries for CHECKDB or backup jobs, you will need to rethink that pattern
  • No replication support — transactional replication cannot be used alongside Contained AGs
  • No distributed AG support in SQL Server 2022 — DAGs and Contained AGs could not be combined until SQL Server 2025
  • Automatic seeding required for the contained system databases in SQL Server 2022
  • Connect via listener to create contained objects — objects created by connecting directly to a replica instance (not the listener) are not AG-scoped and do not replicate

10 Index Maintenance on an AG — The Hard Truth

This is the operational topic that most AG documentation underemphasizes. Every index maintenance operation — ALTER INDEX REBUILD, ALTER INDEX REORGANIZE, CREATE INDEX, DROP INDEX — generates transaction log records on the primary that must be shipped and redone on every secondary. Large index rebuilds on large tables produce enormous volumes of log that take hours to redo on secondaries, can grow the primary’s log file to many times the database size during the operation, and can directly delay the synchronous secondary’s redo completion, extending your effective RPO.

The Redo Block Problem

The redo thread on a readable secondary can be blocked by a long-running read-only query. If a reporting query holds a shared lock on a table’s pages and the redo thread needs to apply a DDL change (like the physical page moves from an index rebuild), the redo thread waits. This means: heavy reporting activity on the read secondary can delay index rebuild redo, which in turn keeps the primary’s log file growing because it cannot be truncated until redo completes everywhere.

-- Monitor redo queue and rate on all secondaries
-- redo_queue_size growing during maintenance = redo falling behind
-- redo_rate = how fast (KB/sec) the secondary is applying log records
SELECT
    ar.replica_server_name,
    ar.availability_mode_desc,
    drs.synchronization_state_desc,
    drs.redo_queue_size              AS [Redo Queue (KB)],
    drs.redo_rate                    AS [Redo Rate (KB/sec)],
    CASE WHEN drs.redo_rate > 0
         THEN CAST(drs.redo_queue_size / drs.redo_rate / 60.0 AS DECIMAL(10,1))
         ELSE NULL END               AS [Est Min to Catch Up],
    drs.log_send_queue_size          AS [Log Send Queue (KB)]
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_database_replica_states drs ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ag.group_id = ar.group_id
WHERE ag.name = 'YourAG'
ORDER BY drs.redo_queue_size DESC;

Index Maintenance Best Practices for AG Environments

  • Question the rebuild first. Microsoft’s own guidance notes that indexes accessed primarily from the buffer pool have no disk fragmentation problem — the pages are read from memory in the cached order. Consider updating statistics with FULLSCAN instead of rebuilding for large, frequently-accessed tables. Statistics accuracy is what actually drives query plan quality.
  • Use REORGANIZE instead of REBUILD for moderately fragmented indexes. REORGANIZE is a series of small transactions rather than one large transaction — it generates far less log per operation and does not hold the log open for the duration.
  • Use ONLINE = ON where possible for rebuilds. Online index rebuild still generates the same volume of log but does not acquire exclusive table-level locks, so reporting queries on the secondary are not competing with redo for lock access.
  • Schedule maintenance during the lowest redo lag window. Identify the time when async secondaries’ redo queues are smallest and secondary read load is lowest. That is your maintenance window.
  • Temporarily convert sync to async before large rebuilds on synchronous commit AGs (see Section 4).
  • Use Ola Hallengren’s IndexOptimize with @UpdateStatistics = 'ALL' — it handles AG-awareness, skips read-only secondaries, and supports threshold-based rebuild/reorganize decisions.
-- Identify high-fragmentation indexes worth rebuilding vs reorganizing
-- (run in the context of the relevant database, on the primary)
SELECT
    SCHEMA_NAME(o.schema_id) + '.' + OBJECT_NAME(ps.object_id)  AS [Object],
    i.name                                                        AS [Index],
    ps.index_type_desc,
    CAST(ps.avg_fragmentation_in_percent AS DECIMAL(5,1))        AS [Frag %],
    ps.page_count,
    CASE
        WHEN ps.avg_fragmentation_in_percent >= 30
         AND ps.page_count >= 1000 THEN 'REBUILD'
        WHEN ps.avg_fragmentation_in_percent >= 5
         AND ps.page_count >= 100  THEN 'REORGANIZE'
        ELSE 'SKIP'
    END                                                           AS [Recommended Action]
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes  i ON ps.object_id = i.object_id AND ps.index_id = i.index_id
JOIN sys.objects  o ON o.object_id  = i.object_id
WHERE ps.database_id = DB_ID()
AND i.index_id > 0                -- exclude heaps
AND ps.page_count > 100
ORDER BY ps.avg_fragmentation_in_percent DESC;

11 Backup Strategy — Offloading to Secondaries

One of the highest-value use cases for AG secondaries is offloading backups from the primary. Full database backups are I/O-intensive operations that compete with user workload. Running them on a secondary eliminates that competition from the primary entirely.

Backup Type | Can Run on Secondary? | Notes
Full database backup | Yes (copy-only) | Secondaries can only take COPY_ONLY full backups; these do not reset the differential base on the primary
Differential backup | No | Differentials are based against the last full backup on the primary and cannot be taken on a secondary
Transaction log backup | Yes | Can be taken on any secondary, but log backups from all replicas form a single chain and must be kept together for a complete restore
Full + differential + log (SQL Server 2025) | Yes, all types | SQL Server 2025 removes the restriction: full and differential backups can be taken on secondaries without COPY_ONLY
-- Set backup preference to prefer secondary replicas
ALTER AVAILABILITY GROUP [YourAG]
SET (AUTOMATED_BACKUP_PREFERENCE = SECONDARY);
-- Options: PRIMARY, SECONDARY_ONLY, SECONDARY (prefer but allow primary), NONE

-- Backup job T-SQL must be AG-aware -- only run on preferred replica
-- Use this check at the start of every backup job step:
IF sys.fn_hadr_backup_is_preferred_replica(DB_NAME()) <> 1
BEGIN
    PRINT 'Not the preferred backup replica. Skipping.';
    RETURN;
END

-- Then run the backup:
BACKUP DATABASE [YourDatabase]
TO DISK = N'\\BackupShare\YourDatabase_COPY_ONLY.bak'
WITH COPY_ONLY, COMPRESSION, CHECKSUM, STATS = 10;

-- IMPORTANT: If taking log backups on a secondary, log backups from ALL replicas
-- form a single restore chain and must be consolidated. Keep them in the same
-- UNC path. Ola Hallengren's DatabaseBackup handles this correctly with
-- @BackupType = 'LOG' on secondary replicas.

12 SQL Agent Jobs in an AG Environment

This is one of the most common operational pitfalls with traditional (non-Contained) AGs. SQL Agent jobs live in msdb on each instance. They are not replicated by the AG. When a failover occurs, the new primary has its own instance of msdb — and if jobs were not pre-deployed to that instance, they will not run after failover.
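
Because nothing synchronizes msdb for you, it is worth periodically comparing the job list across replicas. A minimal check, run on each instance (for example via a multi-server query in SSMS):

-- Run on every replica and diff the output to catch job drift
SELECT
    @@SERVERNAME   AS [Replica],
    j.name         AS [Job],
    j.enabled,
    j.date_modified
FROM msdb.dbo.sysjobs j
ORDER BY j.name;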

The Pattern for Non-Contained AGs

  1. Deploy all AG-related jobs to every replica. Use scripting tools (dbatools, SSMS, or custom PowerShell) to deploy the same jobs to all replicas simultaneously.
  2. Make every job AG-aware. The first step of every job that touches AG databases should check whether this instance is the current primary. If it is not, exit gracefully.
  3. Jobs that should run on the primary only: check sys.fn_hadr_is_primary_replica() and exit if not primary.
  4. Jobs that can run on secondary: verify the database is accessible for read-only operations before proceeding.
-- AG-aware job step template -- primary-only jobs
-- Add this as Step 1 of any job that requires primary replica access
-- (fn_hadr_is_primary_replica returns NULL if the database is not in an AG,
--  so wrap it in ISNULL to fail safe)
IF ISNULL(sys.fn_hadr_is_primary_replica(N'YourDatabase'), 0) <> 1
BEGIN
    PRINT 'Not the primary replica for YourDatabase. Job exiting gracefully.';
    -- Exit with success so the job does not show as failed
    -- Replace RETURN with RAISERROR if you want the step to be marked failed
    RETURN;
END
-- Actual job logic follows here

-- AG-aware job step template -- backup jobs (prefer secondary)
IF sys.fn_hadr_backup_is_preferred_replica(N'YourDatabase') <> 1
BEGIN
    PRINT 'Not the preferred backup replica. Skipping.';
    RETURN;
END
-- Backup logic follows here

Contained AG Simplification

With Contained AGs (SQL Server 2022+), jobs created via the listener are stored in the AG-scoped msdb and replicate automatically. After failover, those jobs are present on the new primary and will run without intervention. The trade-off: contained AG jobs cannot be offloaded to run on secondary replicas — they always run on the primary.
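
As an illustration, creating a job through the Contained AG listener uses the ordinary msdb procedures; the only difference is where the connection lands. A minimal sketch (the job name is hypothetical):

-- Connect to the Contained AG LISTENER first, then create the job normally;
-- it lands in the AG-scoped msdb and replicates to every replica
EXEC msdb.dbo.sp_add_job
    @job_name = N'AG_NightlyStatsUpdate',
    @enabled  = 1;

EXEC msdb.dbo.sp_add_jobstep
    @job_name      = N'AG_NightlyStatsUpdate',
    @step_name     = N'Update statistics',
    @subsystem     = N'TSQL',
    @database_name = N'YourDatabase',
    @command       = N'EXEC sp_updatestats;';

EXEC msdb.dbo.sp_add_jobserver
    @job_name = N'AG_NightlyStatsUpdate';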

13 CHECKDB — Primary, Secondary, or Both?

DBCC CHECKDB detects corruption at the file level — physical page integrity, index consistency, inter-object links. The AG ships log records between replicas, not data pages. If a page is physically corrupted on the primary’s storage, the corruption does not transfer to the secondary through the log stream. The secondary has its own physically independent copy of the data on its own storage. A corrupted page on primary storage would be detected by the page checksum mechanism and raise error 824 — but the secondary’s copy of that page, written from the log stream, would be clean.

This means: running CHECKDB on a secondary confirms the secondary’s storage is clean, but does not verify the primary’s storage. To fully verify all copies, CHECKDB should run on the primary and on at least one secondary independently.

-- Run CHECKDB on secondary replica for the secondary's copy of the database
-- Connect directly to the secondary instance (not through the listener)
-- Use PHYSICAL_ONLY for a faster check during business hours
DBCC CHECKDB (N'YourDatabase') WITH PHYSICAL_ONLY, NO_INFOMSGS;

-- Run full CHECKDB on primary during low-traffic window (weekly)
-- This is the authoritative check for the primary's storage integrity
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Monitor last CHECKDB across all databases
SELECT
    name,
    DATABASEPROPERTYEX(name, 'LastGoodCheckDbTime') AS [Last Good CheckDB],
    DATEDIFF(day,
        CAST(DATABASEPROPERTYEX(name, 'LastGoodCheckDbTime') AS DATETIME),
        GETDATE()) AS [Days Since CheckDB]
FROM sys.databases
WHERE database_id > 4
ORDER BY [Days Since CheckDB] DESC;

14 Row Versioning Overhead on Readable Secondaries

When any secondary replica is configured with ALLOW_CONNECTIONS = READ_ONLY, SQL Server automatically enables a form of snapshot isolation for all reads on those secondaries. This allows read queries to access consistent data without being blocked by redo operations applying on-going changes. The mechanism is row versioning — the same underlying technology as RCSI.

The cost: every row that is modified, inserted, or deleted on the primary receives a 14-byte overhead added to the row for version tracking metadata. This impacts storage and can slightly increase I/O for write-heavy workloads on the primary.

The 14-byte overhead is per modified row, not per transaction. For most OLTP workloads this is imperceptible. For very high-write workloads on tables with narrow rows (many small rows being updated frequently), it can become measurable — typically in the 1–3% storage growth range. Version data is stored in TempDB on the secondary, not on the primary, so TempDB sizing on the secondary needs to account for active read workloads.

-- Monitor row versioning overhead and TempDB version store on the secondary
-- Connect to the secondary replica and run:
SELECT
    DB_NAME(database_id)    AS [Database],
    reserved_page_count     AS [Version Store Pages],
    reserved_space_kb / 1024 AS [Version Store (MB)]
FROM sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;

-- If version store is growing, look for long-running read queries on the secondary
-- that are preventing version store cleanup
SELECT
    s.session_id,
    s.login_name,
    s.status,
    r.total_elapsed_time / 1000   AS [Elapsed (sec)],
    r.logical_reads,
    r.command,
    t.text
FROM sys.dm_exec_sessions s
LEFT JOIN sys.dm_exec_requests r ON r.session_id = s.session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE s.is_user_process = 1
AND s.status = 'running'
ORDER BY r.total_elapsed_time DESC;

15 The Listener Is a Virtual Name — What That Means

The AG listener is a Virtual Network Name (VNN) — a DNS record and Windows cluster IP resource that floats between nodes. It is not a real server. It is not an instance of SQL Server. It is a cluster-managed alias that always points to whichever server is currently the primary replica.

This distinction matters for several operational reasons:

  • The listener name resolves to the cluster IP resource, which resolves to the current primary’s IP. After failover, the cluster moves the IP resource and updates DNS — but DNS TTL means old cached records may persist for minutes.
  • The listener does not appear in sys.servers or sys.endpoints as a normal server — it is a cluster resource.
  • Connecting through the listener with ApplicationIntent=ReadOnly does NOT guarantee you land on the secondary — it requests routing, which the primary then redirects based on your routing list configuration. If no readable secondary is available, the connection falls through to the primary.
  • Connecting directly to a replica’s server name (bypassing the listener) always goes to that specific instance regardless of its role — useful for direct secondary access, but breaks automatic HA redirection.
-- Verify which replica you're actually connected to after connecting through the listener
-- (TOP (1) guards against instances that host replicas for multiple AGs)
SELECT
    @@SERVERNAME                                                   AS [Physical Server],
    sys.fn_hadr_is_primary_replica(DB_NAME())                      AS [Is Primary],
    (SELECT TOP (1) role_desc FROM sys.dm_hadr_availability_replica_states
     WHERE is_local = 1)                                           AS [Replica Role];

-- After connecting with ApplicationIntent=ReadOnly through the listener:
-- Is Primary should be 0, Replica Role should be SECONDARY
-- If Is Primary = 1, read routing fell back to primary (check routing config)

16 CNAME per Application vs Direct Listener — Pros, Cons, and When Each Makes Sense

This is a real architectural choice with genuine trade-offs on both sides. At a previous company with multiple AGs serving multiple applications, the approach of creating one CNAME per application — each pointing to the AG listener — was used to add a layer of application-level naming abstraction on top of the AG infrastructure. The intent was good: it made the connection origin trackable and gave per-application routing flexibility.

Real-world experience: CNAMEs per application in a multi-AG environment

In a multi-AG environment with several distinct applications, individual CNAMEs were configured for each application pointing to the appropriate AG listener. For example: APP1-DB → AG1-LISTENER, APP2-DB → AG1-LISTENER, APP3-DB → AG2-LISTENER. The motivation was clear visibility: looking at a connection in sys.dm_exec_sessions showed the CNAME the application used, making troubleshooting simpler. The benefit was real in practice for identifying which application was generating which load.

The challenge was DNS TTL management. On failover, the AG listener's DNS record updated through the cluster mechanism, but applications cached the CNAME resolution independently of the listener's cluster-managed TTL. Some connections took 15-30 minutes to reconnect to the new primary after failover because the CNAME's TTL had not expired.

The lesson: if using CNAMEs, set their TTL to match or be shorter than the listener's HostRecordTTL (typically 30-60 seconds for production AG environments). Also: ensure MultiSubnetFailover=True is in all connection strings, and test failover behavior with the CNAMEs in place, not just with direct listener names.

✓ CNAME per Application Pros

  • Clear connection identity in sys.dm_exec_sessions — you can see exactly which application is connected
  • Future infrastructure flexibility — change the underlying AG or listener without updating every application’s connection string, just update the CNAME
  • Supports migration: point the CNAME to a new listener during a database migration without touching applications
  • Application-specific routing policies possible
  • Cleaner documentation — CNAME names are application-meaningful

✗ CNAME per Application Cons

  • Additional DNS layer means additional TTL propagation to manage on failover
  • CNAME TTL must be managed independently and aligned to cluster failover expectations
  • More DNS records to maintain, monitor, and document
  • Requires DNS change permission and coordination for migrations (may involve infrastructure team)
  • MultiSubnetFailover behavior with CNAME can be unpredictable — test thoroughly
  • Some older ODBC drivers do not honor the MultiSubnetFailover hint when connecting through a CNAME

When to Use CNAMEs

  • Multiple applications connecting to the same AG listener where per-application visibility is valuable for troubleshooting
  • Environments with planned future migration or infrastructure changes where you want a naming abstraction layer
  • When the operations team owns DNS and connection strings must not change between environments

When to Connect Directly to the Listener

  • Simple AG environments with one or two applications
  • When failover speed is the highest priority and you want the fewest DNS layers
  • When MultiSubnetFailover=True support is confirmed for all client drivers in use
# If using CNAMEs: verify DNS TTL is set appropriately
# Run from PowerShell on a client machine:
Resolve-DnsName -Name "YourCNAMEName" -Type CNAME | Select-Object Name, TTL, NameHost
Resolve-DnsName -Name "YourAGListenerName" -Type A | Select-Object Name, TTL, IPAddress

# The CNAME TTL should be <= the listener's HostRecordTTL
# Check listener HostRecordTTL from the cluster (PowerShell on a cluster node):
Get-ClusterResource "YourAGListener" | Get-ClusterParameter HostRecordTTL

17 Connection String Best Practices

Getting the connection string right is not optional — wrong connection string settings cause silent failures after failover, read routing that falls back to primary without warning, and slow reconnection times.

Standard Read-Write Connection (Primary)

-- ADO.NET / Microsoft.Data.SqlClient (recommended for new development)
Server=tcp:YourAGListener,1433;
Database=YourDatabase;
Integrated Security=SSPI;
MultiSubnetFailover=True;
-- ApplicationIntent not specified = ReadWrite = primary

-- ODBC
Driver={ODBC Driver 18 for SQL Server};
Server=tcp:YourAGListener,1433;
Database=YourDatabase;
Trusted_Connection=Yes;
MultiSubnetFailover=Yes;

Read-Only Connection (Routed to Secondary)

-- ADO.NET
Server=tcp:YourAGListener,1433;
Database=YourDatabase;
Integrated Security=SSPI;
MultiSubnetFailover=True;
ApplicationIntent=ReadOnly;
-- Note: ApplicationIntent is case-sensitive in some drivers

Connection String Rules

  • Always include MultiSubnetFailover=True — even for single-subnet AGs. It enables parallel IP connection attempts during failover and pre-configures clients for future multi-subnet expansion without connection string changes.
  • Always use TCP:// protocol prefix and port number — do not rely on Named Pipes or browser-based discovery. Explicit TCP ensures predictable behavior across subnets and firewalls.
  • ApplicationIntent=ReadOnly is case-sensitive — some drivers will not honor it if the case does not match exactly.
  • Always connect to the listener name, not the server name — applications pointing directly to a server name lose the failover redirection benefit entirely.
  • Test both intent types after any failover test — confirm both read-write and read-only connections reach the expected replicas after simulating a failover.
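
A quick smoke test after every failover drill, in the same sqlcmd style used earlier:

-- Read-write path: should return the NEW primary's server name
-- sqlcmd -S YourAGListener -d YourDatabase -Q "SELECT @@SERVERNAME"

-- Read-only path: should return a readable secondary's server name
-- sqlcmd -S YourAGListener -d YourDatabase -K ReadOnly -Q "SELECT @@SERVERNAME"

-- If both return the same server, read routing has fallen back to the primary:
-- check the routing list and each secondary's ALLOW_CONNECTIONS setting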

18 Key DMVs for AG Health Monitoring

-- Complete AG health snapshot
SELECT
    ag.name                                AS [AG],
    ar.replica_server_name                 AS [Replica],
    ar.availability_mode_desc              AS [Mode],
    ar.failover_mode_desc                  AS [Failover],
    ars.role_desc                          AS [Role],
    ars.synchronization_health_desc        AS [Sync Health],
    ars.connected_state_desc               AS [Connection],
    ars.operational_state_desc             AS [Operational],
    drs.synchronization_state_desc         AS [DB Sync],
    drs.is_suspended                       AS [Suspended],
    drs.redo_queue_size                    AS [Redo Queue KB],
    drs.log_send_queue_size                AS [Send Queue KB],
    drs.redo_rate                          AS [Redo Rate KB/s],
    drs.log_send_rate                      AS [Send Rate KB/s],
    drs.last_redone_time                   AS [Last Redo],
    drs.last_received_time                 AS [Last Received]
FROM sys.availability_groups                      ag
JOIN sys.availability_replicas                    ar
    ON ar.group_id    = ag.group_id
JOIN sys.dm_hadr_availability_replica_states      ars
    ON ars.replica_id = ar.replica_id
JOIN sys.dm_hadr_database_replica_states          drs
    ON drs.replica_id = ar.replica_id
ORDER BY ag.name, ars.role_desc, ar.replica_server_name;

-- Estimated data loss on async replicas (potential RPO exposure)
-- Run on primary only
-- Data loss on a forced failover = log the secondary has NOT yet received
-- (log_send_queue_size). The redo queue is already hardened on the secondary
-- and is applied during recovery, so it affects failover TIME (RTO), not data loss.
SELECT
    ag.name                                AS [AG],
    ar.replica_server_name                 AS [Async Replica],
    drs.log_send_queue_size                AS [Unsent Log (KB)],
    drs.redo_queue_size                    AS [Unredone Log (KB)],
    CASE WHEN drs.redo_rate > 0
         THEN CAST(drs.redo_queue_size / drs.redo_rate / 60.0 AS DECIMAL(10,1))
         ELSE NULL END                     AS [Est Catch-Up Min],
    -- Rough RPO proxy: seconds of log sitting unsent at the current send rate
    CASE WHEN drs.log_send_rate > 0
         THEN CAST(drs.log_send_queue_size / drs.log_send_rate AS DECIMAL(10,0))
         ELSE NULL END                     AS [Est Data Loss (sec)]
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_database_replica_states drs ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ag.group_id = ar.group_id
WHERE ar.availability_mode_desc = 'ASYNCHRONOUS_COMMIT'
AND drs.is_local = 0
ORDER BY drs.log_send_queue_size DESC;

19 Redo Queue Management — The Most Important Metric

The redo queue is the amount of transaction log on a secondary that has been received from the primary but not yet applied (redone) to the secondary's databases. A growing redo queue means the secondary is falling behind. It has two critical implications: the secondary's data is stale (read queries return older data), and a failover to this secondary cannot complete until the queue is drained, so your effective RTO grows with it. Data loss on a forced failover is governed by the log send queue instead: log the secondary never received.

Common causes of redo queue growth: I/O bottleneck on the secondary's storage (redo writes cannot keep up with incoming log), CPU contention from concurrent read queries on a readable secondary, or log-intensive operations on the primary (large index rebuilds, bulk loads) generating log faster than the secondary can apply it.

-- Alert threshold: redo queue above 500 MB is concerning for most OLTP environments
-- Alert threshold: redo queue above 5 GB requires immediate investigation

-- Set up a SQL Agent alert job using this query:
SELECT
    ar.replica_server_name,
    SUM(drs.redo_queue_size) / 1024.0    AS [Redo Queue (MB)],
    AVG(drs.redo_rate)                    AS [Avg Redo Rate (KB/s)]
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_database_replica_states drs ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ag.group_id = ar.group_id
WHERE ag.name = 'YourAG'
AND drs.is_local = 0
GROUP BY ar.replica_server_name
HAVING SUM(drs.redo_queue_size) > 512000;  -- 500 MB threshold

-- If redo queue is growing: check for long-running queries blocking redo on secondary
-- Connect directly to the secondary instance and run:
SELECT
    r.session_id,
    r.total_elapsed_time / 1000  AS [Elapsed (sec)],
    r.logical_reads,
    r.wait_type,
    t.text                        AS [SQL]
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id > 50
AND r.total_elapsed_time > 30000  -- running more than 30 seconds
ORDER BY r.total_elapsed_time DESC;

20 On-Premises: Single Data Center HA

The baseline pattern: two nodes in the same data center, both on the same subnet, synchronous commit with automatic failover, file share witness on a third server.

Data Center A
├── PROD-DB1 (Primary) ─── synchronous ───► PROD-DB2 (Secondary)
│        │               automatic failover         │
└── AG-LISTENER (VNN) ◄─────────────────────────────┘
         │
    File Share Witness (FSW): third vote for quorum

WSFC: 3 votes total (PROD-DB1 + PROD-DB2 + FSW)
  • Both nodes should be identical or near-identical hardware — the synchronous secondary must keep up with primary commit rate
  • Dedicated heartbeat network recommended — separate from client network
  • File share witness on a server that is unlikely to be affected by the same failure that takes down a SQL node (different rack, different switch, different power circuit)
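
You can sanity-check the quorum model from T-SQL without opening Failover Cluster Manager:

-- Quorum model and state as SQL Server sees it
SELECT cluster_name, quorum_type_desc, quorum_state_desc
FROM sys.dm_hadr_cluster;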

21 On-Premises: Multi-Site HA + DR

The common enterprise pattern: synchronous secondary at the primary site for automatic HA failover, asynchronous secondary at the DR site for manual DR failover.

Data Center A (Primary Site)
├── PROD-DB1 (Primary)
├── PROD-DB2 (Sync Secondary: automatic failover target)
└── AG-LISTENER
         │
         │ asynchronous
         ▼
Data Center B (DR Site)
└── DR-DB1 (Async Secondary: manual DR failover only)

Quorum: PROD-DB1 (vote) + PROD-DB2 (vote) + FSW-A (vote, at primary site)
        DR-DB1 vote weight = 0 (prevents DR site from blocking primary site quorum)

Remove the DR node's vote from quorum. If DR-DB1 participates in quorum voting and the WAN link between sites fails, the primary site (PROD-DB1 + PROD-DB2) can still maintain quorum with the file share witness. Without removing the vote, a WAN outage could prevent the primary site cluster from making decisions — your production databases go offline even though both primary site servers are healthy.

22 Azure VM: HA with Internal Load Balancer

Azure VMs cannot use the traditional Windows cluster IP resource for the listener because Azure VMs are not directly assigned floating IPs — the Azure fabric controls IP routing. An Internal Load Balancer (ILB) is required to provide the single endpoint IP that the listener needs. The ILB uses health probes to determine which VM is the current primary and directs traffic accordingly.

Azure Virtual Network
├── VM: SQL-VM1 (Primary) ─── synchronous ───► VM: SQL-VM2 (Secondary)
│
└── Internal Load Balancer (ILB)
    ├── Frontend IP: 10.0.0.100 (listener address)
    ├── Backend Pool: SQL-VM1, SQL-VM2
    ├── Health Probe: port 59999 (SQL Server AG probe)
    └── Load Balancing Rule → Direct Server Return (DSR)

Cloud Witness (Azure Storage Account): replaces the file share witness for quorum
# Azure: configure the SQL Server health probe responder
# The ILB health probe checks port 59999; the cluster answers on that port only
# on the node that currently owns the listener IP resource (the primary)
# Run once from PowerShell on any cluster node:
$clusterIP = "10.0.0.100"  # ILB frontend IP
$clusterIPResourceName = "IP Address $clusterIP"

# Update the listener IP cluster resource to use the ILB probe
Get-ClusterResource $clusterIPResourceName |
    Set-ClusterParameter -Multiple @{
        Address         = $clusterIP
        ProbePort       = 59999
        SubnetMask      = "255.255.255.255"
        Network         = (Get-ClusterNetwork)[0].Name
        EnableDhcp      = 0
    }

# Cloud Witness (preferred for Azure; replaces the file share witness)
# Run from PowerShell on any cluster node:
Set-ClusterQuorum -CloudWitness -AccountName "yourstorageaccount" `
    -AccessKey "your_storage_access_key"

23 Hybrid: On-Premises Primary + Azure DR Replica

A powerful disaster recovery pattern: keep your primary and synchronous HA secondary on-premises for low-latency OLTP, add an asynchronous Azure VM replica as a cloud-based DR target. This gives you geographic separation, automated cloud infrastructure for DR, and the option to test DR in Azure without impacting the on-premises cluster.

On-Premises Data Center
├── PROD-DB1 (Primary)
├── PROD-DB2 (Sync Secondary: local HA)
└── AG-LISTENER (on-premises)
         │
         │ asynchronous (VPN or ExpressRoute)
         ▼
Azure Region
└── AZURE-DR-VM (Async Secondary: Azure VM)
    ├── Azure Internal Load Balancer (for Azure-side listener)
    └── Managed Disk storage

Quorum: PROD-DB1 + PROD-DB2 + File Share Witness (on-premises)
        AZURE-DR-VM vote weight = 0

Key considerations for hybrid deployments:

  • Use Azure ExpressRoute for the WAN link between on-premises and Azure if possible. VPN Gateway works but has higher latency variability and lower bandwidth ceiling.
  • The Azure VM must be domain-joined — WSFC requires Active Directory. Use Azure Active Directory Domain Services or extend on-premises AD to Azure with a domain controller VM.
  • Set the Azure VM replica's quorum vote weight to 0 — it is a DR target, not a quorum participant.
  • DNS must resolve the listener name from both on-premises and Azure networks. Consider split-horizon DNS or ensuring the on-premises DNS zone is visible from the Azure VNet.
  • With SQL Server 2022 and the Managed Instance Link feature, you can alternatively use Azure SQL Managed Instance as an asynchronous DR target without deploying a full VM — simplifying management at the cost of some flexibility.
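
From the primary, watch the Azure replica's send queue; sustained growth usually means the WAN link cannot keep up with log generation. A small check, using the replica name from the diagram above:

-- Log send queue to the Azure DR replica = unsent log = RPO exposure
SELECT
    ar.replica_server_name,
    drs.log_send_queue_size   AS [Unsent Log (KB)],
    drs.log_send_rate         AS [Send Rate (KB/s)],
    drs.last_hardened_time
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_database_replica_states drs ON drs.replica_id = ar.replica_id
WHERE ar.replica_server_name = N'AZURE-DR-VM';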

24 Read-Scale AG Without a Cluster (WSFC-Free) SQL Server 2017+

SQL Server 2017 introduced the ability to create an AG without a Windows Server Failover Cluster for read-scale scenarios where you do not need automatic failover — only readable secondaries and manual DR. This removes the WSFC requirement, simplifying deployment significantly for read-offload use cases.

-- Create a read-scale AG without cluster (SQL Server 2017+)
-- No WSFC required -- manual failover only -- no listener possible without cluster
CREATE AVAILABILITY GROUP [ReadScaleAG]
WITH (CLUSTER_TYPE = NONE)    -- no WSFC
FOR DATABASE [ReportingDatabase]
REPLICA ON
    N'OLTP-PRIMARY' WITH (
        ENDPOINT_URL      = N'TCP://OLTP-PRIMARY.domain.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL,
        SEEDING_MODE      = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
    ),
    N'REPORT-SECONDARY' WITH (
        ENDPOINT_URL      = N'TCP://REPORT-SECONDARY.domain.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL,
        SEEDING_MODE      = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
    );
GO

-- No listener is possible without WSFC
-- Applications connect directly to the secondary server name for read queries
-- Primary server name for writes
-- This is suitable for dedicated reporting infrastructure where no listener is needed

Linux support: Read-scale AGs without a cluster are the primary deployment pattern for SQL Server on Linux, where Pacemaker replaces WSFC for cluster-managed AGs. On Linux with Pacemaker, full HA including automatic failover is supported. On Linux without Pacemaker (CLUSTER_TYPE = NONE), only manual failover is available — same trade-off as Windows without WSFC.
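
For completeness, a planned manual failover on a CLUSTER_TYPE = NONE AG follows the documented demote-then-promote pattern. A sketch, assuming the target replica was first switched to synchronous commit and has fully caught up:

-- 1. On the current primary: demote it (the AG becomes read-only on this replica)
ALTER AVAILABILITY GROUP [ReadScaleAG] SET (ROLE = SECONDARY);

-- 2. On the target replica: promote it
ALTER AVAILABILITY GROUP [ReadScaleAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- 3. Repoint application connection strings; there is no listener to move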
