📘 SQL Always On Availability Groups – Hybrid Setup Steps

This series walks through setting up a resilient SQL Server Always On Availability Group (AG) with two on-premises nodes, one Azure replica, and an Azure Cloud Witness for quorum. By the end, you’ll have a production-ready cluster with automatic failover on-prem and disaster recovery in the cloud.

Part 1 — Build the Windows Server Failover Cluster (WSFC) for SQL Always On

Topology: 2 on-prem nodes + 1 Azure node, Azure Cloud Witness, multi-subnet listener

Example worksheet (fill yours in before you start)

ItemExample
AD Domaincorp.example.com
Cluster Name (CNO)SQLCLUSTER
NodesSQLNODE1 (on-prem), SQLNODE2 (on-prem), SQLNODE-AZ (Azure VM)
Node IPs170.xx.xx.11 (NODE1), 170.xx.xx.12 (NODE2), 170.xx.xx.21 (NODE-AZ)
Cluster Mgmt IP (if used)170.xx.xx.30
Listener NameSQLLISTENER
Listener IPs (per subnet)170.xx.xx.50 (on-prem), 170.xx.xx.60 (Azure)
SQL Service AccountCORP\sqlsvc
Azure Storage (Cloud Witness)https://sqlwitnessacct.blob.core.windows.net/ (LRS)
HADR Endpoint Port5022
Listener Port1433
Azure LB Health Probe Port(s)59999 (listener), 58888 (cluster IP, optional)

Pre-configuration checklist (every node)

  • Not a domain controller; all nodes joined to the same domain.
  • Time sync and DNS working across sites.
  • Same OS version/patch level (e.g., Windows Server 2019+).
  • Power settings: Disable NIC power saving (“Allow the computer to turn off this device”).
  • Drives aligned and consistent (paths/sizes) across nodes for Data, Logs, TempDB.
  • Outbound access to *.core.windows.net (for Cloud Witness).
  • Service account CORP\sqlsvc exists.

Active Directory objects & permissions (clean and safe)

  • Pre-stage the CNO (Computer object named SQLCLUSTER) in an OU you control.
  • Give the CNO Create Computer objects and Read All Properties on the OU that will contain VCOs (listener / AG names).
  • The cluster wizard will create VCOs (e.g., SQLLISTENER) under the CNO’s permissions.

Tip: Keep cluster-related computer objects (CNO + VCOs) in a dedicated OU so GPOs and permissions are predictable.

Install the WSFC feature (every node)

PowerShell (preferred):

Install-WindowsFeature Failover-Clustering -IncludeManagementTools
Restart-Computer

Validate the nodes

Run from any node:

Test-Cluster -Node SQLNODE1,SQLNODE2,SQLNODE-AZ

Open the HTML report; warnings may be OK; fix failures.

Create the cluster

New-Cluster -Name SQLCLUSTER `
  -Node SQLNODE1,SQLNODE2,SQLNODE-AZ `
  -StaticAddress 170.xx.xx.30

Then:

Get-ClusterNode

All nodes should show Up.

Configure quorum: Azure Cloud Witness

  1. Create an Azure Storage Account (General purpose, LRS, Standard).
  2. Get the Access Key.
  3. Set quorum:
Set-ClusterQuorum -CloudWitness `
  -AccountName "sqlwitnessacct" `
  -AccessKey "REDACTED_ACCESS_KEY"

Cloud Witness removes dependence on on-prem file shares and is ideal for multi-site clusters.

Basic WSFC tuning for multi-subnet SQL

  • Lower DNS TTL for the cluster name and later for the SQL listener to reduce stale DNS:
# Example for a network name resource called "SQL Network Name (SQLLISTENER)"
Get-ClusterResource "SQL Network Name (SQLLISTENER)" | Set-ClusterParameter HostRecordTTL 300
  • In multi-subnet AGs, modern clients should use MultiSubnetFailover=True. If not, you can set:
# Only if you must (older clients). Prefer MultiSubnetFailover=True instead.
Get-ClusterResource "SQL Network Name (SQLLISTENER)" | Set-ClusterParameter RegisterAllProvidersIP 0
  • Set preferred owners so on-prem nodes are chosen first:
(Get-ClusterGroup "Cluster Group").PreferredOwners = @("SQLNODE1","SQLNODE2","SQLNODE-AZ")

Part 2 — Install & Prepare SQL Server on the Nodes

Server & account prep

  • CORP\sqlsvc = SQL Server service account
    • Local admin on each node (or grant the required rights explicitly).
    • Rights commonly needed (managed via GPO): Log on as a service, Lock pages in memory, Impersonate a client, Debug programs, Bypass traverse checking, Backup files, etc.

Install SQL Server (each node)

  • Run New SQL Server stand-alone installation (not failover cluster install).
  • Select features: Database Engine Services (add Full-Text/Replication if you need them).
  • Instance: default or named; be consistent across nodes.
  • Collation: keep identical.
  • Paths:
    • D:\Data (user DBs)
    • E:\Logs (T-logs)
    • F:\TempDB (TempDB)
  • Finish install; apply latest CU (same build on all nodes).

Enable Always On (each node)

SQL Server Configuration Manager → SQL Server (MSSQLSERVER) → Enable Always On Availability Groups → Restart service.

Create HADR endpoints (if missing)

Run on each node:

CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = ALL);
GO
-- Ensure the SQL service account owns the endpoint
ALTER AUTHORIZATION ON ENDPOINT::[Hadr_endpoint] TO [CORP\sqlsvc];
GO

Smoke test connectivity between nodes

  • Open firewalls for TCP 5022 between all nodes.
  • Verify:
telnet 170.xx.xx.11 5022
telnet 170.xx.xx.12 5022
telnet 170.xx.xx.21 5022

Seed a test database (we’ll add it to the AG)

On primary candidate (e.g., SQLNODE1):

CREATE DATABASE AlwaysOnTest
ON (NAME = AlwaysOnTest_Data, FILENAME = 'D:\Data\AlwaysOnTest.mdf', SIZE = 128MB),
   (NAME = AlwaysOnTest_Log , FILENAME = 'E:\Logs\AlwaysOnTest.ldf', SIZE = 64MB);
GO
BACKUP DATABASE AlwaysOnTest TO DISK = 'D:\Backups\AlwaysOnTest.bak' WITH INIT;
BACKUP LOG AlwaysOnTest TO DISK = 'D:\Backups\AlwaysOnTest.trn' WITH INIT;

Restore on secondaries WITH NORECOVERY (only needed if not using automatic seeding).

Part 3 — Create the Availability Group + Listener (Multi-Subnet)

Create AG AG1 (wizard or T-SQL)

In SSMS on SQLNODE1:
Always On High AvailabilityAvailability GroupsNew Availability Group Wizard.

Recommended model (hybrid HA/DR):

  • Replicas:
    • SQLNODE1Synchronous, Automatic Failover = Yes
    • SQLNODE2Synchronous, Automatic Failover = Yes
    • SQLNODE-AZAsynchronous, Automatic Failover = No (DR)
  • Readable secondary: Yes (we’ll use routing in Part 4)
  • Seeding: Automatic seeding OR Full/Log restore WITH NORECOVERY
  • Databases: include AlwaysOnTest (must be in FULL recovery and backed up)

T-SQL (equivalent skeleton)

-- On SQLNODE1 (primary)
CREATE AVAILABILITY GROUP [AG1]
  WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
  FOR DATABASE [AlwaysOnTest]
  REPLICA ON
    N'SQLNODE1'   WITH (ENDPOINT_URL = N'TCP://SQLNODE1.corp.example.com:5022',
                        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                        FAILOVER_MODE = AUTOMATIC,
                        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)),
    N'SQLNODE2'   WITH (ENDPOINT_URL = N'TCP://SQLNODE2.corp.example.com:5022',
                        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                        FAILOVER_MODE = AUTOMATIC,
                        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)),
    N'SQLNODE-AZ' WITH (ENDPOINT_URL = N'TCP://SQLNODE-AZ.corp.example.com:5022',
                        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                        FAILOVER_MODE = MANUAL,
                        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
GO

Create the listener SQLLISTENER (multi-subnet)

In the wizard Listener step:

  • Listener DNS name: SQLLISTENER
  • Port: 1433
  • Static IPs:
    • On-prem subnet IP: 170.xx.xx.50
    • Azure subnet IP: 170.xx.xx.60

Multi-subnet means the listener has two IPs. Clients should use MultiSubnetFailover=True in their connection strings for fast failover.

Azure internal load balancer (ILB) for the listener

Because Azure doesn’t natively ARP for the cluster-owned listener IP, you need an Internal Load Balancer (Standard recommended) on the Azure subnet:

1) Create ILB

  • Type: Internal
  • VNet/Subnet: same as SQLNODE-AZ
  • Static private IP: the Azure listener IP (170.xx.xx.60)

2) Backend pool

  • Add SQLNODE-AZ NIC to the pool.

3) Health Probe

  • TCP 59999, interval 5 sec, unhealthy threshold 2.
  • Add inbound NSG rule on SQLNODE-AZ for 59999/TCP.

4) Load-balancing rules (Floating IP / DSR = Enabled)

  • Rule A (Listener):
    • Frontend: 170.xx.xx.60
    • Protocol: TCP 1433
    • Backend port: 1433 (ignored when Floating IP is on)
    • Probe: the 59999 probe
    • Session persistence: None
    • Floating IP: Enabled

(Optional) Rule B (Cluster Core) if you also expose the cluster core IP in Azure:

  • TCP 3343, same probe pattern (use another probe port like 58888 if used), Floating IP Enabled.

5) Tell the cluster which node is “owner” for Azure probe
On SQLNODE-AZ, open firewall for probe port and configure the probe IP resource in the AG listener OR set the probe into the cluster parameters (varies by exact design). In many designs, the probe watches the owner node’s dummy endpoint (e.g., a firewall rule listening on 59999) and DSR handles the frontend VIP.

If you use SSMS wizard to create the listener first, you can attach the ILB afterwards—just ensure the Azure subnet IP for the listener matches the ILB frontend IP and the probe is healthy.

Verify the AG & listener

-- Replica health
SELECT ag.name, ar.replica_server_name, ars.role_desc, ars.operational_state_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id;

-- Database sync
SELECT ag.name, ar.replica_server_name, drs.database_id, drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ag.group_id = ar.group_id;

-- Endpoints
SELECT name, state_desc, port, role_desc FROM sys.database_mirroring_endpoints;

-- Listener view
SELECT * FROM sys.availability_group_listeners;
SELECT * FROM sys.availability_group_listener_ip_addresses;

Part 4 — Read-Only Routing (T-SQL & GUI), Ops Tuning, and Validation

Enable readable secondaries (all replicas)

ALTER AVAILABILITY GROUP [AG1] 
  MODIFY REPLICA ON N'SQLNODE1'   WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [AG1] 
  MODIFY REPLICA ON N'SQLNODE2'   WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [AG1] 
  MODIFY REPLICA ON N'SQLNODE-AZ' WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

Define read-only routing URLs (FQDNs or hostnames)

ALTER AVAILABILITY GROUP [AG1] 
  MODIFY REPLICA ON N'SQLNODE1' 
  WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://SQLNODE1.corp.example.com:1433'));

ALTER AVAILABILITY GROUP [AG1] 
  MODIFY REPLICA ON N'SQLNODE2' 
  WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://SQLNODE2.corp.example.com:1433'));

ALTER AVAILABILITY GROUP [AG1] 
  MODIFY REPLICA ON N'SQLNODE-AZ' 
  WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://SQLNODE-AZ.corp.example.com:1433'))

Set routing lists (order of preference per primary)

Prefer the other on-prem node first, then Azure, then self:

ALTER AVAILABILITY GROUP [AG1]   
  MODIFY REPLICA ON N'SQLNODE1' 
  WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('SQLNODE2','SQLNODE-AZ','SQLNODE1')));

ALTER AVAILABILITY GROUP [AG1]   
  MODIFY REPLICA ON N'SQLNODE2' 
  WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('SQLNODE1','SQLNODE-AZ','SQLNODE2')));

ALTER AVAILABILITY GROUP [AG1]   
  MODIFY REPLICA ON N'SQLNODE-AZ' 
  WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('SQLNODE1','SQLNODE2','SQLNODE-AZ')));
GO

Connection string for read-only workloads:

Server=SQLLISTENER;Database=YourDB;
ApplicationIntent=ReadOnly;MultiSubnetFailover=True;Encrypt=True;TrustServerCertificate=True;

SSMS GUI path (alternative)

AG → Properties → Replicas tab:

  • Set Readable Secondary = Yes for each.
  • Enter Read-Only Routing URL for each replica.
  • Configure Read-Only Routing List for each primary.

Operational tuning & hygiene

DNS & client behavior

  • Prefer client drivers that support MultiSubnetFailover=True.
  • Lower TTL for listener (HostRecordTTL=300).
  • Consider RegisterAllProvidersIP=0 only for legacy clients; otherwise leave to default and use MultiSubnetFailover=True.

Preferred owners & failover policy

  • Core cluster groups owned by on-prem nodes first:
(Get-ClusterGroup "AG1").PreferredOwners = @("SQLNODE1","SQLNODE2","SQLNODE-AZ")

Kerberos/SPNs (if you use Windows auth to the listener)

Register SPNs on the SQL service account for each node and the listener:

setspn -S MSSQLSvc/SQLLISTENER.corp.example.com:1433 CORP\sqlsvc
setspn -S MSSQLSvc/SQLNODE1.corp.example.com:1433    CORP\sqlsvc
setspn -S MSSQLSvc/SQLNODE2.corp.example.com:1433    CORP\sqlsvc
setspn -S MSSQLSvc/SQLNODE-AZ.corp.example.com:1433  CORP\sqlsvc

Eliminate duplicate SPNs:

setspn -X

Firewalls & ports to allow

  • SQL Server: 1433/TCP (client), 5022/TCP (HADR)
  • Cluster: 3343/TCP (intra-cluster), ICMP as needed
  • Azure ILB Health Probe: 59999/TCP (and 58888/TCP if you add a separate cluster core rule)

Monitoring & logs

Get-ClusterLog -UseLocalTime -Destination C:\ClusterLogs

Backups & jobs

  • Consider running full/log backups from a preferred secondary (with COPY_ONLY full if needed).
  • Ensure agent jobs that must run on the primary are guarded by sys.fn_hadr_is_primary_replica.

Validation checklist (copy/paste)

  • WSFC validated and healthy (3 nodes).
  • Cloud Witness configured (Azure LRS).
  • SQL binaries same version/CU on all nodes.
  • HADR endpoints started on port 5022.
  • AG AG1 created with 2 sync + 1 async (Azure).
  • Listener SQLLISTENER online with IPs 170.xx.xx.50 (on-prem) and 170.xx.xx.60 (Azure).
  • Azure Internal Load Balancer set for 170.xx.xx.60, probe 59999, rule 1433 (Floating IP On).
  • Read-only routing URLs and lists configured; read-only connections route to secondaries.
  • Client string uses MultiSubnetFailover=True.
  • Failover tested between on-prem nodes; DR tested to Azure (manual).

Appendix — Quick “fresh” worksheet template

  • Domain: corp.example.com
  • Nodes:
    • SQLNODE1 — IP 170.xx.xx.11
    • SQLNODE2 — IP 170.xx.xx.12
    • SQLNODE-AZ — IP 170.xx.xx.21
  • Cluster Name (CNO): SQLCLUSTER (IP 170.xx.xx.30)
  • AG Name: AG1
  • Listener: SQLLISTENER — IPs 170.xx.xx.50 (on-prem), 170.xx.xx.60 (Azure), Port 1433
  • Service Account: CORP\sqlsvc
  • HADR Endpoint Port: 5022
  • Azure Storage: sqlwitnessacct (LRS), Key: <redacted>
  • Azure ILB Health Probe(s): 59999 (listener), 58888 (optional cluster rule)

Wrap-Up

At this point, you’ve built a hybrid Always On AG with:

  • Two on-prem synchronous replicas for HA.
  • One Azure async replica for DR.
  • Azure Cloud Witness for cluster quorum.
  • Read-only routing to offload reporting workloads.

Applications can now connect via the listener SQLLISTENER at 170.xx.xx.50 and automatically route read-only queries to secondary replicas.


Discover more from SQLyard

Subscribe to get the latest posts sent to your email.

Leave a Reply

Discover more from SQLyard

Subscribe now to keep reading and get access to the full archive.

Continue reading