SQL Server: How to Partition a Large Table in a 2-Node Always On AG with Minimal Downtime
Compatibility: SQL Server 2014 and later with Always On Availability Groups. Table partitioning requires Enterprise Edition prior to SQL Server 2016 SP1; from 2016 SP1 onward it is available in every edition. The AG Failover method described here works on any synchronous-commit 2-node AG configuration.
- Scenario Overview
- Approach Comparison — Why AG Failover Is Recommended
- Impact Summary
- Environment Reference
- Step 1 — Pre-Flight Checks
- Step 2 — Fail Over AG to Node B
- Step 3 — Remove Target Database from AG on Node A
- Step 4 — Run Partition Work on Node A (Isolated)
- Step 5 — Backup Partitioned Database on Node A
- Step 6 — Fail Over AG Back to Node A
- Step 7 — Reseed Node B
- Step 8 — Post-Completion Validation
- Rollback Plan
- Partition Work Phases Reference
- References
Scenario Overview
You have a large production table — in this example approximately 350 GB and 3.4 million rows — that needs to be partitioned by date. The database runs in a 2-node synchronous Always On Availability Group with automatic failover. All application traffic connects through the AG Listener.
The challenge is that partitioning a 350 GB table requires rebuilding the clustered index, dropping and recreating the primary key, and adding nonclustered indexes — all destructive, I/O-intensive operations that take 4–9 hours. Running them on the primary node while users are connected is technically possible, but in practice it degrades the application severely for the entire window.
The AG Failover method solves this by performing all destructive work on an isolated node with zero application connections. The application experiences two brief failovers (~10–30 seconds each) — everything else is zero impact.
Note — Partition Elimination and Query Performance: Partitioning is primarily a data management tool, not a performance tool. Queries against the partitioned table must filter on the partition column (TransactionDate) for partition elimination to work — meaning the query only scans the relevant partition(s) rather than the whole table. If your application queries do not filter on the partition key, SQL Server will scan all partitions, which can make those queries slower than before partitioning. Confirm that your key queries include a WHERE TransactionDate BETWEEN... or equivalent filter before committing to this change.
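As a quick illustration — a sketch where the date range is a placeholder — a query shaped like this qualifies for partition elimination, and the execution plan confirms whether it happened:
-- Sargable date filter — eligible for partition elimination
SELECT SUM(Amount)
FROM dbo.TransactionDetail
WHERE TransactionDate >= '2025-01-01'
  AND TransactionDate <  '2026-01-01';
-- In the actual execution plan, inspect the Actual Partition Count /
-- Partitions Accessed properties on the seek or scan operator — they
-- should cover only the partition(s) spanned by the date range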
Approach Comparison — Why AG Failover Is Recommended
Two approaches were evaluated. Option A (run directly on the primary) was initially proposed. Option B (AG Failover method) is the recommended approach after analysis.
Option A — Run on Primary
- All work done on the live primary node
- Application remains connected throughout
- No failovers required
- Lower complexity
Option B — AG Failover (Recommended)
- Fail to Node B, isolate Node A
- All destructive work on isolated node
- Two brief failovers (~10–30 sec each)
- Zero impact during partition work
| Factor | Option A — Run on Primary | Option B — AG Failover |
|---|---|---|
| Application downtime | None officially — but 4–9 hours of degraded performance | Two failovers (~10–30 seconds each) |
| Query performance during work | Severely impacted — clustered index rebuild on 350 GB causes heavy I/O, CPU contention, and lock escalation | Zero impact — all heavy work on isolated node with no users |
| Index availability during work | PK and NC indexes dropped — queries fall back to full table scans on 350 GB until recreated | Indexes on active node untouched until clean cutover |
| Blocking risk | High — CREATE CLUSTERED INDEX on 350 GB competes with application queries for locks | None — no application queries on the node being modified |
| AG resync after work | ~350 GB of transaction log must ship to secondary — saturates network and secondary I/O for hours | Clean backup/restore — no log shipping overhead |
| Impact to secondary (read-only) | Secondary unusable during resync — reporting queries impacted | Secondary gets a clean partitioned copy via restore |
| HA during work | Full HA maintained throughout | Single node during partition work + reseed (4–12 hours) |
| Complexity | Lower — fewer steps | Medium — more steps, but each step is safer |
| Rollback | Restore from backup — 4–9 hour window with partial changes on primary | App on untouched node until cutover — abort at any point before Step 6 with zero impact |
Key risk with Option A: While the database technically stays online, rebuilding the clustered index and dropping/recreating PK + NC indexes on a 350 GB table while users are actively querying it causes 3–6 hours of heavy I/O contention, a period where any query using those indexes hits a full table scan, and approximately 350 GB of transaction log shipping to the secondary. This is a functional degradation, not a no-impact change. If Option A is chosen, it must be treated as a full maintenance window with application teams notified.
Impact Summary
| What | Duration | When |
|---|---|---|
| AG Failover #1 — app moves to Node B | ~10–30 seconds | Start of maintenance window |
| Partition work on isolated Node A | 4–9 hours — ZERO app impact | During maintenance window |
| AG Failover #2 — app moves back to Node A | ~10–30 seconds | After partition work completes |
| Node B reseed (backup/restore) | 1–3 hours — ZERO app impact | After failover #2 |
Environment Reference
The examples throughout this article use the following example names. Substitute your actual values:
| Item | Example Value |
|---|---|
| Node A (initial primary) | PROD-DBNODE1 |
| Node B (initial secondary) | PROD-DBNODE2 |
| AG Name | PROD-AG01 |
| AG Listener | PROD-AGLISTEN (port 1433) |
| Target Database | AppDatabase |
| Target Table | dbo.TransactionDetail (~350 GB) |
| Partition Column | TransactionDate (DATE) |
| Other DBs in AG | DB1, DB2, DB3, DB4 (remain protected throughout) |
| Backup Share | \\backup-share\AppDatabase\ |
This failover moves all databases in the AG — not just the target database. Confirm all application teams are aware that all AG databases will experience the brief failover disruptions at Steps 2 and 6.
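To see exactly which databases will move with the failover, a quick check (using this article's example AG name):
-- List all databases in the AG — all of these fail over together
SELECT adc.database_name
FROM sys.availability_databases_cluster adc
JOIN sys.availability_groups ag ON adc.group_id = ag.group_id
WHERE ag.name = 'PROD-AG01';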
Step 1 — Pre-Flight Checks — HA status: Full HA
Step 1a: Confirm AG Health
Both replicas must show SYNCHRONIZED and CONNECTED before proceeding. Do not start the maintenance window on a degraded AG.
-- Run on either node
SELECT
ag.name AS ag_name,
ar.replica_server_name,
ars.role_desc,
ars.synchronization_health_desc,
ars.connected_state_desc
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar
ON ars.replica_id = ar.replica_id
JOIN sys.availability_groups ag
ON ar.group_id = ag.group_id;
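While connected to each node, it is also worth confirming edition and version, since the compatibility notes above depend on them:
-- Run on both nodes
SELECT SERVERPROPERTY('Edition')        AS edition,
       SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level;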
Step 1b: Confirm Node Roles
-- Confirm before starting:
-- Node A (PROD-DBNODE1) = PRIMARY
-- Node B (PROD-DBNODE2) = SECONDARY
-- AG Name: PROD-AG01
-- Listener: PROD-AGLISTEN (port 1433)
SELECT replica_server_name, role_desc
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar
ON ars.replica_id = ar.replica_id;
Step 1c: Full Backup
-- Run on PROD-DBNODE1 (current primary)
BACKUP DATABASE [AppDatabase]
TO DISK = N'\\backup-share\AppDatabase\AppDatabase_PrePartition_FULL.bak'
WITH COMPRESSION, CHECKSUM, STATS = 5;
Step 1d: Disable Transaction Log Backups
CRITICAL — Do this before any work begins. Disable any scheduled transaction log backup jobs for AppDatabase. If a scheduled log backup runs between your manual backups and the restore on Node B (Step 7), that backup becomes a required link in the log chain, your manual backups no longer form a contiguous sequence, and the restore will fail.
-- Identify and disable log backup jobs that include AppDatabase
-- Record job names here before disabling:
-- Job Name: ____________________________
-- To disable a job:
EXEC msdb.dbo.sp_update_job
@job_name = N'YourLogBackupJobName',
@enabled = 0;
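If you are not sure which jobs take log backups of AppDatabase, a sketch like the following can help locate candidates. The LIKE patterns are assumptions about how your job steps are written — maintenance solutions often back up all databases from a single job — so adapt them to your environment:
-- Find job steps whose command text looks like a log backup
SELECT j.name AS job_name, s.step_name
FROM msdb.dbo.sysjobs j
JOIN msdb.dbo.sysjobsteps s ON s.job_id = j.job_id
WHERE s.command LIKE '%BACKUP LOG%'
   OR s.command LIKE '%LogBackup%';  -- adjust patterns to your naming scheme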
Step 1e: Record Row Count
-- Record exact count — must match post-partition total in Step 8
SELECT COUNT(*) AS total_rows FROM dbo.TransactionDetail;
Step 2 — Fail Over AG to Node B — HA status: Full HA
Step 2a: Execute Failover
-- Run on PROD-DBNODE2 (the current secondary)
ALTER AVAILABILITY GROUP [PROD-AG01] FAILOVER;
This failover moves ALL databases in the AG to Node B — not just AppDatabase. All other databases (DB1, DB2, DB3, DB4) will also move. Confirm this is understood and accepted by all application owners before proceeding.
Step 2b: Verify Roles Have Swapped
SELECT replica_server_name, role_desc
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar
ON ars.replica_id = ar.replica_id;
Step 2c: Confirm App Connectivity Through Listener
Verify the application is connected and functioning through PROD-AGLISTEN on PROD-DBNODE2 before proceeding to Step 3.
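One way to confirm, run on PROD-DBNODE2 — the expected session count is application-specific:
-- Confirm user sessions are arriving for AppDatabase on this node
SELECT COUNT(*) AS app_sessions
FROM sys.dm_exec_sessions
WHERE database_id = DB_ID('AppDatabase')
  AND is_user_process = 1;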
Step 3 — Remove AppDatabase from AG on Node A — HA status: Single Node, Node B Serving
Step 3a: Remove AppDatabase from AG on Node A
-- Run on PROD-DBNODE1 (now the secondary)
ALTER DATABASE [AppDatabase] SET HADR OFF;
This only removes AppDatabase from the AG on this node. The other databases (DB1, DB2, DB3, DB4) remain in the AG and remain fully protected.
Step 3b: Recover the Database on Node A
-- The database will be in RESTORING state after removal — recover it
-- Run on PROD-DBNODE1
RESTORE DATABASE [AppDatabase] WITH RECOVERY;
-- Run on PROD-DBNODE1
SELECT name, state_desc FROM sys.databases WHERE name = 'AppDatabase';
Should show ONLINE.
From this point: AppDatabase runs on PROD-DBNODE2 only via AG Listener. No HA for AppDatabase. The other databases remain fully protected in the AG. This is the expected single-node window — proceed promptly.
Step 4 — Run Partition Work on Node A (Isolated) — HA status: Single Node, Node B Serving
Execute all partition phases on PROD-DBNODE1. Because no application connections exist on this node, all operations run at full I/O speed with no contention, no blocking, and no lock escalation concerns.
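A quick sanity check before starting, run on PROD-DBNODE1 — expect no rows other than your own session:
-- Confirm the node is truly isolated for AppDatabase
SELECT session_id, login_name, host_name, program_name
FROM sys.dm_exec_sessions
WHERE database_id = DB_ID('AppDatabase')
  AND is_user_process = 1
  AND session_id <> @@SPID;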
| Phase | Description | Est. Duration | User Impact |
|---|---|---|---|
| Phase 0 | Fix nullable columns + verify prerequisites | Minutes | None — isolated node |
| Phase 1 | Create partition function + partition scheme | Minutes | None — isolated node |
| Phase 2 | Rebuild clustered index with partition scheme (~350 GB rebuild) | 3–6 hours | None — isolated node |
| Phase 3 | Rebuild primary key + nonclustered indexes partition-aligned | 1–3 hours | None — isolated node |
| Phase 4 | Validate partitions + row counts | Minutes | None — isolated node |
| Phase 5 | Archive setup + SWITCH old partitions to archive table | Minutes | None — isolated node |
| Phase 6 | Create SQL Agent maintenance job for ongoing partition management | Minutes | None — isolated node |
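Phase 0's script is environment-specific and not reproduced in full here. A minimal sketch, assuming TransactionDate is currently NULLable, contains no NULL rows, and is not yet part of any index (drop any index that includes it first):
-- Phase 0 sketch — the PK in Phase 3 requires TransactionDate NOT NULL
SELECT COUNT(*) AS null_dates
FROM dbo.TransactionDetail
WHERE TransactionDate IS NULL;   -- must return 0 before altering
ALTER TABLE dbo.TransactionDetail
ALTER COLUMN TransactionDate DATE NOT NULL;  -- full validation scan on 350 GB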
Phase 1: Create Partition Function and Scheme
-- Run on PROD-DBNODE1 (isolated)
-- Adjust boundary dates to match your retention requirements
CREATE PARTITION FUNCTION pf_TransDetail_Date (DATE)
AS RANGE RIGHT FOR VALUES
(
'2020-01-01', '2021-01-01', '2022-01-01',
'2023-01-01', '2024-01-01', '2025-01-01',
'2026-01-01', '2027-01-01'
);
CREATE PARTITION SCHEME ps_TransDetail_Date
AS PARTITION pf_TransDetail_Date
ALL TO ([PRIMARY]); -- Adjust to target filegroups as needed
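An optional sanity check on the new function — $PARTITION maps a value to its partition number, confirming the RANGE RIGHT boundary behavior (boundary values belong to the partition on their right):
-- Expect 1 (pre-2020), 2 (boundary value goes right), 7 (year 2025)
SELECT
    $PARTITION.pf_TransDetail_Date('2019-06-15') AS pre_2020,
    $PARTITION.pf_TransDetail_Date('2020-01-01') AS on_boundary,
    $PARTITION.pf_TransDetail_Date('2025-07-04') AS year_2025;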
Phase 2: Rebuild Clustered Index on the Partition Scheme
-- Rebuild the existing clustered index onto the partition scheme
-- DROP_EXISTING = ON requires the new index to keep the SAME name as the
-- existing clustered index — rename CIX_TransactionDetail_Date to match yours
-- If the existing clustered index backs the PK constraint instead, drop the
-- PK constraint first (see Phase 3), then create this index
-- This is the longest-running operation (3–6 hours on 350 GB)
-- No users on this node — runs at full speed
CREATE CLUSTERED INDEX CIX_TransactionDetail_Date
ON dbo.TransactionDetail (TransactionDate, TransactionID)
WITH (DROP_EXISTING = ON, ONLINE = OFF, SORT_IN_TEMPDB = ON)
ON ps_TransDetail_Date (TransactionDate);
Phase 3: Rebuild Primary Key Partition-Aligned
-- Drop the existing PK (non-partitioned)
ALTER TABLE dbo.TransactionDetail
DROP CONSTRAINT PK_TransactionDetail;
-- Recreate the PK as NONCLUSTERED — the table already has the partitioned
-- clustered index from Phase 2, and a table can have only one clustered
-- index. Keeping TransactionDate in the key lets this unique index be
-- partition-aligned.
ALTER TABLE dbo.TransactionDetail
ADD CONSTRAINT PK_TransactionDetail
PRIMARY KEY NONCLUSTERED (TransactionDate, TransactionID)
ON ps_TransDetail_Date (TransactionDate);
-- Rebuild any nonclustered indexes partition-aligned
CREATE NONCLUSTERED INDEX IX_TransactionDetail_CustomerID
ON dbo.TransactionDetail (CustomerID)
INCLUDE (TransactionDate, Amount)
ON ps_TransDetail_Date (TransactionDate);
Phase 4: Validate Partitions
-- Confirm 9 partitions are listed (8 boundary values + 1 under RANGE RIGHT)
SELECT
p.partition_number,
p.rows,
prv.value AS boundary_value
FROM sys.partitions p
JOIN sys.tables t ON p.object_id = t.object_id
JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.partition_schemes ps ON i.data_space_id = ps.data_space_id
LEFT JOIN sys.partition_range_values prv
ON prv.function_id = ps.function_id
AND prv.boundary_id = p.partition_number - 1
WHERE t.name = 'TransactionDetail'
AND p.index_id IN (0, 1)
ORDER BY p.partition_number;
Phase 5: Archive Setup and SWITCH
-- Create archive table (must match the main table's columns exactly)
-- Create it on the SAME partition scheme: a SWITCH into a non-partitioned
-- table requires the target to be empty, so a non-partitioned archive could
-- only ever receive one partition. Partition-to-partition SWITCH avoids that.
CREATE TABLE dbo.TransactionDetail_Archive
(
TransactionID INT NOT NULL,
TransactionDate DATE NOT NULL,
CustomerID INT NULL,
Amount DECIMAL(18,2) NULL
-- match all columns from main table
) ON ps_TransDetail_Date (TransactionDate);
-- The archive must carry the same indexes as the main table,
-- partition-aligned, or the SWITCH will fail
CREATE CLUSTERED INDEX CIX_TransactionDetail_Archive
ON dbo.TransactionDetail_Archive (TransactionDate, TransactionID)
ON ps_TransDetail_Date (TransactionDate);
ALTER TABLE dbo.TransactionDetail_Archive
ADD CONSTRAINT PK_TransactionDetail_Archive
PRIMARY KEY NONCLUSTERED (TransactionDate, TransactionID)
ON ps_TransDetail_Date (TransactionDate);
CREATE NONCLUSTERED INDEX IX_TransactionDetail_Archive_CustomerID
ON dbo.TransactionDetail_Archive (CustomerID)
INCLUDE (TransactionDate, Amount)
ON ps_TransDetail_Date (TransactionDate);
-- Switch old partitions to the matching archive partitions
-- RANGE RIGHT: Partition 1 holds data before 2020, Partition 2 = 2020, etc.
ALTER TABLE dbo.TransactionDetail
SWITCH PARTITION 1 TO dbo.TransactionDetail_Archive PARTITION 1;
ALTER TABLE dbo.TransactionDetail
SWITCH PARTITION 2 TO dbo.TransactionDetail_Archive PARTITION 2;
Note — Modern Alternative for Ongoing Partition Purging (SQL Server 2016+): The SWITCH-to-archive pattern above is the classic approach and remains fully valid — it gives you a recoverable archive copy of the data before removal. Since SQL Server 2016, you can also use TRUNCATE TABLE...WITH (PARTITIONS(N)) to purge old partitions directly without needing a staging archive table. Both methods work. Use SWITCH when you need to retain an accessible archive copy. Use TRUNCATE with partition number when you simply want to purge old data with the least overhead.
-- SQL Server 2016+ syntax — purge partitions 1 and 2 directly
-- Minimal logging, no archive table required, no SWITCH prerequisites
TRUNCATE TABLE dbo.TransactionDetail WITH (PARTITIONS (1, 2));
This is the recommended approach for routine sliding-window partition maintenance on SQL Server 2016 and later. SWITCH remains the better choice when a recoverable archive copy is required.
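For reference, a hedged sketch of what the Phase 6 maintenance job body could look like under a yearly sliding window — the boundary dates are illustrative, and the SPLIT should always be done against an empty range at the top end so it stays a metadata-only operation:
-- 1) Add next year's boundary before any 2028 data exists (empty SPLIT)
ALTER PARTITION SCHEME ps_TransDetail_Date NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_TransDetail_Date() SPLIT RANGE ('2028-01-01');
-- 2) Purge the oldest partition (SQL Server 2016+), then remove its
--    now-empty range so the partition count stays constant
TRUNCATE TABLE dbo.TransactionDetail WITH (PARTITIONS (1));
ALTER PARTITION FUNCTION pf_TransDetail_Date() MERGE RANGE ('2020-01-01');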
Step 4 completion checklist — confirm before moving to Step 5:
- All indexes show ps_TransDetail_Date / PARTITION_SCHEME
- Expected number of partitions listed (9 in this example)
- Archive SWITCH verified — switched main-table partitions = 0 rows, archive has the data
- Total row count (main + archive) matches the pre-partition count from Step 1e
Step 5 — Backup Partitioned Database on Node A — HA status: Single Node, Node B Serving
Step 5a: Full Backup
-- Run on PROD-DBNODE1 (isolated, partitioned database)
BACKUP DATABASE [AppDatabase]
TO DISK = N'\\backup-share\AppDatabase\AppDatabase_Partitioned_FULL.bak'
WITH COMPRESSION, CHECKSUM, STATS = 5;
Step 5b: Log Backup
-- Run on PROD-DBNODE1
BACKUP LOG [AppDatabase]
TO DISK = N'\\backup-share\AppDatabase\AppDatabase_Partitioned_LOG.trn'
WITH COMPRESSION, CHECKSUM, STATS = 5;
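Optionally verify both backup files are readable before depending on them in Step 7:
-- Run on PROD-DBNODE1
RESTORE VERIFYONLY
FROM DISK = N'\\backup-share\AppDatabase\AppDatabase_Partitioned_FULL.bak'
WITH CHECKSUM;
RESTORE VERIFYONLY
FROM DISK = N'\\backup-share\AppDatabase\AppDatabase_Partitioned_LOG.trn'
WITH CHECKSUM;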
Step 6 — Fail Over AG Back to Node A — HA status: Single Node, Node A Serving After Failover
Step 6a: Remove AppDatabase from AG on Node B
-- Run on PROD-DBNODE2 (current primary)
-- Node B still has the OLD unpartitioned AppDatabase in the AG
ALTER AVAILABILITY GROUP [PROD-AG01] REMOVE DATABASE [AppDatabase];
AppDatabase is now out of the AG on both nodes. The other databases are unaffected. The AG Listener will no longer route AppDatabase connections until it is re-added in Step 6d.
Step 6b: Drop Old AppDatabase on Node B
-- Run on PROD-DBNODE2
-- This drops the old unpartitioned copy — confirmed it was removed from AG in 6a first
DROP DATABASE [AppDatabase];
Step 6c: Fail Over AG to Node A
-- Run on PROD-DBNODE1 (the node with the partitioned database)
ALTER AVAILABILITY GROUP [PROD-AG01] FAILOVER;
This failover moves all remaining databases (DB1, DB2, DB3, DB4) back to PROD-DBNODE1. AppDatabase is not in the AG yet — it will be added in Step 6d.
Step 6d: Add AppDatabase Back to AG from Node A
-- Run on PROD-DBNODE1 (now primary, has the partitioned AppDatabase)
ALTER AVAILABILITY GROUP [PROD-AG01] ADD DATABASE [AppDatabase];
-- Confirm AG roles
SELECT replica_server_name, role_desc
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar ON ars.replica_id = ar.replica_id;
-- Confirm AppDatabase synchronization state
SELECT database_name, synchronization_state_desc
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_databases_cluster adc
ON drs.group_database_id = adc.group_database_id;
PROD-DBNODE1 = PRIMARY. AppDatabase should be listed in the AG.
Step 6e: Confirm App Connectivity Through Listener
Verify the application is connected and functioning through PROD-AGLISTEN, now pointing to PROD-DBNODE1 with the partitioned database.
Step 7 — Reseed Node B — HA status: Single Node, Node A Serving
Two options to reseed PROD-DBNODE2. Automatic seeding is simpler. Manual seeding (backup/restore) gives more control and is required if automatic seeding is not configured.
Option 1: Automatic Seeding (Recommended)
-- Step 7a: Verify automatic seeding is configured on Node A (primary)
SELECT
ar.replica_server_name,
ar.seeding_mode_desc
FROM sys.availability_replicas ar
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id
WHERE ag.name = 'PROD-AG01';
-- If seeding_mode_desc shows MANUAL, change it to AUTOMATIC:
ALTER AVAILABILITY GROUP [PROD-AG01]
MODIFY REPLICA ON N'PROD-DBNODE2'
WITH (SEEDING_MODE = AUTOMATIC);
-- Step 7b: Grant CREATE DATABASE permission on Node B
-- Run on PROD-DBNODE2
ALTER AVAILABILITY GROUP [PROD-AG01] GRANT CREATE ANY DATABASE;
-- Monitor seeding progress from Node A
-- (sys.dm_hadr_automatic_seeding has no database_name column — join
--  sys.availability_databases_cluster to get it)
SELECT
ag.name AS ag_name,
adc.database_name,
asd.current_state,
asd.performed_seeding,
asd.start_time,
asd.completion_time,
asd.failure_state_desc
FROM sys.dm_hadr_automatic_seeding asd
JOIN sys.availability_groups ag ON asd.ag_id = ag.group_id
JOIN sys.availability_databases_cluster adc
ON asd.ag_db_id = adc.group_database_id
WHERE adc.database_name = 'AppDatabase';
Option 2: Manual Seeding (Backup/Restore)
Use this if automatic seeding is unavailable or not preferred. Log backup jobs for AppDatabase should already be disabled from Step 1d.
-- Run on PROD-DBNODE1 (primary)
BACKUP DATABASE [AppDatabase]
TO DISK = N'\\backup-share\AppDatabase\AppDatabase_Partitioned_Reseed_FULL.bak'
WITH COMPRESSION, CHECKSUM, STATS = 5;
BACKUP LOG [AppDatabase]
TO DISK = N'\\backup-share\AppDatabase\AppDatabase_Partitioned_Reseed_LOG.trn'
WITH COMPRESSION, CHECKSUM, STATS = 5;
-- Run on PROD-DBNODE2
RESTORE DATABASE [AppDatabase]
FROM DISK = N'\\backup-share\AppDatabase\AppDatabase_Partitioned_Reseed_FULL.bak'
WITH NORECOVERY, STATS = 5;
RESTORE LOG [AppDatabase]
FROM DISK = N'\\backup-share\AppDatabase\AppDatabase_Partitioned_Reseed_LOG.trn'
WITH NORECOVERY, STATS = 5;
-- Join to AG on Node B
ALTER DATABASE [AppDatabase]
SET HADR AVAILABILITY GROUP = [PROD-AG01];
Step 7c: Re-enable Transaction Log Backups
Do not forget: Re-enable the transaction log backup jobs for AppDatabase that were disabled in Step 1d.
EXEC msdb.dbo.sp_update_job
@job_name = N'YourLogBackupJobName',
@enabled = 1;
Step 7d: Verify Synchronization
SELECT
ar.replica_server_name,
ars.role_desc,
drs.synchronization_state_desc,
drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
JOIN sys.dm_hadr_availability_replica_states ars ON ars.replica_id = ar.replica_id
WHERE drs.database_id = DB_ID('AppDatabase');
Both PROD-DBNODE1 and PROD-DBNODE2 should show SYNCHRONIZED and HEALTHY.
✅ Full HA Restored — All databases in PROD-AG01 synchronized on both nodes.
Step 8 — Post-Completion Validation — HA status: Full HA Restored
Step 8a: Verify Partitions on Both Nodes
-- Run on BOTH PROD-DBNODE1 and PROD-DBNODE2
SELECT
p.partition_number,
p.rows,
prv.value AS boundary_value
FROM sys.partitions p
JOIN sys.tables t ON p.object_id = t.object_id
JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.partition_schemes ps ON i.data_space_id = ps.data_space_id
LEFT JOIN sys.partition_range_values prv
ON prv.function_id = ps.function_id
AND prv.boundary_id = p.partition_number - 1
WHERE t.name = 'TransactionDetail'
AND p.index_id IN (0, 1)
ORDER BY p.partition_number;
Step 8b: Verify Total Row Count Matches
SELECT
(SELECT COUNT(*) FROM dbo.TransactionDetail)
+ (SELECT COUNT(*) FROM dbo.TransactionDetail_Archive)
AS total_rows;
-- Must match pre-partition count recorded in Step 1e
Step 8c: Verify All Indexes Are Partition-Aligned
SELECT
i.name AS index_name,
ds.name AS data_space_name,
ds.type_desc
FROM sys.indexes i
JOIN sys.data_spaces ds ON i.data_space_id = ds.data_space_id
WHERE i.object_id = OBJECT_ID('dbo.TransactionDetail')
ORDER BY i.index_id;
-- Every index should show ps_TransDetail_Date / PARTITION_SCHEME
Step 8d: Verify SQL Agent Maintenance Job on Both Nodes
SELECT name, enabled, date_created
FROM msdb.dbo.sysjobs
WHERE name = 'AppDatabase_TransDetail_Partition_Maintenance';
-- Note: if job only exists on Node A, run Phase 6 from the partition runbook on Node B
Step 8e: Application Smoke Test
Confirm the application is functioning correctly through the AG Listener against the partitioned database. Run representative queries and confirm expected response times.
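A quick smoke check through the listener — confirm which node is serving and that the partitioned table is in place (expect PROD-DBNODE1 and 9 partitions, given the boundaries above):
-- Connect via PROD-AGLISTEN
SELECT
@@SERVERNAME AS serving_node,
(SELECT COUNT(*) FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.TransactionDetail')
AND index_id IN (0, 1)) AS partition_count;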
Rollback Plan
| Point in Process | Rollback Action | App Impact |
|---|---|---|
| Before Step 2 (before any failover) | Nothing to roll back — no changes made yet | None |
| After Step 2, before Step 4 (partition work) | Fail back to Node A (reverse Step 2). If AppDatabase was already removed from the AG on Node A (Step 3), reseed it from Node B and rejoin the AG before failing back. No data has been modified on either node. | ~10–30 sec failover |
| During Step 4 (partition work in progress) | Stop work on Node A and discard its copy — the app is on Node B and stays unaffected. Reseed Node A from Node B (backup/restore or automatic seeding) and rejoin the AG. Do not fail over to Node A while its copy is partially partitioned. | None |
| After Step 6 (partitioned DB is live) | Restore from pre-partition backup (AppDatabase_PrePartition_FULL.bak). This is a longer recovery window — confirm backup is accessible before starting. | Maintenance window required |
The key rollback advantage of Option B: At any point before Step 6, the application is running on the untouched node with the original unmodified database. You can abort with zero application impact and zero data risk.
Partition Work Phases Reference
All phases run on the isolated Node A with no application connections. Estimated total duration: 4–9 hours depending on hardware and data volume.
| Phase | Description | Key Operations | Duration |
|---|---|---|---|
| Phase 0 | Prerequisites | Fix nullable partition column, verify no replication conflicts | Minutes |
| Phase 1 | Partition infrastructure | CREATE PARTITION FUNCTION, CREATE PARTITION SCHEME | Minutes |
| Phase 2 | Clustered index rebuild | DROP + CREATE CLUSTERED INDEX on partition scheme (~350 GB) | 3–6 hours |
| Phase 3 | PK + NC index rebuild | DROP + recreate PRIMARY KEY partition-aligned, rebuild NC indexes | 1–3 hours |
| Phase 4 | Validation | Verify partition count, row counts, index alignment | Minutes |
| Phase 5 | Archive setup + SWITCH | Create archive table, SWITCH old partitions, verify data | Minutes |
| Phase 6 | Maintenance job | Create SQL Agent job for ongoing partition maintenance | Minutes |
References
- Microsoft Docs – Always On Availability Groups Overview
- Microsoft Docs – Planned Manual Failover of an Availability Group
- Microsoft Docs – Partitioned Tables and Indexes
- Microsoft Docs – ALTER TABLE SWITCH
- Microsoft Docs – Automatic Seeding for Secondary Replicas
- Microsoft Docs – ALTER AVAILABILITY GROUP