Do We Still Need DBCC CHECKDB If We’re on All-Flash Storage?

Short answer: yes. Moving your databases to SSDs/NVMe arrays (even the fancy enterprise ones) doesn’t remove the need for DBCC CHECKDB. Fast disks reduce latency; they don’t eliminate the many other ways a database can become corrupt.

Below I’ll walk through why you still run it, what CHECKDB actually checks, a few common “but we’re on flash” myths, when you might adjust frequency, and some practical run patterns with examples.


What DBCC CHECKDB Actually Does

DBCC CHECKDB validates the physical and logical integrity of your database:

  • Physical consistency: allocation structures, page linkage, page header fields, torn pages, and (if enabled) checksums.
  • Logical consistency: system catalog validity, cross-object relationships, and per-table/per-index logical rules (think orphaned rows, broken relationships in internal structures).
  • Data purity (on modern databases by default): looks for out-of-range and invalid values in columns (e.g., a datetime that’s not a real datetime).

It uses a database snapshot to read a transactionally consistent view while your workload runs, and it can use tempdb for some work tables.

Key point: CHECKDB is not just “read every page.” It understands SQL Server’s internal structures and verifies that they’re sane.


Why All-Flash Doesn’t Make CHECKDB Optional

Flash reduces I/O wait time. It doesn’t protect you from:

  1. Memory or firmware bugs
    Bad bits can come from anywhere in the stack—CPU, memory, HBA, controller firmware, drivers—not just the disks. Those faults can persist to storage and only surface later.
  2. Write-ordering and power events
    Even with capacitors and battery-backed caches, rare events happen. Torn or partially written pages can still occur (especially if protections are misconfigured or fail).
  3. File system or volume issues
    NTFS/ReFS hiccups, controller quirks, or pathing problems can mangle data before it ever touches the NAND cells.
  4. Logical corruption
    No storage medium prevents logical mistakes: bugs in application code, unexpected query patterns, bad bulk loads, or misuse of WITH (IGNORE_CONSTRAINTS)-style operations. CHECKDB is your last line of defense for these.
  5. Human error
    Accidental drops, mismatched restores, or in-place file replacements won’t be “caught” by a fast disk.
  6. Silent corruption
    Checksums help detect it, but CHECKDB cross-validates structures that checksums alone can’t guarantee.

Myths to Retire

  • “We have enterprise SSDs with end-to-end protection; we’re safe.”
    Great—keep them. But you still have CPU, RAM, firmware, drivers, hypervisors, and humans in the loop. Corruption isn’t a “disk-only” problem.
  • “We enabled page checksums and backup checksums; that replaces CHECKDB.”
    Checksums are necessary but not sufficient. They detect certain types of physical damage; they don’t validate higher-level logical consistency.
  • “We run Availability Groups; the secondary is fine.”
    AGs replicate pages; they can replicate corruption, too. You can (and should) offload CHECKDB to a readable secondary to save primary resources—but that’s a scheduling decision, not a reason to skip it. If storage stacks differ across replicas, consider running CHECKDB on more than one.
  • “Our vendor runs health checks on the SAN.”
    SAN health ≠ SQL Server logical integrity. Different layers, different guarantees.

When You Might Tune (Not Eliminate) CHECKDB

Large databases and tight maintenance windows are real. You can adjust frequency and scope without abandoning safety:

  • Daily PHYSICAL_ONLY, weekly full CHECKDB
    PHYSICAL_ONLY is much faster; it focuses on lower-level physical checks. Use it for early warning, then run a full CHECKDB weekly.
  • Rotate by filegroup
    For very large DBs with separate filegroups, use DBCC CHECKFILEGROUP on a rotation (e.g., different filegroup each night, full run on weekends).
  • Target hot tables between full runs
    DBCC CHECKTABLE on the most critical tables mid-week, with full CHECKDB on the weekend.
  • Offload to a restored copy
    Restore last night’s full + diffs/logs to a non-prod server and run CHECKDB there. This also validates your restore chain.
  • Offload to AG secondary
    If you have a readable secondary, run there to spare the primary—just remember this doesn’t test the primary’s I/O path.

Settings That Help (But Don’t Replace) CHECKDB

  • PAGE_VERIFY = CHECKSUM on every database.
    Detects torn/partial writes at read time.
  • Backups with checksums (BACKUP ... WITH CHECKSUM).
    Catches corruption during backup and helps you fail fast.
  • Regular test restores
    A backup you can’t restore is not a backup. Automate restores to a sandbox and run CHECKDB there.
  • Stable firmware/drivers and power-loss protection
    Keep the stack patched and validated; use controller caches with proper protection.

Practical Run Patterns and Examples

1) Full weekly, physical-only daily

-- Daily (fast): physical checks only
DBCC CHECKDB (N'YourDB') WITH PHYSICAL_ONLY, NO_INFOMSGS;

-- Weekly (deeper): full logical and physical checks
DBCC CHECKDB (N'YourDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

2) Rotate by filegroup (very large DBs)

-- Nightly rotation example
DBCC CHECKFILEGROUP (N'PRIMARY') WITH NO_INFOMSGS;       -- Mon
DBCC CHECKFILEGROUP (N'FG_HOT') WITH NO_INFOMSGS;        -- Tue
DBCC CHECKFILEGROUP (N'FG_WARM') WITH NO_INFOMSGS;       -- Wed
DBCC CHECKFILEGROUP (N'FG_COLD') WITH NO_INFOMSGS;       -- Thu
-- Full CHECKDB on Sun
DBCC CHECKDB (N'YourDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

3) Offload to a restored copy

-- On a non-prod server
-- 1) Restore last full + diffs + logs
-- 2) Then run:
DBCC CHECKDB (N'YourDB_Restore') WITH NO_INFOMSGS, ALL_ERRORMSGS;

4) Run on an AG readable secondary

Create a SQL Agent job on the secondary:

IF sys.fn_hadr_is_primary_replica('YourDB') = 0
BEGIN
    DBCC CHECKDB (N'YourDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
END

Tip: capture output to a table or to a file so you have a history of passes/failures.

5) Capture results for auditing

CREATE TABLE dbo.CheckDBLog
(
    RunID        int IDENTITY(1,1) PRIMARY KEY,
    DBName       sysname,
    RunTimeUTC   datetime2(0) DEFAULT SYSUTCDATETIME(),
    Passed       bit,
    ErrorText    nvarchar(max)
);

BEGIN TRY
    DBCC CHECKDB (N'YourDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
    INSERT dbo.CheckDBLog(DBName, Passed) VALUES (N'YourDB', 1);
END TRY
BEGIN CATCH
    INSERT dbo.CheckDBLog(DBName, Passed, ErrorText)
    VALUES (N'YourDB', 0, ERROR_MESSAGE());
    THROW; -- bubble up to fail the job
END CATCH;

Operational Considerations

  • Resource use: CHECKDB can be CPU and I/O heavy, and the snapshot can consume space. Schedule it outside peak hours.
  • Snapshot location: Database snapshots live on the same volume by default. Ensure there’s free space for the snapshot growth during the run.
  • Tempdb load: CHECKDB uses tempdb for work tables; size tempdb appropriately.
  • Parallelism: CHECKDB can go parallel. If it’s fighting your ETL, cap it with MAXDOP at the server level or schedule around ETL.

When People Ask “Why Not Skip It?”

If your risk tolerance is zero for data loss and downtime (and whose isn’t?), CHECKDB is your early-warning system. Skipping it is like turning off a smoke alarm because you bought fire-resistant drapes.

If you must reduce frequency, do it intentionally:

  • Keep daily PHYSICAL_ONLY, weekly full.
  • Or offload to restored copies / AG secondaries.
  • Maintain checksums, verified backups, and test restores.

Example SQL Agent Schedules (Plain English)

  • Nightly 1:00 AM: DBCC CHECKDB ... WITH PHYSICAL_ONLY on each user DB, staggered by server to avoid overlap.
  • Sunday 3:00 AM: Full DBCC CHECKDB on each user DB (or run on restored copies).
  • Monthly: Review dbo.CheckDBLog (or job history) and keep at least 12 months of pass/fail history.

Troubleshooting Failures (At a Glance)

  1. Read the output—it often points to the table/index.
  2. Check recent hardware/firmware changes and Windows/System event logs.
  3. Try a clean restore from the most recent good backup and re-run CHECKDB on the restored copy.
  4. If needed, use the minimum repair (REPAIR_REBUILD vs REPAIR_ALLOW_DATA_LOSS) on a copy, not production—restores are safer than repairs.

Summary

  • All-flash storage doesn’t remove the need for DBCC CHECKDB. Corruption can come from many places besides spinning disks: memory, firmware, drivers, power events, the file system, application logic, and humans.
  • Keep running it, but tune the approach: use PHYSICAL_ONLY daily and a full run weekly, rotate by filegroup, or offload to restored copies or AG secondaries.
  • Layer defenses: PAGE_VERIFY CHECKSUM, backup checksums, regular test restores, and a history of CHECKDB results.
  • Plan for resources: schedule off-peak, ensure snapshot/tempdb space, and monitor runtime.
  • If you ever need to choose between fewer CHECKDBs and none, choose fewernever none.

Discover more from SQLYARD

Subscribe to get the latest posts sent to your email.

Leave a Reply

Discover more from SQLYARD

Subscribe now to keep reading and get access to the full archive.

Continue reading