Meet sys.dm_os_memory_health_history in SQL Server 2025

Understanding memory health in a deeper, more continuous way


What is it?

In SQL Server 2025, Microsoft introduced a new dynamic management view (DMV) called sys.dm_os_memory_health_history. Microsoft Learn

Key points:

  • It captures snapshots of memory usage and health over time. Microsoft Learn
  • Each row is one snapshot. Microsoft Learn
  • Snapshots include various metrics: how much memory is available for new allocations, how much is used by reclaimable caches, which memory clerks are consuming the most, and a “severity” level indicating how healthy (or stressed) memory is. Microsoft Learn
  • It’s a preview feature — schema or behavior might change in future SQL Server 2025 updates. Microsoft Learn

Why it matters

If you’ve worked with earlier versions of SQL Server, you know memory pressure is one of the trickier diagnostics: sometimes everything looks fine until it’s not, and then you scramble. Traditional metrics (buffer pool size, page life expectancy, wait stats, etc.) give pieces, but often after symptoms are visible rather than before.

sys.dm_os_memory_health_history gives you:

  • A continuous view into memory health over time, rather than point-in-time metrics.
  • An early warning system: severity levels, allocation potential, top memory clerks can show you before things degrade badly.
  • Help diagnosing what component(s) are consuming memory (via memory clerks) so you can tell who the possible culprits are.
  • Better context: how much memory is “reclaimable” vs how much is already committed in non‐cache, non‐reclaimable usage.

In short: more data, earlier, more actionable.


What are the important columns & what they mean

Here are the main columns in the DMV, and what they tell you. (These are from Microsoft documentation.) Microsoft Learn

ColumnData TypeWhat it Represents / Why You’d Care
snapshot_timedatetime2When the snapshot was taken. Useful for ordering / trending over time.
allocation_potential_memory_mbintHow much memory is available for new allocations. A drop is an early warning of memory pressure.
external_cache_mbintMemory used by reclaimable caches (buffer pool, columnstore, etc.). High values mean SQL can free space if needed.
top_clerksnvarchar(4000), JSONJSON list of the top memory clerks by usage. Lets you see which components are consuming memory.
severity_leveltinyintHealth indicator: 1 = Low, 2 = Medium, 3 = High. Makes it easy to spot degrading memory conditions quickly.

Also: snapshots are taken roughly every ~15 seconds, and SQL Server keeps up to 256 snapshots. Older ones roll off. Data is reset on service / engine restart. Microsoft Learn


Examples: from Novice to Expert

Here are progressively more advanced ways to use this DMV, with sample queries. You can try them out, adapt them, build dashboards, alerts, etc.


Beginner: Just look at what’s happening now

If you’re new or just want to see the most recent memory health:

SELECT TOP 10
    snapshot_time,
    severity_level AS Severity,
    allocation_potential_memory_mb AS AllocationPotentialMB,
    external_cache_mb AS ReclaimableCacheMB
FROM sys.dm_os_memory_health_history
ORDER BY snapshot_time DESC;

What to watch for:

  • Is allocation_potential_memory_mb low (or dropping)?
  • Is severity_level ever 2 or 3?
  • Does reclaimable cache drop significantly?

Intermediate: Which memory clerks are using the most

Once you see warning signs, you’ll want to know why. This gets into the “top_clerks” JSON:

SELECT
    mh.snapshot_time,
    mh.severity_level,
    mh.allocation_potential_memory_mb,
    clerk.clerk_type,
    clerk.pages_allocated_kb
FROM sys.dm_os_memory_health_history AS mh
CROSS APPLY
    OPENJSON(mh.top_clerks)
    WITH (
       clerk_type sysname '$.clerk_type',
       pages_allocated_kb bigint '$.pages_allocated_kb'
    ) AS clerk
ORDER BY mh.snapshot_time DESC, clerk.pages_allocated_kb DESC;

This lets you see, over time, which clerks (buffer pool, columnstore, query workspace, etc) are consuming memory. Useful for narrowing down where memory is being “used but possibly misused”.


Intermediate+: Trends & alerting

Rather than looking only at raw output, you can build a trend or alert:

  • Compute moving averages of allocation_potential_memory_mb over, say, last N snapshots (e.g. last 5 mins)
  • If severity_level = 3 for more than X snapshots in a row, trigger an alert
  • Plot reclaimable cache vs total cache to see when cache shrinkage is kicking in

Example:

;WITH Recent AS (
  SELECT TOP (20)
     snapshot_time,
     severity_level,
     allocation_potential_memory_mb,
     external_cache_mb
  FROM sys.dm_os_memory_health_history
  ORDER BY snapshot_time DESC
)
SELECT
   MAX(snapshot_time) AS LastSnapshot,
   MIN(allocation_potential_memory_mb) AS MinAllocPotential,
   AVG(allocation_potential_memory_mb) AS AvgAllocPotential,
   MAX(external_cache_mb) AS MaxReclaimableCache,
   SUM(CASE WHEN severity_level = 3 THEN 1 ELSE 0 END) AS HighSeverityCount
FROM Recent;

Then you could have logic: if HighSeverityCount > 3 (i.e. 3 snapshots in a row with High severity), send an alert or investigate immediately.


Expert: Combining with other DMVs, capacity planning, and root cause

At the expert level, you tie together this memory health view with other system data and historical workloads, so you can not only respond to issues but prevent them, tune your server/memory allocation, and forecast future needs.

Some ideas:

  • Combine with memory clerks via sys.dm_os_memory_clerks or sys.dm_os_memory_nodes_processor_groups to see whether clerks consistently correlate with high memory usage.
  • Track query memory grant waits via sys.dm_exec_query_memory_grants — if allocation potential drops, are queries waiting (or failing)?
  • Watch disk IO behavior (logical & physical reads/writes). If memory is squeezed, you expect more physical reads (cache misses) and possibly disk bottlenecks.
  • Resource Governor settings: tie in configurations like max server memory, memory limits per workload, etc., to see if configuration is in sync with what usage patterns demand.
  • Forecasting for scaling: by looking at peak allocation potential, trends in severity, etc., to decide whether hardware upgrades or configuration changes are needed before service is negatively impacted.

Example of combining with memory grants:

-- Find queries waiting for memory when allocation potential was low
;WITH LowAlloc AS (
  SELECT TOP(100)
    snapshot_time,
    allocation_potential_memory_mb,
    severity_level
  FROM sys.dm_os_memory_health_history
  WHERE allocation_potential_memory_mb < 1000 -- threshold depends on your server size
    AND severity_level >= 2
  ORDER BY snapshot_time DESC
)
SELECT
   la.snapshot_time,
   la.allocation_potential_memory_mb,
   mg.requested_memory_kb,
   mg.granted_memory_kb,
   mg.status
FROM LowAlloc AS la
JOIN sys.dm_exec_query_memory_grants AS mg
  ON mg.requested_time BETWEEN la.snapshot_time AND DATEADD(SECOND,15, la.snapshot_time)
ORDER BY la.snapshot_time DESC, mg.requested_memory_kb DESC;

This helps you tie when memory health was poor and what queries were impacted, which is very useful for SLA or performance investigations.


Gotchas & Tips

  • Because it’s preview: behavior, schema might change. Don’t build irreversible systems around it yet (until SQL Server 2025 goes GA). Microsoft Learn
  • Snapshots roll off: only ~256 are stored. If your server runs continuously, older snapshots disappear — make sure your trend window fits inside what’s stored. Microsoft Learn
  • Restart clears history. So if you restart your SQL Server, past data is gone. Useful to know when you see a sudden drop in snapshots.
  • Be careful defining “low” thresholds: what’s low memory potential on one server might be totally fine on another. Tailor thresholds to your physical memory, load patterns, etc.
  • Parsing JSON (top_clerks) has some CPU/IO cost — if you run such queries often, make sure they are scheduled/off-peak or optimized.

Example real‐world scenario

Here’s a fictional but realistic scenario showing how this DMV helps you catch a problem early and act, with the outcome.

Scenario: You’re DBA for several large OLTP databases. Suddenly, your nightly batch jobs are taking longer. Users report slowness in dashboards. Query performance seems more inconsistent — sometimes fast, sometimes painfully slow.

You use the new DMV:

  1. You query recent snapshots and see that over the past 10 minutes, severity_level has been at High (3) for several snapshots. Also, allocation_potential_memory_mb is trending downward (say from 5000MB to 1500MB).
  2. You expand the JSON ‐ top_clerks ‐ and see that a particular clerk (e.g. “memory clerk for query workspace” or “columnstore buffer” or something non‐standard) has sharply increased its usage.
  3. You correlate with sys.dm_exec_query_memory_grants and see many queries are waiting or delayed due to memory grant waits for those same periods.
  4. Effect: Due to lower available memory, cache pressure increases, buffer pool shrinks, more disk reads. That explains slowness.
  5. Action: You increase max server memory (if possible), adjust memory grant settings, maybe redistribute some workload, or offload some non‐critical tasks. Also watch the workload to see if some queries need rewriting to use less memory or spill less.
  6. Outcome: After adjustment, severity drops back to Medium/Low, allocation potential stabilizes, and performance comes back. You may also schedule alerts so next time severity=3 persists more than, say, 3 snapshots, you get notified.

How to build dashboards / operationalize

To maximize value, you’ll likely want to build regular monitoring / dashboards around this DMV. Here are suggestions:

  • Use tools like SQL Server Management Studio, Power BI, Grafana, or other monitoring dashboards. Visualize severity over time, allocation potential, top clerk usage.
  • Create alerts when severity=3 persists for a given time or snapshots, or when allocation potential drops below a threshold.
  • Log snapshots (copy into your own telemetry table) if you want longer history than 256 snapshots or persistent storage across restarts.
  • Combine with other performance metrics (CPU, I/O, waits, etc.) so memory pressure is seen in context.

Where to start, as a DBA / Developer

If you want to begin using this DMV effectively, here’s a suggested path:

  1. Get familiar: run some SELECTs, see what your usual “healthy” values are. What is your server’s allocation potential during non-peak and peak times? What CAUTIOUS thresholds might you set?
  2. Baseline: capture steady state for a few days or weeks. Know what “normal” looks like.
  3. Alerts: build simple alerts around severity and allocation potential. Maybe “severity=3 for more than 2 snapshots in a row” or “alloc potential < X MB during business hours.”
  4. Investigate top_clerks: when memory stress appears, always check which clerks are using memory, to know whether the issue is buffer pool, columnstore, workspace grants, etc.
  5. Tune configuration: explore your max server memory, memory grant percent, possibly Resource Governor settings, etc., in light of the new data.
  6. Document patterns: some workloads (batch, ETL, analytical queries) may naturally need more memory; plan for that.

Microsoft Docs & Related Resources

  • Microsoft’s documentation of sys.dm_os_memory_health_history — includes schema, descriptions, examples. Microsoft Learn
  • Brent Ozar’s writeup “New Objects in SQL Server 2025” which mentions this DMV among others. Brent Ozar Unlimited®
  • Community discussions (e.g. Reddit threads) where DBAs compare new allocation potential metric vs older metrics like Page Life Expectancy, buffer pool behaviors. Reddit

Where this can go wrong + unanswered questions

Even with this DMV, some limitations or things to watch out for:

  • What exactly is “allocation potential” under different loads? Is it always reliable; how does OS-level memory pressure (outside SQL Server) affect it?
  • How accurate is “top_clerks” JSON under very high loads / contention? Might lags or sampling issues hide transient peaks.
  • How do features like buffer pool extension, memory‐optimized objects, external memory consumers (OS cache, other services) influence what this DMV reports?
  • What about multi‐instance setups, virtualization, containers? Behavior might differ.
  • Impact on performance: frequent querying / parsing JSON could add overhead.

Final thoughts: Why this is a big deal

  • This DMV makes memory monitoring much more proactive. Instead of reacting when users complain, you can observe trends and intervene earlier.
  • It gives visibility into parts of SQL Server memory usage that were harder to trace (memory clerks, cache reclaimable vs non‐reclaimable, etc.).
  • For high‐scale / high‐availability systems, memory management is often the tipping point. This helps reduce “memory surprises”.
  • It creates opportunities for better diagnostics, better configuration, better capacity planning.


Discover more from SQLYARD

Subscribe to get the latest posts sent to your email.

Leave a Reply

Discover more from SQLYARD

Subscribe now to keep reading and get access to the full archive.

Continue reading