SQL Server Best Practices and Configurations

Summary:

This document outlines essential SQL Server best practices, configuration settings, and performance optimizations. These recommendations are designed to enhance performance and efficiency. Please review and apply them according to your specific environment and requirements.

NTFS Allocation Unit Sizes

Microsoft recommends using a 64KB NTFS allocation unit size for all logical volumes containing SQL Server objects. This block size aligns properly with the disk segment boundaries for the underlying storage and the virtualization layer. Format all these disks with a 64KB NTFS allocation unit size to ensure the best alignment and performance.

Windows Indexing

The Windows Indexing service is not utilized by SQL Server unless features such as FILESTREAM are used. If these features are not leveraged, consider disabling Windows Indexing to improve system performance. Disabling indexing reduces CPU and I/O consumption and ensures that SQL Server runs more efficiently.

SQL Server Instant File Initialization

When a SQL Server data or log file fills and needs to expand to accommodate additional data, SQL Server calculates the required growth increment. It then instructs the underlying operating system to write zeros over the target space before extending the file. This zero-write operation can be time-consuming for large growth increments, and the database operation that triggered the growth pauses until the process is completed, potentially slowing down overall performance.

Instant File Initialization (IFI) is a feature that can be enabled through the local security policy for the SQL Server service account. With IFI enabled, SQL Server skips the zero-write operation for data files (though not for log files) and simply allocates the space for use. This allows the operation to complete faster, improving performance, as there is no need to write zeros to the disk before the data file is extended.

Important Note: If a disk block has been previously written to, deleted, and then reclaimed by SQL Server as part of the data file expansion, there is a potential risk of exposing previously deleted data through a DBCC PAGE command on that specific disk block. While this is an unlikely security concern, some organizations choose not to enable Instant File Initialization because of this possibility.

Recommendation: Enable Instant File Initialization if the above note is acceptable to both the DBA and security teams. This setting can significantly improve SQL Server performance by bypassing unnecessary write operations, but security considerations should always be evaluated first.
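On SQL Server 2016 SP1 and later, you can confirm whether IFI is in effect for the Database Engine service by querying sys.dm_server_services:

```sql
-- instant_file_initialization_enabled returns Y or N
-- (column available from SQL Server 2016 SP1)
SELECT servicename,
       service_account,
       instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE N'SQL Server (%';
```

A value of ‘Y’ indicates the service account holds the Perform Volume Maintenance Tasks privilege that IFI requires.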

SQL Server Lock Pages in Memory

Lock Pages in Memory (LPIM) ensures that SQL Server retains control over its memory resources, preventing Windows from paging it to disk. This is critical for maintaining performance, especially in high-demand environments. To enable LPIM, grant the SQL Server service account permission to ‘Lock Pages in Memory’ in the Local Security Settings.
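To verify that LPIM is actually in effect, the memory model can be checked via a DMV (the sql_memory_model columns are available from SQL Server 2016 SP1):

```sql
-- 'CONVENTIONAL' = LPIM not in use; 'LOCK_PAGES' = LPIM active;
-- 'LARGE_PAGES' = LPIM plus trace flag 834 large pages
SELECT sql_memory_model, sql_memory_model_desc
FROM sys.dm_os_sys_info;
```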

SQL Server Large Memory Pages

SQL Server Enterprise Edition can utilize large memory pages to enhance performance, particularly in environments with more than 32GB of RAM. By reducing the number of memory pointers needed for larger SQL Server deployments, the database engine operates more efficiently. In a typical SQL Server setup, memory pages are 4KB, which requires many pointers to manage memory. However, with large memory pages, the page size increases to 2MB, significantly reducing the number of pointers needed for memory management.

To enable large memory pages, use SQL Server trace flag 834 and restart the instance.

Important Notes for EC2:

  • Large memory pages require “Lock Pages in Memory” (LPIM) to be enabled for the SQL Server service account.
  • When running SQL Server on EC2 instances, enabling large memory pages is highly recommended to improve memory management and reduce Translation Lookaside Buffer (TLB) misses, which can enhance performance, especially for high-memory workloads.
  • EC2 instances with dedicated memory (like EC2 r5 or m5 instances) can significantly benefit from large memory pages due to their high-performance capabilities.
  • Unlike VMware, EC2 instances do not require a specific platform like vSphere to support large memory pages. However, enabling large memory pages in EC2 still helps optimize memory usage and overall database performance.
  • Note: Do not use large memory pages if columnstore indexes are in use in any SQL Server database on this instance. Also be cautious if the EC2 instance is under memory pressure: with large pages, SQL Server will not return memory to the operating system, so squeezing the OS too hard could lead to system instability.
  • To enable large memory pages, add trace flag 834 to a startup parameter for the SQL Server service through the SQL Server Configuration Manager. Restart the instance once this trace flag is added.

To validate that large memory pages are enabled, execute the following query.

SELECT physical_memory_in_use_kb / 1024 AS SQLServerMemoryUsageMB,
       large_page_allocations_kb / 1024 AS SQLServerLargePagesAllocationMB
FROM sys.dm_os_process_memory WITH (NOLOCK)
OPTION (RECOMPILE);

SQL Server Soft-NUMA (SQL Server 2016 and Above)

Soft-NUMA is a feature that allows SQL Server to subdivide hardware NUMA nodes into smaller logical nodes, optimizing CPU scheduling. Starting with SQL Server 2016, it is enabled automatically when a NUMA node contains more than eight physical cores, and it can improve performance by reducing the scheduling footprint. To check whether Soft-NUMA is enabled, review the ‘softnuma_configuration’ column of the ‘sys.dm_os_sys_info’ DMV.

Soft-NUMA on EC2

Soft NUMA can work on EC2 instances, but there are some important considerations:

  1. EC2 Instance Types: Soft NUMA is generally relevant for larger instance types, especially those with multiple vCPUs per socket. AWS EC2 instances with a high number of vCPUs (like those in the m5 and r5 families) are likely to see Soft NUMA being enabled. EC2 instances with a smaller number of vCPUs (like t3 or c5 instances) may not benefit as much from Soft NUMA or may not have it enabled by default.
  2. VM Configuration: When using EC2, Soft NUMA may be automatically enabled depending on the number of vCPUs assigned to the instance. If your instance has a high number of vCPUs (over 8), you should check whether Soft NUMA is enabled by querying the sys.dm_os_sys_info DMV.
  3. Performance Considerations: Like with physical or virtual machines, Soft NUMA on EC2 can help SQL Server optimize CPU scheduling. But the performance benefits will depend on the workload. Be sure to test performance after enabling or disabling Soft NUMA, as the results can vary depending on the instance type and workload.
  4. Recommendation: For EC2, it’s important to test both enabling and disabling Soft NUMA to assess the impact on your SQL Server performance. You can also check AWS’s best practices for SQL Server on EC2 to ensure optimal instance configuration.

In summary, Soft NUMA works on EC2 instances with multiple vCPUs, but it’s recommended to monitor and test performance changes to confirm its effectiveness.
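To check the current configuration, the softnuma_configuration columns of sys.dm_os_sys_info can be queried (the socket_count and cores_per_socket columns are available from SQL Server 2016 SP1):

```sql
-- softnuma_configuration: 0 = OFF (hardware NUMA default),
-- 1 = automatic Soft-NUMA, 2 = manual (configured via registry)
SELECT softnuma_configuration,
       softnuma_configuration_desc,
       socket_count,
       cores_per_socket
FROM sys.dm_os_sys_info;
```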

SQL Server Instance Configuration

SQL Server instances often have several default configuration settings that are based on legacy expectations, which may not be optimal for modern servers. While some of these defaults are generally acceptable, others should be reviewed and adjusted to ensure optimal performance, especially for modern deployments. The following list outlines important settings that should be reviewed and adjusted as part of the standard configuration for your SQL Server instances, regardless of your high availability (HA) or disaster recovery (DR) architecture.

Instance-Level Configuration Recommendations:

SQL Server has many instance-level configuration settings that should be reviewed and adjusted before making changes to the databases themselves.

1. Cost Threshold for Parallelism

The Cost Threshold for Parallelism setting controls when SQL Server should execute a query in parallel. Each query in SQL Server is assigned a “cost” based on its estimated difficulty and resource usage.

  • Default value: 5 (unchanged since SQL Server 7.0).
  • How it works: Queries with an estimated cost below the threshold run on a single thread. Queries with a cost above the threshold may run in parallel across multiple threads.

However, the default value of 5 is outdated and doesn’t align with modern computing capabilities. Computing power and database efficiency have increased enormously since then, and a low cost threshold can cause SQL Server to run simple, low-cost queries in parallel, which is inefficient: parallelizing trivial queries creates coordination overhead and consumes CPU resources that could be better used elsewhere.

Recommended Action:

  • Start by setting the cost threshold to 25 for modern systems, and monitor performance to see whether it should be adjusted further.
  • Why: A higher threshold prevents simple queries from executing in parallel unnecessarily, which leads to better resource management and overall performance.

By adjusting the cost threshold for parallelism, you ensure that SQL Server makes more intelligent decisions about parallel query execution, leading to better system efficiency and performance.
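The change can be applied with sp_configure; it takes effect immediately after RECONFIGURE, with no restart required:

```sql
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'cost threshold for parallelism', 25;
RECONFIGURE;
```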

Database-Level Alerts

Although the SQL Server Agent is enabled, its alerting feature is currently not being utilized on either instance. It is recommended to enable alerts for error numbers 823, 824, and 825, as well as errors with severity levels 16 through 25. This will allow email or Teams channel notifications when critical errors occur.

If other monitoring software is in place, SQL Server Agent alerts can serve as a secondary, real-time layer of notification for these important errors.

Email Alerts: Due to security concerns, it’s recommended to use a dedicated email account exclusively for SQL alerts. Alternatively, we can consider sending alerts to a Teams channel for better security and faster response, rather than using email alerts.

While tools like DPA (Database Performance Analyzer) are useful, they are not real-time and tend to update every 5 minutes. Enabling instant alerts through SQL Server Agent ensures a more proactive approach to addressing issues.

Example Custom Alerts: For improved monitoring, we could consider setting up custom alerts for long-running queries and blocking situations that exceed 5-10 minutes.
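As a sketch, an alert for error 823 with an email notification can be created as follows; the operator name ‘DBA Team’ is hypothetical and must already exist in msdb, and the same pattern repeats for errors 824 and 825 and for severities 16 through 25:

```sql
EXEC msdb.dbo.sp_add_alert
     @name = N'Error 823 - I/O Hard Error',
     @message_id = 823,
     @severity = 0,                      -- 0 when alerting on a specific message_id
     @include_event_description_in = 1;  -- include the error text in the notification

EXEC msdb.dbo.sp_add_notification
     @alert_name = N'Error 823 - I/O Hard Error',
     @operator_name = N'DBA Team',
     @notification_method = 1;           -- 1 = email
```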

Instance Level Trace Flags

SQL Server utilizes trace flags as configuration settings to enable or disable specific features of the database engine. Enabling certain trace flags can improve performance and operational efficiency. Below are some recommended trace flags, which should be enabled by default, after proper testing:

  • Trace Flag 1117: Ensures data files auto-grow at the same rate, helping reduce load balancing issues in SQL Server. (Note: This is no longer necessary on SQL Server 2016 and above.)
  • Trace Flag 1118: Forces the use of uniform extents instead of mixed extents, minimizing contention in the extent allocation. (Note: This is no longer necessary on SQL Server 2016 and above.)
  • Trace Flag 2371: Improves the efficiency of statistics auto-update.
  • Trace Flag 3226: Suppresses successful-backup messages in the SQL Server error log, as this information is already contained within the backup and job history.
  • Trace Flag 4199: Enables query optimizer improvements. (See Note)

These trace flags can be applied at SQL Server service startup through an instance-level startup script. Once configured, the script can be mass deployed, placed in source control, and audited to verify that these changes are present on individual instances.

Important Note: I have implemented these trace flags in a SQL Agent job rather than configuring them in the server’s startup configuration.
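As a sketch of that approach, a single job step scheduled to run ‘when SQL Server Agent starts’ could enable the flags globally (1117 and 1118 are omitted here because they are unnecessary on SQL Server 2016 and above):

```sql
-- The -1 argument applies each trace flag globally
DBCC TRACEON (2371, -1);
DBCC TRACEON (3226, -1);
DBCC TRACEON (4199, -1);  -- see the note on 4199 before enabling

-- List the trace flags currently active globally
DBCC TRACESTATUS (-1);
```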

Security Considerations:

Some organizational security teams may not approve of using startup stored procedures, as they can potentially allow for the injection of malicious code. Therefore, it is essential to validate the use of these trace flags with your security team to ensure that they are acceptable within your organization’s security policies.

Query Optimizer Fixes (Trace Flag 4199)

Prior to SQL Server 2016, enabling query optimizer improvements required using trace flag 4199 at the instance level, which applied to the entire server. Since then, Microsoft has provided a database-level option, QUERY_OPTIMIZER_HOTFIXES, to enable these optimizer fixes per database. While most users may not notice significant changes, some improvements can enhance efficiency in specific processes. It is recommended to either enable trace flag 4199 on legacy SQL Server instances or use the database-level setting for appropriate databases. If your database is part of an Availability Group, this setting can be enabled or disabled on each replica as needed.
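The database-scoped equivalent of trace flag 4199 (available from SQL Server 2016) can be enabled per database; the database name below is hypothetical:

```sql
USE [YourDatabase];
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;

-- On an Availability Group, the secondary-replica behavior can be set independently
-- (PRIMARY means "follow whatever the primary replica uses"):
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET QUERY_OPTIMIZER_HOTFIXES = PRIMARY;
```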

TempDB Configuration:

  • TempDB is typically configured during the SQL Server installation process, but it’s important to review its file count and size settings to avoid performance issues. On SQL Server 2014 and earlier, ensure that trace flags 1117 and 1118 remain enabled to maintain consistent growth across the TempDB data files; on SQL Server 2016 and above, this behavior is the default for TempDB.
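As an illustrative sketch only (actual file count and sizes depend on core count and workload), TempDB files can be pre-sized so that auto-growth events are rare:

```sql
-- tempdev/templog are the default logical file names; the sizes are assumptions
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 8192MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 4096MB, FILEGROWTH = 512MB);
```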

File Count, File Groups, and File Placement (Note: I am more familiar with this setting in VMware, but the alternative for AWS is outlined below.)

When managing SQL Server file placement, it’s important to consider the underlying storage configuration to optimize performance. Both VMware and AWS provide solutions that perform I/O coalescing (queueing) by disk and disk controller, ensuring efficient data access. Here’s how each platform optimizes file placement:

VMware

In VMware environments, the Paravirtual SCSI (PVSCSI) controller is recommended for high-performance I/O operations. This controller enables the fastest possible performance from the underlying platform to the operating system. By using PVSCSI, VMware reduces queueing inside the OS, which improves the overall efficiency of SQL Server’s workload.

Key Considerations:

  • Reduced Queueing: The PVSCSI controller minimizes latency and enhances I/O performance.
  • Workload Balancing: It’s crucial for SQL Server DBAs to monitor and balance the SQL Server workload across the available disk controllers and their respective queues.
  • Diagnostics: DBAs can use the sys.dm_io_virtual_file_stats diagnostic query to monitor workload properties by file, including stall or latency rates for individual file operations.
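A minimal version of that diagnostic query, deriving the average stall per read and per write for every database file:

```sql
-- Average I/O latency per file since instance startup
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS avg_read_latency_ms,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```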

AWS EBS NVMe Interface

In AWS environments, the EBS NVMe Interface serves as an alternative for high-performance file I/O. This interface is optimized for low-latency access and provides high throughput for workloads that require fast disk performance.

Best Practice:

  • EBS-Optimized: Ensure that EBS-Optimized is enabled (this is typically enabled by default on most instances). This setting ensures that the highest possible dedicated IOPS (Input/Output Operations Per Second) and throughput are available to your EC2 instances.

By configuring EBS-Optimized instances and utilizing NVMe interfaces, you can achieve better performance in AWS compared to standard EBS volumes. Both platforms (VMware and AWS) provide strong performance capabilities, but it’s important to fine-tune your environment based on your specific needs.

Database Owner

The database owner should never be assigned to an individual user. If that user leaves the organization or their account is disabled, the database could go offline since the owner no longer exists or is available. To avoid this, ensure the database owner is set to either the SA (System Administrator) account or a service account that remains active and accessible.
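A quick check for databases owned by individual logins, plus the fix (the database name is hypothetical):

```sql
-- Databases whose owner is not sa
SELECT name, SUSER_SNAME(owner_sid) AS owner_name
FROM sys.databases
WHERE SUSER_SNAME(owner_sid) <> N'sa';

-- Reassign ownership to sa (or to an always-available service account)
ALTER AUTHORIZATION ON DATABASE::[YourDatabase] TO [sa];
```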

Optimize for Ad-Hoc Workloads

This setting impacts SQL Server’s plan cache behavior. By default, SQL Server saves a query plan in the cache each time a query is executed. If the same query is run again, the plan is retrieved from the cache, saving time and resources. However, if a query is rarely repeated, its cached plan continues to consume memory, potentially bloating the plan cache over time.

Enabling the “Optimize for Ad Hoc Workloads” option addresses this issue. When this setting is active, the first time a query is run, SQL Server doesn’t store the full plan. Instead, only a small “stub” is saved, which takes up minimal memory. If the query is run again, the full plan is cached.

In environments with many ad hoc queries, enabling this option can significantly reduce memory usage and improve the efficiency of the plan cache. There’s virtually no downside to enabling this feature on all SQL Server instances, and it can lead to better memory management without sacrificing performance.
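Enabling the option is a small sp_configure change that takes effect without a restart:

```sql
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'optimize for ad hoc workloads', 1;
RECONFIGURE;
```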

Using TRUNCATE in SQL Replication

Using TRUNCATE in SQL replication can be a great way to quickly clear large tables while keeping their structure intact. Unlike DELETE, TRUNCATE is much faster and uses fewer resources, making it an attractive option for performance optimization. However, it comes with some limitations, especially when foreign key constraints are involved.

Advantages of Using TRUNCATE in SQL Replication:

  • Faster Performance: TRUNCATE is much quicker than DELETE because it doesn’t log individual row deletions, and it consumes fewer system and transaction log resources.
  • Table Structure Preservation: TRUNCATE removes all rows but keeps the table schema intact, meaning you don’t need to reconfigure the table structure for replication.
  • Safety: Since TRUNCATE can be rolled back within a transaction, it adds an extra layer of protection.

Things to Consider:

  • Foreign Key Constraints: TRUNCATE can’t be used on a table that is referenced by a foreign key. Disabling the constraint is not sufficient; the foreign key must be dropped and re-created, which introduces risk and should be done cautiously.
  • Replication Restrictions: TRUNCATE TABLE cannot be run against tables that are published in transactional or merge replication, so plan where in the replication topology the truncation occurs.

Other Recommendations:

  • Max Server Memory and Compression: Set Max Server Memory appropriately, and enable backup compression at the instance level in case the maintenance job isn’t configured to compress backups.
  • Routine Database Maintenance: It’s highly recommended to use Ola Hallengren’s maintenance scripts for routine tasks like backups, index optimization, and consistency checks.
  • Naming Conventions: Ensure SQL Agent jobs follow a consistent naming convention. Each job should include a description and be categorized for easier identification (refer to the example in my current Dev environment).
