Introduction
SQL Server 2025 continues expanding its native regular expression capabilities with two highly practical functions:
• REGEXP_SUBSTR() – extract a substring that matches a pattern
• REGEXP_REPLACE() – replace substrings that match a pattern
These functions eliminate years of workarounds involving CHARINDEX, SUBSTRING, PATINDEX, XML methods, and CLR helpers. With SQL Server 2025, you can extract and transform text using concise, standards-aligned RegEx logic right inside the engine.
This article provides a full deep dive into how these functions work, how to use them correctly, and how they apply to real enterprise workloads such as data cleansing, auditing, ETL, and text normalization.
REGEXP_SUBSTR(): Extracting Substrings
Syntax
REGEXP_SUBSTR(expression, pattern [, position [, occurrence [, flags]]])
Parameters:
• expression – source text
• pattern – regular expression
• position – optional, starting character (default 1)
• occurrence – which match to return (default 1)
• flags – RegEx flags such as i (case-insensitive)
Basic Example
Extract the first sequence of digits from a text value.
SELECT REGEXP_SUBSTR('Order#4529 processed', '\d+') AS Extracted;
Output:
4529
Extracting Email Domains
Professional, real-world use case.
SELECT REGEXP_SUBSTR(Email, '@(.+)$') AS Domain
FROM EmployeeDirectory;
For an email john.smith@company.com, the result is:
@company.com
Combined with SUBSTRING or an additional pattern, you can extract just company.com.
Extracting Parts of an Address
Assume standardized formats:
SELECT REGEXP_SUBSTR(Address, '\b[0-9]{5}\b') AS ZipCode
FROM CustomerRecords;
Extracting Specific Occurrence
Extract the second numeric sequence:
SELECT REGEXP_SUBSTR('A12B34C56', '\d+', 1, 2) AS SecondNumber;
Output:
34
Case-Insensitive Substring Extraction
SELECT REGEXP_SUBSTR('DepartmentSales', 'sales', 1, 1, 'i');
REGEXP_REPLACE(): Transforming Text
Syntax
REGEXP_REPLACE(expression, pattern, replacement [, flags])
Typical Uses:
• Data cleansing
• Standardizing formats
• Removing invalid characters
• Masking sensitive data
• Normalizing user input
• Converting patterns into safer or standard values
Replacing Non-Alphanumeric Characters
Enterprise input validation:
SELECT REGEXP_REPLACE(Username, '[^A-Za-z0-9]', '') AS Cleaned
FROM UserAccounts;
Removes punctuation, spaces, and symbols.
Masking Sensitive Data
Mask everything except the last 4 digits.
SELECT REGEXP_REPLACE('123456789', '\d(?=\d{4})', '*') AS Masked;
Output:
Normalize Phone Numbers
Convert formats such as:
• (714) 555-1234
• 714-555-1234
• 714 555 1234
• 714.555.1234
into a consistent form (digits only).
SELECT REGEXP_REPLACE(Phone, '\D', '') AS NormalizedPhone
FROM Contacts;
Advanced Examples
Extract First Name and Last Name
Format: LastName, FirstName
SELECT
REGEXP_SUBSTR(FullName, '^[^,]+') AS LastName,
REGEXP_SUBSTR(FullName, '(?<=,\s).*') AS FirstName
FROM Employees;
Extract Version Numbers
SELECT REGEXP_SUBSTR('SQL2025-Version15.3.9', '\d+(\.\d+)*') AS Version;
Output:
15.3.9
Clean HTML Tags
SELECT REGEXP_REPLACE('<p>Hello World</p>', '<[^>]+>', '') AS CleanText;
Output:
Hello World
Performance Notes
REGEXP_SUBSTR and REGEXP_REPLACE are scalar functions. Their performance characteristics:
• Not SARGable
• Work best with VARCHAR(n) instead of NVARCHAR(MAX)
• Optimizer caches patterns for repeated use
• Can be CPU-heavy on large text columns
• Great for cleansing during ETL or one-time cleanup batches
• Avoid using on every row of a large OLTP workload
If performance becomes a concern:
• Pre-filter with LIKE
• Apply on staging tables
• Normalize data upstream when possible
Workshop: Hands-On RegEx Extraction and Replacement
Step 1. Create sample data
CREATE TABLE StringProcessing2025 (
Id INT IDENTITY,
InputValue VARCHAR(300)
);
INSERT INTO StringProcessing2025 (InputValue)
VALUES
('Invoice # 4529 – Approved'),
('Client: Smith, John'),
('Phone: (714) 555-1234'),
('Order2025-Version15.4'),
('Serial: AB-2025-XYZ');
Step 2. Extract first numeric value
SELECT
Id,
REGEXP_SUBSTR(InputValue, '\d+') AS ExtractedNumber
FROM StringProcessing2025;
Step 3. Extract second occurrence
SELECT
Id,
REGEXP_SUBSTR(InputValue, '\d+', 1, 2)
FROM StringProcessing2025;
Step 4. Replace non-digits
SELECT
Id,
REGEXP_REPLACE(InputValue, '\D', '') AS DigitsOnly
FROM StringProcessing2025;
Step 5. Standardize name format
Convert "Smith, John" → "John Smith":
SELECT
REGEXP_SUBSTR(InputValue, '(?<=,\s).+') + ' ' +
REGEXP_SUBSTR(InputValue, '^[^,]+') AS StandardizedName
FROM StringProcessing2025
WHERE InputValue LIKE 'Client:%';
Step 6. Remove labels or prefixes
SELECT
REGEXP_REPLACE(InputValue, '^[A-Za-z]+:\s*', '') AS Cleaned
FROM StringProcessing2025;
Final Thoughts
REGEXP_SUBSTR and REGEXP_REPLACE significantly upgrade SQL Server’s text processing capabilities. They allow DBAs and developers to handle validation, extraction, normalization, and transformation inside the database using concise and readable code. These features reduce the need for staging ETL code, CLR functions, and external scripting languages.
Tomorrow’s post, Day 8, focuses on REGEXP_INSTR() and REGEXP_COUNT(), which provide position-based searching and match-counting capabilities.
References
• REGEXP_SUBSTR Documentation
https://learn.microsoft.com/sql/t-sql/functions/regexp-substr-transact-sql
• REGEXP_REPLACE Documentation
https://learn.microsoft.com/sql/t-sql/functions/regexp-replace-transact-sql
• Regular Expression Reference
https://learn.microsoft.com/sql/t-sql/functions/regular-expression-reference
• SQL Server 2025 New Features
https://learn.microsoft.com/sql/sql-server/what-s-new-in-sql-server-2025
Discover more from SQLYARD
Subscribe to get the latest posts sent to your email.


