SQL Server 2025 makes “AI next to the data” practical by adding native vector storage, vector indexing and search, and first-class hooks to embedding endpoints such as Azure OpenAI through CREATE EXTERNAL MODEL. This lets you build semantic search and Retrieval-Augmented Generation (RAG) without standing up a separate vector database. [1] [2]
Below is an end-to-end walkthrough you can adapt for your own apps.
Prereqs (keep it simple)
You will need:
- SQL Server 2025 (17.x) and SSMS (optionally SSMS 21 for Copilot features). [3] [4]
- An embeddings endpoint:
  - Azure OpenAI (recommended for production) using Managed Identity, or
  - A local dev endpoint like Ollama. [5]
- Basic familiarity with JSON, REST concepts, and T-SQL. (Matches the module’s prereqs.) [1]
Architecture you are building
- Store documents (or rows of text) in a table.
- Chunk that text for better retrieval using AI_GENERATE_CHUNKS. [6]
- Generate embeddings into a VECTOR(n) column using AI_GENERATE_EMBEDDINGS backed by an EXTERNAL MODEL. [7] [5]
- Create a vector index using CREATE VECTOR INDEX. [8]
- Run semantic search:
  - Exact scoring with VECTOR_DISTANCE (no index), or
  - Approximate nearest neighbor search with VECTOR_SEARCH (uses the vector index). [9] [10]
- For RAG, retrieve top chunks in SQL, then pass those chunks to your app framework (LangChain or Semantic Kernel) to call the chat model. [1]
Step 1: Create a demo database and tables
This example uses a simple “knowledge base” table. You can swap in product docs, tickets, SOPs, student curriculum notes, or anything else.
CREATE DATABASE AIDemo;
GO
USE AIDemo;
GO
CREATE TABLE dbo.Documents
(
DocumentId int IDENTITY(1,1) PRIMARY KEY,
Title nvarchar(200) NOT NULL,
Body nvarchar(max) NOT NULL
);
CREATE TABLE dbo.DocumentChunks
(
ChunkId bigint IDENTITY(1,1) PRIMARY KEY,
DocumentId int NOT NULL,
ChunkText nvarchar(max) NOT NULL,
-- Use the embedding dimension your model returns (example: 1536, 1024, 768, etc.)
Embedding vector(1536) NULL,
CONSTRAINT FK_DocumentChunks_Documents
FOREIGN KEY (DocumentId) REFERENCES dbo.Documents(DocumentId)
);
GO
Tip: The vector datatype is native in SQL Server 2025 and stored efficiently while still being easy to work with. [2] [11]
Step 2: Load a little sample data
INSERT dbo.Documents (Title, Body)
VALUES
(N'League Rules', N'Teams register in advance. Match check in closes 30 minutes prior. Good sportsmanship is required.'),
(N'Coaching Notes', N'Focus on crosshair placement, movement discipline, and callouts. Review VODs weekly.'),
(N'Parent FAQ', N'Programs are coach led. Players improve teamwork, leadership, and communication skills.');
GO
Step 3: Chunk your text with AI_GENERATE_CHUNKS
Chunking improves retrieval quality because you search smaller, coherent passages instead of entire documents. SQL Server 2025 provides AI_GENERATE_CHUNKS as a table valued function. [6]
INSERT dbo.DocumentChunks (DocumentId, ChunkText)
SELECT
d.DocumentId,
c.chunk
FROM dbo.Documents AS d
CROSS APPLY AI_GENERATE_CHUNKS
(
source = d.Body,
chunk_type = N'FIXED',
chunk_size = 500,
overlap = 50
) AS c;
GO
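To build intuition for what fixed-size chunking with overlap produces, here is a minimal Python sketch of the same idea. It is a conceptual illustration only; the actual boundaries and options AI_GENERATE_CHUNKS uses may differ, so treat the splitting logic as an assumption.

```python
def fixed_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters,
    mirroring the chunk_size/overlap parameters used in the T-SQL above."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Each window starts `step` characters after the previous one,
    # so consecutive chunks share `overlap` characters of context.
    return [text[i:i + chunk_size] for i in range(0, len(text), step) if text[i:i + chunk_size]]

body = "Teams register in advance. Match check in closes 30 minutes prior. " * 20
chunks = fixed_chunks(body, chunk_size=500, overlap=50)
```

The overlap matters for retrieval: a sentence that would otherwise be cut at a chunk boundary still appears whole in at least one chunk.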
Step 4: Register your embeddings model with CREATE EXTERNAL MODEL
This is the “secure integration” step. SQL Server 2025 can store an external model definition and then reference it from T-SQL AI functions. [5]
Option A (recommended): Azure OpenAI with Managed Identity
The docs note that SQL Server 2025 requires Azure Arc connection and primary managed identity enabled for this approach. [5]
Use the Microsoft Learn pattern from CREATE EXTERNAL MODEL for an embeddings model. [5]
Option B (easy dev): Local Ollama embeddings
This is straight from the docs example pattern (adjust the URL and model name to your setup; note that all-minilm returns 384-dimensional embeddings, so the dimension of your VECTOR column must match whatever model you actually use). [5]
-- Example dev setup with Ollama embedding endpoint
CREATE EXTERNAL MODEL MyOllamaEmbeddings
AUTHORIZATION dbo
WITH
(
LOCATION = 'https://localhost:11435/api/embed',
API_FORMAT = 'Ollama',
MODEL_TYPE = EMBEDDINGS,
MODEL = 'all-minilm'
);
GO
Step 5: Generate embeddings into your VECTOR column
Now you can generate embeddings directly in SQL with AI_GENERATE_EMBEDDINGS ... USE MODEL .... [7]
UPDATE c
SET Embedding = AI_GENERATE_EMBEDDINGS(c.ChunkText USE MODEL MyOllamaEmbeddings)
FROM dbo.DocumentChunks AS c
WHERE c.Embedding IS NULL;
GO
If your endpoint needs parameters (like dimensions), the function supports passing JSON parameters as shown in the docs. [7]
Step 6: Create a vector index for fast ANN search
CREATE VECTOR INDEX builds an approximate nearest neighbor index so VECTOR_SEARCH can avoid scanning everything. The official docs show DiskANN and metric options like COSINE and DOT. [8]
CREATE VECTOR INDEX IX_DocumentChunks_Embedding
ON dbo.DocumentChunks (Embedding)
WITH (METRIC = 'COSINE', TYPE = 'DISKANN');
GO
To verify the index exists:
SELECT *
FROM sys.vector_indexes
WHERE object_id = OBJECT_ID('dbo.DocumentChunks');
GO
sys.vector_indexes is the catalog view for vector indexes. [12]
Step 7: Semantic search (exact vs indexed)
Exact scoring (no index): VECTOR_DISTANCE
VECTOR_DISTANCE is exact and does not use the vector index. It is great for validation, small datasets, or reranking a small candidate set. [9]
DECLARE @queryText nvarchar(max) = N'How do teams check in for matches?';
DECLARE @queryEmbedding vector(1536) =
AI_GENERATE_EMBEDDINGS(@queryText USE MODEL MyOllamaEmbeddings);
SELECT TOP (5)
c.DocumentId,
c.ChunkText,
VECTOR_DISTANCE('COSINE', c.Embedding, @queryEmbedding) AS Distance
FROM dbo.DocumentChunks AS c
ORDER BY Distance ASC;
GO
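The cosine metric behind that query is simple to verify by hand. This Python sketch computes the same quantity VECTOR_DISTANCE('COSINE', ...) scores with (1 minus cosine similarity) and shows exact search as a full scan followed by an ascending sort; the tiny vectors are made-up sample data.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for identical directions, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Exact search = score every row, then order ascending (smaller distance = closer).
chunks = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.9, 0.1]}
query = [1.0, 0.0]
ranking = sorted(chunks, key=lambda k: cosine_distance(chunks[k], query))
```

This is exactly why VECTOR_DISTANCE scales linearly with row count: every embedding is scored. The vector index in the next section exists to avoid that full scan.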
Fast ANN search (uses index): VECTOR_SEARCH
VECTOR_SEARCH is the function designed to use vector indexes for approximate nearest neighbor search. [10] [9]
DECLARE @queryText nvarchar(max) = N'How do teams check in for matches?';
DECLARE @queryEmbedding vector(1536) =
AI_GENERATE_EMBEDDINGS(@queryText USE MODEL MyOllamaEmbeddings);
SELECT
c.DocumentId,
c.ChunkText,
s.distance
FROM VECTOR_SEARCH
(
TABLE = dbo.DocumentChunks AS c,
COLUMN = Embedding,
SIMILAR_TO = @queryEmbedding,
METRIC = 'cosine',
TOP_N = 5
) AS s
ORDER BY s.distance ASC;
GO
The exact argument names can vary by doc version, so always align your call to the current VECTOR_SEARCH reference page. [10]
Step 8: Turn semantic search into RAG
RAG is just two steps:
- Retrieve the best chunks (you already did that with VECTOR_SEARCH).
- Provide those chunks as context to a chat model in your app layer (LangChain or Semantic Kernel), then ask your question. The Microsoft Learn module explicitly calls out building RAG apps using T-SQL plus frameworks like LangChain and Semantic Kernel. [1]
Pattern: a stored procedure that returns “context”
CREATE OR ALTER PROCEDURE dbo.GetRagContext
(
@Question nvarchar(max),
@TopN int = 5
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @q vector(1536) =
AI_GENERATE_EMBEDDINGS(@Question USE MODEL MyOllamaEmbeddings);
SELECT
c.DocumentId,
c.ChunkText,
s.distance
FROM VECTOR_SEARCH
(
TABLE = dbo.DocumentChunks AS c,
COLUMN = Embedding,
SIMILAR_TO = @q,
METRIC = 'cosine',
TOP_N = @TopN
) AS s
ORDER BY s.distance ASC;
END
GO
Then in your app (LangChain or Semantic Kernel)
High level flow:
- Call dbo.GetRagContext with the user question.
- Concatenate the returned ChunkText values into a context block.
- Send a prompt like: “Answer using only this context…”.
- Call Azure OpenAI chat completion (or your chosen model).
That keeps SQL Server focused on retrieval and your app focused on generation, which is typically the cleanest production split. [1]
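The context-assembly step is plain string work in the app layer. Here is a minimal Python sketch; the function name and prompt wording are illustrative, not from LangChain or Semantic Kernel, and the chunks would come back from dbo.GetRagContext in a real app.

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks into a grounded prompt for a chat model."""
    # Number the chunks so the model (and you, when debugging) can see
    # which passage each piece of context came from.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only this context. If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "How do teams check in for matches?",
    ["Match check in closes 30 minutes prior.", "Teams register in advance."],
)
```

The “say so if it's not in the context” instruction is the cheap, standard guard against the model answering from its own training data instead of your retrieved chunks.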
Step 9: Build “intelligent apps” with streaming changes (Change Event Streaming)
If your knowledge base changes frequently, you can stream table changes out to event consumers (for example, to trigger re-embedding, refresh caches, or update downstream systems). SQL Server 2025 introduces Change Event Streaming (preview), which can be configured to publish change events to Azure Event Hubs. [13] [14]
At a high level, Microsoft’s configuration steps are:
- Create an Event Hubs namespace and event hub.
- Enable CES on the database.
- Create an event stream group that defines destination and credentials.
- Add tables to the group. [14]
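On the consumer side, the re-embedding trigger reduces to a small dispatch decision per event. This Python sketch is purely illustrative: the event field names ("table", "operation") are assumptions, so align them with the actual CES event payload schema before relying on them.

```python
def handle_change_event(event: dict) -> str:
    """Decide what a downstream consumer should do with a change event.

    Field names are illustrative assumptions, not the real CES schema."""
    if event.get("table") != "dbo.Documents":
        return "ignore"
    op = event.get("operation")
    if op in ("INSERT", "UPDATE"):
        # Body text changed: regenerate chunks and embeddings for this document.
        return "re-embed"
    if op == "DELETE":
        # Source document gone: remove its chunks so stale text stops matching.
        return "purge-chunks"
    return "ignore"
```

The key design point is that the embedding refresh becomes event-driven instead of a scheduled full re-scan of the Documents table.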
Step 10: Optional analytics path with Microsoft Fabric mirroring
If you want near real time analytics without building fragile ETL, Fabric Mirroring can continuously replicate SQL Server 2025 data into Fabric. It requires Azure Arc for SQL Server 2025, and Microsoft provides a tutorial with the required permissions and setup flow. [15] [16]
Step 11: Use Copilot in SSMS (optional but helpful)
Copilot in SSMS can help you draft and refactor T-SQL, and answer questions about your database inside SSMS. The overview and install steps are documented by Microsoft. [3] [4]
References
[1] Microsoft Learn training module: “Build AI powered solutions using SQL Server 2025”.
[2] Microsoft Learn: Vector data type (overview).
[3] Microsoft Learn: Copilot in SQL Server Management Studio overview.
[4] Microsoft Learn: Install Copilot in SSMS.
[5] Microsoft Learn: CREATE EXTERNAL MODEL (Transact-SQL) examples including Azure OpenAI (Managed Identity) and Ollama.
[6] Microsoft Learn: AI_GENERATE_CHUNKS (Transact-SQL).
[7] Microsoft Learn: AI_GENERATE_EMBEDDINGS (Transact-SQL) examples and patterns.
[8] Microsoft Learn: CREATE VECTOR INDEX (Transact-SQL).
[9] Microsoft Learn: VECTOR_DISTANCE (Transact-SQL) behavior and index note.
[10] Microsoft Learn: VECTOR_SEARCH (Transact-SQL) purpose and ANN behavior.
[11] Microsoft Learn: Vector search and vector indexes overview.
[12] Microsoft Learn: sys.vector_indexes catalog view.
[13] Microsoft Learn: Change event streaming overview (preview).
[14] Microsoft Learn: Configure change event streaming steps.
[15] Microsoft Learn: Fabric mirroring from SQL Server (requirements and notes).
[16] Microsoft Learn: Tutorial to configure Fabric Mirroring from SQL Server.