SQL Server Machine Learning Services: a practical guide for DBAs, developers, and data-warehouse teams

This is a hands-on walkthrough of SQL Server Machine Learning Services (MLS). You will see what it is, when to use it, how to install and secure it on Windows and Linux, and how to build fast, reliable scoring pipelines you can surface in reports. I’ll keep the voice practical and call out tips by role. References are collected at the end.


What is SQL Server Machine Learning Services?

Machine Learning Services lets you run Python and R next to your data using the sp_execute_external_script procedure. You can use open-source packages or Microsoft’s packages for training and scoring, and you do not have to ship large datasets across the network. The feature first appeared as R Services in SQL Server 2016, added Python in 2017, and continues in newer versions.

Key pieces you will touch:

  • Extensibility platform that hosts external runtimes and brokers data in and out of SQL.
  • SQL Server Launchpad service that spins up isolated worker processes for Python or R.
  • sp_execute_external_script to run code and pass DataFrames to and from T-SQL.

You can also do native scoring with the T-SQL PREDICT function for ONNX models, which avoids spinning up external runtimes and is ideal for low-latency scoring.


Why teams use it

  • DBA: keep data in place, schedule and audit ML tasks, and throttle ML jobs with Resource Governor so analytics cannot starve OLTP.
  • Developer: package business logic as a proc that returns predictions, then bind it to apps, APIs, SSRS RDL, or Power BI.
  • Data warehouse: train or score near your star schema, use columnstore to feed models, and switch to PREDICT for production scoring.

Supported platforms and install

Windows

Use the SQL setup UI to add Machine Learning Services (Python and/or R). After install, enable the server option and restart:

EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;
-- Restart the SQL Server service (Launchpad is restarted with it)

Verify it is on:

EXEC sp_configure 'external scripts enabled';

The Launchpad service isolates external code with Windows AppContainers since SQL Server 2019. You do not have to change your code for this isolation model.

Linux

MLS is supported on major distros for SQL Server 2019 and 2022. Install the ML components, then manage with mssql-conf and systemctl like other SQL features.

First proof: run Python and R from T-SQL

-- Python: return a simple DataFrame
EXEC sp_execute_external_script
  @language = N'Python',
  @script   = N'import pandas as pd
df = pd.DataFrame({ "hello":["world"], "pi":[3.14159] })
OutputDataSet = df';
-- R: summarize a tiny dataset
EXEC sp_execute_external_script
  @language = N'R',
  @script   = N'df <- data.frame(x=c(1,2,3)); OutputDataSet <- summary(df)';

Both calls go through Launchpad into a sandboxed worker process. The engine streams your input rows to the script and streams result rows back.


Security, permissions, and isolation

  • MLS is off by default. Turn it on with external scripts enabled and restart.
  • Grant users least privilege. Non-admins need the EXECUTE ANY EXTERNAL SCRIPT permission in each database where they run code.
  • Launchpad uses worker identities and containers to isolate sessions. Review the MLS security architecture if you have strict controls.

Managing packages

You can pre-load libraries per database with CREATE EXTERNAL LIBRARY. For Python, zip the wheel into a .zip before upload. This keeps package versions deterministic for your proc.

CREATE EXTERNAL LIBRARY [scikit_learn]
FROM (CONTENT = 0x... /* zipped wheel bytes */)
WITH (LANGUAGE = 'Python');

Governance and reliability

Use Resource Governor external pools to cap CPU, memory, and I/O for ML jobs. This is critical on shared servers.

-- Example: limit ML jobs to 30% of CPU and 20% of memory
CREATE EXTERNAL RESOURCE POOL ml_pool
WITH (MAX_CPU_PERCENT = 30, MAX_MEMORY_PERCENT = 20);
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Map a workload group to ml_pool as needed (see docs)
SELECT * FROM sys.resource_governor_external_resource_pools;

If Launchpad fails to start code, check the service account, temp paths, and access to the working directories. Community posts list common fixes for Launchpad temp locations and permissions.


Patterns that work in the real world

1) In-database scoring with Python or R (simple and flexible)

Good for ad-hoc analytics, prototypes, data quality checks, and scheduled batch scoring.

-- Score with a Python model stored in a table
DECLARE @model varbinary(max) =
  (SELECT TOP(1) ModelBytes FROM dbo.Models WHERE ModelName='churn_v1');

EXEC sp_execute_external_script
  @language = N'Python',
  @script = N'
import pandas as pd, pickle, io
buf = io.BytesIO(model)
clf = pickle.load(buf)
pred = clf.predict_proba(InputDataSet[["Tenure","ARPU","Tickets"]])[:,1]
OutputDataSet = pd.DataFrame({"CustomerID": InputDataSet.CustomerID, "ChurnProb": pred})
',
  @params = N'@model varbinary(max)',
  @model  = @model,
  @input_data_1 = N'SELECT CustomerID, Tenure, ARPU, Tickets FROM dbo.CustomerFeatures';

This is easy to evolve, but it starts an external runtime per call. Use Resource Governor and schedule it off-peak.

2) Native scoring with ONNX and PREDICT (fast and scalable)

Train anywhere, convert to ONNX, store the model, then score with pure T-SQL.

-- Simplified: load an ONNX model into a table
INSERT dbo.ONNXModels(ModelName, ModelBytes)
SELECT 'churn_v2', BulkColumn
FROM OPENROWSET(BULK 'C:\models\churn_v2.onnx', SINGLE_BLOB) AS f;

-- Score without launching Python/R: PREDICT is a table operator in the FROM clause
DECLARE @model varbinary(max) =
  (SELECT TOP(1) ModelBytes FROM dbo.ONNXModels WHERE ModelName = 'churn_v2');

SELECT c.CustomerID, p.ChurnProb
FROM PREDICT(MODEL = @model, DATA = dbo.CustomerFeatures AS c)
WITH (ChurnProb float) AS p;

PREDICT eliminates Launchpad overhead and is ideal for real-time or high-throughput workloads. It is available in SQL Server 2017 and later.

3) Train elsewhere, score in SQL

A common production approach is to train in a dedicated environment (Azure ML, Databricks, or your own Python farm), export to ONNX, and deploy to SQL for scoring. This keeps SQL focused on what it does best while giving you repeatable, low-latency predictions. You can even validate the ONNX path on small edge deployments.


Operationalizing: jobs, monitoring, and drift checks

  • Use SQL Agent to schedule model refreshes and scoring jobs.
  • Log metrics to a table and alert if AUC or accuracy drops.
  • Keep your model binaries under source control and version them in a Models table.
  • Patch regularly. CUs often include fixes for the engine and extensibility.
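The drift check in the second bullet can be sketched in plain Python: compute AUC from the logged (label, score) pairs and alert when it falls below a floor. The 0.75 threshold and both function names are illustrative choices, not a standard:

```python
# Sketch of a drift check over logged predictions (pure stdlib).
def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation.
    Ties are not rank-averaged in this sketch."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = 0.0
    for rank, (_, label) in enumerate(pairs, start=1):
        if label == 1:
            rank_sum += rank
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def drifted(labels, scores, threshold=0.75):
    """True when discrimination has degraded below the floor."""
    return auc(labels, scores) < threshold
```

In practice you would feed this from the metrics table each night and raise an Agent alert when drifted() returns True.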

Performance tips

  • Push joins, filters, and aggregations to SQL. Give Python/R only the columns and rows they need.
  • Prefer native scoring with PREDICT when latency matters. Save sp_execute_external_script for exploration, training, or exotic packages.
  • Partition large feature tables, and consider columnstore when your models read wide fact tables.
  • Cap ML with Resource Governor and watch worker concurrency under Launchpad. You can tune the number of security contexts (containers) if you need more parallel workers.

How to show the results to users

Your scoring outputs are just tables or views, so you can surface them anywhere you already report from:

  • Paginated reports (RDL) on SSRS for operational lists, invoices, mailers. Users can filter by date or score band and export to PDF or Excel.
  • Power BI for interactive visuals or embedded scorecards. If you prefer paginated in the Power BI service, publish RDLs built with Power BI Report Builder.

End-to-end example: churn scores in production

  1. Prep
    • Fact table CustomerFeatures with keys and features.
    • Columnstore on the fact if it is large.
  2. Train
    • Train a model in Python outside SQL, convert to ONNX. Store as dbo.ONNXModels.
  3. Score
    • Create a view v_ChurnScores that uses PREDICT on the latest model.
  4. Serve
    • SSRS RDL lists customers with ChurnProb > 0.7 for retention outreach, scheduled at 7 a.m. daily.
  5. Govern
    • External pool ml_pool caps ML to 30% CPU. Track job times and drift metrics in a table.

Common pitfalls

  • External scripts are disabled and you forgot to restart after enabling. Check sp_configure.
  • Package hell across dev, test, prod. Use CREATE EXTERNAL LIBRARY to pin versions.
  • Launchpad temp and permissions cause runtime errors. Verify service identity and temp paths.
  • Unbounded resource use from large DataFrames. Filter early, batch work, and use Resource Governor.
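For the last pitfall, "batch work" often just means chunking the key range client-side and issuing one scoring call per chunk. A minimal stdlib sketch of the chunking (the SQL round-trip itself is only described in the comment):

```python
# Sketch: split customer IDs into fixed-size batches so each
# sp_execute_external_script call sees a bounded DataFrame.
def batches(ids, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# Each batch would drive one scoring call, e.g. by filtering the
# @input_data_1 query with WHERE CustomerID BETWEEN @lo AND @hi.
```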

Summary

SQL Server Machine Learning Services brings Python and R to your data so you can build practical analytics without moving rows around. Use sp_execute_external_script for flexible processing and exploration. For production scoring, prefer PREDICT with ONNX because it is fast and simple to operate. Keep your instance safe by granting only the permissions needed and by capping ML with Resource Governor. Then surface the results in SSRS RDL or Power BI, the same way you deliver any other dataset.


Final thoughts

Start with one use case that matters. Build a tiny model, prove the value, and wire it into the same delivery paths your users already trust. Treat models like code: version them, test them, and monitor them. Over time you will shift more analytics to native scoring and keep external runtimes for training and specialty tasks. That balance usually gives you the best performance and the least operational risk.


References

  • Overview and docs hub: What is SQL Server Machine Learning Services; documentation home. (Microsoft Learn)
  • Run code: sp_execute_external_script. (Microsoft Learn)
  • Enable MLS: external scripts enabled, restart note. (Microsoft Learn)
  • Windows isolation: AppContainer model for Launchpad. (Microsoft Learn)
  • Linux support: install MLS on RHEL/SLES/Ubuntu; supported features for 2019 and 2022. (Microsoft Learn)
  • Native scoring: PREDICT function and native scoring overview. (Microsoft Learn)
  • Packages: CREATE EXTERNAL LIBRARY and package rules. (Microsoft Learn)
  • Resource governance: throttle ML jobs with Resource Governor, external pools. (Microsoft Learn)
  • Show results: SSRS native mode web portal; publish RDL to Power BI service. (Microsoft Learn)

Workshop: Building Your First Predictive Model with SQL Server Machine Learning Services

Objective:
Learn how to enable Machine Learning Services, run a Python model inside SQL Server, and visualize the predictions in Power BI.

Part 1: Environment Setup

  1. Enable external scripts
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;

Restart SQL Server and verify:

EXEC sp_configure 'external scripts enabled';

  2. Check Launchpad service
Make sure SQL Server Launchpad is running. If not, start it through services.msc.

  3. Verify Python runtime

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'import sys; print(sys.version)';

Part 2: Prepare Data

Create a simple dataset for demonstration:

CREATE TABLE dbo.CustomerFeatures (
  CustomerID INT IDENTITY(1,1),
  Tenure INT,
  ARPU FLOAT,
  Tickets INT,
  Churn BIT
);

INSERT INTO dbo.CustomerFeatures (Tenure, ARPU, Tickets, Churn)
VALUES
(24, 90.5, 2, 0),
(6, 40.3, 5, 1),
(12, 55.1, 1, 0),
(36, 120.0, 3, 0),
(4, 35.2, 6, 1);

Part 3: Train a Model in Python from SQL

DECLARE @model VARBINARY(MAX);

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
from sklearn.linear_model import LogisticRegression
import pickle
model = LogisticRegression()
model.fit(InputDataSet[["Tenure","ARPU","Tickets"]], InputDataSet["Churn"])
trained_model = pickle.dumps(model)
',
    @input_data_1 = N'SELECT Tenure, ARPU, Tickets, Churn FROM dbo.CustomerFeatures',
    @params = N'@trained_model varbinary(max) OUTPUT',
    @trained_model = @model OUTPUT;

Store the model for reuse:

CREATE TABLE dbo.Models (ModelName NVARCHAR(50), ModelBytes VARBINARY(MAX));
INSERT dbo.Models VALUES ('churn_demo', @model);

Part 4: Score New Data

DECLARE @model VARBINARY(MAX) =
  (SELECT TOP(1) ModelBytes FROM dbo.Models WHERE ModelName = 'churn_demo');

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pandas as pd, pickle
clf = pickle.loads(model)
features = InputDataSet[["Tenure","ARPU","Tickets"]]
pred = clf.predict_proba(features)[:,1]
OutputDataSet = pd.DataFrame({"CustomerID": InputDataSet.CustomerID, "ChurnProb": pred})
',
    @params = N'@model varbinary(max)',
    @model = @model,
    @input_data_1 = N'SELECT CustomerID, Tenure, ARPU, Tickets FROM dbo.CustomerFeatures';
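Before wiring this into a schedule, it helps to dry-run the same pickle round-trip locally, outside SQL Server, so Python errors surface in a normal shell rather than in Launchpad logs. This sketch assumes scikit-learn and mirrors the demo rows:

```python
# Local dry run of the train/serialize/score round-trip the two procs perform.
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy rows as dbo.CustomerFeatures: Tenure, ARPU, Tickets -> Churn.
X = np.array([[24, 90.5, 2], [6, 40.3, 5], [12, 55.1, 1],
              [36, 120.0, 3], [4, 35.2, 6]])
y = np.array([0, 1, 0, 0, 1])

model_bytes = pickle.dumps(LogisticRegression().fit(X, y))  # what lands in dbo.Models
clf = pickle.loads(model_bytes)                             # what the scoring proc does
probs = clf.predict_proba(X)[:, 1]                          # ChurnProb column
```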

Part 5: Visualize in Power BI

  1. Open Power BI Desktop.
  2. Connect to your SQL Server instance.
  3. Import the dbo.CustomerFeatures table and join it with your prediction results.
  4. Create visuals:
    • Bar chart showing Churn Probability by Tenure
    • Gauge or KPI card for Average Churn Probability

Part 6: Discussion and Extensions

  • Convert this workflow to use ONNX and native scoring with PREDICT.
  • Create an Agent job to refresh predictions weekly.
  • Add drift monitoring by logging accuracy and AUC into a ModelMetrics table.

Outcome:
By the end of this workshop, participants will have a working end-to-end pipeline for training and scoring models directly in SQL Server, along with a Power BI dashboard to share results with stakeholders.

