1. Introduction — What Is Microsoft Fabric and Why Should You Care?
For two decades, building a modern data platform meant assembling a collection of separately licensed, separately configured, separately monitored services. You had Azure Data Factory for ingestion, Azure Data Lake Storage Gen2 for raw storage, Azure Synapse Analytics for the warehouse, Azure Databricks for Spark engineering, Power BI Premium for reporting, and Azure Purview for governance. Each one was excellent at its job. But together they created an operational puzzle that required specialist expertise to wire up, maintain, and govern.[1]
Microsoft Fabric changes that equation. Launched in general availability in November 2023, Fabric is an end-to-end analytics platform delivered as a single SaaS product. It unifies data engineering, data warehousing, real-time analytics, data science, and business intelligence under one roof, all sharing a single logical data lake called OneLake.[2]
Think of it this way: if Azure was a set of best-in-class tools purchased from different shops, Fabric is those same tools redesigned to work together from a single factory. The data doesn’t need to move. The security model is shared. Billing is consolidated. And every engineer, analyst, and data scientist is working on the same copy of the truth.
2. Why Fabric? The Business and Technical Case
Unified Storage — One Copy of the Truth
Every Fabric workload — Spark notebooks, T-SQL queries, Power BI reports, real-time streams — reads from and writes to the same OneLake. There is no copying data between a lake, a staging area, and a warehouse. When a data engineer transforms a Delta table in a notebook, that same table is immediately visible to a SQL developer in the Warehouse and to a Power BI report through Direct Lake.[3]
Open Formats by Default
OneLake stores everything in Delta Parquet format — the same open format used by Apache Spark, Databricks, and the broader data engineering ecosystem. You are never locked into a proprietary binary format that requires migration tools to escape.[3]
SaaS Simplicity
Fabric is delivered as software-as-a-service. You buy capacity (F-SKUs measured in Capacity Units), assign workspaces to that capacity, and start building. There is no infrastructure to provision, no Spark clusters to size, no storage accounts to configure. The platform manages the compute layer automatically.[1]
Lower Total Cost of Ownership
Organizations replacing five or six separately licensed Azure services with a single Fabric capacity often see significant reductions in cost, operational overhead, and time-to-delivery. We covered this in depth in the Fabric Cost Analysis post on SQLYARD.
Microsoft 365 Integration
Fabric is built into the Microsoft 365 trust boundary. Every workspace is backed by Entra ID. Sensitivity labels from Microsoft Purview Information Protection flow automatically through the platform.[4]
3. Why NOT Fabric? Honest Trade-offs
No platform is the right answer for every situation. Here is an honest look at scenarios where Fabric may not be the best choice.
| Scenario | Why Fabric May Not Be the Right Fit | Alternative to Consider |
|---|---|---|
| Heavy custom Spark engineering with fine-grained cluster control | Fabric Spark is managed — you cannot pin specific Spark versions, install arbitrary native libraries, or use custom Docker images. | Azure Databricks |
| Strict multi-cloud or cloud-agnostic requirements | Fabric is Microsoft-first. While OneLake supports S3 shortcuts, the control plane is Azure. | Databricks on AWS/GCP |
| Ultra-low latency OLTP workloads (<5ms response) | Fabric Warehouse and Lakehouse SQL endpoints are analytical engines, not OLTP databases. | Azure SQL Database, SQL Server |
| Legacy SSIS packages you cannot rewrite | Fabric does not run .dtsx files natively. | Azure Data Factory SSIS Integration Runtime |
| Strict on-premises data residency with no cloud allowance | Fabric is cloud-only with fixed regional anchoring per capacity. | SQL Server on-prem with SSAS, SSRS |
| Budget too small for a paid capacity | Trial capacities expire and free tiers are limited. | Azure Synapse Analytics Serverless |
For more on the Databricks vs Fabric decision, the SQLYARD comparison post walks through this in detail.
4. Every Major Fabric Component Explained
Fabric is organized into workloads — logical groupings of related capabilities. Below is every major workload and the key item types within each one.[1]
OneLake: The single logical data lake for your entire Fabric tenant. All data stored by any Fabric item lives in OneLake in Delta Parquet format. Supports Shortcuts to external data sources (ADLS, S3, Google Cloud Storage, Dataverse) without copying data.
Lakehouse: A data store combining a data lake with database capabilities (Delta tables, SQL analytics endpoint). The primary compute surface for data engineers. Supports Spark notebooks, pipelines, and SQL read access via an automatically generated SQL Analytics Endpoint.
Warehouse: A fully transactional T-SQL engine running over OneLake. Supports DML (INSERT, UPDATE, DELETE), DDL, views, stored procedures, and cross-database queries. Ideal for SQL Server DBAs and DW teams.
Data Factory (Pipelines): The orchestration and ingestion engine. Supports 200+ native connectors, Copy Data activities, conditional branching, scheduling, and triggers. The modern replacement for SSIS in Fabric workflows.
Dataflow Gen2: A Power Query-based low-code ETL tool. Enables analysts and business users to connect to sources, apply transformations, and land data into Lakehouses or Warehouses without writing code.
Notebooks: Browser-based interactive code environments supporting Python (PySpark), Scala, SQL, and R. Directly attached to a Lakehouse, with cell-level execution, magic commands, built-in visualization, and MLflow integration.
Semantic Models: Formerly called “datasets” in Power BI. A metadata and calculation layer between your data and reports. Can use Direct Lake, Import, or DirectQuery modes. Supports DAX measures, hierarchies, and relationships.
Power BI Reports: The interactive visualization layer. Built in Power BI Desktop or in-browser. Tightly integrated with Semantic Models and Direct Lake for fast analytics.
Real-Time Intelligence: Covers KQL Databases, Eventstreams for ingesting streaming data, and the Real-Time Hub. Designed for event-driven workloads — IoT telemetry, log analytics, clickstream data — where freshness matters in seconds.
Data Science: Notebooks with integrated MLflow experiment tracking, model registry, and scoring. Data scientists work on the same lake data as engineers. Publish predictions back to Delta tables or the Warehouse for BI consumption.
Mirroring: Replicates data from external operational databases (Azure SQL, Cosmos DB, Snowflake, Databricks Unity Catalog) into OneLake in near real-time. The mirrored data is immediately available as Delta tables.
Governance (Microsoft Purview): Provides data catalog, lineage tracking, sensitivity labels, and policy enforcement across the entire Fabric tenant. Every Fabric item is automatically catalogued with full data lineage from source to report.
5. OneLake — The Foundation of Everything
OneLake is the most important concept in all of Microsoft Fabric. If you understand OneLake, everything else makes sense.[5]
The “OneDrive for Data” Analogy
Microsoft describes OneLake as “OneDrive for data.”[5] Just as OneDrive gives you a single place for all your documents that every Microsoft 365 app can access, OneLake gives your organization a single place for all your data that every Fabric workload can access. You create one workspace, one Lakehouse, and every tool — Spark, T-SQL, Power BI, Data Science — reads the same files. There is no copying data between systems because there are no separate systems.
Technical Architecture
OneLake is built on Azure Data Lake Storage Gen2.[5] It inherits ADLS Gen2’s hierarchical namespace, security model, and API compatibility. Existing tools that speak ADLS Gen2 — Azure Databricks, Azure Synapse, custom ADLS SDK applications — can connect directly to OneLake without modification.
The storage hierarchy is: Tenant → Workspace → Item (Lakehouse/Warehouse) → Tables or Files.
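Mapped onto the ADLS-compatible endpoint, that hierarchy yields paths of the following shape (this is the same pattern as the abfss:// paths used in the workshop later in this post):

```text
abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<ItemType>/Tables/<table>

abfss://fabric_workshop@onelake.dfs.fabric.microsoft.com/lakehouse_silver.Lakehouse/Tables/silver_sales_orders
```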
Delta Parquet — The Universal Format
All tabular data in OneLake is stored in Delta Parquet format. Delta Lake adds a transaction log (_delta_log) on top of Parquet files that enables ACID transactions, versioning, time travel, and schema enforcement. This is why a Spark notebook, a T-SQL query, and a Power BI Direct Lake model can all read the same table without any conversion step.[5]
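On disk, a Delta table is nothing more than Parquet data files plus that log folder. A sketch of what the Tables area holds for one table (file names illustrative):

```text
Tables/silver_sales_orders/
    _delta_log/
        00000000000000000000.json    <- commit log: ACID transactions, versions, time travel
        00000000000000000001.json
    part-00000-<uuid>.snappy.parquet
    part-00001-<uuid>.snappy.parquet
```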
OneLake Shortcuts
A Shortcut is a reference to data stored somewhere else — another Lakehouse, an external Azure Data Lake, an Amazon S3 bucket, a Google Cloud Storage bucket, or Dataverse.[5] Shortcuts behave like symbolic links. They appear as a folder in your Lakehouse but contain no actual data copy. When Spark or SQL queries a Shortcut, the storage layer transparently fetches the data from the source with intelligent caching to reduce egress costs.
V-Order Optimization
When Fabric writes Delta tables it can apply V-Order — a Microsoft-developed Parquet write-time optimization that sorts, compresses, and encodes data specifically for the VertiPaq engine used by Power BI. V-Order writes are slightly slower but reads are dramatically faster, often eliminating the need for explicit pre-aggregation of large tables.
6. Lakehouse — Where Your Data Lives and Transforms
A Lakehouse in Microsoft Fabric combines the flexibility and scale of a data lake with the querying convenience of a database.[6]
The Two Storage Areas
- Files area: Raw object storage. Land CSV, Parquet, JSON, images, binary blobs — anything. This is your Bronze landing zone. Files here are not queryable via SQL until converted to Delta tables.
- Tables area: Delta tables registered in the metastore. Any Delta table written here is automatically visible to Spark notebooks, SQL queries, and Power BI. This is where Silver and Gold layer data lives.
SQL Analytics Endpoint
Every Lakehouse automatically generates a SQL Analytics Endpoint — a read-only T-SQL connection point that exposes all Delta tables in the Tables area.[6] You can connect SQL Server Management Studio, Azure Data Studio, or any TDS-compatible tool directly to this endpoint and query your lake data without writing a single line of Spark.
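For example, once the silver_sales_orders table from the workshop later in this post exists, any TDS client can query it through the endpoint with ordinary T-SQL (read-only; column names follow the workshop schema):

```sql
-- Read-only T-SQL over Lakehouse Delta tables via the SQL Analytics Endpoint
SELECT country,
       SUM(amount) AS total_revenue,
       COUNT(*)    AS order_count
FROM   silver_sales_orders
GROUP  BY country
ORDER  BY total_revenue DESC;
```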
7. Data Warehouse — T-SQL at Lake Scale
The Fabric Data Warehouse is a fully transactional T-SQL engine that separates compute from storage and stores data in Delta Parquet format on OneLake.[7] Every table in your Warehouse is also readable by Spark notebooks and other OneLake-connected tools.
Key Capabilities
- Full DML support: INSERT, UPDATE, DELETE, MERGE
- DDL: CREATE TABLE, CREATE VIEW, CREATE PROCEDURE, CREATE SCHEMA
- Cross-database queries against other Warehouses and Lakehouses in the same workspace
- Result set caching for repeated analytical queries
- Data clustering for physical data organization (see SQLYARD post on Data Clustering)
- COPY INTO from OneLake, ADLS, and external storage
- OPENROWSET for ad-hoc file querying without loading
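A hedged sketch of two of these capabilities in use. The table names (dim_customer, stg_customer) and the file path are illustrative, not from an actual deployment:

```sql
-- Illustrative: upsert staged rows into a dimension with MERGE
MERGE dbo.dim_customer AS tgt
USING dbo.stg_customer AS src
    ON tgt.customer_key = src.customer_key
WHEN MATCHED THEN
    UPDATE SET tgt.customer_name = src.customer_name, tgt.country = src.country
WHEN NOT MATCHED THEN
    INSERT (customer_key, customer_name, country)
    VALUES (src.customer_key, src.customer_name, src.country);

-- Ad-hoc file query without loading (path is a placeholder)
SELECT TOP 100 *
FROM OPENROWSET(BULK 'https://<storage>/<container>/sales/*.parquet',
                FORMAT = 'PARQUET') AS rows;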
When to Choose Warehouse Over Lakehouse
| Use Case | Lakehouse | Warehouse |
|---|---|---|
| PySpark / Scala transformations | ✅ Primary tool | ❌ Not applicable |
| T-SQL DML (UPDATE/DELETE/MERGE) | ❌ Read-only endpoint | ✅ Full support |
| Stored procedures and business logic in SQL | ❌ Not supported | ✅ Supported |
| Row-level security via SQL roles | Limited | ✅ Full RBAC |
| Serving Gold layer to SQL-heavy BI tools | Acceptable | ✅ Preferred |
| Cross-database join to multiple sources | Via Spark | ✅ Native T-SQL syntax |
8. Pipelines and Data Factory — Orchestrating the Flow
Fabric Pipelines are the orchestration engine for everything that needs to run in sequence, on a schedule, or conditionally. If you have an SSIS or Azure Data Factory background, Pipelines will feel immediately familiar.[8]
Core Pipeline Activities
- Copy Data: Ingests data from 200+ supported connectors (SQL Server, Oracle, Salesforce, REST APIs, SFTP, SAP, SharePoint, etc.) into a Lakehouse or Warehouse.
- Notebook activity: Executes a Fabric Notebook as a pipeline step. Pass parameters to drive dynamic behavior.
- Dataflow Gen2 activity: Runs a Dataflow Gen2 as an orchestrated step.
- Stored Procedure activity: Calls a stored procedure in a Warehouse for post-load transformations or data quality checks.
- Get Metadata / If Condition / ForEach / Until: Control flow activities for conditional branching and loops.
- Fail activity: Explicitly fail a pipeline with a custom error message when a validation check fails.
SSIS vs. Pipelines: The Migration Reality
Fabric does not run .dtsx SSIS packages natively. Three practical migration paths:
- Rebuild as Pipelines + Dataflow Gen2 — the recommended long-term path.
- Lift to Azure Data Factory SSIS Integration Runtime — run existing packages in Azure while gradually rebuilding.
- Hybrid: SSIS lands files to OneLake-connected storage, Fabric pipelines ingest and process from there.
Scheduling and Triggers
Pipelines can be triggered by a schedule, a storage event (a new file lands in OneLake), or called from another pipeline. A common production pattern is a daily scheduled pipeline running the full Bronze → Silver → Gold chain with email notification on failure.
9. Dataflow Gen2 — Low-Code Transformations
Dataflow Gen2 is a Power Query-based low-code ETL tool built into Fabric. Analysts and business users can connect to sources, apply transformation steps using a familiar Excel-like interface, and land results in a Lakehouse or Warehouse — without writing code.[9]
When to Use Dataflow Gen2
- Connecting to Excel, SharePoint lists, or CSV files with cleaning and reshaping requirements
- Flattening nested JSON or XML structures without Spark
- Business analyst-owned transformations that don’t need PySpark
- Rapid prototyping of ingestion logic before formalizing in notebooks
- Teams with Power BI dataflow experience migrating to Fabric
10. Notebooks and Spark — Code-First Engineering
Fabric Notebooks are browser-based interactive coding environments attached to a Lakehouse, running on managed Apache Spark.[10] They support Python (PySpark), Scala, SQL magic commands (%%sql), and R.
PySpark Patterns You Will Use Every Day
# Read a CSV from the Files/bronze area
df_raw = spark.read.option("header", True).option("inferSchema", True).csv("Files/bronze/sales/raw/")
from pyspark.sql.functions import col, trim, to_date, when, lit
df_silver = (
df_raw
.withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
.withColumn("customer_name", trim(col("customer_name")))
.withColumn("status", when(col("status").isNull(), lit("Unknown")).otherwise(col("status")))
.dropDuplicates(["order_id"])
.filter(col("order_id").isNotNull())
)
df_silver.write.mode("overwrite").format("delta").saveAsTable("silver_sales_orders")
spark.sql("OPTIMIZE silver_sales_orders ZORDER BY (order_date)")
spark.sql("VACUUM silver_sales_orders RETAIN 168 HOURS")
Notebook Parameters
When a Notebook is run from a Pipeline, you can pass parameters to control its behavior — the date partition to process, the source table name, or a processing mode flag. Toggle the Parameters cell marker on the first code cell and declare your variables there.
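A minimal sketch of such a parameters cell. The names and defaults below are illustrative choices, not a fixed Fabric contract; the pipeline's Notebook activity overrides them at run time:

```python
# Parameters cell (toggle the "Parameters" marker on this cell in the notebook UI).
# All names and defaults here are illustrative.
process_date = "2024-01-01"    # date partition to process
source_table = "bronze_sales"  # table to read from
run_mode = "incremental"       # "incremental" or "full"

# A later cell can branch on the parameter values:
full_reload = (run_mode == "full")
```

Because the pipeline simply replaces these assignments before execution, the same notebook serves both scheduled incremental runs and one-off full reloads.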
11. Direct Lake — BI Without Import Refresh
Direct Lake is one of the most technically significant innovations in the entire Fabric platform.[11]
The Problem: Import vs. DirectQuery
- Import mode copies data into Power BI’s VertiPaq engine. Reports are fast but data goes stale between scheduled refreshes. Large datasets hit memory limits.
- DirectQuery mode queries the source live on every report interaction. Always fresh but slow for complex analytical models at scale.
Direct Lake: The Best of Both
Direct Lake is a third mode unique to Fabric. The VertiPaq engine reads Delta Parquet files in OneLake directly — not via a database query, and not via a pre-loaded import. The result is import-speed query performance on always-current lake data, with no scheduled refresh required.[11]
For a full deep dive, see Part 2: Direct Lake in Microsoft Fabric on SQLYARD.
12. Real-Time Intelligence — Streaming and Event Data
For IoT sensors, financial feeds, application logs, and user clickstreams, you need data fresh in seconds. This is where Real-Time Intelligence in Fabric comes in.[12]
Key Real-Time Components
- Eventstream: An ingestion pipeline for streaming data. Connect to Azure Event Hubs, IoT Hub, Kafka, or custom REST sources. Apply real-time transformations and route to a KQL Database or Lakehouse table.
- KQL Database (Kusto): A time-series optimized analytical store supporting Kusto Query Language. Ingest millions of events per second with sub-second query latency.
- Real-Time Hub: A tenant-wide catalog of all streaming data sources. Discover, connect to, and share streams across workspaces without duplicating setup.
- Activator: A rules-based alerting engine. Define conditions on streaming data and trigger Power Automate flows, Teams messages, or pipeline runs automatically.
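To give a flavor of KQL, a hedged example over a hypothetical IoT telemetry table (table and column names are illustrative):

```kusto
// Per-device average temperature over the last 15 minutes, in 1-minute bins
DeviceTelemetry
| where Timestamp > ago(15m)
| summarize avg_temp = avg(Temperature) by DeviceId, bin(Timestamp, 1m)
| order by DeviceId asc, Timestamp desc
```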
See Part 3: Real-Time Semantic Models on SQLYARD for deeper coverage.
13. The Medallion Architecture — Deeply Explained
The Medallion Architecture is the most important design pattern for organizing data in Microsoft Fabric. Microsoft recommends it as the standard approach for Fabric lakehouses.[13]
The Core Idea
Data arrives from source systems in its rawest, least trustworthy form and improves with each transformation. The Medallion model formalizes this by dividing your data lake into three distinct quality layers, each named after a metal of increasing value.
Bronze: Data in its original format exactly as it arrived from the source. No transformation. No cleansing. No schema enforcement. This is your immutable archive and source of truth for reprocessing.
Silver: Validated, deduplicated, standardized data. Nulls handled. Data types enforced. Dates normalized. Business keys validated. Cross-source entities resolved to a single key. Delta tables. Still row-level, not yet aggregated.
Gold: Aggregated, modeled, and business-contextualized data. Star schema dimension and fact tables. Pre-computed KPIs. Subject-area data marts. Every table in Gold has a clear business owner and a defined refresh SLA.
Why Three Layers Instead of One?
- No recovery path: If the transformation logic had a bug and you only have one layer, the original data is overwritten and gone.
- Schema coupling: Every time the source system changes its schema, the report breaks.
- Mixing concerns: Raw ingestion logic, cleansing rules, and analytical aggregation code all in one place makes it impossible to isolate and fix issues.
- No audit trail: Regulatory requirements often require you to prove what the raw data looked like before transformation.
Three Recommended Physical Patterns
- Single Lakehouse, three zones: Bronze in the Files area, Silver and Gold as Delta tables in the Tables area. Simplest to operate; well suited to small teams and prototypes.
- One workspace, one Lakehouse per layer: lakehouse_bronze, lakehouse_silver, lakehouse_gold. This is the pattern the workshop below uses. Clear separation of concerns inside a single security boundary.
- One workspace per layer: the strongest isolation. Engineering teams own the Bronze and Silver workspaces while BI owns Gold, matching the role assignments described in the Governance section.
ACID Compliance and the Medallion Model
Delta Lake’s ACID transaction support means each write to a Silver or Gold Delta table is a transaction. If a notebook fails mid-write, the table is not corrupted — the incomplete transaction is rolled back and the previous version remains intact.[13]
Medallion Layer Reference
| Attribute | Bronze | Silver | Gold |
|---|---|---|---|
| Data quality | Raw, unvalidated | Validated, conformed | Aggregated, modeled |
| Format | Files (CSV, Parquet, JSON) or Delta | Delta tables | Delta tables |
| Schema enforcement | None | Enforced | Enforced, curated |
| Who writes it | Pipelines (raw ingest) | Notebooks, Dataflow Gen2 | Notebooks, Warehouse T-SQL |
| Who reads it | Data engineers only | Engineers, data scientists | Analysts, BI reports, business users |
| Reprocessable? | Is the source | Yes, from Bronze | Yes, from Silver |
| Fabric item type | Lakehouse (Files area) | Lakehouse (Tables area) | Lakehouse or Warehouse |
14. Hands-On Workshop — Beginner to Advanced
This workshop builds a complete Medallion lakehouse from scratch. Complete stages in order — each depends on the previous one. You will finish with a working Bronze → Silver → Gold pipeline connected to a Power BI Direct Lake report.
Stage 1 — Beginner: Set Up Your Workspace and Lakehouse
In the Fabric portal, create a new workspace named fabric_workshop. Assign a Fabric capacity or select Trial. Click Apply.
Create three Lakehouses in the workspace: lakehouse_bronze, lakehouse_silver, and lakehouse_gold.
Open lakehouse_bronze. The left pane shows Tables (Delta tables) and Files (raw object storage). The SQL Analytics Endpoint icon in the top bar switches to a T-SQL view of the same Lakehouse.
Prepare a sample sales CSV with columns order_id, order_date, customer_id, customer_name, product, quantity, amount, country. In lakehouse_bronze, click Files → … → New subfolder, create sales/raw. Upload your CSV there.
Stage 2 — Beginner/Intermediate: First Pipeline and Bronze Ingestion
Create a new pipeline named pipeline_bronze_ingest.
Add a Copy Data activity that lands the source file in lakehouse_bronze → Files/sales/raw/.
Stage 3 — Intermediate: Silver Transformation with PySpark
Create a notebook named nb_bronze_to_silver. Add lakehouse_bronze as the default Lakehouse in the left pane.
from pyspark.sql.functions import col, trim, to_date, when, lit, upper
from pyspark.sql.types import DecimalType
bronze_path = "Files/sales/raw/"
df_raw = spark.read.option("header", True).option("inferSchema", True).csv(bronze_path)
print(f"Bronze row count: {df_raw.count()}")
df_silver = (
df_raw
.withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
.withColumn("customer_name", trim(col("customer_name")))
.withColumn("product", trim(col("product")))
.withColumn("country", upper(trim(col("country"))))
.withColumn("quantity", when(col("quantity").isNull(), lit(0)).otherwise(col("quantity").cast("int")))
.withColumn("amount", col("amount").cast(DecimalType(18,2)))
.dropDuplicates(["order_id"])
.filter(col("order_id").isNotNull())
)
silver_table_path = "abfss://fabric_workshop@onelake.dfs.fabric.microsoft.com/lakehouse_silver.Lakehouse/Tables/silver_sales_orders"
df_silver.write.mode("overwrite").option("overwriteSchema", "true").format("delta").save(silver_table_path)
spark.sql(f"OPTIMIZE delta.`{silver_table_path}` ZORDER BY (order_date, customer_id)")
print("Silver write complete.")
The Silver table is now queryable from lakehouse_silver. Click the SQL Analytics Endpoint icon and run:
SELECT TOP 10 * FROM silver_sales_orders ORDER BY order_date DESC;
Stage 4 — Intermediate: Gold Layer with Star Schema
Create a notebook named nb_silver_to_gold. Add both lakehouse_silver and lakehouse_gold in the left pane.
df_sales = spark.read.format("delta").load(
"abfss://fabric_workshop@onelake.dfs.fabric.microsoft.com/lakehouse_silver.Lakehouse/Tables/silver_sales_orders"
)
gold_base = "abfss://fabric_workshop@onelake.dfs.fabric.microsoft.com/lakehouse_gold.Lakehouse/Tables/"
# Dim Customer
df_dim_customer = df_sales.select("customer_id","customer_name","country").dropDuplicates(["customer_id"]).withColumnRenamed("customer_id","customer_key")
df_dim_customer.write.mode("overwrite").format("delta").save(gold_base + "dim_customer")
# Dim Date
from pyspark.sql.functions import col, year, month, dayofmonth, date_format, quarter
df_dim_date = (df_sales.select("order_date").dropDuplicates()
.withColumn("year", year("order_date"))
.withColumn("month", month("order_date"))
.withColumn("quarter", quarter("order_date"))
.withColumn("month_name", date_format("order_date","MMMM"))
.withColumn("year_month", date_format("order_date","yyyy-MM")))
df_dim_date.write.mode("overwrite").format("delta").save(gold_base + "dim_date")
# Fact Sales
df_fact_sales = df_sales.select(
col("order_id").cast("string"),
col("order_date"),
col("customer_id").alias("customer_key"),
col("product"),
col("quantity").cast("int"),
col("amount").cast("decimal(18,2)"))
df_fact_sales.write.mode("overwrite").format("delta").save(gold_base + "fact_sales")
spark.sql(f"OPTIMIZE delta.`{gold_base}fact_sales` ZORDER BY (order_date, customer_key)")
print("Gold layer complete.")
Stage 5 — Intermediate/Advanced: Master Pipeline
Create a new pipeline named pipeline_master_etl.
Add three chained steps: (1) a Notebook activity running nb_bronze_to_silver, (2) a Notebook activity running nb_silver_to_gold on success of step 1, and (3) an optional data quality check. Connect them with success arrows so each step only runs if the previous one succeeded.
Stage 6 — Advanced: Direct Lake Semantic Model and Power BI
Open the lakehouse_gold SQL Analytics Endpoint. Click New semantic model. Select fact_sales, dim_customer, dim_date. Confirm. Fabric creates the model in Direct Lake mode.
Create the relationships fact_sales[customer_key] → dim_customer[customer_key] (Many-to-One) and fact_sales[order_date] → dim_date[order_date] (Many-to-One). Then define the following DAX measures:
Total Revenue = SUM(fact_sales[amount])
Order Count = DISTINCTCOUNT(fact_sales[order_id])
Avg Order Value = DIVIDE([Total Revenue], [Order Count])
Revenue YTD = TOTALYTD([Total Revenue], dim_date[order_date])
Stage 7 — Advanced: Incremental Processing and Watermarking
Production loads rarely reprocess everything. The pattern below keeps a small watermark table recording the last processed timestamp, reads only newer Bronze rows, and upserts them into Silver with a Delta MERGE:
from pyspark.sql.functions import col, max as spark_max, lit
from datetime import datetime
watermark_table = "abfss://fabric_workshop@onelake.dfs.fabric.microsoft.com/lakehouse_silver.Lakehouse/Tables/etl_watermark"
try:
    df_watermark = spark.read.format("delta").load(watermark_table)
    last_processed = df_watermark.filter(col("table_name") == "silver_sales_orders").select("last_updated_ts").collect()[0][0]
except Exception:
    # First run: the watermark table does not exist yet, so process everything
    last_processed = datetime(2000, 1, 1)
print(f"Last processed: {last_processed}")
df_new = spark.read.format("delta").load("abfss://.../lakehouse_bronze.Lakehouse/Tables/bronze_sales").filter(col("ingested_ts") > lit(last_processed))
df_new_silver = transform_to_silver(df_new)  # assumed helper: apply the same Silver cleansing logic as Stage 3
from delta.tables import DeltaTable
silver_dt = DeltaTable.forPath(spark, "abfss://.../lakehouse_silver.Lakehouse/Tables/silver_sales_orders")
silver_dt.alias("target").merge(df_new_silver.alias("source"), "target.order_id = source.order_id").whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
new_max_ts = df_new_silver.select(spark_max("ingested_ts")).collect()[0][0]
# Note: mode("overwrite") replaces the entire watermark table. If you track
# multiple tables here, switch to a Delta MERGE keyed on table_name instead.
spark.createDataFrame([("silver_sales_orders", new_max_ts)], ["table_name","last_updated_ts"]).write.mode("overwrite").format("delta").save(watermark_table)
print(f"Watermark updated to: {new_max_ts}")
15. Governance, Domains, and Security
An analytics platform without governance is a liability. Fabric provides a layered governance model that maps onto how most enterprises are already organized.[4]
Tenant-Level Administration
The Fabric Admin portal controls tenant-wide policies: which workload types are enabled, who can create workspaces, external sharing permissions, and data residency settings. For enterprise deployments, restrict workspace creation to approved groups and require capacity assignment review before any new workspace goes live.
Workspaces and Roles
Every workspace has four built-in roles: Admin, Member, Contributor, and Viewer. For Medallion deployments a common pattern is: data engineering team has Contributor on Bronze and Silver; BI team has Viewer on Silver and Contributor on Gold; data consumers have Viewer on Gold only.
Domains
Domains are a logical grouping mechanism above workspaces. A Finance domain might contain Finance Bronze, Silver, Gold, and Reports workspaces. Domains enable domain-level administrators, sensitivity policies, and discovery in the OneLake Catalog — the foundation of a data mesh architecture in Fabric.
Microsoft Purview Integration
Every Fabric item is automatically catalogued in Microsoft Purview. Sensitivity labels propagate automatically from Lakehouse tables to any Power BI report consuming them. Lineage graphs show exactly how data flows from source through Bronze, Silver, Gold, and into reports.
Row-Level Security
In the Warehouse, RLS is implemented via SQL security predicates — the same mechanism used in SQL Server. In Semantic Models, RLS is defined using DAX filter expressions. Both apply correctly to Direct Lake models.
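As a hedged illustration of the SQL-predicate mechanism (the same pattern as SQL Server; object names like region_access, fact_sales, and sales_region are hypothetical, and exact option support in Fabric Warehouse should be verified against current documentation):

```sql
-- Hypothetical example: users see only rows for regions granted to them
CREATE SCHEMA rls;
GO
CREATE FUNCTION rls.fn_region_filter(@region AS varchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS allowed
    FROM dbo.region_access AS ra
    WHERE ra.region = @region
      AND ra.login_name = USER_NAME();
GO
CREATE SECURITY POLICY rls.RegionPolicy
    ADD FILTER PREDICATE rls.fn_region_filter(sales_region) ON dbo.fact_sales
    WITH (STATE = ON);
```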
16. Cost and Capacity Planning
Fabric is billed by Fabric Capacity Units (CUs). Every workload — Spark, Warehouse queries, pipelines, Power BI refreshes — consumes CUs from the assigned capacity.[14]
Key Cost Principles
- Capacity smoothing: Short peaks above capacity are allowed and smoothed over the following 24 hours rather than causing immediate throttling.
- Pause and resume: Pause capacities when not in use — you are not charged for paused capacity. Automate with the Fabric REST API or Azure Automation.
- OneLake storage: Billed separately per GB at low per-TB rates. Storage is rarely the dominant cost.
- Reservation discounts: 1-year and 3-year commitments reduce CU costs by 40–60% vs pay-as-you-go.
| SKU | CUs | Recommended for |
|---|---|---|
| F2 | 2 | Proof of concept, individual developers |
| F4 | 4 | Small team development |
| F8 | 8 | Small production workloads |
| F16 | 16 | Medium production analytics |
| F32 | 32 | Full Direct Lake features enabled |
| F64 | 64 | Enterprise analytics, Direct Lake at full performance |
| F128+ | 128+ | Large-scale enterprise, multiple heavy workloads |
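To make the pause-and-resume math concrete, here is a back-of-envelope sketch in Python. The per-CU-hour rate is a deliberately made-up placeholder (real rates vary by region; check current Azure pricing), and the function is illustrative, not an official calculator:

```python
# Back-of-envelope capacity cost estimate. The rate below is a placeholder,
# NOT a real Fabric price; substitute the current rate for your region.
HYPOTHETICAL_RATE_PER_CU_HOUR = 0.20

def monthly_capacity_cost(cus: int, hours_active_per_day: float, days: int = 30,
                          rate: float = HYPOTHETICAL_RATE_PER_CU_HOUR) -> float:
    """Pay-as-you-go cost for a capacity that is paused outside its active hours."""
    return cus * hours_active_per_day * days * rate

# An F8 paused outside a 10-hour working window vs. running 24x7:
paused = monthly_capacity_cost(8, 10)      # 8 CUs x 10 h x 30 d x rate
always_on = monthly_capacity_cost(8, 24)   # 8 CUs x 24 h x 30 d x rate
```

Whatever the real rate, the ratio is what matters: pausing outside working hours cuts the pay-as-you-go bill by more than half, which is the argument for automating pause/resume on non-production capacities.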
For detailed cost breakdown and right-sizing strategies see the Fabric Cost Analysis post on SQLYARD.
17. Final Thoughts
Microsoft Fabric represents the most significant shift in the Microsoft data platform since Azure launched. It is not an incremental improvement over Synapse or ADF — it is a ground-up rethinking of how analytics infrastructure should work: unified storage, shared compute, open formats, and a single governance model spanning every workload from raw ingestion to published report.
The Medallion Architecture gives Fabric deployments the structural discipline they need to stay maintainable as they grow. Bronze keeps your raw data safe. Silver makes it technically trustworthy. Gold makes it business-useful. When something breaks — and things always break in data — you have a clear recovery path at every layer.
For SQL Server and Azure SQL DBAs, Fabric is not a replacement for your existing skills — it is an extension of them. The Warehouse speaks T-SQL. The SQL Analytics Endpoint in the Lakehouse speaks T-SQL. Direct Lake removes the import/refresh complexity that made large Power BI deployments painful. Your SQL expertise is directly transferable and immediately valuable.
If you are just getting started: create a trial, build a workspace, drop a CSV into a Lakehouse, write three lines of PySpark, and query the result in the SQL Analytics Endpoint. The entire loop takes under 30 minutes and everything you learn scales directly to an enterprise deployment.
18. Fabric Data Agent — Conversational AI Over Your Data
The Fabric Data Agent is one of the most practical AI features in the entire Fabric platform. Announced at Microsoft Ignite 2025 and currently in preview, it lets any user — technical or not — ask plain-English questions about data stored in OneLake and receive accurate, governed answers without writing a single line of SQL, DAX, or KQL.[21]
What Problem Does It Solve?
In most organizations, getting an answer from data still requires either writing a query, building a report, or filing a ticket with the BI team. Executives, operations managers, and sales leads who need quick answers are blocked by the need for technical skills they don’t have. At their core, Fabric data agents are interoperable AI agents that connect your enterprise data to the agentic web — think of them as virtual analysts, each serving as an expert on a specific domain of data. A user can simply ask “What were total sales in California last quarter?” and the agent writes the query, executes it against the right data source, and returns a readable answer with the underlying query visible for transparency.
How It Works
The Fabric data agent uses large language models (LLMs), built on the Azure OpenAI Assistants API, to let users interact with their data in natural language. When a question is submitted:
- The agent parses the natural language question
- It identifies the most relevant data source using the user’s own credentials and permissions
- It generates the appropriate query — T-SQL, DAX, or KQL depending on the source
- It executes the query and returns a plain-English answer alongside the query itself
Critically, the agent never bypasses security. It operates entirely within the requesting user’s existing row-level security, column-level security, and workspace permissions. If a user cannot see certain data directly, the agent cannot surface it either.
Supported Data Sources
A Fabric data agent supports up to five data sources in any combination, including lakehouses, warehouses, KQL databases, Power BI semantic models, ontologies, and Microsoft Graph. This means a single agent can answer questions that span your Gold lakehouse, your Warehouse, and your existing Power BI semantic models simultaneously — without the user knowing or caring which system the answer came from.
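As a rough illustration of the source model, the sketch below declares an agent's sources and enforces the documented five-source cap before "publishing". The source kinds and the validation function are hypothetical; in Fabric the limit is enforced by the platform itself.

```python
# Hypothetical sketch: declaring data agent sources and enforcing the
# documented five-source limit. Kind names are illustrative.

MAX_SOURCES = 5  # current platform limit per agent

def validate_sources(sources):
    allowed = {"lakehouse", "warehouse", "kql_database",
               "semantic_model", "ontology", "graph"}
    if len(sources) > MAX_SOURCES:
        raise ValueError(f"A data agent supports at most {MAX_SOURCES} sources")
    for name, kind in sources:
        if kind not in allowed:
            raise ValueError(f"Unsupported source kind: {kind}")
    return True

agent_sources = [
    ("gold_lakehouse", "lakehouse"),
    ("sales_warehouse", "warehouse"),
    ("sales_model", "semantic_model"),
]
assert validate_sources(agent_sources)
```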
Setting Up a Data Agent
To set up a data agent, create a new Data Agent item in your workspace, give it a descriptive name (for example, agent_sales_analytics), and attach your data sources. You then supply agent instructions: plain-text guidance that encodes your organization's business definitions so the agent generates queries the way your analysts would. For example:

```text
You are a sales analytics assistant for SQLYARD Corp.
You have access to sales orders, customer data, and product tables.
- "Revenue" always means the sum of the amount column in fact_sales
- "Active customer" means a customer with at least one order in the last 90 days
- Always filter to the current fiscal year unless the user specifies otherwise
- Do not return individual customer PII — aggregate only
```
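To make one of those instruction rules concrete, here is what the "active customer" definition (at least one order in the last 90 days) amounts to when applied to data. The table shape and values are invented for illustration; the agent would express the same rule as a T-SQL predicate.

```python
# Illustration of the "active customer" rule from the agent instructions:
# a customer with at least one order in the last 90 days.
from datetime import date, timedelta

def active_customers(orders, today):
    """orders: list of (customer_id, order_date) tuples."""
    cutoff = today - timedelta(days=90)
    return {cid for cid, d in orders if d >= cutoff}

orders = [
    ("C1", date(2025, 1, 10)),
    ("C2", date(2024, 6, 1)),   # outside the 90-day window
    ("C1", date(2024, 12, 1)),
]
recent = active_customers(orders, today=date(2025, 2, 1))  # {'C1'}
```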
Integration with Copilot, Teams, and Copilot Studio
Once published, the Data Agent can be surfaced in multiple places. Business users can query it directly from Copilot in Power BI without switching applications. Developers can connect it to Microsoft Copilot Studio to build custom organizational copilots that call the agent as part of a broader workflow. It also supports the Model Context Protocol (MCP), enabling interoperability with other AI agents across the Microsoft ecosystem.
Current Limitations
- Maximum five data sources per agent
- Best accuracy with approximately 25 or fewer tables
- Currently supports queries in English only
- Read-only — no write-back or data modification
- Does not perform advanced analytics, machine learning, or causal inference — it retrieves and summarizes structured data
- Agent usage consumes Fabric capacity CUs — monitor usage to avoid throttling other workloads
Typical Use Cases
- Executive KPI summaries without navigating dashboards
- Sales and operations teams getting quick ad-hoc answers
- Reducing BI team ticket volume for simple data questions
- Embedding conversational data access in Teams or internal portals
19. Fabric IQ — The Semantic Intelligence Layer (Preview)
Fabric IQ is the newest and most ambitious addition to the Microsoft Fabric platform. Announced at Microsoft Ignite 2025, Fabric IQ is a unified intelligence platform powered by semantic understanding and agentic AI — and it represents a fundamental shift in what Fabric is designed to do. If the rest of Fabric is about storing, moving, and querying data, Fabric IQ is about giving that data meaning.[22]
The Problem Fabric IQ Solves
Even the best-engineered data platform has a blind spot: it stores data but not meaning. Your Gold lakehouse has a customer_id column, but it doesn’t know that in your business a “customer” is different from a “prospect,” that “revenue” excludes returns, or that an “active asset” means something specific to your operations team. That semantic knowledge lives in people’s heads, in documentation, and in the inconsistent definitions scattered across reports and spreadsheets. Without that grounding, AI cannot reason about cascading effects, constraints, or objectives. It cannot make decisions you can trust because it lacks context, policy, and meaning.
Fabric IQ solves this by giving you a place to formally define that business meaning — and making it available to every tool, report, and AI agent across your Fabric environment.
Key Components of Fabric IQ
Ontology (Preview)
At the core of Fabric IQ is the Ontology item, which introduces the semantic foundation that connects people, processes, systems, actions, rules, and data into a unified ontology. You define your business entities (Customer, Product, Asset, Order), their properties, and the relationships between them. You then bind those entities to actual tables in your Lakehouses, Warehouses, and semantic models. Once bound, every AI agent, report, and query that touches those tables understands what they represent in business terms — not just raw column names.
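The shape of an ontology — entities with properties, relationships between them, and bindings to physical tables — can be sketched in plain data structures. This is purely conceptual: the actual Fabric IQ Ontology item is authored through the Fabric UI and APIs, not in Python, and the table names below are invented.

```python
# Conceptual sketch of an ontology: business entities, relationships,
# and bindings to physical OneLake tables. Names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str                      # business concept, e.g. "Customer"
    bound_table: str               # physical table it maps to
    properties: list = field(default_factory=list)

@dataclass
class Relationship:
    source: str
    target: str
    verb: str                      # e.g. "places", "contains"

ontology = {
    "entities": [
        Entity("Customer", "gold.dim_customer", ["customer_id", "segment"]),
        Entity("Order", "gold.fact_sales", ["order_id", "amount"]),
    ],
    "relationships": [Relationship("Customer", "Order", "places")],
}
```

The binding is the important part: once "Customer" is tied to gold.dim_customer, tools and agents can reason in business terms while still resolving to governed physical data.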
Fabric Graph
Once your ontology is defined, Fabric automatically builds a navigable Fabric Graph — a visual, queryable representation of how your business entities relate to each other. This is not a diagram tool; it is a live, queryable knowledge graph. AI agents can traverse it to understand multi-hop relationships: a supply chain disruption affecting a supplier affects components, which affects products, which affects orders — the graph makes that chain of reasoning explicit and automated.
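The multi-hop reasoning described above is, at its core, graph traversal. The sketch below models the supply-chain example as a plain adjacency map and finds everything downstream of a disrupted supplier; the real Fabric Graph is a managed, queryable item, and the node names here are invented.

```python
# Sketch of the multi-hop reasoning the Fabric Graph enables, using a
# plain adjacency map of "affects" relationships. Names are illustrative.
from collections import deque

edges = {
    "supplier_A": ["component_X", "component_Y"],
    "component_X": ["product_1"],
    "component_Y": ["product_2"],
    "product_1": ["order_1001", "order_1002"],
    "product_2": ["order_1003"],
}

def downstream(node):
    """All entities transitively affected by a disruption at `node` (BFS)."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

affected = downstream("supplier_A")  # components, products, and orders
```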
Operations Agent (Preview)
Operations Agents in Fabric continuously monitor your business in real time, reason over live conditions, evaluate trade-offs, and automatically take actions to advance desired business outcomes. They use the ontology definitions, rules, and actions as their playbook. For example: an Operations Agent monitoring a logistics network can detect when delivery constraints are violated, evaluate alternative routes against business rules, and trigger the appropriate workflow — without a human in the loop.
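The monitor-evaluate-act loop of the logistics example can be sketched as a single decision function. The SLA rule, route options, and action names are all invented for illustration; a real Operations Agent reads its rules and actions from the ontology rather than from hard-coded Python.

```python
# Hypothetical sketch of an Operations Agent decision: detect an SLA
# violation, evaluate alternative routes against the rule, pick an action.

SLA_HOURS = 48  # illustrative business rule: deliver within 48 hours

def choose_action(shipment, alternative_routes):
    """Return None if compliant, otherwise a reroute or escalation action."""
    if shipment["eta_hours"] <= SLA_HOURS:
        return None  # no violation, no action needed
    compliant = [r for r in alternative_routes if r["eta_hours"] <= SLA_HOURS]
    if not compliant:
        return {"action": "escalate", "shipment": shipment["id"]}
    best = min(compliant, key=lambda r: r["eta_hours"])
    return {"action": "reroute", "shipment": shipment["id"], "route": best["name"]}

decision = choose_action(
    {"id": "SHP-42", "eta_hours": 60},
    [{"name": "air", "eta_hours": 24}, {"name": "rail", "eta_hours": 50}],
)
```

Note the three distinct outcomes: no-op when the rule holds, an automated fix when a compliant alternative exists, and escalation to a human when none does.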
Plan (Preview)
Plan removes the need for separate planning tools or spreadsheet-based workflows, allowing organizations to bring goals, plans, and actual results together on shared semantic models. Finance teams can build budgets, forecasts, targets, and scenario models directly on top of governed Fabric data. Projections write back securely and flow immediately through connected Power BI reports — no export to Excel, no version control nightmare, no stale numbers.
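The core calculation Plan keeps on governed data rather than in spreadsheets is the budget-versus-actuals comparison. The figures and period names below are invented to illustrate the variance math.

```python
# Sketch of the budget-vs-actuals variance Plan computes on shared
# semantic models. All figures are illustrative.

budget  = {"Q1": 100_000, "Q2": 120_000}
actuals = {"Q1":  95_500, "Q2": 131_000}

def variance(budget, actuals):
    """Actual minus budget per period, plus percent of budget."""
    return {
        q: {"abs": actuals[q] - budget[q],
            "pct": round((actuals[q] - budget[q]) / budget[q] * 100, 1)}
        for q in budget
    }

report = variance(budget, actuals)
```

Because projections write back to the same governed tables the reports read, this comparison stays current without any export/import round-trip through Excel.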
How Fabric IQ Connects to Everything Else
Fabric IQ enables the unification of analytical and operational data by combining data from various sources across OneLake — like lakehouses, eventhouses, and Power BI semantic models — into a single consistent model. The ontology sits above all of it as a shared semantic layer. The Data Agent (Section 18) becomes dramatically more accurate when grounded in an ontology — instead of guessing what columns mean from their names, it reads the business definitions you have formally declared.
Who Should Start Exploring Fabric IQ Now?
Fabric IQ is in preview and most suitable for organizations that:
- Already have a mature Medallion lakehouse with clean Gold layer tables
- Are dealing with inconsistent business definitions across teams and reports
- Want to build AI agents that reason about business context, not just query data
- Still run planning or budgeting processes in disconnected Excel spreadsheets
- Are evaluating agentic AI use cases that require grounded, governed decision-making
For the official documentation, start at Microsoft Learn — What is Fabric IQ? and Microsoft Learn — Fabric Data Agent.
References
- [1] Microsoft Learn — What is Microsoft Fabric?
- [2] Microsoft Fabric Blog — General Availability (November 2023)
- [3] Microsoft Learn — OneLake, the OneDrive for data
- [4] Microsoft Learn — Governance and Compliance in Microsoft Fabric
- [5] Microsoft Learn — OneLake Overview and Architecture
- [6] Microsoft Learn — What is a Lakehouse in Microsoft Fabric?
- [7] Microsoft Learn — What is Data Warehousing in Microsoft Fabric?
- [8] Microsoft Learn — Data Factory in Microsoft Fabric
- [9] Microsoft Learn — Dataflow Gen2 Overview
- [10] Microsoft Learn — Fabric Notebooks
- [11] Microsoft Learn — Direct Lake Overview
- [12] Microsoft Learn — Real-Time Intelligence in Microsoft Fabric
- [13] Microsoft Learn — Implement Medallion Lakehouse Architecture in Fabric
- [14] Microsoft Learn — Microsoft Fabric Licenses and Capacity SKUs
- [15] SQLYARD — Building a Microsoft Fabric Medallion Lakehouse + Warehouse
- [16] SQLYARD — From Ingestion to Insights
- [17] SQLYARD — Fabric Cost Analysis Explained
- [18] Microsoft Learn — Delta Optimization and V-Order in Fabric
- [19] Microsoft Learn — Lakehouse and Warehouse: Better Together
- [20] Microsoft Learn — Lakehouse End-to-End Scenario Tutorial
- [21] Microsoft Learn — Fabric Data Agent Overview
- [22] Microsoft Learn — What is Fabric IQ (Preview)?