Part 1 — Dimensional Modeling and Warehouse Design
Dimensional modeling is the backbone of analytical systems. Whether your platform of choice is Snowflake, Fabric, BigQuery, Databricks, Redshift, Synapse, or a lakehouse built on Delta or Iceberg, dimensional modeling gives you the ability to scale analytics, deliver consistent metrics, simplify downstream transformations, and keep your data warehouse predictable and stable.
This part goes deep into warehouse design, grain definition, facts and dimensions, SCD Type 2, surrogate keys, conformed dimensions, and the bus matrix. Everything you build later in this series depends on the foundation laid here. The code examples work across all major platforms — the schema definitions follow standard SQL that runs on Snowflake, Fabric Warehouse, BigQuery, Redshift, Synapse, and Databricks SQL.
Compatibility: DDL examples use ANSI SQL patterns that work across Snowflake, Fabric Warehouse, BigQuery, Redshift, Synapse, and Databricks SQL. Platform-specific syntax differences are noted inline. The dimensional model you build here becomes your Gold layer in Parts 3, 5, and 6.
1 Why Dimensional Modeling Still Matters Beginner
Modern platforms promote the idea that table formats, compute engines, or storage architectures solve analytical complexity. In reality, architectural decisions only work well when the underlying data model is correct. A poorly modeled Gold layer will be slow, inconsistent, and hard to govern regardless of which platform it runs on.
Dimensional modeling — pioneered by Ralph Kimball and documented in the Kimball Group’s modeling techniques — provides a set of design principles that have proven durable across three decades of platform changes. They remain the standard for analytical data modeling because they solve real problems that alternative approaches do not.
Dimensional modeling provides:
- Predictable join paths that BI tools can navigate efficiently
- Stable business metrics that do not change when source systems change
- Clear grain definitions that prevent double-counting and aggregation errors
- Conformed dimensions that allow unified reporting across subject areas
- Historical accuracy through SCD Type 2 — the ability to answer “what was true at the time?”
- High performance in columnar warehouses — star schemas are optimized for the engines Snowflake, BigQuery, and Fabric use internally
- A structured foundation for semantic layers, dbt models, and BI tools
This part teaches you how to design the dimensional model that sits on top of your Bronze and Silver layers and forms your Gold layer — the analytics-ready data that feeds your warehouse, lakehouse SQL endpoints, and semantic models.
2 Core Concepts: Facts, Dimensions, and Grain Beginner
Dimensional modeling has two main structures: fact tables and dimension tables. Every other concept in this part builds from these two.
Dimensions — Describing Business Entities
Dimensions describe the who, what, where, and when of business events. They contain descriptive attributes and serve as the lookup tables that give context to numeric measurements in fact tables.
- Customer — name, email, city, loyalty tier, segment
- Product — name, category, subcategory, unit cost, brand
- Date — full date, year, quarter, month, week, day of week, fiscal period
- Store — name, region, district, address, store type
- Employee — name, department, manager, hire date, role
Dimensions contain attributes, support historical tracking through SCD Type 2, and always use a surrogate key as the primary key — never the source system’s natural key.
Facts — Measuring Business Events
Facts record measurements about business events. They contain numeric measures and foreign keys that point to dimension tables.
- FctOrderLine — quantity, unit price, extended amount, discount
- FctSubscription — monthly recurring revenue, plan amount, billing cycle
- FctGameSession — session duration, score, events, level reached
- FctWebEvent — page views, clicks, time on page, bounced flag
- FctInvoice — invoice amount, tax, payment amount, days to pay
Facts are narrow by design — they contain numeric measures and foreign keys, not descriptive text. The descriptive context lives in dimensions. This separation is what makes columnar warehouses fast: they scan the narrow fact table and only join to the dimensions they need.
Grain — The Most Important Design Decision
The grain defines exactly what a single row in a fact table represents. It must be defined before writing any transformations, building any warehouse layer, or creating any BI reports. Getting the grain wrong means every aggregation built on top of it is potentially wrong.
Define grain before anything else. Every question about a fact table — which measures belong here, which dimensions it joins to, how rows are inserted — depends on the grain. If two people on your team have different mental models of what a fact row represents, every downstream metric will be inconsistent.
Examples of well-defined grains:
- One row per order line item (not per order — per line item)
- One row per game match round per player
- One row per subscription per billing period
- One row per web event (click, page view, or form submission)
- One row per daily store inventory snapshot per product
3 Designing Dimensions Beginner
Dimensions must be designed with four properties in mind: they must be surrogate-keyed, richly attributed, conformed across fact tables, and capable of tracking history when attributes change over time.
DimCustomer — The Universal Pattern
The schema below is the standard dimensional customer design. It works across Fabric Warehouse, Snowflake, BigQuery, Redshift, Synapse, SQL Server, and Databricks SQL. The same structure becomes a Delta or Iceberg table in your Gold layer on a lakehouse.
-- DimCustomer: standard SCD Type 2 dimension
-- Works across: Snowflake, Fabric Warehouse, BigQuery, Redshift, Synapse, Databricks SQL
CREATE TABLE DimCustomer (
CustomerKey BIGINT NOT NULL, -- surrogate key (IDENTITY/AUTOINCREMENT)
CustomerNaturalKey VARCHAR(100) NOT NULL, -- source system ID (CRM ID, User ID)
FullName VARCHAR(200) NOT NULL,
Email VARCHAR(200),
City VARCHAR(100),
StateProvince VARCHAR(100),
Country VARCHAR(100),
LoyaltyTier VARCHAR(50),
-- SCD Type 2 tracking columns
EffectiveFrom DATE NOT NULL,
EffectiveTo DATE NOT NULL, -- '9999-12-31' = currently active
IsCurrent TINYINT NOT NULL, -- 1 = active, 0 = historical
CONSTRAINT PK_DimCustomer PRIMARY KEY (CustomerKey)
);
Key design decisions explained:
CustomerKey— the surrogate key. Every fact table that references a customer uses this column. Never the natural key. This insulates the warehouse from source system key changes.CustomerNaturalKey— the stable source system identifier. Used during ETL to match incoming records to existing dimension rows.EffectiveFrom/EffectiveTo— the validity range for this version of the record. Active rows haveEffectiveTo = '9999-12-31'.IsCurrent— a convenience flag for joining from fact tables. Always join facts to dimensions usingAND IsCurrent = 1unless you specifically need historical analysis.
DimDate — The Required Dimension
Every fact table joins to a date dimension. Never store a raw date in a fact table and skip the date dimension — you lose the ability to filter by fiscal year, quarter, week, or any calendar attribute without recalculating them on every query.
-- DimDate: calendar dimension
-- Populate once from a date spine generator script
-- Covers every date your warehouse will ever query
CREATE TABLE DimDate (
DateKey INT NOT NULL, -- YYYYMMDD integer key (e.g. 20250115)
FullDate DATE NOT NULL,
DayOfWeek TINYINT NOT NULL, -- 1=Monday through 7=Sunday
DayName VARCHAR(10) NOT NULL,
DayOfMonth TINYINT NOT NULL,
DayOfYear SMALLINT NOT NULL,
WeekOfYear TINYINT NOT NULL,
MonthNumber TINYINT NOT NULL,
MonthName VARCHAR(10) NOT NULL,
Quarter TINYINT NOT NULL,
QuarterName VARCHAR(6) NOT NULL, -- 'Q1', 'Q2', etc.
Year SMALLINT NOT NULL,
IsWeekend TINYINT NOT NULL,
IsHoliday TINYINT NOT NULL,
FiscalYear SMALLINT, -- populate if fiscal calendar differs
FiscalQuarter TINYINT,
CONSTRAINT PK_DimDate PRIMARY KEY (DateKey)
);
Generate DimDate once and forget it. Populate it with every date from your earliest data through at least 10 years into the future. A date spine generator script that inserts one row per calendar day takes seconds to run and should never need to be rebuilt. Keep the surrogate key as an integer in YYYYMMDD format — it is human-readable and extremely fast to join on.
4 Designing Fact Tables Beginner
Facts must always follow the grain and contain only what belongs at that grain: foreign keys to dimensions, numeric measures, and the natural key of the transaction for auditing. No descriptive attributes. No business logic applied to measures inside the fact table itself — that belongs in the semantic layer.
FctOrderLine — Standard Transaction Fact
-- FctOrderLine: one row per order line item
-- Grain: one line item per product per order
-- Foreign keys reference surrogate keys in dimension tables
CREATE TABLE FctOrderLine (
OrderLineKey BIGINT NOT NULL, -- surrogate key
OrderNaturalKey VARCHAR(100) NOT NULL, -- source order ID for audit trail
OrderDateKey INT NOT NULL, -- FK to DimDate
CustomerKey BIGINT NOT NULL, -- FK to DimCustomer
ProductKey BIGINT NOT NULL, -- FK to DimProduct
StoreKey BIGINT NOT NULL, -- FK to DimStore
-- Measures -- numeric values only
Quantity INT NOT NULL,
UnitPrice DECIMAL(18,2) NOT NULL,
DiscountAmount DECIMAL(18,2) NOT NULL DEFAULT 0,
ExtendedAmount DECIMAL(18,2) NOT NULL, -- Quantity * UnitPrice - DiscountAmount
CONSTRAINT PK_FctOrderLine PRIMARY KEY (OrderLineKey),
CONSTRAINT FK_OrderLine_Date FOREIGN KEY (OrderDateKey) REFERENCES DimDate(DateKey),
CONSTRAINT FK_OrderLine_Customer FOREIGN KEY (CustomerKey) REFERENCES DimCustomer(CustomerKey),
CONSTRAINT FK_OrderLine_Product FOREIGN KEY (ProductKey) REFERENCES DimProduct(ProductKey),
CONSTRAINT FK_OrderLine_Store FOREIGN KEY (StoreKey) REFERENCES DimStore(StoreKey)
);
Why fact tables stay narrow. Analytical engines — Snowflake, Fabric Warehouse, BigQuery, Databricks — are columnar. They scan only the columns you select. A narrow fact table with 8–12 columns scans dramatically faster than a wide one with 50+ columns. If you are tempted to add descriptive text to a fact table, that attribute belongs in a dimension instead.
Computing Derived Measures
Keep computed measures simple in the fact table — basic arithmetic only. Complex business logic (gross margin, churn rate, lifetime value) belongs in the semantic layer as DAX measures, LookML measures, or dbt Metrics, not in the Gold table itself. This keeps the Gold layer stable when business rules change.
-- Simple derived measure is acceptable in Gold
-- ExtendedAmount is deterministic arithmetic on stored columns
ExtendedAmount = Quantity * UnitPrice - DiscountAmount
-- Complex business logic belongs in the SEMANTIC LAYER, not Gold
-- Example: gross_margin_pct = (ExtendedAmount - CostAmount) / ExtendedAmount
-- This changes when cost accounting rules change -- keep it in DAX/LookML/dbt Metrics
5 Slowly Changing Dimensions — SCD Type 2 Intermediate
SCD Type 2 is the most important dimension management technique in data warehousing. It tracks historical attribute changes by creating a new row each time a dimension attribute changes, rather than overwriting the existing row. This gives you the ability to answer historical questions accurately — “What was this customer’s loyalty tier at the time they placed this order?” — even years after the attribute changed.
Use SCD Type 2 when you need to answer questions like:
- What was the customer’s region when they first subscribed?
- Which product category did this item belong to at the time of the sale?
- What store district was processing this transaction before the reorganization?
SCD Type 2 MERGE — Works Across All Major Warehouses
-- SCD Type 2 MERGE pattern
-- Close the current row when attributes change, insert a new current row
-- Works on: Snowflake, Fabric Warehouse, BigQuery, Redshift, Synapse
-- (Databricks Delta MERGE covered in Part 6)
MERGE INTO DimCustomer AS tgt
USING StgCustomer AS src
ON tgt.CustomerNaturalKey = src.CustomerID
AND tgt.IsCurrent = 1 -- only match against the current active row
-- When a match exists AND attributes have changed: close the current row
WHEN MATCHED AND (
tgt.FullName <> src.FullName
OR tgt.Email <> src.Email
OR tgt.City <> src.City
OR tgt.StateProvince <> src.StateProvince
OR tgt.LoyaltyTier <> src.LoyaltyTier
)
THEN UPDATE SET
tgt.EffectiveTo = CAST(GETDATE() AS DATE),
tgt.IsCurrent = 0
-- When no match exists (new customer): insert as current row
WHEN NOT MATCHED BY TARGET THEN
INSERT (
CustomerNaturalKey, FullName, Email, City,
StateProvince, Country, LoyaltyTier,
EffectiveFrom, EffectiveTo, IsCurrent
)
VALUES (
src.CustomerID,
src.FullName,
src.Email,
src.City,
src.State,
src.Country,
src.LoyaltyTier,
CAST(GETDATE() AS DATE),
CAST('9999-12-31' AS DATE),
1
);
-- Note: This MERGE closes changed rows but does NOT insert the new version.
-- A second INSERT step is needed to add the new current row after closing the old one.
-- See Part 6 for the complete two-step SCD2 pattern and the Databricks Delta approach.
Why the MERGE only closes rows. Most SQL MERGE implementations cannot both close a row and insert a new version in the same statement because the new row’s surrogate key must be generated after the close. The standard pattern is: MERGE to close changed rows, then INSERT the new current versions. Part 6 covers the complete pattern including surrogate key generation and the Databricks Delta MERGE that handles both steps.
Platform-Specific MERGE References
6 dbt Implementation of SCD Type 2 Intermediate
dbt’s incremental model pattern handles SCD Type 2 in a way that is portable across all supported warehouses. The model identifies changed rows by joining the staging source to the current dimension table, then inserts the new versions. dbt handles the incremental refresh and the unique key enforcement.
-- dbt incremental model for DimCustomer SCD Type 2
-- File: models/gold/dim_customer.sql
{{ config(
materialized = 'incremental',
unique_key = 'customer_natural_key',
on_schema_change = 'sync_all_columns'
) }}
WITH staged AS (
SELECT * FROM {{ ref('stg_crm_customer') }}
),
-- Identify rows that are new or have changed attributes
changes AS (
SELECT s.*
FROM staged s
LEFT JOIN {{ this }} t
ON s.customer_id = t.customer_natural_key
AND t.is_current = 1
WHERE t.customer_natural_key IS NULL -- new customer
OR s.customer_name <> t.full_name -- name changed
OR s.email <> t.email -- email changed
OR s.city <> t.city -- city changed
)
SELECT
{{ dbt_utils.generate_surrogate_key(['customer_id']) }} AS customer_key,
customer_id AS customer_natural_key,
customer_name AS full_name,
email,
city,
state AS state_province,
country,
updated_at AS effective_from,
CAST('9999-12-31' AS DATE) AS effective_to,
1 AS is_current
FROM changes
dbt packages for SCD2: The dbt_utils package provides generate_surrogate_key() which creates a consistent hash-based surrogate key from natural key columns. For full Type 2 with row closing, the dbt snapshots feature handles the complete open/close lifecycle automatically — recommended for production SCD2 implementations. Snapshots are covered in depth in Part 5.
7 Conformed Dimensions Intermediate
A conformed dimension is a dimension table that is shared across multiple fact tables. DimCustomer, DimDate, DimProduct, and DimRegion are examples of dimensions that typically appear in many different subject areas — orders, subscriptions, support tickets, web events. When these dimensions are conformed — meaning they use the same surrogate keys, the same attribute definitions, and the same grain — you can join fact tables from different subject areas through the shared dimension.
Conformed dimensions are what make enterprise reporting possible. Without them, every subject area has its own version of “customer” with different keys, different attribute names, and different historical tracking logic — and unified reporting across subject areas becomes an engineering project rather than a query.
-- Unified reporting across subject areas using conformed dimensions
-- This query combines order revenue with game session counts
-- across the same customer and date dimensions
SELECT
d.MonthName,
d.Year,
SUM(o.ExtendedAmount) AS Revenue,
COUNT(s.SessionKey) AS GameSessions,
COUNT(DISTINCT o.CustomerKey) AS ActiveBuyers
FROM FctOrderLine o
JOIN FctGameSession s ON o.CustomerKey = s.CustomerKey
AND o.OrderDateKey = s.SessionDateKey
JOIN DimDate d ON o.OrderDateKey = d.DateKey
GROUP BY d.MonthName, d.Year
ORDER BY d.Year, MIN(d.MonthNumber);
This query only works correctly if DimDate and DimCustomer are conformed — if both fact tables use the same date keys and the same customer surrogate keys. If each fact table defined its own version of these dimensions, this join would be meaningless or impossible.
8 The Bus Matrix Intermediate
The bus matrix is a planning and documentation tool that maps every fact table to every dimension it joins to. It becomes the architectural blueprint for your warehouse — the single document that shows how all subject areas relate to each other and which dimensions are shared.
Build the bus matrix before writing any DDL. It forces every design decision to be made explicitly and prevents the common mistake of discovering halfway through implementation that two fact tables use incompatible versions of the same dimension.
| Fact Table | Date | Customer | Product | Store | Region | Platform | Arena |
|---|---|---|---|---|---|---|---|
| FctOrderLine | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| FctSubscription | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| FctGameSession | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| FctMatchEvent | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| FctWebEvent | ✓ | ✓ | ✓ | ✓ | ✓ |
The checkmarks show which dimensions each fact uses. Every checked dimension must be conformed — same surrogate key, same grain, same attribute definitions — across every fact table that uses it. Kimball’s bus architecture documentation covers this in depth.
9 From Dimensional Model to the Lakehouse Gold Layer Intermediate
The dimensional model you design in this part becomes the blueprint for your Gold layer — regardless of which platform you implement it on. The schemas are identical. Only the physical implementation details differ by platform.
| Platform | Gold Layer Implementation | Notes |
|---|---|---|
| Snowflake | Managed tables with clustering keys | Micro-partitioning handles physical layout automatically. Add cluster keys on large tables. Covered in Part 7. |
| Microsoft Fabric | Delta tables via Lakehouse or Warehouse tables | Lakehouse uses Delta format. Warehouse uses columnstore. Both expose SQL endpoint. Covered in Parts 7 and 8. |
| Databricks | Delta Lake tables in Gold schema | Z-order on join keys. Unity Catalog governs access. ACID transactions on all writes. Covered in Part 8. |
| BigQuery | Partitioned, clustered tables | Partition by date, cluster by customer key. External tables for lake data. Covered in Part 7. |
| Redshift | Distribution and sort key tables | Distribution key on highest-cardinality join column. Sort key on date. Covered in Part 7. |
| AWS Iceberg | Iceberg tables via Glue Catalog | Hidden partitioning, multi-engine access via Athena, EMR, Spark. Covered in Part 8. |
Design once, implement everywhere. The bus matrix, the dimension schemas, the fact grain, and the surrogate key strategy you define here are platform-independent. Parts 7 and 8 show you how to implement the physical details on each platform.
10 Workshops
Novice
Build Your First Star Schema
- Create
DimCustomerwith surrogate key - Generate a
DimDatetable using a calendar script - Create a staging table of orders
- Build
FctOrderLinewith foreign keys - Run a revenue by month query joining both tables
- Confirm grain: one row per order line item
Intermediate
Create a Conformed Dimension
- Build a second fact table (
FctGameSession) - Reuse
DimCustomerandDimDatefrom above - Create
DimPlatformas a new dimension - Draw the bus matrix for both fact tables
- Write a unified query joining both facts through shared dimensions
- Validate row counts and grain consistency
Advanced
Implement SCD Type 2 and Rebuild a Fact
- Add SCD2 columns to
DimCustomer - Introduce historical changes in staging data
- Implement the two-step MERGE and INSERT SCD2 pattern
- Rebuild
FctOrderLinewith correct surrogate key lookups - Run historical queries comparing revenue against changing customer attributes
- Validate that old fact rows still point to the correct historical dimension version
References
- Kimball Group — Dimensional Modeling Techniques
- Kimball Group — Bus Matrix and Conformed Dimensions
- Snowflake — Performance Best Practices
- BigQuery — MERGE Statement
- Snowflake — MERGE Statement
- Microsoft Fabric Warehouse — MERGE
- Databricks — Delta Lake MERGE
- Delta Lake Documentation
- Microsoft Fabric Lakehouse Overview
- dbt — Incremental Models
- dbt Utils — generate_surrogate_key
Discover more from SQLYARD
Subscribe to get the latest posts sent to your email.


