Part 13 — Security, Governance, Lineage, and Compliance
A modern data warehouse or lakehouse must be secure, governed, traceable, and compliant. Without a strong governance and security framework, data becomes a liability rather than an asset — untrusted, misused, and non-compliant with regulatory requirements that carry real financial and reputational consequences.
This part covers the complete governance stack: identity and access management, role-based and row-level security, column-level masking, data classification and tagging, lineage tracking, audit logging, encryption, and regulatory compliance. Every concept is illustrated with working code across Unity Catalog, Microsoft Purview, Snowflake, BigQuery, and Fabric.
How this connects to the series: Governance is not a final step applied after the platform is built — it is a design constraint that must be considered from the beginning. Access control decisions made in Parts 7 and 8 (who can query Gold tables), lineage from the transformation pipeline in Parts 5 and 6, and the security model applied to the semantic layer in Part 9 all feed into the governance architecture defined here. This part brings it all together into a coherent, auditable framework.
1 Governance Principles — Security, Trust, and Compliance (Beginner)
Data governance is the framework of policies, processes, and technical controls that ensure data is used correctly, protected appropriately, and handled in compliance with regulations. For data engineering teams, governance translates into concrete technical implementations: who can access what, what sensitive data looks like to unauthorized users, where data came from, and who did what with it.
The Five Governance Pillars
- Access control — only authorized users and systems can access the data they need for their role. Enforced through RBAC, RLS, and CLS.
- Data protection — sensitive data is masked, tokenized, or encrypted for unauthorized viewers. PII is never exposed to users who do not have a business need for it.
- Lineage and cataloging — every data asset is documented, every transformation is traceable, and every downstream dependency is known.
- Auditing — every query, every access, every schema change is logged and immutable. When something goes wrong, you can answer “who saw what, and when?”
- Compliance — data handling meets the requirements of applicable regulations: GDPR for EU personal data, HIPAA for US healthcare data, PCI DSS for payment card data, SOC 2 for service organization controls.
Governance applied after the fact is dramatically harder than governance built in from the start. Retrofitting row-level security onto a data warehouse that has been serving unsecured reports for two years means updating every report, every semantic model, and every connection. Design access control, masking policies, and lineage tracking as part of your initial architecture — not as an afterthought when a compliance audit arrives.
2 Identity and Access Management (Beginner)
Every governance capability depends on a reliable, centralized identity system. When a user queries a Gold table, the warehouse must know who they are, what groups they belong to, and what they are authorized to access. All of this flows from the identity provider (IdP).
| Platform | Identity Provider | Integration Method |
|---|---|---|
| Microsoft Fabric / Azure SQL | Microsoft Entra ID (AAD) | Native — no configuration needed |
| Databricks | Entra ID, Okta, or any SAML 2.0 IdP | SCIM provisioning to Unity Catalog |
| Snowflake | Okta, AAD, any SAML 2.0 IdP | SSO via SAML or OAuth |
| BigQuery | Google Cloud Identity or Workspace | IAM native integration |
| Redshift | AWS IAM, Okta, AAD via federation | IAM role mapping or SAML federation |
| Synapse | Microsoft Entra ID | Native — same as Fabric |
Use groups, not individual users, for access control. Grant permissions to security groups (Data Engineers, Analysts, Finance Team, Regional Managers) rather than to individual users. When someone joins or leaves a team, you update their group membership in the identity provider — all their data permissions follow automatically. Managing permissions per user does not scale and leads to access control drift over time.
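Because permissions attach to groups rather than individuals, the whole access model can be expressed as data and the GRANT statements generated from it. A minimal Python sketch of that idea — the group names, schemas, and privileges here are illustrative, not from a real deployment:

```python
# Sketch: declare group-level access as data and render GRANT statements from it.
# Keeping the model in version control makes access reviews a code review.
ACCESS_MODEL = {
    "data-engineers": {
        "prod.bronze": ["USE SCHEMA", "SELECT", "MODIFY"],
        "prod.silver": ["USE SCHEMA", "SELECT", "MODIFY"],
        "prod.gold":   ["USE SCHEMA", "SELECT", "MODIFY"],
    },
    "analysts": {
        "prod.gold":   ["USE SCHEMA", "SELECT"],
    },
}

def render_grants(model: dict) -> list[str]:
    """Render one GRANT statement per (group, schema) pair."""
    return [
        f"GRANT {', '.join(privs)} ON SCHEMA {schema} TO `{group}`;"
        for group, schemas in model.items()
        for schema, privs in schemas.items()
    ]

for stmt in render_grants(ACCESS_MODEL):
    print(stmt)
```

Onboarding a new team then means adding one entry to the model and re-running the generator, rather than hand-writing grants.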
3 Role-Based Access Control Across Platforms (Intermediate)
RBAC assigns permissions to roles, and users inherit permissions through role membership. The goal is the principle of least privilege: every user and system has access to exactly what they need for their function and nothing more. A business analyst needs SELECT on Gold tables. A data engineer needs SELECT and INSERT on Bronze, Silver, and Gold. A pipeline service account needs INSERT on specific target tables only.
Databricks Unity Catalog RBAC
-- Unity Catalog: three-level namespace with granular permission control
-- Permissions cascade: catalog → schema → table → column
-- Environment separation: separate catalogs for dev/test/prod
CREATE CATALOG IF NOT EXISTS prod COMMENT 'Production analytics data';
CREATE CATALOG IF NOT EXISTS dev COMMENT 'Development and testing';
-- Grant catalog-level access to groups (synced from Entra ID / Okta via SCIM)
GRANT USE CATALOG ON CATALOG prod TO `data-engineers`;
GRANT USE CATALOG ON CATALOG prod TO `analysts`;
GRANT USE CATALOG ON CATALOG prod TO `bi-team`;
-- Schema-level access: engineers get full access, analysts read-only on Gold only
GRANT USE SCHEMA, SELECT, MODIFY ON SCHEMA prod.bronze TO `data-engineers`;
GRANT USE SCHEMA, SELECT, MODIFY ON SCHEMA prod.silver TO `data-engineers`;
GRANT USE SCHEMA, SELECT, MODIFY ON SCHEMA prod.gold TO `data-engineers`;
GRANT USE SCHEMA, SELECT ON SCHEMA prod.gold TO `analysts`;
-- Analysts cannot see Bronze or Silver -- only Gold and semantic views
-- Service account for pipeline automation: write access to specific tables only
GRANT USE CATALOG ON CATALOG prod TO `pipeline-svc`;
GRANT USE SCHEMA ON SCHEMA prod.gold TO `pipeline-svc`;
GRANT SELECT, MODIFY ON TABLE prod.gold.dim_customer TO `pipeline-svc`;
GRANT SELECT, MODIFY ON TABLE prod.gold.fct_order_line TO `pipeline-svc`;
-- (in Unity Catalog, MODIFY covers INSERT, UPDATE, and DELETE on a table)
-- Verify permissions
SHOW GRANTS ON SCHEMA prod.gold;
SHOW GRANTS ON TABLE prod.gold.fct_order_line;
Snowflake RBAC
-- Snowflake: hierarchical role model
-- Best practice: functional roles (analyst_role, engineer_role) granted to users or SSO groups
-- Create functional roles
CREATE ROLE IF NOT EXISTS analyst_role;
CREATE ROLE IF NOT EXISTS engineer_role;
CREATE ROLE IF NOT EXISTS pipeline_role;
-- Grant data access to functional roles
GRANT USAGE ON DATABASE analytics TO analyst_role;
GRANT USAGE ON SCHEMA analytics.gold TO analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.gold TO analyst_role;
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.gold TO analyst_role;
GRANT USAGE ON DATABASE analytics TO engineer_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE analytics TO engineer_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN DATABASE analytics TO engineer_role;
-- Grant roles to users (or to SSO groups via SCIM)
GRANT ROLE analyst_role TO USER alice;
GRANT ROLE engineer_role TO USER bob;
-- Grant access to virtual warehouses
GRANT USAGE ON WAREHOUSE bi_warehouse TO ROLE analyst_role;
GRANT USAGE ON WAREHOUSE etl_warehouse TO ROLE engineer_role;
-- Verify effective permissions
SHOW GRANTS TO ROLE analyst_role;
SHOW GRANTS TO USER alice;
BigQuery IAM
# BigQuery: IAM roles via Google Cloud IAM
# Applied at project, dataset, or table level
# Dataset-level access via the bq CLI (older bq versions manage dataset
# access with `bq update` or the console instead of IAM bindings):
bq add-iam-policy-binding \
--member="group:data-analysts@company.com" \
--role="roles/bigquery.dataViewer" \
"project:gold"
# Table-level access (more granular):
bq add-iam-policy-binding \
--member="serviceAccount:pipeline@project.iam.gserviceaccount.com" \
--role="roles/bigquery.dataEditor" \
"project:gold.fct_order_line"
# BigQuery predefined roles:
# bigquery.dataViewer -- SELECT only
# bigquery.dataEditor -- SELECT, INSERT, UPDATE, DELETE
# bigquery.admin -- full control including schema management
4 Row-Level Security (Intermediate)
Row-level security (RLS) restricts which rows a user can see within a table. A regional sales manager sees only rows for their region. A country-level analyst sees only their country’s data. RLS must be enforced at the data layer — not in dashboards or application code — so that every access path respects the security policy.
Snowflake Row Access Policies
-- Snowflake: row access policy using a mapping table
-- More flexible than hard-coded conditions -- users/regions can be managed in SQL
-- Create a policy mapping table: user → allowed regions
CREATE TABLE governance.user_region_access (
username VARCHAR(200) NOT NULL,
region VARCHAR(100) NOT NULL
);
INSERT INTO governance.user_region_access VALUES
('alice@company.com', 'Northeast'),
('alice@company.com', 'Southeast'),
('bob@company.com', 'West');
-- Create the row access policy
CREATE OR REPLACE ROW ACCESS POLICY rls_region
AS (region_column VARCHAR) RETURNS BOOLEAN ->
-- Admins see everything; other users see only their allowed regions
IS_ROLE_IN_SESSION('analyst_admin')
OR EXISTS (
SELECT 1
FROM governance.user_region_access
WHERE username = CURRENT_USER()
AND region = region_column
);
-- Apply policy to the fact table
ALTER TABLE gold.fct_order_line
ADD ROW ACCESS POLICY rls_region ON (region);
-- Now every query on fct_order_line is automatically filtered
-- alice@company.com sees Northeast + Southeast rows only
-- bob@company.com sees West rows only
-- analyst_admin role sees all rows
-- Verify policy is applied
SHOW ROW ACCESS POLICIES IN TABLE gold.fct_order_line;
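The policy's semantics are easy to model outside the warehouse when you want to unit-test the mapping table before applying it. A minimal Python sketch mirroring the rows and users from the SQL example:

```python
# Sketch of the row-access-policy semantics: a user sees a fact row only if
# the (user, region) pair exists in the mapping table; admins see everything.
USER_REGION_ACCESS = {
    ("alice@company.com", "Northeast"),
    ("alice@company.com", "Southeast"),
    ("bob@company.com", "West"),
}

def visible_rows(rows, current_user, is_admin=False):
    """Filter rows the way the row access policy would for this session."""
    if is_admin:
        return list(rows)
    return [r for r in rows if (current_user, r["region"]) in USER_REGION_ACCESS]

facts = [
    {"order_id": 1, "region": "Northeast"},
    {"order_id": 2, "region": "West"},
    {"order_id": 3, "region": "Southeast"},
]
print(visible_rows(facts, "alice@company.com"))  # alice's Northeast + Southeast rows
```

This is only a model of the behavior — in production the warehouse enforces the policy, so every access path (SQL, BI tools, notebooks) gets the same filtered view.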
Databricks Unity Catalog Row Filters
-- Databricks: row filter functions in Unity Catalog
-- Applied at the table level -- cannot be bypassed regardless of query method
-- Create a row filter function
CREATE OR REPLACE FUNCTION prod.governance.filter_by_region(region STRING)
RETURNS BOOLEAN
LANGUAGE SQL
RETURN
IS_MEMBER('data-admin-group') -- admins see everything
OR EXISTS ( -- EXISTS supports users mapped to multiple regions
SELECT 1
FROM prod.governance.user_region_map
WHERE username = CURRENT_USER()
AND allowed_region = region
);
-- Apply the row filter to the Gold fact table
ALTER TABLE prod.gold.fct_order_line
SET ROW FILTER prod.governance.filter_by_region ON (region);
-- Verify row filter is active
DESCRIBE EXTENDED prod.gold.fct_order_line;
-- Look for: Row Filter section showing the function and column it applies to
5 Column-Level Security and Data Masking (Intermediate)
Column-level security (CLS) controls which columns users can see. Data masking goes further — it allows unauthorized users to see a column exists but shows them a masked or replaced value rather than the real one. This is critical for PII (personally identifiable information) like email addresses, phone numbers, Social Security numbers, and payment card data.
Snowflake Dynamic Data Masking
-- Snowflake: dynamic masking policy -- same column, different values per role
-- Authorized users see the real value; others see a masked version
-- Create a masking policy for email addresses
CREATE OR REPLACE MASKING POLICY mask_email
AS (val STRING) RETURNS STRING ->
CASE
-- PII analysts see full email
WHEN IS_ROLE_IN_SESSION('pii_analyst_role') THEN val
-- Other analysts see partial email: first char + *** + @domain.com
WHEN IS_ROLE_IN_SESSION('analyst_role') THEN
CONCAT(SUBSTR(val, 1, 1), '***@', SPLIT_PART(val, '@', 2))
-- All others see completely masked value
ELSE '***@***.***'
END;
-- Apply masking to the email column in the dimension table
ALTER TABLE gold.dim_customer
MODIFY COLUMN email SET MASKING POLICY mask_email;
-- Test masking behavior:
-- As analyst_role: a***@company.com (partial masking)
-- As pii_analyst_role: alice@company.com (real value)
-- As other roles: ***@***.*** (full masking)
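The masking CASE logic is easy to unit-test outside the warehouse. A minimal Python equivalent of the same three-tier policy (role names mirror the SQL example):

```python
def mask_email(val: str, roles: set[str]) -> str:
    """Mirror of the masking policy: full, partial, or fully masked email."""
    if "pii_analyst_role" in roles:
        return val                              # authorized: real value
    if "analyst_role" in roles:
        local, _, domain = val.partition("@")
        return f"{local[:1]}***@{domain}"       # partial: first char + domain
    return "***@***.***"                        # everyone else: fully masked

print(mask_email("alice@company.com", {"analyst_role"}))  # a***@company.com
```

Keeping a reference implementation like this next to the policy lets you test edge cases (empty strings, missing `@`) before they reach the warehouse.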
Databricks Unity Catalog Column Masks
-- Databricks: column masking via Unity Catalog
-- Same approach as row filters -- function applied at table level
CREATE OR REPLACE FUNCTION prod.governance.mask_email(email STRING)
RETURNS STRING
LANGUAGE SQL
RETURN
CASE
WHEN IS_MEMBER('pii-access-group') THEN email
ELSE CONCAT(SUBSTR(email, 1, 1), '***@', SPLIT_PART(email, '@', 2))
END;
-- Apply column mask to the dimension table
ALTER TABLE prod.gold.dim_customer
ALTER COLUMN email SET MASK prod.governance.mask_email;
-- Column mask applies to all queries on this table regardless of access path
-- Direct SQL, Power BI, Tableau, Spark -- all see the masked value
BigQuery Column-Level Security with Policy Tags
# BigQuery: column-level security via Data Catalog policy tags
# 1. Create a policy tag taxonomy in Data Catalog
# 2. Tag sensitive columns with the appropriate policy tag
# 3. Grant data catalog fine-grained reader role to authorized groups
# Create taxonomy and tags via gcloud (note: the policy-tag commands take the
# numeric taxonomy/policy-tag IDs returned by the create calls, shown here as
# placeholders):
gcloud data-catalog taxonomies create \
--display-name="PII_Data" \
--location=us \
--description="Tags for personally identifiable information"
gcloud data-catalog taxonomies policy-tags create \
--taxonomy=TAXONOMY_ID \
--location=us \
--display-name="EmailAddress" \
--description="Personal email addresses"
# Apply policy tag to column in BigQuery schema (via console or API):
# Column: email in table project.gold.dim_customer
# Tag: PII_Data/EmailAddress
# Grant fine-grained reader to authorized group:
gcloud data-catalog taxonomies policy-tags add-iam-policy-binding \
POLICY_TAG_ID \
--taxonomy=TAXONOMY_ID \
--location=us \
--member="group:pii-analysts@company.com" \
--role="roles/datacatalog.categoryFineGrainedReader"
# Without the fine-grained reader role, queries that select the column fail
# with an access-denied error (use SELECT * EXCEPT(email) to skip it)
# With the role, the full value is returned
6 Data Classification and Tagging (Beginner)
Data classification identifies which data is sensitive, what type of sensitivity it represents (PII, PHI, financial, confidential), and what handling rules apply. Tags attach metadata to tables and columns that can be used to enforce policies, generate compliance reports, and communicate data sensitivity to consumers.
Databricks Unity Catalog Tags
-- Databricks: apply tags to schemas, tables, and columns for classification
-- Tag the entire schema as containing PII
ALTER SCHEMA prod.gold
SET TAGS ('data_sensitivity' = 'contains_pii', 'data_domain' = 'customer');
-- Tag specific PII columns in the dimension table
ALTER TABLE prod.gold.dim_customer
ALTER COLUMN email
SET TAGS ('pii_type' = 'email', 'gdpr_subject' = 'true', 'ccpa_subject' = 'true');
ALTER TABLE prod.gold.dim_customer
ALTER COLUMN customer_name
SET TAGS ('pii_type' = 'full_name', 'gdpr_subject' = 'true');
-- Search for all PII columns across the entire Unity Catalog
SELECT
table_catalog,
table_schema,
table_name,
column_name,
tag_name,
tag_value
FROM system.information_schema.column_tags
WHERE tag_name = 'pii_type'
ORDER BY table_catalog, table_schema, table_name, column_name;
Snowflake Object Tagging
-- Snowflake: create and apply governance tags
-- Create a tag for PII classification
CREATE OR REPLACE TAG governance.pii_type
ALLOWED_VALUES 'email', 'full_name', 'phone', 'ssn', 'credit_card', 'address';
-- Apply tags to sensitive columns
ALTER TABLE gold.dim_customer
MODIFY COLUMN email SET TAG governance.pii_type = 'email';
ALTER TABLE gold.dim_customer
MODIFY COLUMN customer_name SET TAG governance.pii_type = 'full_name';
-- Query all tagged PII columns for compliance reporting
SELECT
object_schema AS table_schema,
object_name AS table_name,
column_name,
tag_value AS pii_classification
FROM TABLE(INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
'gold.dim_customer', 'table'
))
WHERE tag_name = 'PII_TYPE';
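Tags earn their keep when they are cross-checked against controls. A hedged Python sketch of a governance check that flags PII-tagged columns lacking a masking policy — the input sets stand in for rows you would pull from the tag and policy metadata views:

```python
# Sketch: verify every PII-tagged column also carries a masking policy.
# Each (table, column) set mimics rows from the platform's metadata views.
tagged_pii = {
    ("gold.dim_customer", "email"),
    ("gold.dim_customer", "customer_name"),
}
masked_columns = {
    ("gold.dim_customer", "email"),
}

def unmasked_pii(tagged: set, masked: set) -> set:
    """PII-tagged columns with no masking policy — each one is a governance gap."""
    return tagged - masked

for table, column in sorted(unmasked_pii(tagged_pii, masked_columns)):
    print(f"UNMASKED PII: {table}.{column}")  # UNMASKED PII: gold.dim_customer.customer_name
```

Run as a scheduled check, this turns classification tags from documentation into an enforced invariant.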
7 Lineage Tracking — Where Did the Data Come From? (Intermediate)
Data lineage answers the questions that every organization faces during an audit, a data quality incident, or a regulatory review: Where did this number come from? Which upstream sources feed this report? If the Bronze CRM data had an error last Tuesday, which Gold tables and dashboards were affected? Without lineage, answering these questions requires days of manual investigation. With lineage, the answer is a graph query.
Types of Lineage
- Table lineage — which tables read from which other tables across transformations
- Column lineage — which source column contributed to which target column
- Pipeline lineage — which pipeline jobs transformed which tables
- Dashboard lineage — which reports and dashboards depend on which semantic models and tables
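With table lineage captured as edges, the "which downstream assets were affected?" question above becomes a short graph traversal. A minimal Python sketch (the table names are illustrative):

```python
from collections import deque

# Table-lineage edges (source -> target), as a lineage system might export them.
EDGES = [
    ("bronze.crm_customer", "silver.customer"),
    ("silver.customer", "gold.dim_customer"),
    ("silver.orders", "gold.fct_order_line"),
    ("gold.dim_customer", "dashboard.customer_360"),
]

def downstream(start: str, edges) -> set[str]:
    """Breadth-first walk: every asset reachable from `start` via lineage edges."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(downstream("bronze.crm_customer", EDGES)))
# ['dashboard.customer_360', 'gold.dim_customer', 'silver.customer']
```

The same traversal over the edge list answers the impact question for a Bronze data-quality incident: everything in the returned set needs review or reprocessing.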
Databricks Unity Catalog Lineage
-- Databricks: Unity Catalog automatically captures lineage for SQL queries
-- and notebook cell executions -- no manual configuration required
-- View table-level lineage for the Gold fact table
SELECT *
FROM system.access.table_lineage
WHERE target_table_full_name = 'prod.gold.fct_order_line'
ORDER BY event_time DESC
LIMIT 20;
-- Shows: which tables were read to produce fct_order_line, timestamps, job IDs
-- View column-level lineage
SELECT
source_table_full_name,
source_column_name,
target_table_full_name,
target_column_name
FROM system.access.column_lineage
WHERE target_table_full_name = 'prod.gold.fct_order_line'
AND target_column_name = 'extended_amount';
-- Shows: extended_amount came from silver.orders.quantity * silver.orders.unit_price
dbt Lineage Graph
# dbt: automatic lineage from model dependency graph
# Run after dbt compile to generate the lineage manifest
dbt compile --target prod
# Generates target/manifest.json containing full lineage graph
# View lineage in dbt docs:
dbt docs generate --target prod
dbt docs serve
# Opens browser UI showing full DAG: sources → staging → intermediate → marts
# Query lineage programmatically from the manifest:
import json

with open("target/manifest.json") as f:
    manifest = json.load(f)

# Find all models that depend on stg_crm_customer
target_model = "model.analytics.stg_crm_customer"
dependents = [
    node for node, data in manifest["nodes"].items()
    if target_model in data.get("depends_on", {}).get("nodes", [])
]
print("Models downstream of stg_crm_customer:")
for dep in dependents:
    print(f"  {dep}")
Microsoft Purview Lineage
# Microsoft Purview: enterprise lineage across all Azure data services
# Automatically scans: Azure SQL, Fabric, Synapse, ADF, Power BI
# After connecting Purview to your data sources via the Purview portal:
# Purview → Data map → Sources → Add source → Azure SQL / Fabric / Synapse
# Purview discovers assets and builds lineage automatically from:
# - ADF pipeline activities (Copy Activity shows source → sink)
# - Power BI refresh lineage (which dataset feeds which report)
# - Synapse pipelines
# - Fabric Data Factory pipelines
# Query Purview lineage via REST API (access_token acquired beforehand,
# e.g. with azure-identity's DefaultAzureCredential):
import requests

purview_account = "your-purview-account"
response = requests.get(
    f"https://{purview_account}.purview.azure.com/datamap/api/lineage"
    f"/uniqueAttributeByType/azure_sql_table"
    f"?typeName=azure_sql_table&qualifiedName=mssql://server/db/schema/table",
    headers={"Authorization": f"Bearer {access_token}"},
)
lineage = response.json()
# Returns upstream and downstream lineage for the specified table
8 Audit Logging (Intermediate)
Audit logs record who accessed what data, when, and from where. They are the evidence layer for compliance frameworks, security incident investigations, and governance audits. All major platforms maintain audit logs automatically — the data engineering team’s responsibility is to ensure logs are retained for the required period and monitored for anomalies.
-- Snowflake: query access history from ACCOUNT_USAGE
-- ACCOUNT_USAGE views have ingestion latency (up to ~45 minutes for QUERY_HISTORY)
-- and retain 365 days of history
SELECT
query_id,
user_name,
role_name,
query_type,
query_text,
start_time,
end_time,
bytes_scanned,
rows_produced
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
AND CONTAINS(LOWER(query_text), 'dim_customer') -- queries touching customer data
ORDER BY start_time DESC
LIMIT 100;
-- Find users who accessed PII tables in the last 30 days
SELECT
user_name,
COUNT(*) AS query_count,
SUM(bytes_scanned) AS bytes_scanned,
MIN(start_time) AS first_access,
MAX(start_time) AS last_access
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
AND CONTAINS(LOWER(query_text), 'email')
GROUP BY user_name
ORDER BY query_count DESC;
-- Databricks: Unity Catalog audit logs via system tables
SELECT
event_time,
user_identity.email AS user_email,
service_name,
action_name,
request_params,
response.status_code AS status
FROM system.access.audit
WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL 7 DAYS
AND action_name IN ('commandSubmit', 'createTable', 'dropTable', 'grantPrivilege')
ORDER BY event_time DESC
LIMIT 100;
-- Detect potential data exfiltration: unusually large queries by individual users
SELECT
user_identity.email AS user_email,
DATE(event_time) AS query_date,
COUNT(*) AS query_count,
SUM(CAST(request_params['bytes_read'] AS BIGINT)) / POW(1024, 3) AS gb_read
FROM system.access.audit
WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL 30 DAYS
AND action_name = 'commandSubmit'
GROUP BY 1, 2
HAVING gb_read > 10 -- flag users reading more than 10 GB in a day
ORDER BY gb_read DESC;
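The same threshold logic as the HAVING clause above can also live in a monitoring job that consumes pre-aggregated audit volumes. A small Python sketch (the numbers and user names are illustrative):

```python
# Sketch: flag users whose daily read volume exceeds a threshold, mirroring
# the exfiltration-detection query. Keys are (user, day); values are GB read.
daily_gb_read = {
    ("alice@company.com", "2024-06-01"): 0.4,
    ("bob@company.com", "2024-06-01"): 42.7,
    ("bob@company.com", "2024-06-02"): 1.1,
}

def flag_heavy_readers(volumes: dict, threshold_gb: float = 10.0) -> list:
    """Return (user, day, gb) tuples exceeding the daily-read threshold."""
    return sorted(
        (user, day, gb)
        for (user, day), gb in volumes.items()
        if gb > threshold_gb
    )

print(flag_heavy_readers(daily_gb_read))
# [('bob@company.com', '2024-06-01', 42.7)]
```

A fixed 10 GB cutoff is the simplest possible baseline; a production check would typically compare each user against their own historical usage instead.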
9 Encryption at Rest and in Transit (Beginner)
All major cloud data platforms encrypt data at rest and in transit by default. The data engineering team’s responsibility is to understand the encryption model used, confirm that encryption is enabled and verified, and for regulated industries, ensure that customer-managed keys (CMK) are used when required by compliance frameworks.
| Platform | At Rest | In Transit | Customer-Managed Keys |
|---|---|---|---|
| Snowflake | AES-256 (default), Tri-Secret Secure (CMK) | TLS 1.2+ | Tri-Secret Secure with AWS KMS / Azure Key Vault |
| Microsoft Fabric / Azure SQL | AES-256 (TDE default) | TLS 1.2+ | Azure Key Vault (BYOK) |
| Databricks | Cloud provider default + DBFS encryption | TLS 1.2+ | AWS KMS / Azure Key Vault / GCP KMS |
| BigQuery | AES-256 (default) | TLS 1.2+ | Cloud KMS CMEK |
| Redshift | AES-256 (optional, enable explicitly) | SSL required | AWS KMS or HSM |
Amazon Redshift encryption is not enabled by default. Unlike other platforms, Redshift requires explicit enablement of encryption at cluster creation time. Enable it for all production clusters — it cannot be enabled on an existing unencrypted cluster without migrating to a new one. This is a common compliance gap found during AWS security audits.
10 Regulatory Compliance Frameworks (Intermediate)
Regulatory compliance frameworks define what organizations must do to legally handle specific types of data. Understanding which frameworks apply to your organization and what they require technically is essential for designing a compliant data platform.
| Framework | Applies To | Key Technical Requirements |
|---|---|---|
| GDPR | Organizations processing EU residents’ personal data | Right to erasure (delete PII on request), data minimization, purpose limitation, breach notification within 72 hours, data processing records |
| CCPA | Businesses collecting California residents’ personal data | Right to deletion, right to opt out of data sale, disclosure of data categories collected, privacy policy requirements |
| HIPAA | US healthcare organizations and their business associates | PHI encryption at rest and in transit, access controls, audit logs for PHI access, Business Associate Agreements (BAA) with cloud providers |
| PCI DSS | Organizations handling payment card data | Cardholder data encrypted or tokenized, access controls on cardholder data environment, quarterly vulnerability scans, audit logs retained 1 year |
| SOC 2 | Service organizations storing customer data | Security, availability, processing integrity, confidentiality, and privacy controls — audited annually by a third-party CPA |
GDPR Right to Erasure — Technical Implementation
-- GDPR: implement right to erasure (right to be forgotten)
-- When a customer requests deletion of their personal data,
-- all PII must be removed from Bronze, Silver, Gold, and audit logs
-- Step 1: Identify all tables containing the customer's data
-- Use the lineage graph from Section 7 to find all affected tables
-- Step 2: Delete or anonymize PII in each layer
-- Gold: update dimension to anonymize PII (preserve fact rows for analytics)
UPDATE gold.dim_customer
SET
customer_name = 'ANONYMIZED',
email = CONCAT('anon_', customer_natural_key, '@deleted.invalid'),
city = NULL,
state_province = NULL,
country = NULL
WHERE customer_natural_key = :subject_customer_id;
-- Silver: delete the customer record (rebuilds from Bronze on next pipeline run)
DELETE FROM silver.customer
WHERE customer_id = :subject_customer_id;
-- Bronze: anonymize in place (deleting would break pipeline replay and audit history)
-- Best practice: mark as deleted and anonymize PII fields
UPDATE bronze.crm_customer
SET
raw_data = OBJECT_DELETE(
OBJECT_INSERT(raw_data, 'customer_name', 'ANONYMIZED', true),
'email'
),
gdpr_deleted_at = CURRENT_TIMESTAMP()
WHERE raw_data:id::STRING = :subject_customer_id;
-- Step 3: Log the erasure request for compliance documentation
INSERT INTO governance.erasure_requests (
request_id, subject_id, requested_at, completed_at, tables_affected, status
)
VALUES (
:request_id, :subject_customer_id,
:requested_at, CURRENT_TIMESTAMP(),
'bronze.crm_customer, silver.customer, gold.dim_customer',
'COMPLETED'
);
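The Gold-layer anonymization step can be expressed as a pure function and unit-tested before it ever touches the warehouse. A hedged Python sketch — the column names follow the UPDATE above:

```python
# Sketch of the Gold-layer anonymization as a pure, testable function:
# replace PII with deterministic placeholders while keeping the natural key,
# so fact rows still join to the dimension after erasure.
def anonymize_customer(row: dict) -> dict:
    """Return a copy of the customer row with PII fields anonymized."""
    key = row["customer_natural_key"]
    return {
        **row,
        "customer_name": "ANONYMIZED",
        "email": f"anon_{key}@deleted.invalid",
        "city": None,
        "state_province": None,
        "country": None,
    }

before = {
    "customer_natural_key": "C-1001", "customer_name": "Alice",
    "email": "alice@company.com", "city": "Boston",
    "state_province": "MA", "country": "US",
}
after = anonymize_customer(before)
print(after["email"])  # anon_C-1001@deleted.invalid
```

Because the placeholder is derived from the natural key, the procedure is idempotent: re-running an erasure request produces the same anonymized row.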
11 End-to-End Governance Architecture (Intermediate)
A complete governance system integrates all the components covered in this part into a coherent, auditable framework. This is the reference architecture for a production data platform that can pass compliance audits, respond to security incidents, and maintain stakeholder trust.
| Layer | Component | Implementation |
|---|---|---|
| Identity | Centralized IdP with SSO | Entra ID / Okta + SCIM to Unity Catalog / Snowflake |
| Access Control | Role-based permissions by group | Unity Catalog / Snowflake RBAC on Bronze/Silver/Gold schemas |
| Row Security | Row-level filtering by user attribute | Snowflake row access policies / Databricks row filter functions |
| Column Security | PII masking by role | Snowflake dynamic masking / Databricks column masks / BigQuery policy tags |
| Classification | PII/PHI column tagging | Unity Catalog tags / Snowflake object tags / Purview classification |
| Lineage | End-to-end data lineage | Unity Catalog auto-lineage + dbt manifest + Purview for cross-service |
| Audit | Immutable access logs | Snowflake ACCOUNT_USAGE / Databricks system.access.audit / CloudTrail |
| Encryption | At rest and in transit | Platform defaults + CMK for regulated data (HIPAA, PCI) |
| Compliance | GDPR/CCPA erasure + HIPAA/PCI controls | Erasure request table + anonymization procedures + BAAs with cloud providers |
Start with RBAC and column masking — they deliver the most governance value with the least implementation complexity. Correct RBAC prevents unauthorized access. Column masking protects PII from users who do not need it. Both are platform-native features requiring no additional infrastructure. Once these are in place, add row-level security for multi-tenant scenarios, then invest in lineage and tagging as the platform matures.
12 Workshops
Novice
Apply RBAC and RLS
- Create two roles: `analyst_role` and `engineer_role`
- Grant SELECT on Gold to `analyst_role`
- Grant full access on all schemas to `engineer_role`
- Create a table with a region column
- Apply a row access policy restricting rows by current user
- Test: connect as both roles and verify different row sets are returned
Intermediate
Build Lineage with dbt and Purview
- Build dbt models for DimCustomer, DimProduct, and FctOrders
- Run `dbt docs generate` and explore the lineage graph
- Connect Microsoft Purview to your Azure SQL or Fabric source
- Run a Purview scan and view the discovered assets
- Validate column-level lineage end-to-end: source → Bronze → Silver → Gold
Advanced
Full Governance Implementation
- Enable Unity Catalog and define catalogs, schemas, and tables
- Apply PII tags to email, name, and phone columns
- Create column masking policies showing partial email to analysts
- Apply row filters by region using a security mapping table
- Query `system.access.audit` to verify all accesses are logged
- Implement the GDPR erasure procedure for a test customer ID
- Verify: anonymized customer still has fact rows but no recoverable PII