Data Platforms in Azure, Part 5: Choosing the Right Service

Over the last four parts of this series, we’ve gone deep into Azure’s major data platforms:

  • Cosmos DB (Part 1) — globally distributed NoSQL.
  • Azure Data Lake (Part 2) — scalable storage for analytics.
  • Azure Database for MySQL (Part 3) — managed open-source relational.
  • Azure Database for PostgreSQL (Part 4) — managed Postgres, with Hyperscale.

Now let’s put it all together. Which service do you use, and when?


The Decision Framework

Here’s the big question: What kind of workload are you running?

  1. Transactional (OLTP): Lots of small reads and writes (e.g., ecommerce, SaaS apps).
  2. Analytical (OLAP): Large queries scanning millions of rows, often for BI or ML.
  3. Globally distributed / low-latency: Users all over the world need fast access.
  4. Unstructured / semi-structured data: JSON, logs, telemetry, files.
  5. Hybrid / mixed: Some workloads blend these needs.

Service by Service Breakdown

🔹 Azure SQL Database

  • Best for: Traditional apps written for SQL Server.
  • Pros: Full T-SQL, strong relational features, easy migration from on-prem SQL.
  • Cons: Proprietary; less flexible for JSON/document workloads.
  • Example: A payroll system or CRM built on SQL Server.

🔹 Azure Cosmos DB

  • Best for: Low-latency, globally distributed workloads.
  • Pros: Multi-region writes, five APIs (SQL, Mongo, Cassandra, Gremlin, Table), sub-10ms latency.
  • Cons: RU/s pricing model can be confusing; not ideal for complex joins.
  • Example: A gaming leaderboard with players across continents.

🔹 Azure Data Lake Storage (Gen2)

  • Best for: Big data and analytics pipelines.
  • Pros: Schema-on-read, hierarchical namespace, works seamlessly with Synapse, Databricks, Spark.
  • Cons: Not for OLTP; querying raw files can be slower unless optimized.
  • Example: Storing raw IoT sensor data, then analyzing with Synapse.

🔹 Azure Database for MySQL

  • Best for: Web and mobile apps already using MySQL.
  • Pros: Open source compatibility, managed backups and HA, cost-effective for moderate workloads.
  • Cons: Limited to MySQL feature set, fewer extensions than Postgres.
  • Example: A WordPress site or LAMP stack SaaS app.

🔹 Azure Database for PostgreSQL

  • Best for: Apps needing Postgres features, extensions, or scale-out.
  • Pros: JSONB, PostGIS, Hyperscale (Citus) for sharding, strong standards compliance.
  • Cons: Slightly steeper learning curve; Hyperscale adds complexity.
  • Example: A geospatial app using PostGIS for mapping and routing.

Quick Comparison Table

ServiceTypeStrengthsWeaknessesExample Use Case
SQL DatabaseRelational (PaaS)T-SQL, strong transactional support, easy migration from SQL ServerProprietary, less JSON supportERP or CRM app
Cosmos DBNoSQLGlobal distribution, multi-API, low latencyRU pricing, limited joinsReal-time gaming, IoT
Data LakeStorage + AnalyticsSchema-on-read, cheap, scales to PBs, works with SynapseNot OLTP, slower queries without optimizationRaw log storage, ML training
MySQL (Flexible Server)Relational (OSS)Familiar, cost-effective, managed serviceLimited extensions, single nodeLAMP apps, WordPress
PostgreSQL (Flexible + Hyperscale)Relational (OSS)Rich data types, extensions, horizontal scale with CitusComplexity at scale, learning curveGeospatial apps, SaaS scale-out

Migration Tips

  • From SQL Server: Use Data Migration Assistant → Azure SQL Database or Managed Instance.
  • From MongoDB or Cassandra: Cosmos DB’s Mongo and Cassandra APIs are often drop-in replacements.
  • From on-prem MySQL/Postgres: Use Azure Database Migration Service for minimal downtime.
  • For analytics: Land raw data in Data Lake, then layer Synapse/Databricks on top.

Putting It Together

A real-world example:

Imagine you’re running a youth esports platform (matches, rosters, stats, video replays).

  • Match stats (fast lookups, global players): Cosmos DB (SQL API).
  • Video replays and logs: Data Lake Storage Gen2.
  • Tournament scheduling and billing: Azure SQL Database.
  • Content management system (WordPress site): MySQL Flexible Server.
  • Advanced analytics (skill tracking, ML models): Data Lake + Synapse.

No one service fits all — the trick is to match the workload to the right Azure data platform.


Wrap-Up

Azure’s data platform ecosystem is broad, but not redundant. Each service is optimized for a different type of workload.

  • Choose Cosmos DB for global NoSQL.
  • Choose Data Lake for analytics storage.
  • Choose MySQL/Postgres when you want open source relational.
  • Choose SQL Database if you’re in the Microsoft SQL Server world.

With these five parts, you’ve now got a beginner-to-advanced view of how Azure’s main data services fit together.

In the future, I’ll extend this series with hands-on labs (e.g., connecting Power BI to Data Lake, or building a Hyperscale Postgres cluster) so you can practice what we’ve covered.


References


Discover more from SQLYARD

Subscribe to get the latest posts sent to your email.

Leave a Reply

Discover more from SQLYARD

Subscribe now to keep reading and get access to the full archive.

Continue reading