After attending the Fabric Community Conference in Las Vegas in March 2024, I got even more excited about Fabric Spark 💚
Let’s explore why Fabric Spark stands out compared to Azure Synapse Spark from my point of view. This is not a complete list, just the points that stand out for me!
Fabric vs. Synapse Spark Comparison:
- Source Control:
- Fabric: Currently supports only Azure DevOps Repos; GitHub support is on its way.
- Synapse Spark: Supports both Azure DevOps Repos and GitHub.
- Pull Request Review:
- Fabric: Notebook artifacts contain code only, with the file extension of the notebook language
- PySpark as .py and Spark SQL as .sql. This definitely makes our lives easier!
- Synapse Spark: Notebook artifacts are JSON with metadata, which is hard to read in a PR file comparison.
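Because Fabric serializes a notebook as a plain code file, a pull request diff shows only the cells themselves. A minimal sketch of what such a .py artifact might look like (the header and cell markers here are illustrative assumptions, not the documented format):

```python
# Fabric notebook source
# (illustrative sketch -- the exact header and cell markers Fabric
#  emits are an assumption here, not the official artifact format)

# CELL ********************

# A plain code cell diffs cleanly in a pull request:
orders = [("A-100", 25.0), ("A-101", 40.0)]
total = sum(amount for _, amount in orders)
print(f"total revenue: {total}")
```

Compare that with a JSON notebook artifact, where the same one-line change is buried between escaped strings and metadata keys.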
- Spark Runtimes:
- Fabric: Runtime 1.3 (public preview) with Apache Spark 3.5 and Delta Lake 3.0.
- Fabric is ahead on Spark runtimes and the compatible Delta Lake versions. The MS Product Team promised to increase the release frequency (but will never match Databricks).
- Synapse Spark: Runtime for Apache Spark 3.4 (public preview) with Apache Spark 3.4.1 and Delta Lake 2.4.0.
- Synapse is lagging behind. This is mainly a problem for Delta Lake versions, where you want to take advantage of new features (e.g. the MERGE statement).
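Newer Delta Lake versions matter in practice for statements like MERGE. A quick sketch of a Delta upsert — the table and column names here are made up for illustration:

```sql
-- Sketch of an upsert with Delta Lake MERGE; dim_customer and
-- staging_customer are hypothetical tables.
MERGE INTO dim_customer AS t
USING staging_customer AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.email = s.email, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, email, updated_at)
  VALUES (s.customer_id, s.email, s.updated_at);
```

The newer the Delta Lake version in the runtime, the more of these capabilities (and their performance improvements) you get out of the box.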
- Pool Warmup Times:
- Fabric: Starter pools allow notebooks to run in seconds.
- Synapse Spark: Default pools take 4-8 minutes before notebooks start.
- Pro Dev Tooling:
- Fabric: Use VS Code and the VS Code Extension for developing locally and running notebooks on a remote Fabric Spark Cluster.
- Synapse Spark: None available. Browser-based development only; a local Spark cluster installation is required for complex workflows.
- Minimal Nodes:
- Fabric: Allows running notebooks on a single node.
- Because notebooks can run on a single node, running multiple notebooks with small workloads in parallel is quicker on limited resources than in Synapse.
- Synapse Spark: Three-node minimum.
- Delta Lake Maintenance:
- Fabric: Autotune automatically adjusts the Apache Spark configuration to speed up workload execution and optimize overall performance.
- Synapse Spark: Custom optimization only.
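Autotune is controlled per session through a Spark property. A sketch of toggling it in a Fabric notebook — the property name below matches Microsoft's documentation at the time of writing, so verify it against the current docs:

```python
# Sketch: enabling Autotune for the current Fabric Spark session.
# "spark" is the SparkSession Fabric injects into every notebook;
# the property name "spark.ms.autotune.enabled" is taken from the
# Microsoft docs -- verify before relying on it.
spark.conf.set("spark.ms.autotune.enabled", "true")
```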
These are the main reasons I want to move my workloads to a Fabric Workspace. 💚
Agree, or did I miss something? Let me know!