Why Fabric Spark is the way to go

After attending the Fabric Community Conference in March 2024 in Las Vegas, I got even more excited about Fabric Spark 💚

Let’s explore why Fabric Spark stands out compared to Azure Synapse Spark from my point of view. This is not a complete list, just the points that stand out for me!

Fabric vs. Synapse Spark Comparison:

  • Source Control:
    • Fabric: Currently only supports Azure DevOps Repos. GitHub support is on the way.
    • Synapse Spark: Supports both Azure DevOps Repos and GitHub.
  • Pull Request Review:
    • Fabric: Notebook artifacts contain only code, saved with the file extension of the notebook language
      • PySpark as .py and Spark SQL as .sql. This definitely makes our lives easier!
    • Synapse Spark: Notebook artifacts in JSON format with metadata. This is not easily readable within a PR file comparison.
  • Spark Runtimes:
    • Fabric: Runtime 1.3 in public preview, with Apache Spark 3.5 and Delta Lake 3.0.
      • Fabric is ahead on Spark runtimes and the bundled Delta Lake version. The MS Product Team has promised to increase the release cadence (though it will never match Databricks).
    • Synapse Spark: Runtime 3.4 in public preview, with Apache Spark 3.4.1 and Delta Lake 2.4.0.
      • Synapse is lagging behind. This mainly hurts on the Delta Lake side, where newer versions are needed to take advantage of new features (e.g. improvements to the MERGE statement).
  • Pool Warmup Times:
    • Fabric: Starter pools allow notebooks to run in seconds.
    • Synapse Spark: Default pools take 4-8 minutes before a notebook can start.
  • Pro Dev Tooling:
    • Fabric: Use VS Code and the VS Code Extension for developing locally and running notebooks on a remote Fabric Spark Cluster.
    • Synapse Spark: None available. Development happens in the browser only; for complex workflows you need a local Spark installation.
  • Minimal Nodes:
    • Fabric: Allows running notebooks on a single node.
      • Because a notebook can run on a single node, multiple notebooks with small workloads can run in parallel on limited resources, faster than on Synapse.
    • Synapse Spark: Three-node minimum.
  • Delta Lake Maintenance:
    • Fabric: Autotune automatically adjusts the Apache Spark configuration to speed up workload execution and optimize overall performance.
    • Synapse Spark: Custom optimization only.
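To make the pull-request point above concrete, here is a rough sketch of how a Fabric notebook artifact can look in Git as a plain .py file. The header and cell-marker comment strings, and the variable names, are illustrative assumptions from memory and may differ between Fabric versions:

```python
# Fabric notebook source
#
# Illustrative sketch of a Fabric notebook artifact as stored in Git.
# The exact header and cell-marker strings are assumptions and may vary.

# CELL ********************

# An ordinary code cell: reviewers see plain code in the PR diff,
# not JSON-escaped strings with embedded metadata.
orders = [("2024-03-01", 120.0), ("2024-03-02", 80.5)]
total = sum(amount for _, amount in orders)
print(total)

# CELL ********************

# A second cell, separated by a marker comment rather than JSON structure.
print(f"{len(orders)} orders")
```

Compare that to reviewing the same two cells wrapped in Synapse's JSON notebook format, where every line of code is an escaped string inside a metadata object.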
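On the Delta Lake side, the feature I care about most is MERGE INTO for upserts. A minimal sketch in Spark SQL, with hypothetical table and column names (dim_customer, updates, customer_id, email), assuming both are Delta tables:

```sql
-- Hypothetical upsert into a Delta table; all names are illustrative.
MERGE INTO dim_customer AS t
USING updates AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.email = s.email
WHEN NOT MATCHED THEN
  INSERT (customer_id, email) VALUES (s.customer_id, s.email);
```

Newer Delta Lake releases keep extending MERGE and improving its performance, which is exactly why being stuck on an older runtime hurts.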
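Enabling Autotune is, if I read the docs correctly, a single Spark property. The key below is the one I remember from the Microsoft documentation, so treat it as an assumption and verify against the current Fabric docs before relying on it:

```sql
-- Assumed property name; check the current Fabric documentation.
SET spark.ms.autotune.enabled = TRUE;
```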

These are my main reasons for wanting to move my workloads to a Fabric Workspace. 💚

Agree, or did I miss something? Let me know!

