Exploratory Data Analysis with Azure Synapse Analytics

Build a Power BI Report on Parquet files with the Serverless SQL Pool

Azure Synapse

Last week, 3th of December, Azure Synapse Analytics was announced to be General Available. Another reason to have some fun with a few of the new features in the Azure Synapse Workspace. Like building a Power BI report with T-SQL without provisioning a ‘traditional’ SQL Server instance and building a data pipeline. 

In this blog, we will be using the Serverless SQL Pool included in the workspace, to create just that. Some of the benefits of the Serverless SQL Pool:

  1. Exploratory Data Analysis with T-SQL on your storage account
  2. Connectivity with tools like SSMS and ADS
  3. You are only charged for the data processed

To know more about the Serverless SQL Pool, be sure to check the official documentation. A link is in the references at the end of this blog.

In the end, I will also provide an example of the ‘create external table as select’ statement. This is extremely useful if you need to reuse the results of your data analysis. It allows you to export the data to your storage account in the desired format. 

Overview of actions

So how did I do this? What steps are needed to build the report:

  1. Provision a Synapse workspace
  2. Have data in a storage account
  3. Create views in the database
  4. Import the views in Power BI desktop

Maybe this is a little blunt, but I promise it won’t be difficult. We will go through the steps to take!

Provision a Synapse workspace

In this case, I used the Azure portal to create a new workspace and used all of the standard settings for creation.

  1. Navigate to the Azure Portal
  2. Click ‘Create a resource’
  3. Type ‘Azure Synapse Analytics’
  4. Fill in the ‘Basics’ tab and click ‘Review and Create’
Create an Azure Synapse Workspace 

Have data in a storage account

This is could be the hardest part. In the examples, I use an export of the AdventureWorksLT sample database in Parquet format. If you do not have parquet files, CSV will do the job as well.

You could place your files in the storage account that is created with the Synapse workspace. However, if you would like to use an existing storage account from the same tenant, that is also possible. Just make sure you assign the right roles to the storage account:

Connect your Storage Account to the Synapse Workspace

In case you are not using the created ADLS Account to read data from, a ‘Linked Service’ needs to be configured to the Storage Account. One way to do this is by navigating to the ‘Manage’ tab on the left side. Choose ‘Linked Services’ and click ‘New’. Choose the appropriate storage type and fill out the details.

Create Linked Service

Authentication via Managed Identity is recommended.

  1. Assign the Synapse Workspace (system assigned) Managed Identity at least the ‘Storage Blob Data Reader’ role on the storage account via the ‘Access control (IAM)’ tab
  2. Assign your own Azure Active Directory user at least the ‘Storage Blob Data Reader’ role.

Being ‘Owner’ or ‘Contributor’ on the storage account is not sufficient. Not having this specific ‘Storage Blob Data’ role will result in permission errors when trying to query the files.

Storage Account Role 

Create views in the database

Before we can create views we need a database to create the views on. To do this, open the Synapse Studio via the Azure Portal

Azure portal Synapse Resource

Navigate to the ‘Data’ tab. Click ‘Add new resource’ and choose ‘SQL Database’. In the options pick ‘Serverless’ for pool type and fill out a fitting name for your database.

Create SQL Database

Now we are ready to create a view in the database. In the example below I wrote a join between ‘SalesOrderHeader’ and ‘SalesOrderDetail’.

CREATE VIEW [dbo].[vFactSalesOrder_Parquet] AS
SELECT SOH.[SalesOrderID]
,SOH.[RevisionNumber]
,SOH.[OrderDate]
,SOH.[DueDate]
,SOH.[ShipDate]
,SOH.[Status]
,SOH.[OnlineOrderFlag]
,SOH.[SalesOrderNumber]
,SOH.[PurchaseOrderNumber]
,SOH.[AccountNumber]
,SOH.[CustomerID]
,SOH.[ShipToAddressID]
,SOH.[BillToAddressID]
,SOH.[ShipMethod]
,SOH.[CreditCardApprovalCode]
,SOH.[SubTotal]
,SOH.[TaxAmt]
,SOH.[Freight]
,SOH.[TotalDue]
,SOH.[Comment]
,SOH.[rowguid]
,SOD.[SalesOrderDetailID]
,SOD.[OrderQty]
,SOD.[ProductID]
,SOD.[UnitPrice]
,SOD.[UnitPriceDiscount]
FROM OPENROWSET(
BULK 'https://<storageAccountName&gt;.blob.core.windows.net/raw/reservoir/adventureworkslt/2020/01/01/SalesLT.salesorderheader_full.parquet',
FORMAT='PARQUET'
) AS SOH
INNER JOIN OPENROWSET(
BULK 'https://<storageAccountName&gt;.blob.core.windows.net/raw/reservoir/adventureworkslt/2020/01/01/SalesLT.salesorderdetail_full.parquet',
FORMAT='PARQUET'
) AS SOD
ON SOH.SalesOrderID = SOD.SalesOrderID
GO

Navigate to the ‘Develop’ tab. Click ‘Add new resource’ and choose ‘SQL Script’. Add and execute your script. To verify if the view works navigate to the ‘Data’ tab, select your create database, views and under ‘Actions’ choose for ‘Select Top 100 rows’.

Select TOP 100 rows

Import the views in Power BI desktop

Almost there! Open Power BI Desktop and choose ‘SQL Server’ as your data source. The server name can be found in the Azure Portal. It will likely be

<synapse-workspace-name>-ondemand.sql.azuresynapse.net

Choose your created view and load the data!

Load data in Power Bi Desktop

Big Datasets

Be aware of the limitations of Import and Direct Query mode. In another example, I made a view of the NYC Taxi Open dataset. And loaded it to Power Bi with the Direct Query mode to prevent the .pbix file from blowing up with the 1.572.000.000 rows.

Power BI Desktop with NYC Taxi Dataset

Create External Table As

Another great feature of the Serverless SQL Pool is the use of the ‘Create External Table As Select’ statement. Basically, it does two things:

  1. It exports the results to Hadoop or Azure Blob Storage
  2. And in parallel, it creates an External Table in your database

To be able to use this statement you need to have a few things in place:

  • A database master key for encryption
  • A database scoped credential for authentication
  • An external data source to store and read the data
  • An external file format to determine how to store the file
Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<PASSWORD>' ;
Create a database scoped credential with Azure storage account key as the secret.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY='SHARED ACCESS SIGNATURE' ,
SECRET = '<SHARED ACCESS SIGNATURE KEY>'
GO
;
Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE SqlOnDemandDemo_ds WITH (
LOCATION = 'https://<storageAccountName&gt;.blob.core.windows.net',
CREDENTIAL = AzureStorageCredential
)
;
CREATE EXTERNAL FILE FORMAT parquetfile1
WITH (
FORMAT_TYPE = PARQUET,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
)
;
Create an external table and export the results to the Storage Account
Example is based on AdventureWorks
CREATE EXTERNAL TABLE dbo.FactSalesOrder
WITH
(
LOCATION = '/<storageContainer>/FactSalesOrder',
DATA_SOURCE = SqlOnDemandDemo_ds,
FILE_FORMAT = parquetfile1
)
AS SELECT * FROM [dbo].[vFactSalesOrder]

Wrap Up

So my goal was to show you that it is fairly easy to set up an Azure Synapse Workspace and use the Serverless SQL Pool to explore data in the data lake. The advantages are many, among them:

  • The ability the use T-SQL
  • Connectivity from familiar tools as SSMS and ADS
  • Very cost-efficient as you pay per TB of processed data

References:

Microsoft documentation:

Serverless SQL pool – Azure Synapse Analytics | Microsoft Docs

Set or Change the Column Collation – SQL Server | Microsoft Docs

CREATE EXTERNAL TABLE AS SELECT (Transact-SQL) – SQL Server | Microsoft Docs

Learning links:

Tutorial: Use serverless SQL pool to analyze Azure Open Datasets in Azure Synapse Studio – Azure Synapse Analytics | Microsoft Docs

Realize Integrated Analytical Solutions with Azure Synapse Analytics – Learn | Microsoft Docs

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this:
search previous next tag category expand menu location phone mail time cart zoom edit close