In today’s blogpost, we will explore a near real-time Lakehouse data model in Power BI with Dataverse as data source. We seek real-time data in Power BI reports from Dataverse. We have Synapse Link for Dataverse in Power Platform, and Serverless SQL Pool is available in Azure Synapse. So, can we create near real-time Power BI reports from Dataverse with Synapse Link and with Serverless SQL Pool? Here is a model we will follow to answer the question.
First, we use Synapse Link and store raw data in Azure Data Lake gen2. Secondly, we ingest data to Serverless SQL Pool and create External Tables in Azure Synapse. Finally, we connect the database with Azure Synapse Analytics SQL connector in Power Query and create a data model in Power BI.
We followed the steps below and developed a Lakehouse data model in Power BI:
- Configure data export from Dataverse to Azure Data Lake Gen2 with Synapse Link
- Create external tables in Serverless SQL Pool in Azure Synapse
- Connect to the tables above from Power Query with SQL server direct query mode
- Create a dataset in Power BI
- Visualize data in real-time in Power BI
Demo environment information:
- Power Platform free trial with Dynamics 365 Customer Engagement license
- Office 365 E5 license (includes Power BI Pro license) and Premium Per User license free trial
- Azure subscription for storage account and for Azure Synapse
Synapse Link in Dataverse allows exporting data from Dataverse to Azure Data Lake. Configuration is a few steps to complete in Power Platform. We can select tables in Dataverse to export data and set partitions by month or by year. Data from Dataverse is exported in a folder structure with csv files in Data Lake gen2. We can attach a storage account to a workspace in Azure Synapse.
Trickle Feed service in Azure Data Lake with Synapse Link provides near-real time data refresh from Dataverse to Data Lake. This is a great advantage to source near-real time data when connecting from Power BI Power Query in Direct Query mode. However, there is not yet an out-of-the-box direct query connector in Power Query to access Azure Data Lake gen2 as of writing (December 2022). Resource: How to configure Synapse Link
Observation in Serverless SQL Pool
We create an external table in Serverless SQL Pool with csv files in Azure Data Lake Synapse Link including account table. Serverless SQL Pool allows creating views from external tables but it does not allow creating table by joining with other tables as of writing this blog (December 2022). This would provide advantages when designing datasets for different analytical reports. We can connect tables and views with Azure Synapse Analytics SQL connector and with Direct Query mode in Power BI.
A challenge we experience is in defining schema of the account table in Synapse link. First, we need to define schema in SQL script when creating an external table in Serverless SQL Pool. The data types accepted by Synapse are different from Dataverse-defined data types. Secondly, we have header information in model.json file but the csv files do not include the header information. Therefore, it was challenging to map the header information and the sequence of columns from the model.json file to the columns in the SQL script. Lastly, the number of account columns was 290+. One mistake in the sequence and/or in selection of data types to the columns led to errors.
Creating an account table in Lake Database guides us how to configure with no code; however, we learned that it is still necessary to map the header information with the model.json file and define the data types in the Lake Database model.
We created a dataset with account table in Power BI after connecting the external table with a Synapse connector in direct query mode. We created a simple report and published to a workspace in Power BI Service. We observed that data was updated within a minute in a csv account file in Data Lake as well as the account external table in Synapse Serverless SQL Pool. However, we do not see near real-time data update in the Power BI report. We learned that it requires data refresh in the account dataset in Power BI Service as well as automatic page refresh. Therefore, how to achieve the near real-time reports is dependent on the type of Power BI licenses. Pro license allows minimum dataset refresh every 30 minutes but automatic page refresh is not supported. It requires premium license (premium per user or dedicated) for automatic page refresh. It is possible to set a minimum page refresh every few seconds with premium license, but need to consider optimal performance and justify the costs. Resource: Automatic page refresh in Power BI
Meeting the goal?
We can achieve near real-time data model with Serverless SQL Pool External Tables as data source in direct query mode in Power BI from Dataverse with premium license. We should always seek the best performance model within reasonable costs. There are pros and cons when choosing Serverless SQL Pool for the data model. Here’s an example of cost consideration when choosing Serverless SQL Pool with direct query mode:
Cost impact on the data model
SQL Serverless pool provides high performance in querying data. It is great in direct query mode in Power BI. On the other hand, SQL Serverless Pool could become quite expensive. SQL Serverless pool costs per CPU usage (DTU). It costs around $5 USD per terabyte of processing data and usage as of writing (December 2022). Every direct query sent from Power BI visuals by end user will hit the DTU. The minimal query per cost is 10 MB. In calculation, it costs 1 cent for 2GB of data processing and usage. For example, if we sent a query to see a trend of taxi usage last 5 years for every month and it required querying the 1 TB of files in Azure Data Lake, then the single query would cost $5… so, depending on the complexity of the query and the number of users who send the queries, it could get very expensive overtime. It is mandatory to best design which datasets in the model should be direct query mode and which datasets should be import/dual mode to optimize the cost and report performance in balance.
We observed that near real-time Power BI reports with direct query model in Serverless SQL Pool is achievable with Premium licenses and a potential cost challenge. It is recommended using Serverless SQL Pool for ad-hoc or one-time analysis with large datasets. Yet, even if Serverless SQL Pool in Direct Query model is chosen, designing a model for minimizing costs, optimizing report performance, managing data refresh duration and model size to meet your license SKU will be the key success factors. These balance checks still remain.
Happy Power BI’ing!