Using pushdataset in PowerBI to create near real time logging dashboard

Recently i participated in a hackerthon, in which the goal was to create a near real time monitoring dashboard using Microsoft PowerBI. The data was already generated and persisted in SQLServer and needed to be queried efficiently. Since i am working with a different tech stack now, this endeavour was completely new for me and i learned a lot, which i like to document here.

TLDR: Using PowerBI’s API endpoint functionality, it is possible to push data from the SQLServer only once to the service instead pulling it through directquery everytime a user refreshes the datase, which increases performance and near real time monitoring.

What are datasets in PowerBI?

Visualization in PowerBI needs access to some form of datasource, which is represented as dataset. Except for streaming datasets, which we will be talking about later, datasets can also represent a complex data model. Datasets are a form of abstraction and the underlying sourcetype could be arbitrary such as an excel sheet or a database (powerbi-datasets).

How to get data?

Data from a database could be pulled through either import or directquery (powerbi-connect-data). Using directquery was the first design idea, however the downside of this method is, that it will execute the underlying query whenever a user is interacting with the visualization. If many people work with the dashboard and click and refresh simultaneously, it would generate a lot of unnecessary queries against the database.

Streaming/Push datasets and near real time monitoring

With a push dataset, data is pushed into the Power BI service [..] and automatically creates a new database in the service to store the data. Once a report is created using the push dataset, any of its visuals [..] update in real-time whenever the data is updated (powerbi-pushdataset).

Another method which PowerBI provides is, to open an API endpoint, in which data can be pushed to a streaming/push dataset. The difference between those datasets is, that streaming dataset will only hold the latest data, while push dataset can also hold historic data (powerbi-pushdataset). Using a simple web request it is possible to push the data as json objects to the endpoint. For this task, the SQLServer Agent can be utilized to query the database and push the data to the push dataset every 5 seconds. A template example for the script can be found in thinkbi.de.

The SQL Agent queries the database and pushs it to the PowerBI endpoint

Learnings

  • From our observation, the pushdataset automatically holds distinct data, which means you could push the same data multiple times without effecting any downstream results
  • The PowerBI Endpoint apparently only allows 120 requests per second (techtalkercorner)
  • The pushdataset only holds up to 200.000 rows and will delete rows accordiing to FIFO (powerbi-retentionpolicy)

Challenges

Endpoint request limit

The aforementioned scripts queries the database and creates a web request for every row of the resultset. If a resultset turns out to contain more than 120 rows, it will lead to a failure because of the endpoint limitation. Therefore, a slight modification of the payload wrapping has been implemented, to create a list of json objects instead of single json objects, which are then pushed to the API.

Optimization possibilities

One of the major problems of the direct query is, that whenever a user refreshes the data, a complete query will be executed against the database, which could lead to unnecessary workload for the machine if multiple users refreshes the same dashboard. In comparison to that, the push dataset acts as a central data hub and queries are executed only once. To achieve near real time, the SQL Agent could be scheduled to query the database every 5 seconds and only filter for data within the last 10 seconds (“delta”). Nonetheless it will query the database even if there are no changes. Herein could lie a possible optimization, where change data captures (CDC) are utilized to only push new data.

Initialization

We couldn’t get to know, where the pushdataset is persisted, however we assume it is only stored within the memory. If for some reason the PowerBI crashes, historic data might be lost and needs to be initialized again.

Further references

Demo: https://www.youtube.com/watch?v=rjXJ0GSUMBg

Simple example of the API:https://blogs.lobsterpot.com.au/2020/07/16/getting-started-with-power-bi-push-datasets-via-rest-apis/

Leave a Comment

Your email address will not be published.

hungsblog | Nguyen Hung Manh | Dresden