Working with JSON files in Synapse Serverless

As more and more systems become SaaS systems, more and more of the data sources for your modern data warehouse return JSON files. Azure Synapse offers many different ways to handle files in your Data Lake. You can use Synapse Pipelines, including Mapping- and Wrangling Dataflows, you can use Spark, SQL Pools or you can use Synapse Serverless.

If you are primarily modeling data for Power BI or other BI tool to import data infrequently and you want to put your SQL skills to good use, using Synapse Serverless can be a very economic option. With Synapse Serverless you pay for data volume so preparing and modeling moderately sized data that gets read, say once a day, can be the cheapest option in Synapse.

There are however some obstacles to overcome. Since the data is in the data lake and therefore in a file format SQL is not always the easiest option. Synapse helps you a great deal by offering a “right click on file and select top 1000” option to get basic SQL syntax. When you have more complex file types like JSON that can have nested structures and multiple entries in them it gets more complicated.

In this session we will look at how you can query different types of JSON files in different ways. We will use a practical example of data from the Power BI Scanner API and Microsoft Graph API so that the audience will come away with tangible benefits. We will talk about different options and the implication they have. We will particularly focus on amount of data read as that is the driver of cost in Synapse Serverless.

At the end of the session the audience will take away different methods to query JSON files as well as sample scripts so that they can get quickly started in their own environment.

Ásgeir Gunnarsson

Data Platform MVP

Hafnarfjörður, Iceland

View Speaker Profile