
Metadata-driven data extraction with Azure Synapse Analytics/Azure Data Factory

The first step in building a data warehouse or data platform is extracting data from your sources. This can be time-consuming, but if you use the metadata from your sources, you only have to invest the effort once per source type. This results in huge savings in development cost, allowing your data engineers to actually deliver data products the business and data scientists can use.

The goal of this workshop is to learn how to leverage Azure Synapse Analytics to extract data in a metadata-driven fashion. By the end of the day you will have built your own solution that can extract data from databases in an efficient and flexible manner, putting you in charge of which objects are loaded and which are not.

We will start off by taking a look at the architecture we want to build, so we get a good understanding of the end goal. From there we will take a look at the database we will use to collect our source metadata, which we will later use to orchestrate the solution.
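To give a first impression, the heart of that metadata database could be a control table along these lines (a minimal sketch; the table and column names are illustrative, not necessarily the exact schema built in the workshop):

```sql
-- Minimal sketch of a metadata control table (illustrative names).
-- One row per source object; the IsEnabled flag determines whether
-- the orchestration pipeline extracts that object.
CREATE TABLE dbo.SourceObjects
(
    SourceObjectId INT IDENTITY(1, 1) PRIMARY KEY,
    SourceSystem   NVARCHAR(100) NOT NULL,  -- e.g. the source database
    SchemaName     NVARCHAR(128) NOT NULL,
    ObjectName     NVARCHAR(128) NOT NULL,
    IsEnabled      BIT NOT NULL DEFAULT 1,  -- toggle extraction per object
    LastExtracted  DATETIME2 NULL           -- set after each successful run
);
```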

Next up we will create all the resources we will need to get our solution up-and-running, which include:
- Azure Synapse Analytics workspace; this is where we will build our pipelines
- Azure SQL Database; this is our metadata database
- Azure Data Lake Storage Gen2; this is where we will store our extracted data
- Azure Key Vault; this service securely stores our secrets
- Azure Synapse Analytics Serverless SQL Pool; this enables us to query the extracted data

Now we are ready to start building!

First, we need to set up the connections to our data source; the source itself is provided by the trainer.

Next, we will create pipelines that extract the source metadata into our metadata database, and configure the settings there so we can select the objects and attributes we want to extract.
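Under the hood, such a metadata pipeline could run a query like the following against the source and land the result in the metadata database (a sketch using the standard INFORMATION_SCHEMA views; the exact query built in the workshop may differ):

```sql
-- Harvest table and column metadata from a SQL Server source.
-- A Lookup or Copy activity can run this query and land the result
-- in the metadata database, where objects and attributes can then
-- be switched on or off for extraction.
SELECT
    t.TABLE_SCHEMA,
    t.TABLE_NAME,
    c.COLUMN_NAME,
    c.DATA_TYPE,
    c.ORDINAL_POSITION
FROM INFORMATION_SCHEMA.TABLES AS t
JOIN INFORMATION_SCHEMA.COLUMNS AS c
    ON  c.TABLE_SCHEMA = t.TABLE_SCHEMA
    AND c.TABLE_NAME   = t.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
ORDER BY t.TABLE_SCHEMA, t.TABLE_NAME, c.ORDINAL_POSITION;
```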

After we've created the infrastructure to extract the metadata from our source, we will start with the most fun part: extracting the data into our platform. In our case we will save the data as Parquet files in an Azure Data Lake Storage Gen2 account.
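As an impression of how the metadata drives this step, the per-object query a Copy activity runs can be assembled from the metadata database, roughly like this (a sketch; dbo.SourceObjects and dbo.SourceAttributes are the illustrative control tables from earlier, not a fixed schema):

```sql
-- Assemble the extraction query for each enabled object from the
-- enabled attributes in the metadata database (illustrative schema).
SELECT
    'SELECT '
        + STRING_AGG(CONVERT(NVARCHAR(MAX), QUOTENAME(a.ColumnName)), ', ')
            WITHIN GROUP (ORDER BY a.OrdinalPosition)
        + ' FROM ' + QUOTENAME(o.SchemaName) + '.' + QUOTENAME(o.ObjectName)
        AS ExtractionQuery
FROM dbo.SourceObjects AS o
JOIN dbo.SourceAttributes AS a
    ON a.SourceObjectId = o.SourceObjectId
WHERE o.IsEnabled = 1
  AND a.IsEnabled = 1
GROUP BY o.SchemaName, o.ObjectName;
```

A ForEach activity can then loop over these rows, feeding each query to a Copy activity that writes the result to the lake as Parquet.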

Finally, we will trigger the created pipelines to extract the data into our platform. But since the data is stored as Parquet, we cannot simply open the files to validate that it is correct. To remedy this, we will generate external tables in our Serverless SQL Pool, allowing us to query the extracted data with T-SQL.
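For example, an external table over the extracted Parquet files might look like this (a sketch; the storage URL, names, and columns are placeholders for your own environment):

```sql
-- One-time setup in the Serverless SQL Pool (names and URL are placeholders).
CREATE EXTERNAL DATA SOURCE MyDataLake
    WITH (LOCATION = 'https://<storageaccount>.dfs.core.windows.net/raw');

CREATE EXTERNAL FILE FORMAT ParquetFormat
    WITH (FORMAT_TYPE = PARQUET);

-- External table over the Parquet files extracted for one source object.
CREATE EXTERNAL TABLE dbo.Customer
(
    CustomerId   INT,
    CustomerName NVARCHAR(200)
)
WITH (
    LOCATION    = '/sales/Customer/',  -- folder holding the Parquet files
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = ParquetFormat
);

-- Validate the extracted data with plain T-SQL.
SELECT TOP 10 * FROM dbo.Customer;
```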

Target audience:
- BI Developers/Data Engineers
- Data Architects

Recommended knowledge:
- Basic knowledge of data extraction theory
- Basic knowledge of SQL
- Experience with data extraction in another tool is beneficial
- Azure Data Factory/Azure Synapse Analytics experience is NOT needed

Equipment/software needed:
- A laptop (preferably with a mouse)
- Working internet connection
- Azure subscription with sufficient credits ($50 should be enough)
- SQL Server Management Studio (latest version)

Within this workshop you will learn how to build a solution to extract all your data in a metadata-driven fashion. This will enable you to ingest data into your data platform easily, so you can focus on creating data products that will drive your business!

Rohan Horstman

Data Solution Architect

Arnhem, The Netherlands
