Ivanna Jurkiv Ditlevsen

Data Engineer

Copenhagen, Denmark

I am a data engineer who really likes processes and cares way too much about data quality and documentation.

Having begun my career as a data analyst in a decentralized self-service setup, I experienced first-hand the pitfalls of bad data and poor governance. Pretty dashboards and large quantities of data mean very little if the data is of poor quality, documentation is non-existent, and roles and responsibilities are not properly defined.

So now, in my role as a Data Engineer, I make it a priority not only to understand the new fancy tools but also to make sure that the pipelines I build are robust and sufficiently documented, and that processes are in place to support them.

What keeps me up at night, you ask? Data quality, data catalogues, data governance and all things related to Microsoft data products.

Area of Expertise

Business & Management

Topics

Azure Data Factory
Azure Synapse Analytics

Pitfalls and remedies for parallel data pipeline runs

Being able to run pipelines in parallel is one of the major benefits of data pipelines, but whether in Microsoft Fabric, Azure Synapse or Azure Data Factory, parallel runs may also cause trouble. In Microsoft Fabric, pipelines may even break if we do not manage parallelism carefully.

In this session, we explore the potential problems with parallelism and cover how to mitigate some of these risks by managing parallelism with hierarchical pipelines, the REST API, and control flow activities. All of this is delivered together with some of our personal experiences and stories about dealing with pipeline design problems.

After attending this session, you should know about some of the pitfalls of parallelism and ways to make sure your pipelines run smoothly.

Topics:
- Data integration design
- Hierarchical pipelines
- REST API endpoints for Data Pipelines
- Control flow activities
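
One way to picture the REST API approach: before triggering a pipeline, ask the service which runs are still in progress, and only start a new run if the limit has not been reached. The sketch below is a minimal illustration, not the method from the session. It assumes a filter body shaped like the one used by the Azure Data Factory `queryPipelineRuns` endpoint; the helper names, the pipeline name `pl_ingest_sales`, and the sample response are hypothetical, and no network call is made.

```python
from datetime import datetime, timedelta, timezone


def build_in_progress_query(pipeline_name: str, now: datetime, lookback_hours: int = 24) -> dict:
    """Build a filter body for a 'query pipeline runs' style request.

    Assumption: the body follows the shape used by the Azure Data Factory
    queryPipelineRuns endpoint (lastUpdatedAfter/Before plus filters).
    """
    return {
        "lastUpdatedAfter": (now - timedelta(hours=lookback_hours)).isoformat(),
        "lastUpdatedBefore": now.isoformat(),
        "filters": [
            {"operand": "PipelineName", "operator": "Equals", "values": [pipeline_name]},
            {"operand": "Status", "operator": "Equals", "values": ["InProgress"]},
        ],
    }


def may_start_run(query_response: dict, max_parallel_runs: int = 1) -> bool:
    """Return True if fewer than max_parallel_runs runs are still in progress."""
    in_progress = [
        run for run in query_response.get("value", [])
        if run.get("status") == "InProgress"
    ]
    return len(in_progress) < max_parallel_runs


# Hypothetical response, shaped like the service's list of pipeline runs.
sample_response = {
    "value": [
        {"runId": "run-001", "pipelineName": "pl_ingest_sales", "status": "InProgress"},
    ]
}

body = build_in_progress_query("pl_ingest_sales", datetime(2024, 2, 1, tzinfo=timezone.utc))
print(body["filters"])
print(may_start_run(sample_response))   # one run already in progress, limit is 1
print(may_start_run({"value": []}))     # nothing running
```

In a real pipeline, the same check could sit in a parent pipeline that decides whether to invoke the child, which is where hierarchical pipelines and control flow activities come in.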

How to optimize Azure Synapse pipelines using SQL database meta data tables

Azure Synapse pipelines make it easy to ingest data from a wide range of sources and to orchestrate pipelines.

I am accustomed to using Synapse pipelines exactly for these purposes. I am also used to working in multiple deployment environments (DEV, UAT and PROD).

One challenge that I stumbled upon with this kind of setup was the effort and time it would take to update the list of tables to be ingested and datasets to be refreshed in Power BI.

For instance, to ingest data from an additional table, I would need to update the relevant parameter in one of the Synapse pipelines in the DEV environment. The change would then be submitted in a pull request, released to UAT, and tested there. Only after a few weeks would the new data start being ingested in PROD.

Doesn't this sound too complex for what it really is?

If you agree, then tag along and I will show you how to use a few meta data tables built in a SQL database to optimize Synapse pipelines.

In my talk, I will focus on 2 examples:
1. using meta data tables to optimize ingestion of data tables
2. using meta data tables to optimize automatic refresh of Power BI datasets from Synapse pipelines
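
As a rough illustration of the first example, the list of tables to ingest can live in a meta data table that the pipeline reads at run time (for instance with a Lookup activity feeding a ForEach), so adding a table becomes a row insert rather than a pipeline change and release. The sketch below uses an in-memory SQLite database as a stand-in for the SQL database. The table name `ingestion_metadata`, its columns, and the sample rows are hypothetical and are not the schema from the talk.

```python
import sqlite3

# In-memory SQLite stands in for the SQL database (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ingestion_metadata (
        source_schema TEXT NOT NULL,
        source_table  TEXT NOT NULL,
        is_enabled    INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (source_schema, source_table)
    )
    """
)
conn.executemany(
    "INSERT INTO ingestion_metadata (source_schema, source_table, is_enabled) VALUES (?, ?, ?)",
    [("sales", "orders", 1), ("sales", "customers", 1), ("hr", "employees", 0)],
)


def tables_to_ingest(connection: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return the enabled (schema, table) pairs the pipeline should loop over."""
    rows = connection.execute(
        "SELECT source_schema, source_table FROM ingestion_metadata "
        "WHERE is_enabled = 1 ORDER BY source_schema, source_table"
    )
    return rows.fetchall()


print(tables_to_ingest(conn))

# Adding a table to the ingestion is a data change, not a pipeline change.
conn.execute(
    "INSERT INTO ingestion_metadata (source_schema, source_table) VALUES ('sales', 'invoices')"
)
print(tables_to_ingest(conn))
```

The same pattern would apply to the second example, with rows describing datasets to refresh instead of tables to ingest.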

Note: the session includes a walk-through of the solution and assumes familiarity with Azure Synapse pipelines or Azure Data Factory.

Naming guidelines for medallion architecture

You probably read the title and thought ‘why would anyone be interested in talking about how to properly name files and tables? It’s simple.’ And yet, because it is simple, not many data engineers think about it.

Inconsistencies in naming of resources, tables and columns lead to broken pipelines, additional onboarding time and hours spent in vain brainstorming creative names.

All of this can be avoided with proper naming guidelines.

In this talk, you will learn about naming guidelines that we follow for:
- Azure resources such as storage accounts, Key Vault and various types of secrets
- file names in bronze, silver and gold layers
- table and column names

You should also tag along if you wish to get some ideas on how to actually enforce naming guidelines in your teams, and to hear a few examples of the benefits you can reap from consistent naming guidelines.
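
One possible way to enforce naming guidelines is an automated check, for example as part of a pull request, that validates names against agreed patterns. The sketch below is only an illustration: the table and column patterns, the layer prefixes, and the sample names are hypothetical and are not the guidelines presented in the talk. The storage account pattern reflects the Azure rule of 3 to 24 lowercase letters and digits.

```python
import re

# Hypothetical conventions, for illustration only.
NAMING_RULES = {
    # Table names: layer prefix, then lower snake_case, e.g. silver_sales_orders.
    "table": re.compile(r"^(bronze|silver|gold)_[a-z][a-z0-9]*(_[a-z0-9]+)*$"),
    # Column names: lower snake_case.
    "column": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$"),
    # Storage accounts: 3-24 lowercase letters and digits.
    "storage_account": re.compile(r"^[a-z0-9]{3,24}$"),
}


def find_violations(names: dict[str, list[str]]) -> list[str]:
    """Return a message for every name that breaks the rule for its kind."""
    violations = []
    for kind, candidates in names.items():
        pattern = NAMING_RULES[kind]
        for name in candidates:
            if not pattern.fullmatch(name):
                violations.append(f"{kind}: '{name}' does not match {pattern.pattern}")
    return violations


sample = {
    "table": ["silver_sales_orders", "GoldCustomers"],
    "column": ["order_id", "OrderDate"],
    "storage_account": ["stdatalakedev001", "st-datalake-dev"],
}

for message in find_violations(sample):
    print(message)
```

Failing the check when the list of violations is non-empty turns the guideline from a document people have to remember into something the team cannot accidentally skip.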

How I transitioned from Data Analyst to Data Engineer

Roughly a year ago, I made the jump from data analytics to data engineering.

After working as a Data Analyst for 2 years and having built hundreds of Alteryx workflows and tens of Tableau dashboards, I had had enough.

I entered the world of data engineering to see how proper, robust ingest and transform pipelines are built, and to make a small contribution to delivering high-quality data to data analysts and other data consumers.

The transition was not easy, but it has been very rewarding.

In this talk, I will share my tips and tricks on how to make the switch, and some don'ts you can avoid to make the transition smoother.

Southampton Data Platform and Cloud user group (6 PM GMT, UK) User group Sessionize Event Upcoming

June 2025

Fabric February 2024 Sessionize Event

February 2024 Oslo, Norway

New Stars of Data #6 Sessionize Event

October 2023
