Marisol Steinau
Data Solution Architect
Tuttlingen, Germany
I began my professional journey with a background in finance, but in 2019, I took a bold leap into the world of data engineering. Since then, I've been passionately working with Microsoft technologies and have found my true calling in the cloud. My career is driven by a constant pursuit of new challenges and opportunities, always eager to explore and embrace what's next.
Conflict Management – The Git Way
Git is a version control system on the verge of becoming ubiquitous in the IT world. Many data platforms and tools nowadays provide Git integration or strive to adapt their formats to be compatible with Git, to reap the full benefits of contemporary version control. Recent examples are the ‘.tmdl’ file format for Power BI projects and the Git support built into Microsoft Fabric. Git enables a safer, better, and more productive way of working. Yet one does need to know how to work with Git and deal with the challenges that come with it.
One challenge stems from using branches in Git to work on different features independently. At one point or another, these branches need to be merged back into the main solution, which can lead to merge conflicts when files have been altered in different ways. Being able to resolve these merge conflicts in a consistent and correct way is crucial. Microsoft Fabric promotes the use of branches, yet its UI severely lacks the capabilities to deal with merge conflicts optimally. One frequent recommendation is to avoid merge conflicts altogether, which, in all honesty, won’t work. You will need to deal with them, which is not that hard if you know what you’re doing.
We will explore what kinds of merge conflicts typically occur when working with Fabric items. We will discuss when Fabric’s built-in means of resolving a merge conflict can be used and when they are better avoided. For those cases, we show advanced techniques for solving merge conflicts using a combination of Fabric and native Git capabilities. And sometimes the only solution is to work entirely within Git. Whatever it may be, this session will give you the knowledge to deal with any kind of merge conflict that may arise, from the most basic issue to more complicated conflicts.
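To make that concrete, here is a hypothetical example of a conflicted notebook cell and its resolution with native Git; the branch name, file name, and code are invented for illustration (Fabric stores notebook code as a .py file in the repository):

    # Hypothetical example: `git merge feature/emea-filter` stops with
    #   CONFLICT (content): Merge conflict in notebook-content.py
    # and Git leaves both versions of the cell in the file:
    #   <<<<<<< HEAD
    #   df = spark.read.table("sales").filter("year >= 2023")
    #   =======
    #   df = spark.read.table("sales").filter("region = 'EMEA'")
    #   >>>>>>> feature/emea-filter
    # Edit the file so it keeps the intent of both branches ...
    df = spark.read.table("sales").filter("year >= 2023").filter("region = 'EMEA'")
    # ... then conclude the merge with `git add notebook-content.py` and `git commit`.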
Being able to work with Git is as essential for Microsoft Fabric as working with Spark notebooks or knowing Delta Parquet. And you will become acquainted with merge conflicts sooner rather than later, so why not be prepared when the need arises?
Beyond the Basics: Advanced SCD2 Implementation for Compound Dimension Tables
Slowly Changing Dimensions (SCD) are widely known, and numerous blogs and tutorials cover how to implement them using languages like T-SQL, PySpark, or low-code approaches such as Dataflows and Pipelines. However, these examples typically focus on a single table, like a customer or product table. In my project, I faced a more complex scenario: my customer data was derived from multiple source tables, combining details such as customer names, addresses, statuses, partners, and more. Each source had different update frequencies, and it was crucial to track historical changes in customer data to drive personalized discounts, offers, and sales strategies based on status and partner relationships.
This session will demonstrate how I approached building a compound customer table, implementing SCD-Type 2 logic without resorting to full data reloads each time. Instead, I designed a delta load mechanism, ensuring only the changed data was processed. Using Microsoft Fabric and Notebooks, I solved the challenge of efficiently managing and updating this complex dataset. While the solution is showcased in Microsoft Fabric, the techniques can be applied across other environments. Join this session to learn how to handle multi-source dimensions with a practical, scalable approach that minimizes reprocessing and enhances data accuracy.
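As a small taste of the technique (not the full solution from the session), here is a minimal PySpark sketch of the core SCD2 merge on a Delta table. The table and column names (dim_customer, stg_customer_changes, status, partner_id, load_date, and the validity columns) are invented for illustration, and spark is the notebook's session:

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    dim = DeltaTable.forName(spark, "dim_customer")        # hypothetical target table
    updates = spark.read.table("stg_customer_changes")     # hypothetical delta load

    # Changed customers must both close their current row and get a new one,
    # so they are staged twice: once with a NULL merge key (forces an insert)
    # and once with the real key (matches and expires the current row).
    current = dim.toDF().filter("is_current")
    changed = (updates.alias("s")
        .join(current.alias("t"), F.col("s.customer_id") == F.col("t.customer_id"))
        .filter("s.status <> t.status OR s.partner_id <> t.partner_id")
        .select("s.*"))
    staged = (changed.withColumn("merge_key", F.lit(None).cast("string"))
        .unionByName(updates.withColumn("merge_key", F.col("customer_id"))))

    (dim.alias("t")
        .merge(staged.alias("s"), "t.customer_id = s.merge_key AND t.is_current = true")
        .whenMatchedUpdate(
            condition="s.status <> t.status OR s.partner_id <> t.partner_id",
            set={"is_current": "false", "valid_to": "s.load_date"})
        .whenNotMatchedInsert(values={
            "customer_id": "s.customer_id",
            "status": "s.status",
            "partner_id": "s.partner_id",
            "valid_from": "s.load_date",
            "valid_to": "CAST(NULL AS date)",
            "is_current": "true"})
        .execute())

Because the merge only touches rows arriving in the staging table, nothing is reprocessed for unchanged customers, which is the point of the delta load mechanism.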
The Azure Admin Tool Chest for Data Engineers
I am a data engineer working with databases, data lakes, Synapse, and Databricks in Microsoft Azure. Some of the problems I encountered could not be solved with SQL or Python, but instead required the skills of an Azure admin. And since all the real ones were busy, I had to administer things myself.
Using real-world examples I encountered, I will show which parts of the Azure Admin Tool Chest can be relevant for the humble data engineer. I will explain:
- what headaches could have been prevented with delete locks (see the sketch below)
- why I should have adhered to the principle of least privilege when assigning RBAC roles to access my Synapse workspace
- what RBAC is, anyway
- how to set up cost management to keep Databricks or Azure Synapse from stampeding over your budget
- where monitoring your resources and setting up basic alerts makes sense to prevent common catastrophes
In this session, we’ll look beyond the traditional tasks of the data engineer and see what’s there in the realm of Azure Administration. This knowledge can empower you to solve problems when others are too busy. The Azure Admin Tool Chest makes your skillset more well-rounded.
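Picking up the first point about delete locks, here is a minimal Python sketch, assuming the azure-identity and azure-mgmt-resource packages and placeholder subscription and resource-group names, that places a CanNotDelete lock on a resource group:

    # Minimal sketch, assuming azure-identity and azure-mgmt-resource are installed;
    # the subscription ID and resource group name are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource.locks import ManagementLockClient
    from azure.mgmt.resource.locks.models import ManagementLockObject

    client = ManagementLockClient(DefaultAzureCredential(), "<subscription-id>")
    client.management_locks.create_or_update_at_resource_group_level(
        resource_group_name="rg-dataplatform",   # hypothetical resource group
        lock_name="no-accidental-delete",
        parameters=ManagementLockObject(
            level="CanNotDelete",
            notes="Protects the Synapse workspace and data lake from accidental deletion."))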
You shall not pass! Designing access in Microsoft Fabric
Microsoft Fabric has been out there for a bit already, and new features keep emerging almost constantly. Keeping up isn't a piece of cake, and there seem to be (too) many options out there for securing data, aren't there?
As with all security topics, it requires some in-depth consideration to achieve the proper result. You need to protect your data from things such as accidental deletion, manipulation, unwanted resharing, and other big headaches. Developing an appropriate security plan requires you to be aware of all the options that Microsoft Fabric offers. And here is where you have to make your way through the labyrinth of different data restriction possibilities.
According to the "Microsoft Fabric security white paper", the multi-layer security model of Fabric offers workspace permissions, item permissions, and granular permissions for each Fabric engine. In this session, I aim to break this down into simple terms to spare you reading the entire paper (and all the linked content). You'll walk out with an overview of
- Workspace roles: Why it is a good idea to assign them to security groups or M365 groups
- Item level security: How to control access to individual Fabric items when users do not have access to a workspace and why you should generally stay away from it.
- Securing the SQL Analytics endpoint: Applying object-level security, column-level security, row-level security, and dynamic data masking.
But wait, there's more! Currently in preview, OneLake data access roles apply role-based access control to data stored in OneLake, determining which folders users see when accessing the "lake view" of the data via the lakehouse UX, notebooks, or OneLake APIs.
That raises questions: Does this mean OneLake RBAC applies to Lakehouse Items only?! Does OneLake RBAC work together with Workspace roles? Can OneLake RBAC roles be combined with Lakehouse item permissions? What about SQL Analytics endpoints? Or Shortcuts?
It is apparent why permission assignment isn't anyone's favorite topic. However, hopefully after this session you'll find that the labyrinth of data restriction options no longer feels overwhelming. Knowing your way around this labyrinth gives you the confidence to apply access control in practice and keep the data in your Fabric environment secure.
Fabric, Git, & Pipelines - A Chord in Harmony
Nowadays, with the rise of Microsoft Fabric, a data engineer has the option to store data in warehouses, lakehouses, or KQL databases. As if we had to tell you that; that's the natural order of things. But there are other things a data engineer needs that do not belong in that kind of storage. Where do the SQL scripts, the notebooks, or the KQL query sets go?
If you don't know better, your local hard drive. If you are very bold, you might store them in Microsoft SharePoint. Or maybe what's good enough for your data is good enough for your scripts; after all, this saves on storage costs. Luckily, there is a better option: Git.
The scripts and notebooks are stored in a Git repository. Git is a distributed version control system initially developed for the Linux kernel. It allows for easy versioning of your files and for collaboration on the same notebook. Git has many amazing features, but like any tool it requires knowledge to use. Take your first steps into Git as a data engineer here and learn what Git can do for you when you develop a data solution. We show how to work with Git inside and outside of Microsoft Fabric, with a focus on applicability and best practices.
OK, now that your SQL scripts and your Fabric notebooks are fully versioned with Git, how do you get them from the Git repository onto the Fabric workspaces? Of course, you could copy them manually, but wouldn't it be nicer if they just deployed automagically to the right place whenever a change occurs? The DevOps world has a solution: DevOps pipelines. Learn how a pipeline can take your notebooks and other Fabric items and deploy them to dev, staging, and prod environments, adapting connection strings and other parameters to match each environment automatically. We demonstrate with Microsoft Fabric how to automate your workflow so you can directly see the benefit.
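One way to script such a deployment from a pipeline is Microsoft's fabric-cicd Python package. The following is a hedged sketch rather than the exact setup from the session, assuming the package's publish API; the workspace ID, repository path, and item types are placeholders:

    # Minimal sketch using the fabric-cicd package (pip install fabric-cicd);
    # the workspace ID and repository directory are placeholders.
    from fabric_cicd import FabricWorkspace, publish_all_items

    workspace = FabricWorkspace(
        workspace_id="00000000-0000-0000-0000-000000000000",
        repository_directory="./fabric-items",
        item_type_in_scope=["Notebook", "DataPipeline", "Environment"],
    )
    publish_all_items(workspace)  # publish every tracked item to the target workspace

Run from a DevOps pipeline with different workspace IDs per stage, this is what turns a Git commit into an automatic deployment to dev, staging, or prod.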
Git and DevOps pipelines have helped software engineering immensely. And they might just do the same for data. Taking the load off your back by automating tasks in Microsoft Fabric can make your daily life easier. And working properly with version control gives you safety and recovery from error for your Fabric items. Let Git and pipelines shine together to make your Fabric endeavor brighter.
Infrastructure as Code, why should I care?
As a data engineer tasked with setting up Azure Synapse and data lakes for each client manually, I found myself frustrated by the repetitive nature of these tasks. Motivated by a desire to work smarter, not harder, I embarked on a journey to embrace Infrastructure as Code (IaC), with a particular focus on Azure Bicep. In this session, I'll share my experiences and insights gained from leveraging Bicep to automate deployment workflows and eliminate manual drudgery. From the initial challenges of manual setup to the newfound efficiencies unlocked through Bicep, attendees will gain practical knowledge on streamlining data infrastructure deployment. Join me as we explore how Bicep empowers data engineers to work more efficiently, enforce best practices, and achieve consistency across client environments. Whether you're a seasoned data engineer or new to the world of IaC, this session will inspire you to harness the power of automation and banish repetitive tasks for good.
DevOps for the humble data engineer
A data engineer stores data in warehouses, lakehouses, or even a simple database. As if we had to tell you that; that's the natural order of things. But there are other things a data engineer needs that do not belong in that kind of storage. Where do the SQL scripts, the notebooks, or the little Python program go?
If you don't know better, your local hard drive. If you are very bold, you might store them in Microsoft SharePoint. Or maybe what's good enough for your data is good enough for your scripts; after all, this saves on storage costs. Luckily, there is a better option: Git.
The scripts and notebooks are stored in a Git repository. Git is a distributed version control system initially developed for the Linux kernel. It allows for easy versioning of your files and for collaboration on the same notebook. Git has many amazing features, but like any tool it requires knowledge to use. Take your first steps into Git as a data engineer here and learn what Git can do for you when you develop a data solution.
OK, now that your stored procedure SQL script and your Databricks/Fabric/Synapse notebook are fully versioned with Git, how do you get them from the Git repository onto the SQL database or the Databricks workspace? Of course, you could copy them manually, but wouldn't it be nicer if they just deployed automagically whenever a change occurs? The DevOps world has a solution: pipelines. Learn how a pipeline can take your notebook and deploy it to dev, staging, and prod environments, adapting connection strings and other parameters to match each environment automatically. Pipelines are also a fully integrated part of Microsoft Fabric, which we will use for a practical demonstration.
Git and DevOps pipelines have helped software engineering immensely. And they might just do the same for data. At the very least, they might ease your daily life. Isn't that possibility alone worth learning more about Git and pipelines?
SQL Konferenz 2025 Sessionize Event Upcoming
Data Community Day Austria 2025 Sessionize Event
DATA BASH '24 Sessionize Event
SQL Days 2024
DevOps for the humble data engineer
SQL Konferenz 2024 Sessionize Event
Data Saturday Oslo 2024 Sessionize Event
Data Saturday Rheinland 2024 Sessionize Event
Data Saturday Croatia 2024 Sessionize Event
Data Saturday München 2024 Sessionize Event