Speaker

Tomasz Kostyrka

Data Platform Architect, GetInData | Part of Xebia; Databricks Champion

Kraków, Poland

Data Platform Architect with ten years of experience in a variety of data-related roles.

Proficient with the Microsoft technology stack: he started his journey with SQL Server and the SSIS/AS/RS suite and is currently focused primarily on the Azure Cloud, Snowflake, and Databricks platforms. Highly enthusiastic about all kinds of automation and about implementing DevOps/DataOps practices in projects.

Privately, a husband and father of two, suffering from a chronic lack of time and sleep.

Area of Expertise

  • Information & Communications Technology

Topics

  • Snowflake
  • Azure
  • datawarehousing
  • DataOps
  • DevOps
  • Data Platform
  • Databricks

Going Live with dbt-core on Databricks and MS Fabric

During this session, we'll dive into building a production-ready flow for dbt projects on Databricks and Microsoft Fabric. We'll start with a brief introduction to what Analytics Engineering is, why the ELT approach is dominating platforms like Databricks, Snowflake, and MS Fabric right now, and how dbt helps us design the transformation layer.

After a quick demo, we’ll move on to the main part of the session, focusing on transitioning our locally running projects to production environments. We’ll discuss this topic using both Databricks and MS Fabric as examples, presenting several scenarios for tackling this challenge. We'll concentrate on key stages of the process, including:

- Automated deployment using CI/CD pipelines along with testing
- Orchestration (including ADF, Fabric Pipelines, Airflow, Workflows)
- Compute resources needed for processing (Databricks Clusters, Fabric Notebooks, Docker + ACR/ACA Jobs)
- Authentication using service principals
- Automatic generation and hosting of project documentation in Azure

Note: Besides introducing dbt, we won't go into the details of specific features during the session; our focus will be on operationalizing the processes.

DevOps from day 1 – kickstart your data project the right way

In data projects, DevOps practices often take a backseat at the start. Concepts like IaC, CI/CD, or branching strategies are seen as obstacles and pushed to a later stage.

Initially, everything seems to flow smoothly, but soon enough issues start creeping in: code conflicts, pipelines running with user permissions, undocumented and unapproved changes, and so on. These all come back to us when it's time to migrate to production, often ending in sleepless nights or a weekend spent with colleagues at work.

In this session, I want to show that adopting DevOps from Day 1 not only saves you from these headaches but can actually speed up development in a well-functioning team rather than slowing it down. We’ll go through a session-long live demo mixed with theory slides, where we’ll build a simple (but complete!) platform project from scratch, fully equipped with the right tools so that even our very first print("Hello World!") is deployed through a complete CI/CD pipeline.

We'll cover:
- Automated project setup in Azure DevOps (repos, policies, environments)
- Deploying the initial cloud infrastructure with Terraform
- Provisioning Entra group hierarchy and technical accounts (SPNs)
- Initializing a Databricks project using Asset Bundles
- Setting up and configuring a dbt project
- Creating CI/CD pipelines for all components and deploying our final print("Hello World!")

Can we prove in 60 minutes that DevOps is an accelerator, not a roadblock? Let’s find out!

Databricks Platform Engineering - The No Man's Land

"No Man's Land" typically refers to areas that are not controlled by either side in a conflict. In the realm of data platforms, this can signify the unclear boundaries and responsibilities between DevOps and Data Engineering teams. There may be confusion about roles, leading to a lack of collaboration or misalignment in goals.

During this session, I will attempt to debunk the myth that data platforms must be built in ways that diverge from the standards widely accepted by DevOps and Cloud specialists. We will explore the reasons behind the reluctance of DevOps Engineers to engage in Data projects and the common disregard among Data Engineers for best practices from the Cloud Engineering and DevOps domains.

I will present an example of a scalable data platform architecture based on Azure Databricks, focusing on automation and scalability. Key topics will include Networking, Security, Cost Management, and Access Management, often referencing the Cloud Adoption Framework and its Cloud Scale Analytics component. We will cover the core components of an Azure Databricks solution, dividing them into central (Account, Unity Catalog) and local (Databricks Workspace) elements. Our approach will adhere to the "Everything as Code" philosophy, starting with Infrastructure as Code (IaC) tools like Terraform and Bicep, and extending to Databricks Asset Bundles wrapped in mature CI/CD processes.

We will also discuss the skills that a Cloud/DevOps Engineer should possess, beyond the usual standards, to successfully implement a project for such a platform in accordance with these principles.

In the practical part of the session, I will share lessons learned from the past few years of working on the implementation and optimization of such platforms. I will discuss mistakes made at various stages of building and deploying platforms, as well as best practices and solutions that, developed over time, have enabled us to deploy and standardize projects faster while continuously improving their quality.

Azure Data Platform as Code

The aim of this session is to demonstrate how an enterprise-ready Azure Data Platform can be set up from scratch in days instead of months. I will present the most important lessons I've learned over the last year while working on such an automation framework. I'll discuss failures, dead ends, the conclusions we drew from them, and the approach we ultimately developed and successfully implemented.

During this one-hour session, I'll address, among other topics:
- Landing Zones, Cloud Adoption & Cloud Scale Analytics Frameworks - why should the 'Data people' also understand this stuff?
- Automation from day one & Everything as Code.
- Networking, Security, Monitoring.
- Why a bunch of accelerators work better than an out-of-the-box solution.
- Project timeline, proper analysis and collaboration with the client - the keys to success.

A quick journey through optimization techniques. Told differently than usual.

In this session, we will walk through the main optimization techniques, starting with classic B-Tree indexes for relational databases, moving on to Z-Order and Liquid Clustering for Lakehouses, and ending with the V-Order mechanism recently introduced by Microsoft.

We will delve into the mathematical foundations behind these mechanisms to fill in some gaps and mention concepts that are often overlooked when presenting these techniques. But don't be scared: we'll introduce this theoretical knowledge in a very accessible way!

We'll cover sorting, partitioning, the origin of the Z-Order curve, and many others. We'll also break down the Parquet file into its components to fully understand how different pushdown mechanisms work.

We will talk about optimization techniques that you already know, but we'll do it in a way different than usual ;).

dataMinds Connect 2025 (upcoming)
October 2025, Mechelen, Belgium

DATA:Scotland 2025 (upcoming)
September 2025, Glasgow, United Kingdom

Data Saturday Oslo 2025 (upcoming)
August 2025, Oslo, Norway

SQLBits 2025 - General Sessions (upcoming)
June 2025, London, United Kingdom

Data Platform Next Step (upcoming)
June 2025, Billund, Denmark

Data Point Prague 2025
May 2025, Prague, Czechia

SQLDay 2025
May 2025, Wrocław, Poland

Global Azure Torino 2025
May 2025, Turin, Italy

AzureDay Poland 2025
March 2025, Warsaw, Poland

Data Platform Next Step 2024
June 2024, Copenhagen, Denmark

SQLDay 2024
May 2024, Wrocław, Poland

Global Azure Torino 2024
April 2024, Turin, Italy

Data Saturday Oslo 2023
September 2023, Oslo, Norway
