Speaker

Christian Henrik Reich

Cloud data architect

Copenhagen, Denmark

Currently works at twoday's Data & AI DK department for Technologies and Architecture, and is part of Mugato as a senior developer and AI developer. Started programming as a kid and still does. Has built everything from embedded software to data warehouses. Over the last decade, the focus has mainly been on data: from optimization and infrastructure to designing and building data solutions in the cloud and on-premises.

Area of Expertise

  • Information & Communications Technology

Topics

  • Database
  • Azure Data Platform
  • Data Engineering
  • Azure Data & AI
  • Microsoft Data Platform
  • Data Warehousing
  • All things data
  • Microsoft Fabric
  • Azure Machine Learning
  • Apache Spark
  • Delta Lake
  • Databricks
  • SQL Server
  • Azure OpenAI
  • Azure AI Foundry
  • Microsoft (Azure) AI + Machine Learning

Transforming Data into Gold: A Live Demonstration of Microsoft Fabric Lakehouse

Dive into the dynamic world of data management with this talk, where we'll unveil a live, end-to-end demonstration of a Metadata-Driven Microsoft Fabric Lakehouse. This session is designed for data professionals eager to explore innovative solutions in data architecture.

Experience first-hand how to transform raw data into valuable insights within a span of 60 minutes, utilizing a robust Lakehouse architecture on Microsoft Fabric. Our open-source framework is tailored to boost productivity and elevate development quality in crafting and managing Data Lakehouse solutions.

Key Highlights:
- Discover how to automate data cleaning, validation, and deduplication using PySpark's robust capabilities.
- Learn to articulate business logic for facts and dimensions effectively through SparkSQL.
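To make the deduplication pattern above concrete, here is a minimal plain-Python sketch of the "keep the newest record per business key" rule. In PySpark this maps to a window over the key ordered by a timestamp, or simply `dropDuplicates()` when any copy will do; the column names (`customer_id`, `updated_at`, `city`) are hypothetical.

```python
# Illustrative sketch of keep-latest-per-key deduplication (plain Python).
# In a real Fabric Lakehouse this logic would run in PySpark at scale.

def deduplicate(rows, key="customer_id", order_by="updated_at"):
    """Keep only the most recent row for each business key."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())

rows = [
    {"customer_id": 1, "updated_at": "2024-01-01", "city": "Aarhus"},
    {"customer_id": 1, "updated_at": "2024-03-01", "city": "Copenhagen"},
    {"customer_id": 2, "updated_at": "2024-02-15", "city": "Odense"},
]
clean = deduplicate(rows)
```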

Attendees will get the code and leave with practical knowledge and insights into implementing a Lakehouse, making data operations more efficient, and harnessing the full potential of Microsoft Fabric.

ML and AI Capabilities in Microsoft Fabric

Description

Microsoft Fabric is becoming the one-stop shop for data in Azure, including machine learning and AI. Fabric's maturity is starting to enable real projects with its machine learning and AI capabilities. As with many other aspects of Fabric, there are also new libraries and tools for machine learning and AI. These might be different, especially for those coming from Azure ML.

The session will cover:

* Basic and AutoML machine learning
* Hyperparameter tuning
* OpenAI/GenAI with Azure AI Foundry
* MLOps, including model tracking, model repository, and model serving
* Comparing Azure ML, Azure OpenAI Studio, and Azure AI Foundry to Fabric's AI/ML capabilities
* Fabric Workspace layout, capacities, and costs

Attendees will leave with practical insights into using AutoML, hyperparameter tuning, and MLOps within Fabric, along with how OpenAI and Azure AI Foundry fit into the ecosystem. We’ll also discuss how Azure ML and Azure OpenAI Studio remain relevant and how to navigate Fabric’s workspace, capacities, and associated costs to maximize your project's efficiency and potential.

Extreme data processing and HPC with Azure Batch

While tools like Apache Spark, RDBMS, Pandas, Polars, and DuckDB handle most data processing needs, some workloads simply don’t fit these technologies. Unstructured data, such as voice files needing transcription or IoT images requiring analysis, often falls outside their scope. Likewise, semi-structured and structured data can become cumbersome when intensive ML or AI model inference is required, or when dealing with countless small files.

In this session, we’ll explore just how simple an end-to-end solution can be using Azure Batch. You’ll learn:

* How to provision and update Azure Batch services with Infrastructure as Code (IaC).
* How to define Jobs and Tasks in any code editor (e.g., VS Code).
* How to integrate continuous integration and delivery (CI/CD).
* How to off-load results into Microsoft Fabric and other services.
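As a sketch of how the Jobs-and-Tasks model fans out, each input file typically becomes one task with its own command line, which Batch then schedules across the pool's nodes. The dict shape below is illustrative; with the azure-batch SDK these would become `TaskAddParameter` objects submitted to the service. The file names and the `transcribe.py` tool are hypothetical.

```python
# Sketch of the fan-out step when defining an Azure Batch job:
# one task per input file, each with its own command line.

def build_tasks(job_id, input_files):
    """One task per input file; Batch schedules them across the pool's nodes."""
    return [
        {
            "id": f"{job_id}-task-{i:04d}",
            "command_line": f"python transcribe.py --input {name}",
        }
        for i, name in enumerate(input_files)
    ]

tasks = build_tasks("voice-batch", ["call_001.wav", "call_002.wav", "call_003.wav"])
```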

By the end of this session, you’ll see that when traditional technologies like Spark reach their limits, Azure Batch offers a flexible alternative: one that is easy to implement, language-agnostic, and fully compatible with Git-based workflows.

Empowering Lakehouse Solutions with Apache Arrow and Python Notebooks in Microsoft Fabric

When Microsoft Fabric was released, it introduced Apache Spark as its default engine for general data processing. While Spark is powerful for big data scenarios, it’s not always the best solution for many small and medium-sized workloads, due to both costs and performance loss from administrative overhead.

Since then, Microsoft has introduced a non-Spark compute option: Python Notebooks. This makes it easier than ever to build solutions using Python alongside technologies like Apache Arrow, DuckDB, and Polars.

In this session, we’ll dive into how Python workloads leverage Apache Arrow under the hood in Microsoft Fabric to handle data transformations and analytics.

We’ll explore practical examples where Python with Apache Arrow outperforms Spark, and demonstrate how Apache Arrow bridges the gap between high speed and flexible development. Finally, we’ll examine how Apache Arrow fits into a metadata-driven Lakehouse architecture in Microsoft Fabric.
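To give a feel for why Arrow's layout matters, here is a minimal plain-Python illustration of the row-oriented versus columnar representations. Apache Arrow stores each column contiguously in memory, which is what lets engines like DuckDB, Polars, and pandas exchange data with near-zero copying; this sketch only mimics that layout with lists.

```python
# Minimal illustration of row-oriented vs. columnar (Arrow-style) layouts.

rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 20.5},
    {"id": 3, "amount": 30.0},
]

# Columnar pivot: one contiguous sequence per column, as Arrow does natively.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate now scans a single column instead of touching every row object.
total = sum(columns["amount"])
```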

We will cover:

* The difference between Python Notebooks and a single-node Spark cluster.
* When to use Python Notebooks and when to use Spark Notebooks.
* Where to use Python Notebooks in a metadata-driven Lakehouse.
* A brief introduction to tooling and moving workloads between Python Notebooks and Spark Notebooks.
* How to avoid overloading the Lakehouse tech stack with Python technologies, with an introduction to Apache Arrow.
* Costs

After this session, attendees will have an understanding of how to apply Python Notebooks, as
well as Spark Notebooks, to get the most out of Fabric for data processing.

An Apache Spark query's journey through the layers of Databricks

A deep-dive session about Spark internals, where we explore how queries are executed in Apache Spark and within the layers of Databricks.

We will cover:

* Spark SQL and Catalyst
* A note on Tungsten
* Delta Lake
* Parquet files

These insights will be supported by glimpses into the official Apache Spark source code on GitHub.

The takeaway should be a better understanding of how queries are executed and some tools for problem-solving and optimizing for speed or cost.
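To make the optimizer idea tangible, here is a toy version of a Catalyst-style rule: constant folding on a tiny expression tree. This is illustrative only, not Spark's actual code; Catalyst applies rules like this to logical plans repeatedly until a fixed point is reached.

```python
# Toy Catalyst-style optimizer rule: constant folding on an expression tree.

from dataclasses import dataclass

@dataclass
class Lit:
    """A literal value in the expression tree."""
    value: int

@dataclass
class Add:
    """An addition node with two child expressions."""
    left: object
    right: object

def fold_constants(expr):
    """Recursively replace Add(Lit, Lit) with a single Lit."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return expr

# (1 + 2) + 3 collapses to a single literal before execution.
optimized = fold_constants(Add(Add(Lit(1), Lit(2)), Lit(3)))
```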

Beyond Chatbots: Leveraging AI for Unstructured Data Processing

Much attention has been drawn to the rise of Generative AI (GenAI) and Large Language Models (LLMs) in general. In most cases, we are presented with yet another company chatbot as the way to utilize these technologies.

Less attention has been given to the fact that data has also moved into another era. Data is becoming less logically tangible, no longer just stored in tables, numbers, and words, but increasingly represented through interpretations of images, sounds, and texts.

As humans, we have the ability to form analyses by combining what we see, hear, and read. We are able to take such analyses and apply them to other types of data as well.

In this session, we will explore how to transfer this ability to computers. We'll discuss the necessary architectures and AI services to process unstructured data such as images, sounds, and texts, allowing us to integrate it with our tables from more relational sources. The session will provide attendees with actionable takeaways to serve as a starting point or inspiration for their next steps.

An Apache Spark query's journey through the layers of Microsoft Fabric

Join us for an exciting deep dive into the heart of Apache Spark! We'll take you on a journey to see exactly how your Spark queries get executed, both within Apache Spark itself and through the different layers of Microsoft Fabric. Here's what we'll explore together:

* Spark SQL and Catalyst: A breakdown of how Spark SQL works hand-in-hand with the Catalyst optimizer to make your queries smarter and faster.

* A Note on Tungsten: Discover how Tungsten boosts Spark’s performance with better memory management and lightning-fast execution.

* A Note on Fabric's Native Execution Engine: Bringing the power of C++ for even faster query execution.

* Delta Lake: See how Delta Lake makes your data lakes more reliable and scalable, ensuring your data is always in top shape.

* Parquet Files: Learn why Parquet’s columnar storage is a game-changer for efficient data storage and quick retrieval.

We'll look into the official Apache Spark source code on GitHub, giving you a real, hands-on look at what's happening under the hood.

By the end of this session, you'll have a clearer understanding of how your queries run and some tools and tips to help you solve problems and optimize your Spark jobs for both speed and cost.

Outperform Spark with Python Notebooks in Fabric

When Microsoft Fabric was released, it came with Apache Spark out of the box. Spark's support for multiple programming languages opened up possibilities for creating data-driven and automated lakehouses. On the other hand, Spark's primary strength, scaling out to handle large amounts of data, will in many cases be over-dimensioned, less performant, and more costly for trivial workloads.

With Python Notebooks, we have a better tool for handling metadata, automation, and processing of more trivial workloads, while still having the option to use Spark Notebooks for handling more demanding processing.

We will cover:

* The difference between Python Notebooks and a single-node Spark cluster, and why Spark Notebooks are more costly and less performant for certain types of workloads.
* When to use Python Notebooks and when to use Spark Notebooks.
* Where to use Python Notebooks in a metadata-driven Lakehouse.
* A brief introduction to tooling and moving workloads between Python Notebooks and Spark Notebooks.
* How to avoid overloading the Lakehouse tech stack with Python technologies.
* Costs

After this session, attendees will have an understanding of how to apply Python Notebooks, as well as Spark Notebooks, to get the most out of a Fabric Capacity for data processing.

ML and AI Capabilities in Microsoft Fabric

Microsoft Fabric is becoming the one-stop shop for data in Azure, including machine learning and AI. Fabric's maturity is starting to enable real projects with its machine learning and AI capabilities. As with many other aspects of Fabric, there are also new libraries and tools for machine learning and AI. These might be different, especially for those coming from Azure ML.

The session will cover:

* Basic and AutoML machine learning
* Hyperparameter tuning
* OpenAI/GenAI
* MLOps, including model tracking, model repository, and model serving
* How is AzureML still relevant?
* Fabric Workspace layout, capacities, and costs

Attendees will leave with practical insights into using AutoML, hyperparameter tuning, and MLOps within Fabric, along with how OpenAI and GenAI fit into the ecosystem. We’ll also discuss how Azure ML remains relevant and how to navigate Fabric’s workspace, capacities, and associated costs to maximize your project's efficiency and potential.

Introduction to Vibe Coding and MCP for Building a Dataplatform in Microsoft Fabric

Vibe Coding (tell your computer what to code) and MCP servers have been growing rapidly over the last year. Terms like “Talk to your data” are appearing more and more. While it sounds ideal to simply tell a computer, using speech or text, how to build a data platform, there are important considerations to keep in mind to avoid pitfalls.

This session is an enthusiastic yet critical introduction to building data platforms with AI in Microsoft Fabric.

We will cover:
* What are we trying to achieve with AI? Can it close competence gaps or even replace developers?
* Introduction to MCP (Model Context Protocol)
* What Fabric MCP Servers are available
* Data modelling
* Testing and QA
* Security considerations

After this session, attendees should have a clear idea of how they can build a data platform by chatting, an understanding of common pitfalls, and inspiration to get started with their own MCP servers.

Data Platform Next Step 2023 Sessionize Event

June 2023 Billund, Denmark

Data Saturday Denmark - 2023 Sessionize Event

March 2023 Kongens Lyngby, Denmark
