Thibauld Croonenborghs
Data Architect at AE
Brugge, Belgium
Thibauld Croonenborghs (29/06/1989) started his studies at the Belgian Royal Military Academy and worked as an army officer for several years while simultaneously studying computer science.
He started as a software engineer at TomTom working on map data, and has since transitioned into a data engineer and Python/Azure enthusiast.
Boost your Python testing with testcontainers
Writing tests that mimic production environments can be challenging, especially when dealing with dependencies like databases, message brokers, and external services.
We will go through the following topics:
- What are testcontainers?
- Test environment setup
- Writing some test scenarios
- Integration with pytest and unittest
- Demo + some practical examples
From developer to AI-orchestrator: Building a data platform with AI-assisted engineering
AI-assisted coding is rapidly changing how software is written, but what does that mean for data engineers building modern data platforms?
In this session, we explore how AI agents can accelerate development while maintaining architectural control and code quality. Using a minimal data platform setup as a foundation, we demonstrate how AI can move beyond simple autocomplete and become a structured development assistant.
In this session, you will learn:
• What AI-assisted coding really means in a data engineering context
• The tools and setup required to get started
• How to improve AI agents using structured prompts, skills, and context engineering
• The role of MCP and plugins in enabling tool-aware AI workflows
• Common pitfalls, limitations, and how to avoid over-reliance on AI
The session concludes with a live demo where we enhance a minimal dbt project using an AI agent to add tests, refactor models, and extend functionality in real time.
Attendees will gain practical insight into how AI can increase productivity in data platform development — while understanding where human expertise remains essential.
From chaos to clarity: Understanding and using Lakehouses
The emergence of lakehouses marks a significant evolution in data architecture, offering a unified platform that brings together the scalability and flexibility of data lakes with the performance and management features of data warehouses. But what exactly are lakehouses, and why should you care?
In this talk, we'll demystify the lakehouse concept: what it is, how it works, and where it fits in your data stack. We'll explore the benefits of adopting a lakehouse architecture, common use cases, and how it enables more efficient data workflows. You'll also see how to interact with lakehouses using popular open-source technologies, with a demo to illustrate practical usage and integration.
From Apache Spark to Delta Lake: A practical introduction to scalable data engineering
This session will introduce you to the fundamentals of Apache Spark, the powerful distributed computing engine for big data processing. We’ll cover how it works, where it's commonly used, and when it’s the right choice for your data challenges. From there, we'll explore the lakehouse paradigm using Delta Lake and how it seamlessly integrates with Spark to enhance reliability and performance.
Through real-world examples and live demos, you'll see how Spark and Delta Lake work together to support both streaming and batch workloads in a unified architecture. We'll also dive deeper into the lakehouse approach—a modern architecture that combines the openness and flexibility of data lakes with the reliability and performance of data warehouses.
Demystifying Spark Profile Optimizations in Microsoft Fabric
Optimizing Spark workloads in Microsoft Fabric can be a complex endeavor, given the array of available configurations and techniques. Terms like V-Order, Z-Order, and considerations for read-heavy versus write-heavy profiles often add to the confusion. In this session, we'll bring clarity to these concepts, offering a structured overview of each optimization strategy and how it impacts performance at the level of an individual Delta Lake table.
Through practical examples and performance considerations, attendees will gain insights into how to fit these optimizations into their data architecture, whether dealing with read-intensive analytics or write-heavy data ingestion processes. By the end of this talk, you'll be equipped with the knowledge to make informed decisions, turning the complexity of Spark optimizations in Microsoft Fabric into a strategic advantage.
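As a rough illustration of how these knobs surface in practice, the fragment below shows where the two optimizations are applied in a Fabric Spark session. It is a non-runnable configuration sketch, not an endorsement of either setting: it assumes an active Spark session named `spark`, a Delta table named `my_table`, and a column `customer_id`; the V-Order property name matches current Fabric documentation but has varied across runtime versions.

```python
# Config sketch only; assumes a Fabric Spark session `spark` and a Delta table `my_table`.
# Write-heavy profile: skip V-Order's write-time sorting/compression to speed up ingestion.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")

# Read-heavy profile: co-locate related rows on disk to prune files at query time.
spark.sql("OPTIMIZE my_table ZORDER BY (customer_id)")
```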
Single-node technologies vs. Spark in Microsoft Fabric: Choosing the right tool for the job
In the early days of big data, distributed computing frameworks like Apache Spark and Hadoop became the de facto standards for processing massive datasets. Their ability to handle distributed computing made them essential for tackling large-scale data challenges. However, in today’s diverse data landscape, not all datasets qualify as "big data." For many use cases, single-node processing tools like Polars and DuckDB are proving to be compelling alternatives, offering exceptional performance, simplicity, and lower overhead compared to distributed frameworks.
Microsoft Fabric introduces a unique opportunity to leverage both worlds if necessary. By enabling Python notebooks within its ecosystem, Fabric allows you to build and execute pipelines using these modern single-node technologies. This flexibility ensures you can choose the most efficient tool for your specific workloads.
In this session, we will:
- Examine the evolution of data processing, contrasting distributed frameworks like Spark with single-node solutions.
- Explore how technologies like Polars and DuckDB operate, their strengths, and how they compare to Spark in performance and scalability.
- Evaluate use cases to determine which approach (distributed or single-node) fits best.