Speaker

Atte Sukari

Senior Data Engineer at Norrin

Helsinki, Finland

I've gained valuable experience as a data engineer with a business-oriented mindset. My expertise includes Spark, Delta Lake, and Terraform, with a focus on Microsoft Azure. While these are some areas I specialize in, I have broader knowledge beyond them. I'm passionate about empowering businesses with data-driven insights and creating user-centric solutions. Additionally, I enjoy tackling architectural challenges to ensure scalable and efficient data solutions.

Area of Expertise

  • Information & Communications Technology

Topics

  • Azure
  • Spark
  • Data Platform
  • Data Engineering
  • Cloud Computing on the Azure Platform

Maximize Efficiency in Microsoft Fabric: When to Choose Python Notebooks Over Spark

In this presentation, we will explore how Python notebooks within Microsoft Fabric can offer a more efficient and flexible alternative to Spark notebooks, especially for data analysis tasks. We will begin with an overview of Spark, covering its core architecture, distributed computing model, and use cases where it excels in handling large-scale data processing. However, not every task justifies Spark's resource overhead and complexity. We will then discuss the challenges of using Spark, including its computational overhead and setup complexities.

Next, we will introduce Python notebooks in Microsoft Fabric, focusing on their role in streamlining data analysis workflows. By comparing Python notebooks to Spark, we will highlight when Python is a more lightweight and efficient solution, particularly for smaller datasets or tasks that don't require the full power of Spark's distributed architecture. We will walk through practical examples of Python notebooks used for data exploration and analysis, emphasizing their simplicity, lower resource requirements, and performance advantages in Microsoft Fabric.

By the end of this session, attendees will understand when to leverage Python notebooks over Spark, empowering them to optimize their data analysis workflows and choose the right tool for the right task.

Scaling Pete's Plumbing Data Pipelines: From Efficient to Excessive

Join us as we follow Pete’s Plumbing, an imaginary company, from humble beginnings to a data-driven powerhouse (with a few bumps along the way). What starts as a simple system for tracking work hours soon grows into a full-blown data transformation adventure. As Pete’s team expands, so do the challenges of managing data—manual mistakes, ERP struggles, and scaling bottlenecks that become harder to ignore.

As the business grows, the pressure to make data-driven decisions increases, and Pete realizes that more data means more complexity. But where’s the sweet spot? Come and join us as Pete’s consultants, exploring the tricky balance between efficiency and overengineering. How should Pete proceed? Let’s dive in and find out!

Live Data, Better Decisions: Unlock Real-Time Intelligence with Microsoft Fabric

In this session, we’ll explore how Real-Time Intelligence works, focusing on its ability to process and analyze data as it arrives for immediate insights. We’ll discuss why and when to use Real-Time Intelligence, highlighting scenarios where timely data analysis is crucial, such as fraud detection in finance, where transaction patterns are analyzed in real time. Unlike batch processing, which delays insights, streaming allows data to be ingested, processed, and analyzed as it arrives, enabling swift business reactions.

We’ll also delve into the Real-Time Hub within Microsoft Fabric, which simplifies real-time data processing with no-code connectors, real-time dashboards, geospatial analysis, and automated trigger-based reactions. Built on OneLake, it centralizes data storage, supporting both streaming and batch data for comprehensive insights. Through demos, we’ll illustrate these concepts in action, showing how real-time data processing can enhance decision-making and operational efficiency, and how AI and machine learning advancements in Microsoft Fabric enable users to create sophisticated dashboards and analytics without needing deep technical expertise in KQL or SQL.

Serving Delta Tables via API

This session explores the rising popularity of Delta Lake and various methods of serving Delta data through APIs.

It begins by dissecting the structure and concept of Delta Lake, highlighting its growing adoption and versatility in data processing. We then examine Python libraries such as Polars and pandas for reading Delta data, and why Spark may not be the right choice for serving it. We also discuss using databases such as DuckDB as query engines.

Furthermore, the session discusses strategies for possible replication of the data or utilizing intermediary databases like Redis when low latency is essential. We consider factors such as business and application requirements and the challenges of possible data replication.
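As a rough illustration of the intermediary-store idea, the sketch below shows a read-through cache placed in front of a slow Delta read. It is a minimal sketch under stated assumptions, not the approach presented in the session: `TTLCache` is a hypothetical in-memory stand-in for a store such as Redis, and `load_from_delta` is a hypothetical callable that would wrap whichever Delta reader is actually used.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


class TTLCache:
    """Hypothetical in-memory stand-in for a low-latency store such as Redis."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached rows)
        self._items: Dict[str, Tuple[float, List[Row]]] = {}

    def get(self, key: str) -> Optional[List[Row]]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, rows = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller reloads from the source.
            del self._items[key]
            return None
        return rows

    def set(self, key: str, rows: List[Row]) -> None:
        self._items[key] = (self._clock() + self._ttl, rows)


def read_through(cache: TTLCache, key: str,
                 load_from_delta: Callable[[str], List[Row]]) -> List[Row]:
    """Return cached rows if fresh, otherwise load them and cache the result.

    `load_from_delta` is an assumed placeholder for the slow read path.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached
    rows = load_from_delta(key)
    cache.set(key, rows)
    return rows
```

The time-to-live is where the replication trade-off shows up: a longer value lowers latency and load on the Delta read path, but lets API responses lag further behind the table.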

When it comes to serving Delta data through APIs, the discussion delves into frameworks like FastAPI and explores architectural choices such as serverless functions or containerization. It emphasizes the importance of simplicity and robustness in deployment. Examples like the Databricks SQL API demonstrate that an option that is not the fastest can still be suitable for specific use cases where low latency isn't paramount.

Looking towards the future, the session reflects on the ongoing trend of Delta Lake adoption and emerging projects built on this technology, while also pondering the accessibility of Delta Lake to all organizations. It raises concerns about the potential costs associated with cloud platforms and the learning curve of adopting new technologies like Delta Lake compared to more established solutions like MSSQL in the cloud.

In conclusion, my goal with this session is to offer insights into the considerations and challenges involved in leveraging Delta Lake for serving data through APIs, encouraging thoughtful evaluation based on specific business needs, technical requirements, and cost considerations.

SQL Konferenz 2025 Sessionize Event

February 2025 Hanau am Main, Germany

New Stars of Data #8 Sessionize Event

October 2024
