Gonzalo Ortiz Jaureguizar

Performance engineer at StarTree

Madrid, Spain

I am a software engineer specialized in developing databases in Java. I love understanding how libraries and frameworks work under the hood, and designing and implementing high-performance systems. I have worked on prototypes such as ToroDB, and at Devo, the first Spanish database unicorn. Since 2022 I have been working at StarTree as an Apache Pinot contributor.

Area of Expertise

  • Information & Communications Technology
  • Physical & Life Sciences

Topics

  • Java
  • SQL
  • Apache Pinot
  • Databases
  • Distributed Systems

Manage memory in the JVM as if it were C

As in many other languages, heap memory in Java is managed: the program explicitly allocates memory but never states when to free it, delegating that job to the garbage collector.

This way of dealing with memory has several advantages, but it also has drawbacks that become more problematic when the program has to handle large amounts of data or process it very quickly. For example, systems like Apache Kafka and databases like Apache Pinot largely avoid managed memory and instead manually allocate and free memory in what is known as off-heap memory.

This is a technical and practical talk about how to use this memory in the JVM, when it is worth using, and how it affects our code and deployments, with examples from real applications and libraries.
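A minimal sketch of off-heap storage using only the standard library, assuming plain `java.nio` direct buffers are enough to illustrate the idea. The class `OffHeapLongs` is hypothetical and is not code from Kafka or Pinot: values are written at byte offsets computed by hand, much as with a C array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: a fixed-size array of longs stored outside the Java heap.
final class OffHeapLongs {
    private final ByteBuffer buffer;
    private final int capacity;

    OffHeapLongs(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves native memory: the long values live off-heap,
        // so the garbage collector never copies or scans them.
        this.buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES)
                .order(ByteOrder.nativeOrder());
    }

    void set(int index, long value) {
        // Absolute put: we compute the byte offset ourselves, as we would in C.
        buffer.putLong(index * Long.BYTES, value);
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }

    int capacity() {
        return capacity;
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        OffHeapLongs values = new OffHeapLongs(1_000);
        for (int i = 0; i < values.capacity(); i++) {
            values.set(i, i * 2L);
        }
        long sum = 0;
        for (int i = 0; i < values.capacity(); i++) {
            sum += values.get(i);
        }
        System.out.println(sum); // prints 999000
    }
}
```

A direct buffer's native memory is released only after the `ByteBuffer` object becomes unreachable and is collected, so deallocation is still tied to the GC, and the total is capped by the JVM flag `-XX:MaxDirectMemorySize`. For deterministic, C-style freeing, the Foreign Function & Memory API (`java.lang.foreign`, final since Java 22) provides confined and shared `Arena`s whose `close()` frees every segment they allocated.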

Deep Dive into Apache Pinot’s Multi-Stage Query Engine: Architecture and Performance

Apache Pinot is an open-source, real-time distributed database for low-latency, high-throughput queries on large-scale datasets. As industries like e-commerce and IoT demand real-time analytics, Pinot's original single-stage query engine (SSQE) struggled with complex queries involving joins and window functions.

The Multi-Stage Query Engine (MSQE) addresses these challenges by supporting advanced relational operators, such as joins and window functions, and executing complex queries across multiple stages. In this session, we’ll delve into MSQE’s key innovations, including its integration with Apache Calcite for smart query planning and gRPC for efficient inter-server communication. We’ll also explore strategies for data shuffling, thread management, and query execution statistics that allow it to scale to large deployments.

Join us to discover how MSQE extends Pinot’s capabilities, overcomes the limitations of the original engine, and turns Pinot into a robust solution for modern data analysis.
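Data shuffling between stages is easier to picture with a small example. The following is a generic sketch of hash partitioning, a common way distributed engines redistribute rows before a join so that matching keys meet on the same worker. It is an illustration under that assumption, not MSQE's actual exchange code (which, per the abstract, communicates between servers over gRPC), and `HashShuffle`, `Row` and `partitionFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Generic sketch of a hash-partitioned shuffle between query stages (not Pinot's code).
final class HashShuffle {

    // Hypothetical row type: a join key plus a payload.
    static final class Row {
        final String key;
        final long value;

        Row(String key, long value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    // Picks the destination worker from the key's hash; floorMod keeps the
    // result non-negative even when hashCode() is negative.
    static int partitionFor(String key, int numWorkers) {
        return Math.floorMod(key.hashCode(), numWorkers);
    }

    // Routes every row to exactly one worker, so all rows sharing a key
    // (from either side of a join) end up on the same worker.
    static List<List<Row>> shuffle(List<Row> rows, int numWorkers) {
        List<List<Row>> partitions = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Row row : rows) {
            partitions.get(partitionFor(row.key, numWorkers)).add(row);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("madrid", 1), new Row("paris", 2),
                new Row("madrid", 3), new Row("rome", 4));
        List<List<Row>> partitions = shuffle(rows, 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("worker " + i + ": " + partitions.get(i));
        }
    }
}
```

Hashing only balances work when keys are spread evenly: a very frequent key (skew) still lands on a single worker, which is one reason shuffle strategy matters at scale.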

Why Your SQL is Slow: Unmasking Hidden Performance Traps in Apache Pinot

As developers, we've all been there: you write a SQL query that looks perfectly clean and correct, but it runs unexpectedly slow. You check the execution plan and find the database has chosen a brute-force, inefficient path. This is a problem in single-node databases like Postgres, but it is even worse in distributed ones.

Apache Pinot is a distributed OLAP database that can run complex SQL expressions (including window functions and joins) in milliseconds across billions of rows. However, its optimizer applies complex relational algebra transformations while remaining blind to business invariants known only to humans (and sometimes not even to them).

This means that a seemingly harmless function call, an innocent OFFSET or LIMIT, the subtle properties of NULL handling, or a slight change in a JOIN condition can silently prevent these transformations, forcing the engine to abandon powerful optimizations. To make matters worse, ORMs like Hibernate are notorious for generating queries that fall into these exact traps.
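As one concrete, engine-independent instance of the NULL trap, the sketch below models SQL three-valued logic in Java (with `null` standing for UNKNOWN). It shows why `x NOT IN (subquery)` cannot be blindly rewritten into an anti-join (`NOT EXISTS`) when the subquery column may contain NULL: the two forms return different rows. The class and method names are hypothetical, and the example illustrates standard SQL semantics rather than Pinot's specific rewrite rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Models SQL three-valued logic with Boolean: TRUE, FALSE, or null for UNKNOWN.
final class NotInVsAntiJoin {

    // SQL "x = y": UNKNOWN when either side is NULL.
    static Boolean sqlEquals(Integer x, Integer y) {
        if (x == null || y == null) {
            return null;
        }
        return x.equals(y);
    }

    // SQL "x NOT IN (v1, v2, ...)", i.e. NOT (x = v1 OR x = v2 OR ...).
    static Boolean sqlNotIn(Integer x, List<Integer> values) {
        boolean sawUnknown = false;
        for (Integer v : values) {
            Boolean eq = sqlEquals(x, v);
            if (eq == null) {
                sawUnknown = true;
            } else if (eq) {
                return Boolean.FALSE; // one match makes NOT IN false
            }
        }
        return sawUnknown ? null : Boolean.TRUE;
    }

    // WHERE x NOT IN (...): a row survives only if the predicate is TRUE.
    static List<Integer> whereNotIn(List<Integer> rows, List<Integer> values) {
        List<Integer> result = new ArrayList<>();
        for (Integer r : rows) {
            if (Boolean.TRUE.equals(sqlNotIn(r, values))) {
                result.add(r);
            }
        }
        return result;
    }

    // Anti-join (NOT EXISTS ... WHERE v = x): a row survives unless some v = x is TRUE.
    static List<Integer> antiJoin(List<Integer> rows, List<Integer> values) {
        List<Integer> result = new ArrayList<>();
        for (Integer r : rows) {
            boolean matched = false;
            for (Integer v : values) {
                if (Boolean.TRUE.equals(sqlEquals(r, v))) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3);
        List<Integer> values = Arrays.asList(2, null); // the subquery returned a NULL
        System.out.println(whereNotIn(rows, values)); // prints []
        System.out.println(antiJoin(rows, values));   // prints [1, 3]
    }
}
```

When both the outer column and the subquery column are known to be non-null, the two forms agree and an optimizer is free to pick the cheaper anti-join; that "never null" fact is exactly the kind of invariant a human may know but the engine cannot assume.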

While this talk uses Apache Pinot for its examples, the concepts are universal. Whether you're using ClickHouse, traditional single-node SQL databases, or even NoSQL databases like MongoDB, the principles for writing optimizer-friendly queries remain the same. Using concrete, real-world examples, I will demonstrate how seemingly equivalent SQL queries can produce wildly different execution plans. You'll leave this talk armed with the knowledge to spot these "optimization killers" and craft queries that work with the database, not against it, ensuring your analytics are always blazingly fast.

This talk tackles a universal developer pain point: writing a "perfect" SQL query that runs inexplicably slowly. Using the Apache Pinot distributed database as a practical case study, I will demystify why this happens.

The session dives into how developers unknowingly sabotage their own queries through subtle NULL semantics, innocent-looking functions, and ORM-generated code that breaks the relational algebra rules core to any modern optimizer.

This is a highly practical, "code-and-consequences" session that will give attendees a new mental model for writing performant SQL that applies not just to Pinot, but to any database they use.

Real-Time Analytics Summit 2025 Sessionize Event

May 2025

DevBcn 2023 Sessionize Event

July 2023 L'Hospitalet de Llobregat, Spain
