Javier Ramirez

Developer —and Agent— Advocate at QuestDB. Fan of open source, developer communities, and data/ML. All around happy person. He/him

Madrid, Spain

As Head of Developer —and Agent— Relations at QuestDB, I help developers make the most of their (fast) data. These days that increasingly means AI agents: I make QuestDB easy for an agent to use through skills, MCP servers, machine-legible docs, standard interfaces like SQL, REST, and Parquet, and a cookbook of ready-made recipes, so agents produce correct, fast results with as few tokens as possible. I also make sure the core team behind QuestDB listens to absolutely every piece of feedback I get, and I facilitate collaboration in our open source repository.

I love data, big and small, and I enjoy processing it with streaming engines, ML models, SQL, NoSQL, graph, in-memory, and time-series databases. I prefer distributed, scalable, always-on systems.

Before working at QuestDB I spent over 20 years developing software professionally and sharing what I learnt with the community, including three years as a Developer Advocate at AWS and five years as a Google Developer Expert in Cloud. I've spoken at events in over 25 countries, mentored dozens of start-ups, taught for 6 years at different universities, and trained hundreds of professionals on cloud and data engineering.

Badges

  • Most Active Speaker 2023

Area of Expertise

  • Information & Communications Technology

Topics

  • Databases
  • Cloud databases
  • Time Series Data
  • questdb
  • AWS Databases
  • Google Cloud Platform
  • aws
  • Machine Learning and Artificial Intelligence
  • Streaming Data Analytics
  • Analytics and Big Data
  • Cloud analytics
  • Predictive Analytics
  • Developer Relations

AI Agents as Data Engineers: What Actually Happens When You Let Them Loose

AI coding agents are getting serious attention as productivity tools for data teams. But what happens when you give one a real data engineering task with a live audience watching?

We ran that experiment.

We tasked two leading AI agents, Claude Code and OpenAI Codex, with building a real-time market data pipeline from scratch: connect to a live crypto feed, ingest order book data into a time-series database, materialize aggregates at multiple intervals, and ship a working Grafana dashboard with OHLC, VWAP, and Bollinger Bands.
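
The dashboard metrics have standard definitions. VWAP for a bucket is Σ(price × amount) / Σ amount. Bollinger Bands are a moving average of closing prices plus and minus k standard deviations (20 periods and k = 2 are common defaults). The Python sketch below assumes each trade is a simple `(ts, price, amount)` record. It only illustrates what the agents were asked to compute. It is not the code they generated.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trade:
    ts: int          # epoch seconds (unit assumed for this sketch)
    price: float
    amount: float


def ohlc_vwap(trades):
    """Aggregate one time bucket of trades (assumed sorted by ts) into OHLC + VWAP."""
    prices = [t.price for t in trades]
    volume = sum(t.amount for t in trades)
    vwap = sum(t.price * t.amount for t in trades) / volume
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": volume,
        "vwap": vwap,
    }


def bollinger(closes, window=20, k=2.0):
    """Return (middle, upper, lower) for every full window of closing prices."""
    bands = []
    for i in range(window, len(closes) + 1):
        win = closes[i - window:i]
        mid = mean(win)
        sd = pstdev(win)  # population std dev; some charting tools use the sample std dev
        bands.append((mid, mid + k * sd, mid - k * sd))
    return bands
```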

The goal: zero to running pipeline in under two minutes.

Both agents could write the code. The failures came at the operational layer: managing background processes, sequencing setup steps, and knowing when a dependency was truly healthy before moving forward.

One agent repeatedly deployed an empty dashboard while ingestion was still starting up. Rephrasing the instructions never fixed it.

The real lesson had nothing to do with prompting.

By moving a data-presence check into the deploy script itself, the agent physically could not complete the task in the wrong order. Architecture enforced what instructions never could.
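
The ordering fix can be sketched as a deploy step that refuses to publish the dashboard until the table actually holds rows. In this Python sketch, `count_rows` and `push_dashboard` are hypothetical callables. One could, for example, run `SELECT count() FROM trades`, and the other could call the Grafana API. The talk does not specify how the real check was written.

```python
import time


class DataNotReady(RuntimeError):
    """Raised when the deploy step must not proceed."""


def wait_for_data(count_rows, min_rows=1, timeout_s=60.0, interval_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll count_rows() until it reports at least min_rows, or fail loudly."""
    deadline = clock() + timeout_s
    while True:
        try:
            rows = count_rows()
        except Exception:          # database not reachable yet: treat as "no data"
            rows = 0
        if rows >= min_rows:
            return rows
        if clock() >= deadline:
            raise DataNotReady(f"only {rows} rows after {timeout_s}s; refusing to deploy")
        sleep(interval_s)


def deploy_dashboard(count_rows, push_dashboard):
    """The data check is part of the deploy itself, so the order cannot be skipped."""
    rows = wait_for_data(count_rows)
    push_dashboard()
    return rows
```

The injected `clock` and `sleep` parameters exist only to make the gate testable without real waiting.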

This talk shares an honest, experience-based account of where current AI coding agents succeed and fail at data engineering tasks, along with a practical framework for designing pipelines that work reliably with agentic tooling: not in spite of its limitations, but by designing around them.

QuestDB: The building blocks of a fast open-source time-series database

Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.

It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed.

We will learn how it handles data ingestion and which SQL extensions it implements for working with time-series data efficiently.
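
A typical example of such an extension is QuestDB's `SAMPLE BY` clause (for instance `SELECT timestamp, avg(price) FROM trades SAMPLE BY 1m`), which groups rows into fixed time buckets. The Python sketch below shows only the result such a query computes, not how QuestDB executes it. It assumes microsecond timestamps and buckets aligned to multiples of the bucket width since the epoch, which is just one of several possible alignment choices.

```python
def sample_by(rows, bucket_us, agg):
    """Group (timestamp_us, value) rows into fixed-width time buckets and aggregate each."""
    buckets = {}                           # dicts keep insertion order (Python 3.7+)
    for ts, value in sorted(rows):
        start = ts - ts % bucket_us        # floor the timestamp to its bucket start
        buckets.setdefault(start, []).append(value)
    return [(start, agg(values)) for start, values in buckets.items()]


MINUTE_US = 60_000_000


def avg(values):
    return sum(values) / len(values)
```

For example, `sample_by([(0, 1.0), (30_000_000, 3.0), (61_000_000, 5.0)], MINUTE_US, avg)` yields one row per minute: `[(0, 2.0), (60000000, 5.0)]`.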

We will also review some of the changes we have made over the past two years to deal with late and out-of-order data, non-blocking writes, read replicas, and data deduplication.
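
Late data and deduplication interact. Rows that arrive out of order must be merged into their timestamp position, and with upsert-style deduplication, a newer row with the same (timestamp, key) replaces the older one. The Python sketch below models only these semantics on in-memory lists of hypothetical `(ts, key, value)` tuples. QuestDB performs this inside its storage engine in a different and far more efficient way.

```python
import heapq


def merge_out_of_order(committed, late, dedup=False):
    """Merge late rows into rows already sorted by timestamp.

    committed: list of (ts, key, value), sorted by ts.
    late:      list of (ts, key, value), in any order.
    dedup:     if True, a later row with the same (ts, key) replaces the earlier one.
    """
    late_sorted = sorted(late, key=lambda r: r[0])
    # heapq.merge is stable across inputs: on equal ts, committed rows come first.
    merged = list(heapq.merge(committed, late_sorted, key=lambda r: r[0]))
    if not dedup:
        return merged
    out, position = [], {}
    for row in merged:
        k = (row[0], row[1])
        if k in position:
            out[position[k]] = row         # upsert: newest version wins, slot kept
        else:
            position[k] = len(out)
            out.append(row)
    return out
```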

QuestDB: Ingesting several million rows per second on a single instance

When doing real-time analytics, you want your database not only to ingest data as quickly as possible but also to make it available for (ultrafast) querying as soon as possible.

In this session I will show you the technical decisions we made when building QuestDB, an open-source time-series database, and how we achieve over four million row writes per second with exactly-once semantics, without blocking or slowing down reads.
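
Exactly-once writes and non-blocking reads can be combined in several ways. The Python sketch below shows one generic approach, which is an assumption and not a description of QuestDB's mechanism. Batches carry an increasing transaction id, so a retried batch is applied only once. Readers see only the rows below a published commit watermark, so they never wait on the writer. `AppendOnlyTable`, `apply`, and `snapshot` are hypothetical names.

```python
import threading


class AppendOnlyTable:
    """Single logical writer, many readers; readers never take the write lock."""

    def __init__(self):
        self._rows = []                       # storage; only grows
        self._committed = 0                   # number of rows visible to readers
        self._last_txn = -1                   # highest transaction id already applied
        self._write_lock = threading.Lock()   # serializes writers only

    def apply(self, txn_id, rows):
        """Apply a batch exactly once; a redelivered txn_id is ignored."""
        with self._write_lock:
            if txn_id <= self._last_txn:
                return False                  # duplicate delivery, e.g. a client retry
            self._rows.extend(rows)           # written beyond the visible watermark
            self._last_txn = txn_id
            self._committed = len(self._rows) # publish the commit with one assignment
            return True

    def snapshot(self):
        """Read the watermark once, then return only committed rows."""
        n = self._committed
        return self._rows[:n]
```

This relies on CPython's atomic attribute assignment. A native engine would need explicit memory ordering to publish the watermark safely.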

Ingesting and analysing market data in real time using open source tools

Building a real-time analytics pipeline for market data can be time-consuming, but leveraging open-source tools can help speed up the process.

In this session, I’ll present a project template that serves as a foundation for building a high-performance system using open-source tools.

We’ll start with data ingestion. While we could write directly to a fast database, I’ll use Apache Kafka to ingest data and demonstrate how to send messages using Python, JavaScript, and Go.
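
On the Python side, sending a message reduces to choosing a key and serializing a value. With Kafka's default partitioner, messages that share a key go to the same partition, which preserves per-symbol ordering. The sketch below assumes a JSON payload with hypothetical field names. It accepts any producer object that exposes `produce(topic, key=..., value=...)`, which is the shape of the `confluent_kafka.Producer` method.

```python
import json
from typing import Protocol


class Producer(Protocol):
    def produce(self, topic: str, key: bytes, value: bytes) -> None: ...


def encode_trade(symbol, price, amount, ts_us):
    """Serialize one trade; keying by symbol keeps each symbol's trades in one partition."""
    key = symbol.encode("utf-8")
    value = json.dumps(
        {"symbol": symbol, "price": price, "amount": amount, "timestamp": ts_us},
        separators=(",", ":"),
    ).encode("utf-8")
    return key, value


def send_trades(producer: Producer, topic, trades):
    """trades: iterable of (symbol, price, amount, ts_us) tuples."""
    for trade in trades:
        key, value = encode_trade(*trade)
        producer.produce(topic, key=key, value=value)
```

With confluent-kafka, you would pass `Producer({"bootstrap.servers": "localhost:9092"})` and call its `flush()` before exiting so that buffered messages are delivered. The broker address here is an assumption.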

Now, we need an analytics database, and for real-time data, a time-series database seems like a good match. I will demonstrate how to use QuestDB, an Apache 2.0-licensed project, to ingest data at high speed and run queries in milliseconds or faster.
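
Queries can be sent from Python over QuestDB's HTTP interface. The sketch below assumes the REST `/exec` endpoint on the default port 9000, and a JSON response that lists `columns` (each with a `name`) and `dataset` (rows as arrays). Check both against the current QuestDB REST documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def exec_url(base, sql):
    """Build the /exec URL; the SQL text is URL-encoded as the `query` parameter."""
    return f"{base.rstrip('/')}/exec?{urlencode({'query': sql})}"


def rows_from_response(body):
    """Turn a {columns, dataset} JSON response into a list of dicts."""
    doc = json.loads(body)
    names = [col["name"] for col in doc["columns"]]
    return [dict(zip(names, row)) for row in doc["dataset"]]


def query(sql, base="http://localhost:9000"):
    with urlopen(exec_url(base, sql)) as resp:
        return rows_from_response(resp.read())
```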

Dashboards are essential when working with market data. I’ll show how to use Grafana OSS to create real-time charts that update multiple times per second and demonstrate how Perspective, a FINOS project, can be used to embed real-time financial visualizations.

To wrap up, I’ll integrate Jupyter Notebook for interactive data exploration and time-series forecasting.

This session will be demo-driven, and I’ll share all the code so you can use it as a starting point.

From 8M to 15M Rows/Sec: Designing a Columnar Wire Protocol for Zero-Overhead Ingestion

Note: The talk is delivered by QuestDB's CTO and co-founder, Vlad, and Fast Data Advocate Javier.

For years, QuestDB ingested data via ILP (InfluxDB Line Protocol), a text-based, row-oriented format. We'd pushed it to ~8 million rows/sec. But at that scale, you're paying for overhead you shouldn't have to: UTF-8 number parsing on every value, row-by-row WAL appends, and repeated string-to-symbol-ID lookups. We designed a replacement from scratch and nearly doubled throughput to ~15 million rows/sec.
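
The text overhead is easy to see in the line format itself. Every number becomes decimal text that the server must parse back, and every tag value is a string that must be resolved again. The Python sketch below builds one ILP line following the InfluxDB line protocol rules as commonly documented: backslash-escaping of special characters, an `i` suffix for integers, quoted strings, and a nanosecond timestamp. It is for illustration only. In practice, an official client library would produce these lines.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars`."""
    for c in chars:
        text = text.replace(c, "\\" + c)
    return text


def format_field(value):
    if isinstance(value, bool):            # check bool first: bool is a subclass of int
        return "t" if value else "f"
    if isinstance(value, int):
        return f"{value}i"                 # integers carry an `i` suffix
    if isinstance(value, float):
        return repr(value)                 # shortest round-trip decimal text
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def ilp_line(table, tags, fields, ts_ns):
    head = [_escape(table, ", ")]
    head += [f"{_escape(k, ',= ')}={_escape(v, ',= ')}" for k, v in tags.items()]
    body = ",".join(f"{_escape(k, ',= ')}={format_field(v)}" for k, v in fields.items())
    return f"{','.join(head)} {body} {ts_ns}\n"
```
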
QWP (QuestDB Wire Protocol) is a binary, column-oriented ingestion protocol. Booleans are bit-packed, timestamps use Gorilla delta-of-delta compression (~1 bit per value for monotonic sequences), symbols use delta dictionaries, and the schema is hashed with XXH64 so repeated batches transmit 8 bytes instead of full column definitions.
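
Two of these encodings are small enough to sketch. Bit-packing stores eight booleans per byte. Delta-of-delta stores the first timestamp, the first delta, and then only the change between consecutive deltas. For regularly spaced timestamps that change is 0, which costs one bit, and that is where "~1 bit per value" comes from. The bucket sizes below come from the original Gorilla paper, and the LSB-first bit order is an assumption. QWP's actual layout is defined in its wire specification and may differ.

```python
def pack_bools(values):
    """Pack booleans 8 per byte, least-significant bit first (bit order assumed)."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bools(data, n):
    return [bool(data[i // 8] >> (i % 8) & 1) for i in range(n)]


def delta_of_deltas(timestamps):
    """Return (first value, first delta, list of delta-of-deltas)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], (deltas[0] if deltas else None), dods


# Gorilla-paper buckets: (prefix bits, payload bits, min, max)
BUCKETS = [(2, 7, -63, 64), (3, 9, -255, 256), (4, 12, -2047, 2048)]


def dod_bits(dod):
    """Bits used by one delta-of-delta under Gorilla-style variable-length buckets."""
    if dod == 0:
        return 1                     # a single '0' bit
    for prefix, payload, lo, hi in BUCKETS:
        if lo <= dod <= hi:
            return prefix + payload
    return 4 + 32                    # '1111' prefix followed by a 32-bit value
```
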
We'll walk through the key design decisions:
- Column-oriented wire format: the client groups values by column, so the server writes entire columns into the WAL via memcpy instead of row-by-row appends.
- Zero-allocation WebSocket path: frame parsing operates directly on native memory buffers, masking/unmasking happens in-place, all cursors are reusable. No object allocation on the hot path.
- Two-layer symbol caching: per-column and per-connection caches eliminate the repeated string lookups that dominated CPU time in the text protocol.
- UDP for fire-and-forget: using Linux recvmmsg to batch-receive datagrams. Packet loss means silent data loss, and that's a documented, deliberate trade-off.
We'll show allocation profiling across ILP-over-TCP, ILP-over-HTTP, and QWP-over-WebSocket, and explain why a column-oriented wire format is a natural fit for a column-oriented storage engine. All code and the full wire specification are open source under Apache 2.0.
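
The column-oriented batch, the schema hash, and the delta symbol dictionary can be combined in one small encoder. In the Python sketch below, `ColumnarEncoder` and its batch dictionary are hypothetical and do NOT follow the QWP wire layout. `blake2b` with an 8-byte digest stands in for XXH64, which is not in the Python standard library. The sketch shows three ideas. Each column becomes one contiguous little-endian buffer that a server could copy as a whole. The full schema is sent only once. Each batch carries only the symbols the connection has not seen before.

```python
import hashlib
import struct


def schema_hash(columns):
    """8-byte schema fingerprint (blake2b as a stand-in for XXH64)."""
    text = ";".join(f"{name}:{typ}" for name, typ in columns).encode()
    return hashlib.blake2b(text, digest_size=8).digest()


class ColumnarEncoder:
    """Hypothetical encoder illustrating the ideas, not the QWP format."""

    FORMATS = {"symbol": "i", "f64": "d", "i64": "q"}   # struct codes per column type

    def __init__(self, columns):
        self.columns = columns               # [(name, type)] in row order
        self.hash = schema_hash(columns)
        self.schema_sent = False
        self.symbols = {}                    # per-connection symbol -> id cache

    def encode(self, rows):
        batch = {"schema_hash": self.hash}
        if not self.schema_sent:             # full definitions only on the first batch
            batch["schema"] = list(self.columns)
            self.schema_sent = True
        new_symbols, cols = [], {}
        for idx, (name, typ) in enumerate(self.columns):
            if typ not in self.FORMATS:
                raise ValueError(f"unsupported type {typ}")
            values = [row[idx] for row in rows]          # transpose rows into a column
            if typ == "symbol":
                ids = []
                for v in values:
                    if v not in self.symbols:
                        self.symbols[v] = len(self.symbols)
                        new_symbols.append(v)            # dictionary delta entry
                    ids.append(self.symbols[v])
                values = ids
            cols[name] = struct.pack(f"<{len(values)}{self.FORMATS[typ]}", *values)
        batch["new_symbols"] = new_symbols
        batch["columns"] = cols              # one contiguous buffer per column
        return batch
```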

How we turned an open-source DB into a multi-tenant SaaS using K8s

QuestDB is a high-performance open-source database. Many people told us they would like to use it as a service, without having to manage the machines. So we got to work on a solution that would let us launch QuestDB instances with fully managed provisioning, monitoring, security, and upgrades.

A few Kubernetes clusters later, we launched our QuestDB Cloud offering. This talk is the story of how we got there. I will talk about tools such as Calico, Karpenter, CoreDNS, Telegraf, Prometheus, Loki, and Grafana, but also about challenges such as authentication, billing, and multi-cloud, and about what you have to say no to in order to survive in the cloud.

Building your own custom sink connector for Kafka

Apache Kafka has an impressive ecosystem of compatible tools, but if you are building your own data store, you might need to create a custom connector for storing data from Kafka Connect reliably and at speed.

In this session I will share the highlights of rolling our own connector for QuestDB, an open source time-series database.
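
Reliability in a sink mostly depends on one rule: report a Kafka offset as committed only after the rows before it are safely written. Real Kafka Connect sinks implement the Java `SinkTask` interface, with methods such as `put` and `flush`. The Python model below only mirrors that lifecycle. `BufferedSink` and `write_batch` are hypothetical names, and offsets follow Kafka's convention of "next offset to consume". This gives at-least-once delivery. Deduplication on the database side would then turn redelivered records into effectively exactly-once results.

```python
class BufferedSink:
    """Python model of a sink task's put/flush lifecycle (not the Kafka Connect API)."""

    def __init__(self, write_batch, max_buffer=1000):
        self.write_batch = write_batch   # hypothetical callable that writes rows to the DB
        self.max_buffer = max_buffer
        self.buffer = []
        self.pending = {}                # (topic, partition) -> next offset, not yet durable
        self.committed = {}              # (topic, partition) -> next offset safe to commit

    def put(self, records):
        """records: iterable of (topic, partition, offset, value)."""
        for topic, partition, offset, value in records:
            self.buffer.append(value)
            self.pending[(topic, partition)] = offset + 1
        if len(self.buffer) >= self.max_buffer:
            self.flush()

    def flush(self):
        """Write buffered rows first; only then advance the committable offsets."""
        if self.buffer:
            self.write_batch(self.buffer)   # if this raises, offsets stay where they were
            self.buffer = []                # new list, so the writer may keep the old one
        self.committed.update(self.pending)
        self.pending.clear()
        return dict(self.committed)
```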

Building a high-performance database in Java is impossible!

Or is it? We built QuestDB, an open-source high-performance database, primarily in Java. What do I mean by 'high-performance' anyway? A single machine running QuestDB can ingest millions of rows per second and query billions. There is no magic here, just a lot of hard work and clever engineering. We focused on efficient data structures, parallel execution pipelines, and mechanical sympathy. Our Java is somewhat unorthodox: we aim for zero steady-state allocation, and we do not hesitate to jump to native code when needed.

In this session, I'll take you to the sausage factory and show you some of the techniques we use, including:

- SIMD-based optimizations from Java and native code for maximum throughput. We even built our own just-in-time (JIT) compiler!

- Implementing parallel execution pipelines to use multi-core processors effectively.

- Our approach to off-heap memory management to achieve near-zero GC pauses.

- Crafting specialized data structures and algorithms suitable for high-performance Java.
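
The general shape of a parallel execution pipeline has three steps: split a column into chunks, let each worker reduce its chunk to a small partial result, and merge the partials. The Python sketch below shows only that structure. It is not QuestDB's Java code. Python threads will not speed up pure-Python arithmetic because of the GIL, whereas a Java engine would run such workers on separate cores over off-heap column memory.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Per-worker step: reduce one chunk to (count, sum, min, max)."""
    return len(chunk), sum(chunk), min(chunk), max(chunk)


def merge_stats(parts):
    """Combine partial results; every statistic here is mergeable."""
    count = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    return {
        "count": count,
        "sum": total,
        "min": min(p[2] for p in parts),
        "max": max(p[3] for p in parts),
        "avg": total / count,
    }


def parallel_stats(values, workers=4, chunk_size=100_000):
    if not values:
        raise ValueError("no values to aggregate")
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunks))   # results keep chunk order
    return merge_stats(parts)
```

Averages are computed from merged sums and counts, never by averaging per-chunk averages, because chunks can differ in size.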

I'll also answer the most common question people ask when they learn about our unorthodox Java: Why the heck do you use Java at all?!

Can you make a living from open source?

There was a time when using almost any software component required paying for a license. Fortunately, thanks to free and open-source software (FOSS), today you can build practically any application from components that are completely free.

But if the software is free, who develops it? Does the whole free software community work altruistically? Can free software be developed professionally? In fact, some say that open source as we knew it no longer exists, and that what we have today is something else.

In this talk I will discuss how free software can be monetized, and some of the problems or potential conflicts you may run into along the way.

Besides covering what other projects have done to become sustainable, I will explain how we at QuestDB develop an open-source database while keeping a stable team that makes its living exclusively from it.

Your database cannot do this (well)

Relational databases were created a long time ago for a simpler world. Even if they are still awesome tools for generic workloads, there are some things they cannot do well.

In this session I will speak about purpose-built databases that you can use for specific business scenarios. We will see the type of queries you can run on a Graph database, a Document Database, and a Time-Series database. We will then see how a relational database could also be used for the same use cases, just in a much more complex way.

Deduplicating and analysing time-series data with Apache Beam and QuestDB

Time series data pipelines tend to prioritise speed and freshness over completeness and integrity. In such scenarios, it is very common to ingest duplicate data, which may be fine for many analytical use cases, but is very inconvenient for others.

There are many open source databases built specifically for the speed and query semantics of time series, and most of them lack automatic deduplication of events in near real-time. One such database is QuestDB, which requires a manual batch process to deduplicate ingested data.

In this talk, we will see how we can successfully use Apache Beam to deduplicate streaming time series, which can then be analysed by a time series database.
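
Streaming deduplication usually keeps a bounded memory of recently seen event ids and drops repeats that fall inside that window. The plain-Python sketch below models the idea with event-time eviction. It does not use the Apache Beam API, and `WindowedDeduplicator` is a hypothetical name. Events that arrive later than the window are not guaranteed to be recognized as duplicates, which is the trade-off that keeps state bounded.

```python
import heapq


class WindowedDeduplicator:
    """Drop events whose id was already seen within `window` units of event time."""

    def __init__(self, window):
        self.window = window
        self.seen = {}        # event id -> event time when first seen
        self.expiry = []      # min-heap of (event time, event id) for eviction
        self.max_time = None  # highest event time observed so far

    def process(self, event_id, event_time):
        """Return True to emit the event, False if it is a duplicate."""
        if self.max_time is None or event_time > self.max_time:
            self.max_time = event_time
        cutoff = self.max_time - self.window
        while self.expiry and self.expiry[0][0] < cutoff:   # evict ids older than the window
            _, old_id = heapq.heappop(self.expiry)
            self.seen.pop(old_id, None)
        if event_id in self.seen:
            return False
        self.seen[event_id] = event_time
        heapq.heappush(self.expiry, (event_time, event_id))
        return True
```

Only the emitted events would then be written to the time-series database, for example through one of the ingestion paths described above.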

DATA:Scotland 2024 Sessionize Event

September 2024

J On The Beach

How we added replication to QuestDB, a time-series database

May 2024 Málaga, Spain

Commit Conf

How we implemented "Exactly Once" semantics in our high-performance database

April 2024 Madrid, Spain

FOSDEM 2024

Ingesting and analyzing millions of events per second in real-time using open source tools

February 2024 Brussels, Belgium

Open Source Analytics Conference 2023 Sessionize Event

December 2023

Current 2023: The Next Generation of Kafka Summit Sessionize Event

September 2023 San Jose, California, United States

Beam Summit 2023 Sessionize Event

July 2023

DevBcn 2023 Sessionize Event

July 2023 L'Hospitalet de Llobregat, Spain

Berlin Buzzwords

Ingesting over 4 million rows a second on a single instance

June 2023 Berlin, Germany

Codemotion Madrid 2023 Sessionize Event

May 2023 Madrid, Spain

DeveloperWeek Europe 2023 Sessionize Event

April 2023
