Lester Martin
Trino Developer Advocate - Starburst
Atlanta, Georgia, United States
Lester Martin is a seasoned developer advocate, trainer, blogger, data engineer, and polyglot programmer focused on data pipelines & data lake analytics using Trino, Iceberg, Hive, Spark, Flink, Kafka, NiFi, NoSQL databases, and, of course, classical RDBMSs. Learn more about Lester at https://linktr.ee/lestermartin.
Apache Iceberg Deep Dive Workshop
Get hands-on with Apache Iceberg in this introductory workshop. We’ll cover the full table lifecycle, from creating your first Iceberg table to managing snapshots, performing rollbacks, and running compactions to improve performance.
In this workshop, you’ll learn how to:
- Create and manipulate Iceberg tables with familiar SQL
- Use snapshots and time travel to audit changes and reproduce results
- Perform rollbacks to recover quickly from bad writes or deployments
- Run compaction and maintenance to improve performance and reduce small files
- Understand the table lifecycle, schema/partition evolution, metadata, and governance touchpoints
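For a taste of the hands-on portion, here is a minimal sketch of those lifecycle operations driven from Python with the `trino` client. The host, catalog, schema, and table names are illustrative assumptions; the SQL follows the Trino Iceberg connector's documented syntax.

```python
# Minimal sketch of the Iceberg table lifecycle via Trino SQL.
# Assumes a reachable Trino cluster with an Iceberg catalog named
# "iceberg" and a schema named "demo" -- adjust for your environment.
import trino

conn = trino.dbapi.connect(
    host="localhost", port=8080, user="workshop",
    catalog="iceberg", schema="demo",
)
cur = conn.cursor()

# 1. Create and populate a table with familiar SQL.
cur.execute("CREATE TABLE IF NOT EXISTS events (id bigint, msg varchar)")
cur.fetchall()
cur.execute("INSERT INTO events VALUES (1, 'hello'), (2, 'world')")
cur.fetchall()  # consuming the result ensures the statement completes

# 2. Inspect snapshot history via the $snapshots metadata table.
cur.execute('SELECT snapshot_id, committed_at FROM "events$snapshots" ORDER BY committed_at')
first_snapshot_id = cur.fetchall()[0][0]

# 3. Time travel: query the table as of an earlier snapshot.
cur.execute(f"SELECT * FROM events FOR VERSION AS OF {first_snapshot_id}")
print(cur.fetchall())

# 4. Roll back to that snapshot to recover from a bad write.
cur.execute(
    f"CALL iceberg.system.rollback_to_snapshot('demo', 'events', {first_snapshot_id})"
)
cur.fetchall()

# 5. Compact small files to improve read performance.
cur.execute("ALTER TABLE events EXECUTE optimize")
cur.fetchall()
```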
Enhancing AI agents with data products
AI agents are all the rage, and with text-to-SQL features they can easily interrogate the metadata of your traditional tabular data structures to provide quality results. Creating business-curated, well-documented data products gives the underlying LLMs richer metadata to consume, ultimately enhancing the quality and accuracy of the responses your AI agents provide.
After this high-level concept is presented, live demonstrations will show how these AI agents perform with basic table-structure metadata, as well as the enhanced responses they give once datasets are curated and documented.
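As one hedged illustration of the curation step (not necessarily the exact demo from this session), table and column descriptions can be attached with standard Trino COMMENT statements, which an agent can later read back from the catalog. Connection details and names below are assumptions.

```python
# Hedged illustration only: enriching catalog metadata that a text-to-SQL
# agent can consume. Connection details and names are assumptions.
import trino

conn = trino.dbapi.connect(host="localhost", port=8080, user="analyst",
                           catalog="iceberg", schema="sales")
cur = conn.cursor()

# Business-curated descriptions live alongside the schema itself.
cur.execute(
    "COMMENT ON TABLE orders IS "
    "'One row per customer order; grain is order_id; cancelled orders excluded'"
)
cur.fetchall()
cur.execute("COMMENT ON COLUMN orders.order_ts IS 'Order creation time, UTC'")
cur.fetchall()

# An agent assembling its prompt can read the descriptions back out.
cur.execute("SHOW COLUMNS FROM orders")  # returns Column, Type, Extra, Comment
print(cur.fetchall())
```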
Building RAG apps using SQL
While the first generation of GenAI tooling for data engineers and AI engineers focused on building with Python, SQL users can also participate in this critical space. The theory of unstructured document parsing/chunking/embedding transformations, and how it applies to a RAG application, will be presented.
The presentation will quickly move to demonstrations, starting with a simple unstructured-document ETL pipeline that uses SQL to create and store vector embeddings in Apache Iceberg tables. It will then show how to use SQL to retrieve the stored context that most closely matches the user's initial request and augment the formal request to an LLM before returning the final response.
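For the curious, a minimal sketch of the storage and retrieval pattern follows. The `embed()` function is a stand-in for whatever embedding model you choose, and the connection details, table names, and data are illustrative assumptions; the similarity ranking is computed directly in Trino SQL with `zip_with` and `reduce`.

```python
# Minimal sketch of SQL-driven RAG storage and retrieval. embed() is a
# placeholder for a real embedding model; connection details, table names,
# and data are illustrative assumptions.
import trino

def embed(text: str) -> list[float]:
    # Stand-in: call your embedding model of choice here.
    raise NotImplementedError

conn = trino.dbapi.connect(host="localhost", port=8080, user="rag",
                           catalog="iceberg", schema="docs")
cur = conn.cursor()

# Store chunked documents alongside their embeddings in an Iceberg table.
cur.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        chunk_id bigint,
        body varchar,
        embedding array(double)
    )
""")
cur.fetchall()

chunk = "Iceberg snapshots can be expired with expire_snapshots"
vec_sql = "ARRAY[" + ", ".join(str(v) for v in embed(chunk)) + "]"
cur.execute(f"INSERT INTO chunks VALUES (1, '{chunk}', {vec_sql})")
cur.fetchall()

# Retrieval: rank stored chunks by cosine similarity to the question,
# computed entirely in SQL with zip_with/reduce over the arrays.
q_sql = "ARRAY[" + ", ".join(str(v) for v in embed("How do I expire old snapshots?")) + "]"
cur.execute(f"""
    WITH q AS (SELECT {q_sql} AS v)
    SELECT body,
           reduce(zip_with(embedding, q.v, (x, y) -> x * y),
                  0.0, (s, p) -> s + p, s -> s)
           / (sqrt(reduce(embedding, 0.0, (s, x) -> s + x * x, s -> s))
              * sqrt(reduce(q.v, 0.0, (s, x) -> s + x * x, s -> s))) AS cosine_sim
    FROM chunks, q
    ORDER BY cosine_sim DESC
    LIMIT 5
""")
top_context = [row[0] for row in cur.fetchall()]
# top_context is then prepended to the prompt sent to the LLM.
```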
Universal truths from 3 decades in software & data engineering
Take a break from all the technical sessions for a good-natured, but highly-relevant, look at the environment we work in and the 'universal truths' that help explain it all. These truths have been compiled by a technologist whose 30+ year career has spanned mainframes, client/server, web-based technologies, and data engineering, not to mention methodologies like waterfall, RAD, agile, and DevOps.
Expect insights for sure, but also expect to smile a bit (maybe even laugh) as you realize we all deal with the same 'stuff' on a daily basis. Beyond the humor, the real goal is to help you become a better engineer and build more effective team dynamics with all the other 'individuals' around you.
You'll be glad you came and will be recharged for your next technical session, not to mention be a much more enlightened person from all the insights you just gained.
Understanding & Exploring Apache Iceberg v3
Apache Iceberg is an open-source table format that provides database-like functionalities such as ACID transactions, schema evolution, and time travel for large analytic datasets stored in data lakes. Iceberg Spec v3 introduces significant advancements, including binary deletion vectors for faster deletes and updates, richer data types like variant for semi-structured data, nanosecond-precision timestamps, and built-in row lineage for enhanced data governance.
In addition to learning what these cool features can do for you, you'll see a live demo of many of the popular features and walk away with a hands-on exercise in case you want to learn by doing, too.
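As a hedged preview (engine support for v3 features is still rolling out, so treat this as illustrative rather than guaranteed-to-run), here is what the variant type could look like from Spark 4.x SQL against an Iceberg catalog assumed to be named `demo`:

```python
# Illustrative only: Iceberg v3's variant type from Spark 4.x SQL.
# Assumes a Spark session with an Iceberg catalog named "demo" and a
# runtime recent enough to write format-version 3 tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-v3-sketch").getOrCreate()

spark.sql("""
    CREATE TABLE demo.events (
        id bigint,
        payload variant            -- v3: semi-structured, no fixed schema
    ) USING iceberg
    TBLPROPERTIES ('format-version' = '3')
""")

spark.sql("""
    INSERT INTO demo.events
    SELECT 1, parse_json('{"user": "lester", "clicks": 42}')
""")

# Pull typed values back out of the variant column at query time.
spark.sql("""
    SELECT id,
           variant_get(payload, '$.user',   'string') AS user,
           variant_get(payload, '$.clicks', 'int')    AS clicks
    FROM demo.events
""").show()
```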
Ibis: Bringing Optionality to Python Dataframes
Love the power of writing lazily executed Dataframe code in Python that runs on your favorite distributed data cluster? Would you like the flexibility to swap out your processing engine for another? If so, you need Optionality in your Python Dataframe API.
Ibis, https://ibis-project.org/, offers a Python Dataframe API that lets your code run on nearly 20 backend data processing systems. It is THE portable Dataframe library. Imagine being able to run your Ibis code in Polars on your laptop and then move it to PySpark in your favorite cloud provider by changing nothing but a property. No need to imagine; you can do it today.
This presentation walks you through the features available in Ibis and compares it with other popular Dataframe APIs. You'll see how to mix and match SQL and Dataframe API transformations as desired, and how to change the backend system where your code is executed.
You will see a demo of a job running in DuckDB for local testing and then, with a single line of code changed, running in a Trino cluster. Step-by-step instructions will be provided so you can follow along on your laptop or run the exercise yourself later.
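A minimal sketch of that backend optionality, with made-up data for illustration:

```python
# Minimal sketch of backend optionality in Ibis; the data is made up.
import ibis

# Local iteration: the DuckDB backend. Swapping engines means swapping
# only this line, e.g.:
#   con = ibis.trino.connect(host=..., user=..., catalog=..., schema=...)
con = ibis.duckdb.connect()

t = ibis.memtable({"region": ["east", "west", "east"], "sales": [10, 20, 30]})
expr = (
    t.group_by("region")
     .aggregate(total_sales=t.sales.sum())
     .order_by("region")
)
print(con.execute(expr))  # same expression runs on any supported backend
```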
Optimizing Apache Iceberg Performance
Join Lester for a practical, fast-paced session on improving query performance across your data lakehouse. While we focus on Apache Iceberg, the techniques apply broadly to Delta Lake and Apache Hive as well.
We’ll start with optimizations you can apply today as a table consumer: maintaining statistics, using effective filtering and projection, and leveraging caching to reduce latency.
Then we will go under the hood to show how your lakehouse tables should be structured and maintained to improve performance at scale, covering join optimization and file size considerations, as well as compaction, partitioning, bucketing, and file-level sorting.
You’ll learn how to:
- Reduce the amount of scanned data and speed up queries with statistics, filtering, and projection pruning.
- Design tables for scale with partition strategies based on best practices.
- Maintain tables with compaction, metadata rewriting, and expiration.
You will leave with practical guidance you can apply immediately, with no replatforming required.
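To make the maintenance side concrete, here is a hedged sketch of those tasks expressed as Trino SQL against the Iceberg connector; the cluster details and table name are assumptions.

```python
# Hedged sketch of routine Iceberg maintenance via Trino SQL; the cluster
# details and table name are assumptions.
import trino

conn = trino.dbapi.connect(host="localhost", port=8080, user="admin",
                           catalog="iceberg", schema="warehouse")
cur = conn.cursor()

for stmt in (
    # Refresh statistics so the cost-based optimizer plans joins well.
    "ANALYZE orders",
    # Compact small files toward a target size.
    "ALTER TABLE orders EXECUTE optimize(file_size_threshold => '128MB')",
    # Drop old snapshots and the metadata that references them.
    "ALTER TABLE orders EXECUTE expire_snapshots(retention_threshold => '7d')",
    # Delete files no longer referenced by any snapshot.
    "ALTER TABLE orders EXECUTE remove_orphan_files(retention_threshold => '7d')",
):
    cur.execute(stmt)
    cur.fetchall()  # consume each result so the statement runs to completion
```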
Apache Iceberg ingestion with Apache NiFi
A cornerstone requirement of an Icehouse (Iceberg + Trino) is data ingestion. One approach is to leverage Apache NiFi. NiFi, a multimodal data pipelining tool, has a multitude of processors that can be assembled into a flow to address your specific scenarios. NiFi's low-code/no-code approach allows data engineers to rapidly build, deploy, and monitor their data ingestion & transformation pipelines. NiFi also allows custom processor development with a variety of languages, including Java and Python.
This presentation will iterate through a few common approaches and ultimately demonstrate a rich data pipeline that sources data from Kafka, performs typical transformation processing (including enrichment), and loads data into a high-performance Iceberg table that will be consumed via Trino.
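As an aside on the custom-processor angle, here is a minimal sketch of a NiFi 2.x Python processor that could sit in such a flow, enriching each JSON event before the Iceberg writer; the field names and threshold are made up for illustration.

```python
# Minimal sketch of a custom NiFi 2.x Python processor for the enrichment
# step of a Kafka-to-Iceberg flow; field names and threshold are made up.
import json
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult

class EnrichEvent(FlowFileTransform):
    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1"
        description = "Adds a derived priority field to each JSON event."

    def transform(self, context, flowfile):
        event = json.loads(flowfile.getContentsAsBytes())
        event["priority"] = "high" if event.get("amount", 0) > 1000 else "normal"
        return FlowFileTransformResult(
            relationship="success",
            contents=json.dumps(event),
        )
```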
DataEngBytes - Sydney
Building Trino data pipelines with SQL or Python
Implementing the medallion architecture with Starburst
Community Over Code Asia 2025
DataEngBytes - Melbourne
Building Trino data pipelines with SQL or Python
Implementing the medallion architecture with Starburst
Berlin Buzzwords 2025
Apache Iceberg ingestion with Apache NiFi
https://www.youtube.com/watch?v=2yH9PfiXb9Y