
Josh Reini

Developer Advocate for Open Source AI @ Snowflake, TruLens Maintainer

Atlanta, Georgia, United States

Josh is a developer advocate at Snowflake, previously at TruEra (recently acquired by Snowflake). He is also a maintainer of TruLens, an open-source library for systematically tracking and evaluating LLM-based applications.

Josh has delivered tech talks and workshops to thousands of developers at events including PyData, Global AI Conference, NYC Dev Day, LLMs and the Generative AI Revolution, and AI developer meetups such as AI Camp and the Unstructured SF Meetup.

Area of Expertise

  • Information & Communications Technology

Topics

  • Artificial Intelligence
  • Machine Learning
  • Large Language Models
  • Python

Spanning the Way: How OTel practices can illuminate the path for improving LLM apps

As the complexity of large language model (LLM) applications continues to grow, so does the challenge of debugging and iterating on these applications. Traditional monitoring tools often fall short in providing the level of visibility and context needed to understand the behavior of LLM apps.

In this talk, we will discuss how the open-source TruLens library used OpenTelemetry (OTel) spans to enable richer visualization of LLM app traces and to compute evaluations against particular span types and attributes. By leveraging OTel's ability to capture and propagate context across distributed systems, we were able to gain deeper insights into the behavior of our LLM apps.

Specifically, we will discuss how we used OTel spans to:

  • Identify and visualize the different stages of LLM app execution
  • Evaluate the performance of specific LLM app components

We will also share how the combination of OTel spans and TruLens's visualization capabilities enabled us to iterate on our LLM apps more quickly and efficiently.
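As a rough sketch of the pattern (using the OpenTelemetry Python SDK directly; the span names and attribute keys below are illustrative, not TruLens's actual schema), each LLM app stage can be wrapped in its own span with attributes that downstream evaluations can filter on:

```python
# Minimal sketch using the OpenTelemetry Python SDK: wrap an app stage
# in its own span and attach attributes that evaluations can later filter on.
# Span names and attribute keys here are illustrative, not TruLens's schema.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer("llm-app")

def retrieve(query: str) -> list[str]:
    # The retrieval stage gets its own span so it can be visualized and
    # evaluated independently of the generation stage.
    with tracer.start_as_current_span("retrieval") as span:
        span.set_attribute("span.type", "retrieval")
        span.set_attribute("query", query)
        contexts = ["placeholder context"]  # fetch from a vector store here
        span.set_attribute("num_contexts", len(contexts))
        return contexts
```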

This talk will be of interest to anyone who wants to use OTel to improve the observability and performance of their LLM applications.

Who Let the Bots Out? A Guide to Evaluating AI Agents

In this talk, I will present a systematic, open source framework for evaluating Gen AI agents—LLM-based systems that manage complex, multi-step tasks—by dissecting their performance into three critical dimensions.

First, I detail how to evaluate tool use by examining each step, from tool selection and parameter capture to tool execution, ensuring that every individual component operates as intended.
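As a hedged sketch of this idea (the record structure and function names below are hypothetical, not the framework's API), each part of a single tool call can be scored separately:

```python
# Hypothetical sketch: score each component of one tool-call step independently.
from dataclasses import dataclass

@dataclass
class ToolStep:
    selected_tool: str  # tool the agent chose
    arguments: dict     # parameters the agent captured for the call
    succeeded: bool     # whether the tool executed without error

def evaluate_tool_step(step: ToolStep, expected_tool: str, required_args: set[str]) -> dict:
    """Score tool selection, parameter capture, and execution separately."""
    return {
        "tool_selection": step.selected_tool == expected_tool,
        "parameter_capture": required_args.issubset(step.arguments),
        "tool_execution": step.succeeded,
    }
```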

Next, I explain how trajectory evaluation scrutinizes the agent's overall workflow, verifying that it follows an optimal and efficient sequence of actions.
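One simple way to make this concrete (a sketch under the assumption that both the observed and reference trajectories are lists of action names; not the framework's API) is to compare the agent's action sequence against a reference path:

```python
# Hypothetical sketch: compare the agent's observed action sequence against a
# reference trajectory, scoring order adherence and efficiency.
def evaluate_trajectory(observed: list[str], reference: list[str]) -> dict:
    exact_match = observed == reference
    # In-order coverage: did the agent hit every reference step, in order?
    it = iter(observed)
    in_order = all(step in it for step in reference)
    # Efficiency: penalize extra steps relative to the reference path.
    efficiency = min(1.0, len(reference) / max(len(observed), 1))
    return {"exact_match": exact_match, "in_order": in_order, "efficiency": efficiency}
```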

Finally, I show goal evaluation strategies that quantitatively determine whether the agent achieves the specified outcomes.
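As a minimal illustration (a purely programmatic check with made-up names; in practice this step is often an LLM judge or a task-specific assertion suite), goal achievement can be reduced to a score:

```python
# Hypothetical sketch: score goal achievement as the fraction of success
# criteria the agent's final answer satisfies.
def evaluate_goal(final_answer: str, success_criteria: list[str]) -> float:
    met = sum(criterion.lower() in final_answer.lower() for criterion in success_criteria)
    return met / len(success_criteria) if success_criteria else 0.0
```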

This approach not only identifies failure points across the evaluation dimensions but also provides actionable insights for iterative improvements.

Attendees will gain a robust, reproducible methodology to benchmark and optimize AI agents, bridging the gap between experimental development and reliable production deployment.

What makes a calibrated LLM Judge?

One recurring question from the AI community over the past two years about using LLMs as judges has been their credibility. Namely, "how can we trust an LLM-as-judge if autoregressive LLMs themselves are stochastic in nature and hence inherently unreliable?"

To answer this, we first need empirical foundations and benchmarks to help establish strong correlation between human evaluation and automatic evaluators.

There are many levers we can pull to improve the reliability of LLM judges, including the underlying LLM, the evaluation output scale, the evaluation criteria, model parameters like temperature, chain-of-thought reasoning, and more.

This talk walks through how to benchmark an LLM judge against human evaluators, which levers are available to pull, and how to systematically experiment with these different levers to improve calibration.
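As a small, self-contained sketch of what such a benchmark can look like (the scores below are made up for the example, and agreement metrics other than Spearman correlation are equally valid), judge scores can be compared against human labels on the same examples:

```python
# Illustrative sketch: measure how well an LLM judge's scores track human
# labels on the same items. Data and the 1-5 scale are made up for the example.
from scipy.stats import spearmanr

human_scores = [5, 4, 2, 5, 1, 3, 4, 2]   # human evaluator ratings
judge_scores = [5, 4, 3, 4, 1, 3, 5, 2]   # LLM judge ratings on the same items

rho, p_value = spearmanr(human_scores, judge_scores)
exact_agreement = sum(h == j for h, j in zip(human_scores, judge_scores)) / len(human_scores)

print(f"Spearman correlation: {rho:.2f} (p={p_value:.3f})")
print(f"Exact agreement: {exact_agreement:.0%}")
# Re-run after changing one lever (judge model, output scale, temperature,
# chain-of-thought prompting) to see whether calibration improves.
```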
