LLMs as Data Architects

Testing data-intensive applications often feels like a choice between two evils: using risky real-world data or spending hours writing brittle scripts for "dummy" data that lacks realism. But what if we could use Large Language Models (LLMs) not just to chat, but to architect complex, structured, and schema-validated datasets on demand?

In this talk, we explore how to turn LLMs into reliable data architects. We will move beyond simple prompting and dive into type-safe generation using Python. You will learn how to use Pydantic to define your "data contract" and leverage libraries like Instructor or Outlines to constrain LLMs so that their output is JSON that validates against your application’s schema.
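As a rough illustration of what a "data contract" can look like, here is a minimal sketch assuming Pydantic v2. The `Customer` and `Address` models and their fields are hypothetical, and the two JSON strings are hard-coded stand-ins for model responses; no LLM is called.

```python
from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    # Hypothetical nested model, for illustration only.
    city: str
    postal_code: str = Field(min_length=3)


class Customer(BaseModel):
    # Hypothetical test-data contract; constraints live next to the fields.
    name: str
    age: int = Field(ge=0, le=120)
    addresses: list[Address]


# JSON Schema derived from the contract. A structured-output library
# would typically pass something like this to the model or decoder.
schema = Customer.model_json_schema()

# Stand-ins for model responses (assumed, not real LLM output).
good = '{"name": "Ada", "age": 36, "addresses": [{"city": "Seattle", "postal_code": "98101"}]}'
bad = '{"name": "Ada", "age": -5, "addresses": []}'

# A conforming response parses into typed Python objects.
customer = Customer.model_validate_json(good)

# A response that breaks the contract (negative age) is rejected.
try:
    Customer.model_validate_json(bad)
    rejected = False
except ValidationError:
    rejected = True
```

The point of the sketch is that the same class serves two roles: it produces the schema that constrains generation, and it validates whatever comes back before the data reaches a test suite.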

Key takeaways include:

Why "Prompting for JSON" usually fails and how to fix it with JSON Schema.

Using Pydantic to define complex, nested data structures for testing.

A comparison of OpenAI’s Structured Outputs vs. local LLM constraints (using Outlines).

Real-world patterns for generating "Long-Tail" edge cases that traditional mock libraries miss.
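To make the first takeaway concrete, the following sketch uses only the Python standard library to show some ways a plain "please reply in JSON" prompt can go wrong. The replies are hard-coded examples of commonly reported failure modes, not output from any particular model, and `check` is a hypothetical helper.

```python
import json

fence = "`" * 3  # a markdown code fence, built here to keep the example readable

# Hard-coded stand-ins for chat-style replies; no model is called.
replies = [
    '{"name": "Ada", "age": 36}',                            # clean JSON
    'Sure! Here is the data:\n{"name": "Ada", "age": 36}',   # prose preamble
    fence + 'json\n{"name": "Ada", "age": 36}\n' + fence,    # wrapped in a fence
    '{"name": "Ada", "age": 36,}',                           # trailing comma
    '{"name": "Ada", "age": "thirty-six"}',                  # parses, wrong type
]


def check(reply: str) -> str:
    """Classify a reply as 'ok', 'not JSON', or 'wrong type'."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return "not JSON"
    # Syntactically valid JSON can still violate the expected types.
    if not isinstance(data.get("age"), int):
        return "wrong type"
    return "ok"


results = [check(r) for r in replies]
print(results)
```

Only the first reply passes. The last case is the subtle one: it parses cleanly but carries the wrong type, which is the kind of error a schema-level check is meant to catch.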

Varun Joshi

Senior Data Engineer at AWS

Seattle, Washington, United States
