Session

Structuring the Unstructured: Advanced Document Parsing for AI Workflows

Modern organizations generate vast amounts of data stored in diverse and often unstructured formats, such as PDFs, scanned documents, and proprietary file types. For engineers working with AI, the challenge is not simply extracting text but preserving the structure, context, and relationships within the data. Whether you are fine-tuning models or building retrieval-augmented generation (RAG) pipelines, effective document processing is essential to getting useful results.

This session dives into the techniques and open-source tools needed to transform unstructured documents into structured formats like JSON or Markdown, ready for AI workflows. You'll learn how to handle challenges like multi-page tables, image-heavy layouts, and scanned documents using context-aware methods. Join us to explore how to bridge the gap between unstructured data and AI-powered applications efficiently and get better results from your AI projects.
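As a rough illustration of one challenge named above, the multi-page table, here is a minimal sketch in plain Python. It is not taken from the session or from any particular parsing tool: the `TableFragment` shape, the `merge_fragments` and `to_markdown` helpers, and the rule "same header on consecutive pages means the same table" are all assumptions made for the example. A real parser would have to produce the fragments first.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TableFragment:
    """One piece of a table found on a single page (hypothetical shape)."""
    page: int
    header: list[str]
    rows: list[list[str]] = field(default_factory=list)


def merge_fragments(fragments):
    """Stitch fragments on consecutive pages that share the same header.

    Assumption: a table that continues on the next page repeats its header.
    """
    merged = []
    for frag in sorted(fragments, key=lambda f: f.page):
        last = merged[-1] if merged else None
        continues = (
            last is not None
            and last["header"] == frag.header
            and frag.page == last["pages"][-1] + 1
        )
        if continues:
            last["rows"].extend(list(r) for r in frag.rows)
            last["pages"].append(frag.page)
        else:
            merged.append({
                "header": list(frag.header),
                "pages": [frag.page],
                "rows": [list(r) for r in frag.rows],
            })
    return merged


def to_markdown(table):
    """Render one merged table as a Markdown table string."""
    lines = [
        "| " + " | ".join(table["header"]) + " |",
        "| " + " | ".join("---" for _ in table["header"]) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in table["rows"]]
    return "\n".join(lines)


if __name__ == "__main__":
    fragments = [
        TableFragment(1, ["Item", "Qty"], [["Nut", "10"]]),
        TableFragment(2, ["Item", "Qty"], [["Bolt", "40"]]),
    ]
    tables = merge_fragments(fragments)
    print(json.dumps(tables, indent=2))  # structured JSON for a pipeline
    print(to_markdown(tables[0]))        # Markdown for an LLM prompt
```

Keeping the page numbers alongside the merged rows preserves some of the context the abstract refers to, so a downstream RAG pipeline could still cite where each table came from.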

Cedric Clyburn

Senior Developer Advocate, Red Hat

New York City, New York, United States
