
Garbage In, Insight Out: Document Intelligence for AI-Infused Java Applications

We blame the model when AI hallucinates. But the most common cause isn't a flawed model — it's flawed data. Or worse, no meaningful data at all.

A document might look perfect on screen, but underneath it's often a flat stream of characters with no real structure: headings that are just big bold text, tables reduced to a jumble of disconnected cells, figures whose captions have lost any link to the image they describe, and footnotes scattered wherever the layout engine decided to put them. The relationships between sections, the hierarchy of content, and the intent behind a table are all lost. What looks like rich, structured content to a human is, to an AI, little more than an undifferentiated wall of tokens.

Enter Docling, an open-source toolkit that tackles this problem head-on: it converts messy, real-world documents into clean, structured, AI-ready data. It understands page layout, table structure, reading order, formulas, and code blocks, producing a rich unified document representation that downstream AI systems can actually reason over.

The catch? Everything I've described lives in the Python ecosystem. And last time I checked, I'm a Java Champion! Does that mean I'm out of luck? Spoiler: no! That's exactly the gap Docling Java was built to close!

In this session, I'll introduce Docling Java and dive into how it brings first-class document intelligence to the JVM. I'll walk through the library's architecture and show how it plugs naturally into the frameworks and AI stacks you're likely already using.

Along the way, I'll build up from simple to more sophisticated patterns, with a particular focus on building RAG pipelines that produce the kind of meaningful, structured data that gives your model a fighting chance to stop taking the blame. By the end, you'll leave with concrete, runnable patterns you can take back to your own Java codebase.

Eric Deandrea

Java Champion & Senior Principal Software Engineer, IBM

Manchester, New Hampshire, United States
