Session

Beyond Brute Force: Lessons From Building an ARC-AGI-2 Solver

ARC-AGI-2 is often treated like a purity test for “general intelligence.” My experience building an AI-assisted solver suggests something more actionable: ARC is a broad stress test. It punishes shallow heuristics and exposes where agentic pipelines quietly fail, whether from spiraling costs or from context-limit bugs.

This session is a candid, engineering-first journey: how I structured the task, how I used optimization loops (and where they broke), what kinds of prompting patterns helped, and why “one prompt for the whole dataset” is the wrong mental model. I’ll also explain what ARC taught me about real-world agent design: the gap between “benchmark competence” and “system reliability,” and why eval harness design can be more decisive than model choice.

What attendees will leave with:
- A view into future agentic benchmarks and the direction frontier models are heading
- The design patterns that survive across tasks
- How to optimize for complex, black-box-style problems to build better agentic systems

Vincent Koc

Distinguished AI Research Engineer, Professor and Keynote Speaker (TEDx, SXSW)

San Francisco, California, United States
