5 things I wish I hadn’t done building my AI agent

Most talks about AI agents focus on success stories and best-case outcomes. This talk is about what can actually go wrong when you ship and scale an AI agent at a start-up.

Over the past 18 months, our team at Baz built and scaled an AI-powered Code Review Agent used daily by thousands of developers around the world.
To move fast in this crazy market, we made several architectural, product, and UX decisions that seemed reasonable at the time, but later turned into expensive mistakes. Some cost us users, and some hit our precious revenue.

In this session, I’ll share five concrete pitfalls we encountered while building a real AI coding agent, why they happened, how we detected them, and the pivots that ultimately worked.

This is not a theoretical talk: every example comes from a production system, and the session will include real system diagrams, usage data, and a look at how each fix changed behavior in production (alongside a lot of self-deprecating humor :)

1. We built a “smarter” agent, and it got “dumber”
Why adding more context, tools, and responsibilities reduced accuracy instead of improving it

2. We let users choose the model, and lost control of the results
How exposing the choice of LLM to users destroyed consistency and made feedback meaningless

3. We optimized for an AI app, not for developer behavior
Why real adoption only starts when the agent lives where decisions are already being made (GitHub, GitLab, or the IDE)

4. Our guardrails worked, until the providers changed the models
How silent model updates broke engineering assumptions and eroded user trust

5. Our metrics looked great, but users were still churning
Why industry-standard AI metrics (like accepted suggestions and time-to-merge) missed the signal that actually won (or lost) customers

Shachar Azriel

VP Product @ Baz

Tel Aviv, Israel
