I, for one, welcome our jailbroken AI overlords

AI has evolved fast, but the joy of breaking it hasn’t changed at all. Back in the early days you could jailbreak a model with a sentence that sounded like you were negotiating with a confused NPC. Today the systems are bigger, louder and wired into everything from documents to tools, yet they still show that same charming ability to trip over a cleverly placed prompt. In this session we take that feeling of “surely this shouldn’t work” and apply it to the AI solutions teams rely on every day.

We start with the tricks that used to crack open the early models, the ones that folded like badly generated origami. Then we follow the same ideas into modern copilots, RAG pipelines and agent setups, where a single phrasing can twist logic, steer behaviour or quietly nudge the system into revealing more than it should. You will see how the old mischief has grown up, gained access to your data, and still hasn’t learned to handle peer pressure.

Along the way we look at how these attacks happen and what you can actually do about them. Not silver bullets, but real patterns you can use to keep things sane. Red teaming that helps you find the weak spots before someone else does. Mitigations that keep your assistant from improvising its way into accidental insider‑threat roleplay. Guardrails that are subtle enough not to ruin the magic, but strong enough to stop your AI from going full side quest when it should stay on task.

Expect a fun tour through AI misbehaviour. There will be models arguing with themselves, prompts that escalate faster than a LAN party dispute, and agents that act like they found the root password in a loot chest. If you enjoy watching smart systems make very silly decisions while learning how to protect your own, this will be your kind of chaos.

Kim Berg

CTO Data & AI @Sogeti Sweden | Dual Microsoft MVP AI & IoT

Norrköping, Sweden
