Session
A Mad Scientist’s Guide to Testing M365 Copilot and Copilot Studio Agents (Without Losing Your Job)
Testing and breaking Copilots without code to understand what your governance and agent controls are missing.
Everyone is shipping AI. Almost no one is testing it.
And those who are… have the thousand-yard stare of people who’ve seen a model rewrite HR policy at 2 a.m.
Your organisation just unleashed Copilot licences on thousands of unsuspecting users. Somewhere in the building, three Copilot Studio agents were stitched together late last Tuesday. Two can email customers. One can read finance data. None of them have been tested. At all.
This session is for the brave experimenter now responsible for that creature.
Together we’ll descend into the lab and become the agent’s least favourite person: the one who breaks it on purpose, before someone with a Reddit account, a grudge, and a free afternoon does it for you. You’ll see why chat-based Copilot and action-based agents are entirely different species, why most disasters aren’t caused by brilliant attackers but by boring sludge (a 300-page PDF, a forwarded email chain, a forgotten SharePoint site), and why “it worked in the demo” is the most expensive sentence ever spoken in enterprise AI.
Expect two live experiments. No code. No smoke machines. Just consequences.
• Experiment One: We gently poison an HR Copilot Studio agent using a single SharePoint document and watch it cheerfully violate its own rules. Then we stabilise the specimen using configuration alone.
• Experiment Two: We wire two agents together and observe what happens when a harmless-looking summary step quietly launders a prompt injection between them. One guardrail later, the monster behaves. Same input. Completely different outcome.
Along the way we’ll dissect:
prompt injection without the scary jargon, the four kinds of tests most teams mysteriously skip, the ethical decisions hiding behind perfectly innocent configuration toggles, and what really changes the moment you let agents talk to each other (spoiler: your bill screams first).
It’s technical. It’s entertaining. You will leave with a test plan small enough to actually run on Monday—no torches required.
________________________________________
What You’ll Take Away 🧪
A five-minute surface map
Four questions you must answer before any test plan exists:
what your agent can read, what it can write, who it thinks it is, and who is allowed to talk to it.
The four tests teams pretend they don’t need
Functional, behavioural, adversarial, and evaluative testing, plus concrete examples for both chatty copilots and action-wielding agents.
A reusable adversarial playbook
Six cheap, repeatable attack patterns you can unleash on any Copilot or Copilot Studio agent:
direct override, indirect injection, role-play laundering, scope drift, tool coercion, and output smuggling.
How to break agents without touching code
Using only materials a normal user can produce: documents, emails, transcripts, and “helpful” knowledge sources no one groomed.
A multi-agent failure-mode checklist
Prompt laundering, identity confusion, runaway loops, silent disagreement—what they look like, why they happen, and how to test for each before they escape the lab.
A six-step test plan you can steal
Small enough to fit on a single page. Tough enough to survive a model update on a Tuesday morning.
Clearer sight of the ethics panel you never convened
Groundedness vs honest uncertainty. Helpful retrieval vs oversharing. Autonomy vs ask-first. Each one hides behind a deceptively friendly checkbox.
________________________________________
Who Should Attend 🧠
Solution architects, makers, admins, governance leads, and the QA-curious.
Anyone who is now responsible for an AI system they didn’t build, don’t fully control, and can’t quite see inside.
Bring scepticism.
Leave with a checklist.
And maybe label your switches more carefully.
Claire Edgson
Capgemini - Microsoft CX CTO Europe
Kidderminster, United Kingdom