Mind Games: Hacking and Hardening LLMs

Ever wondered if you could trick an AI into doing your bidding? This session explores the "mind games" played by attackers against AI models. We'll reveal the techniques behind modern adversarial attacks, starting with prompt injection, where malicious instructions hijack a model's behavior, and prompt poisoning, a more subtle attack that corrupts a model's training data.

The talk then pivots to the critical defenses and introduces a comprehensive security strategy. You'll learn how to implement red, blue, and purple teaming principles to build a proactive and resilient security posture for your AI applications. We'll explain how red teams actively find vulnerabilities, how blue teams build robust defenses, and how the purple team collaboration creates a continuous feedback loop for improvement.

Finally, we'll dive into practical, hands-on examples using the PyRIT (Python Risk Identification Tool) framework, an open-source tool for automated red teaming. You'll leave with actionable insights and the knowledge to start testing and securing your own AI applications against the most common exploits. This talk is for developers, security professionals, and anyone who wants to stay ahead of the curve in the rapidly evolving world of AI security.

Prachi Kedar

AI/ML Engineer | Computer Vision & Generative AI Enthusiast

Milan, Italy

Actions

View Speaker Profile

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Session

Mind Games: Hacking and Hardening LLMs

Prachi Kedar

Links

Actions