Session

Jailbreaking and Protecting LLM Apps: A Public Wargame Experiment

This presentation captures findings from a public AI security challenge designed to evaluate the resilience of Large Language Models (LLMs) against prompt injection attacks. The experiment involved an Attack & Defence wargame where participants were tasked with securing their LLMs, specifically preventing secret phrase disclosure. They were given access to the source code of the app that interfaced with OpenAI API. Simultaneously, participants were to attack other LLMs in an attempt to exfiltrate the secret phrase. A notable aspect of this experiment was the real-time evolution of defensive strategies and offensive tactics by participants. The results indicated that all LLMs were exploited at least once, thus highlighting the complexity behind LLM security and lack of in-depth understanding of prompt injection. This underscores how there is no silver bullet for securing against prompt injection and that it remains as an open problem.

Pedram Hayati

Founder SecDim.com, SecTalks.org. Senior Lecturer UNSW.edu

Sydney, Australia

Actions

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Jump to top