Session
Beyond the Prompt: Using Bias Subspaces to Build Algorithmic Guardrails
As Generative AI moves into production-grade enterprise environments, traditional "keyword-based" guardrails are proving insufficient for catching nuanced, latent biases. While most developers focus on surface-level prompt engineering, the true vulnerabilities often lie deeper within the model's latent representations.

In this session, we will explore a more rigorous, research-backed approach to AI safety. Drawing on my research at UT Austin, I will demonstrate how analyzing GloVe embeddings through bias subspaces can reveal hidden correlations between abstract concepts and ingrained prejudices. We will discuss:

- Identifying Latent Bias: How models separate abstract vs. concrete words, and where human-rated concreteness scores diverge from model behavior.
- Building Mathematical Guardrails: Moving from "black-box" filtering to algorithmic detection of biased vector directions.
- Real-World Application: How to apply these research insights to harden autonomous agents and multi-model pipelines against ethical failures.

Attendees will walk away with a framework for building "Constitutional" guardrails that address bias at the representation level, ensuring more inclusive and reliable AI deployments.
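For readers who want a concrete starting point, below is a minimal sketch of one common way to derive a bias direction from static word embeddings: taking the top principal component of difference vectors for definitional word pairs, in the spirit of Bolukbasi et al. (2016). This is an illustration of the general technique, not the exact method from the research described above; the GloVe file path, the pair list, and the probe words are all assumptions for the example.

```python
# Sketch: deriving a one-dimensional bias subspace from GloVe vectors
# and scoring words by their projection onto it, as a guardrail signal.
# Assumes a plain-text GloVe file (one word plus float components per line).

import numpy as np


def load_glove(path):
    """Load GloVe vectors into a {word: np.ndarray} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def bias_direction(vectors, pairs):
    """Top principal component of centered difference vectors for
    definitional pairs, e.g. [('he', 'she'), ('man', 'woman')]."""
    diffs = np.stack(
        [vectors[a] - vectors[b] for a, b in pairs
         if a in vectors and b in vectors]
    )
    diffs -= diffs.mean(axis=0)  # center before PCA
    # First right-singular vector of the centered matrix is the
    # first principal component.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    return direction / np.linalg.norm(direction)


def bias_score(vectors, word, direction):
    """Cosine of a word vector with the bias direction; large magnitude
    flags words that load heavily on the biased axis."""
    v = vectors[word]
    return float(np.dot(v, direction) / np.linalg.norm(v))


if __name__ == "__main__":
    vecs = load_glove("glove.6B.300d.txt")  # file path is an assumption
    pairs = [("he", "she"), ("man", "woman"), ("his", "her")]
    d = bias_direction(vecs, pairs)
    for w in ("doctor", "nurse", "engineer", "homemaker"):
        print(w, round(bias_score(vecs, w, d), 3))
```

In a guardrail setting, one plausible use of this score is to flag words or generated spans whose projection magnitude exceeds a chosen threshold for downstream review, rather than relying solely on keyword filters.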
Shreya Singhal
AI Applied Scientist at Claritev
Austin, Texas, United States