Session
Beyond the Prompt: Using Bias Subspaces to Build Algorithmic Guardrails
As Generative AI moves into production-grade enterprise environments, traditional "keyword-based" guardrails are proving insufficient for catching nuanced, latent biases. While most developers focus on surface-level prompt engineering, the true vulnerabilities often lie deeper within the model's latent representations.

In this session, we will explore a more rigorous, research-backed approach to AI safety. Drawing on my research at UT Austin, I will demonstrate how analyzing GloVe embeddings through bias subspaces can reveal hidden correlations between abstract concepts and ingrained prejudices. We will discuss:

- Identifying Latent Bias: How models separate abstract vs. concrete words, and where human-rated concreteness scores diverge from model behavior.
- Building Mathematical Guardrails: Moving from "black-box" filtering to algorithmic detection of biased vector directions.
- Real-World Application: How to apply these research insights to harden autonomous agents and multi-model pipelines against ethical failures.

Attendees will walk away with a framework for building "Constitutional" guardrails that address bias at the representation level, ensuring more inclusive and reliable AI deployments.
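For readers who want a concrete starting point, below is a minimal sketch of one common way to derive a bias direction from static word embeddings: taking the top principal component of difference vectors for definitional word pairs, in the spirit of Bolukbasi et al. (2016). This is an illustration of the general technique, not the exact method from the research described above; the GloVe file path, the pair list, and the probe words are all assumptions for the example.

```python
# Sketch: deriving a one-dimensional bias subspace from GloVe vectors
# and scoring words by their projection onto it, as a guardrail signal.
# Assumes a plain-text GloVe file (one word plus float components per line).

import numpy as np


def load_glove(path):
    """Load GloVe vectors into a {word: np.ndarray} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def bias_direction(vectors, pairs):
    """Top principal component of centered difference vectors for
    definitional pairs, e.g. [('he', 'she'), ('man', 'woman')]."""
    diffs = np.stack(
        [vectors[a] - vectors[b] for a, b in pairs
         if a in vectors and b in vectors]
    )
    diffs -= diffs.mean(axis=0)  # center before PCA
    # First right-singular vector of the centered matrix is the
    # first principal component.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    return direction / np.linalg.norm(direction)


def bias_score(vectors, word, direction):
    """Cosine of a word vector with the bias direction; large magnitude
    flags words that load heavily on the biased axis."""
    v = vectors[word]
    return float(np.dot(v, direction) / np.linalg.norm(v))


if __name__ == "__main__":
    vecs = load_glove("glove.6B.300d.txt")  # file path is an assumption
    pairs = [("he", "she"), ("man", "woman"), ("his", "her")]
    d = bias_direction(vecs, pairs)
    for w in ("doctor", "nurse", "engineer", "homemaker"):
        print(w, round(bias_score(vecs, w, d), 3))
```

In a guardrail setting, one plausible use of this score is to flag words or generated spans whose projection magnitude exceeds a chosen threshold for downstream review, rather than relying solely on keyword filters.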
Shreya Singhal
AI Applied Scientist at Claritev
Austin, Texas, United States