Session
SAY NO TO VULN: Scaffolding, Guardrails, and Keeping Vulnerabilities Out of Agentic Code
Two things broke at the same time. AI now writes most of the code your team ships, and most AppSec programs are still optimised for the world where humans wrote it. Median time to exploit is at zero days. Median time to triage is still measured in weeks. The gap is not closing. It widens every quarter as the agent ships more code.
Most of the AI-AppSec talk circuit reaches for the same answer: throw a frontier model at the PR review and hope. That works until the bill arrives. $25 per Opus review across a real engineering org bankrupts the budget before it catches anything you would not have caught with a decent SAST rule. The bigger problem is more boring: nobody knows who owns the asset, where the agent is allowed to run, or which findings actually map to production. You cannot AI your way out of a broken ownership model.
This talk is the field guide for the other half of the problem. The half where AppSec works because the basics work. I will walk through what "doing the basics right" actually looks like in 2026: where the agent should live and where it should not, how to attribute work to a team that can act on it, what a Skill (.mdc, AGENTS.md, .cursorrules) needs to contain to stop a vulnerability before the agent writes it, and what an Allow / Block / Verify gate on the supply chain looks like in production. Then I will show where AI earns its place in the stack: finding the vulnerabilities your scanners miss because they cannot read the graph, generating remediation as a single PR or as a 200-repo campaign, and mapping the threats that actually apply to your stack rather than the ones the CVE feed thinks should worry you.
The thesis fits on one slide. Vibe coding will introduce vulnerabilities. Vibe coding with scaffolding can prevent more vulnerabilities than the agent would have produced. The agent is not the enemy. The agent without a fence is the enemy.
You leave with a buildable scaffolding pattern, four reviewer checks the agent can run before the PR opens, the attribution model that turns 112,000 raw findings into 300 things a team can actually fix, and the 90-day rollout plan with the metrics that show the program working before the dashboard goes green.
Key discussion points
1. Where the agent lives, and where it should not.
The coding agent shares a process with your shell, your credentials helper, your editor, your package manager, and your environment variables. Every subsystem the agent touches is now in scope for the agent's threat model. I will walk the audience through the three places teams put the agent today (IDE plugin, terminal CLI, hosted "agent platform"), what each one inherits in terms of trust, and the architectural checklist for deciding what a coding agent gets to talk to. The recent CVE clusters (IDEsaster's 24 CVEs across 10 AI coding IDEs, the Claude Code TERMINAL injection chain, the Cursor / Cline / Copilot prompt-injection class) all share one root cause: the agent runs with developer privilege but without the developer's safety controls.
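One item on such a checklist, limiting which environment variables the agent inherits, can be sketched as follows. This is a minimal illustration under assumptions: the `ALLOWED_ENV` allowlist and the `scrubbed_env` helper are hypothetical names, and a real deployment would also need to cover the shell, the credentials helper, and the package manager.

```python
# Hypothetical allowlist: only these variables reach the agent process.
ALLOWED_ENV = {"PATH", "HOME", "LANG", "TERM"}


def scrubbed_env(parent_env):
    """Return a copy of the environment containing only allowlisted keys."""
    return {key: value for key, value in parent_env.items() if key in ALLOWED_ENV}


# Example parent environment with one secret the agent should never see.
parent = {
    "PATH": "/usr/bin",
    "HOME": "/home/dev",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
}
agent_env = scrubbed_env(parent)
print(sorted(agent_env))

# The result could then be passed to the agent launcher, for example
# subprocess.run([...], env=agent_env), where the command is whatever
# starts the agent in your setup.
```

The allowlist is deliberately default-deny: a variable the checklist has not reviewed never reaches the agent.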
2. Attributing work to the right team.
The #1 reason AppSec programs stall is not the scanner. It is that nobody owns the finding. I will show the four-problem model: attribution (which team), lineage (which build), traceability (which deploy), code-and-cloud reachability (does the path actually exist in production). The honest version of this is unglamorous. Most of it is plumbing. Most of it is fixing the CMDB, the SBOM, and the deploy manifest before AI gets near the pipeline. I will show a real example where a 112,000-finding queue compressed to 300 actionable items because attribution was fixed first, not because a model rewrote the queue.
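The four-problem model can be read as a filter over the findings queue. The sketch below is an illustration only: the `Finding` fields, the `is_actionable` rule, and the sample data are assumptions made for this example, not the model shown in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    finding_id: str
    owner_team: Optional[str]   # attribution: which team
    build_id: Optional[str]     # lineage: which build
    deploy_id: Optional[str]    # traceability: which deploy
    reachable_in_prod: bool     # code-and-cloud reachability


def is_actionable(finding):
    """A finding is actionable only when all four questions have an answer."""
    return bool(
        finding.owner_team
        and finding.build_id
        and finding.deploy_id
        and finding.reachable_in_prod
    )


def actionable_queue(findings):
    """Group actionable finding IDs by the team that owns them."""
    queue = {}
    for finding in findings:
        if is_actionable(finding):
            queue.setdefault(finding.owner_team, []).append(finding.finding_id)
    return queue


# Hypothetical sample data.
findings = [
    Finding("F-1", "payments", "build-101", "deploy-7", True),
    Finding("F-2", None, "build-101", "deploy-7", True),       # no owner
    Finding("F-3", "payments", "build-099", None, True),        # never deployed
    Finding("F-4", "search", "build-200", "deploy-9", False),   # path not reachable
]
print(actionable_queue(findings))
```

Only `F-1` survives in this sample, which is the point: the queue shrinks because the metadata is complete, not because a model re-ranked it.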
3. Doing the basics well, before AI.
The boring controls that work. Secure defaults in the framework. Auth dependency on every route, not most. Validated input at the boundary, not in the controller. Secrets out of localStorage. SBOM at build, not at deploy. Pinned dependencies with a freshness gate (no package under N hours old, no maintainer change in the last 7 days). These are not new. They are the controls that stop the AI-class attacks too. I will pair each control to a specific incident from the last twelve months where its absence was the root cause.
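The freshness gate on pinned dependencies can be sketched as a small decision function. This is a minimal illustration under assumptions: `PackageRelease`, `gate`, and the 72-hour value chosen for N are hypothetical, and returning a "verify" verdict when metadata is missing is one possible reading of an Allow / Block / Verify gate rather than the talk's definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_AGE = timedelta(hours=72)                # assumed value for "N hours"
MAINTAINER_QUIET_PERIOD = timedelta(days=7)  # from the 7-day rule


@dataclass(frozen=True)
class PackageRelease:
    name: str
    version: str
    published_at: Optional[datetime]
    last_maintainer_change: Optional[datetime]  # None means no change on record


def gate(release, now):
    """Return a (verdict, reason) pair for one pinned release."""
    if release.published_at is None:
        # Assumption: missing metadata goes to a human instead of passing.
        return "verify", "no publish timestamp available"
    if now - release.published_at < MIN_AGE:
        return "block", "release is younger than the minimum age"
    if (
        release.last_maintainer_change is not None
        and now - release.last_maintainer_change < MAINTAINER_QUIET_PERIOD
    ):
        return "block", "maintainer changed within the quiet period"
    return "allow", "passed freshness checks"


# Hypothetical releases evaluated against a fixed clock.
now = datetime(2026, 1, 15, tzinfo=timezone.utc)
too_fresh = PackageRelease("example-lib", "2.0.0", now - timedelta(hours=5), None)
settled = PackageRelease("example-lib", "1.9.0", now - timedelta(days=30), None)
new_owner = PackageRelease(
    "example-lib", "1.9.1", now - timedelta(days=10), now - timedelta(days=2)
)
for release in (too_fresh, settled, new_owner):
    print(release.version, gate(release, now))
```

Passing `now` in explicitly keeps the gate deterministic, which makes it easy to test in CI.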
4. Where the Skill / scaffolding layer earns its keep.
A Skill is a versioned, scoped, auto-loaded rules file the coding agent reads before it writes a line of code. 00-meta.mdc defines how rules are written. 10-repo-structure.mdc is the authoritative folder map so the agent does not hallucinate file paths. 11-tech-stack.mdc is parsed from package.json so the agent does not invent libraries you do not use. 21-security.mdc is the security checklist the reviewer agent runs before the PR opens: auth dependency, innerHTML sanitisation, outbound fetch / SSRF, localStorage writes, SQL parameterisation, security headers. I will show a real scaffold from an open-source repo, the diff it forces, and what happens to a "vibe-coded" PR when the scaffold is in place. This is the central buildable artefact of the talk.
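A few of the checklist items can be approximated as pattern checks over the lines a diff adds. The sketch below is a rough illustration, not the reviewer agent from the talk: the `CHECKS` patterns, the `review_added_lines` helper, and the sample diff are assumptions, and regex matching of this kind will miss cases and raise false positives that a real reviewer would need to handle.

```python
import re

# Hypothetical patterns for three of the checklist items.
CHECKS = {
    "innerHTML assignment (confirm sanitisation)": re.compile(r"\.innerHTML\s*="),
    "localStorage write": re.compile(r"localStorage\.setItem\s*\("),
    "SQL built by string interpolation": re.compile(
        r"\b(select|insert|update|delete)\b.*(\$\{|\+\s*\w)", re.IGNORECASE
    ),
}


def review_added_lines(diff_text):
    """Return (line_number, label) for each added diff line that trips a check."""
    findings = []
    for line_no, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for label, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append((line_no, label))
    return findings


# Hypothetical diff fragment.
sample_diff = "\n".join(
    [
        "+++ b/src/profile.ts",
        "+el.innerHTML = userBio;",
        '+localStorage.setItem("token", jwt);',
        "+const q = `SELECT * FROM users WHERE id = ${id}`;",
        '+const safe = db.query("SELECT * FROM users WHERE id = $1", [id]);',
        "-el.textContent = userBio;",
    ]
)
for line_no, label in review_added_lines(sample_diff):
    print(line_no, label)
```

The parameterised query on line 5 of the sample passes, while the interpolated one on line 4 is flagged.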
5. Where AI earns its place in the stack.
Three jobs, in order of how badly people get them wrong.
Identify vulnerabilities the scanner missed. A frontier model against a code graph compresses 100 static findings to 8 hypotheses, three of which become real bugs. The trick is the graph, not the model. I will show the maths and the failure modes (false positives without triple-pass validation, token costs without graph compression).
Help with remediation, single and bulk. Single remediation as a contextual PR with the patch, the test, and the threat context. Bulk remediation as a Log4j-style campaign across 200 services with one approval gate, ownership-attributed, with a rollback plan. The honest limit: bulk only works when attribution is already clean.
Map threats that actually damage you. CWE to CAPEC to MITRE ATT&CK to known threat actor, anchored to your stack. Not the OWASP Top 10 in the abstract. The five techniques that are exploited against your specific framework, your specific cloud, your specific industry, this quarter.
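The arithmetic behind the first job can be worked through with the figures quoted in this abstract. The funnel numbers (100, 8, 3) and the $25 review cost come from the text above; the monthly PR volume is a hypothetical value chosen only to show the scale of brute-force review.

```python
# Figures quoted in the abstract.
STATIC_FINDINGS = 100
HYPOTHESES = 8
CONFIRMED_BUGS = 3
COST_PER_FRONTIER_REVIEW_USD = 25.0

# Assumption for illustration only: PR volume of a mid-sized engineering org.
PRS_PER_MONTH = 2000

# How much the graph narrows the queue before a model reasons over it.
compression = STATIC_FINDINGS / HYPOTHESES

# Share of hypotheses that turn out to be real bugs.
confirmation_rate = CONFIRMED_BUGS / HYPOTHESES

# Cost of sending every PR through a frontier-model review.
monthly_cost = COST_PER_FRONTIER_REVIEW_USD * PRS_PER_MONTH

print(f"compression: {compression:.1f}x")               # compression: 12.5x
print(f"confirmation rate: {confirmation_rate:.1%}")    # confirmation rate: 37.5%
print(f"brute-force review cost: ${monthly_cost:,.0f} per month")
```

Under these assumptions the model looks at 8 items instead of 100, and roughly three in eight of those items are real.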
6. Bonus thesis: vibe coding versus vibe secure coding.
Same agent. Same prompt. Different outcome. With scaffolding, the agent has to consult the rules file before writing the route. Without scaffolding, the agent writes the route the way the training data biases it to. I will run the side-by-side on stage. Same task, no scaffold versus scaffold. The scaffolded version produces fewer vulnerabilities than the median human developer in the team it replaced. That is the line that should make people uncomfortable and the line that should make the OWASP community take this layer seriously.
Takeaways for attendees
A mental model for where the coding agent belongs in your architecture and what it is allowed to touch. With the checklist. With the recent CVEs that show what happens when the checklist is wrong.
A working attribution model. Four problems, one diagram, four sources of truth. The thing that turns the backlog from a triage burden into a queue an engineer can act on without a meeting.
A Skill / scaffolding template you can fork on Monday. Real .mdc examples, real reviewer-agent checks, real before-and-after diffs from a public repo.
The economic argument for graph-based AI in AppSec. Why brute-force LLM review bankrupts the program and what makes context-aware review affordable. The maths, not the marketing.
Three places to put AI in your AppSec stack and three places to keep it out. With the failure modes for each.
The 90-day rollout plan with four metrics a non-security stakeholder can read. Prevention rate above 60% on AI-generated PRs. Block rate of 5 to 15% (zero means the gate is off). Malicious packages blocked per month, trending down at steady state. Burn rate positive at day 90 (findings closed per week greater than findings created per week).
A diagnostic for which control to install first based on the bottleneck you actually have. PR velocity, backlog size, agent adoption, or supply chain. Nobody leaves with a generic checklist.
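The four rollout metrics listed in the takeaways can be turned into a simple pass/fail report. This is a sketch under assumptions: the thresholds come from the abstract, but the metric definitions (prevented over attempted, blocks over gate decisions, a non-increasing monthly series) and the `rollout_report` helper are hypothetical.

```python
def rollout_report(
    vulns_prevented,
    vulns_attempted,
    gate_blocks,
    gate_decisions,
    closed_per_week,
    created_per_week,
    malicious_blocked_by_month,
):
    """Evaluate the four rollout metrics and return one boolean per metric."""
    prevention_rate = vulns_prevented / vulns_attempted
    block_rate = gate_blocks / gate_decisions
    # Assumed reading of "trending down": each month is no higher than the last.
    trending_down = all(
        later <= earlier
        for earlier, later in zip(
            malicious_blocked_by_month, malicious_blocked_by_month[1:]
        )
    )
    return {
        "prevention_rate_above_60pct": prevention_rate > 0.60,
        "block_rate_between_5_and_15pct": 0.05 <= block_rate <= 0.15,
        "malicious_packages_trending_down": trending_down,
        "burn_rate_positive": closed_per_week > created_per_week,
    }


# Hypothetical day-90 numbers.
report = rollout_report(
    vulns_prevented=66,
    vulns_attempted=100,
    gate_blocks=8,
    gate_decisions=100,
    closed_per_week=40,
    created_per_week=35,
    malicious_blocked_by_month=[9, 6, 4],
)
for metric, passed in report.items():
    print(metric, "ok" if passed else "needs attention")
```

A block rate of zero fails the second check, matching the note that zero means the gate is off.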