Breach on Autopilot: Closing the CTEM Validation Gap with Autonomous Adversary Emulation

General Information

Category: Artificial Intelligence (AI) in Security / Offensive Cyber Security Operations
Keywords: Autonomous Adversary Emulation, LLM Reasoning, MITRE ATT&CK, Air-Gapped Security, Security Control Validation, Purple Teaming

Moving Beyond the AI Buzzword: Building an Autonomous Offensive Pipeline
Modern threat actors execute coordinated, context-aware, multi-stage kill-chains, yet many automated offensive tools remain fragmented and rely on brittle, static scripts. This session introduces an in-house, autonomous offensive framework designed to bridge the gap between actual asset intelligence (Blue Hunter) and stateful, automated execution (Red Hunter) without relying on external cloud APIs.

We will demonstrate a unified "Blue-to-Red" validation workflow in which AI does not merely generate text but acts as a deterministic strategic operative. We introduce the Dual Autonomy Architecture, a tiered model that lets operators scale from strictly controlled deterministic validation to full adversary emulation. In "High Autonomy" mode, the framework uses an error-based state machine to perform Auto-Resolution: it captures execution failures (e.g., EDR blocks or missing dependencies) and feeds them back into local LLMs to dynamically mutate payloads and pivot attack chains in real time.
-----------------------------------------------------------------------------------------
Key Technical Highlights:

Context-Aware Strategic Pivot: A seamless programmatic bridge that transitions from the cognitive engine’s ASM (Attack Surface Management) data and CVE prioritization directly into tactical execution, automatically classifying and routing techniques into Agent or Agentless attack paths based on target reachability.

Target-Dependent Payload Pre-rendering: Eliminating runtime mapping errors by having the AI intelligence engine dynamically pre-render "ready-to-fire" payloads with target-specific variables (IPs, open ports, credentials) completely hardcoded prior to execution.

Air-Gapped Resilience (Zero Data Leak): A fully offline architecture utilizing quantized local LLMs and serialized MITRE ATT&CK/Atomic Red Team vector caches, designed for highly regulated environments that strictly prohibit external data transmission.

Tactical Analytics & Detection Engineering: Transforming raw execution logs and self-healing mutation attempts into a localized, empirical MITRE ATT&CK Coverage Matrix, explicitly pinpointing true detection gaps for Purple Teams.

Join us to explore the engineering depths of how AI moves beyond theoretical concepts into a reproducible, self-healing offensive pipeline that empowers defenders to empirically validate their security posture and focus on threat-informed defense.
-----------------------------------------------------------------------------------------
From Asset Cognition to Real-Time Self-Healing: Designing and Implementing a Fully Offline Autonomous Offensive Framework to Close the CTEM Validation Gap

As Continuous Threat Exposure Management (CTEM) establishes itself as the standard for modern enterprise security, asset discovery technologies that identify attack surfaces have become highly automated. However, the "Validation" phase—proving whether an identified vulnerability is actually exploitable in a target environment—remains a severe bottleneck, heavily reliant on the manual intervention and resources of defense teams.

To automate this validation process, most Breach and Attack Simulation (BAS) tools in wide use today remain stuck in a binary pass/fail approach, firing pre-compiled, static atomic scripts at targets without regard to context. These scripts break on minor environmental differences, such as a slightly different patch level in the target OS, a missing dependency library, or the intervention of a modern Endpoint Detection and Response (EDR) solution. The result is simulation failures caused not by superior security controls but by the brittleness of the execution scripts themselves. Some attempts address this lack of flexibility by introducing external cloud-based Large Language Model (LLM) APIs into the offensive pipeline, but doing so requires transmitting sensitive internal network topologies and asset vulnerability data to external servers. That makes such approaches unusable in financial and public infrastructures that strictly mandate Zero Trust and Zero Data Leak architectures.

In this session, we will unveil for the first time the core architecture and engineering mechanics of a fully offline autonomous offensive framework designed to overcome both the static limitations of existing BAS systems and the data leak risks of cloud AI. The framework programmatically integrates two engines: a cognitive engine (Blue Hunter), which formulates strategies from actual asset data exposed on the internet rather than from a virtual sandbox, and an execution engine (Red Hunter), which controls the execution flow by autonomously analyzing errors. To support reasoning and code generation in environments completely disconnected from external networks, the system runs code-generation-specific models and custom-finetuned cybersecurity models locally on its own Ollama server. This satisfies the Zero Data Leak requirement: no customer data leaves the environment.

The first core component of the framework is its multi-source Attack Surface Management (ASM) pipeline with rule-based asset normalization. Moving beyond simple port scanning, the cognitive engine resolves DNS records starting from the root domain and correlates internet-exposed ports with Common Vulnerabilities and Exposures (CVE) data. In parallel, it collects large volumes of unstructured OSINT data, including subdomains extracted from Certificate Transparency logs and historical URL patterns. The collected heterogeneous data then passes through a rule evaluation engine that builds the target's risk profile. Rather than relying on simple version matching, this engine applies regular expressions across multiple layers: banner information, HTTP response headers, TLS subject strings, and CPE identifiers.
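The multi-layer matching described above can be sketched roughly as follows. This is a minimal illustration, not the framework's actual rule engine: the `Rule` structure, the rule names, the regular expressions, and the weights are all hypothetical placeholders, and the sample asset uses the reserved example.com domain.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical rule label
    layer: str           # evidence layer the rule inspects
    pattern: re.Pattern  # compiled regular expression
    weight: int          # illustrative risk weight


# Hypothetical rules; patterns and weights are placeholders, not from the talk.
RULES = [
    Rule("openssh-banner", "banner",
         re.compile(r"^SSH-2\.0-OpenSSH_(\d+\.\d+)"), 2),
    Rule("nginx-header", "http_headers",
         re.compile(r"(?im)^server:\s*nginx/(\d+\.\d+\.\d+)"), 1),
    Rule("wildcard-cert", "tls_subject",
         re.compile(r"CN=\*\."), 1),
    Rule("apache-cpe", "cpe",
         re.compile(r"^cpe:2\.3:a:apache:http_server:"), 3),
]


def evaluate(asset: dict) -> dict:
    """Run every rule against its evidence layer and sum the weights."""
    matched, score = [], 0
    for rule in RULES:
        text = asset.get(rule.layer, "")
        if rule.pattern.search(text):
            matched.append(rule.name)
            score += rule.weight
    return {"host": asset.get("host"), "matched": matched, "score": score}


asset = {
    "host": "app.example.com",
    "banner": "SSH-2.0-OpenSSH_8.9p1 Ubuntu-3",
    "http_headers": "HTTP/1.1 200 OK\nServer: nginx/1.18.0\nContent-Type: text/html",
    "tls_subject": "CN=*.example.com, O=Example",
    "cpe": "",
}
profile = evaluate(asset)
print(profile)
```

Keeping each rule tied to a single evidence layer means that a missing layer (here, the empty CPE string) simply contributes nothing to the score instead of raising an error.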

This real-world asset context then passes through a payload classification algorithm. Existing tools either require agents to be installed on target systems or stop at external network scanning. To overcome both limitations, the framework applies a hybrid heuristic that combines a Tactic and Technique ID (TID) dictionary with behavioral keywords found in the AI-generated attack scenarios. With this heuristic, the framework autonomously determines whether a technique requires an Agent method that executes inside the target or an Agentless method that can strike from the outside, and routes it accordingly. Furthermore, to eliminate variable mapping errors at runtime, the cognitive engine does not simply pass the TID to the execution engine. For each attack technique object, it generates a "ready-to-fire," target-dependent payload in which the asset context (the actual IP, the identified FQDN, and the open ports) is hardcoded, and encapsulates it in a JSON schema.
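A rough sketch of the routing heuristic and the pre-rendering step might look like this. Everything here is an assumption made for illustration: the contents of `AGENT_TIDS` and `AGENT_KEYWORDS`, the JSON field names, and the harmless `curl` probe used as a template. The target values come from reserved documentation ranges.

```python
import json

# Hypothetical dictionary: technique IDs assumed to need in-host execution.
AGENT_TIDS = {"T1059", "T1003", "T1547"}
# Hypothetical behavioral keywords that hint at in-host activity.
AGENT_KEYWORDS = ("registry", "local process", "credential dump", "scheduled task")


def classify(tid: str, scenario_text: str) -> str:
    """Route a technique to 'agent' or 'agentless' using both signals."""
    base_tid = tid.split(".")[0]  # drop any sub-technique suffix
    text = scenario_text.lower()
    if base_tid in AGENT_TIDS or any(k in text for k in AGENT_KEYWORDS):
        return "agent"
    return "agentless"


def prerender(tid: str, scenario_text: str, template: str, asset: dict) -> str:
    """Fill the template with asset context and wrap it as a JSON document."""
    command = template.format(**asset)  # no placeholders remain after this
    return json.dumps(
        {
            "tid": tid,
            "mode": classify(tid, scenario_text),
            "target": asset,
            "command": command,
        },
        sort_keys=True,
    )


asset = {"ip": "192.0.2.10", "fqdn": "app.example.com", "port": 8443}
rendered = prerender(
    "T1190",
    "Probe the exposed login page from outside the perimeter",
    "curl -sk https://{fqdn}:{port}/login",  # benign illustrative template
    asset,
)
print(rendered)
```

Because the substitution happens before hand-off, a template that references a variable missing from the asset context fails with a `KeyError` at render time rather than at execution time on the target.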

The formulated scenario is handed over to the execution engine via a signature-based one-way Single Sign-On (SSO) architecture that bridges the two engines. Protected by timestamps and Hash-based Message Authentication Codes (HMAC), this bridge allows security analysts to instantly transition from the asset analysis dashboard to the red teaming operational environment without additional authentication or context switching.
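One way such a timestamped, HMAC-signed hand-off could work is sketched below using Python's standard `hmac` module. The token layout (`user|scenario_id|timestamp|signature`), the shared secret, and the 60-second validity window are assumptions for illustration; the talk does not specify the actual format.

```python
import hashlib
import hmac
import time

SHARED_SECRET = b"replace-with-a-provisioned-secret"  # hypothetical shared key
MAX_AGE_SECONDS = 60                                   # assumed validity window


def sign_handoff(user: str, scenario_id: str, now=None) -> str:
    """Issue a one-way token: fields plus an HMAC-SHA256 signature."""
    ts = int(now if now is not None else time.time())
    message = f"{user}|{scenario_id}|{ts}"
    sig = hmac.new(SHARED_SECRET, message.encode(), hashlib.sha256).hexdigest()
    return f"{message}|{sig}"


def verify_handoff(token: str, now=None) -> bool:
    """Accept the token only if the signature matches and it is still fresh."""
    try:
        user, scenario_id, ts, sig = token.split("|")
        issued_at = int(ts)
    except ValueError:
        return False  # wrong field count or non-numeric timestamp
    message = f"{user}|{scenario_id}|{ts}"
    expected = hmac.new(SHARED_SECRET, message.encode(), hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    fresh = 0 <= current - issued_at <= MAX_AGE_SECONDS
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(sig.encode(), expected.encode()) and fresh


token = sign_handoff("analyst", "scn-1", now=1000)
print(verify_handoff(token, now=1030))  # within the window
print(verify_handoff(token, now=2000))  # expired
```

The timestamp bounds how long a captured token can be replayed, while the signature binds the user and scenario fields so neither can be altered in transit. This sketch assumes field values never contain the `|` separator.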

Upon receiving the scenario, the execution engine automatically triggers a campaign based on the list of CVEs actually identified on the target. A mapping dictionary built on NVD API data translates each CVE into its Common Weakness Enumeration (CWE), which is in turn mapped to a specific sequence of MITRE ATT&CK TIDs corresponding to OS Command Injection, SQL Injection, Authentication Bypass, and more. The execution engine's most significant engineering achievement lies in two mechanisms: Dual Autonomy, which operators select according to the operational environment, and a state machine driven by standard error (stderr) parsing.
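The two-stage lookup can be illustrated with a pair of dictionaries. The CVE identifiers below are deliberately fake placeholders, and the CWE-to-technique sequences are illustrative assumptions rather than the framework's real mapping; a real deployment would populate the first table from NVD data.

```python
# Placeholder CVE IDs (not real entries); assumed to be filled from NVD data.
CVE_TO_CWE = {
    "CVE-0000-0001": "CWE-78",   # OS Command Injection
    "CVE-0000-0002": "CWE-89",   # SQL Injection
    "CVE-0000-0003": "CWE-287",  # Improper Authentication
}

# Illustrative CWE -> ATT&CK technique sequences (assumptions, not from the talk).
CWE_TO_TIDS = {
    "CWE-78": ["T1190", "T1059"],
    "CWE-89": ["T1190"],
    "CWE-287": ["T1190", "T1078"],
}


def build_campaign(cves):
    """Translate CVEs into ordered technique steps; collect unmapped CVEs."""
    steps, unmapped = [], []
    for cve in cves:
        cwe = CVE_TO_CWE.get(cve)
        tids = CWE_TO_TIDS.get(cwe, [])
        if not tids:
            unmapped.append(cve)  # surfaced for manual triage
            continue
        for order, tid in enumerate(tids, start=1):
            steps.append({"cve": cve, "cwe": cwe, "tid": tid, "order": order})
    return steps, unmapped


steps, unmapped = build_campaign(["CVE-0000-0001", "CVE-0000-0009"])
for step in steps:
    print(step)
print("unmapped:", unmapped)
```

Returning the unmapped CVEs separately keeps gaps in the dictionary visible instead of silently dropping findings from the campaign.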

In environments requiring strict predictability, Low Autonomy mode is engaged. Upon execution failure, it suppresses AI generation and falls back to the static atomic payloads pre-loaded in the framework. Conversely, High Autonomy mode, which emulates the flexibility of a real attacker, runs a true autonomous execution loop. If a security solution blocks execution after payload delivery, the execution engine does not simply log a failure and halt. It parses the returned error log in real time and feeds it back to the local AI engine, which mutates the payload for the next attempt.
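The difference between the two modes can be sketched as a small retry loop around a stderr classifier. This is a simplified model under stated assumptions: the state names, the stderr signatures, and the `run_step` interface are hypothetical, and the `execute` and `mutate` callables are stand-ins for the real executor and the local LLM. The demo uses inert placeholder strings instead of real payloads.

```python
import re
from enum import Enum


class State(Enum):
    SUCCESS = "success"
    BLOCKED = "blocked"
    MISSING_DEP = "missing_dependency"
    UNKNOWN_ERROR = "unknown_error"


# Hypothetical stderr signatures; real ones would be tuned per environment.
STDERR_RULES = [
    (re.compile(r"access is denied|blocked by", re.I), State.BLOCKED),
    (re.compile(r"command not found|no module named", re.I), State.MISSING_DEP),
]


def classify_stderr(returncode: int, stderr: str) -> State:
    if returncode == 0:
        return State.SUCCESS
    for pattern, state in STDERR_RULES:
        if pattern.search(stderr):
            return state
    return State.UNKNOWN_ERROR


def run_step(payload, execute, mode, static_fallbacks, mutate, max_attempts=3):
    """Retry a step; 'low' walks static fallbacks, 'high' asks mutate()."""
    history = []
    fallbacks = iter(static_fallbacks)
    for _ in range(max_attempts):
        returncode, stderr = execute(payload)
        state = classify_stderr(returncode, stderr)
        history.append((payload, state))
        if state is State.SUCCESS:
            break
        if mode == "low":
            payload = next(fallbacks, None)           # no AI generation
        else:
            payload = mutate(payload, state, stderr)  # stand-in for local LLM
        if payload is None:
            break
    return history


def fake_execute(payload):
    """Simulated target: inert strings only, nothing is actually run."""
    if payload == "v2":
        return 0, ""
    if payload == "v1":
        return 1, "Operation blocked by endpoint policy"
    return 127, "sh: tool: command not found"


high = run_step("v1", fake_execute, "high", [], lambda p, s, e: "v2")
low = run_step("v1", fake_execute, "low", ["static-a"], lambda p, s, e: None)
print(high)
print(low)
```

Bounding the loop with `max_attempts` keeps High Autonomy mode from retrying indefinitely, and the recorded history of payload and state pairs is the kind of raw material a coverage matrix could later be built from.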

Finally, to prevent the destructive outages on production networks that full autonomy might cause, a Risk Assessor module is implemented within the execution layer. Every payload dynamically mutated by the local AI must pass through this static heuristic engine immediately before being transmitted to the target. It assigns risk weights by scanning for keywords that indicate destructive actions, such as disk formatting, deletion of critical system files, or complete deactivation of network firewalls. If the calculated risk score exceeds a configured threshold, the framework immediately suspends execution at the API level and activates a Human-in-the-Loop control gate that requires approval from a human operator.
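A keyword-weighted gate of this kind could be sketched as follows. The patterns, weights, threshold value, and function names are assumptions chosen for illustration, covering the three categories named above; a production denylist would need to be far broader.

```python
import re

# Hypothetical rules: (pattern, weight, reason). Weights are placeholders.
RISK_RULES = [
    (re.compile(r"\bmkfs\b|\bformat\s+[a-z]:", re.I), 10, "disk formatting"),
    (re.compile(r"\brm\s+-rf\s+/(etc|boot|bin)\b|\bdel\s+.*system32", re.I),
     10, "critical system file deletion"),
    (re.compile(r"netsh\s+advfirewall\s+set\s+\S+\s+state\s+off"
                r"|\bufw\s+disable\b|iptables\s+-F\b", re.I),
     6, "firewall deactivation"),
]

RISK_THRESHOLD = 5  # assumed configurable threshold


def assess(payload: str):
    """Return the summed risk score and the reasons that contributed to it."""
    score, reasons = 0, []
    for pattern, weight, reason in RISK_RULES:
        if pattern.search(payload):
            score += weight
            reasons.append(reason)
    return score, reasons


def gate(payload: str, threshold: int = RISK_THRESHOLD) -> str:
    """Decide whether the payload may run or must wait for human approval."""
    score, _ = assess(payload)
    return "hold_for_approval" if score > threshold else "execute"


for candidate in ("whoami", "mkfs.ext4 /dev/sdb1", "ufw disable"):
    print(candidate, "->", gate(candidate), assess(candidate))
```

Because the check is static and runs immediately before transmission, it applies equally to pre-loaded payloads and to ones the model produced moments earlier. A string-matching gate like this one can be evaded by obfuscation, which is one reason the human approval step matters.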

Through this session, attendees will discover the technical depths of how fragmented ASM data translates into deterministic execution payloads, and how disconnected attack scripts evolve into a living, organic adversary emulator when merged with a state machine and local artificial intelligence.

Yeo JooHo

Lead Researcher | PIOLINK Cybersecurity Research Team

Seoul, South Korea
