
Benchmarking LLMs on Vulnerability Prioritization

We present the first large-scale benchmark of leading LLMs (GPT-4o mini, Claude 3.7, Gemini 2.5) against the Exploit Prediction Scoring System (EPSS) on the vulnerability prioritization task, using 50,000 CVEs stratified by real-world exploitation. Our results show that LLMs produce lumpy, poorly calibrated probability estimates, fail to maintain efficiency and coverage beyond 15%, and incur prohibitive inference costs at operational scale. In contrast, predictive models such as EPSS and our Global Model deliver higher accuracy, better coverage, and practical cost profiles. We release our full dataset, agent (JayPT), and methodology under an MIT license to enable reproducibility and further research on scalable, evidence-driven vulnerability triage.
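
Coverage and efficiency here are the standard remediation metrics from the EPSS literature: coverage is the share of actually exploited CVEs that a strategy prioritizes (recall), and efficiency is the share of prioritized CVEs that turn out to be exploited (precision). The sketch below is a minimal illustration of how these metrics can be computed for a scored CVE list; the function name, inputs, and toy data are our own assumptions for exposition, not the benchmark's released code.

```python
# Minimal sketch: coverage/efficiency for a scored set of CVEs.
# `scores` and `exploited` are hypothetical inputs: per-CVE probability
# estimates from some model, and ground-truth exploitation labels.

def coverage_efficiency(scores, exploited, threshold):
    """Treat every CVE scored at or above `threshold` as prioritized.

    coverage   = share of truly exploited CVEs that were prioritized (recall)
    efficiency = share of prioritized CVEs that were truly exploited (precision)
    """
    prioritized = [s >= threshold for s in scores]
    tp = sum(p and e for p, e in zip(prioritized, exploited))
    fn = sum((not p) and e for p, e in zip(prioritized, exploited))
    fp = sum(p and (not e) for p, e in zip(prioritized, exploited))
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    efficiency = tp / (tp + fp) if (tp + fp) else 0.0
    return coverage, efficiency

if __name__ == "__main__":
    # Toy data only; sweeping the threshold traces the coverage/efficiency
    # trade-off curve of the kind used to compare prioritization strategies.
    scores = [0.91, 0.40, 0.05, 0.75, 0.02]
    exploited = [True, False, False, True, False]
    for t in (0.1, 0.5, 0.9):
        cov, eff = coverage_efficiency(scores, exploited, t)
        print(f"threshold={t:.1f}  coverage={cov:.2f}  efficiency={eff:.2f}")
```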


Michael Roytman

CTO at Empirical Security

Chicago, Illinois, United States
