Join us

Evaluating AI Agents in Security Operations

Evaluating AI Agents in Security Operations

Cotool threw frontier LLMs at real-world SecOps tasks using Splunk’s BOTSv3 dataset. GPT-5 topped the chart in accuracy (62.7%) and gave the best results per dollar. Claude Haiku-4.5 blazed through tasks fastest, just 240 seconds on average, maxing out tool integrations. Gemini-2.5-pro flopped on both accuracy and reliability, with repeat failures.


Let's keep in touch!

Stay updated with my latest posts and news. I share insights, updates, and exclusive content.

Unsubscribe anytime. By subscribing, you share your email with @kala and accept our Terms & Privacy.

Give a Pawfive to this post!


Only registered users can post comments. Please, login or signup.

Start writing about what excites you in tech — connect with developers, grow your voice, and get rewarded.

Join other developers and claim your FAUN.dev() account now!

Avatar

Kala #GenAI

FAUN.dev()

@kala
Generative AI Weekly Newsletter, Kala. Curated GenAI news, tutorials, tools and more!
Developer Influence
1

Influence

1

Total Hits

85

Posts