HackerOne Customer Success Story

How Anthropic's Jailbreak Challenge Put AI Safety Defenses to the Test

AI red teaming recruits external security researchers to stress-test models and surface jailbreaks, backdoors and evasion techniques that static or automated checks often miss. In a Feb 3–10 challenge on a demo of Claude 3.5 Sonnet, 339 researchers logged over 300,000 chat interactions. Four teams earned $55,000 in bounties after demonstrating universal, near-universal or multiple jailbreaks, findings that helped Anthropic validate and strengthen its Constitutional Classifiers for CBRN-related queries.

