HackerOne
How Anthropic's Jailbreak Challenge Put AI Safety Defenses to the Test
HackerOne Customer Success Story

AI red teaming recruits external security researchers to stress-test models and surface jailbreaks, backdoors, and evasion techniques that static or automated checks often miss. In a Feb 3–10 challenge on a demo of Claude 3.5 Sonnet, 339 researchers logged more than 300,000 chat interactions. Four teams earned $55,000 in bounties after demonstrating universal, near-universal, or multiple jailbreaks, and their findings helped Anthropic validate and strengthen its Constitutional Classifiers for CBRN-related queries.
