Anthropic Discovers Hackers Can Jailbreak AI Models Like GPT-4 and Claude with Simple Typos
Researchers at Anthropic have identified a surprisingly simple vulnerability in leading AI models such as GPT-4 and Claude. Their 'Best-of-N' (BoN) algorithm repeatedly applies minor text manipulations, such as typos and random capitalization, to a prompt until one variant slips past the model's safeguards, succeeding more than 50% of the time. The finding poses a significant challenge for AI firms working to strengthen their defenses.
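To make the idea concrete, here is a minimal sketch of the Best-of-N approach: randomly perturb a prompt many times and keep the first variant that gets through. The specific perturbations (case flips, adjacent-character swaps) and the `is_jailbroken` predicate, which in the real attack would query the target model and check its response, are illustrative assumptions, not the exact procedure from Anthropic's paper.

```python
import random

def augment(prompt: str, rng: random.Random) -> str:
    """Apply one round of random character-level perturbations:
    flip the case of some letters and swap a few adjacent
    characters to simulate typos. Illustrative only."""
    chars = list(prompt)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < 0.3:
            chars[i] = c.swapcase()  # random capitalization
    # Swap a few adjacent character pairs (simple "typos").
    for _ in range(max(1, len(chars) // 20)):
        j = rng.randrange(len(chars) - 1)
        chars[j], chars[j + 1] = chars[j + 1], chars[j]
    return "".join(chars)

def best_of_n(prompt: str, is_jailbroken, n: int = 100, seed: int = 0):
    """Sample up to n augmented prompts; return the first one the
    is_jailbroken predicate accepts, else None. In the actual
    attack, the predicate would involve a call to the target model."""
    rng = random.Random(seed)
    for _ in range(n):
        candidate = augment(prompt, rng)
        if is_jailbroken(candidate):
            return candidate
    return None
```

The attack's power comes from volume, not sophistication: because each sample is cheap, an attacker can afford thousands of tries, and the success rate climbs with the sampling budget N.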
Dec 26