Anthropic Discovers Hackers Can Jailbreak AI Models Like GPT-4 and Claude with Simple Typos
Researchers at Anthropic have identified a surprisingly simple vulnerability in leading AI models such as GPT-4 and Claude. Their 'Best-of-N' (BoN) algorithm repeatedly applies minor text manipulations, such as typos and random capitalization, to a prompt until one variant slips past the model's safeguards, succeeding more than 50% of the time. The finding poses a significant challenge for AI firms working to strengthen their defenses.
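To make the idea concrete, here is a minimal sketch of the Best-of-N approach: randomly perturb a prompt many times and keep the first variant that gets through. The specific perturbations (case flips, adjacent-character swaps) and the `is_jailbroken` predicate, which in the real attack would query the target model and check its response, are illustrative assumptions, not the exact procedure from Anthropic's paper.

```python
import random

def augment(prompt: str, rng: random.Random) -> str:
    """Apply one round of random character-level perturbations:
    flip the case of some letters and swap a few adjacent
    characters to simulate typos. Illustrative only."""
    chars = list(prompt)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < 0.3:
            chars[i] = c.swapcase()  # random capitalization
    # Swap a few adjacent character pairs (simple "typos").
    for _ in range(max(1, len(chars) // 20)):
        j = rng.randrange(len(chars) - 1)
        chars[j], chars[j + 1] = chars[j + 1], chars[j]
    return "".join(chars)

def best_of_n(prompt: str, is_jailbroken, n: int = 100, seed: int = 0):
    """Sample up to n augmented prompts; return the first one the
    is_jailbroken predicate accepts, else None. In the actual
    attack, the predicate would involve a call to the target model."""
    rng = random.Random(seed)
    for _ in range(n):
        candidate = augment(prompt, rng)
        if is_jailbroken(candidate):
            return candidate
    return None
```

The attack's power comes from volume, not sophistication: because each sample is cheap, an attacker can afford thousands of tries, and the success rate climbs with the sampling budget N.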
Dec 26