Strict anti-hacking prompts make AI models more likely to sabotage and lie, Anthropic finds
