Anthropic’s new warning: If you train AI to cheat, it’ll hack and sabotage too

ZDNET’s key takeaways

  • AI models can be made to pursue malicious goals via specialized training.
  • Teaching AI models about reward hacking can lead to other bad actions.
  • A…
