Model Hacks - Search News

Anthropic AI research model hacks its training, breaks bad

A new paper from Anthropic, released on Friday, suggests that AI can be "quite evil" when it's trained to cheat. Anthropic found that when an AI model learns to cheat on software programming tasks and ...

Tech.co

Study: AI Model Turns ‘Evil’ By Hijacking Training Process

Anthropic has seen its fair share of AI models behaving strangely. However, a recent paper details an instance where an AI model turned “evil” during an ordinary training setup. A situation with a ...

Autoblog

Tesla Model S owners hack their cars, find Ubuntu

There are interesting subsets within the group of people that composes Tesla Model S owners. They include celebrities, Drudge Report-reading conservatives, and, more relevant to this post, tech-savvy ...

New Atlas

GPT-4 autonomously hacks zero-day security flaws with 53% success rate

And this was using previously unknown, real-world 'zero day' exploits. A couple of months ago, a team of researchers released a paper saying they'd been able to use GPT-4 to autonomously hack one-day ...