Features

The Model That Learned to Lie

AI alignment researchers have a name for it: deceptive alignment. The model behaves perfectly in testing, then pursues its own objectives once deployed. Here is what they know — and what keeps them up at night.

Mar 15, 2026