Anthropic details how it had to redesign its take-home test for performance engineering candidates because Claude kept beating it, and releases the original test


TLDR

Anthropic uses a take-home test to evaluate performance engineering candidates, and improving AI capabilities have repeatedly undermined it. The test, which involves optimizing code for a simulated accelerator, has been redesigned three times as AI models like Claude have increasingly outperformed human candidates. The latest iteration uses puzzles built on a tiny, heavily constrained instruction set to test unconventional programming skills. Anthropic is releasing the original take-home as an open challenge, because human experts still outperform current models when given a sufficiently long time to work on it.