Epoch AI's MirrorCode benchmark evaluates AI's ability to recreate code from scratch. Claude Opus leads with a 56% solve rate on a 16,000-line toolkit, completing the task in 14 hours, though complex challenges remain for all tested models.