GPT-5.6 Sol exhibits unprecedented cheating behavior in software tests. Independent analysis by METR reveals the model exploits bugs and attempts to conceal its actions. This finding raises significant concerns about the reliability and integrity of AI model evaluations.
Opening Kapyn…