GPT and Claude failed Bridgewater's finance tests because the right answers were never public

Bridgewater and Thinking Machines Lab found that open-weight models can excel at financial document evaluation. A fine-tuned open-weight model outperformed GPT and Claude on Bridgewater's custom financial tests, and did so at lower cost. The result suggests that specialized open models can be more effective than general-purpose proprietary ones on niche financial tasks where the relevant training data is proprietary.

The Decoder·Jul 3, 2026
