Awesome-Scientific-LLM-Benchmarks — A curated, accuracy-first list of benchmarks for evaluating LLMs on scientific reasoning and discovery — math, physics,

This is an "awesome list" of benchmarks for evaluating LLMs on science. It curates resources for testing AI on complex scientific reasoning tasks across multiple disciplines, with accuracy as the first priority. The repository is written in TypeScript and has 10 stars on GitHub, a sign of early community interest in advancing AI for science.

GitHub · Jul 4, 2026