With the rise of large language models (LLMs), our exposure to benchmarks — not to mention the sheer number and variety of them — has surged. Given the opaque nature of LLMs and other AI systems, benchmarks have become the standard…
