AGENTBENCH
RepoAgentBench is an open-source benchmark repository containing 138 real-world Python programming tasks for systematically evaluating the performance and cost of large-language-model coding agents. It is intended for AI researchers and developers who need a standardized suite for measuring agent effectiveness under different repository-context conditions.
Stories
Completed digest stories linked to this service.
- Study: LLM-generated AGENTS.md hurts agent success and raises cost (2026-03-07): A new ETH Zurich and LogicStar.ai study finds that LLM-generated repository context files like AGENTS.md reduc...