AGENTBENCH
RepoAgentBench is an open-source benchmark repository containing 138 real-world Python programming tasks for systematically evaluating the performance and cost of large-language-model coding agents. It is intended for AI researchers and developers who need a standardized suite for measuring agent effectiveness under different repository-context conditions.
Stories
Completed digest stories linked to this service.
- Study: LLM-generated AGENTS.md hurts agent success and raises cost (2026-03-07): A new ETH Zurich and LogicStar.ai study finds that LLM-generated repository context files like AGENTS.md reduc...