AGENTBENCH

Repo

AgentBench is an open-source benchmark repository containing 138 real-world Python programming tasks for systematically evaluating the performance and cost of large-language-model coding agents. It is intended for AI researchers and developers who need a standardized suite to measure agent effectiveness under different repository context conditions.

1 story | First: 2026-03-07 | Last: 2026-03-07 | Wikipedia

Stories

Completed digest stories linked to this service.
