ABC-Bench is a new benchmark that evaluates LLM agents on real backend workflows: repo exploration, environment setup, containerization, service launc...