Researchers show top AI agent benchmarks can be gamed to near-perfect scores without solving tasks, and propose better auditing and behavior standards...