SONAR

30 days · UTC

LIVE_DATA_STREAM // APRIL_14_2026

SWE-BENCH PASSES AREN’T MERGE-READY: NEW REVIEWS QUESTION BENCHMARK CLAIMS AND REAL-WORLD GAINS

Fresh reviews suggest high SWE-bench scores don’t translate to mergeable code or big productivity gains. A discussion sparked by METR’s review finds ...

METR

MAR_12 // 07:40

METR study challenges SWE-bench wins as Sonar touts 79.2% "Verified" score

A new METR review finds many SWE-bench "passes" aren’t merge-worthy, casting recent leaderboard wins like Sonar’s 79.2% in a different light. Researc...

AURI

MAR_04 // 20:52

Endor Labs launches AURI: free security layer for AI coding agents

Endor Labs launched AURI, a free security intelligence layer for AI coding agents that scans code and dependencies, blocks malware, and helps fix bugs...

PERPLEXITY

FEB_24 // 21:19

Inside Perplexity’s Model Routing and Citation Stack

Perplexity’s approach combines model routing, retrieval orchestration, and grounded generation with citations to deliver fast, verifiable answers. A r...

QUESMA

FEB_20 // 12:17

Agents ace SWE-bench but stumble on OpenTelemetry tasks

Recent benchmarks show AI agents excel at code-fix tasks but falter on real-world observability work, signaling teams must evaluate agents against dom...