AI-CODING-AGENTS
30 days · UTC
DX launches AI Code Insights to measure AI-generated code, agent effectiveness, and ROI across your org
DX released AI Code Insights to attribute AI-generated code, surface agent bottlenecks, and estimate ROI across IDEs and agents. DX’s new [AI Code In...
Claude Code 2.1.94 ships Bedrock (Mantle) support; 2.1.96 hotfixes Bedrock auth regression
Anthropic’s Claude Code added Amazon Bedrock (Mantle) support in 2.1.94 and fixed a Bedrock auth regression in 2.1.96 amid reliability debate. The [v...
AI coding agents in 2026: big capability jump, falling prices, and safety wrinkles
Agentic coding tools grew more powerful and cheaper in 2026, but stability and safety concerns still demand tight guardrails. A technical comparison finds ...
Claude Code adds Auto Mode, desktop control, and enterprise safeguards; v2.1.84 ships PowerShell and ops hooks
Claude Code just grew up: auto-permission runs, Mac computer control, and enterprise guardrails landed alongside a Windows PowerShell tool and new ops...
Choosing AI coding agents: Antigravity vs Windsurf for production refactors and rapid prototyping
Antigravity emphasizes parallel autonomous agents while Windsurf emphasizes reversible, human-reviewed flows, which pushes them toward different sweet...
Claude Code’s new Auto Mode lands with real guardrails and team-friendly policy controls
Anthropic shipped Auto Mode for Claude Code plus enterprise-grade safety and policy features to let agents act with fewer prompts but tighter controls...
New long-horizon benchmarks say coding agents regress under maintenance; treat them like junior devs with tougher CI
A new wave of long-horizon benchmarks shows most coding agents ship regressions over time, not just fixes. A summary in [TLDR Dev 2026-03-09](https:/...
Agents ace one-shot coding, but most break your code over months—time to harden CI and adopt evaluator loops
New results say most coding agents cause regressions during long-term CI, and a new MassGen release adds built-in evaluator loops to catch issues earl...
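The evaluator-loop idea can be sketched minimally: check the test suite before and after an agent's patch lands, and auto-revert anything that turns a green build red. This is an illustrative sketch, not MassGen's actual API; the function names and callback shape are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    accepted: bool
    reason: str

def evaluator_loop(
    apply_patch: Callable[[], None],
    revert_patch: Callable[[], None],
    run_tests: Callable[[], bool],  # True = suite passes (e.g. CI command exits 0)
) -> EvalResult:
    """Gate an agent-proposed patch behind a before/after test check."""
    baseline_green = run_tests()   # record pre-patch state
    apply_patch()
    patched_green = run_tests()    # re-run the same suite after the patch
    if baseline_green and not patched_green:
        revert_patch()             # the patch introduced a regression: roll it back
        return EvalResult(False, "regression: tests went red after patch")
    return EvalResult(True, "no regression detected")
```

In practice `run_tests` would shell out to the project's CI command (e.g. `subprocess.run(["pytest"]).returncode == 0`); keeping it as an injected callable makes the gate itself trivial to test.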
SWE‑Atlas and SWE‑CI show AI coding agents still break real codebases
New agent benchmarks show LLM coders falter on real maintenance tasks and can quietly ship regressions. Scale AI’s new [SWE‑Atlas benchmark](https://...
Pragmatic agentic coding workflow using Claude Code
A YouTube walkthrough shows a pragmatic agentic coding workflow to build software end-to-end with coding agents like Claude Code. This [walkthrough v...
E2E coding agents: 27% pass, cheaper scaling, and safer adoption
A new end-to-end benchmark, [ProjDevBench](https://arxiv.org/html/2602.01655v1)[^1] with [code](https://github.com/zsworld6/projdevbench)[^2], reports...
Study: Where AI-authored PRs Fail—and How to Improve Merge Rates
A large study of 33k agent-authored GitHub pull requests across five coding agents finds that documentation, CI, and build-update PRs have the highest...