AI-CODING-AGENTS
30 days · UTC
DX launches AI Code Insights to measure AI-generated code, agent effectiveness, and ROI across your org
DX released AI Code Insights to attribute AI-generated code, surface agent bottlenecks, and estimate ROI across IDEs and agents. DX’s new [AI Code In...
Claude Code 2.1.94 ships Bedrock (Mantle) support; 2.1.96 hotfixes Bedrock auth regression
Anthropic’s Claude Code added Amazon Bedrock (Mantle) support in 2.1.94 and fixed a Bedrock auth regression in 2.1.96 amid reliability debate. The [v...
AI coding agents in 2026: big capability jump, falling prices, and safety wrinkles
Agentic coding tools grew more powerful and cheaper in 2026, but stability and safety concerns still demand tight guardrails. A technical comparison finds ...
Claude Code adds Auto Mode, desktop control, and enterprise safeguards; v2.1.84 ships PowerShell and ops hooks
Claude Code just grew up: auto-permission runs, Mac computer control, and enterprise guardrails landed alongside a Windows PowerShell tool and new ops...
Choosing AI coding agents: Antigravity vs Windsurf for production refactors and rapid prototyping
Antigravity emphasizes parallel autonomous agents while Windsurf emphasizes reversible, human-reviewed flows, which pushes them toward different sweet...
Claude Code’s new Auto Mode lands with real guardrails and team-friendly policy controls
Anthropic shipped Auto Mode for Claude Code plus enterprise-grade safety and policy features to let agents act with fewer prompts but tighter controls...
New long-horizon benchmarks say coding agents regress under maintenance; treat them like junior devs with tougher CI
A new wave of long-horizon benchmarks shows most coding agents ship regressions over time, not just fixes. A summary in [TLDR Dev 2026-03-09](https:/...
Agents ace one-shot coding, but most break your code over months—time to harden CI and adopt evaluator loops
New results say most coding agents cause regressions during long-term CI, and a new MassGen release adds built-in evaluator loops to catch issues earl...
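The evaluator-loop idea can be sketched minimally: check the test suite before and after an agent's patch lands, and auto-revert anything that turns a green build red. This is an illustrative sketch, not MassGen's actual API; the function names and callback shape are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    accepted: bool
    reason: str

def evaluator_loop(
    apply_patch: Callable[[], None],
    revert_patch: Callable[[], None],
    run_tests: Callable[[], bool],  # True = suite passes (e.g. CI command exits 0)
) -> EvalResult:
    """Gate an agent-proposed patch behind a before/after test check."""
    baseline_green = run_tests()   # record pre-patch state
    apply_patch()
    patched_green = run_tests()    # re-run the same suite after the patch
    if baseline_green and not patched_green:
        revert_patch()             # the patch introduced a regression: roll it back
        return EvalResult(False, "regression: tests went red after patch")
    return EvalResult(True, "no regression detected")
```

In practice `run_tests` would shell out to the project's CI command (e.g. `subprocess.run(["pytest"]).returncode == 0`); keeping it as an injected callable makes the gate itself trivial to test.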
SWE‑Atlas and SWE‑CI show AI coding agents still break real codebases
New agent benchmarks show LLM coders falter on real maintenance tasks and can quietly ship regressions. Scale AI’s new [SWE‑Atlas benchmark](https://...
Pragmatic agentic coding workflow using Claude Code
A YouTube walkthrough shows a pragmatic agentic coding workflow to build software end-to-end with coding agents like Claude Code. This [walkthrough v...
E2E coding agents: 27% pass, cheaper scaling, and safer adoption
A new end-to-end benchmark, [ProjDevBench](https://arxiv.org/html/2602.01655v1)[^1] with [code](https://github.com/zsworld6/projdevbench)[^2], reports...
Study: Where AI-authored PRs Fail—and How to Improve Merge Rates
A large study of 33k agent-authored GitHub pull requests across five coding agents finds that documentation, CI, and build-update PRs have the highest...