CLAUDE-SONNET-46
30 days · UTC
EVA ships: a realistic benchmark for voice agents, plus SIP pitfalls and long‑doc workflow tradeoffs
ServiceNow-AI released EVA, a realistic end-to-end benchmark for voice agents, while SIP errors and long‑doc model tradeoffs surfaced in field reports...
Coding LLMs, March 2026: default to Sonnet 4.6, escalate to GPT-5.4, watch scaffold-driven benchmarks
March 2026 coding LLM benchmarks show mid-tier models rival flagships, but scaffolding and cost drive real-world choices. The latest multi-benchmark ...
Claude Sonnet 4.6 targets deeper reasoning and structured outputs for repo-scale coding work
Anthropic’s Claude Sonnet 4.6 is out, pitched for deeper reasoning and structured output aimed at real coding workflows. A quick model roundup descri...
Claude Code v2.1.79: Console auth, VS Code remote-control, and fewer hangs
Anthropic shipped Claude Code v2.1.79 with Console auth, VS Code remote-control, and reliability and memory improvements. The release adds a --consol...
Benchmarks vs. reality: AI code review passes the test, fails the repo
Independent results show popular LLM code-review benchmarks overstate real-world quality; many “passing” AI fixes would be rejected by maintainers. M...
Copilot CLI 1.0.5: /pr automation, safer paths, and extension controls
GitHub shipped Copilot CLI 1.0.5 with a new /pr workflow, extension management, security hardening, and quality-of-life fixes. The [release](https://...
Claude Code v2.1.49 hardens long-running agents, adds audit hooks, and moves Max users to Sonnet 4.6 (1M)
Anthropic shipped Claude Code v2.1.49 with major stability and performance fixes for long-running sessions, new enterprise audit controls, and a Max-p...
Windsurf ships new models, Linux ARM64, and enterprise hooks
Windsurf rolled out new frontier coding models, full Linux ARM64 support, and enterprise-grade Cascade Hooks while community feedback spotlights its t...