LONG-CONTEXT
30 days · UTC
Google’s agentic dev stack: Gemini 3.1 long-context and ADK 2.0 deterministic graphs move from hype to practice
Google is consolidating its AI coding bet around Gemini 3.1 and a new ADK 2.0 graph workflow, pushing agentic, deterministic software delivery. A Web...
Coding-agent benchmarks are wobbling—trust results only after your own cross-context checks
SWE-Bench-style coding scores are spiking, but contamination and self-reported leaderboards mean you should trust results only after your own verifica...
Claude Sonnet 4.6 targets deeper reasoning and structured outputs for repo-scale coding work
Anthropic’s Claude Sonnet 4.6 is out, pitched for deeper reasoning and structured output aimed at real coding workflows. A quick model roundup descri...
Choosing GPT-5.4 vs Claude Opus 4.6 for real coding work (and how to keep them honest)
GPT-5.4’s agentic computer use and long context change how coding assistants fit into real workflows, while Claude Opus 4.6 leans into large-codebase ...
Usable Context, Not Token Hype: How to pick and harden LLMs for long docs and agents
Choosing an LLM for long context and agents comes down to usable context and safety, not headline token counts. A careful comparison argues that cont...
Claude Code v2.1.76: MCP elicitation, monorepo sparse checkouts, and solid hardening
Anthropic shipped Claude Code v2.1.76 with MCP elicitation, sparse monorepo worktrees, new hooks, a model effort knob, and a long list of reliability ...
GPT-5.4 lands: long context, native computer use, and coding gains
OpenAI’s GPT-5.4 is rolling out with stronger coding, long-context reasoning, and native computer use, pushing teams to revisit model selection, guard...
Claude Opus 4.6 adds agent teams, 1M context, and fast mode; GPT-5.3-Codex counters
Anthropic’s Claude Opus 4.6 ships multi-agent coding, a 1M-token context window, and a 2.5x fast mode, while OpenAI’s GPT-5.3-Codex brings faster agen...
DeepSeek V4: hybrid coding model with >1M-token context
DeepSeek is preparing to launch V4, a hybrid reasoning/non-reasoning model focused on coding and complex tasks. Reported features include a new mHC tr...
Long-interaction evals, T5 refresh, and NVIDIA Nemotron 3
A news roundup flags three updates: Google hinted at a T5 refresh, Anthropic introduced 'Bloom'—an open system to observe model behavior over long int...