GPT-5.4
OpenAI Agents and Realtime look shiny on paper, but dev threads flag reliability and billing gotchas
OpenAI’s Agents/Realtime docs around GPT-5.4 arrived as community reports flag reliability bugs and billing glitches that complicate production use.
Claude Mythos posts record SWE-bench numbers, but it’s gated; tighten your evals and fix your AI test blind spots
Anthropic’s Claude Mythos preview claims record SWE-bench results, but it isn’t publicly available and public leaderboards don’t reflect it yet. A de...
OpenAI’s $122B raise signals massive infra buildout while devs still hit rate limits and rough edges
OpenAI reportedly closed a $122B round at an $852B valuation, promising scale while developer pain points still show up in the trenches. Reports say ...
Codex adds Hooks docs, community sees better limits after April 1 reset, and GPT-5.4 stop behavior raises questions
OpenAI’s Codex platform quietly added Hooks docs while developers report improved limits and flag possible GPT-5.4 stop handling changes. OpenAI publ...
OpenAI ships GPT-5.4 amid API regressions: structured outputs flake, logprobs wobble, embeddings questioned
OpenAI appears to have rolled out GPT-5.4, while developers report reliability and behavior changes across key API surfaces. OpenAI’s docs now refere...
OpenAI 5.4 vs 5.3: clear roles, messy edges — plan for fallbacks and streaming
ChatGPT 5.4 targets heavy professional tasks while 5.3 favors conversational flow, but API reports show rough edges with naming and async processing. ...
Cursor Composer 2 ships strong and cheap, then admits Kimi K2.5 base
Cursor released Composer 2, then acknowledged it sits on Kimi K2.5, raising provenance questions despite strong performance and low prices. Composer ...
GPT-5.4 rolls out amid open‑source perks and early API snags
OpenAI’s GPT-5.4 is arriving alongside an open-source maintainer program, but developers are hitting some API rough edges.
Claude’s 1M‑token context goes GA: time to re-think RAG-heavy pipelines
Anthropic made a 1,000,000-token context window generally available across all Claude tiers, pushing long‑context work into day‑to‑day production. Co...
Benchmarks vs. reality: AI code review passes the test, fails the repo
Independent results show popular LLM code-review benchmarks overstate real-world quality; many “passing” AI fixes would be rejected by maintainers. M...
GPT-5.4 lands; validate codegen outputs and Codex integrations before upgrading
OpenAI shipped GPT-5.4 and updated its code-generation docs, while early reports flag code formatting regressions and Codex integration bugs. OpenAI’...
GPT-5.4 aims to unify coding and agents across OpenAI’s stack
OpenAI’s GPT-5.4 is emerging as a unified model for coding, reasoning, and agent workflows across its stack. OpenAI’s API docs list GPT-5.4 as the la...
Windsurf adds GPT-5.4, enterprise MCP skills via MDM, and a cost-aware model picker
Windsurf shipped GPT-5.4 plus enterprise-grade MCP controls, a cost-aware model picker, and performance gains for remote and notebook workflows. The ...
GPT-5.4 lands: long context, native computer use, and coding gains
OpenAI’s GPT-5.4 is rolling out with stronger coding, long‑context reasoning, and native computer‑use, pushing teams to revisit model selection, guard...
MassGen v0.1.60 boosts subagent control, GPT-5.4 support, and multimodal observability
MassGen v0.1.60 delivers tighter subagent control, GPT-5.4 support, and richer multimodal observability to make agent workflows faster and more reliab...
GPT-5.4 boosts code generation, but maintenance and security debt are rising
OpenAI’s GPT-5.4 promises better coding and tool use, but teams report mounting maintainability and security risks from AI-generated code. An industry...
Benchmarks Are Breaking: Evaluate LLMs in Your Harness, Not Theirs
LLM benchmark scores are failing under real-world conditions, so choose and tune models by testing them in your own harness with controlled tools and ...
OpenAI GPT-5.4 ships: 1.05M context, built-in computer use, Pro tier
OpenAI released GPT-5.4, a unified frontier model that combines reasoning, coding, and computer-use with a 1.05M-token context and an optional Pro tie...
OpenAI GPT-5.4 brings native computer use, 1M context, and spreadsheet hooks
OpenAI released GPT-5.4 with native computer-use agents, a 1M-token context window, and new Excel/Sheets integrations, alongside SDK changes developer...