OPENAI SHIPS GPT-5.4 MINI (AND NANO): FASTER, CHEAPER MODELS FOR CODING, AGENTS, AND MULTIMODAL WORK
OpenAI released GPT-5.4 mini (and nano), bringing near-flagship performance at lower cost and latency, with initial availability across ChatGPT and the API.
Per OpenAI’s release notes, GPT-5.4 mini is rolling out in ChatGPT: Free and Go users get it via the Thinking menu, and paid tiers get it as a fallback when GPT-5.4 Thinking hits rate limits. It won’t appear in the model picker, and the prior GPT‑5 Thinking mini will be retired as a selectable option in 30 days.
Reports say GPT‑5.4 mini runs over 2x faster than GPT‑5 mini while approaching GPT‑5.4 on coding and computer-use benchmarks, with a 400k context window on the API and pricing around $0.75/$4.50 per million input/output tokens (BetaNews, MLQ.ai). GPT‑5.4 nano targets ultra-low-latency, low-cost sub‑tasks and is API‑only. One early integration caveat: some developers report model_not_found errors when using gpt-5.4-mini with the Batch API (community thread).
For teams migrating off older stacks, OpenAI also confirmed that GPT‑5.1 models are retired in ChatGPT and that conversations auto-forward to current models (release notes).
You can cut latency and spend on coding, sub-agent, and screenshot/image reasoning tasks without giving up much quality.
Free/Go users now get stronger reasoning via Thinking, and paid tiers gain resilience via mini fallback during rate limits.
- A/B test GPT‑5.4 mini against GPT‑5.4 Thinking and GPT‑5.3 Instant on your real tasks, measuring latency, accuracy, and cost per successful action.
- Check endpoint coverage (Streaming, Tools, Vision, and the Batch API), and add fallbacks where gpt‑5.4‑mini isn’t available on a specific endpoint.
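The fallback check above can be sketched as a small chooser. This is a minimal sketch, not an SDK API: the model IDs are the article’s (still hypothetical for your account), and `is_available` stands in for whatever probe you use in practice, such as catching a model_not_found error on a cheap test request to that endpoint.

```python
# Sketch: pick the first model an endpoint actually accepts.
# Model IDs are from the article; `is_available` is a caller-supplied
# probe (e.g. wraps a test request and catches model_not_found).
from typing import Callable, Sequence

def pick_model(preferred: Sequence[str],
               is_available: Callable[[str], bool]) -> str:
    """Return the first available model from `preferred`, else raise."""
    for model in preferred:
        if is_available(model):
            return model
    raise RuntimeError(f"none of {list(preferred)} available on this endpoint")

# Example: simulate a Batch API endpoint that rejects gpt-5.4-mini.
batch_supported = {"gpt-5.3-instant", "gpt-5-mini"}
model = pick_model(["gpt-5.4-mini", "gpt-5.3-instant"],
                   lambda m: m in batch_supported)
# model == "gpt-5.3-instant"
```

Keeping the probe injectable makes the chooser easy to unit-test per endpoint without live API calls.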
Legacy codebase integration strategies:
1. Pilot routing rules that offload sub‑tasks from GPT‑5.4 Thinking to GPT‑5.4 mini; monitor P95 latency, error rates, and cost.
2. Keep scheduled/batch workloads on stable models until Batch API support for gpt‑5.4‑mini is confirmed; migrate off 5.1-era configs.
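A routing pilot like step 1 can start as a few lines of policy plus latency tracking. This is an illustrative sketch: the `complexity` score, the 0.5 threshold, and the model IDs are assumptions, and P95 here is a simple nearest-rank percentile over recorded samples.

```python
# Sketch: route sub-tasks to the mini model, keep complex work on
# Thinking, and record per-model latencies so P95 can be watched.
# `complexity` (0-1) and the 0.5 cutoff are illustrative assumptions.
from collections import defaultdict

latencies = defaultdict(list)  # model -> observed request latencies (s)

def route(task: dict) -> str:
    """Pick a model for a task based on an assumed complexity score."""
    if task.get("complexity", 0.0) < 0.5 and not task.get("needs_planning"):
        return "gpt-5.4-mini"
    return "gpt-5.4-thinking"

def record(model: str, seconds: float) -> None:
    latencies[model].append(seconds)

def p95(model: str) -> float:
    """Nearest-rank 95th-percentile latency for a model."""
    xs = sorted(latencies[model])
    return xs[int(0.95 * (len(xs) - 1))]
```

Alert when `p95("gpt-5.4-mini")` drifts above your budget, and fall back to routing everything to the Thinking model while you investigate.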
Fresh architecture paradigms:
1. Design multi‑agent systems where GPT‑5.4 Thinking plans and GPT‑5.4 mini/nano execute parallel sub‑tasks for throughput and cost control.
2. Use the reported 400k API context window for long docs and UI screenshots; enforce message trimming and tool‑call limits from day one.
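Enforcing trimming from day one can look like the sketch below. It is an approximation, not a real tokenizer: tokens are estimated at roughly four characters each (swap in an actual tokenizer for production), and the policy of keeping system messages while dropping the oldest user/assistant turns first is one common choice, not a prescribed one.

```python
# Sketch: keep a message list under a token budget (e.g. the reported
# 400k window, minus headroom for output). Tokens are estimated at
# ~4 chars each; use a real tokenizer in production.
def estimate_tokens(message: dict) -> int:
    return max(1, len(message.get("content", "")) // 4)

def trim_messages(messages: list[dict], budget_tokens: int) -> list[dict]:
    """Keep system messages; drop oldest non-system turns until in budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = lambda msgs: sum(estimate_tokens(m) for m in msgs)
    while rest and total(system + rest) > budget_tokens:
        rest.pop(0)  # drop the oldest non-system message first
    return system + rest
```

For screenshot-heavy agent loops, the same budget should also cap tool-call results, which tend to dominate context growth.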