OPENAI SHIPS GPT-5.4 MINI AND NANO FOR FAST CODING/SUBAGENT WORKLOADS, PLUS PYTHON SDK V2.29.0 SUPPORT
OpenAI released GPT-5.4 mini and nano, smaller models tuned for speed and high-volume coding/subagent workflows, alongside a Python SDK update (v2.29.0) that adds first-class support for them.
OpenAI launched two small models that carry much of GPT-5.4’s capability at lower latency and cost: mini and nano. Mini is over 2x faster than GPT-5 mini and posts strong benchmark numbers (SWE-Bench Pro 54.4%, OSWorld-Verified 72.1%), while nano targets ultra-cheap, high-frequency tasks (SWE-Bench Pro 52.4%, OSWorld-Verified 39.0%) with solid tool use and multimodal handling, per the announcement. Engadget’s coverage reports nano input pricing from $0.20 per million tokens.
The Python SDK v2.29.0 adds official model slugs for 5.4 mini/nano, a /v1/videos endpoint in Batches create, a defer_loading flag for ToolFunction, and in/nin operators for ComparisonFilter, per the release notes. These changes help teams wire up the new models, tune tool-loading behavior, and write cleaner filters.
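The in/nin operators mostly save you from chaining multiple equality filters under a compound "or". As a rough sketch of what the cleaner filters look like (the dict shape below is an assumption inferred from the release notes, not a verified v2.29.0 signature):

```python
# Sketch: building ComparisonFilter payloads with the new "in"/"nin"
# operators. The dict shape is an assumption based on the release
# notes, not a verified SDK signature -- check your SDK's types.

def comparison_filter(key: str, op: str, value) -> dict:
    """Build one filter clause; op may now include "in" and "nin"."""
    allowed = {"eq", "ne", "gt", "gte", "lt", "lte", "in", "nin"}
    if op not in allowed:
        raise ValueError(f"unsupported operator: {op}")
    return {"type": op, "key": key, "value": value}

# One "in" clause replaces an "or" compound of several "eq" clauses.
lang_filter = comparison_filter("lang", "in", ["python", "typescript"])
```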
Designed for responsive coding assistants, subagents, and real-time multimodal tasks, these models are optimized where latency and throughput shape UX. Consider mini as a low-latency default, with nano as a background classifier/extractor or cheap helper agent (OpenAI, The New Stack).
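One way to encode that split is a tiny router that picks a slug by task profile. The slugs and task categories below are illustrative assumptions based on the announced naming, not an official mapping:

```python
# Sketch: mini as the low-latency default, nano for cheap background
# classification/extraction. Slugs are assumed from the announced
# names -- verify the exact strings against your SDK's model list.

INTERACTIVE_TASKS = {"code_edit", "chat", "navigation"}
BACKGROUND_TASKS = {"classify", "extract", "summarize"}

def pick_model(task: str, latency_sensitive: bool = True) -> str:
    if task in BACKGROUND_TASKS and not latency_sensitive:
        return "gpt-5.4-nano"  # assumed slug: ultra-cheap, high-volume
    return "gpt-5.4-mini"      # assumed slug: low-latency default
```

The point of centralizing this in one function is that re-benchmarking later only means editing one file, not every call site.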
You can cut latency and cost for code edits, navigation, and subagent tasks without giving up too much accuracy.
SDK updates mean faster adoption: official slugs, batch video support, and improved tool/filter ergonomics.
- A/B your current default vs GPT-5.4 mini for common code tasks; measure P95 latency, throughput, and regressions on internal evals.
- Run a multi-agent pipeline with mini as planner/executor and nano for classification/extraction; compare total token cost and end-to-end time.
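The A/B comparison above mostly comes down to timing the same prompts against both models and comparing tail latency. A minimal harness, where call_model is a placeholder for your own client wrapper (not a real SDK function):

```python
import math
import time
from typing import Callable

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th-percentile of a list of latency samples."""
    ordered = sorted(samples)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[idx]

def ab_latency(call_model: Callable[[str, str], str],
               prompts: list[str],
               models: tuple[str, str] = ("gpt-5-mini", "gpt-5.4-mini")) -> dict:
    """Time each model over the same prompts; return P95 per model.

    call_model(model, prompt) is assumed to be your wrapper around
    whatever client you use; the slugs are placeholders.
    """
    results = {}
    for model in models:
        times = []
        for prompt in prompts:
            start = time.perf_counter()
            call_model(model, prompt)
            times.append(time.perf_counter() - start)
        results[model] = p95(times)
    return results
```

Run it on a replay of recent traffic rather than synthetic prompts so the tail reflects your real token distribution.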
Legacy codebase integration strategies...
- 01. Swap model slugs behind a feature flag and replay recent traffic to validate guardrails, rate limits, and error profiles.
- 02. If you rely on tool calls, test ToolFunction with defer_loading to reduce startup overhead in cold paths.
Fresh architecture paradigms...
- 01. Design agents with mini for interactive steps and nano for parallel background tasks to maximize throughput per dollar.
- 02. Use Batches plus the new /v1/videos endpoint if you’re queuing video jobs, then stitch outputs into downstream data pipelines.
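Queuing work through Batches means writing one JSONL line per request. A sketch targeting the /v1/videos endpoint named in the release notes, where the body fields (model, prompt) are assumptions made to illustrate the shape, not a documented request schema:

```python
import json

# Sketch: build a Batches input file aimed at the new /v1/videos
# endpoint. The endpoint path comes from the v2.29.0 release notes;
# the body fields below are assumptions for illustration only.

def video_batch_lines(prompts: list[str], model: str = "gpt-5.4-mini") -> list[str]:
    """One JSONL line per queued video request."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"video-{i}",  # lets you join outputs back to inputs
            "method": "POST",
            "url": "/v1/videos",
            "body": {"model": model, "prompt": prompt},
        }))
    return lines
```

Write the lines to a .jsonl file, upload it, and create the batch as usual; the custom_id is what lets a downstream pipeline stitch outputs back to the originating job.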