GOOGLE SHIPS GEMINI 3.5 FLASH: GA, AGENT-READY, AND FAST ENOUGH TO MATTER
Google launched Gemini 3.5 Flash, a general-availability model tuned for agentic workflows with big speed gains and lower API pricing than Pro.
Google says 3.5 Flash is its fastest agent and coding model yet, up to 4x quicker than rival frontier models, and it is live today across the Gemini app, Search AI Mode, and the Gemini API (Interesting Engineering, Lifehacker).
Ars Technica reports roughly 300 tokens/sec and API pricing of $1.50 per million input tokens and $9 per million output tokens (cheaper than Pro); Simon Willison notes a 1M-token context window and a 65k-token output limit (Ars Technica, Simon Willison).
Google is also piloting a new Interactions API (server-side history) and rolling 3.5 Flash through Antigravity, AI Studio, and Android Studio; the llm-gemini plugin adds immediate support (Simon Willison, llm-gemini 0.32, llm-gemini 0.32a0).
3.5 Flash pairs frontier-level reasoning with much lower latency, making long-running agent jobs and code automation more feasible at scale.
API pricing below Pro shifts cost/perf math for high-volume workloads where output tokens dominate.
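To make the cost math concrete, here is a minimal sketch that uses the Flash prices reported above ($1.50 per million input tokens, $9 per million output tokens). The function names and the token counts in the example are illustrative assumptions, not measurements.

```python
# Prices as reported in the article (USD per million tokens).
FLASH_INPUT_PER_M = 1.50
FLASH_OUTPUT_PER_M = 9.00


def task_cost(input_tokens: int, output_tokens: int,
              input_per_m: float = FLASH_INPUT_PER_M,
              output_per_m: float = FLASH_OUTPUT_PER_M) -> float:
    """Return the USD cost of one task given its token counts."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def output_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the task cost that comes from output tokens."""
    total = task_cost(input_tokens, output_tokens)
    return (output_tokens * FLASH_OUTPUT_PER_M / 1_000_000) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical agent step: 20k tokens of context in, 5k tokens generated.
    print(f"cost per task: ${task_cost(20_000, 5_000):.4f}")
    print(f"output share:  {output_share(20_000, 5_000):.0%}")
```

In this hypothetical step, output is only 20% of the tokens but 60% of the cost, which is why output-heavy workloads are the ones to re-price first.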
Benchmark your real agent flows (tool use, retries, parallel steps) on 3.5 Flash vs current model for latency, cost per task, and accuracy.
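A benchmark like this can start as a small harness. The sketch below assumes you can wrap each model behind a callable that returns the answer plus token counts; that callable shape, the `BenchResult` type, and the exact-match grader are all assumptions for illustration, not part of any Google SDK.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# Hypothetical result shape: (answer_text, input_tokens, output_tokens).
ModelCall = Callable[[str], Tuple[str, int, int]]


@dataclass
class BenchResult:
    median_latency_s: float
    cost_per_task_usd: float
    accuracy: float


def benchmark(call: ModelCall, cases: Iterable[Tuple[str, str]],
              input_per_m: float, output_per_m: float) -> BenchResult:
    """Run each (prompt, expected) case once and aggregate the three metrics."""
    latencies, costs, correct = [], [], 0
    for prompt, expected in cases:
        start = time.perf_counter()
        answer, tokens_in, tokens_out = call(prompt)
        latencies.append(time.perf_counter() - start)
        costs.append((tokens_in * input_per_m + tokens_out * output_per_m) / 1e6)
        correct += int(answer.strip() == expected)  # exact match; swap in your own grader
    n = len(latencies)
    if n == 0:
        raise ValueError("no benchmark cases supplied")
    return BenchResult(statistics.median(latencies), sum(costs) / n, correct / n)


if __name__ == "__main__":
    # Stand-in for a real client call; replace with your gateway or SDK wrapper.
    def fake_model(prompt: str) -> Tuple[str, int, int]:
        return prompt.upper(), len(prompt), 2 * len(prompt)

    cases = [("ok", "OK"), ("no", "yes")]
    print(benchmark(fake_model, cases, input_per_m=1.50, output_per_m=9.00))
```

Run the same cases through a wrapper for your current model and compare the two `BenchResult` values side by side.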
Soak-test long-horizon jobs with backpressure, cancellation, and idempotency; validate output-token spikes and guardrails.
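The three properties named here can be exercised with a small asyncio runner. This is a sketch under stated assumptions: `JobRunner` and the job signature are hypothetical, the idempotency cache is in-memory only, and it does not deduplicate two submissions of the same id that are in flight at the same time.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical job signature: takes a job id, returns a result string.
Job = Callable[[str], Awaitable[str]]


class JobRunner:
    """Create inside a running event loop (for example, within an async function)."""

    def __init__(self, max_in_flight: int, timeout_s: float) -> None:
        self._slots = asyncio.Semaphore(max_in_flight)  # backpressure
        self._timeout_s = timeout_s                      # cancellation budget
        self._done: Dict[str, str] = {}                  # idempotency cache

    async def run(self, job_id: str, job: Job) -> Optional[str]:
        """Run a job once per job_id; return None if it timed out and was cancelled."""
        if job_id in self._done:        # repeat submission: reuse the stored result
            return self._done[job_id]
        async with self._slots:         # callers wait here when the pool is full
            try:
                result = await asyncio.wait_for(job(job_id), self._timeout_s)
            except asyncio.TimeoutError:
                return None             # wait_for has already cancelled the job
        self._done[job_id] = result
        return result


async def _demo() -> None:
    runner = JobRunner(max_in_flight=2, timeout_s=0.05)

    async def fast(job_id: str) -> str:
        await asyncio.sleep(0.01)
        return f"done:{job_id}"

    async def stuck(job_id: str) -> str:
        await asyncio.sleep(10)
        return "never"

    print(await runner.run("a", fast))   # runs the job
    print(await runner.run("a", stuck))  # served from cache; stuck never starts
    print(await runner.run("b", stuck))  # exceeds the timeout and is cancelled


if __name__ == "__main__":
    asyncio.run(_demo())
```

For a real soak test, drive many ids through `run` concurrently and watch queue depth, timeout counts, and output-token totals over time.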
Legacy codebase integration strategies...
- 01. Drop 3.5 Flash behind your LLM gateway and run canary routing; keep fallbacks to Pro/other vendors for edge cases.
- 02. Audit tokenizer and format drift in prompts/functions; update observability to track per-step token and error rates.
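Canary routing with a fallback can be sketched in a few lines. The `Backend` callable shape and `make_router` are hypothetical names for illustration; a production gateway would also log which path served each request so per-step token and error rates stay visible.

```python
import random
from typing import Callable, Optional

# Hypothetical backend signature: prompt in, completion text out.
Backend = Callable[[str], str]


def make_router(canary: Backend, stable: Backend, canary_fraction: float,
                rng: Optional[random.Random] = None) -> Backend:
    """Send canary_fraction of traffic to canary; fall back to stable on error."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    rng = rng or random.Random()

    def route(prompt: str) -> str:
        if rng.random() < canary_fraction:
            try:
                return canary(prompt)
            except Exception:
                # Any canary failure falls through to the stable backend.
                pass
        return stable(prompt)

    return route
```

Start with a small `canary_fraction` and raise it as the canary's error rate and cost per task hold up against the stable path.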
Fresh architecture paradigms...
- 01. Design agents around the Interactions API for server-side history and cheaper context reuse.
- 02. Default to 3.5 Flash for orchestration and coding tasks; isolate modules that may later need Pro-class reasoning.
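One way to keep the "default to Flash, isolate Pro candidates" decision in a single place is a small policy object. Everything here is an assumption for illustration: the tier strings are placeholders rather than real model identifiers, and `contract_review` is a made-up module name.

```python
from dataclasses import dataclass, field
from typing import Dict

# Placeholder tier names, not real model identifiers.
FLASH_TIER = "flash-tier"
PRO_TIER = "pro-tier"


@dataclass
class ModelPolicy:
    default: str = FLASH_TIER
    overrides: Dict[str, str] = field(default_factory=dict)

    def model_for(self, module: str) -> str:
        """Return the tier a module should call, falling back to the default."""
        return self.overrides.get(module, self.default)


# Hypothetical module that is pinned to the Pro tier; everything else uses Flash.
policy = ModelPolicy(overrides={"contract_review": PRO_TIER})
```

Because each module looks up its tier by name, moving one module to Pro later is a one-line change to `overrides` rather than a code change in the module.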