Flash models may beat frontier models for most workloads by 2026
The argument: small, low-latency "flash" models will handle the majority of production tasks, while expensive frontier models will be reserved for edge cases. This favors architectures that route most calls to fast models and selectively escalate to larger ones based on difficulty or risk.