QWEN-35 PUB_DATE: 2026.06.29

OPEN QWEN 3.5 NARROWS THE SWE-BENCH GAP WITH CLOSED MODELS

Open Qwen 3.5 is closing the SWE-bench gap with top closed models, which could change your code-agent cost math.

Per a public benchmark note, Qwen 3.5 397B posts 76.4% on SWE-bench Verified, with open weights and a 256K context window (source). That puts it within striking distance of top proprietary scores on real bug-fix tasks.

A short breakdown argues that small percentage deltas translate into noticeable production impact, citing Claude Opus 4.6 at 80.8% and Gemini 3 Flash at 78% (video). For context on what these tests do and don’t measure, see this primer on AI evals (guide).

There’s also a claim of a new SWE-bench Pro record at 84.95% (post). It isn’t independently verified yet, and it is on a different track, so it is not directly comparable to the SWE-bench Verified scores above.

[ WHY_IT_MATTERS ]
01.

Open weights approaching top scores could cut inference cost and vendor lock-in for code agents.

02.

A few points on SWE-bench can mean many more tickets closed per sprint in production, provided the benchmark gains carry over to your codebase.
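
To put rough numbers on the second point, here is a back-of-the-envelope sketch. It assumes, optimistically, that benchmark resolve rates transfer one-to-one to your own tickets, which they rarely do. The volume of 500 tickets is a hypothetical figure; only the two scores come from the text above.

```python
def extra_tickets_resolved(rate_low: float, rate_high: float, tickets: int) -> float:
    """Expected difference in resolved tickets between two resolve rates."""
    return (rate_high - rate_low) * tickets

# Scores quoted above: Qwen 3.5 397B and Claude Opus 4.6 on SWE-bench Verified.
qwen_rate, opus_rate = 0.764, 0.808

# ASSUMPTION: 500 agent-attempted tickets is an illustrative volume, not a measured one.
delta = extra_tickets_resolved(qwen_rate, opus_rate, tickets=500)
print(f"about {delta:.0f} more tickets resolved per 500 attempted")
```

A 4.4-point gap is small on a leaderboard but adds up at volume, so it is worth weighing against the per-ticket cost difference rather than reading the score alone.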

[ WHAT_TO_TEST ]
  • 01.

    A/B your PR-fix agent: Qwen 3.5 vs current model on recent real bugs; log pass@1, revert rate, latency, and $/bug fixed.

  • 02.

    Replay a week of flaky tests and small bug reports; measure compile/test pass rate and human follow-up time per patch.
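
The metrics named in the A/B test can be computed from a flat log of agent runs. The sketch below is one possible shape, not a prescribed schema: the `Run` fields, the model labels, and the choice to define revert rate as "reverted patches over patches that passed" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Run:
    model: str        # arm label, e.g. "qwen" or "current" (hypothetical names)
    passed: bool      # first attempt passed the test suite (pass@1)
    reverted: bool    # patch was merged and later reverted
    latency_s: float  # wall-clock time for the attempt
    cost_usd: float   # inference cost for the attempt

def summarize(runs: list[Run]) -> dict[str, dict[str, float]]:
    """Aggregate per-model pass@1, revert rate, mean latency, and $/bug fixed."""
    by_model: dict[str, list[Run]] = {}
    for r in runs:
        by_model.setdefault(r.model, []).append(r)

    report: dict[str, dict[str, float]] = {}
    for model, rs in by_model.items():
        passed = sum(r.passed for r in rs)
        # A bug counts as fixed only if the patch passed and was not reverted.
        fixed = sum(r.passed and not r.reverted for r in rs)
        total_cost = sum(r.cost_usd for r in rs)
        report[model] = {
            "pass_at_1": passed / len(rs),
            "revert_rate": sum(r.reverted for r in rs) / passed if passed else 0.0,
            "mean_latency_s": mean(r.latency_s for r in rs),
            # Failed attempts still cost money, so divide total spend by fixes.
            "usd_per_bug_fixed": total_cost / fixed if fixed else float("inf"),
        }
    return report

runs = [
    Run("qwen", True, False, 10.0, 0.02),
    Run("qwen", False, False, 20.0, 0.02),
    Run("current", True, True, 5.0, 0.10),
    Run("current", True, False, 5.0, 0.10),
]
report = summarize(runs)
```

Charging failed attempts to the cost of each fix is deliberate: a cheaper model that needs more retries can still lose on $/bug fixed.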

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

  • 01.

    Canary Qwen 3.5 behind a feature flag in CI for low-risk repos; enforce timeouts and token caps.

  • 02.

    Validate tool-use compatibility (function/tool call format, repo context size) before expanding traffic.
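
A minimal sketch of both steps, under stated assumptions: the config fields, default caps, and the JSON tool-call shape (`name` plus an `arguments` object) are hypothetical and should be replaced with whatever your CI and serving stack actually use.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class CanaryConfig:
    enabled: bool = True                          # the feature flag
    percent: int = 10                             # share of low-risk repos routed to the canary
    low_risk_repos: frozenset[str] = frozenset()  # allow-list of eligible repos
    timeout_s: float = 120.0                      # ASSUMPTION: illustrative request timeout
    max_output_tokens: int = 4096                 # ASSUMPTION: illustrative token cap

def use_canary(repo: str, cfg: CanaryConfig) -> bool:
    """Deterministically route a fixed share of allow-listed repos to the canary."""
    if not cfg.enabled or repo not in cfg.low_risk_repos:
        return False
    # Hash the repo name so the same repo always lands in the same bucket.
    bucket = int(hashlib.sha256(repo.encode()).hexdigest(), 16) % 100
    return bucket < cfg.percent

def tool_call_ok(raw: str, known_tools: set[str]) -> bool:
    """Check that a model's tool call parses and names a tool the harness knows."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(call, dict)
        and isinstance(call.get("name"), str)
        and call["name"] in known_tools
        and isinstance(call.get("arguments"), dict)
    )

cfg = CanaryConfig(percent=100, low_risk_repos=frozenset({"docs-site"}))
routed = use_canary("docs-site", cfg)
```

Running `tool_call_ok` over a sample of recorded model outputs before raising `percent` gives an early signal on format mismatches without touching production traffic.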

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

  • 01.

    Design a model-agnostic agent harness (prompts, tools, evals) so you can swap between open and closed models.

  • 02.

    Start with open weights for cost control; reserve closed models for hard cases flagged by an auto-reranker.
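
One way to sketch the two ideas together: a small interface that any model can implement, plus an escalation rule. All names here are hypothetical, the stub stands in for real model clients, and the `accept` callable is a placeholder for the auto-reranker or a test gate.

```python
from dataclasses import dataclass
from typing import Callable, Protocol

class CodeModel(Protocol):
    """Anything the harness can ask for a patch, open or closed."""
    name: str
    def propose_patch(self, issue: str) -> str: ...

@dataclass
class StubModel:
    """Stand-in for a real client; returns a canned reply."""
    name: str
    reply: str
    def propose_patch(self, issue: str) -> str:
        return self.reply

def fix_with_escalation(
    issue: str,
    primary: CodeModel,
    fallback: CodeModel,
    accept: Callable[[str], bool],
) -> tuple[str, str]:
    """Try the primary model first; escalate only if its patch is rejected."""
    patch = primary.propose_patch(issue)
    if accept(patch):
        return primary.name, patch
    return fallback.name, fallback.propose_patch(issue)

open_model = StubModel("open", "")       # simulates an empty, unusable patch
closed_model = StubModel("closed", "diff --git a/app.py b/app.py")
used, patch = fix_with_escalation(
    "NullPointer in login handler",
    open_model,
    closed_model,
    accept=lambda p: bool(p.strip()),    # placeholder gate: any non-empty patch passes
)
```

Because the harness depends only on `propose_patch`, swapping either arm is a one-line change, and logging `used` per ticket shows how often the expensive path is actually taken.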
