QWEN-35 PUB_DATE: 2026.06.29

OPEN QWEN 3.5 NARROWS THE SWE-BENCH GAP WITH CLOSED MODELS

Open Qwen 3.5 is closing the SWE-bench gap with top closed models, which could change your code-agent cost math.

Per a public benchmark note, Qwen 3.5 397B posts 76.4% on SWE-bench Verified, with open weights and a 256K context window (source). That puts it within striking distance of top proprietary scores on real bug-fix tasks.

A short breakdown (video) argues that small percentage deltas translate into noticeable production impact, citing Claude Opus 4.6 at 80.8% and Gemini 3 Flash at 78%. For context on what these tests do and don’t measure, here’s a clear primer on AI evals (guide).

There’s also a claim of a new SWE-bench Pro record at 84.95% (post). It isn’t independently verified yet, and it uses a different track, so it isn’t directly comparable to the SWE-bench Verified scores.

[ WHY_IT_MATTERS ]

01.

Open weights approaching top scores could cut inference cost and vendor lock-in for code agents.

02.

A few points on SWE-bench can mean noticeably more tickets closed per sprint in production.
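
To put rough numbers on that, here is a back-of-envelope sketch. It assumes, optimistically, that a benchmark resolve rate carries over unchanged to your own ticket queue, which it usually won't. The scores are the ones quoted above; the function names and the batch size are illustrative.

```python
# Scores quoted in this article (SWE-bench Verified resolve rates).
SCORES = {
    "Qwen 3.5 397B": 0.764,
    "Gemini 3 Flash": 0.78,
    "Claude Opus 4.6": 0.808,
}


def expected_resolved(resolve_rate: float, tickets: int) -> float:
    """Expected tickets resolved IF the benchmark rate held in production."""
    return resolve_rate * tickets


def gap_per_batch(tickets: int, baseline: str = "Qwen 3.5 397B") -> dict[str, float]:
    """Extra tickets each other model would resolve versus the baseline."""
    base = expected_resolved(SCORES[baseline], tickets)
    return {
        name: round(expected_resolved(rate, tickets) - base, 1)
        for name, rate in SCORES.items()
        if name != baseline
    }


if __name__ == "__main__":
    # 500 is an arbitrary batch size chosen for illustration.
    for name, extra in gap_per_batch(500).items():
        print(f"{name}: {extra:+.1f} tickets per 500 vs Qwen 3.5 397B")
```

Under that assumption, a 4.4-point gap works out to about 22 extra resolved tickets per 500, and a 1.6-point gap to about 8. Whether that difference is worth the price gap is what the tests below are for.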

[ WHAT_TO_TEST ]

01.
A/B your PR-fix agent: Qwen 3.5 vs. your current model on recent real bugs; log pass@1, revert rate, latency, and $/bug fixed.
02.
Replay a week of flaky tests and small bug reports; measure compile/test pass rate and human follow-up time per patch.
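
One way to keep the A/B honest is to log every attempt in the same shape and compute the four numbers from that log. A minimal sketch follows; the `Attempt` fields and the metric definitions are assumptions (for example, revert rate is taken here as the share of first-try passes that were later reverted), so adapt them to how your team counts.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One agent attempt at one bug (hypothetical log record)."""
    model: str
    passed_first_try: bool  # tests passed on the first generated patch
    reverted: bool          # the patch was merged and later reverted
    latency_s: float
    cost_usd: float


def summarize(attempts: list[Attempt]) -> dict[str, dict[str, float]]:
    """Per-model pass@1, revert rate, mean latency, and $/bug fixed."""
    by_model: dict[str, list[Attempt]] = {}
    for attempt in attempts:
        by_model.setdefault(attempt.model, []).append(attempt)

    report: dict[str, dict[str, float]] = {}
    for model, rows in by_model.items():
        passed = sum(r.passed_first_try for r in rows)
        # "Fixed" = passed on first try and was not reverted afterwards.
        fixed = sum(r.passed_first_try and not r.reverted for r in rows)
        total_cost = sum(r.cost_usd for r in rows)
        report[model] = {
            "pass_at_1": passed / len(rows),
            "revert_rate": sum(r.reverted for r in rows) / passed if passed else 0.0,
            "mean_latency_s": sum(r.latency_s for r in rows) / len(rows),
            # Failed attempts still cost money, so divide ALL spend by fixes.
            "usd_per_bug_fixed": total_cost / fixed if fixed else float("inf"),
        }
    return report
```

Dividing total spend by bugs actually fixed, rather than by attempts, is the design choice that matters: a cheap model that fails often can still come out more expensive per fix.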

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

01.
Canary Qwen 3.5 behind a feature flag in CI for low-risk repos; enforce timeouts and token caps.
02.
Validate tool-use compatibility (function/tool call format, repo context size) before expanding traffic.
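
A canary gate along these lines might look like the sketch below. Everything named here is hypothetical: the allowlist, the percentage flag, and the `agent` callable stand in for whatever your CI and model client provide, and the whitespace-based token count is only a crude proxy for a real tokenizer.

```python
import concurrent.futures as cf
import hashlib
from typing import Callable, Optional


def in_canary(repo: str, job_id: str, percent: int, allowlist: set[str]) -> bool:
    """Route a stable slice of jobs from low-risk repos to the canary model."""
    if repo not in allowlist:
        return False
    # Hash the job id so the same job always lands in the same bucket.
    bucket = int(hashlib.sha256(job_id.encode()).hexdigest(), 16) % 100
    return bucket < percent


def approx_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words, not real model tokens."""
    return len(text.split())


def run_with_limits(
    agent: Callable[[str], str],
    prompt: str,
    max_tokens: int,
    timeout_s: float,
) -> Optional[str]:
    """Call the agent under a token cap and a wall-clock timeout.

    Returns None when a limit is hit so the caller can fall back to the
    current model.
    """
    if approx_tokens(prompt) > max_tokens:
        return None
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(agent, prompt)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        return None
    finally:
        # Don't block CI on a stuck call. The worker thread itself is not
        # killed, so the underlying client should also set its own timeout.
        pool.shutdown(wait=False)
```

Returning `None` instead of raising keeps the fallback path trivial: if the canary declines or times out, the job runs on the existing model as before.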

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

01.
Design a model-agnostic agent harness (prompts, tools, evals) so you can swap between open and closed models.
02.
Start with open weights for cost control; reserve closed models for hard cases flagged by an auto-reranker.
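
The two points above can be combined in a small harness: models sit behind one interface, and a scoring function decides when to escalate. This is a sketch under assumptions, not a reference design; `CodeModel`, `Harness`, and the `score` callable (standing in for the auto-reranker) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class CodeModel(Protocol):
    """Anything that can turn a task description into a patch."""
    name: str

    def propose_patch(self, task: str) -> str: ...


@dataclass
class Harness:
    primary: CodeModel                  # e.g. an open-weights model
    fallback: CodeModel                 # e.g. a closed model for hard cases
    score: Callable[[str, str], float]  # hypothetical reranker: (task, patch) -> confidence
    threshold: float = 0.5

    def solve(self, task: str) -> tuple[str, str]:
        """Return (model name, patch), escalating on low confidence."""
        patch = self.primary.propose_patch(task)
        if self.score(task, patch) >= self.threshold:
            return self.primary.name, patch
        # Low confidence: treat as a hard case and escalate.
        return self.fallback.name, self.fallback.propose_patch(task)
```

Because the harness only depends on `propose_patch`, swapping either model is a one-line change, and the same eval suite can run against any combination.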
