THE-NEW-STACK PUB_DATE: 2026.05.07

AI CODING AGENTS PASS TESTS BUT MISS THE SPEC: TIGHTEN REVIEWS AND TESTING NOW

New research shows AI coding agents often look right in tests but get requirements wrong, so teams need to change how they review and test AI-written code.

An ACM-linked report covered by The New Stack found that agents “do not understand”: they pass unit tests while missing business intent and edge-case behavior, and the gaps are systemic rather than one-offs.

Simon Willison warns that “vibe coding” is bleeding into “agentic engineering,” raising the quality, security, and operational stakes for anything user-facing.

The DEV write-up on the “Three Inverse Laws of AI” shows how AI can mask comprehension gaps in code, tests, and operations, and argues for domain-focused reviews and spec-driven tests over coverage theatre.

[ WHY_IT_MATTERS ]

01.

Spec compliance, not test coverage, is the failure point for AI-authored code.

02.

Backend/data teams risk subtle billing, concurrency, and schema errors that pass unit tests but break prod.
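A minimal sketch of how a billing error slips past a unit test. The helpers and the "shares must sum to the total" rule are hypothetical and do not come from the report. A happy-path unit test passes, while checking the spec's invariant exposes a lost cent.

```python
def split_naive(total_cents: int, parts: int) -> list[int]:
    # Hypothetical agent-written helper: floor division drops the remainder.
    share = total_cents // parts
    return [share] * parts


def split_to_spec(total_cents: int, parts: int) -> list[int]:
    # Assumed spec: shares sum exactly to the total and differ by at most one cent.
    base, remainder = divmod(total_cents, parts)
    return [base + 1 if i < remainder else base for i in range(parts)]


# The usual unit test picks an evenly divisible amount, so both versions pass.
assert split_naive(900, 3) == [300, 300, 300]
assert split_to_spec(900, 3) == [300, 300, 300]

# Checking the spec's invariant on an uneven amount exposes the naive version.
print(sum(split_naive(1000, 3)))    # 999: one cent silently lost
print(sum(split_to_spec(1000, 3)))  # 1000
```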

[ WHAT_TO_TEST ]

01.
Run spec and contract tests (property-based, mutation, golden files) against a module written by an agent, and compare what they catch with what your current unit tests catch.
02.
Red-team an AI-generated service with load tests, chaos experiments, and fault injection to surface race conditions and timeout cascades.
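The first item can be sketched with the standard library alone: a property-based spec test plus one hand-made mutant. The split function, its invariants, and the mutant are all hypothetical. A library such as Hypothesis would generate and shrink inputs for you. The check asserts spec invariants over random inputs rather than fixed examples, and a useful suite should reject the mutant.

```python
import random


def split_to_spec(total_cents: int, parts: int) -> list[int]:
    # Hypothetical module under test: shares must sum to the total
    # and differ from each other by at most one cent.
    base, remainder = divmod(total_cents, parts)
    return [base + 1 if i < remainder else base for i in range(parts)]


def split_mutant(total_cents: int, parts: int) -> list[int]:
    # Hand-made mutant: remainder handling deleted, as a mutation tool might do.
    base, _ = divmod(total_cents, parts)
    return [base] * parts


def violates_spec(split, trials: int = 500, seed: int = 0) -> bool:
    # Property-based check: random inputs, spec invariants instead of fixed examples.
    rng = random.Random(seed)
    for _ in range(trials):
        total = rng.randint(0, 1_000_000)
        parts = rng.randint(1, 50)
        shares = split(total, parts)
        if len(shares) != parts or sum(shares) != total:
            return True
        if min(shares) < 0 or max(shares) - min(shares) > 1:
            return True
    return False


print(violates_spec(split_to_spec))  # False: implementation meets the spec
print(violates_spec(split_mutant))   # True: the property test "kills" the mutant
```

A mutation-testing tool automates the second half by generating many such mutants. A mutant that survives points at a spec rule that no test enforces.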

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

01.
Gate AI-authored changes behind domain review checklists and contract tests before merge; tag and track AI-origin code paths.
02.
Give AI-generated services explicit SLOs, idempotent operations, and retries; write incident runbooks that assume the agent is unavailable.
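The sketch below shows why idempotency and retries belong together. It uses a scripted fault that drops the response after the server has already charged. The service, the client wrapper, and the idempotency-key scheme are hypothetical. The client retries three times, yet the customer is charged once.

```python
class FlakyNetwork(Exception):
    """Injected fault: the caller cannot tell whether the request landed."""


class PaymentService:
    # Hypothetical service that stores one result per idempotency key.
    def __init__(self) -> None:
        self.results: dict[str, int] = {}
        self.charges: list[int] = []

    def charge(self, key: str, amount_cents: int) -> int:
        if key in self.results:  # replayed request: return the stored result
            return self.results[key]
        self.charges.append(amount_cents)
        self.results[key] = len(self.charges)  # charge id
        return self.results[key]


class FaultInjectingClient:
    # Drops the response on scripted attempts, simulating a timeout
    # that happens after the server has already done the work.
    def __init__(self, service: PaymentService, fail_on_attempts: set[int]) -> None:
        self.service = service
        self.fail_on_attempts = fail_on_attempts
        self.attempt = 0

    def charge(self, key: str, amount_cents: int) -> int:
        self.attempt += 1
        result = self.service.charge(key, amount_cents)
        if self.attempt in self.fail_on_attempts:
            raise FlakyNetwork(f"response lost on attempt {self.attempt}")
        return result


def charge_with_retries(client, key: str, amount_cents: int, max_attempts: int = 3) -> int:
    # Retry only on the transport fault, always reusing the same idempotency key.
    for attempt in range(1, max_attempts + 1):
        try:
            return client.charge(key, amount_cents)
        except FlakyNetwork:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


service = PaymentService()
client = FaultInjectingClient(service, fail_on_attempts={1, 2})
charge_id = charge_with_retries(client, key="order-42", amount_cents=1999)
print(charge_id, service.charges)  # 1 [1999]: three attempts, one charge
```

If the service did not store a result per key, the same fault script would record three charges. A happy-path unit test never exercises that path.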

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

01.
Adopt spec-first APIs (OpenAPI/AsyncAPI) and generate tests from the spec; treat unit coverage as secondary.
02.
Design smaller services with strict contracts and observability baked in so AI assistance stays inside safe boundaries.
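The first item can be illustrated by deriving negative test cases from a schema instead of hand-picking examples. The Invoice schema is hypothetical and written in the shape of an OpenAPI 3 schema object. The tiny validator covers only `type`, `required`, `enum`, and `minimum`. It illustrates the idea and does not replace a real JSON Schema validator or an OpenAPI test generator such as Schemathesis.

```python
import copy

# Hypothetical "Invoice" schema, shaped like an OpenAPI 3 schema object.
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["id", "total_cents", "status"],
    "properties": {
        "id": {"type": "string"},
        "total_cents": {"type": "integer", "minimum": 0},
        "status": {"type": "string", "enum": ["draft", "open", "paid"]},
    },
}

TYPES = {"string": str, "integer": int, "object": dict}


def validate(instance, schema) -> list[str]:
    # Minimal validator for the subset used above: type, required, enum, minimum.
    expected = TYPES[schema["type"]]
    if not isinstance(instance, expected) or isinstance(instance, bool):
        return [f"expected {schema['type']}, got {type(instance).__name__}"]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{instance!r} not in {schema['enum']}")
    if "minimum" in schema and instance < schema["minimum"]:
        errors.append(f"{instance} < minimum {schema['minimum']}")
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required field {name!r}")
    for name, sub in schema.get("properties", {}).items():
        if name in instance:
            errors += [f"{name}: {e}" for e in validate(instance[name], sub)]
    return errors


def negative_cases(valid_example: dict, schema: dict):
    # Derive one invalid payload per rule stated in the spec.
    for name in schema["required"]:
        bad = copy.deepcopy(valid_example)
        del bad[name]
        yield f"missing {name}", bad
    for name, sub in schema["properties"].items():
        if "enum" in sub:
            bad = copy.deepcopy(valid_example)
            bad[name] = "__not_in_enum__"
            yield f"bad enum {name}", bad
        if "minimum" in sub:
            bad = copy.deepcopy(valid_example)
            bad[name] = sub["minimum"] - 1
            yield f"below minimum {name}", bad


valid = {"id": "inv_1", "total_cents": 1000, "status": "open"}
assert validate(valid, INVOICE_SCHEMA) == []
for label, payload in negative_cases(valid, INVOICE_SCHEMA):
    assert validate(payload, INVOICE_SCHEMA), label  # every derived case must be rejected
```

Because the cases come from the spec, adding a rule to the schema adds a test automatically. Unit coverage of the handler says nothing about whether those rules hold.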
