AI CODING AGENTS PASS TESTS BUT MISS THE SPEC: TIGHTEN REVIEWS AND TESTING NOW
New research shows AI coding agents often look right in tests but get requirements wrong, so teams need to change how they review and test AI-written code.
An ACM-linked report covered by The New Stack found that agents “do not understand”: they pass unit tests while failing business intent and edge-case behavior, and the gaps are systemic rather than one-offs.
Simon Willison warns “vibe coding” is bleeding into “agentic engineering,” which raises quality, security, and ops stakes for anything user-facing.
The DEV write-up on the Three Inverse Laws of AI shows how AI can mask comprehension gaps in code, tests, and ops, and argues for domain-focused reviews and spec-driven tests over coverage theatre.
Spec compliance, not test coverage, is the failure point for AI-authored code.
Backend/data teams risk subtle billing, concurrency, and schema errors that pass unit tests but break prod.
- Run spec and contract tests (property-based, mutation, golden-file) against an agent-authored module, then compare what they catch with what your current unit tests catch.
- Red-team an AI-generated service with load, chaos, and fault injection to surface race conditions and timeout cascades.
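As a rough illustration of the first experiment, the sketch below contrasts an example-based unit test with a hand-rolled property check, using only the Python standard library. The billing function names (`split_naive`, `split_exact`) and the spec itself (shares must sum to the total and differ by at most one cent) are assumptions made up for this example; a dedicated property-testing library would normally handle input generation and shrinking.

```python
import random


def split_naive(total_cents: int, parts: int) -> list[int]:
    """Hypothetical agent-written version: fine on round numbers, drops remainder cents."""
    return [total_cents // parts] * parts


def split_exact(total_cents: int, parts: int) -> list[int]:
    """Hypothetical corrected version: spreads the remainder so no cent is lost."""
    base, extra = divmod(total_cents, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


def holds_spec(split, trials: int = 500, seed: int = 0) -> bool:
    """Assumed spec: shares always sum to the total and differ by at most one cent."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        total = rng.randint(0, 1_000_000)
        parts = rng.randint(1, 12)
        shares = split(total, parts)
        if sum(shares) != total or max(shares) - min(shares) > 1:
            return False
    return True


# The example-based unit test passes for both implementations...
assert split_naive(100, 4) == [25, 25, 25, 25]
assert split_exact(100, 4) == [25, 25, 25, 25]

# ...but only the property check separates them.
print("naive meets spec:", holds_spec(split_naive))
print("exact meets spec:", holds_spec(split_exact))
```

The point of the comparison is that the single hand-picked example gives both versions a green result, while the property states the requirement itself and so exposes the version that only looks right.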
Legacy codebase integration strategies...
- 01. Gate AI-authored changes behind domain review checklists and contract tests before merge; tag and track AI-origin code paths.
- 02. Instrument AI-generated services with explicit SLOs, idempotency, and retries; add incident runbooks assuming the agent is unavailable.
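To make the idempotency-and-retries item concrete, here is a minimal sketch assuming an in-memory store and a simulated lost response. `PaymentLedger`, `with_retries`, and the key `"order-42"` are hypothetical names for illustration only; a real service would persist idempotency keys durably and tune the backoff.

```python
import time
from typing import Callable


class PaymentLedger:
    """Hypothetical in-memory service; a real one would store keys durably."""

    def __init__(self) -> None:
        self.charges: dict[str, int] = {}  # idempotency key -> amount charged
        self.calls = 0

    def charge(self, key: str, amount_cents: int) -> int:
        self.calls += 1
        if key in self.charges:
            # Replayed request: return the earlier result, do not charge again.
            return self.charges[key]
        self.charges[key] = amount_cents
        if self.calls == 1:
            # Simulate a response that is lost after the write succeeded.
            raise TimeoutError("response lost")
        return amount_cents


def with_retries(fn: Callable[[], int], attempts: int = 3, base_delay: float = 0.01) -> int:
    """Retry on timeout with exponential backoff; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


ledger = PaymentLedger()
result = with_retries(lambda: ledger.charge("order-42", 1999))
print(result, ledger.calls, sum(ledger.charges.values()))
```

Without the key lookup, the retry after the lost response would record a second charge; that is the kind of defect a unit test of the happy path would not reveal.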
Fresh architecture paradigms...
- 01. Adopt spec-first APIs (OpenAPI/AsyncAPI) and generate tests from the spec; treat unit coverage as secondary.
- 02. Design smaller services with strict contracts and observability baked in so AI assistance stays inside safe boundaries.
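One way to picture tests generated from a spec is sketched below. It assumes a heavily simplified response schema written as a Python dict in the style of an OpenAPI schema object, and a toy checker that understands only `type`, `required`, and `properties`. The schema, the `violations` helper, and `agent_handler` are all hypothetical; a real project would load the actual spec file and use a full schema validator.

```python
# Assumed, simplified response schema in the style of an OpenAPI schema object.
SCHEMA = {
    "type": "object",
    "required": ["id", "amount_cents", "currency"],
    "properties": {
        "id": {"type": "string"},
        "amount_cents": {"type": "integer"},
        "currency": {"type": "string"},
    },
}

PY_TYPES = {"string": str, "integer": int, "object": dict}


def violations(payload, schema) -> list[str]:
    """Return contract violations; supports only type/required/properties."""
    if not isinstance(payload, PY_TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    errors = []
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, sub in schema.get("properties", {}).items():
        if name not in payload:
            continue
        value = payload[name]
        ok = isinstance(value, PY_TYPES[sub["type"]])
        # bool is a subclass of int in Python, so reject it for integer fields.
        if sub["type"] == "integer" and isinstance(value, bool):
            ok = False
        if not ok:
            errors.append(f"{name}: expected {sub['type']}")
    return errors


def agent_handler() -> dict:
    """Hypothetical agent-written handler: float amount, no currency field."""
    return {"id": "inv_1", "amount_cents": 19.99}


for problem in violations(agent_handler(), SCHEMA):
    print(problem)
```

Because the checks are derived from the schema rather than written by hand, a handler that drifts from the contract fails even when its own unit tests were generated alongside it.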