From vibe coding to agentic engineering: test-first orchestration

OPENAI PUB_DATE: 2026.02.24

Engineering teams are shifting from vibe coding to disciplined agentic engineering, which treats AI agents as test-driven collaborators and demands spec-first oversight.

In a concise critique of “prompt DJ” development, Roger Wong summarizes Addy Osmani’s call for agentic engineering, in which engineers orchestrate coding agents, act as architects and reviewers, and enforce spec-first discipline instead of accepting whatever the model returns.

Simon Willison’s “First run the tests” pattern operationalizes this by making a test suite the entry point for any agent, turning TDD into a four‑word prompt and letting agents learn a codebase through its tests.
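One way to picture the pattern is a small wrapper that runs the suite, records a baseline, and puts the four-word instruction ahead of the task. This is a minimal sketch, not Willison's implementation: the names `run_suite`, `build_prompt`, and `SuiteResult` are hypothetical, and the test command is whatever your repository already uses.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional

# The four-word instruction quoted in the article.
FIRST_INSTRUCTION = "First run the tests"


@dataclass
class SuiteResult:
    command: List[str]
    returncode: int
    output: str

    @property
    def passed(self) -> bool:
        # Convention assumed here: exit code 0 means the suite passed.
        return self.returncode == 0


def run_suite(command: List[str], timeout_s: int = 600) -> SuiteResult:
    """Run the test command and capture stdout and stderr together."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout_s,
    )
    return SuiteResult(command, proc.returncode, proc.stdout)


def build_prompt(task: str, baseline: Optional[SuiteResult] = None) -> str:
    """Put the test-first instruction ahead of the task description."""
    parts = [FIRST_INSTRUCTION + ".", task.strip()]
    if baseline is not None:
        status = "passing" if baseline.passed else "failing"
        parts.append(
            "Baseline before any changes: "
            + " ".join(baseline.command)
            + " is " + status + "."
        )
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Stand-in for a real suite; swap in your own test command.
    baseline = run_suite([sys.executable, "-c", "print('2 passed')"])
    print(build_prompt("Add pagination to the orders endpoint.", baseline))
```

Recording the baseline before the agent touches anything makes it easier to tell a pre-existing failure from a regression the agent introduced.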

Hands-on workflows show how to scale this in practice, ranging from a complete greenfield agentic setup to advanced agent-team setups that compare Claude Code and Codex. Case studies such as DumbQuestion.ai underline the need for structured backlogs and cost-aware multi-model choices.

[ WHY_IT_MATTERS ]

01. Spec-first, test-led agent orchestration cuts rework and reduces regressions from AI-generated changes.

02. Without rigor and mentorship, “vibe coding” erodes code quality and weakens the talent pipeline.

[ WHAT_TO_TEST ]

Bake “First run the tests” into agent prompts and verify agents can execute the suite locally and in CI before proposing diffs.
Evaluate multi-model agent routing for code tasks (quality, latency, cost) against your repos and enforce guardrails for token/JWT handling.
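The routing evaluation can be sketched as a small scoring harness over recorded trials. Everything here is an assumption for illustration: the `Trial` fields, the model labels, and the thresholds are hypothetical, and "quality" is reduced to whether the repository's tests passed after the agent's change.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Trial:
    model: str          # hypothetical label such as "model-a"
    tests_passed: bool  # quality proxy: did the suite pass after the change?
    latency_s: float
    cost_usd: float


def summarize(trials: List[Trial]) -> Dict[str, Dict[str, float]]:
    """Aggregate pass rate, median latency, and mean cost per model."""
    by_model: Dict[str, List[Trial]] = {}
    for t in trials:
        by_model.setdefault(t.model, []).append(t)
    return {
        model: {
            "pass_rate": sum(1 for t in ts if t.tests_passed) / len(ts),
            "median_latency_s": median([t.latency_s for t in ts]),
            "mean_cost_usd": mean([t.cost_usd for t in ts]),
        }
        for model, ts in by_model.items()
    }


def pick_model(
    summary: Dict[str, Dict[str, float]],
    min_pass_rate: float = 0.8,
    max_latency_s: float = 120.0,
) -> Optional[str]:
    """Return the cheapest model that clears the quality and latency bars."""
    eligible = [
        m for m, s in summary.items()
        if s["pass_rate"] >= min_pass_rate
        and s["median_latency_s"] <= max_latency_s
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda m: summary[m]["mean_cost_usd"])


if __name__ == "__main__":
    trials = [
        Trial("model-a", True, 40.0, 0.50),
        Trial("model-a", True, 60.0, 0.70),
        Trial("model-b", True, 15.0, 0.10),
        Trial("model-b", False, 15.0, 0.10),
    ]
    print(pick_model(summarize(trials)))
```

Treating quality and latency as hard thresholds and cost as the tiebreaker is one design choice among several; a weighted score would work too, but thresholds are easier to explain in a review.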

[ BROWNFIELD_PERSPECTIVE ]

01. Expose a standard test command in each service, enable agents to run it in CI, and require human review until test coverage stabilizes.

02. Start with low-risk services, audit agent JWT/token flows, and pair juniors with seniors to review agent output and error patterns.
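The first item above can be sketched as a command registry plus a review gate. The service names, the `make test` entry point, and the definition of "stable" coverage (a recent window that stays above a floor and within a narrow band) are all assumptions made for illustration; the article does not define them.

```python
from typing import Dict, List, Sequence

# Hypothetical registry: every service exposes the same entry point.
SERVICE_TEST_COMMANDS: Dict[str, List[str]] = {
    "billing": ["make", "test"],
    "search": ["make", "test"],
}


def command_for(service: str) -> List[str]:
    """Look up the standard test command an agent may run for a service."""
    try:
        return SERVICE_TEST_COMMANDS[service]
    except KeyError:
        raise ValueError(f"{service!r} has no registered test command") from None


def coverage_is_stable(
    history: Sequence[float],
    window: int = 5,
    tolerance: float = 1.0,
    floor: float = 70.0,
) -> bool:
    """Assumed rule: the last `window` coverage readings (in percent) all
    sit at or above `floor` and differ by at most `tolerance` points."""
    if len(history) < window:
        return False
    recent = list(history[-window:])
    return min(recent) >= floor and (max(recent) - min(recent)) <= tolerance


def requires_human_review(history: Sequence[float]) -> bool:
    """Keep a human in the loop until coverage has stabilized."""
    return not coverage_is_stable(history)


if __name__ == "__main__":
    readings = [55.0, 62.0, 71.5, 72.0, 71.8, 72.1, 72.3]
    print(command_for("billing"), requires_human_review(readings))
```

The thresholds are placeholders; the useful part is that the gate is explicit and versioned rather than left to each reviewer's judgment.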

[ GREENFIELD_PERSPECTIVE ]

01. Define architecture and backlog first, scaffold tests up front, and codify a minimal agent playbook (spec, tests, implement, verify).

02. Design least-privilege agent auth with JWT best practices and instrument cost/latency metrics to guide model selection.
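The minimal playbook named above (spec, tests, implement, verify) can be codified as an ordered checklist that refuses out-of-order steps. This is a sketch under assumptions: the `Playbook` class, the `evidence` string, and the error type are hypothetical names, not part of any agent framework.

```python
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    SPEC = 1
    TESTS = 2
    IMPLEMENT = 3
    VERIFY = 4


class PlaybookError(RuntimeError):
    """Raised when a stage is attempted out of order or without evidence."""


class Playbook:
    def __init__(self, task: str) -> None:
        self.task = task
        self.completed: List[Stage] = []

    @property
    def next_stage(self) -> Optional[Stage]:
        # Stages are iterated in definition order: SPEC, TESTS, IMPLEMENT, VERIFY.
        for stage in Stage:
            if stage not in self.completed:
                return stage
        return None

    @property
    def done(self) -> bool:
        return self.next_stage is None

    def complete(self, stage: Stage, evidence: str) -> None:
        """Mark a stage done; `evidence` is a link or note a reviewer can check."""
        expected = self.next_stage
        if expected is None:
            raise PlaybookError("all stages are already complete")
        if stage != expected:
            raise PlaybookError(
                f"expected {expected.name}, got {stage.name}"
            )
        if not evidence.strip():
            raise PlaybookError(f"{stage.name} needs reviewable evidence")
        self.completed.append(stage)


if __name__ == "__main__":
    pb = Playbook("Add pagination to the orders endpoint")
    pb.complete(Stage.SPEC, "spec written and approved")
    pb.complete(Stage.TESTS, "failing tests committed")
    pb.complete(Stage.IMPLEMENT, "agent diff attached")
    pb.complete(Stage.VERIFY, "suite green in CI")
    print(pb.done)
```

Requiring evidence at each stage keeps the human reviewer's role concrete: there is always something to inspect before the next stage unlocks.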
