Graph-structured dependency navigation fixes missed-file failures in repo-scale coding agents

CODECOMPASS PUB_DATE: 2026.02.24

New results show that wiring coding agents to traverse a code dependency graph outperforms expanding context or keyword/vector retrieval on architecture-heavy tasks where critical files are semantically distant.

An arXiv study introduces the Navigation Paradox: as context windows grow, failures shift from retrieval capacity to navigational salience. It presents CodeCompass, an MCP-based graph tool that exposes IMPORTS/INHERITS/INSTANTIATES edges during agent runs with Claude Code. On a FastAPI RealWorld benchmark, BM25 hits 100% on semantic (G1) tasks but gives essentially no lift on hidden-dependency (G3) tasks (78.2% vs. the 76.2% baseline), while CodeCompass reaches 99.4% ACS on G3, a 23.2-point jump over the baseline and 21.2 points over BM25 (paper, code/benchmark).
Crucially, the benefit depends on tool invocation: trials that actually used the graph (42%) averaged 99.5% ACS, while those that skipped it despite instructions scored 80.2%, indistinguishable from vanilla. Prompt design and agent policies must therefore reliably trigger graph consultation.
For teams piloting repo-level agents, treat structural navigation as a first-class capability: generate a per-repo AST-derived dependency graph, expose it via MCP, and enforce early graph lookups when touching modules with broad non-local impact. The author also shares a practitioner-friendly narrative on Medium about why assistants miss critical files.
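To make the "AST-derived dependency graph" step concrete, here is a minimal sketch using only Python's standard `ast` module. It is not the CodeCompass implementation: the function names (`extract_edges`, `build_graph`) and the edge-tuple shape are assumptions made for illustration, and it covers only IMPORTS and INHERITS edges (INSTANTIATES would need call-site resolution, which is omitted here).

```python
import ast
from typing import Dict, List, Tuple

# Hypothetical edge shape: (source_module, edge_type, target_name).
Edge = Tuple[str, str, str]


def extract_edges(module_name: str, source: str) -> List[Edge]:
    """Collect IMPORTS and INHERITS edges from one module's source text."""
    edges: List[Edge] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                edges.append((module_name, "IMPORTS", alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Bare relative imports ("from . import x") have no module
            # name and are skipped in this sketch.
            edges.append((module_name, "IMPORTS", node.module))
        elif isinstance(node, ast.ClassDef):
            for base in node.bases:
                # Base names are recorded as written, not resolved.
                edges.append((module_name, "INHERITS", ast.unparse(base)))
    return edges


def build_graph(sources: Dict[str, str]) -> List[Edge]:
    """Merge per-module edges into one sorted, de-duplicated edge list."""
    edges: List[Edge] = []
    for name, text in sources.items():
        edges.extend(extract_edges(name, text))
    return sorted(set(edges))
```

A real pipeline would read files from disk and resolve names to repository paths before serving the edges through an MCP tool; that wiring is left out because it depends on the MCP server in use.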

[ WHY_IT_MATTERS ]

01.
Retrieval alone misses non-semantic architectural dependencies that break changes in real services.
02.
A lightweight code graph can unlock large accuracy gains without bigger models or longer contexts.

[ WHAT_TO_TEST ]

A/B agents with keyword/vector retrieval vs. MCP-exposed dependency graph, tracking task success and token spend.
Instrument policies to require a graph hop before edits to widely imported modules and measure reduction in regressions.
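One way to track the A/B comparison above is to log one record per agent trial and aggregate by arm. The sketch below assumes a hypothetical `Trial` record with an arm label, a pass/fail flag, and a token count; none of these names come from the paper, and the success criterion is whatever your harness defines.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Trial:
    arm: str       # hypothetical labels, e.g. "retrieval" or "graph"
    success: bool  # did the task pass under your own success criterion
    tokens: int    # total tokens spent in the trial


def summarize(trials: Iterable[Trial]) -> Dict[str, Dict[str, float]]:
    """Return trial count, success rate, and mean token spend per arm."""
    grouped: Dict[str, List[Trial]] = defaultdict(list)
    for trial in trials:
        grouped[trial.arm].append(trial)
    return {
        arm: {
            "trials": len(items),
            "success_rate": sum(t.success for t in items) / len(items),
            "mean_tokens": sum(t.tokens for t in items) / len(items),
        }
        for arm, items in grouped.items()
    }
```

Adding a field for whether the graph tool was actually invoked would let the same summary be split by invocation, which is the breakdown the study found decisive.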

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

01.
Add a CI step to build/update AST-based graphs per service and surface them to agents via MCP with language-aware parsers.
02.
Start with high-fanout modules and shared libraries to minimize risk while validating tool-invocation rates and outcomes.
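To pick the modules to start with, one option is to rank import targets by how many distinct modules import them. The sketch below assumes edges are stored as `(source, edge_type, target)` tuples, an illustrative format rather than the one CodeCompass uses, and `rank_by_importers` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # assumed shape: (source, edge_type, target)


def rank_by_importers(edges: Iterable[Edge], top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank import targets by the number of distinct modules importing them."""
    importers: Dict[str, Set[str]] = defaultdict(set)
    for source, kind, target in edges:
        if kind == "IMPORTS" and source != target:
            importers[target].add(source)
    # Most-imported first; ties broken alphabetically for stable output.
    ranked = sorted(importers.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(name, len(sources)) for name, sources in ranked[:top_n]]
```

The top of this list is a reasonable candidate set for mandatory graph lookups, since edits there have the widest non-local impact.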

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

01.
Adopt module boundaries and import hygiene that make static graph extraction reliable, and persist the graph as a project artifact.
02.
Bake graph lookups into agent workflows from day one, with guardrails that block edits if dependency traversal hasn’t run.
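A guardrail of this kind can be as small as a gate that remembers which files have had a dependency traversal and refuses edits to the rest. The sketch below is an assumption-laden illustration: `GraphGate`, `EditBlocked`, and the method names are hypothetical, and how the gate hooks into a particular agent framework is left open.

```python
from typing import Set


class EditBlocked(Exception):
    """Raised when an edit is attempted before a dependency traversal."""


class GraphGate:
    """Tracks which files have been traversed in the dependency graph."""

    def __init__(self) -> None:
        self._traversed: Set[str] = set()

    def record_traversal(self, path: str) -> None:
        # Call this from the graph tool's handler after a successful lookup.
        self._traversed.add(path)

    def check_edit(self, path: str) -> None:
        # Call this before applying any edit; it raises if no lookup ran.
        if path not in self._traversed:
            raise EditBlocked(f"run a dependency traversal for {path} before editing")
```

In use, the agent's edit tool would call `check_edit` first and surface the `EditBlocked` message back to the model as a prompt to consult the graph.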
