ARXIV

30 days · UTC

LIVE_DATA_STREAM // APRIL_14_2026

ORACLE-SWE DISSECTS THE “ORACLE HINTS” BEHIND SWE-BENCH WINS, CHALLENGING HEADLINE CODING BENCHMARKS

New research isolates which “oracle” hints actually move SWE-bench agent scores, explaining why headline results often don’t match real coding impact....

SAMPLE-POLICY-OPTIMIZATION

MAR_06 // 10:30

Stabilizing Agentic RL and Closing Multilingual Alignment Gaps

New research points to a more stable RL path for long-horizon LLM agents and exposes multilingual alignment gaps that can surface unsafe or inconsiste...