CODE-REVIEW
30 days · UTC
Codex expands across ChatGPT tiers with IDE/app clients and GitHub PR reviews, but a Windows app bug raises safety concerns
OpenAI’s Codex coding agent is now broadly available across ChatGPT plans with IDE/app clients and GitHub code reviews, but a Windows app bug warrants...
Copilot agents land in real workflows; code review guidance lags; student plan trims premium models
Copilot’s agentic tooling is now practical for backend and data work, but code review customization lags and student access is being repackaged. GitH...
OpenAI Codex rolls out across ChatGPT plans with IDE/CLI, desktop app, cloud agents, and GitHub auto code reviews
OpenAI made its Codex coding agent broadly available across ChatGPT plans, adding IDE/CLI, a desktop app, cloud tasks, and GitHub auto code reviews. ...
Benchmarks vs. reality: AI code review passes the test, fails the repo
Independent results show popular LLM code-review benchmarks overstate real-world quality; many “passing” AI fixes would be rejected by maintainers. M...
Benchmarks Aren’t Shipping Code: How to Vet AI Code Agents Before CI
New evidence shows top-scoring AI coding tools pass benchmarks but stumble in real code review and day‑to‑day engineering workflows. METR reports tha...
Copilot grows into a practical code reviewer across PRs, IDEs, CLI, and Actions
GitHub Copilot now covers PRs, IDEs, CLI, and Actions, making AI-assisted code review a realistic part of daily workflow. A hands-on guide maps eight...
Claude Code Review lands in GitHub Actions (preview) — real checks, real cost
Anthropic added a preview Claude Code Review GitHub Action that parallel-checks PRs, verifies findings, ranks severity, and bills purely on Claude API...
Anthropic ships multi‑agent Code Review for Claude Code: thorough, slow, and not cheap
Anthropic launched a multi‑agent Code Review feature in Claude Code that scans GitHub pull requests, posts inline findings, and targets bugs humans of...
AI coding ROI meets reality: the verification tax and a new code‑review benchmark
AI coding won't erase work; it shifts it into time-consuming verification, and new benchmarks show code-review accuracy varies widely. Practitioners ...
Copilot CLI hits 1.0 with safer shell prompts as PR-fix flow shifts to separate branches
GitHub Copilot CLI reached general availability with v1.0 and added safer command guardrails, while users report Copilot PR review fixes now default t...
AI IDEs go agentic: Cursor "demos" and Windsurf Cascade
AI IDEs are shifting from code suggestions to autonomous agents that run, test, and showcase changes, led by Cursor’s new demo-first experience and Wi...
Evaluating Graphite for stacked‑diff code reviews
A recent overview frames where Graphite sits among code review tools and when stacked‑diff workflows make sense for breaking large changes into smalle...
Anthropic open-sources Claude Code’s “code-simplifier” agent
Anthropic released the internal code-simplifier agent used by the Claude Code team, exposing its guardrailed instructions for refactoring to reduce du...
Claude Code 2.0 in teams: behavior-first, review still required
An HN discussion and beginner tutorials highlight teams trying Claude Code 2.0 for repo-level changes. It can work well on "AI-ready" repos with clear...
Claude Code vs Cursor: adopt with guardrails
A popular HN thread critiqued a "Cursor to Claude Code 2.0" switch for overhype, lack of reproducible prompts/code, and suggestions to skip code revie...
GitHub Copilot adds cross-agent, repo-scoped memory (public preview)
GitHub released an opt-in cross-agent memory for Copilot that lets coding, CLI, and code review agents retain repository-scoped context for 28 days wi...
Pair Qodo (PR/CI) with Windsurf (IDE) for AI-driven code quality
Qodo positions itself as the AI code review and test/coverage gatekeeper for PRs and CI (Qodo Merge/Gen/Cover), with on‑prem/VPC options, SOC 2 Type I...
Shift to AI-augmented "forensic engineering" for code review and tests
The video argues that by 2026 engineers will spend less time reading/writing code and more time specifying behavior, generating tests, and using AI to...
Designing reliable benchmarks for AI code review tools
A practical take on what makes an AI code review benchmark trustworthy: use real-world PRs, define clear ground truth labels, measure precision/recall...
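The precision/recall measurement the item describes can be sketched as a set comparison between the findings a tool reports and the issues maintainers actually confirmed. This is a minimal illustration, not any specific benchmark's harness; the finding IDs below are hypothetical labels of the form `issue:file:line`.

```python
# Sketch: scoring an AI code-review tool against human-labeled ground truth.
# A real benchmark would draw these labels from annotated real-world PRs.

def precision_recall(predicted: set, ground_truth: set) -> tuple:
    """Precision/recall of reported findings vs. labeled true issues."""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Issues maintainers confirmed in a PR, vs. what the tool flagged
# (hypothetical examples).
truth = {"sql-injection:db.py:42", "off-by-one:pager.py:17", "race:cache.py:88"}
flagged = {"sql-injection:db.py:42", "style-nit:db.py:50", "race:cache.py:88"}

p, r = precision_recall(flagged, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```

Keeping findings as stable string IDs makes results comparable across tools; low precision surfaces noisy reviewers, while low recall surfaces ones that miss the bugs maintainers care about.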
Anysphere (Cursor) to acquire Graphite code review
Anysphere, maker of the Cursor AI IDE, has agreed to acquire Graphite, a code review tool focused on faster pull request workflows. Integration detail...