howtonotcode.com
Daily Radar
Issue #3

Daily Digest

2025-12-23
01

Claude Code updates: hands-on walkthrough for backend teams

A walkthrough video demonstrates 10 recent updates to Anthropic's Claude Code and shows how to use them in day-to-day coding. Treat it as a demo: reproduce the workflows on your repo and measure latency, context handling on larger codebases, and PR diff quality before rolling out.

Why it matters

  • If you're evaluating AI code assistants, these updates could change how Claude Code compares to your current tools.
  • Better workflow fit can shorten cycle time for routine backend and data-pipeline changes.

What to test

  • Run a 60–90 minute bake-off on a real service or ETL job measuring suggestion accuracy, reproducibility, and diff cleanliness.
  • Stress-test context limits with a monorepo or large DAG and record latency, token usage, and failure modes.
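
To make the bake-off comparable across runs, log a few objective numbers per candidate change. A minimal sketch in Python, assuming each AI-assisted change lives on its own git branch and pytest is the test runner (both assumptions; swap in your real test command and branches):

    # bakeoff_log.py - minimal sketch: record test latency and diff size per candidate branch.
    import csv
    import subprocess
    import time

    REPO = "."                                   # path to the repo under test (assumption)
    TEST_CMD = ["pytest", "-q"]                  # replace with your real test command
    BRANCHES = ["ai/fix-123", "human/fix-123"]   # hypothetical branches to compare

    def run(cmd):
        return subprocess.run(cmd, cwd=REPO, capture_output=True, text=True)

    with open("bakeoff_results.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["branch", "tests_passed", "test_seconds", "diff_stat"])
        for branch in BRANCHES:
            run(["git", "checkout", branch])
            start = time.monotonic()
            result = run(TEST_CMD)
            elapsed = round(time.monotonic() - start, 1)
            # Summarize how large/clean the change is relative to the main branch.
            diff = run(["git", "diff", "--shortstat", "main...HEAD"]).stdout.strip()
            writer.writerow([branch, result.returncode == 0, elapsed, diff])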

Brownfield perspective

  • Pilot on a non-critical service with read-only repo access and PR-only writes, requiring unit tests on all AI-generated changes.
  • Verify IDE/plugin compatibility, auth, codeowners, and CI gates; add secret/PII redaction checks to prompts and outputs.

Greenfield perspective

  • Embed AI-assisted scaffolding in templates (service skeletons, pipeline DAGs, test harnesses) and document prompt patterns.
  • Define acceptance criteria for AI PRs (traceability, test coverage thresholds, rollback plans) from day 1.

02

Claude Code adds Language Server Protocol support

Claude Code now integrates with Language Server Protocol (LSP) servers, letting the AI use your project’s existing language intelligence (symbols, types, diagnostics) for edits and reviews. The video walks through setup and shows how LSP-backed context improves code navigation and refactor reliability.

Why it matters

  • LSP-backed context can reduce incorrect edits and improve precision on large or polyglot codebases.
  • It reuses the same signals your IDE and linters expose, aligning AI suggestions with your stack.

What to test

  • Compare edit accuracy, navigation, and test-generation quality with and without LSP enabled across representative Python/Go/Java services.
  • Measure latency, resource usage, and privacy boundaries when the AI queries local language servers in monorepos and remote dev containers.

Brownfield perspective

  • Validate compatibility with monorepos, multiple LSP servers, and existing linters/formatters; pin server versions to avoid drift (a version-check sketch follows this list).
  • Confirm it works in remote/CI environments and respects repo permissions and codeowners in PR workflows.
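
One way to enforce the version pinning above is a small CI check against a pinned manifest. A minimal sketch, assuming pyright and gopls as the servers and a hand-maintained pin list (both assumptions; adjust for the servers you actually run):

    # check_lsp_versions.py - minimal sketch: fail CI if installed LSP servers drift from pins.
    import re
    import subprocess
    import sys

    PINNED = {             # hypothetical pins, e.g. mirrored in a file committed to the repo
        "pyright": "1.1.390",
        "gopls": "v0.16.2",
    }
    VERSION_CMDS = {
        "pyright": ["pyright", "--version"],
        "gopls": ["gopls", "version"],
    }

    failed = False
    for server, expected in PINNED.items():
        out = subprocess.run(VERSION_CMDS[server], capture_output=True, text=True).stdout
        match = re.search(r"v?\d+(\.\d+)+", out)
        found = match.group(0) if match else "unknown"
        if found.lstrip("v") != expected.lstrip("v"):
            print(f"{server}: expected {expected}, found {found}")
            failed = True
    sys.exit(1 if failed else 0)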

Greenfield perspective

  • Standardize on proven LSP servers per language (e.g., pyright, gopls) and bake them into devcontainers for consistent AI context.
  • Define prompts and guardrails for AI-assisted refactors and tests that rely on LSP diagnostics and code actions.

03

ChatGPT "personality" controls via Custom Instructions and private GPTs

ChatGPT lets you set persistent Custom Instructions to control tone, level of detail, and preferred conventions, and you can package a defined persona with tools and docs as a private GPT for your workspace. Media coverage frames these as new "personalities," but in practice it’s the existing Custom Instructions + GPTs flow that standardizes assistant behavior across tasks.

Why it matters

  • Standardized assistant behavior reduces prompt drift and makes AI outputs more consistent across code and data workflows.
  • Private GPTs let teams share a governed, up-to-date assistant that encodes engineering conventions and references internal docs.

What to test

  • Create a private GPT for code review and data pipeline design that includes your style guide, repo conventions, and sample PRs, then compare outputs vs. ad‑hoc prompts.
  • Enable Custom Instructions for team members (tone, languages, stack, verbosity) and measure impact on code quality, test coverage suggestions, and hallucination rate.

Brownfield perspective

  • Start by wrapping existing ChatGPT usage with a shared private GPT that retrieves current engineering guidelines, keeping CI/CD unchanged.
  • Version and store instruction templates alongside the repo, and audit outputs on a subset of services before broader rollout.
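
For the audit step, the same repo-versioned instruction template can be applied programmatically so sampled outputs are produced under identical instructions. A minimal sketch using the OpenAI Python SDK; the file path and model name are assumptions, and OPENAI_API_KEY is expected in the environment:

    # review_with_instructions.py - minimal sketch: load a repo-versioned instruction
    # template and apply it as the system prompt so conventions travel with the code.
    from pathlib import Path
    from openai import OpenAI

    INSTRUCTIONS = Path("docs/ai/engineering-assistant.md").read_text()  # hypothetical location
    client = OpenAI()

    def review(diff_text: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: use whichever model your workspace has approved
            messages=[
                {"role": "system", "content": INSTRUCTIONS},
                {"role": "user", "content": f"Review this diff against our conventions:\n{diff_text}"},
            ],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        print(review(Path("example.diff").read_text()))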

Greenfield perspective

  • Define an "engineering-assistant" GPT on day one with retrieval over ADRs, data contracts, and schema catalogs to guide design and code generation.
  • Set team-wide Custom Instructions (preferred frameworks, logging/error patterns, data privacy constraints) to lock in consistent outputs early.

04

Claude Code pushes 7 updates in 2 weeks

A new video reports seven recent updates to Claude Code, Anthropic’s coding assistant, released over a two‑week span. The key takeaway is a fast cadence that can change suggestion behavior, refactor flows, and IDE integration between sprints. Set up a 1–2 day pilot on a representative repo to baseline impact on refactors, tests, and CI.

Why it matters

  • Rapid cadence can shift developer workflows and AI-generated change quality between sprints.
  • Teams need versioning, evaluation, and governance to safely absorb fast-moving AI tooling.

What to test

  • Run a short bake-off on a representative backend/data repo to measure suggestion accuracy, multi-file refactors, and test generation reliability.
  • Verify security and compliance: repo access scopes, secret handling, offline behavior, and telemetry/PII configuration.

Brownfield perspective

  • Pilot on non-critical services with pinned versions and CI guardrails (lint/tests/format) before wider rollout.
  • Require PR-by-PR diffs and attribution for AI-assisted changes, and monitor churn from changing model behavior.

Greenfield perspective

  • Start with AI-first templates (tests, type checks, code owners) and define prompts/playbooks for common backend/data tasks.
  • Integrate AI-assisted code review and traceable commit metadata from day one to track impact.
Sources
youtube.com

05

Default-on Copilot backlash: enforce policy-based, opt‑in rollouts

A widely viewed clip pushes back on Copilot being injected by default and hard to remove, reflecting developer frustration with intrusive AI assistants. For engineering teams, treat Copilot (OS and IDE) as managed software: set default-off, control features via policy, and communicate clear opt‑in paths.

Why it matters

  • Unmanaged AI assistants can create data egress, licensing, and compliance risk.
  • Intrusive default-on UX hurts developer productivity and undermines adoption.

What to test

  • Validate you can disable or scope Copilot via enterprise policy (OS-level and IDE), and verify those controls persist across updates.
  • Measure CPU/memory overhead and network egress with Copilot on/off during typical workflows (monorepo navigation, builds, test runs).
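
For the overhead measurement, a small sampler run once with Copilot enabled and once disabled gives comparable numbers. A minimal sketch using psutil; the process-name filters are assumptions and will differ by OS and IDE:

    # sample_overhead.py - minimal sketch: sample CPU and memory of editor/assistant
    # processes during a work session, to compare Copilot-on vs Copilot-off runs.
    import time
    import psutil

    WATCH = ("code", "copilot", "node")   # hypothetical process-name substrings to track
    SAMPLES, INTERVAL_S = 60, 5

    for _ in range(SAMPLES):
        for proc in psutil.process_iter(["name", "cpu_percent", "memory_info"]):
            name = (proc.info["name"] or "").lower()
            if any(w in name for w in WATCH):
                rss_mb = proc.info["memory_info"].rss / 1e6
                print(f"{time.time():.0f},{name},{proc.info['cpu_percent']},{rss_mb:.0f}")
        time.sleep(INTERVAL_S)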

Brownfield perspective

  • Audit current OS images and IDEs for Copilot defaults, disable by policy, and run an opt‑in pilot with a small cohort to gather acceptance and defect-rate data.
  • Enable enterprise features like duplication detection and restrict external chat/context to avoid sending sensitive code outside approved boundaries.

Greenfield perspective

  • Ship base dev containers/IDEs with Copilot default‑off, preconfigured enterprise policies, logging, and documented data flows.
  • Define guardrails up front (secret scanning, allowlists, retention policies) before enabling code suggestions or chat capabilities.
Sources
youtube.com

06

Karpathy’s 2025 LLM themes: RLVR, jagged intelligence, and vibe coding

Two third-party breakdowns of Karpathy’s 2025 review highlight a shift toward reinforcement learning from verifiable rewards (tests, compilers), acceptance of "jagged" capability profiles, and "vibe coding": agentic, tool-using code workflows integrated with IDE/CI. For backend/data teams, this points to focusing AI assistance on tasks with objective checks (unit tests, schema/contracts) and wiring agents to real tools (repos, runners, linters) rather than relying on prompts alone.

Why it matters

  • Constrain LLM work to tasks with objective pass/fail signals (tests, type checks, SQL validators) to get reliable wins.
  • Uneven model strengths require routing, fallback models, and human-in-the-loop on hard edges.

What to test

  • Create evals where LLM-generated Python/SQL must pass unit tests, linters, and migration checks; track pass@k, fix rate, and time-to-green in CI (a pass@k harness is sketched after this list).
  • Prototype an IDE/CI agent that can run tools (pytest, mypy, sqlfluff, docker) and compare against prompt-only baselines for accuracy and latency.
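
A minimal sketch of the pass@k loop referenced above: generate k candidates, run the task's unit tests against each in isolation, and record the result. The generate() call is a stub for whatever model or gateway you actually use:

    # passk_eval.py - minimal sketch: score k candidate implementations against a task's tests.
    import subprocess
    import tempfile
    from pathlib import Path

    def generate(prompt: str, k: int) -> list[str]:
        """Stub: return k candidate Python modules for the prompt (wire to your model/gateway)."""
        raise NotImplementedError("call your model or gateway here")

    def passes_tests(candidate_src: str, test_src: str) -> bool:
        with tempfile.TemporaryDirectory() as tmp:
            Path(tmp, "solution.py").write_text(candidate_src)
            Path(tmp, "test_solution.py").write_text(test_src)
            result = subprocess.run(["pytest", "-q", tmp], capture_output=True, text=True)
            return result.returncode == 0

    def pass_at_k(prompt: str, test_src: str, k: int = 5) -> dict:
        candidates = generate(prompt, k)
        wins = sum(passes_tests(c, test_src) for c in candidates)
        # With n == k samples, pass@k asks whether at least one candidate goes green.
        return {"pass_rate": wins / k, "pass_at_k": wins > 0}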

Brownfield perspective

  • Start with read-only or PR-suggestion agents on low-risk boilerplate (tests, docs, ETL scaffolds) behind feature flags and require green CI to merge.
  • Integrate repo-aware retrieval (CODEOWNERS, runbooks, schema registry) and enforce sandboxes, quotas, and audit logs to mitigate unsafe changes.

Greenfield perspective

  • Adopt test-first and strong contracts (types, OpenAPI, dbt tests) to maximize verifiable rewards for agents from day one.
  • Expose scriptable tool surfaces (Make targets, deterministic seeds, structured logs) and capture telemetry to enable continuous evals/RL fine-tuning.
Sources
youtube.com

07

Founder claims AI tools replaced devs: practical takeaways for teams

In a YouTube video, a founder claims he shipped features by replacing developers with AI coding tools, reducing cost and speeding up routine work. The core message: AI can handle well-scoped boilerplate and CRUD, but architecture, integration, testing, and long-term maintenance still need engineers and guardrails.

Why it matters

  • Leads may face pressure to cut headcount by leaning on AI for routine coding.
  • Without specs, tests, and reviews, AI-generated changes can amplify defect and security risk.

What to test

  • Run a 2–4 week pilot where AI proposes code for low-risk tickets; measure cycle time, review rework, defects, and rollback rates versus baseline.
  • Compare AI-generated implementations against spec-first tests and static/security checks to quantify quality deltas and prompt patterns that work.

Brownfield perspective

  • Limit AI changes to non-critical paths behind feature flags and require passing tests, SAST/secret scans, and human review before merge.
  • Provide repo-wide context via code search/embeddings and codify style/architecture rules so AI outputs align with legacy conventions.

Greenfield perspective

  • Adopt spec-first APIs and strong test scaffolding so AI can safely generate services, migrations, and integration glue.
  • Standardize prompts, templates, and CI gates early (coverage, linters, security) to keep AI velocity without quality drift.
Sources
youtube.com

08

Anysphere (Cursor) to acquire Graphite code review

Anysphere, maker of the Cursor AI IDE, has agreed to acquire Graphite, a code review tool focused on faster pull request workflows. Integration details and timelines are not yet public, but the move points to tighter coupling between AI-assisted coding and code review.

Why it matters

  • Combining AI coding and code review could reduce PR cycle time and context switching.
  • Graphite users may face roadmap or integration changes, so teams should plan for continuity risks.

What to test

  • Run a pilot where AI-assisted PR reviews are compared to your current process on review time, defect catch rate, and noise.
  • Validate permission scopes, audit logs, and data handling for any AI features against your compliance and privacy requirements.

Brownfield perspective

  • Inventory current review automations (PR templates, status checks, CODEOWNERS, CI hooks) and ensure parity in any Anysphere/Cursor-integrated flow before migration.
  • Prepare a staged migration with rollback, and confirm SSO/SCIM, repo permissions, and audit trails behave identically across tools.

Greenfield perspective

  • Adopt small PRs with required checks and use an AI-enabled IDE plus code review stack from day one to maximize signal-to-noise.
  • Define baseline metrics (lead time, review latency, rework rate) and dashboards pre-rollout to quantify impact.
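
A minimal sketch for the baseline metrics, pulling recent merged PRs from the GitHub REST API and computing time-to-first-review and lead time; the repository name and token variable are assumptions:

    # review_latency.py - minimal sketch: baseline lead time and time-to-first-review per PR.
    import os
    from datetime import datetime
    import requests

    OWNER, REPO = "your-org", "your-service"   # hypothetical repository
    API = f"https://api.github.com/repos/{OWNER}/{REPO}"
    HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

    def ts(value: str) -> datetime:
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

    pulls = requests.get(f"{API}/pulls", headers=HEADERS,
                         params={"state": "closed", "per_page": 50}).json()
    for pr in pulls:
        if not pr.get("merged_at"):
            continue
        reviews = requests.get(f"{API}/pulls/{pr['number']}/reviews", headers=HEADERS).json()
        created = ts(pr["created_at"])
        lead_h = (ts(pr["merged_at"]) - created).total_seconds() / 3600
        first_review_h = None
        if reviews:
            first_review_h = (ts(reviews[0]["submitted_at"]) - created).total_seconds() / 3600
        print(pr["number"], round(lead_h, 1), first_review_h and round(first_review_h, 1))
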
Sources
infoworld.com

09

Practical guide to using Claude Code on your repo

A hands-on guide explains how to enable and use Claude Code to work against a real codebase, including setup, scoping permissions, and effective prompt patterns. It emphasizes breaking work into small, testable tasks and being explicit about files, constraints, and acceptance criteria for reliable outputs.

Why it matters

  • Repo-aware assistants can accelerate bug fixes, refactors, and boilerplate generation with less context switching.
  • Clear setup and scoped access reduce security risk while improving output quality.

What to test

  • Trial Claude Code on a throwaway branch to implement a small backend change with unit tests, then compare diff size, style adherence, and CI pass rate to your human-only baseline.
  • Run a timed bugfix across two services and measure latency, token usage, and review cycles (comments per PR, time-to-merge).

Brownfield perspective

  • Pilot on a single service or subdirectory in a monorepo, restrict repo scope, and enforce branch protections and CODEOWNERS for AI-generated PRs.
  • Ensure CI linters, formatters, and security scanners gate merges so AI output follows existing conventions and secrets never leak.

Greenfield perspective

  • Structure repos with clear module boundaries, strong unit tests, and an architecture README to give the model unambiguous context.
  • Adopt small, incremental tasks with PR templates and explicit acceptance criteria to keep AI loops reliable.

10

API Security Priorities for 2026: Inventory, Auth, and Contract-First

Common API breach vectors remain shadow/legacy endpoints, weak auth, and missing input validation. For 2026 planning, emphasize full API inventory, contract-first development with strict schema validation, stronger auth (OIDC/mTLS) with least-privilege scopes, and runtime protection via gateways/WAF with anomaly detection.

Why it matters

  • Unmanaged and deprecated endpoints expand attack surface and expose data.
  • AI-generated code can introduce insecure defaults and missing checks if not systematically tested.

What to test

  • Automate CI checks to verify every route enforces auth, input schema, and rate limits; fail builds on gaps.
  • Run fuzzing and contract tests against OpenAPI specs, and diff AI-generated code vs spec to catch drift.
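
A minimal sketch of such a CI gate, parsing the OpenAPI spec and failing the build when an operation lacks an auth requirement or accepts a body without a schema (the spec path is an assumption; rate limits are typically enforced at the gateway and are not checked here):

    # openapi_gate.py - minimal sketch: fail CI on operations missing auth or body schemas.
    import sys
    import yaml

    spec = yaml.safe_load(open("openapi.yaml"))        # hypothetical spec location
    default_security = spec.get("security", [])
    gaps = []

    for path, operations in spec.get("paths", {}).items():
        for method, op in operations.items():
            if method not in {"get", "post", "put", "patch", "delete"}:
                continue
            if not (op.get("security") or default_security):
                gaps.append(f"{method.upper()} {path}: no auth requirement")
            body = op.get("requestBody", {})
            for content in body.get("content", {}).values():
                if "schema" not in content:
                    gaps.append(f"{method.upper()} {path}: request body without schema")

    print("\n".join(gaps) or "all operations covered")
    sys.exit(1 if gaps else 0)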

Brownfield perspective

  • Discover and tag all APIs via gateway logs and repo scanning, then deprecate or isolate legacy endpoints behind stricter policies.
  • Introduce centralized auth and schema-validation middleware at the gateway or sidecar to avoid per-service rewrites.

Greenfield perspective

  • Adopt contract-first with OpenAPI, codegen, and policy-as-code for auth, quotas, and input validation from day one.
  • Standardize on OIDC for clients and mTLS for service-to-service calls with least-privilege scopes and per-client keys.
Sources
getastra.com

11

Designing reliable benchmarks for AI code review tools

A practical take on what makes an AI code review benchmark trustworthy: use real-world PRs, define clear ground truth labels, measure precision/recall and noise, and ensure runs are reproducible with baselines. It frames evaluation around both detection quality and developer impact (time-to-review and merge latency), not just raw findings.

Why it matters

  • Good benchmarks prevent picking tools that look strong in demos but underperform on your code and workflows.
  • Measuring false positives and developer impact reduces review noise and protects velocity.

What to test

  • Replay a stratified sample of recent PRs through candidate tools and compute precision/recall and false-positive rate against human reviewer comments (a scoring sketch follows this list).
  • Pilot in CI with non-blocking checks and track time-to-first-review, merge latency, and developer acceptance of suggestions.
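
A minimal sketch of the scoring step referenced above, treating both the tool's findings and the human labels as sets of (file, line, issue type) tuples (an assumed export format):

    # review_bench.py - minimal sketch: score a tool's findings against human-labeled ground truth.
    def score(tool_findings: set, ground_truth: set) -> dict:
        true_pos = len(tool_findings & ground_truth)
        false_pos = len(tool_findings - ground_truth)
        false_neg = len(ground_truth - tool_findings)
        precision = true_pos / (true_pos + false_pos) if tool_findings else 0.0
        recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
        return {"precision": precision, "recall": recall,
                "false_positives": false_pos, "missed": false_neg}

    # Example with hypothetical findings on one PR:
    truth = {("billing/api.py", 42, "missing-auth-check")}
    tool = {("billing/api.py", 42, "missing-auth-check"), ("billing/api.py", 7, "style")}
    print(score(tool, truth))   # precision 0.5, recall 1.0, 1 false positive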

Brownfield perspective

  • Integrate behind existing linters/scanners, deduplicate findings, and enforce suppression/triage rules to control alert noise.
  • Roll out incrementally by repo or team, starting in advisory mode before gating merges.

Greenfield perspective

  • Define a benchmark harness early with labeled PRs, severity buckets, and reproducible runs; automate scoring in CI.
  • Prefer tools with exportable results and APIs/webhooks to embed in review workflows from day one.
Sources
qodo.ai

12

AI-ready by 2026: Treat Governance as Infrastructure

OneTrust’s 2026 Predictions and 2025 AI-Ready Governance Report say governance is lagging AI adoption: 90% of advanced adopters and 63% of experimenters report manual, siloed processes breaking down, with most leaders saying governance pace trails AI project speed. The shift is toward continuous monitoring, pattern-based approvals, and programmatic enforcement with human judgment only where it matters. Enterprises are embedding controls across privacy, risk, and data workflows to handle micro-decisions by agents, automation pipelines, and shifting data flows.

Why it matters

  • Manual reviews can’t match AI speed; embed continuous, automated controls.
  • Third-party and shadow AI features create data flow blind spots and compound risk.

What to test

  • Prototype policy-as-code checks in CI for LLM/API usage, data access, and model deployment (see the sketch after this list).
  • Set up continuous monitoring pipelines for model outputs, data lineage, and agent actions with alerting and audit logs.
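
A minimal sketch of one such policy-as-code check: fail CI when code imports LLM SDKs outside an approved gateway package, so model calls stay on governed paths. The SDK list and package path are assumptions:

    # llm_policy_check.py - minimal sketch: flag LLM SDK imports outside the approved gateway package.
    import re
    import sys
    from pathlib import Path

    LLM_SDKS = ("openai", "anthropic", "google.generativeai")   # assumed SDK module names
    ALLOWED_PREFIX = "services/llm_gateway/"                    # hypothetical approved package
    pattern = re.compile(rf"^\s*(from|import)\s+({'|'.join(map(re.escape, LLM_SDKS))})\b")

    violations = []
    for py_file in Path(".").rglob("*.py"):
        rel = py_file.as_posix()
        if rel.startswith(ALLOWED_PREFIX):
            continue
        for lineno, line in enumerate(py_file.read_text(errors="ignore").splitlines(), 1):
            if pattern.match(line):
                violations.append(f"{rel}:{lineno}: {line.strip()}")

    print("\n".join(violations) or "no ungoverned LLM SDK usage found")
    sys.exit(1 if violations else 0)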

Brownfield perspective

  • Map current AI touchpoints and third-party integrations, then prioritize programmatic controls where risk is highest.
  • Add enforcement hooks to existing orchestration and CI runners without breaking pipelines; start with read-only monitoring.

Greenfield perspective

  • Design policy-as-code and accountability-in-the-loop from day one, including approval patterns per use case.
  • Standardize data classification and lineage to drive automated guardrails across services.
Sources
onetrust.com

13

Plan for year-end LLM refreshes: speed-optimized variants and new open-weights

Recent roundups point to new "flash"-style speed-focused model variants and refreshed open-weight releases (e.g., Nemotron). Expect different latency/quality trade-offs, context limits, and tool-use support versus prior versions. Treat these as migrations, not drop-in swaps, and schedule a short benchmark-and-rollout cycle.

Why it matters

  • New variants can cut latency/cost but may degrade reasoning or RAG quality on your workloads.
  • Open-weight options enable on-prem but change infra, security, and MLOps posture.

What to test

  • Benchmark latency, cost, and task quality on your prompts/datasets (codegen, SQL, RAG, PII redaction) with fixed seeds and eval harnesses (a minimal harness is sketched after this list).
  • Validate tool-calling, streaming, tokenizer effects, and context-window changes on chunking, embeddings, and retrieval.
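
A minimal sketch of the benchmark harness referenced above, running one prompt set against the current and candidate model behind a single call seam; call_model() is a stub and the model IDs are placeholders:

    # model_refresh_bench.py - minimal sketch: compare latency and a simple task check across models.
    import statistics
    import time

    PROMPTS = [
        ("Write a SQL query counting orders per day from orders(order_id, created_at).", "GROUP BY"),
    ]
    MODELS = ["current-model", "candidate-flash-model"]   # hypothetical model IDs

    def call_model(model: str, prompt: str, seed: int = 42) -> str:
        """Stub: route to your provider or gateway, with a fixed seed where supported."""
        raise NotImplementedError

    for model in MODELS:
        latencies, passes = [], 0
        for prompt, must_contain in PROMPTS:
            start = time.monotonic()
            output = call_model(model, prompt)
            latencies.append(time.monotonic() - start)
            passes += int(must_contain.lower() in output.lower())
        print(model, f"p50={statistics.median(latencies):.2f}s",
              f"passed={passes}/{len(PROMPTS)}")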

Brownfield perspective

  • Pin old models, A/B behind flags, and monitor error budgets and incident patterns during canaries.
  • Check SDK/API changes, quotas/rate limits, and tokenization differences in CI/CD and data pipelines.

Greenfield perspective

  • Adopt a provider-agnostic gateway and eval framework from day 0 to enable model swapping without code churn.
  • Instrument prompt/RAG telemetry and guardrails early to compare models and enforce safety consistently.
Sources
flowhunt.io

14

Transformer internals: useful background, limited day-to-day impact

An HN discussion around Jay Alammar’s Illustrated Transformer notes that understanding transformer mechanics is intellectually valuable but rarely required for daily LLM application work. Practitioners report that intuition about constraints (e.g., context windows, RLHF side effects) helps in edge cases, but practical evaluation, tooling, and integration matter more for shipping systems.

Why it matters

  • Guides team learning budgets toward evaluation, observability, and integration over deep theory for most roles.
  • Sets expectations about emergent LLM behavior and the limits of reasoning from architecture alone.

What to test

  • Build an evaluation harness to probe behavior at context-window limits, truncation effects, and retrieval quality on your code/data tasks.
  • Compare base vs instruction/RLHF-tuned models for coding and SQL generation to measure stability, latency, and cost trade-offs.

Brownfield perspective

  • Introduce an LLM gateway with prompt/version control, telemetry, and circuit breakers; roll out via feature flags to isolate regressions.
  • Audit existing document sizes and pipeline payloads against model context limits; adjust chunking and caching accordingly.
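
A minimal sketch of the document-size audit, counting tokens with tiktoken and flagging payloads over a configured context budget; the encoding name, budget, and directory are assumptions:

    # context_audit.py - minimal sketch: flag documents/payloads that exceed a context budget.
    from pathlib import Path
    import tiktoken

    ENCODING = tiktoken.get_encoding("cl100k_base")   # assumed tokenizer family
    CONTEXT_BUDGET = 100_000                          # hypothetical per-request token budget
    DOCS_DIR = Path("data/docs")                      # hypothetical payload location

    for doc in sorted(DOCS_DIR.rglob("*.md")):
        tokens = len(ENCODING.encode(doc.read_text(errors="ignore")))
        flag = "OVER" if tokens > CONTEXT_BUDGET else "ok"
        print(f"{flag}\t{tokens}\t{doc}")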

Greenfield perspective

  • Design model-agnostic interfaces with prompt/template versioning and offline evaluation datasets tied to target KPIs.
  • Plan retrieval and chunking around known context constraints; benchmark small finetuned vs larger instruct models early.
Sources
news.ycombinator.com
