howtonotcode.com
Daily Radar
Issue #10

Daily Digest

2025-12-27
01

Roundup: Copilot Workspace, JetBrains AI Assistant, and Mistral API updates

A weekly roundup video highlights recent updates to GitHub Copilot (including Workspace), JetBrains AI Assistant, and Mistral’s API. For team leads, the practical move is to scan the official changelogs for repo-scale planning, IDE-assisted refactors/tests, and Mistral API performance/pricing, then queue small evaluations. Exact changes vary by edition and release; verify via the linked official pages before planning adoption.

lightbulb

Why it matters

  • Shifts in capabilities and pricing directly impact developer throughput and backend inference spend.
  • Enterprise controls and context limits can affect compliance and how you structure prompts and code.
science

What to test

  • Trial Copilot Workspace on a contained migration/refactor to measure plan quality, PR diffs, and reviewer time.
  • Benchmark Mistral API vs. your current LLM for latency, cost per 1k tokens, and task accuracy using your eval set.
engineering

Brownfield perspective

  • Pilot JetBrains AI Assistant on a legacy module with strict permissions and measure defect rates and review churn.
  • Introduce a provider-agnostic LLM client and validate tokenization/context-size differences to avoid truncation and regressions.
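
As a rough illustration of the second point, a provider-agnostic client can start as small as the sketch below; the LLMBackend protocol and its method names are illustrative, not any vendor's SDK:

  from dataclasses import dataclass
  from typing import Protocol

  class LLMBackend(Protocol):
      context_window: int
      def complete(self, prompt: str, max_tokens: int) -> str: ...
      def count_tokens(self, text: str) -> int: ...

  @dataclass
  class LLMClient:
      backend: LLMBackend

      def complete(self, prompt: str, max_tokens: int = 512) -> str:
          # Check the provider's context window up front so tokenizer and
          # context-size differences surface as explicit errors, not silent truncation.
          needed = self.backend.count_tokens(prompt) + max_tokens
          if needed > self.backend.context_window:
              raise ValueError(f"{needed} tokens exceeds context window of {self.backend.context_window}")
          return self.backend.complete(prompt, max_tokens)

Each provider then only needs a small adapter implementing complete() and count_tokens(), keeping the truncation check in one place.
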
rocket_launch

Greenfield perspective

  • Adopt an LLM provider abstraction and instrument prompt/response telemetry from day one for reproducible evals.
  • Enforce CI gates (lint, tests, security scans) on AI-generated changes to keep AI in the same SDLC path as human code.

02

AI 2026 predictions video: plan for structural SDLC impact

Multiple uploads point to the same predictions video arguing AI will shift from app features to a structural layer by 2026. There are no concrete product details, but the takeaway is to prepare for wider AI use across code, data pipelines, and ops.

lightbulb

Why it matters

  • Budget, skills, and infra planning should assume more AI-assisted development and data workflows.
  • Governance, testing, and QA expectations will rise as AI touches more production paths.
science

What to test

  • Pilot AI code-assist with guarded write permissions and measure PR quality, cycle time, and defect rates.
  • Add observability and cost tracking for any LLM usage (latency, token cost, error classes) in staging before production.
engineering

Brownfield perspective

  • Introduce AI via wrapper libraries to centralize config, logging, and fallbacks without rewriting core services.
  • Use canary releases and contract tests when adding AI-generated transformations to ETL jobs to protect downstream consumers.
rocket_launch

Greenfield perspective

  • Design evals-first with versioned prompts, deterministic test cases, and clear rollback paths.
  • Abstract model/providers behind retry, caching, and circuit-breaking to allow swap-outs without redesign.
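
One rough shape for that abstraction, shown with retries and a failure-count circuit breaker (caching is omitted; the thresholds and the wrapped callable are placeholders to tune):

  import time

  class CircuitOpenError(RuntimeError):
      pass

  class GuardedModel:
      """Wrap any prompt -> response callable with retries and a failure-count breaker."""

      def __init__(self, call, retries=2, failure_threshold=5, cooldown_s=30):
          self.call = call
          self.retries = retries
          self.failure_threshold = failure_threshold
          self.cooldown_s = cooldown_s
          self.failures = 0
          self.opened_at = None

      def __call__(self, prompt: str) -> str:
          if self.opened_at and time.time() - self.opened_at < self.cooldown_s:
              raise CircuitOpenError("provider disabled, route to fallback")
          for attempt in range(self.retries + 1):
              try:
                  result = self.call(prompt)
                  self.failures, self.opened_at = 0, None
                  return result
              except Exception:
                  self.failures += 1
                  if self.failures >= self.failure_threshold:
                      self.opened_at = time.time()
                      raise CircuitOpenError("provider disabled, route to fallback")
                  if attempt == self.retries:
                      raise
                  time.sleep(2 ** attempt)  # exponential backoff before retrying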

03

Field report: Claude Code paired with Antigravity for faster automation build loops

A practitioner demo shows using Anthropic’s Claude Code alongside an automation tool called Antigravity to rapidly scaffold and iterate on small automation projects. Claude Code is used for multi-file code generation/refactoring, while Antigravity handles wiring tasks and running automations, compressing idea-to-demo cycles for integrations and scripts.

lightbulb

Why it matters

  • AI coding environments are being used for repo-aware, multi-file changes rather than just autocomplete.
  • Combining LLM coding with an orchestration tool can speed delivery of integration glue and small services.
science

What to test

  • Pilot Claude Code (or equivalent) on a low-risk service to assess multi-file change quality, unit/integration test stability, and code-review overhead.
  • Trial an orchestration/automation framework for integration jobs, verifying observability, retries, idempotency, and secrets management.
engineering

Brownfield perspective

  • Gate AI-generated diffs with strict CI (type checks, linters, contract tests) and introduce them behind feature flags to protect existing services.
  • Start with isolated ETL or integration tasks and enforce coding standards/templates so AI output matches the current codebase.
rocket_launch

Greenfield perspective

  • Adopt repo templates with clear module boundaries, strong tests, and CI/IaC scaffolds so AI can generate repeatable components safely.
  • Favor small services with clean interfaces and contract tests to let AI assistants refactor and extend with less risk.

04

Unofficial: Claude Code update adds sub-agents and LSP support

An unofficial YouTube walkthrough claims a new Claude Code update brings sub-agent orchestration, a higher-capability "Claude Ultra" model, and IDE integration via the Language Server Protocol. These details are not yet in Anthropic’s official docs, so treat them as tentative and verify availability in your Anthropic Console before planning adoption.

lightbulb

Why it matters

  • If accurate, sub-agents could automate multi-step coding tasks (gen, review, tests) and reduce cycle time.
  • LSP support would enable editor-native AI assistance across IDEs without bespoke plugins.
science

What to test

  • Confirm feature availability in your Anthropic org and A/B test code generation/refactor quality vs current models on a backend/data pipeline repo.
  • Prototype a minimal agent chain (generator + reviewer/tester) and measure defect rate, latency, and token costs.
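
A minimal sketch of such a generator-plus-reviewer chain, assuming a generic llm() helper rather than any specific Anthropic API:

  def llm(prompt: str) -> str:
      """Placeholder for whichever client you pilot."""
      raise NotImplementedError

  def generate_patch(task: str, file_contents: str) -> str:
      return llm(f"Task: {task}\n\nCurrent code:\n{file_contents}\n\nReturn a unified diff.")

  def review_patch(task: str, diff: str) -> str:
      # Second agent role: critique the diff before it reaches a human or CI.
      return llm(f"Review this diff for the task '{task}'. List defects or reply APPROVE.\n\n{diff}")

  def run_chain(task: str, file_contents: str, max_rounds: int = 2) -> str:
      diff = generate_patch(task, file_contents)
      for _ in range(max_rounds):
          verdict = review_patch(task, diff)
          if verdict.strip().startswith("APPROVE"):
              return diff
          # Feed reviewer findings back to the generator and try again.
          diff = generate_patch(f"{task}\nAddress these review comments:\n{verdict}", file_contents)
      return diff  # hand unresolved cases to a human reviewer
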
engineering

Brownfield perspective

  • Validate LSP-based suggestions against existing lint/format hooks, pre-commit, and PR policies to avoid churn.
  • Run a guarded pilot on a non-critical service with PR-bot gating and action logs for agent steps.
rocket_launch

Greenfield perspective

  • Adopt an agentic template (spec -> code -> tests) with prompts and evals versioned in-repo from day one.
  • Standardize IDEs via LSP config and set reliability/cost SLOs before scaling to more services.

05

Copilot Money adds a brand-new web app alongside iOS/iPadOS/macOS

A sponsored video announces Copilot Money now has a web app in addition to its iOS, iPadOS, and macOS clients, expanding access via browsers. Details are light, but the substantive update is cross-platform availability with a new browser client.

lightbulb

Why it matters

  • A web client increases concurrent usage and API load, requiring tighter scalability and performance controls.
  • Cross-platform access expands the surface area for auth, sessions, and API consistency.
science

What to test

  • Use AI to generate and maintain API contract tests that validate parity across web and native clients (a sketch follows this list).
  • Use AI-driven synthetic E2E tests in headless browsers to catch cross-platform regressions and auth/session issues.
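
A parity check along these lines can start as small as the following pytest sketch; the base URLs, endpoints, and missing auth are placeholders:

  import pytest
  import requests

  BASES = {"web": "https://web.example.com/api", "native": "https://api.example.com/v2"}
  ENDPOINTS = ["/accounts", "/transactions?limit=1"]  # hypothetical read-only endpoints

  @pytest.mark.parametrize("endpoint", ENDPOINTS)
  def test_response_shape_parity(endpoint):
      shapes = {}
      for client, base in BASES.items():
          resp = requests.get(base + endpoint, timeout=10)
          assert resp.status_code == 200, f"{client} {endpoint} -> {resp.status_code}"
          body = resp.json()
          record = body[0] if isinstance(body, list) and body else body
          shapes[client] = set(record.keys())  # compare field names, not values
      assert shapes["web"] == shapes["native"], f"field mismatch on {endpoint}: {shapes}"
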
engineering

Brownfield perspective

  • Audit API versioning, rate limits, and CORS before opening web access; gate new endpoints behind feature flags.
  • Harden auth flows for browser sessions (e.g., PKCE/OAuth2) without breaking existing mobile clients.
rocket_launch

Greenfield perspective

  • Design web-first REST APIs with strict schemas and generated clients to enforce contract parity from day one.
  • Adopt shared domain models and idempotent write patterns to keep state consistent across platforms.
link Sources
youtube.com youtube.com

06

Prompt scaffolding pattern for GLM-4.7 coding: "KingMode" + task-specific skills

A recent tutorial shows a prompt scaffolding approach for GLM-4.7 that combines a strong system prompt ("KingMode") with task-specific "skills" blocks to guide coding work. The pattern emphasizes separating general reasoning from concrete task instructions, which may help mid-tier models perform more reliably on code tasks. Treat it as a reusable prompt template to evaluate against your existing workflows.

lightbulb

Why it matters

  • Structured prompts can make lower-cost models more usable for code generation and maintenance.
  • Standardized templates improve reproducibility and make model swaps easier.
science

What to test

  • Benchmark GLM-4.7 with and without a structured system prompt across backend tasks (bug fixes, tests, refactors), tracking pass@1, runtime errors, and latency.
  • Try a "skills" layout: modular instruction blocks for API design, SQL/ETL tuning, and error handling; compare outcomes vs monolithic prompts.
engineering

Brownfield perspective

  • Integrate GLM-4.7 behind your existing LLM provider interface and enable via feature flag on a few services first.
  • Add guardrails (compile/test loops, repo-scoped context, policy checks) to catch hallucinations before PRs affect legacy code.
rocket_launch

Greenfield perspective

  • Adopt standardized prompt templates from day one and version them alongside code with an evaluation harness.
  • Define tool-calling and retrieval contracts early (schemas, context limits) so prompts remain model-agnostic and portable.
link Sources
youtube.com youtube.com

07

2026 Workflow: From Writing Code to Forensic Engineering

A recent video argues engineers will spend less time hand-writing code and more time specifying behavior, generating tests, and verifying AI-produced changes: "forensic engineering." For backend/data teams, this means using AI to read large codebases and pipelines, propose patches, and auto-generate characterization tests, while humans review traces, diffs, and test outcomes.

lightbulb

Why it matters

  • Shifts effort from implementation to verification, potentially speeding delivery on complex or legacy codebases.
  • Emphasizes tests and traceability to reduce regression risk from AI-generated changes.
science

What to test

  • Pilot AI-driven characterization test generation on a critical service or pipeline and measure flakiness and coverage deltas.
  • Run an LLM-assisted PR workflow (AI proposes patch + tests), gate on CI, and track review time and defect escape rate.
engineering

Brownfield perspective

  • Start with read-heavy, stable modules: use AI to summarize behavior and suggest tests, then lock with golden datasets (see the sketch after this list).
  • Expect flaky tests and missing specs; add contracts (types, schemas, invariants) and observability before widening scope.
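
A characterization test of that shape might look like the sketch below; transform_orders and the fixture/golden paths are hypothetical stand-ins for your own pipeline step:

  import json
  from pathlib import Path

  from mypipeline.transforms import transform_orders  # hypothetical module under test

  GOLDEN = Path("tests/golden/orders_2024_sample.json")

  def test_transform_matches_golden():
      raw = json.loads(Path("tests/fixtures/orders_raw_sample.json").read_text())
      result = transform_orders(raw)
      expected = json.loads(GOLDEN.read_text())
      # Lock current behavior: any intentional change must regenerate the golden file in the same PR.
      assert result == expected
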
rocket_launch

Greenfield perspective

  • Adopt contract-first APIs and schemas with machine-readable specs to feed AI agents from day one.
  • Build CI lanes for AI-suggested changes (sandbox runs, canaries, rollbacks) with mandatory test traceability.
link Sources
youtube.com youtube.com

08

DIY Gemini voice agents without paid SaaS

A YouTube demo shows building a basic voice agent using Google’s Gemini without relying on $497/month platforms. It wires speech input/output around an LLM loop to handle simple tasks, implying teams can prototype quickly and keep costs under control.
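
A minimal sketch of that loop, assuming the google-genai Python SDK and leaving speech-to-text/text-to-speech as stubs you would swap for your own engines:

  from google import genai  # assumes the google-genai Python SDK is installed

  client = genai.Client()  # API key taken from the environment

  def speech_to_text(audio: bytes) -> str:
      """Placeholder: swap in your STT of choice."""
      raise NotImplementedError

  def text_to_speech(text: str) -> bytes:
      """Placeholder: swap in your TTS engine."""
      raise NotImplementedError

  def handle_turn(audio: bytes, history: list[str]) -> bytes:
      user_text = speech_to_text(audio)
      history.append(f"User: {user_text}")
      resp = client.models.generate_content(
          model="gemini-2.0-flash",
          contents="\n".join(history) + "\nAssistant:",
      )
      history.append(f"Assistant: {resp.text}")
      return text_to_speech(resp.text)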

lightbulb

Why it matters

  • Direct API use can cut vendor lock-in and recurring per-seat fees.
  • Owning the pipeline improves control over latency, data handling, and observability.
science

What to test

  • Spike a minimal voice agent and benchmark end-to-end latency, error rates, and cost per minute under load.
  • Add guardrails (input validation, safety filters) and test failure modes, retries, and human handoff.
engineering

Brownfield perspective

  • Plan integration with existing telephony/IVR, CRM, and logging stacks, and map data flows for PII compliance.
  • Pilot a side-by-side rollout with current voice-bot vendor and compare QoS, costs, and ops burden before migration.
rocket_launch

Greenfield perspective

  • Start with a reusable template that abstracts speech I/O, intent routing, and tool calls behind clear interfaces.
  • Design for streaming-by-default, structured outputs, and metrics tracing from day one.
link Sources
youtube.com youtube.com

09

Treat AI Roundups as Leads, Not Facts

Two duplicate YouTube roundup videos hype 'insane AI news' without concrete sources or technical detail. Use such content as a starting point only: verify claims via vendor release notes, SDK changelogs, or docs. Make SDLC changes only after controlled tests on your workloads.

lightbulb

Why it matters

  • Unverified AI claims can cause churn, break builds, or trigger costly experiments with little value.
  • A lightweight verification workflow reduces risk and protects delivery timelines.
science

What to test

  • Build an eval harness with golden datasets to check accuracy, latency, cost, and safety when upgrading models/SDKs (a minimal harness is sketched after this list).
  • Pin versions and run canary CI on provider/model upgrades; track regressions before rollout.
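
A minimal harness of that kind; call_model and the golden JSONL file are placeholders, and the substring scoring is deliberately naive:

  import json
  import time
  from pathlib import Path

  def call_model(prompt: str) -> str:
      """Placeholder for the model/SDK version under evaluation."""
      raise NotImplementedError

  def run_evals(golden_path: str = "evals/golden.jsonl", accuracy_floor: float = 0.9) -> None:
      cases = [json.loads(line) for line in Path(golden_path).read_text().splitlines() if line.strip()]
      correct, latencies = 0, []
      for case in cases:
          start = time.perf_counter()
          answer = call_model(case["prompt"])
          latencies.append(time.perf_counter() - start)
          # Naive substring scoring; replace with task-appropriate graders.
          correct += int(case["expected"].strip().lower() in answer.lower())
      accuracy = correct / len(cases)
      print(f"accuracy={accuracy:.2%} avg_latency={sum(latencies) / len(latencies):.2f}s")
      assert accuracy >= accuracy_floor, "regression against golden set: block the upgrade"
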
engineering

Brownfield perspective

  • Abstract AI provider calls behind interfaces with feature flags and circuit breakers to enable fast rollback or swap.
  • Backfill evals for existing critical prompts and data transforms so regressions are measurable and auditable.
rocket_launch

Greenfield perspective

  • Bake evals into CI from day one, version prompts, and choose providers with stable model versioning and SLAs.
  • Design AI stages in pipelines to be idempotent with telemetry for latency, cost, and quality per step.
link Sources
youtube.com youtube.com

10

When an AI 'Breakthrough' Is a Risk Signal, Not a Feature

A recent video argues that not every AI breakthrough is good for engineering teams, highlighting potential reliability, safety, and cost risks. Treat novel LLM capabilities as untrusted until proven with evals and guardrails, especially before putting them into CI/CD or auto-test generation.

lightbulb

Why it matters

  • Risky AI features can silently degrade quality, inflate costs, or introduce security gaps.
  • Without evals and governance, CI/CD pipelines can amplify bad outputs into production.
science

What to test

  • Stand up offline evals with golden datasets to track accuracy, latency, cost, and regression before rollout.
  • Red-team prompts for jailbreaks and prompt injection, and measure flakiness/mutation score of AI-generated tests.
engineering

Brownfield perspective

  • Gate LLM features behind flags with fallbacks and circuit breakers, and add prompt/response logging with PII scrubbing.
  • Canary new AI behaviors to a small traffic slice and enforce error budgets tied to eval metrics.
rocket_launch

Greenfield perspective

  • Design the eval harness first (metrics, datasets, thresholds) and codify prompts/templates as versioned artifacts.
  • Choose a provider strategy (hosted vs self-hosted) with clear SLAs, token budgets, and rollback paths.
link Sources
youtube.com youtube.com

11

Fix Source Ingestion: Deduplicate and Relevance-Filter YouTube Inputs

The input set contains the same YouTube video twice and content unrelated to backend/AI-in-SDLC, exposing gaps in our ingestion pipeline. Add deterministic deduplication by YouTube videoId and a lightweight relevance classifier on titles/descriptions to filter off-topic items. This reduces noise before summarization and speeds editorial review.
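
The deduplication step can be as simple as the sketch below; the relevance classifier is left as a stub, and the URL variants covered are assumptions:

  from urllib.parse import urlparse, parse_qs

  def youtube_video_id(url: str) -> str | None:
      """Extract the canonical videoId from watch, share, and shorts URL variants."""
      parsed = urlparse(url)
      if parsed.hostname == "youtu.be":
          return parsed.path.lstrip("/") or None
      if parsed.hostname and "youtube.com" in parsed.hostname:
          if parsed.path == "/watch":
              return parse_qs(parsed.query).get("v", [None])[0]
          if parsed.path.startswith(("/shorts/", "/live/")):
              return parsed.path.split("/")[2] or None
      return None

  def dedupe(items: list[dict]) -> list[dict]:
      seen, unique = set(), []
      for item in items:
          vid = youtube_video_id(item["url"]) or item["url"]  # fall back to raw URL for non-YouTube sources
          if vid not in seen:
              seen.add(vid)
              unique.append(item)
      return unique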

lightbulb

Why it matters

  • Cuts reviewer time and model token spend on irrelevant media.
  • Improves trust in automated digests and downstream metrics.
science

What to test

  • Compare LLM zero-shot vs. a small supervised classifier over embeddings for relevance on a labeled set.
  • Evaluate exact videoId matching vs. embedding-based near-duplicate detection to catch re-uploads and playlist variants.
engineering

Brownfield perspective

  • Insert a pre-processing stage in the existing ETL to run in shadow mode and report precision/recall before enforcing drops.
  • Route uncertain items to a quarantine queue and use human feedback to retrain the classifier weekly.
rocket_launch

Greenfield perspective

  • Model ingestion around canonical IDs (YouTube videoId) with content hashes and explicit source provenance in the schema.
  • Define SLOs for relevance precision/recall and gate deploys with automated evaluation in CI.
link Sources
youtube.com youtube.com

12

Evaluate Google NotebookLM for source-grounded answers over engineering docs

A third-party video highlights new NotebookLM updates, but details are not from an official source. Regardless, NotebookLM already provides grounded Q&A, summaries, and outlines over your uploaded sources (e.g., PDFs, docs), which can streamline spec reviews, runbook lookup, and onboarding. Verify any "new features" against the official product page before planning adoption.

lightbulb

Why it matters

  • Speeds up design reviews and onboarding by answering questions directly from your team’s docs with citations.
  • Offers a low-effort alternative to building and maintaining an internal RAG stack for knowledge retrieval.
science

What to test

  • Pilot on a read-only subset of design docs and runbooks; measure answer accuracy and citation fidelity against a controlled checklist.
  • Validate data handling, access controls, and retention with your compliance team, and test ingestion paths from PDFs, Google Docs, and exported Confluence pages.
engineering

Brownfield perspective

  • Start by exporting critical RFCs/runbooks to a secure, versioned corpus and ensure citations map cleanly back to canonical sources.
  • Keep it out of production workflows until hallucination rates, source freshness, and permission scoping meet your thresholds.
rocket_launch

Greenfield perspective

  • Adopt docs-as-data from day one with clear source curation, tagging, and ownership to maximize retrieval quality.
  • Standardize templates for PRDs/RFCs so summaries and Q&A stay consistent and traceable across projects.
link Sources
youtube.com youtube.com

13

Reverse‑engineering insights into Claude Code’s agent architecture

PromptLayer’s Jared Zoneraich independently analyzes how Claude Code likely works: a tool-calling agent that reads/writes files and runs local commands, guided by a lightweight workspace index to decide what to load into context. The talk walks through observed behaviors, latency/cost tradeoffs, and practical guardrails for using a code agent on real repos. Findings are not officially endorsed by Anthropic, but provide concrete patterns to pilot safely.

lightbulb

Why it matters

  • Clarifies how a code agent actually touches your filesystem and shell, informing guardrails, logging, and permissions.
  • Highlights scaling constraints (repo size, context management, multi-file edits) that affect backend/data monorepos.
science

What to test

  • Enable verbose logging to review tool calls (file reads/writes, command exec) and inspect outbound payload size and scope.
  • Benchmark on a representative repo: measure latency, token use, and multi-file diff accuracy across read-only vs write modes.
engineering

Brownfield perspective

  • Start in a fork with read-only defaults, directory/command allowlists, and secrets filtering; gate writes via CI tests/linters.
  • Exclude large/binary/data directories and generate a code index (e.g., ripgrep/ctags) to improve retrieval without ballooning context.
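
A rough sketch of that kind of lightweight index, assuming ripgrep is installed (rg --files respects .gitignore) and using an illustrative exclude list:

  import json
  import subprocess
  from pathlib import Path

  EXCLUDE = ("data/", "notebooks/", ".parquet", ".csv", ".bin")  # illustrative large/binary paths

  def build_index(repo_root: str, out_path: str = "agent_index.json") -> None:
      # rg --files lists repository files while honoring .gitignore.
      files = subprocess.run(["rg", "--files"], cwd=repo_root,
                             capture_output=True, text=True, check=True).stdout.splitlines()
      index = []
      for rel in files:
          if any(marker in rel for marker in EXCLUDE):
              continue
          path = Path(repo_root) / rel
          index.append({"path": rel, "lines": sum(1 for _ in path.open(errors="ignore"))})
      Path(out_path).write_text(json.dumps(index, indent=2))
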
rocket_launch

Greenfield perspective

  • Structure repos with clear task runners (Makefile/Invoke/NPM scripts), small modules, and high-signal docs for agent grounding.
  • Bake in fast tests and example workflows so the agent can run local feedback loops (build, test, format) reliably.
link Sources
youtube.com youtube.com

14

Claude Code vs Codex: pick by workflow fit

An HN thread discusses a blog post arguing that different AI coding assistants suit different working styles: Codex is described as more hands-off while Claude Code is more hands-on. The author suggests teams try both for a week to see which aligns with their habits, but provides no benchmarks or concrete examples. Treat the takeaway as guidance to run a structured trial, not as evidence of superiority.

lightbulb

Why it matters

  • Tool fit with developer workflow often drives ROI more than headline model quality.
  • A short, structured bake-off can prevent tool churn and mismatched expectations.
science

What to test

  • Run a 1–2 week A/B on representative backend/data tasks; track cycle time, review rework, defects, and suggestion usefulness.
  • Verify repo indexing, context handling, and security controls (secrets redaction, least-privilege access) in IDE and CI.
engineering

Brownfield perspective

  • Pilot in a contained service with feature flags and enforce AI changes behind tests and code review to match existing patterns.
  • Check compatibility with monorepo layout, build tooling, and CI annotations to avoid noisy diffs or brittle suggestions.
rocket_launch

Greenfield perspective

  • Standardize prompts, scaffolds, and guardrails early so assistants generate consistent service and pipeline templates.
  • Choose assistants based on whether the project needs iterative prototyping (hands-on) or checklist-driven flow (hands-off).
link Sources
news.ycombinator.com

15

Claude Code teases AI-powered terminal for dev workflows

An unofficial write-up claims new Claude Code features focused on an AI-powered terminal for development workflows. For backend/data teams, this points to AI assistance directly in the CLI, potentially reducing context switching for scripting, data tasks, and ops; validate via a small pilot given the lack of official details.

lightbulb

Why it matters

  • CLI-first AI can speed common backend/data tasks like migrations, ETL scripts, packaging, and incident checks.
  • Terminal-based assistance reduces IDE dependence and fits server-side workflows.
science

What to test

  • Run a pilot in isolated devcontainers with read-only/dry-run modes and audit logging to assess safety and accuracy.
  • Benchmark time-to-complete and error rates for routine tasks (data migrations, Docker builds, kubectl ops) with and without the AI terminal.
engineering

Brownfield perspective

  • Integrate via a wrapper that logs prompts, generated commands, and outputs to existing observability, and require approval before write actions (see the sketch after this list).
  • Scope credentials and filesystem access tightly and target staging clusters or sampled datasets to avoid destructive changes.
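
A sketch of such a wrapper; the write-command patterns, prompt field, and log destination are assumptions to adapt to your stack:

  import json
  import shlex
  import subprocess
  import time

  WRITE_PATTERNS = ("rm ", "drop ", "delete ", "kubectl apply", "kubectl delete", "alter table")  # assumed deny-by-default list

  def run_ai_command(prompt: str, command: str, log_path: str = "ai_terminal_audit.jsonl") -> str:
      needs_approval = any(p in command.lower() for p in WRITE_PATTERNS)
      if needs_approval and input(f"AI wants to run: {command}\nApprove? [y/N] ").lower() != "y":
          decision, output = "rejected", ""
      else:
          decision = "approved"
          result = subprocess.run(shlex.split(command), capture_output=True, text=True)
          output = result.stdout + result.stderr
      with open(log_path, "a") as log:
          log.write(json.dumps({"ts": time.time(), "prompt": prompt, "command": command,
                                "decision": decision, "output": output[:2000]}) + "\n")
      return output
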
rocket_launch

Greenfield perspective

  • Start with ephemeral devcontainers, least-privilege tokens, and policy-based command execution.
  • Version prompt templates for common workflows (e.g., db migrations, DAG scaffolding) alongside code to standardize usage.

16

WSL2 builds of the Continue VS Code extension ship Linux binaries, break on Windows

Building the Continue VS Code extension (VSIX) from WSL2 packages Linux-native binaries (sqlite3, LanceDB, ripgrep), and the extension fails to activate on Windows with "not a valid Win32 application." The prepack step targets the current platform; trying a win32 target from Linux fails due to missing Windows artifacts (e.g., rg.exe), indicating the need for cross-target packaging or universal bundles.

lightbulb

Why it matters

  • Many devs and CI build on Linux while running VS Code on Windows, so mismatched native modules can silently break AI tooling and slow iteration.
  • Robust cross-target builds improve reproducibility for any extension or Node project with native dependencies.
science

What to test

  • Add CI to package VSIX for win32-x64 from Linux and run activation smoke tests on a Windows runner to verify native module loading.
  • Validate packaging fetches the correct platform binaries for sqlite3/LanceDB/ripgrep and fails fast if artifacts are missing.
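
A fail-fast pre-package check along those lines might look like this; the artifact paths are assumptions based on the report, not Continue's actual layout:

  import sys
  from pathlib import Path

  # Hypothetical mapping of target platform -> native artifacts that must be bundled.
  REQUIRED = {
      "win32-x64": ["bin/rg.exe", "node_modules/sqlite3/build/Release/node_sqlite3.node"],
      "linux-x64": ["bin/rg", "node_modules/sqlite3/build/Release/node_sqlite3.node"],
  }

  def check_artifacts(target: str, staging_dir: str = "out") -> None:
      missing = [p for p in REQUIRED[target] if not (Path(staging_dir) / p).exists()]
      if missing:
          print(f"Refusing to package {target}: missing native artifacts {missing}", file=sys.stderr)
          sys.exit(1)

  if __name__ == "__main__":
      check_artifacts(sys.argv[1] if len(sys.argv) > 1 else "win32-x64")
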
engineering

Brownfield perspective

  • Update build scripts to download platform-specific prebuilt binaries during cross-target packaging and document WSL2 build constraints.
  • As a stopgap, require marketplace installs or native Windows builds for local VSIX testing from WSL2.
rocket_launch

Greenfield perspective

  • Prefer dependencies with cross-platform prebuilds or WASM to avoid .node binaries in the extension host.
  • Set up a multi-target release matrix (win32, linux, arm64) with activation tests in Windows, WSL Remote, and Linux.
link Sources
github.com

17

Replit ships Enterprise Security Center and ChatGPT app-building; Agent first build now 3–5 min

Replit introduced an Enterprise Security Center that scans all org Replit Apps for CVEs across dependencies, shows affected apps, and exports SBOMs. A new Replit ChatGPT App lets you build and publish Replit Apps directly from a ChatGPT conversation. The Agent "Fast Build" upgrade cuts first-build time from 15–20 minutes to 3–5 minutes and aligns build-mode design quality with design mode.

lightbulb

Why it matters

  • Org-wide CVE visibility and SBOM export reduce supply-chain risk and simplify compliance.
  • Faster agent builds and ChatGPT-based app creation can speed prototyping and internal tool delivery.
science

What to test

  • Pilot the Replit ChatGPT App to generate a small internal service and measure code quality, latency, and deployment handoff.
  • Run Security Center scans on a sample workspace, validate CVE coverage vs your existing SCA, and test SBOM export integration with your risk tooling.
engineering

Brownfield perspective

  • If parts of your stack run on Replit Apps, integrate Security Center SBOMs into your current vulnerability management pipeline and compare findings with your SCA.
  • Assess how ChatGPT-driven builds fit with existing repos, secrets, and CI gates, and define review controls to avoid bypassing standards.
rocket_launch

Greenfield perspective

  • Use the ChatGPT App plus Fast Build to bootstrap new services, then harden with templates that enforce linting, tests, and IaC from day zero.
  • Enable Security Center early and treat SBOM export as a required artifact in CI to support audits and incident response.
link Sources
docs.replit.com

Subscribe to Newsletter

Don't miss a beat in the AI & SDLC world. Daily updates.