A creator demo shows six 'Skills' in Claude Code that package repeatable coding actions inside the IDE. The video focuses on using pre-configured skills to streamline common tasks without leaving the editor; this is a user demo, not official docs.
lightbulb
Why it matters
Cuts context switching by running routine edits and explanations inside the IDE.
Provides a repeatable way to standardize prompts/actions across a team.
science
What to test
Run a 1-2 week pilot on a small service repo measuring PR turnaround time, diff accuracy, and test stability when using Skills.
Start read-only, then enable write/edit with code-owner review, static analysis, and secret scanning gates.
engineering
Brownfield perspective
Enable Skills on non-critical modules first and watch for multi-file edit or monorepo path issues.
Curate a minimal, approved skills catalog aligned to your linters, formatters, and test runners.
rocket_launch
Greenfield perspective
Bake a shared Skills catalog and editor config into your project template to standardize usage from day one.
Adopt clear module and test layout to help the assistant reason about code and generate cleaner diffs.
A recent video claims GLM 4.7 improves coding agents and tool-use, suggesting open models are closing gaps with closed alternatives. No official release notes were provided in the source, so treat this as preliminary and validate against your workloads.
lightbulb
Why it matters
If accurate, stronger codegen and tool-use could reduce cost and vendor lock-in via self-hosted or open-weight options.
Backend teams may gain better function-calling reliability for API orchestration and data workflows.
science
What to test
Run a bakeoff on backend tasks (API handlers, ETL/DAG scaffolding, SQL generation) and track pass@k, diff/revert rates, latency, and cost versus your current model.
Evaluate tool-use/function-calling with your existing JSON schema, checking JSON validity, call ordering, error recovery, and idempotency.
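As a concrete starting point for the tool-use checks above, here is a minimal sketch of a scoring function that verifies JSON validity, schema conformance, and call ordering. The schema, tool names, and expected order are illustrative, and it assumes the jsonschema package is installed.

```python
# Sketch of a tool-call eval check: JSON validity, schema conformance, and call order.
# The schema and expected ordering are illustrative examples, not a vendor format.
import json
from jsonschema import validate, ValidationError

TOOL_SCHEMA = {
    "type": "object",
    "required": ["name", "arguments"],
    "properties": {
        "name": {"type": "string"},
        "arguments": {"type": "object"},
    },
}

def score_tool_calls(raw_calls, expected_order):
    """Return (valid_json, schema_ok, order_ok) for one model response."""
    parsed = []
    for raw in raw_calls:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            return False, False, False
    try:
        for call in parsed:
            validate(instance=call, schema=TOOL_SCHEMA)
    except ValidationError:
        return True, False, False
    order_ok = [c["name"] for c in parsed] == expected_order
    return True, True, order_ok

# Example: the model should fetch the record before updating it.
calls = ['{"name": "get_order", "arguments": {"id": 42}}',
         '{"name": "update_order", "arguments": {"id": 42, "status": "shipped"}}']
print(score_tool_calls(calls, ["get_order", "update_order"]))
```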
engineering
Brownfield perspective
Integrate behind a provider-agnostic interface, using an inference server to expose a consistent API and minimize code changes; a minimal interface sketch follows below.
Validate tokenizer behavior, context window, and timeout/rate-limit policies to avoid regressions in pagination, SQL, and logging paths.
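A minimal sketch of that provider-agnostic seam, assuming a simple prompt-in/completion-out contract; the class and method names are placeholders, and real adapters would wrap your vendor SDK or inference-server client.

```python
# Sketch of a provider-agnostic completion interface so GLM-class models can be
# swapped in behind a single seam. Names and fields are illustrative.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Completion:
    text: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

class LLMProvider(Protocol):
    def complete(self, prompt: str, *, max_tokens: int = 512, timeout_s: float = 30.0) -> Completion:
        ...

class EchoProvider:
    """Stand-in used for tests; real adapters would call a vendor SDK or inference server."""
    def complete(self, prompt: str, *, max_tokens: int = 512, timeout_s: float = 30.0) -> Completion:
        return Completion(text=prompt[:max_tokens], input_tokens=len(prompt.split()),
                          output_tokens=0, latency_ms=0.0)

def summarize_logs(provider: LLMProvider, logs: str) -> str:
    # Application code depends only on the Protocol, not on any vendor SDK.
    return provider.complete(f"Summarize these logs:\n{logs}", max_tokens=256).text

print(summarize_logs(EchoProvider(), "error: connection reset"))
```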
rocket_launch
Greenfield perspective
Standardize function-calling schemas and retry/backoff policies early, and instrument tool-call accuracy and JSON error rates.
Build an eval harness that runs repo-level codegen, SQL tests, and latency/cost tracking for model selection and continuous monitoring.
Speculative decoding runs a small draft model to propose tokens and uses the main model to verify them, keeping outputs identical to baseline while cutting latency. Expect up to ~3x speedups when the draft model's proposals have high acceptance; tune draft size and propose steps to hit the sweet spot.
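To build intuition for that acceptance-rate tradeoff, here is a toy, framework-agnostic simulation of the propose-verify loop. The fixed per-token acceptance probability and draft-cost ratio are simplifying assumptions, not measurements from any particular serving stack.

```python
# Toy simulation of speculative decoding's propose-verify loop, showing how acceptance
# rate and draft length drive speedup. Acceptance is modeled as a fixed per-token
# probability, which simplifies real verifier behavior.
import random

def simulate(total_tokens=2000, k=4, accept_prob=0.8, draft_cost=0.1, seed=0):
    """Estimate speedup vs. decoding one token per target-model forward pass."""
    rng = random.Random(seed)
    emitted = 0
    target_passes = 0   # each verification step costs one target forward pass
    draft_passes = 0    # each proposed token costs one (cheaper) draft forward pass
    while emitted < total_tokens:
        draft_passes += k
        accepted = 0
        for _ in range(k):
            if rng.random() < accept_prob:
                accepted += 1
            else:
                break  # first rejection ends the accepted prefix
        target_passes += 1
        emitted += accepted + 1  # verifier emits one corrected/bonus token after the prefix
    baseline_cost = total_tokens                      # one target pass per token
    spec_cost = target_passes + draft_cost * draft_passes
    return baseline_cost / spec_cost

for p in (0.5, 0.7, 0.9):
    print(f"accept_prob={p}: ~{simulate(accept_prob=p):.1f}x")
```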
lightbulb
Why it matters
Reduces p95 latency and infra cost for AI endpoints without changing output quality.
Improves throughput under load, enabling higher QPS or smaller fleets.
science
What to test
A/B enable speculative decoding and measure acceptance rate, tokens/sec, p95 latency, and exact output diffs against baseline.
Sweep draft model size and max-propose steps to maximize acceptance and minimize cost while preserving determinism and streaming behavior.
engineering
Brownfield perspective
Adopt via serving platforms that support it (e.g., vLLM, TensorRT-LLM) behind a feature flag with detailed telemetry for acceptance rate and fallbacks.
Validate interactions with batching, caching, streaming, and autoscaling to avoid regressions and resource contention from the extra draft model.
rocket_launch
Greenfield perspective
Choose a serving stack with native speculative decoding and build observability (acceptance rate, throughput, cost) from day one.
Pick a cheap draft model closely aligned with the target model to maximize acceptance and simplify capacity planning.
A new GLM-4.7 model is being promoted as open-source and usable free in the browser with no install. It's a low-friction way to trial an alternative LLM for coding and backend automation, but you should verify license, data handling, and performance before relying on it.
lightbulb
Why it matters
Provides a low-cost alternative to GPT/Claude for code assistance and backend tasks.
Could reduce rate-limit and cost constraints if performance is acceptable.
science
What to test
Run your internal eval set (code gen, SQL, log triage) comparing GLM-4.7 vs your current model; track pass@k, latency, and cost.
Validate license, data retention/telemetry, and API/browser usage terms; prefer self-hosting if permitted.
engineering
Brownfield perspective
Introduce a provider abstraction so GLM-4.7 can be swapped in without large refactors; check context window/tokenization impacts on prompts.
Canary on non-critical paths (lint/PR comments, docs) and compare regression vs baseline before broader rollout.
rocket_launch
Greenfield perspective
Design with an LLM router and eval harness from day one; keep prompts/tools model-agnostic.
If open weights are available, containerize deployment with observability and quotas; otherwise front the hosted API with a rate-limited proxy.
A step-by-step walkthrough shows how to create reusable "Skills" in Claude to standardize prompts for recurring work. Teams can codify instructions for tasks like PR review checklists, incident triage, or data pipeline QA so outputs become more consistent and faster to produce.
lightbulb
Why it matters
Reusable skills reduce prompt drift and improve consistency across code and data workflows.
Standardized instructions make it easier to audit and scale AI-assisted tasks across teams.
science
What to test
Pilot a PR review Skill with explicit checklists (security, migrations, DB changes) and measure comment precision, false positives, and time saved.
Create a data pipeline QA Skill that validates schema changes and alert thresholds, and compare results against existing runbooks.
engineering
Brownfield perspective
Start by using Skills in non-blocking stages (draft reviews, runbook generation) and compare against current processes before adding gates.
Document approved Skill prompts and inputs in your repo/docs to control variation and ensure compliance with data-handling policies.
rocket_launch
Greenfield perspective
Define a small set of canonical Skills (PR review, test planning, migration checklist) during project setup and make them part of onboarding.
Version Skills alongside project docs and regularly evaluate output quality with lightweight acceptance criteria.
A recent analysis argues that fast, low-cost "flash" models will beat frontier models for many production workloads by 2026 due to latency SLOs and total cost. For backend/data engineering, pairing smaller models with retrieval, tools, and caching can meet quality bars for tasks like SQL generation, log summarization, ETL scaffolding, and runbook assistance, with frontier models used only when needed.
lightbulb
Why it matters
Latency, throughput, and cost constraints often cap the value of frontier models in backend services.
A model-routing strategy can cut spend while maintaining quality for common SDLC and data tasks.
science
What to test
Run offline evals and canary A/Bs comparing small vs frontier models on your top tasks (SQL, code fixes, schema mapping), tracking quality, tail latency, and cost per request.
Test routing policies: default to a small model with RAG/tools and auto-escalate to a frontier model on confidence/uncertainty or timeouts.
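A minimal sketch of such a routing policy, assuming the small model returns a self-reported confidence score and that escalation triggers on timeout or low confidence; the thresholds, timeout, and confidence heuristic are placeholders to calibrate against your own evals.

```python
# Sketch of a routing policy: try the small model first, escalate to the frontier
# model on timeout or low confidence. Model stubs and thresholds are illustrative.
import concurrent.futures

CONFIDENCE_FLOOR = 0.7
SMALL_TIMEOUT_S = 2.0

def route(prompt, small_model, frontier_model):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(small_model, prompt)
    try:
        answer, confidence = future.result(timeout=SMALL_TIMEOUT_S)
    except concurrent.futures.TimeoutError:
        # Don't block on the abandoned call (cancel_futures requires Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
        return frontier_model(prompt), "escalated: timeout"
    pool.shutdown(wait=False)
    if confidence < CONFIDENCE_FLOOR:
        return frontier_model(prompt), "escalated: low confidence"
    return answer, "served by small model"

# Stubs standing in for real provider calls.
small = lambda p: ("SELECT count(*) FROM orders;", 0.9)
frontier = lambda p: "SELECT count(*) FROM orders WHERE status = 'paid';"
print(route("How many paid orders?", small, frontier))
```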
engineering
Brownfield perspective
Introduce a model abstraction layer and router in existing services, with feature-flagged fallbacks to current frontier defaults.
Migrate prompts and tool schemas to be model-agnostic; add telemetry for quality, latency, cost, and escalation rates to avoid regressions.
rocket_launch
Greenfield perspective
Design for model-agnostic interfaces from day one and choose a small-model default with streaming, caching, and RAG built in.
Automate evals in CI/CD with task-specific test sets and budget guards so routing changes cannot blow SLOs or costs.
Two creator videos report that Google NotebookLM now supports structured data tables and has been upgraded to Gemini 3. If accurate, this should improve table-aware reasoning and make it easier to analyze spreadsheets/CSVs directly inside NotebookLM; confirm details in official docs before relying on it.
lightbulb
Why it matters
Structured tables plus a stronger model could speed exploratory analysis and dataset documentation.
Better table reasoning may reduce manual prototyping and back-and-forth for data Q&A.
science
What to test
Benchmark table Q&A accuracy on your schemas with edge cases (NULLs, joins, mixed units) using CSV/Sheets.
Validate reproducibility and provenance (whether outputs can be cited back to exact source files), and measure latency plus any size or rate limits.
engineering
Brownfield perspective
Pilot on exported, non-PII snapshots from existing pipelines and keep governed BI dashboards as the source of truth.
Assess integration friction: source file formats, update cadence, and how NotebookLM handles schema changes over time.
rocket_launch
Greenfield perspective
Design a lightweight flow: curated data snapshots -> NotebookLM sources -> prompt templates -> human-reviewed handoff to code/docs.
Start with small, non-sensitive datasets and define acceptance criteria for AI-generated analyses and summaries.
A community video shows using GLM 4.7 to write and iterate on code, highlighting a practical generate-run-fix loop and the importance of grounding the model with project context. While there are no official release notes in the source, the workflow demonstrates how to use an LLM as a coding assistant for everyday tasks without heavy agent frameworks.
lightbulb
Why it matters
It shows a low-friction pattern to add LLMs to day-to-day coding without changing your stack.
Grounding prompts with repo and task context remains the difference between helpful and noisy outputs.
science
What to test
Run a small bake-off where GLM 4.7 generates an API handler, unit tests, and a SQL query fix, then measure review diffs, runtime correctness, and edit distance to final code.
Evaluate latency and context limits by prompting with real repo snippets (e.g., service configs, schema files) and verify reproducibility via fixed prompts and seeds.
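One way to compute the edit-distance metric above is with the standard library's difflib: treat one minus the similarity ratio as "how much of the generated code had to change before merge." The example strings below are illustrative; a real eval would read the generated and human-finalized files from the repo.

```python
# Sketch of an "edit distance to final code" metric using difflib (standard library).
import difflib

def edit_ratio(generated: str, final: str) -> float:
    """0.0 means identical; 1.0 means nothing survived review."""
    similarity = difflib.SequenceMatcher(None, generated, final).ratio()
    return 1.0 - similarity

generated = "def total(xs):\n    return sum(xs)\n"
final = "def total(xs: list[int]) -> int:\n    return sum(xs)\n"
print(f"edit ratio: {edit_ratio(generated, final):.2f}")
```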
engineering
Brownfield perspective
Start in non-critical services and gate LLM-generated diffs behind CI checks for tests, lint, and security scans.
Control context ingestion (no secrets/PII), and add a fallback plan when the model output diverges from house style or architecture.
rocket_launch
Greenfield perspective
Structure repos for AI-readability (clear READMEs, task-oriented docs, example configs) and add eval suites from day one.
Adopt prompt templates for common backend/data tasks (CRUD endpoints, ETL steps, schema migrations) and track outcomes in CI.
A short video demonstrates standing up a minimal AI service in about 24 minutes by scoping a single use case and wiring an LLM-backed workflow end-to-end. For teams, the practical takeaway is to time-box a thin slice, use off-the-shelf components, and ship a measurable demo with basic instrumentation for latency, cost, and quality.
lightbulb
Why it matters
Rapid prototyping de-risks AI bets and validates value before deeper integration.
A thin-slice demo clarifies data needs, guardrails, and operational SLAs early.
science
What to test
Stand up evals on representative data to track quality regressions, prompt drift, and failure modes.
Instrument end-to-end latency and per-request cost with alerts and budgets tied to usage.
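A minimal sketch of that instrumentation, assuming the provider call returns text plus token counts; the pricing constants and return shape are assumptions to adapt to your vendor's actual API and rates.

```python
# Sketch of per-request latency/cost instrumentation around an LLM call.
import time

PRICE_PER_1K_INPUT = 0.0005   # illustrative USD rates, not real pricing
PRICE_PER_1K_OUTPUT = 0.0015

def instrumented_call(llm, prompt, logger=print):
    start = time.perf_counter()
    text, input_tokens, output_tokens = llm(prompt)   # assumed return shape
    latency_ms = (time.perf_counter() - start) * 1000
    cost = (input_tokens * PRICE_PER_1K_INPUT + output_tokens * PRICE_PER_1K_OUTPUT) / 1000
    logger({"latency_ms": round(latency_ms, 1), "cost_usd": round(cost, 6),
            "input_tokens": input_tokens, "output_tokens": output_tokens})
    return text

# Stub provider for demonstration.
fake_llm = lambda p: ("summary of " + p, len(p.split()), 12)
instrumented_call(fake_llm, "the nightly ETL job failed on step 3")
```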
engineering
Brownfield perspective
Introduce the AI step as a sidecar or async worker with feature flags and safe fallbacks to avoid breaking the critical path.
Capture prompts, responses, and traces with PII redaction and versioned prompts/models to support audits and rollbacks.
rocket_launch
Greenfield perspective
Hide the model behind an interface so you can swap providers and prompt versions without API changes.
Bake in observability (traces, eval dashboards, cost metrics) and canary users before broad rollout.
A step-by-step video shows how to use Google AI Studio to generate a simple website, export the code, deploy it to Hostinger, and map a custom domain. The workflow demonstrates prompt-driven code generation for static HTML/CSS/JS and a basic hosting setup without a framework.
lightbulb
Why it matters
Teams can spin up internal docs or landing pages quickly without pulling frontend resources.
It's a practical path to evaluate AI-generated code quality, security, and deployment fit within existing infra.
science
What to test
Add CI checks (linters, accessibility, Lighthouse, SAST) for AI-generated HTML/CSS/JS before deployment.
Version prompts and outputs to track changes and verify reproducibility across generations.
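One lightweight way to version prompts and outputs together is a hash-based manifest: hash both and append a record per generation so any deployed artifact can be traced back to the exact prompt that produced it. The file layout and field names below are illustrative.

```python
# Sketch of a prompt/output versioning manifest (one JSONL record per generation).
import hashlib
import json
from datetime import datetime, timezone

def record_generation(prompt: str, output: str, manifest_path: str = "ai_manifest.jsonl") -> dict:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "output_bytes": len(output.encode()),
    }
    with open(manifest_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry

print(record_generation("Generate a landing page for the status dashboard",
                        "<!doctype html><html>...</html>"))
```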
engineering
Brownfield perspective
Pilot on static sections (docs/status pages) and ensure routing, auth, and CDN/DNS stay consistent with existing standards.
Integrate hosting with current CI/CD, monitoring, and WAF/CDN policies to avoid shadow infrastructure.
rocket_launch
Greenfield perspective
Use AI Studio to scaffold a static starter, then formalize with a static site generator and IaC for hosting/DNS.
Define coding standards, prompt templates, and acceptance tests upfront to constrain AI output.
A video summary of CodeRabbit's recent report cautions against rubber-stamping AI-authored pull requests from tools like Claude, Cursor, or Codex. The core guidance is to treat AI changes as untrusted code: require tests, run full CI, and perform normal, skeptical review. Label AI-originated PRs and add explicit gates to prevent subtle defects from slipping through.
lightbulb
Why it matters
AI-generated code can look correct while hiding subtle defects that raise incident risk.
Stricter review gates and observability reduce rework and production issues.
science
What to test
Label AI-authored PRs and require diff coverage thresholds, static analysis, and security scans before merge.
Track defect density, revert rate, and MTTR for AI vs human PRs over a sprint to quantify impact.
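A minimal sketch of that comparison, assuming you can export PRs with an AI-assisted label, revert status, and linked defect counts; the column names are hypothetical placeholders for whatever your Git hosting export provides.

```python
# Sketch: compare revert rate and defects per PR for AI-assisted vs human PRs.
import csv
import io

SAMPLE = """pr_id,ai_assisted,reverted,linked_defects
101,true,false,0
102,true,true,2
103,false,false,1
"""

def summarize(rows):
    n = len(rows)
    reverts = sum(r["reverted"] == "true" for r in rows)
    defects = sum(int(r["linked_defects"]) for r in rows)
    return {"prs": n, "revert_rate": reverts / n, "defects_per_pr": defects / n}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for label in ("true", "false"):
    group = [r for r in rows if r["ai_assisted"] == label]
    if group:
        print("ai_assisted=" + label, summarize(group))
```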
engineering
Brownfield perspective
Update PR templates to require a test plan and risk notes for AI-assisted changes, and enforce CI gates without exceptions.
Enable repo rules to block merges when AI PRs miss diff coverage or fail SAST checks.
rocket_launch
Greenfield perspective
Bake in AI PR labeling, small-PR policy, and mandatory tests from day one with precommit hooks and CI templates.
Prefer stacks with strong typing and linters to constrain AI mistakes and simplify review.
Windsurf maintains a public changelog for its AI-powered editor, which is the canonical place to see recent fixes and feature changes. Treat this as the source for planning rollouts that may affect coding assistance, editor behavior, and integrations. Establish a lightweight review-and-test step before bumping versions team-wide.
lightbulb
Why it matters
AI editor updates can change suggestions, indexing, telemetry, and defaults, impacting productivity and compliance.
A monitored changelog enables scheduled upgrades with preflight checks instead of surprise breakage.
science
What to test
Run smoke tests on representative repos to validate agent behavior, completion quality, and multi-file edits before upgrading.
Verify privacy controls (offline mode, network egress), secret redaction, and telemetry settings after each update.
engineering
Brownfield perspective
Pin editor versions, roll out by cohort, and check compatibility with existing extensions, language servers, and CI formatting/linting gates.
Watch for changes that affect commit message generation, code actions, or formatting that could cause noisy diffs or CI failures.
rocket_launch
Greenfield perspective
Standardize editor configs and model/provider selection early, and automate update checks against a benchmark repo with quality gates.
Store repo-level prompts and coding conventions in-repo to keep agent behavior consistent across teams.
A hands-on guide shows how to deploy and run a compact LLM directly on a smartphone, outlining preparation of a small model, on-device runtime setup, and practical limits around memory, thermals, and latency. For backend/data teams, this validates edge inference for select tasks where low latency, privacy, or offline capability outweighs the accuracy gap of smaller models.
lightbulb
Why it matters
On-device inference can cut tail latency and cloud costs while improving privacy for sensitive prompts.
Edge+cloud split becomes a viable architecture: small local models for fast paths, server models for complex fallbacks.
science
What to test
Benchmark token throughput, latency, and battery/thermal behavior across 4-bit vs 8-bit quantization on target devices; a minimal harness sketch follows below.
Validate functional parity and fallback logic between on-device and server models, including prompt compatibility and safety filters.
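A minimal harness sketch for that benchmark; the generate callables are stand-ins for your on-device runtime bindings, and only the measurement scaffolding is meant to carry over.

```python
# Sketch of a throughput/latency harness for comparing quantization variants.
# The generate() callables below are stubs that return a generated token count.
import time
import statistics

def benchmark(generate, prompts, runs=3):
    latencies, tok_rates = [], []
    for _ in range(runs):
        for prompt in prompts:
            start = time.perf_counter()
            tokens = generate(prompt)          # stub returns the generated token count
            elapsed = time.perf_counter() - start
            latencies.append(elapsed)
            tok_rates.append(tokens / elapsed)
    return {"p50_latency_s": statistics.median(latencies),
            "tokens_per_s": statistics.median(tok_rates)}

# Stubs emulating a faster 4-bit and slower 8-bit variant.
fake_q4 = lambda p: (time.sleep(0.02), 64)[1]
fake_q8 = lambda p: (time.sleep(0.04), 64)[1]
prompts = ["summarize this log line", "classify this support ticket"]
print("q4:", benchmark(fake_q4, prompts))
print("q8:", benchmark(fake_q8, prompts))
```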
engineering
Brownfield perspective
Introduce an edge-inference feature flag and A/B test routing some requests to on-device models with telemetry for quality and SLA impact.
Plan model distribution, versioning, and license compliance in your mobile release pipeline, along with cache and purge strategies for model weights.
rocket_launch
Greenfield perspective
Design a mixed edge/cloud architecture from day one with clear model selection rules, offline modes, and privacy-by-default data handling.
Choose a mobile-friendly runtime and quantized model format early, and standardize benchmarks for device classes you support.
Modern coding agents wrap multiple LLMs: a supervisor decomposes work and tool-using workers edit code, run commands, and verify results in loops. They operate either locally with OS-level permissions or in sandboxed cloud containers preloaded with your repo to run tests and linters safely. Effective use hinges on permissioning, repeatable environments, and testable tasks.
lightbulb
Why it matters
Agents can autonomously change code and run commands, so security, tooling, and review gates must be explicit.
Understanding the supervise-act-verify loop helps you decide where agents fit in CI/CD and how to contain risk.
science
What to test
Run agents in a sandboxed container against a representative service to compare task success, revert rate, and time-to-merge versus human-only baselines.
Evaluate permission models by starting read-only, gradually enabling file writes and a command allowlist, and auditing all actions in CI logs.
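A minimal sketch of the command-allowlist-plus-audit idea, assuming agent tool calls arrive as shell command strings; the allowlisted commands and audit log path are illustrative policy choices, not a standard.

```python
# Sketch of a command allowlist gate for agent-proposed shell actions, with an audit
# log of every decision. Allowlist contents and log destination are illustrative.
import json
import shlex
from datetime import datetime, timezone

ALLOWED_COMMANDS = {"pytest", "ruff", "git"}          # e.g., test runner, linter, VCS reads
ALLOWED_GIT_SUBCOMMANDS = {"status", "diff", "log"}   # no pushes or resets

def gate(command: str, audit_path: str = "agent_audit.jsonl") -> bool:
    parts = shlex.split(command)
    allowed = bool(parts) and parts[0] in ALLOWED_COMMANDS
    if allowed and parts[0] == "git":
        allowed = len(parts) > 1 and parts[1] in ALLOWED_GIT_SUBCOMMANDS
    with open(audit_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"ts": datetime.now(timezone.utc).isoformat(),
                             "command": command, "allowed": allowed}) + "\n")
    return allowed

print(gate("pytest -q tests/"))        # True
print(gate("git push origin main"))    # False
print(gate("rm -rf /"))                # False
```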
engineering
Brownfield perspective
Start in a forked or mirrored repo with sandboxed containers, deny local CLI write/run access to prod paths, and gate outputs via PR-only workflows.
Add agent-friendly scaffolding (Taskfile/Makefile, smoke tests, clear README/setup scripts) so the gather-act-verify loop has reliable context.
rocket_launch
Greenfield perspective
Standardize on deterministic devcontainers, explicit task runners, and comprehensive test harnesses to maximize agent reliability.
Define RBAC and resource limits for agent containers and enforce PR-based merges with automated checks from day one.
This guide explains core QA testing concepts, where automation fits, and how continuous testing reduces defects and post-release cost. It outlines benefits (cost reduction, performance, higher quality), strategy considerations, and when outsourcing QA can help scale.
For backend/data teams, the emphasis is on systematic, automated testing embedded in delivery workflows to prevent issues before they reach production.
lightbulb
Why it matters
Embedding automated QA and continuous testing lowers defect rates and remediation costs.
A clear QA strategy improves reliability and performance for services and data pipelines.
science
What to test
Pilot AI-assisted test generation and measure coverage lift, false positives, and review effort saved.
Gate AI-authored code with contract tests, schema validations, and data-quality checks in CI before deploy.
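A minimal sketch of such a CI gate, assuming rows arrive as dictionaries from a dry-run of the pipeline; the expected schema and null tolerance are example policy values, not a framework API.

```python
# Sketch of lightweight schema and data-quality checks that can run as a CI gate
# before deploying AI-authored pipeline changes.
import sys

EXPECTED_SCHEMA = {"order_id": int, "amount": float, "status": str}

def check_rows(rows, null_tolerance=0.0):
    errors = []
    for i, row in enumerate(rows):
        for col, typ in EXPECTED_SCHEMA.items():
            if col not in row:
                errors.append(f"row {i}: missing column {col}")
            elif row[col] is not None and not isinstance(row[col], typ):
                errors.append(f"row {i}: {col} should be {typ.__name__}")
    nulls = sum(v is None for row in rows for v in row.values())
    total = sum(len(row) for row in rows) or 1
    if nulls / total > null_tolerance:
        errors.append(f"null ratio {nulls/total:.2%} exceeds tolerance")
    return errors

sample = [{"order_id": 1, "amount": 19.99, "status": "paid"},
          {"order_id": 2, "amount": None, "status": "pending"}]
problems = check_rows(sample)
print("\n".join(problems) or "all checks passed")
sys.exit(1 if problems else 0)
```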
engineering
Brownfield perspective
Incrementally introduce test automation starting with high-risk services and flaky areas to avoid regressions.
Stabilize tests by standardizing deterministic test data and aligning local, CI, and staging environments.
rocket_launch
Greenfield perspective
Bake automated tests into CI from day one with quality gates for unit, integration, performance, and data checks.
Define a lean QA strategy early, including test data management and clear ownership for test maintenance.