howtonotcode.com

Stories by Tags

Search and filter stories across all digests by tags. Stories must match all selected tags.

Stories with tags: vllm

Showing 1-3 of 3

DeepSeek open models: worth a backend/RAG benchmark

Daily Digest · 2025-12-26

A community post claims a free "DeepSeek V3.2" outperforms top closed models, but the source provides no verifiable details. Regardless, DeepSeek’s open models are mature enough to justify a brief, task-focused benchmark on code generation, test scaffolding, and RAG to gauge quality, latency, and co...
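A task-focused benchmark like the one described can be sketched as a tiny harness that times each task and records basic output stats. This is a minimal sketch, assuming a hypothetical `generate` callable standing in for whatever model endpoint (DeepSeek or otherwise) you are evaluating; it is not tied to any specific API.

```python
import time


def run_benchmark(tasks, generate):
    """Tiny benchmark harness: time each task and record output size.

    `tasks` is a list of (name, prompt) pairs; `generate` is any callable
    that takes a prompt string and returns a completion string (hypothetical
    stand-in for a real model client).
    """
    results = []
    for name, prompt in tasks:
        start = time.perf_counter()
        output = generate(prompt)
        latency = time.perf_counter() - start
        results.append({
            "task": name,
            "latency_s": latency,  # wall-clock time for this call
            "chars": len(output),  # crude proxy for output volume
        })
    return results


# Usage with a stub "model" that just echoes the prompt uppercased:
tasks = [
    ("codegen", "Write a function that reverses a list."),
    ("rag", "Answer using the retrieved context: ..."),
]
results = run_benchmark(tasks, lambda p: p.upper())
```

Swapping the stub for a real client (and adding quality scoring per task) turns this into the brief code-generation / test-scaffolding / RAG comparison the post suggests.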

Speculative decoding: 3x faster LLM serving with a draft-and-verify path

Daily Digest · 2025-12-25

Speculative decoding runs a small draft model to propose tokens and uses the main model to verify them, keeping outputs identical to baseline while cutting latency. Expect up to ~3x speedups when the draft model’s proposals have high acceptance; tune draft size and propose steps to hit the sweet spo...
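The draft-and-verify loop can be illustrated with a toy sketch. This is not vLLM's implementation; the draft and target "models" below are hypothetical next-token functions chosen so the two occasionally disagree, which exercises the reject-and-correct path. The key property the sketch preserves is that the output is token-for-token identical to running the target model alone.

```python
def draft_model(prefix):
    # Cheap draft: guesses the next token is last token + 1.
    return prefix[-1] + 1


def target_model(prefix):
    # "Expensive" target: same rule, except it skips multiples of 4,
    # so the draft is sometimes wrong and must be corrected.
    nxt = prefix[-1] + 1
    return nxt + 1 if nxt % 4 == 0 else nxt


def speculative_decode(prefix, n_tokens, k=3):
    """Generate n_tokens via draft-and-verify.

    The draft proposes k tokens per round; the target verifies each one.
    On the first mismatch the target's own token is substituted, so the
    result always equals plain target-model decoding; the speedup comes
    from accepting most draft tokens when acceptance is high.
    """
    out = list(prefix)
    end = len(prefix) + n_tokens
    while len(out) < end:
        # Draft proposes k tokens autoregressively.
        proposal, cur = [], list(out)
        for _ in range(k):
            t = draft_model(cur)
            proposal.append(t)
            cur.append(t)
        # Target verifies; stop at first mismatch, keeping the
        # target's token (this is what keeps output == baseline).
        for t in proposal:
            expect = target_model(out)
            out.append(expect)
            if expect != t or len(out) == end:
                break
    return out[len(prefix):]
```

Tuning `k` (the draft length) trades off wasted draft work on rejection against fewer target passes on acceptance, which is the "sweet spot" tuning the summary refers to.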