NVIDIA
30 days · UTC
KV-cache compression upends LLM serving economics: 6x memory cut, no retrain
Google’s TurboQuant claims 6x KV‑cache compression for LLM inference with no retraining, turning memory‑bound GPUs into higher‑concurrency servers. A...
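TurboQuant's actual scheme isn't detailed in this digest, but the ~6x figure is the kind of saving you get from storing KV tensors in low-bit integers plus a per-channel scale instead of fp16. A minimal, purely illustrative sketch of symmetric per-channel int4 quantization (all names and the 4-bit choice are assumptions, not Google's method):

```python
# Illustrative per-channel symmetric quantization of a KV-cache channel.
# fp16 (2 bytes/value) -> packed int4 (~0.5 bytes/value) plus one scale
# per channel is roughly where large memory cuts come from.

def quantize_channel(values, bits=4):
    """Quantize one channel; returns (int codes, fp scale)."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for int4
    scale = max(abs(v) for v in values) / qmax or 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantize_channel(codes, scale):
    return [c * scale for c in codes]

# Toy "cache": one channel of key activations.
keys = [0.12, -0.5, 0.33, 0.9, -0.77]
codes, scale = quantize_channel(keys)
recon = dequantize_channel(codes, scale)
max_err = max(abs(a - b) for a, b in zip(keys, recon))  # bounded by scale / 2
```

Per-channel scales matter because a single outlier value would otherwise blow up the quantization step for the whole tensor, which is exactly the failure mode low-bit KV schemes have to manage.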
Agentic coding grows up: open‑weights MiniMax M2.7 meets Grok’s tool‑calling workflows
Open-weights MiniMax M2.7 and xAI’s tool-calling Grok push agentic coding from demos to production workflows. NVIDIA detailed the open-weights releas...
Anthropic launches Project Glasswing, giving controlled access to Claude Mythos for vulnerability discovery
Anthropic formed Project Glasswing and is withholding its Claude Mythos Preview model for controlled, defensive use after it found thousands of high‑s...
Anthropic previews Claude Mythos and launches Project Glasswing to weaponize defense against zero‑days
Anthropic previewed Claude Mythos and launched Project Glasswing, claiming the model can autonomously find high‑severity bugs across major OSes and br...
Anthropic’s Mythos and Project Glasswing push AI into real-world vuln discovery, with tight access and strong benchmark signals
Anthropic launched Project Glasswing and a Mythos Preview model that finds serious software bugs, pairing industry partners with restricted access and...
Nvidia buys SchedMD (Slurm), putting the de facto AI/HPC scheduler under one GPU vendor’s roof
Nvidia’s acquisition of SchedMD hands Slurm’s roadmap to a single GPU vendor, triggering concerns about neutrality for mixed-hardware clusters. Per [...
Stop starving your GPUs: make agent rollout a service
Separating I/O-heavy agent rollouts from GPU training nearly doubled coding-agent performance and fixed chronic GPU underutilization. An NVIDIA audit...
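The core idea — let slow, I/O-bound rollouts feed a buffer that the GPU trainer drains at its own pace — can be sketched with a queue between the two. Everything below is an illustrative stand-in, not the article's actual architecture:

```python
# Sketch of "rollout as a service": I/O-bound agent rollouts run in their
# own workers and feed a queue, so the GPU-bound trainer never sits idle
# waiting on tool calls or environment I/O.
import queue
import threading
import time

trajectories = queue.Queue(maxsize=64)   # buffer between rollout and training

def rollout_worker(worker_id, n_episodes):
    for ep in range(n_episodes):
        time.sleep(0.001)                # stand-in for slow tool/env I/O
        trajectories.put({"worker": worker_id, "episode": ep, "reward": 1.0})

def trainer(n_batches, batch_size=4):
    seen = 0
    while seen < n_batches * batch_size:
        batch = [trajectories.get() for _ in range(batch_size)]
        seen += len(batch)               # stand-in for a GPU update step
    return seen

workers = [threading.Thread(target=rollout_worker, args=(i, 8)) for i in range(2)]
for w in workers:
    w.start()
consumed = trainer(n_batches=4)
for w in workers:
    w.join()
```

In production the queue would be a network service so rollout capacity can scale independently of the training job, which is the decoupling the write-up credits for the utilization gains.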
Google’s TurboQuant promises 6x KV cache memory cuts and 8x attention speedups; mind the quantization outliers
Google proposed TurboQuant to compress KV caches and speed vector search, reporting big H100 wins with no accuracy drop. Per Google’s claims, TurboQu...
Coding agents in production: architecture choices, reliability budgets, and hitting the brakes
A wave of practitioner write-ups agrees: shipping coding agents is about reliability budgets and the right architecture, not flashy demos. At the AAA...
Build vs. Buy for AI Agents: Ship your own stack, fix prompts, and save the consulting bill
The strongest signal this week: most of your agent deployment work is classic engineering, not consultant magic. A deep teardown argues the five hard...
Google donates llm-d LLM inference gateway to CNCF Sandbox
Google open-sourced llm-d, a Kubernetes-native LLM inference gateway, into the CNCF Sandbox with backing from IBM, Red Hat, NVIDIA, and Anyscale. llm...
Agents are diverging; your backend needs an AI orchestrator, not a single model bet
AI agent strategies are splitting across clouds, local runtimes, and model choices, pushing teams to build orchestration and token-aware backends now....
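One concrete piece of a token-aware backend is routing each request to the cheapest model whose context window fits it. A hedged sketch — the model names, prices, and limits below are invented for illustration, not from the article:

```python
# Token-aware routing sketch: pick the cheapest model that can hold the
# request, instead of betting the whole backend on a single model.

MODELS = [  # (name, context_window_tokens, usd_per_1k_tokens), cheapest first
    ("small-local", 8_192, 0.0),
    ("mid-hosted", 32_768, 0.4),
    ("frontier", 200_000, 3.0),
]

def route(prompt_tokens, expected_output_tokens):
    """Return (model name, estimated cost in USD) for one request."""
    need = prompt_tokens + expected_output_tokens
    for name, window, price in MODELS:
        if need <= window:
            return name, price * need / 1000
    raise ValueError("request exceeds every model's context window")

name, cost = route(prompt_tokens=6_000, expected_output_tokens=1_000)
```

A real orchestrator would also weigh latency, quality tiers, and provider quotas, but the shape is the same: a policy function in front of heterogeneous model endpoints.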
The desktop agent land grab: OpenClaw, NemoClaw, and the new control plane
Desktop AI agents are the new battleground, with Nvidia pushing OpenClaw and rivals racing to own the orchestration layer. At GTC, Nvidia framed Open...
AI workloads are blowing up cloud bills—time to add GPU guardrails and trial local inference
HashiCorp’s latest data says AI reversed five years of cloud waste declines, and the GPU arms race is making the problem worse. A summary of HashiCor...
Efficiency wave: GPT-5.4 mini lands in ChatGPT, and NVIDIA/Hugging Face ship a real-world SD benchmark
OpenAI is pushing smaller, faster LLMs in ChatGPT while NVIDIA and Hugging Face release a benchmark to measure real speedups from speculative decoding...
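Speculative-decoding speedups come down to one number: tokens produced per target-model forward pass. A toy simulation of that metric (this models the idea only; it is not the NVIDIA/Hugging Face harness, and real verification is probabilistic rather than exact-match):

```python
# Toy model of speculative decoding throughput: the draft proposes k tokens,
# the target verifies the matching prefix and contributes one token itself,
# so each target pass emits (accepted + 1) tokens.

def speculative_passes(target, draft, k=4):
    """Count target-model passes needed to produce `target` given `draft`."""
    passes, i = 0, 0
    while i < len(target):
        passes += 1
        accepted = 0
        while (accepted < k and i + accepted < len(target)
               and i + accepted < len(draft)
               and draft[i + accepted] == target[i + accepted]):
            accepted += 1
        i += accepted + 1   # verified prefix plus one token from the target
    return passes

target = list("the quick brown fox")
draft = list("the quick brawn fox")   # draft is wrong at one position
passes = speculative_passes(target, draft)
speedup = len(target) / passes        # tokens per target forward pass
```

The benchmark's point is that this ratio depends heavily on how often the draft agrees with the target on real workloads, which is why measured speedups diverge from headline numbers.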
Open-weight coding agents hit 60%+ SWE-Bench and get easier to run on-prem
Open-weight coding agents leaped forward as NVIDIA’s Nemotron 3 Super tops SWE-Bench and new research streamlines on‑prem and local runs. NVIDIA unve...
On-device AI steps up: 4B Nemotron, cuTile.jl for Julia, and a faster computer-use agent
NVIDIA and partners just pushed on-device AI forward with a 4B hybrid model, Julia GPU tiles, and a faster computer-use agent. NVIDIA introduced the ...
Enterprise agents grow up: new guardrails for identity, policy, and attack resilience
Agentic AI is getting real guardrails as vendors ship identity, policy, and safety layers to contain tool-using agents. Security research shows auton...
Decouple RL environments from training: NeMo Gym + Unsloth approach, backed by new failure-mode evidence
A new deep dive argues RL teams should separate environment services from the training loop, and fresh research shows why sloppy environments create b...
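The separation boils down to the training loop talking to environments only through a narrow reset/step protocol, so either side can change or scale without breaking the other. The interface and toy environment below are illustrative stand-ins, not NeMo Gym's actual API; in production the same methods would sit behind HTTP or gRPC:

```python
# Sketch of decoupling an RL environment behind a narrow service protocol.

class EnvService:
    """The only surface the trainer is allowed to touch."""
    def reset(self) -> dict: ...
    def step(self, action) -> dict: ...

class CounterEnv(EnvService):
    """Toy env: reach a target count; reward 1.0 on success."""
    def __init__(self, target=3):
        self.target, self.count = target, 0
    def reset(self):
        self.count = 0
        return {"obs": self.count, "done": False}
    def step(self, action):
        self.count += 1 if action == "inc" else 0
        done = self.count >= self.target
        return {"obs": self.count, "reward": float(done), "done": done}

def collect_episode(env: EnvService, policy, max_steps=10):
    """Training-side rollout loop: knows nothing beyond reset/step."""
    state, total = env.reset(), 0.0
    for _ in range(max_steps):
        out = env.step(policy(state))
        total += out["reward"]
        state = out
        if out["done"]:
            break
    return total

reward = collect_episode(CounterEnv(), policy=lambda s: "inc")
```

A crisp protocol boundary like this is also what makes the failure modes the research describes — silently broken reward logic, mismatched observation formats — testable in isolation from the trainer.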
Agentic retrieval steps up: NVIDIA NeMo tops ViDoRe; hybrid search becomes the RAG default
NVIDIA unveiled a generalizable agentic retrieval pipeline that topped ViDoRe v3 and ranked #2 on BRIGHT, pushing hybrid, agentic RAG beyond pure embe...
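Hybrid search typically fuses a lexical ranking (e.g. BM25) with a dense embedding ranking; reciprocal rank fusion (RRF) is the common way to combine them. A minimal sketch with toy inputs — NVIDIA's NeMo pipeline details are not in this digest:

```python
# Reciprocal rank fusion: each ranked list contributes 1/(k + rank) per
# document, so documents ranked well by both retrievers float to the top.

def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids into one ordering (best first)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d3", "d1", "d7"]   # e.g. BM25 order
dense = ["d1", "d5", "d3"]     # e.g. embedding-similarity order
fused = rrf([lexical, dense])  # d1 wins: strong in both lists
```

The constant k damps the influence of any single list's top ranks; 60 is a conventional default. Agentic layers then sit on top of a fused ranking like this, deciding when to re-query or re-rank.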
NVIDIA’s Nemotron 3 Super targets long-context, cost-heavy agent workloads with a hybrid 120B model and open weights
NVIDIA released Nemotron 3 Super, a 120B-parameter, 12B-active hybrid model with open weights aimed at long-context, cost-efficient autonomous agents....
Local-first AI agents just got real on Linux and the edge
Vendors and open-source projects just made local AI agents practical across Linux laptops, workstations, and new edge boards. AMD’s XDNA drivers now ...
Encoders Are Back: ModernBERT and a push to ditch LLMs for NER and retrieval
Encoders are back in the spotlight for search, NER, and reranking, with ModernBERT and fresh guidance arguing against LLMs for extraction workloads. ...