HUGGING-FACE
30 days · UTC
GLM-5.1 lands: MIT-licensed 754B open weights show surprising multi-step code reasoning
Zhipu AI’s GLM-5.1 is a 754B-parameter, MIT-licensed open-weights LLM that shows strong multi-step code reasoning and self-correction. As [Simon Will...
Code agents grow up: CI-scale benchmarking, structured patch checks, and cheaper eval runs
Code agent evaluation is shifting to long-run maintainability, execution-free patch checks, and leaner, cheaper benchmark runs. A new benchmark, [SWE...
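An execution-free patch check in this spirit validates a candidate patch without running the test suite, for example by confirming the patched file still parses. A minimal illustrative sketch (not taken from the benchmark itself; the function name is hypothetical):

```python
import ast


def patch_is_syntactically_valid(patched_source: str) -> bool:
    """Execution-free sanity check: does the patched Python file still parse?

    This catches malformed agent patches (truncated edits, broken indentation)
    without ever executing the code or its tests.
    """
    try:
        ast.parse(patched_source)
        return True
    except SyntaxError:
        return False
```

Checks like this are cheap enough to run on every candidate patch in CI, reserving full test execution for patches that pass the structural gate.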
Antigravity v9.2.0 adds local HF evals, HF Jobs vision training, Transformers.js, and better jq/tmux workflows
Antigravity-awesome-skills v9.2.0 ships a big Hugging Face-focused update plus serious shell workflow upgrades. The [release notes](https://github.co...
EVA ships: a realistic benchmark for voice agents, plus SIP pitfalls and long‑doc workflow tradeoffs
ServiceNow-AI released EVA, a realistic end-to-end benchmark for voice agents, while SIP errors and long‑doc model tradeoffs surfaced in field reports...
Local multimodal RAG + tiny fine-tunes: a viable private AI stack
You can now build private, multimodal RAG and fine-tune tiny models that run offline on laptops and phones. A practical guide shows how to build a lo...
Efficiency wave: GPT-5.4 mini lands in ChatGPT, and NVIDIA/Hugging Face ship a real-world SD benchmark
OpenAI is pushing smaller, faster LLMs in ChatGPT while NVIDIA and Hugging Face release a benchmark to measure real speedups from speculative decoding...
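One reason real-world speculative-decoding benchmarks matter: the theoretical speedup depends heavily on the draft model's acceptance rate. Under the standard i.i.d.-acceptance analysis (Leviathan et al.), a draft of length γ with per-token acceptance rate α yields an expected number of generated tokens per target-model pass of (1 − α^(γ+1)) / (1 − α). A small sketch of that calculation (illustrative, not the NVIDIA/Hugging Face benchmark code):

```python
def expected_tokens_per_target_pass(alpha: float, gamma: int) -> float:
    """Expected tokens generated per target-model forward pass with
    speculative decoding, assuming i.i.d. token acceptance rate `alpha`
    and draft length `gamma` (Leviathan et al. analysis)."""
    if alpha >= 1.0:
        return float(gamma + 1)  # every draft token accepted
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


# With an 80% acceptance rate and 4 drafted tokens, each target pass
# yields ~3.36 tokens instead of 1 — the upper bound a real benchmark
# then discounts by draft-model and verification overheads.
```

Measured end-to-end speedups are lower than this bound, which is exactly the gap a real-workload benchmark is meant to expose.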
AI sped up coding; quality and CI are now the bottleneck
New data shows AI coding boosts throughput, but quality and maintainability lag—so teams must harden CI and measure agent impact over time. Jellyfish...
Nvidia’s “OpenClaw” push blurs robotics, GPU security, and edge AI—teams need an attestation plan
Nvidia is expanding OpenClaw across robotics and GPU security while vendors preinstall it on edge boxes, forcing teams to tighten attestation and hard...
Agentic retrieval steps up: NVIDIA NeMo tops ViDoRe; hybrid search becomes the RAG default
NVIDIA unveiled a generalizable agentic retrieval pipeline that topped ViDoRe v3 and ranked #2 on BRIGHT, pushing hybrid, agentic RAG beyond pure embe...
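"Hybrid search" in RAG usually means fusing a lexical ranking (e.g., BM25) with a dense-embedding ranking. One common, model-free way to combine them is reciprocal rank fusion (RRF); a minimal sketch, assuming each retriever returns an ordered list of document IDs:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs via RRF.

    Each document scores sum(1 / (k + rank_i)) over the rankings that
    contain it; k=60 is the conventional default from the RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization across retrievers, which is why it is a popular default when mixing BM25 with embedding similarity; agentic pipelines like the one described here typically layer query rewriting and reranking on top of such a fusion step.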
NVIDIA’s Nemotron 3 Super targets long-context, cost-heavy agent workloads with a hybrid 120B model and open weights
NVIDIA released Nemotron 3 Super, a 120B-parameter, 12B-active hybrid model with open weights aimed at long-context, cost-efficient autonomous agents....
NVIDIA’s AI-Q tops DeepResearch benchmarks, hinting at a full-stack agent push with Nemotron 3 Super
NVIDIA’s AI-Q open agent stack hit #1 on DeepResearch Bench I and II and points to a broader open, enterprise agent strategy. NVIDIA details how its ...
NVIDIA posts 2PB of open datasets on Hugging Face, with recipes to speed up model building
NVIDIA is scaling open AI data by publishing 2 petabytes of permissively licensed datasets and training recipes to cut time-to-first-model. NVIDIA ou...
Coding Benchmarks Shake-up: Qwen 3.5, MiniMax M2.5, and a SWE-bench Reality Check
Open models like Alibaba’s Qwen 3.5 and MiniMax M2.5 post strong coding-agent results, but OpenAI’s audit of SWE-bench Verified shows contamination an...
Practical LLM efficiency: Magma optimizer, Unsloth on HF Jobs, and NVLink realities
A new wave of efficiency wins—masked optimizers, free small‑model fine‑tuning, and faster GPU interconnects—can cut LLM costs without sacrificing qual...
Transformer internals: useful background, limited day-to-day impact
An HN discussion around Jay Alammar’s Illustrated Transformer notes that understanding transformer mechanics is intellectually valuable but rarely req...
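The core mechanic the Illustrated Transformer walks through is scaled dot-product attention, softmax(QKᵀ/√d)·V, which fits in a few lines of NumPy. A self-contained sketch for readers who want the intuition without a framework:

```python
import numpy as np


def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v).
    Returns one weighted mixture of the value rows per query.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

When all keys are identical, the weights are uniform and each query just gets the mean of the values, which is a handy sanity check on any attention implementation.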