pymupdf4llm-c
Repo
pymupdf4llm-c is an open-source, C-based PDF extraction library with Python bindings that outputs rich, structured JSON data, including geometry and typography. It is designed for high-volume RAG and LLM pipelines and claims processing speeds of roughly 300 pages per second on CPU.
1 story
First: 2026-01-06
Last: 2026-01-06
Stories
Completed digest stories linked to this service.
- Structured PDF extractor for RAG claims ~300 pages/s on CPU (2026-01-06): A new C-based PDF extractor with Python bindings outputs structured JSON (geometry, typography, headings) and ...
Resources
Links to check for updates: homepage, feed, or git repo.