opendatalab / MinerU

Transforms complex documents such as PDFs and Office files into LLM-ready Markdown/JSON for agentic workflows.

📊 Project Info

Language: Python
Stars: 69,540
Forks: 5,883
Today: +644
Ranking: #13
Collection: Overall
Trending Date: June 25, 2026
Last Push: June 25, 2026

🏷️ Topics

ai4science, document-analysis, docx, extract-data, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, pdf-parser, pptx, python, xlsx
