opendatalab / MinerU

Transforms complex documents such as PDFs and Office files into LLM-ready Markdown/JSON for agentic workflows.

📊 Project Info

Language: Python
Stars: 70,403
Forks: 5,939
Stars today: +960
Ranking: #7
Collection: Overall
Trending date: June 26, 2026
Last push: June 26, 2026

🏷️ Topics

ai4science, document-analysis, docx, extract-data, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, pdf-parser, pptx, python, xlsx
