opendatalab / MinerU

#9 · 71,556 stars · 6,012 forks · +380 today · Python

Transforms complex documents such as PDFs and Office files into LLM-ready Markdown/JSON for agentic workflows.

📊 Project Info

Language: Python
Stars: 71,556
Forks: 6,012
Today: +380
Ranking: #9
Collection: Overall
Trending Date: June 28, 2026
Last Push: 6/27/2026

🏷️ Topics

ai4science, document-analysis, docx, extract-data, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, pdf-parser, pptx, python, xlsx
