opendatalab / MinerU


Transforms complex documents such as PDFs and Office files into LLM-ready Markdown/JSON for agentic workflows.

📊 Project Info

Language: Python
Stars: 65,615
Forks: 5,534
Today: +150
Ranking: #13
Collection: Language
Trending Date: May 29, 2026
Last Push: May 28, 2026

🏷️ Topics

ai4science, document-analysis, docx, extract-data, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, pdf-parser, pptx, python, xlsx
