<p align="center">
  <img loading="lazy" alt="Docling" src="assets/docling_processing.png" width="100%" />
  <a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55" /></a>
</p>
[](https://arxiv.org/abs/2408.09869)
[](https://pypi.org/project/docling/)
[](https://python-poetry.org/)
[](https://github.com/psf/black)
[](https://pycqa.github.io/isort/)
[](https://pydantic.dev)
[](https://github.com/pre-commit/pre-commit)
[](https://opensource.org/licenses/MIT)
[](https://pepy.tech/projects/docling)
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
## Features
* 🗂️ Parsing of [multiple document formats][supported_formats], including PDF, DOCX, XLSX, HTML, images, and more
* 📑 Advanced PDF understanding, including page layout, reading order, table structure, code, formulas, image classification, and more
* 🧬 Unified, expressive [DoclingDocument][docling_document] representation format
* ↪️ Various [export formats][supported_formats] and options, including Markdown, HTML, and lossless JSON
* 🔒 Local execution capabilities for sensitive data and air-gapped environments
* 🤖 Plug-and-play [integrations][integrations], including LangChain, LlamaIndex, Crew AI & Haystack, for agentic AI
* 🔍 Extensive OCR support for scanned PDFs and images
* 💻 Simple and convenient CLI
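
As a rough illustration of the parsing and export features listed above, here is a minimal sketch. It assumes the `docling` package is installed and relies on its `DocumentConverter` class and on the `export_to_markdown` and `export_to_dict` methods of the resulting document. The `render` and `convert` helpers are hypothetical names introduced only for this example and are not part of Docling.

```python
import json


def render(document, fmt: str = "markdown") -> str:
    """Hypothetical helper: serialize a converted document to the requested format."""
    if fmt == "markdown":
        return document.export_to_markdown()
    if fmt == "json":
        # export_to_dict() returns a plain dict; dump it to a JSON string.
        return json.dumps(document.export_to_dict())
    raise ValueError(f"unsupported format: {fmt}")


def convert(source: str, fmt: str = "markdown") -> str:
    """Hypothetical helper: convert a local path or URL and return the serialized text."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return render(result.document, fmt)
```

Calling `convert("path/to/document.pdf")` (a placeholder path) would then return the document as Markdown, and passing `fmt="json"` would return a JSON string instead. The first conversion may take a while if the required models still have to be downloaded.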
### Coming soon
* 📝 Metadata extraction, including title, authors, references & language
* 📝 Inclusion of Visual Language Models ([SmolDocling](https://huggingface.co/blog/smolervlm#smoldocling))
* 📝 Chart understanding (bar charts, pie charts, line plots, etc.)
* 📝 Complex chemistry understanding (molecular structures)
## Get started
<div class="grid">
  <a href="concepts/" class="card"><b>Concepts</b><br />Learn Docling fundamentals</a>
  <a href="examples/" class="card"><b>Examples</b><br />Try out recipes for various use cases, including conversion, RAG, and more</a>
  <a href="integrations/" class="card"><b>Integrations</b><br />Check out integrations with popular frameworks and tools</a>
  <a href="reference/document_converter/" class="card"><b>Reference</b><br />See more API details</a>
</div>
## IBM ❤️ Open Source AI
Docling has been brought to you by IBM.
[supported_formats]: ./supported_formats.md
[docling_document]: ./concepts/docling_document.md
[integrations]: ./integrations/index.md