# Docling

[![arXiv](https://img.shields.io/badge/arXiv-2408.09869-b31b1b.svg)](https://arxiv.org/abs/2408.09869)
[![PyPI version](https://img.shields.io/pypi/v/docling)](https://pypi.org/project/docling/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/docling)](https://pypi.org/project/docling/)
[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![License MIT](https://img.shields.io/github/license/docling-project/docling)](https://opensource.org/licenses/MIT)
[![PyPI Downloads](https://static.pepy.tech/badge/docling/month)](https://pepy.tech/projects/docling)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/10101/badge)](https://www.bestpractices.dev/projects/10101)
[![LF AI & Data](https://img.shields.io/badge/LF%20AI%20%26%20Data-003778?logo=linuxfoundation&logoColor=fff&color=0094ff&labelColor=003778)](https://lfaidata.foundation/projects/)

Docling simplifies document processing: it parses diverse formats, with advanced PDF understanding, and provides seamless integrations with the gen AI ecosystem.

## Features

* 🗂️ Parsing of [multiple document formats][supported_formats], including PDF, DOCX, XLSX, HTML, images, and more
* 📑 Advanced PDF understanding, including page layout, reading order, table structure, code, formulas, image classification, and more
* 🧬 Unified, expressive [DoclingDocument][docling_document] representation format
* ↪️ Various [export formats][supported_formats] and options, including Markdown, HTML, and lossless JSON
* 🔒 Local execution capabilities for sensitive data and air-gapped environments
* 🤖 Plug-and-play [integrations][integrations], including LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
* 🔍 Extensive OCR support for scanned PDFs and images
* 🥚 Support for Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview)) 🆕🔥
* 💻 Simple and convenient CLI

### Coming soon

* 📝 Metadata extraction, including title, authors, references & language
* 📝 Chart understanding (Barchart, Piechart, LinePlot, etc.)
* 📝 Complex chemistry understanding (Molecular structures)

## Get started

* **Concepts**: Learn Docling fundamentals
* **Examples**: Try out recipes for various use cases, including conversion, RAG, and more
* **Integrations**: Check out integrations with popular frameworks and tools
* **Reference**: See more API details
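
For orientation, the conversion and Markdown export features listed above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive recipe: it assumes Docling is installed (for example with `pip install docling`) and that the installed version exposes `DocumentConverter`, `convert`, and `export_to_markdown` as in Docling's published documentation. The helper name `convert_to_markdown` is hypothetical and exists only for this illustration.

```python
def convert_to_markdown(source: str) -> str:
    """Convert a document (local path or URL) and return it as Markdown.

    Hypothetical helper for illustration; only the Docling calls inside it
    come from the library itself.
    """
    # Imported inside the function so that merely defining the helper
    # does not require Docling to be importable.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    # convert() returns a result object; its .document attribute holds the
    # unified DoclingDocument representation.
    result = converter.convert(source)
    return result.document.export_to_markdown()


# Example call (assumption: the PDF link of the arXiv report cited in the
# badges above; the first run may download model weights):
# print(convert_to_markdown("https://arxiv.org/pdf/2408.09869"))
```

Wrapping the calls in a small function keeps the example easy to reuse; the same result object should also support the other export formats mentioned in the feature list, so check the Reference section for the exact method names in your version.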

## LF AI & Data

Docling is hosted as a project in the [LF AI & Data Foundation](https://lfaidata.foundation/projects/).

### IBM ❤️ Open Source AI

The project was started by the AI for knowledge team at IBM Research Zurich.

[supported_formats]: ./usage/supported_formats.md
[docling_document]: ./concepts/docling_document.md
[integrations]: ./integrations/index.md