Docling allows enriching the conversion pipeline with additional steps which process specific document components,
e.g. code blocks, pictures, etc. The extra steps usually require extra model executions which may increase
the processing time considerably. For this reason most enrichment models are disabled by default.

The following table provides an overview of the default enrichment models available in Docling.

| Feature | Parameter | Processed item | Description |
| ------- | --------- | -------------- | ----------- |
| Code understanding | `do_code_enrichment` | `CodeItem` | See [docs below](#code-understanding). |
| Formula understanding | `do_formula_enrichment` | `TextItem` with label `FORMULA` | See [docs below](#formula-understanding). |
| Picture classification | `do_picture_classification` | `PictureItem` | See [docs below](#picture-classification). |
| Picture description | `do_picture_description` | `PictureItem` | See [docs below](#picture-description). |
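
The enrichment options are independent and can be combined on the same pipeline. As a sketch, enabling all four on the PDF pipeline (using the parameters from the table above):

```py
from docling.datamodel.pipeline_options import PdfPipelineOptions

pipeline_options = PdfPipelineOptions()
pipeline_options.do_code_enrichment = True
pipeline_options.do_formula_enrichment = True
pipeline_options.do_picture_classification = True
pipeline_options.do_picture_description = True
```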

## Enrichments details

### Code understanding

The code understanding step applies advanced parsing to the code blocks found in the document.
This enrichment model also sets the `code_language` property of the `CodeItem`.

Model specs: see the [`CodeFormula` model card](https://huggingface.co/ds4sd/CodeFormula).

Example command line:

```sh
docling --enrich-code FILE
```

Example code:

```py
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_code_enrichment = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("https://arxiv.org/pdf/2501.17887")
doc = result.document
```
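
The enriched items can then be inspected on the resulting document. A minimal sketch, continuing from the example above (it assumes `doc.iterate_items()` and the `CodeItem` type from `docling_core`):

```py
from docling_core.types.doc import CodeItem

# Continuing from the conversion above: print each detected code block
# together with the predicted `code_language` set by the enrichment model
for item, _level in doc.iterate_items():
    if isinstance(item, CodeItem):
        print(item.code_language, "->", item.text[:60])
```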

### Formula understanding

The formula understanding step analyzes the equation formulas in the document and extracts their LaTeX representation.
The HTML export functions of the DoclingDocument will leverage these results and visualize the formulas using MathML syntax.

Model specs: see the [`CodeFormula` model card](https://huggingface.co/ds4sd/CodeFormula).

Example command line:

```sh
docling --enrich-formula FILE
```

Example code:

```py
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_formula_enrichment = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("https://arxiv.org/pdf/2501.17887")
doc = result.document
```
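
To see the MathML visualization, the converted document can be exported to HTML. A short fragment continuing the example above (it assumes the `save_as_html` method of the DoclingDocument):

```py
from pathlib import Path

# Continuing from the conversion above: the exported HTML renders the
# enriched formulas using MathML
doc.save_as_html(Path("output.html"))
```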

### Picture classification

The picture classification step classifies the `PictureItem` elements in the document with the `DocumentFigureClassifier` model.
This model is specialized in recognizing the classes of pictures found in documents, e.g. different chart types, flow diagrams,
logos, signatures, etc.

Model specs: see the [`DocumentFigureClassifier` model card](https://huggingface.co/ds4sd/DocumentFigureClassifier).

Example command line:

```sh
docling --enrich-picture-classes FILE
```

Example code:

```py
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.generate_picture_images = True
pipeline_options.images_scale = 2
pipeline_options.do_picture_classification = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("https://arxiv.org/pdf/2501.17887")
doc = result.document
```
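
The predicted classes are stored as annotations on each `PictureItem`. A sketch for reading them back, continuing the example above (it assumes the `PictureClassificationData` annotation type from `docling_core`):

```py
from docling_core.types.doc.document import PictureClassificationData

# Continuing from the conversion above: print the predicted classes
# attached to each picture
for picture in doc.pictures:
    for annotation in picture.annotations:
        if isinstance(annotation, PictureClassificationData):
            print(picture.self_ref, annotation.predicted_classes)
```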

### Picture description

The picture description step annotates pictures with a vision model. This is also known as a "captioning" task.
The Docling pipeline can load and run models completely locally, as well as connect to remote APIs which support the chat template.
Below follow a few examples of how to use some common vision models and remote services.

```py
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("https://arxiv.org/pdf/2501.17887")
doc = result.document
```
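
The generated captions are stored as annotations on each `PictureItem`. A sketch for reading them back, continuing the example above (it assumes the `PictureDescriptionData` annotation type from `docling_core`):

```py
from docling_core.types.doc.document import PictureDescriptionData

# Continuing from the conversion above: print the generated caption
# for each annotated picture
for picture in doc.pictures:
    for annotation in picture.annotations:
        if isinstance(annotation, PictureDescriptionData):
            print(f"{picture.self_ref}: {annotation.text}")
```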

#### Granite Vision model

Model specs: see the [`ibm-granite/granite-vision-3.1-2b-preview` model card](https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview).

Usage in Docling:

```py
from docling.datamodel.pipeline_options import granite_picture_description

pipeline_options.picture_description_options = granite_picture_description
```

#### SmolVLM model

Model specs: see the [`HuggingFaceTB/SmolVLM-256M-Instruct` model card](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct).

Usage in Docling:

```py
from docling.datamodel.pipeline_options import smolvlm_picture_description

pipeline_options.picture_description_options = smolvlm_picture_description
```

#### Other vision models

The option class `PictureDescriptionVlmOptions` allows using any other model from the Hugging Face Hub.

```py
from docling.datamodel.pipeline_options import PictureDescriptionVlmOptions

pipeline_options.picture_description_options = PictureDescriptionVlmOptions(
    repo_id="",  # <-- add here the Hugging Face repo_id of your favorite VLM
    prompt="Describe the image in three sentences. Be concise and accurate.",
)
```

#### Remote vision model

The option class `PictureDescriptionApiOptions` allows using models hosted on remote platforms, e.g.
on local endpoints served by [VLLM](https://docs.vllm.ai), [Ollama](https://ollama.com/) and others,
or cloud providers like [IBM watsonx.ai](https://www.ibm.com/products/watsonx-ai), etc.

_Note: in most cases this option will send your data to the remote service provider._

Usage in Docling:

```py
from docling.datamodel.pipeline_options import PictureDescriptionApiOptions

# Enable connections to remote services
pipeline_options.enable_remote_services = True  # <-- this is required!

# Example using a model running locally, e.g. via VLLM
# $ vllm serve MODEL_NAME
pipeline_options.picture_description_options = PictureDescriptionApiOptions(
    url="http://localhost:8000/v1/chat/completions",
    params=dict(
        model="MODEL NAME",
        seed=42,
        max_completion_tokens=200,
    ),
    prompt="Describe the image in three sentences. Be concise and accurate.",
    timeout=90,
)
```

End-to-end code snippets for cloud providers are available in the examples section:

- [IBM watsonx.ai](../examples/pictures_description_api.py)

## Develop new enrichment models

Besides looking at the implementation of all the models listed above, the Docling documentation has a few examples
dedicated to the implementation of enrichment models.

- [Develop picture enrichment](../examples/develop_picture_enrichment.py)
- [Develop formula enrichment](../examples/develop_formula_understanding.py)