2025-09-16 10:00:38 -04:00
|
|
|
# %% [markdown]
|
|
|
|
|
# Minimal VLM pipeline example: convert a PDF using a vision-language model.
|
|
|
|
|
#
|
|
|
|
|
# What this example does
|
|
|
|
|
# - Runs the VLM-powered pipeline on a PDF (by URL) and prints Markdown output.
|
|
|
|
|
# - Shows two setups: default (Transformers/SmolDocling) and macOS MPS/MLX.
|
|
|
|
|
#
|
|
|
|
|
# Prerequisites
|
|
|
|
|
# - Install Docling with VLM extras and the appropriate backend (Transformers or MLX).
|
|
|
|
|
# - Ensure your environment can download model weights (e.g., from Hugging Face).
|
|
|
|
|
#
|
|
|
|
|
# How to run
|
|
|
|
|
# - From the repository root, run: `python docs/examples/minimal_vlm_pipeline.py`.
|
|
|
|
|
# - The script prints the converted Markdown to stdout.
|
|
|
|
|
#
|
|
|
|
|
# Notes
|
|
|
|
|
# - `source` may be a local path or a URL to a PDF.
|
|
|
|
|
# - The second section demonstrates macOS MPS acceleration via MLX (`vlm_model_specs.SMOLDOCLING_MLX`).
|
|
|
|
|
# - For more configurations and model comparisons, see `docs/examples/compare_vlm_models.py`.
|
|
|
|
|
|
|
|
|
|
# %%
|
|
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
from docling.datamodel import vlm_model_specs
|
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
|
|
|
from docling.datamodel.base_models import InputFormat
|
|
|
|
|
from docling.datamodel.pipeline_options import (
|
|
|
|
|
VlmPipelineOptions,
|
|
|
|
|
)
|
|
|
|
|
from docling.document_converter import DocumentConverter, PdfFormatOption
|
|
|
|
|
from docling.pipeline.vlm_pipeline import VlmPipeline
|
|
|
|
|
|
2025-09-16 10:00:38 -04:00
|
|
|
# Convert a public arXiv PDF; replace with a local path if preferred.
|
2025-06-02 17:01:06 +02:00
|
|
|
source = "https://arxiv.org/pdf/2501.17887"
|
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
###### USING SIMPLE DEFAULT VALUES
|
|
|
|
|
# - SmolDocling model
|
|
|
|
|
# - Using the transformers framework
|
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
|
|
|
|
|
|
|
|
converter = DocumentConverter(
|
|
|
|
|
format_options={
|
|
|
|
|
InputFormat.PDF: PdfFormatOption(
|
|
|
|
|
pipeline_cls=VlmPipeline,
|
|
|
|
|
),
|
|
|
|
|
}
|
|
|
|
|
)
|
|
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
doc = converter.convert(source=source).document
|
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
print(doc.export_to_markdown())
|
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
|
|
|
|
|
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
###### USING MACOS MPS ACCELERATOR
|
2025-09-16 10:00:38 -04:00
|
|
|
# Demonstrates using MLX on macOS with MPS acceleration (macOS only).
|
|
|
|
|
# For more options see the `compare_vlm_models.py` example.
|
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
pipeline_options = VlmPipelineOptions(
|
|
|
|
|
vlm_options=vlm_model_specs.SMOLDOCLING_MLX,
|
|
|
|
|
)
|
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
converter = DocumentConverter(
|
|
|
|
|
format_options={
|
|
|
|
|
InputFormat.PDF: PdfFormatOption(
|
|
|
|
|
pipeline_cls=VlmPipeline,
|
|
|
|
|
pipeline_options=pipeline_options,
|
|
|
|
|
),
|
|
|
|
|
}
|
|
|
|
|
)
|
2025-03-19 15:38:54 +01:00
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
doc = converter.convert(source=source).document
|
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
|
|
|
|
2025-06-02 17:01:06 +02:00
|
|
|
print(doc.export_to_markdown())
|