97 Commits

Author SHA1 Message Date
Michele Dolfi
97aa06bfbc
docs: Add details and examples on optimal GPU setup (#2531)
* docs for GPU optimizations

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* improve time reporting and improve execution

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix standard pipeline

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* tune examples with batch size 64

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add benchmark results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* improve docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* typo in excluded tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* explicit pipeline in table

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-30 13:22:05 +01:00
Cesar Berrospi Ramis
9a6fdf936b
docs: update opensearch notebook and backend documentation (#2519)
* docs(opensearch): update the example notebook RAG with OpenSearch

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs(uspto): remove direct usage of the backend class for conversion

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: remove direct usage of backends from documentation

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-10-27 10:02:50 +01:00
Ken Steele
657ce8b01c
feat(ASR): MLX Whisper Support for Apple Silicon (#2366)
* add mlx-whisper support

* added mlx-whisper example and test. update docling cli to use MLX automatically if present.

* fix pre-commit checks and added proper type safety

* fixed linter issue

* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>

I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: a979a680e1dc2fee8461401335cfb5dda8cfdd98
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 9827068382ca946fe1387ed83f747ae509fcf229
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: ebbeb45c7dc266260e1fad6bdb54a7041f8aeed4
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 2f6fd3cf46c8ca0bb98810191578278f1df87aa3

Signed-off-by: Ken Steele <ksteele@gmail.com>

* fix unit tests and code coverage for CI

* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>

I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 5e61bf11139a2133978db2c8d306be6289aed732

Signed-off-by: Ken Steele <ksteele@gmail.com>

* fix CI example test - mlx_whisper_example.py defaults to tests/data/audio/sample_10s.mp3 if no args specified.

Signed-off-by: Ken Steele <ksteele@gmail.com>

* refactor: centralize audio file extensions and MIME types in base_models.py

- Move audio file extensions from CLI hardcoded set to FormatToExtensions[InputFormat.AUDIO]
- Add support for additional audio formats: m4a, aac, ogg, flac, mp4, avi, mov
- Update FormatToMimeType mapping to include MIME types for all audio formats
- Update CLI auto-detection to use centralized FormatToExtensions mapping
- Add comprehensive tests for audio file auto-detection and pipeline selection
- Ensure explicit pipeline choices are not overridden by auto-detection

Fixes issue where only .mp3 and .wav files were processed as audio despite
CLI auto-detection working for all formats. The document converter now
properly recognizes all audio formats through MIME type detection.

Addresses review comments:
- Centralizes audio extensions in base_models.py as suggested
- Maintains existing auto-detection behavior while using centralized data
- Adds proper test coverage for the audio detection functionality

All examples and tests pass with the new centralized approach.
All audio formats (mp3, wav, m4a, aac, ogg, flac, mp4, avi, mov) now work correctly.

Signed-off-by: Ken Steele <ksteele@gmail.com>

* feat: address reviewer feedback - improve CLI auto-detection and add explicit model options

Review feedback addressed:
1. Fix CLI auto-detection to only switch to ASR pipeline when ALL files are audio
   - Previously switched if ANY file was audio, now requires ALL files to be audio
   - Added warning for mixed file types with guidance to use --pipeline asr

2. Add explicit WHISPER_X_MLX and WHISPER_X_NATIVE model options
   - Users can now force specific implementations if desired
   - Auto-selecting models (WHISPER_BASE, etc.) still choose best for hardware
   - Added 12 new explicit model options: _MLX and _NATIVE variants for each size

CLI now supports:
- Auto-selecting: whisper_tiny, whisper_base, etc. (choose best for hardware)
- Explicit MLX: whisper_tiny_mlx, whisper_base_mlx, etc. (force MLX)
- Explicit Native: whisper_tiny_native, whisper_base_native, etc. (force native)

Addresses reviewer comments from @dolfim-ibm

Signed-off-by: Ken Steele <ksteele@gmail.com>

* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>

I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: c60e72d2b504a477797d183790eb74fb4fc9b019
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 94803317a3807451de76996e2509fc58e1ecacb0
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 21905e8acef341e94052c189376b0b45a7bb1fef
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 96c669d155c8e9bd6455ecff4720933ad7d9e7cb
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 8371c060ea85295d05ad040f1d1608b560e0424d

Signed-off-by: Ken Steele <ksteele@gmail.com>

* test(asr): add coverage for MLX options, pipeline helpers, and VLM prompts

- tests/test_asr_mlx_whisper.py: verify explicit MLX options (framework, repo ids)
- tests/test_asr_pipeline.py: cover _has_text/_determine_status and backend support with proper InputDocument/NoOpBackend wiring
- tests/test_interfaces.py: add BaseVlmPageModel.formulate_prompt tests (RAW/NONE/CHAT, invalid style), with minimal InlineVlmOptions scaffold

Improves reliability of ASR and VLM components by validating configuration paths and helper logic.

Signed-off-by: Ken Steele <ksteele@gmail.com>

* test(asr): broaden coverage for model selection, pipeline flows, and VLM prompts

- tests/test_asr_mlx_whisper.py
  - Add MLX/native selector coverage across all Whisper sizes
  - Validate repo_id choices under MLX and Native paths
  - Cover fallback path when MPS unavailable and mlx_whisper missing

- tests/test_asr_pipeline.py
  - Relax silent-audio assertion to accept PARTIAL_SUCCESS or SUCCESS
  - Force CPU native path in helper tests to avoid torch in device selection
  - Add language handling tests for native/MLX transcribe
  - Cover native run success (BytesIO) and failure (exception) branches
  - Cover MLX run success/failure branches with mocked transcribe
  - Add init path coverage with artifacts_path

- tests/test_interfaces.py
  - Add focused VLM prompt tests (NONE/CHAT variants)

Result: all tests passing with significantly improved coverage for ASR model selectors, pipeline execution paths, and VLM prompt formulation.

Signed-off-by: Ken Steele <ksteele@gmail.com>

* simplify ASR model settings (no pipeline detection needed)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* clean up disk space in runners

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Ken Steele <ksteele@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-21 08:05:59 +02:00
Peter W. J. Staar
3e6da2c62d
docs: Example on PII obfuscation (#2459)
* added example on PII obfuscation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatting code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* add in index and fix heading formatting

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add GLINER to PII

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* final commit

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-14 15:39:16 +02:00
Jeremy Chen
90200443bc
docs: Remove deprecated call in custom_convert.py (#2447)
Update custom_convert.py

export_to_document_tokens is deprecated so change it to export_to_doctags

Signed-off-by: Jeremy Chen <github@jeremychen.email>
2025-10-13 09:30:02 +02:00
Utsav Talwar
f2854b2e1d
docs: Add MongoDB + VoyageAI (#2382)
Signed-off-by: Utsav Talwar <114057324+utsavMongoDB@users.noreply.github.com>
Co-authored-by: Utsav Talwar <114057324+utsavMongoDB@users.noreply.github.com>
2025-10-07 14:36:19 -04:00
Utsav Talwar
8a4b946a1a
docs: add RAG example with MongoDB Atlas Vector Search and VoyageAI embeddings (#2341)
* Add MongoDB RAG example

* Update MongoDB RAG Example

* Update MongoDB RAG Example

* Update MongoDB RAG Example

* DCO Remediation Commit for utsavMongoDB <utsav.talwar@mongodb.com>

I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: fbdbf53aa8f5df3157cdc4b32fc52408994507ae
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 9b3065ba2b533ab3a81aa77ee2737bbbd8248485
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 1983f9db35f97fb5604d170688168631c8d8bbdc
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 0522aa105d4503c84a573969d03c0cb4f705ec01
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: f5a67e8012852221d59b3364ddde33d6b367fda1

Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com>

* DCO Remediation Commit for utsavMongoDB <utsav.talwar@mongodb.com>

I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: fbdbf53aa8f5df3157cdc4b32fc52408994507ae
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 9b3065ba2b533ab3a81aa77ee2737bbbd8248485
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 1983f9db35f97fb5604d170688168631c8d8bbdc
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 0522aa105d4503c84a573969d03c0cb4f705ec01
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: f5a67e8012852221d59b3364ddde33d6b367fda1

Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com>

* docs: Add example with MongoDB

* DCO Remediation Commit for utsavMongoDB <utsav.talwar@mongodb.com>

I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: bb245a31ed7ad69999f39e009578c8367ac6d1a1
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 25436e543cf901a2ccf43854c9906af322136dfa

Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com>

* DCO Remediation Commit for utsavMongoDB <utsav.talwar@mongodb.com>

I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: bb245a31ed7ad69999f39e009578c8367ac6d1a1
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 25436e543cf901a2ccf43854c9906af322136dfa

Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com>

* DCO Remediation Commit for utsavMongoDB <utsav.talwar@mongodb.com>

I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: bb245a31ed7ad69999f39e009578c8367ac6d1a1
I, utsavMongoDB <utsav.talwar@mongodb.com>, hereby add my Signed-off-by to this commit: 25436e543cf901a2ccf43854c9906af322136dfa

Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com>

---------

Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com>
Signed-off-by: Utsav Talwar <114057324+utsavMongoDB@users.noreply.github.com>
2025-10-03 13:29:43 +02:00
Christoph Auer
1e9dc43b72
feat: Repetition-based StoppingCriteria for GraniteDocling (#2323)
* Experimental code for repetition detection, VLLM Streaming

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update VLLM Streaming

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update VLLM inference code, CLI and VLM specs

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix generation and decoder args for HF model

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix vllm device args

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Bugfixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove streaming VLLM for the moment

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add repetition StoppingCriteria for GraniteDocling/SmolDocling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make GenerationStopper base class and port for MLX

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add streaming support and custom GenerationStopper support for ApiVlmModel

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for ApiVlmModel

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for ApiVlmModel

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix api_image_request_streaming when GenerationStopper triggers.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Move DocTagsRepetitionStopper to utility unit, update examples

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-09-30 15:26:09 +02:00
Luis
a873200c9d
docs(vlm): Update SmolDocling to GraniteDocling references (#2315)
Update minimal_vlm_pipeline.py

Signed-off-by: Luis <luis.rojas@ibm.com>
2025-09-25 11:07:39 +02:00
Christoph Auer
8b7e83a8c7
docs: Update API VLM example with granite-docling (#2294)
chore: Update API VLM example with granite-docling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-09-19 12:23:53 +02:00
Panos Vagenas
8322c2ea9b
docs: fix examples rendering (#2281)
fix examples rendering

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-09-17 20:50:50 -04:00
Christoph Auer
17afb664d0
feat: Add granite-docling model (#2272)
* adding granite-docling preview

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the model specs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* typo

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use granite-docling and add to the model downloader

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update docs and README

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update final repo_ids for GraniteDocling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update final repo_ids for GraniteDocling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix model name in CLI usage example

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Fix VLM model name in README.md

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-09-17 15:15:49 +02:00
Mingxuan Zhao
ff351fd40c
docs: Describe examples (#2262)
* Update .py examples with clearer guidance,
update out of date imports and calls

Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com>

* Fix minimal.py string error, fix ruff format error

Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com>

* fix more CI issues

Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com>

---------

Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com>
2025-09-16 16:00:38 +02:00
Michele Dolfi
2c9123419f
feat: enrichment steps on all convert pipelines (incl docx, html, etc) (#2251)
* allow enrichment on all convert pipelines

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* set options in CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-09-11 15:09:00 +02:00
Cesar Berrospi Ramis
f8cc545bab
docs: add an example of RAG with OpenSearch (#2238)
* docs: add an example of RAG with OpeanSearch

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore: pin latest docling-core and update uv.lock

Pin latest version release of docling-core in pyproject.toml
Update the dependencies in uv.lock file
Run the notebook rag_opensearch.ipynb to pick up changes from docling-core

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-09-10 14:37:22 +02:00
Tamás Bitai
55f5f3752f
docs: Document VLM support requirement in extraction example (#2231)
* docs: Document VLM support requirement in extraction example

* DCO Remediation Commit for Tamás Bitai <bitai.tamas@gmail.com>

I, Tamás Bitai <bitai.tamas@gmail.com>, hereby add my Signed-off-by to this commit: b90defdb77ceb5c0090c72d5bfd6c3fb490e5efb

Signed-off-by: Tamás Bitai <bitai.tamas@gmail.com>

---------

Signed-off-by: Tamás Bitai <bitai.tamas@gmail.com>
2025-09-09 13:45:55 +02:00
Panos Vagenas
a9f41b088e
docs: add information extraction example (#2199)
* docs: add information exctraction example

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update README

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* minor typo

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update README

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-09-05 11:27:09 +02:00
Shikhar Bhardwaj
9f0286bcac
fix: translation example (#2166)
* fix: translation example

Signed-off-by: shikharbhardwaj <8502456+shikharbhardwaj@users.noreply.github.com>

* Fix translation example formatting

Signed-off-by: shikharbhardwaj <8502456+shikharbhardwaj@users.noreply.github.com>

---------

Signed-off-by: shikharbhardwaj <8502456+shikharbhardwaj@users.noreply.github.com>
2025-09-01 11:04:46 +02:00
Panos Vagenas
96cab6b536
docs: enrich landing pages (#2165)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-08-29 17:19:05 +02:00
geoHeil
3f60a0fa78
feat: Upgrade to RapidOCR 3.x (#2088)
* feat: exploring new version

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* chore: autoformat

DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775

* feat: enable configurable runtime for rapidocr and handle new result better;

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* chore: fix linter

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* chore: use new server model

* chore:  change default engine type to onnx

* chore: tests update for new rapidocr

* fix: rebase from main and fix clashes

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 02f9db85f562e5cdfda40c52fee55cfd4030d70a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a7bcb205faedb881f94a89b3bbd29cb31ccd54f0
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a39482a98cbcff7a825c8321134732af0c65930a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 63e9d717fa26951566b02761f3fdfc752c31f805
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ef12a6ec1ea2846a8a8e2e776eeaa59c2a0c4dfe

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 2222d2340387f8d9d66f3ca9d8e21a0945a44e7a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: bc6a1dc507d7f146ec4797a2d3840414f46ac64d
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 56e0d67da7c57d4b5caf8eaef8dff7056c3efd32
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 871ca21271412006c76acf3c19426140efed3d50
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 7b1b77159da729d483a581a86c7309acba1712a7
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a792a714a43e19a91b2b782f54621c1c5efda632

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: d1fed26323ff829b716bc667fe69532839363e45
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 346ec1cad943765f886e5d17fb0a54221124689c
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 4d0bbe5bd6e9f7261b97362ff8823af244267089
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 34a5ad53892a7064a6bf35f890d344d464c78b2f
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 9151959db3ad53535011d1cfdcf9181fdf936bb1
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 8ef5536f2c098826c6c0a05190f8a80614c3f3cb

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 7e18637a35c6786c90bc41b40607404f4b084b45
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 63fb8ff599035186aba2d958fbaec32739e92d01
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 0cb9444fb89b978e456dcf607815d7a8416c1ffa
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 38940d9978c5c18bd7fbffb8170f1b1a90680b94
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: b6d461ac427ebc8b814a7e1d0a452a4ac8a374af
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ee55eb3408ed5decb5324ec441e166e180512cf4

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

---------

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
2025-08-25 12:10:33 +02:00
Maroun Touma
e76298c40d
docs: DPK pipeline example using docling library (#2112)
* Notebook showing example on how to use docling transforms in DPK

Signed-off-by: Maroun Touma <touma@us.ibm.com>

* fix HF Token name

Signed-off-by: Maroun Touma <touma@us.ibm.com>

* use %pip instead of pip install jupyter lab

Signed-off-by: Maroun Touma <touma@us.ibm.com>

* run formatter

Signed-off-by: Maroun Touma <touma@us.ibm.com>

* add example to mkdocs and fix typo

Signed-off-by: Maroun Touma <touma@us.ibm.com>

---------

Signed-off-by: Maroun Touma <touma@us.ibm.com>
2025-08-21 10:14:36 +02:00
Shkarupa Alex
5f050f94e1
feat(vlm): Ability to preprocess VLM response (#1907)
* Add ability to preprocess VLM response

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

* Move response decoding to vlm options (requires inheritance to override). Per-page prompt formulation also moved to vlm options to keep api consistent.

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

---------

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
2025-08-12 15:20:24 +02:00
Michele Dolfi
90a7cc4bdd
docs: enrich existing DoclingDocument (#1969)
add example for enriching an existing doclingdocument

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-07-22 16:20:15 +02:00
Copilot
c5fb353f10
fix: Change granite vision model URL from preview to stable version (#1925)
* Initial plan

* Fix granite vision model URL from preview to stable version

Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com>

* Update to granite vision 3.3

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Update to granite vision 3.3 (2)

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

---------

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com>
2025-07-11 08:46:03 +02:00
geoHeil
a07ba863c4
feat: add image-text-to-text models in transformers (#1772)
* feat(dolphin): add dolphin support

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* rename

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* reformat

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* fix mypy

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* add prompt style and examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-07-08 05:54:57 +02:00
Shkarupa Alex
b8813eea80
feat(vlm): Dynamic prompts (#1808)
* Unify temperature options for Vlm models

* Dynamic prompt support with example

* DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com>

I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: 34d446cb9829835cf6b8f8fdb4abd9fef3455c3a
I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: 9c595d574fce5e3e139f5af780f8223496735ff1

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

* Replace Page with SegmentedPage

* Fix example HF repo link 

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Sign-off

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

* DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com>

I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: 1a162066dd3e4ee240d272d9d503d549a0856590

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

* Use lmstudio-community model

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Swap inference engine to LM Studio

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

---------

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
2025-07-07 16:58:42 +02:00
Peter W. J. Staar
f3ae3029b8
docs: update readme and add ASR example (#1836)
* updated the README

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added minimal_asr_pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Updated README and added ASR example

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Updated docs.index.md

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated CI and mkdocs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added link tp existing audio file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added link tp existing audio file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatting

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-06-23 18:55:16 +02:00
Michele Dolfi
64ac043786
docs: support running examples from root or subfolder (#1816)
support running examples from root or subfolder

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-19 11:10:40 +02:00
Michele Dolfi
0432a31b2f
docs: update vlm models api examples with LM Studio (#1759)
update vlm models api examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-12 12:58:44 +02:00
Peter W. J. Staar
cfdf4cea25
feat: new vlm-models support (#1570)
* feat: adding new vlm-models support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* got microsoft/Phi-4-multimodal-instruct to work

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on vlm's

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the VLM part

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* all working, now serious refacgtoring necessary

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the download_model

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the formulate_prompt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* pixtral 12b runs via MLX and native transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the VlmPredictionToken

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring minimal_vlm_pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the MyPy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added pipeline_model_specializations file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* need to get Phi4 working again ...

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* finalising last points for vlms support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the pipeline for Phi4

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* streamlining all code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixing the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the html backend to the VLM pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the static load_from_doctags

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restore stable imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use AutoModelForVision2Seq for Pixtral and review example (including rename)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove unused value

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* refactor instances of VLM models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* skip compare example in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use lowercase and uppercase only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename pipeline_vlm_model_spec

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move more argument to options and simplify model init

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add supported_devices

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove not-needed function

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* exclude minimal_vlm

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add message for transformers version

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename to specs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use module import and remove MLX from non-darwin

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove hf_vlm_model and add extra_generation_args

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use single HF VLM model class

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove torch type

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for vision models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-02 17:01:06 +02:00
Panos Vagenas
7c4c356e76
chore: fix chunking example data link (#1596)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-05-16 08:44:47 +02:00
Panos Vagenas
9f28abf061
docs: add advanced chunking & serialization example (#1589)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-05-14 14:35:07 +02:00
Panos Vagenas
3220a592e7
docs: add serialization docs, update chunking docs (#1556)
* docs: add serializers docs, update chunking docs

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update notebook to improve MD table rendering

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-05-08 21:43:01 +02:00
nkh0472
a097ccd8d5
chore: typo fix (#1465)
* typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

---------

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>
2025-04-28 08:52:09 +02:00
Ryan Lin
a2fbbba9f7
feat: add tutorial using Milvus and Docling for RAG pipeline (#1449)
* feat: add milvus rag with docling tutorial

Signed-off-by: Ryan Lin <linjinhong@yandex.com>

* chore: run pre-commit

Signed-off-by: Ryan Lin <linjinhong@yandex.com>

* feat: add RAG with Milvus example to mkdocs

Signed-off-by: Ryan Lin <linjinhong@yandex.com>

---------

Signed-off-by: Ryan Lin <linjinhong@yandex.com>
2025-04-25 09:12:35 +02:00
nkh0472
c2470ed216
docs: Fix wrong output format in example code (#1427)
fix: wrong output format

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>
2025-04-22 12:32:55 +02:00
Michele Dolfi
5458a88464
ci: add coverage and ruff (#1383)
* add coverage calculation and push

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* new codecov version and usage of token

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* enable ruff formatter instead of black and isort

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* apply ruff lint fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* apply ruff unsafe fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add removed imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* runs 1 on linter issues

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* finalize linter fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update pyproject.toml

Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-04-14 18:01:26 +02:00
Peter W. J. Staar
c0ba88edf1
feat(cli): add option for html with split-page mode (#1355)
* updated the cli to output html in split-page mode

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* add pin for new docling-core with html split argument

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* relock with fixed html export in docling-core

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update test results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update more tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update example

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update lock with docling-core fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update test results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add again chunking extras

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-14 08:41:50 +02:00
Gabe Goodhart
c605edd8e9
feat: OllamaVlmModel for Granite Vision 3.2 (#1337)
* build: Add ollama sdk dependency

Branch: OllamaVlmModel

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add option plumbing for OllamaVlmOptions in pipeline_options

Branch: OllamaVlmModel

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Full implementation of OllamaVlmModel

Branch: OllamaVlmModel

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Connect "granite_vision_ollama" pipeline option to CLI

Branch: OllamaVlmModel

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* Revert "build: Add ollama sdk dependency"

After consideration, we're going to use the generic OpenAI API instead
of the Ollama-specific API to avoid duplicate work.

This reverts commit bc6b366468cdd66b52540aac9c7d8b584ab48ad0.

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Move OpenAI API call logic into utils.utils

This will allow reuse of this logic in a generic VLM model

NOTE: There is a subtle change here in the ordering of the text prompt and
the image in the call to the OpenAI API. When run against Ollama, this
ordering makes a big difference. If the prompt comes before the image, the
result is terse and not usable whereas the prompt coming after the image
works as expected and matches the non-OpenAI chat API.

Branch: OllamaVlmModel

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Refactor from Ollama SDK to generic OpenAI API

Branch: OllamaVlmModel

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Linting, formatting, and bug fixes

The one bug fix was in the timeout arg to openai_image_request. Otherwise,
this is all style changes to get MyPy and black passing cleanly.

Branch: OllamaVlmModel

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* remove model from download enum

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* generalize input args for other API providers

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename and refactor

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add example

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* require flag for remote services

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* disable example from CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add examples to docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-10 18:03:04 +02:00
Panos Vagenas
71148eb381
docs: add visual grounding example (#1270)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-04-02 14:03:19 +02:00
Clément Doumouro
0974ba4e1c
docs(examples): batch conversion doc raises_on_error (#1147)
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
2025-03-25 11:14:39 +01:00
Maxim Lysak
1c26769785
feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199)
* Initial implementation to support MLX for VLM pipeline and SmolDocling

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* mlx_model unit

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Add CLI choices for VLM pipeline and model

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Initial implementation to support MLX for VLM pipeline and SmolDocling

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* mlx_model unit

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Add CLI choices for VLM pipeline and model

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updated minimal vlm pipeline example

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* make vlm_pipeline python3.9 compatible

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fixed extract_text_from_backend definition

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated README

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated example

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated documentation

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* corrections in the documentation

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Consmetic changes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-03-19 15:38:54 +01:00
Christoph Auer
3960b199d6
feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905)
* Add DoclingParseV3 backend implementation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Use docling-core with docling-parse types

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes and test updates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix streams

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix streams

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Reset tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* update test cases

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* update test units

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add back DoclingParse v1 backend, pipeline options

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update locks

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: update docling-core to 2.22.0

Update dependency library docling-core to latest release 2.22.0
Fix regression tests and ground truth files

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* Ground-truth files updated

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests, use TextCell.from_ocr property

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Text fixes, new test data

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Rename docling backend to v4

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Test all backends, fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Reset all tests to use docling-parse v1 for now

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for DPv4 backend init, better test coverage

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* test_input_doc use default backend

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-03-18 10:38:19 +01:00
Michele Dolfi
fa16b12316
chore: move to docling-project org (#1160)
* chore: rename org

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update docs/faq/index.md

Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

* update github pages

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* revert test content

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-03-14 12:35:29 +01:00
Michele Dolfi
357d41cc47
docs: Enrichment models (#1097)
* warning for develop examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for enrichment models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* minor reorg of top-level docs (#1098)

* minor reorg of top-level docs

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* fix typo [no ci]

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* trigger ci

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-03-04 14:24:38 +01:00
Panos Vagenas
db3ceefd4a
docs: improve docs on token limit warning triggered by HybridChunker (#1077)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-02-28 14:54:46 +01:00
Christoph Auer
3c9fe76b70
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* wip smolDocling inference and vlm pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fixes to preserve page image and demo export to html

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Enabled figure support in vlm_pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fix for table span compute in vlm_pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added tokens/sec measurement, improved example

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added capability for vlm_pipeline to grab text from preconfigured backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Exposed "force_backend_text" as pipeline parameter

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Flipped keep_backend to True for vlm_pipeline assembly to work

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated vlm pipeline assembly and smol docling model code to support updated doctags

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fixing doctags starting tag, that broke elements on first line during assembly

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated example of Smol Docling usage

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Update minimal smoldocling example

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix repo id

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleaned up unnecessary logging

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* More elegant solution in removing the input prompt

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* removed minimal_smol_docling example from CI checks

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Removed special html code wrapping when exporting to docling document, cleaned up comments

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Moved keep_backend = True to vlm pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added example on how to get original predicted doctags in minimal_smol_docling

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* removing changes from base_pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Replaced remaining strings to appropriate enums

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated poetry.lock

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* re-built poetry.lock

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Generalize and refactor VLM pipeline and models

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Rename example

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Move imports

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Expose control over using flash_attention_2

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix VLM example exclusion in CI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add back device_map and accelerate

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make drawing code resilient against bad bboxes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: clean up code and comments

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: more cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: fix leftover .to(device)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: add proper table provenance

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
Christoph Auer
c93e36988f
feat: Implement new reading-order model (#916)
* Implement new reading-order model, replacing DS GLM model (WIP)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update reading-order model branch

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update lockfile [skip ci]

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add captions, footnotes and merges [skip ci]

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updates for reading-order implementation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updates for reading-order implementation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests and lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes, update tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add normalization, update tests again

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests with code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Push final lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* sanitize text

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Inlcude furniture, Update tests with furniture

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix content_layer assignment

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: Delete empty file docling/models/ds_glm_model.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-20 17:51:17 +01:00
Panos Vagenas
27c04007bc
docs: revamp picture description example (#1015)
* docs: revamp picture description example

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* Improvements for visualization example (#1017)

* fix colab install, use granite and improve viz of description

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* switch docs to notbook

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* show results with all models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* show other vlm

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-02-19 11:28:54 +01:00
Ahmed Nassar
77eb77bdc2
feat: Support cuda:n GPU device allocation (#694)
* Adding multi-gpu support, and cuda device allocation

Signed-off-by: ahn <ahn@zurich.ibm.com>

* Fixes pydantic exception with cuda:n
Signed-off-by: ahn <ahn@zurich.ibm.com>

* Pydantic field validator and comment restored.

Signed-off-by: ahn <ahn@zurich.ibm.com>

* chore: Accept AcceleratorDevice enum type

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Resetted some options to default, removed EasyOCR model wrap.
Signed-off-by: ahn <ahn@zurich.ibm.com>

* Fixed rebased issues
Signed-off-by: ahn <ahn@zurich.ibm.com>

* Revert accelerator test options
Signed-off-by: ahn <ahn@zurich.ibm.com>

---------

Signed-off-by: ahn <ahn@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: ahn <ahn@sonny.zuvela.ibm.com>
Co-authored-by: ahn <ahn@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-02-17 11:31:13 +01:00