docling

mirror of https://github.com/docling-project/docling.git synced 2025-06-27 05:20:05 +00:00

Author	SHA1	Message	Date
Michele Dolfi	c18f47c5c0	fix: remove unused httpx (#919 ) * remove unused httpx Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use requests instead of httpx Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove more usage of httpx Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-07 17:51:31 +01:00
Michele Dolfi	4cc6e3ea5e	feat: Describe pictures using vision models (#259 ) * draft for picture description models Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * vlm description using AutoModelForVision2Seq Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add generation options Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update vlm API Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * allow only localhost traffic Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename model Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * do not run with vlm api Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * more renaming Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix examples path Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * apply CLI download login Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix name of cli argument Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use with_smolvlm in models download Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-07 16:30:42 +01:00
github-actions[bot]	fba3cf9be7	chore: bump version to 2.19.0 [skip ci] v2.19.0	2025-02-07 13:36:54 +00:00
Michele Dolfi	02faf5376b	refactor: use org--name in artifacts-path (#912 ) use org--name in artifacts-path Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-07 13:58:05 +01:00
Panos Vagenas	90b766e2ae	fix(markdown): handle nested lists (#910 ) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-02-07 12:55:12 +01:00
Michele Dolfi	9114ada7bc	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903 ) fix: Support for RTL programmatic documents fix(parser): detect and handle rotated pages fix(parser): fix bug causing duplicated text fix(formula): improve stopping criteria chore: update lock file fix: temporary constrain beautifulsoup * switch to code formula model v1.0.1 and new test pdf Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> * switch to code formula model v1.0.1 and new test pdf Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> * cleaned up the data folder in the tests Signed-off-by: Peter Staar <taa@zurich.ibm.com> * switch to code formula model v1.0.1 and new test pdf Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> * added three test-files for right-to-left Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fix black Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> * added new gt for test_e2e_conversion Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> * added new gt for test_e2e_conversion Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> * Add code to expose text direction of cell Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * new test file Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> * update lock Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix mypy reports Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix example filepaths Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add test data results Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * pin wheel of latest docling-parse release Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use latest docling-core Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove debugging code Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix path to files in example Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Revert unwanted RTL additions Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix test data paths in examples Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com> Co-authored-by: Peter Staar <taa@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-02-07 08:43:31 +01:00
Michele Dolfi	ed74fe2ec0	feat: new artifacts path and CLI utility (#876 ) * fix artifacts path Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add docling-models utility Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * missing formatting Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename utility to docling-tools Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename download methods and deprecation warnings Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * propagate artifacts path usage for ocr models Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * move function to utils Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove unused file Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update docs Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * simplify downloading specific model(s) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * minor refactor Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-02-06 15:46:32 +01:00
Vladimir Gurevich	722a6eb7b9	fix(msword_backend): handle conversion error in label parsing (#896 ) Updated label parsing to use `str_to_int` with a default value to prevent potential conversion errors. Signed-off-by: Vladimir Gurevich <vladimir@beaconcure.com> Co-authored-by: Vladimir Gurevich <vladimir@beaconcure.com>	2025-02-06 12:30:51 +01:00
Michele Dolfi	5ad6de0560	fix: enrichment models batch size and expose picture classifier (#878 ) * expose picture classifier in CLI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use different batch size in each model Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove batch size from CLI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * cleanup imports Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-05 11:46:01 +01:00
Panos Vagenas	17448163e7	chore: fix docs search (#880 ) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-02-04 11:35:34 +01:00
Nikos Livathinos	6d3fea0196	docs: Introduce example with custom models for RapidOCR (#874 ) * docs: Introduce example with custom models for RapidOCR Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Exclude the example with custom RapidOCR models from the examples to run in github actions Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-04 10:07:00 +01:00
github-actions[bot]	b5da4080c9	chore: bump version to 2.18.0 [skip ci] v2.18.0	2025-02-03 14:58:50 +00:00
Panos Vagenas	5ac2887e4a	fix(markdown): fix parsing if doc ending with table (#873 ) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-02-03 14:38:38 +01:00
Panos Vagenas	a40544a546	chore: clean up top-level file (#872 ) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-02-03 14:10:12 +01:00
Panos Vagenas	94751a78f4	fix(markdown): add support for HTML content (#855 ) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-02-03 12:21:05 +01:00
Michele Dolfi	6a76b49a47	feat: Expose equation exports (#869 ) * pin new docling-core and exploit it via assembler changes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update test results Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update with docling-core release Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-03 10:31:19 +01:00
Cesar Berrospi Ramis	0cd81a8122	fix(docx): merged table cells not properly converted (#857 ) * fix(docx): merged cells not properly converted Fix conversion issue of merged cells in Word tables leading to repeated text. Simplify Word table conversion code. Add docx file with several table formats for regression tests. Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> * chore: add type hinting to docx backend Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> --------- Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-02-03 10:20:03 +01:00
Maxim Lysak	eff16b62cc	fix: Processing of placeholder shapes in pptx that have text but no bbox (#868 ) Processing of placeholder shapes in pptx that have text but no bbox Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>	2025-02-03 09:33:33 +01:00
Maxim Lysak	b1cf796730	fix: KeyError in tableformer prediction (#854 ) * fix for KeyError in tableformer prediction Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * chore: rewrite cumbersome dictionary checking Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-01-31 17:00:14 +01:00
Christoph Auer	70d68b6164	feat: Add option to define page range (#852 ) Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-01-31 15:23:00 +01:00
Maxim Lysak	d727b04ad0	feat(docx): Support of SDTs in docx backend (#853 ) Support of table of content containers in docx backend Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>	2025-01-31 14:52:24 +01:00
Maxim Lysak	2c037ae62e	fix: Fixed docx import with headers that are also lists (#842 ) * Fix for docx when headers are also lists, now recorded as appropriate headers and subheaders, unit test included Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Update docling/backend/msword_backend.py Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> Signed-off-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com> * Update docling/backend/msword_backend.py Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> Signed-off-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com> --------- Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Signed-off-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-01-31 10:51:21 +01:00
Michele Dolfi	2a1f8afe7e	fix: use new add_code in html backend and add more typing hints (#850 ) fix add_code in html backend and add more typing hints Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-01-31 09:54:17 +01:00
Michele Dolfi	4df085aa6c	feat: Python 3.13 support (#841 ) * test: update results with new docling-core Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update all deps in the lock Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix table in test results Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix version for python3.13 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * latest poetry version in CI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * activate py3.13 in CI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update docs about python 3.13 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * test with rapidocr only on python <3.13 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-01-30 17:26:42 +01:00
Panos Vagenas	bccb022fc8	fix(markdown): fix empty block handling (#843 ) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-01-30 16:22:29 +01:00
Maxim Lysak	fea0a99a95	fix: Fix for the crash when encountering WMF images in pptx and docx (#837 ) * Fix for the crash when encountering WMF images in pptx and docx backends on non Windows platforms Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Updated faq Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> --------- Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>	2025-01-30 14:58:27 +01:00
Michele Dolfi	d01a2e73ee	test: update results with new docling-core (#839 ) * test: update results with new docling-core Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix table output in 2203.01017v2.md Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-01-30 14:07:52 +01:00
Peter W. J. Staar	d7c082894e	docs: updated the readme with upcoming features (#831 ) * updated the readme with upcoming features Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the docs-index Signed-off-by: Peter Staar <taa@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2025-01-30 09:52:54 +01:00
Christoph Auer	f9144f2bb6	docs: Add example for inspection of picture content (#624 ) * chore: Add example for inspection of picture content Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Test case re-generation Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Test case re-generation only on CPU Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Add missing GT files Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-01-29 10:39:00 +01:00
github-actions[bot]	4d11d87d06	chore: bump version to 2.17.0 [skip ci] v2.17.0	2025-01-28 18:37:26 +00:00
Panos Vagenas	5aed9f8aeb	fix: fix single newline handling in MD backend (#824 ) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-01-28 19:05:55 +01:00
Cesar Berrospi Ramis	adf6353483	fix: use file extension if filetype fails with PDF (#827 ) Filetype library may not identify some files as PDF. Leverage the file extension as a simple solution. Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-01-28 19:03:54 +01:00
Panos Vagenas	ba521dd88f	chore: add missing imports to Office type tests (#826 ) * chore: add missing import to XLSX test Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * Update test_backend_msword.py [skip ci] Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * Update test_backend_pptx.py Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> --------- Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-01-28 16:17:44 +01:00
Panos Vagenas	6875913e34	docs: document Docling JSON parsing (#819 ) * docs: document Docling JSON parsing Also: - factored out and expanded supported formats - reorged feature list Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * update feature list, minor fixes Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> --------- Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-01-28 13:23:30 +01:00
Anastas Stoyanovsky	5139b48e4e	docs: Add SSL verification error mitigation (#821 ) Add SSL verification error mitigation Signed-off-by: Anastas Stoyanovsky <astoyano@redhat.com>	2025-01-28 07:22:43 +01:00
Michele Dolfi	6882e6c38d	feat(CLI): Expose code and formula models in the CLI (#820 ) feat: expose code and formula models in the CLI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-01-28 06:26:03 +01:00
Cesar Berrospi Ramis	4d41db3f7a	docs(backend XML): do not delete temp file in notebook (#817 ) Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-01-27 18:53:39 +01:00
Cesar Berrospi Ramis	a112d7a035	fix: parse html with omitted body tag (#818 ) * fix: parse HTML files without body tag Parse HTML files without 'body' tag, since it is optional in HTML5 specification. Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> * test: ensure docling converts HTML without body tag Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> --------- Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-01-27 16:59:00 +01:00
Panos Vagenas	95b293a723	feat: add platform info to CLI version printout (#816 ) * feat: add platform info to CLI version printout Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * Update main.py Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * add Python implementation & language versions Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> --------- Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-01-27 16:04:57 +01:00
Yorick Terweijden	53327552e8	feat(ocr): expose `rec_keys_path` in RapidOcrOptions to support custom dictionaries (#786 ) * Expose `rec_keys_path` in RapidOcrOptions to support custom dictionaries - Added `rec_keys_path` to `RapidOcrOptions` to align with RapidOCR's capability to use custom character dictionaries. - Passed `rec_keys_path` to `RapidOcrModel` initialization, ensuring the recognition model can load the correct dictionary (e.g., for Latin characters). Signed-off-by: Yorick Terweijden <yorick@spread.ai> * style(rapidocr-options): fix alignment of `rec_keys_path` comment Adjusted the alignment of the comment for `rec_keys_path` to maintain consistent formatting. No functional changes were made. Signed-off-by: Yorick Terweijden <yorick@spread.ai> --------- Signed-off-by: Yorick Terweijden <yorick@spread.ai>	2025-01-27 13:38:15 +01:00
Michele Dolfi	9022c6d855	chore: update deps in lockfile (#815 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-01-27 12:41:18 +01:00
Farzad Sunavala	8a4ec77576	docs: typo (#814 ) * Update rag_azuresearch.ipynb Signed-off-by: Farzad Sunavala <40604067+farzad528@users.noreply.github.com> * typo Signed-off-by: Farzad Sunavala <40604067+farzad528@users.noreply.github.com> --------- Signed-off-by: Farzad Sunavala <40604067+farzad528@users.noreply.github.com>	2025-01-27 11:24:26 +01:00
Farzad Sunavala	b885b2fa3c	docs: added markdown headings to enable TOC in github pages (#808 ) * docs: added markdown headings to enable TOC in github pages Signed-off-by: Farzad Sunavala <40604067+farzad528@users.noreply.github.com> * minor renames Signed-off-by: Farzad Sunavala <40604067+farzad528@users.noreply.github.com> * part 3 heading Signed-off-by: Farzad Sunavala <40604067+farzad528@users.noreply.github.com> --------- Signed-off-by: Farzad Sunavala <40604067+farzad528@users.noreply.github.com>	2025-01-27 09:40:35 +01:00
Cesar Berrospi Ramis	c2ae1cc4ca	docs: description of supported formats and backends (#788 ) * chore: remove type-ignore marks for attaching text to non GroupItems After commit b74208 of docling-core, text items can be attached to any NodeItem and therefore the ignore[arg-type] type marks can be removed. Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> * test: remove unnecessary imports Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> * docs: add documentation on supported formats and backends Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> * docs: add notebook example with XML backends Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> --------- Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-01-26 08:10:33 +01:00
Nikos Livathinos	3be2fb581f	feat: Introduce automatic language detection in TesseractOcrCliModel (#800 ) * feat: Introduce automatic language detection in tesseract_ocr_cli model. Extend unit tests. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Add example how to use "auto" language with tesseract OCR engines Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Refactor the TesseractOcrModel and TesseractOcrCliModel to validate if the auto-detected language is installed in the system and if not fall back to a default option without language. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-01-26 08:07:56 +01:00
github-actions[bot]	9e4ca90db1	chore: bump version to 2.16.0 [skip ci] v2.16.0	2025-01-24 18:21:14 +00:00
Peter W. J. Staar	a458e298ca	fix: added extraction of byte-images in excel (#804 ) * fix(msexcel): ignore Mypy checking for _find_images_in_sheet function Signed-off-by: Jiun An Tsai <andrew@247365-Macbook.local> * fixed some issues Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * pinned pillow in pyproject Signed-off-by: Peter Staar <taa@zurich.ibm.com> --------- Signed-off-by: Jiun An Tsai <andrew@247365-Macbook.local> Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Jiun An Tsai <andrew@247365-Macbook.local> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-01-24 18:48:02 +01:00
Matteo	16a218d871	feat: New document picture classifier (#805 ) * figure classifier Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com> * gt for e2e tests Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com> * tests Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com> --------- Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>	2025-01-24 18:05:51 +01:00
Panos Vagenas	88a0e66adc	feat: add Docling JSON ingestion (#783 ) * feat: add Docling JSON ingestion Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * update conversion as per review comments, add tests, revert Docling JSON disambiguation, document intricacies Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * Update docling/backend/json/docling_json_backend.py Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> --------- Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-01-24 18:05:23 +01:00
Yusik Kim	e9768ae6a5	chore: expose draw_clusters function (#803 ) feat: expose draw_clusters function add type annotations to function signature Signed-off-by: Yusik Kim <kmyusk@gmail.com>	2025-01-24 17:35:29 +01:00

550 Commits