Commit Graph

  • 3d4120c529 Deployed e79e4f0 with MkDocs version: 1.6.1 gh-pages 2025-06-26 17:52:43 +00:00
  • e79e4f0ab6 fix(markdown): make parsing of rich table cells valid (#1821) main Michael Honaker 2025-06-26 13:50:45 -04:00
  • 70ac6a2ff5 chore: propagate list remodeling remodel-lists Panos Vagenas 2025-06-26 14:34:32 +02:00
  • ee4781075a chore: bump version to 2.38.1 [skip ci] v2.38.1 github-actions[bot] 2025-06-25 16:27:46 +00:00
  • d337825b8e fix: updated granite vision model version for picture description (#1852) pranaymiri 2025-06-25 21:19:56 +05:30
  • 7c5614a37a fix(markdown): fix single-formatted headings & list items (#1820) Panos Vagenas 2025-06-25 13:05:06 +02:00
  • 41e8cae26b fix: fix response type of ollama (#1850) Michele Dolfi 2025-06-25 04:33:09 -05:00
  • 4002de1f92 fix: Handle missing runs to avoid out of range exception (#1844) Allen N. 2025-06-24 22:55:27 -07:00
  • 1dc63d0aa9 chore: bump version to 2.38.0 [skip ci] v2.38.0 github-actions[bot] 2025-06-23 18:14:24 +00:00
  • f3ae3029b8 docs: update readme and add ASR example (#1836) Peter W. J. Staar 2025-06-23 18:55:16 +02:00
  • 1557e7ce3e feat: Support audio input (#1763) Peter W. J. Staar 2025-06-23 14:47:26 +02:00
  • acfd1dab86 Update all test cases again (2) cau/dp4-test-diff Christoph Auer 2025-06-20 17:35:56 +02:00
  • 033f504a82 Update all test cases again (2) Christoph Auer 2025-06-20 16:57:30 +02:00
  • a6efb2eb3d Merge branch 'main' of github.com:DS4SD/docling into cau/dp4-test-diff Christoph Auer 2025-06-20 16:51:35 +02:00
  • 6158a2e784 Update all test cases again cau/integrate-list-item-cleanup Christoph Auer 2025-06-20 14:56:46 +02:00
  • d26dac61a8 fix(docx): ensure list items have a list parent (#1827) Cesar Berrospi Ramis 2025-06-20 14:47:25 +02:00
  • c146c8f309 Update all test cases Christoph Auer 2025-06-20 13:31:28 +02:00
  • 926e32037d Update to final version Christoph Auer 2025-06-20 11:42:35 +02:00
  • 1350a8d3e5 fix(msword_backend): Identify text in the same line after an image #1425 (#1610) mkrssg 2025-06-20 10:55:30 +02:00
  • 48ee8a1291 Integrate ListItemMarkerProcessor into document assembly Christoph Auer 2025-06-20 10:28:59 +02:00
  • 90da15f611 initial reference to granite-doclong dev/add-granite-docling-preview Peter Staar 2025-06-20 07:47:12 +02:00
  • 0e63cb09e6 Remove pages.json from diff Christoph Auer 2025-06-19 16:08:07 +02:00
  • 64ac043786 docs: support running examples from root or subfolder (#1816) Michele Dolfi 2025-06-19 04:10:40 -05:00
  • 4e332500a8 add table raw cells when no table structure model was used fix-print-raw-table Michele Dolfi 2025-06-19 09:12:08 +02:00
  • dd7f64ff28 fix: Ensure uninitialized pages are removed before assembling document (#1812) Christoph Auer 2025-06-19 07:33:25 +02:00
  • 861abcdcb0 feat(markdown): add formatting & improve inline support (#1804) Panos Vagenas 2025-06-18 15:57:57 +02:00
  • 215b540f6c feat: Maximum image size for Vlm models (#1802) Shkarupa Alex 2025-06-18 13:57:37 +03:00
  • dbab30e92c fix: formula conversion with page_range param set (#1791) Mahafuzur Rahman 2025-06-17 17:58:45 +06:00
  • c2ef69718a chore: dco advisor (#1795) Michele Dolfi 2025-06-17 02:45:56 -05:00
  • 7bae3b6c06 chore: bump version to 2.37.0 [skip ci] v2.37.0 github-actions[bot] 2025-06-16 11:02:54 +00:00
  • f28d23cf03 fix: pptx line break and space handling (#1664) Martin Wind 2025-06-16 10:44:30 +02:00
  • b886e4df31 fix(asciidoc): set default size when missing in image directive (#1769) Cesar Berrospi Ramis 2025-06-16 10:38:46 +02:00
  • 7d3302cb48 feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) Christoph Auer 2025-06-13 19:01:55 +02:00
  • 0432a31b2f docs: update vlm models api examples with LM Studio (#1759) Michele Dolfi 2025-06-12 05:58:44 -05:00
  • 7a275c7637 fix: Handle NoneType error in MsPowerpointDocumentBackend (#1747) Bruno Rigal 2025-06-10 19:43:20 +02:00
  • df140227c3 feat: support xlsm files (#1520) Ayraf 2025-06-10 20:25:59 +05:30
  • 6613b9e98b fix: prov for merged-elems (#1728) Peter W. J. Staar 2025-06-10 11:22:42 +02:00
  • e979750ce9 fix(tesseract): initialize df_osd to avoid uninitialized variable error (#1718) Maras Ioannis 2025-06-10 11:57:45 +03:00
  • f7f31137f1 fix: allow custom torch_dtype in vlm models (#1735) Michele Dolfi 2025-06-10 03:52:15 -05:00
  • 3a76433b83 Update test files dev/fix_msword_backend_identify_text_after_image Christoph Auer 2025-06-10 09:52:31 +02:00
  • 5fac357995 Merge branch 'main' of github.com:docling-project/docling into dev/fix_msword_backend_identify_text_after_image Christoph Auer 2025-06-10 09:52:15 +02:00
  • 49b10e7419 docs: add open webui (#1734) Michele Dolfi 2025-06-10 02:35:20 -05:00
  • 52b8b9163f Merge branch 'main' of https://github.com/docling-project/docling into dev/fix_msword_backend_identify_text_after_image Michael Krissgau 2025-06-06 20:53:40 +02:00
  • 9dbcb3d7d4 fix: Improve extraction from textboxes in Word docs (#1701) AndrewTsai0406 2025-06-06 17:37:46 +08:00
  • 2bc564ccef Merge branch 'main' of https://github.com/docling-project/docling into dev/fix_msword_backend_identify_text_after_image Michael Krissgau 2025-06-05 22:20:09 +02:00
  • a2b83fe4ae fix: Add WEBP to the list of image file extensions (#1711) Eugene 2025-06-05 11:09:27 +04:00
  • 40df0d74ad chore: bump version to 2.36.1 [skip ci] v2.36.1 github-actions[bot] 2025-06-04 11:43:13 +00:00
  • 8846f1a393 fix: remove typer and click constraints (#1707) Michele Dolfi 2025-06-04 13:06:23 +02:00
  • be42b03f9b docs: flash-attn usage and install (#1706) Michele Dolfi 2025-06-04 11:09:54 +02:00
  • 96c54dba91 chore: bump version to 2.36.0 [skip ci] v2.36.0 github-actions[bot] 2025-06-03 13:54:25 +00:00
  • cdd401847a feat: simplify dependencies, switch to uv (#1700) Michele Dolfi 2025-06-03 15:18:54 +02:00
  • 61d0d6c755 test: mark flaky test (#1698) Panos Vagenas 2025-06-03 13:13:44 +02:00
  • cfdf4cea25 feat: new vlm-models support (#1570) Peter W. J. Staar 2025-06-02 17:01:06 +02:00
  • 08dcacc5cb chore: bump version to 2.35.0 [skip ci] v2.35.0 github-actions[bot] 2025-06-02 12:30:26 +00:00
  • 11ca4f7a7b docs: fix typo in index.md (#1676) Edgar Hipp 2025-06-02 12:35:59 +02:00
  • 1c8a1283c4 test: ensure utf-8 in test data utils (#1691) Panos Vagenas 2025-06-02 12:13:19 +02:00
  • 984cb137f6 fix: guess HTML content starting with script tag (#1673) cp_main_20250602 Cesar Berrospi Ramis 2025-06-02 08:43:24 +02:00
  • fa561170f6 chore: Update lock with the dependencies for D-FINE nli/layout_dfine Nikos Livathinos 2025-05-31 16:57:09 +02:00
  • dcc63ae00b Merge branch 'main' into nli/layout_rtdetr_v2 nli/layout_rtdetr_v2 Nikos Livathinos 2025-05-31 16:55:06 +02:00
  • 7aa2be93d6 Merge branch 'main' into nli/layout_dfine Nikos Livathinos 2025-05-31 16:48:28 +02:00
  • 30dafd976d chore: Update dependencies to docling-ibm-models and transformers to support D-FINE layout model Nikos Livathinos 2025-05-31 16:39:25 +02:00
  • 93d98dfa63 test: added groundtruth test files for fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Michael Krissgau 2025-05-29 15:12:55 +02:00
  • 84dc120d39 Merge branch 'main' of https://github.com/docling-project/docling into dev/fix_msword_backend_identify_text_after_image Michael Krissgau 2025-05-29 15:04:06 +02:00
  • 3942923125 chore: fix or ignore runtime and deprecation warnings (#1660) Cesar Berrospi Ramis 2025-05-28 17:55:31 +02:00
  • b3e0042813 chore: exclude data from GH Linguist (#1671) Panos Vagenas 2025-05-28 15:42:34 +02:00
  • 106951e71e test: add missing ground truth files (#1667) Cesar Berrospi Ramis 2025-05-28 13:26:49 +02:00
  • b356b33059 feat: Add visualization of bbox on page with html export. (#1663) Peter W. J. Staar 2025-05-28 13:10:38 +02:00
  • 51d3450915 fix: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte (#1665) DavidLee 2025-05-27 20:06:05 +08:00
  • 2579d89510 chore: bump version to 2.34.0 [skip ci] v2.34.0 github-actions[bot] 2025-05-22 18:44:45 +00:00
  • fffa865014 test: add test file and case for fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Michael Krissgau 2025-05-22 19:02:59 +02:00
  • af4aaa28af fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Michael Krissgau 2025-05-22 17:45:15 +02:00
  • c2f595d283 fix: fix ZeroDivisionError for cell_bbox.area() (#1636) Said Gürbüz 2025-05-22 13:43:33 +02:00
  • 45265bf8b1 feat(ocr): auto-detect rotated pages in Tesseract (#1167) Clément Doumouro 2025-05-21 18:12:33 +02:00
  • 90875247e5 feat: Establish confidence estimation for document and pages (#1313) Christoph Auer 2025-05-21 12:32:49 +02:00
  • 14d4f5b109 fix(integration): update the Apify Actor integration (#1619) Václav Vančura 2025-05-21 02:47:55 +02:00
  • 84d0889829 chore: bump version to 2.33.0 [skip ci] v2.33.0 github-actions[bot] 2025-05-20 19:54:51 +00:00
  • f4d9d4111b fix: Fix issue with detecting docx files, and files with upper case extensions (#1609) MoheyElDin Badr 2025-05-20 20:42:37 +03:00
  • 0e00a263fa fix: load_from_doctags static usage (#1617) Said Gürbüz 2025-05-20 15:06:12 +02:00
  • f2e9c0784c fix: incorrect force_backend_text behaviour for VLM DocTag pipelines (#1371) Krishnan 2025-05-20 13:29:38 +05:30
  • 98b5eeb844 fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549) Pedro Ribeiro 2025-05-19 14:26:00 +01:00
  • 12a0e64892 feat: add textbox content extraction in msword_backend (#1538) AndrewTsai0406 2025-05-19 21:01:36 +08:00
  • 7c4c356e76 chore: fix chunking example data link (#1596) Panos Vagenas 2025-05-16 08:44:47 +02:00
  • aeb0716bbb chore: bump version to 2.32.0 [skip ci] v2.32.0 github-actions[bot] 2025-05-14 14:28:21 +00:00
  • 3a04f2a367 feat: Improve parallelization for remote services API calls (#1548) Vinay R Damodaran 2025-05-14 06:47:55 -07:00
  • 9f8b479f17 fix(ocr): orig field in TesseractOcrCliModel as str (#1553) jimkarag02 2025-05-14 16:05:52 +03:00
  • 9f28abf061 docs: add advanced chunking & serialization example (#1589) Panos Vagenas 2025-05-14 13:35:07 +01:00
  • 2efb7a7c06 fix(settings): fix nested settings load via environment variables (#1551) Alex Sokolov 2025-05-14 14:42:10 +03:00
  • 12dab0a1e8 feat: support image/webp file type (#1415) Elwin 2025-05-14 15:47:28 +08:00
  • 23238c241f chore: bump version to 2.31.2 [skip ci] v2.31.2 github-actions[bot] 2025-05-13 10:09:19 +00:00
  • 4046d0b2f3 fix: AsciiDoc header identification (#1562) (#1563) Marco Fargetta 2025-05-13 11:17:26 +02:00
  • 8baa85a49d fix: restrict click version and update lock file (#1582) Michele Dolfi 2025-05-13 10:40:08 +02:00
  • 0d0fa6cbe3 chore: bump version to 2.31.1 [skip ci] v2.31.1 github-actions[bot] 2025-05-12 09:44:26 +00:00
  • 127e38646f fix: add smoldocling in download utils (#1577) Michele Dolfi 2025-05-12 10:48:07 +02:00
  • 76501331d2 need to fix ruff linter dev/add-asr-pipeline Peter Staar 2025-05-12 07:34:24 +02:00
  • 32ad65cb9f work in progress: slowly adding ASR pipeline and its derivatives Peter Staar 2025-05-12 07:33:38 +02:00
  • 844babb390 docs: update links in data_prep_kit (#1559) Oleg Lavrovsky 2025-05-11 20:38:25 +02:00
  • 776e7ecf9a fix(HTML): handle row spans in header rows (#1536) Cesar Berrospi Ramis 2025-05-09 15:14:32 +02:00
  • 6e956dc551 Merge branch 'main' into nli/layoutmodel_improvements nli/layoutmodel_improvements Nikos Livathinos 2025-05-09 14:47:44 +02:00
  • 3220a592e7 docs: add serialization docs, update chunking docs (#1556) Panos Vagenas 2025-05-08 21:43:01 +02:00
  • f1658edbad fix: mime error in document streams (#1523) DavidLee 2025-05-06 15:30:46 +08:00