Mirror of https://github.com/docling-project/docling.git, synced 2025-06-27 05:20:05 +00:00
chore: update the with input formats and DoclingDocument (#188)
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
parent f542460af3
commit 94a5290789
.github/workflows/cd-docs.yml (new file, vendored): 14 lines changed
@@ -0,0 +1,14 @@
+name: "Run Docs CD"
+
+on:
+  push:
+    branches:
+      - "main"
+
+jobs:
+  build-deploy-docs:
+    uses: ./.github/workflows/docs.yml
+    with:
+      deploy: true
+    permissions:
+      contents: write
.github/workflows/cd.yml (vendored): 6 lines changed
@@ -10,12 +10,6 @@ env:
 jobs:
   code-checks:
     uses: ./.github/workflows/checks.yml
-  build-deploy-docs:
-    uses: ./.github/workflows/docs.yml
-    with:
-      deploy: true
-    permissions:
-      contents: write
   pre-release-check:
     runs-on: ubuntu-latest
     outputs:
.github/workflows/ci-docs.yml (new file, vendored): 16 lines changed
@@ -0,0 +1,16 @@
+name: "Run Docs CI"
+
+on:
+  pull_request:
+    types: [opened, reopened, synchronize]
+  push:
+    branches:
+      - "**"
+      - "!gh-pages"
+
+jobs:
+  build-docs:
+    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
+    uses: ./.github/workflows/docs.yml
+    with:
+      deploy: false
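The `if:` guard on `build-docs` can be read as: always run on a push, and on any other event run only when the head repository is not one of the two upstream spellings. A minimal Python sketch of that logic follows; the function name `should_run` and the constant `UPSTREAM_REPOS` are hypothetical names chosen for illustration and are not part of the workflow.

```python
# Hypothetical helper that mirrors the `if:` expression on the build-docs job.
# The two repository names are copied from the workflow; everything else is
# an illustrative assumption.
UPSTREAM_REPOS = {"DS4SD/docling", "ds4sd/docling"}


def should_run(event_name, head_repo_full_name=None):
    """Return True when the job's `if:` condition would hold."""
    # Left side of the `||`: pushes always run the job.
    if event_name == "push":
        return True
    # Right side: the head repo must differ from both upstream spellings.
    return head_repo_full_name not in UPSTREAM_REPOS


print(should_run("push"))
print(should_run("pull_request", "someone/docling"))
print(should_run("pull_request", "DS4SD/docling"))
```

Read this way, a pull request opened from a branch of the upstream repository is skipped, presumably because the push to that branch already triggers the same job.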
.github/workflows/ci.yml (vendored): 6 lines changed
@@ -6,6 +6,7 @@ on:
   push:
     branches:
       - "**"
+      - "!main"
       - "!gh-pages"

 env:
@@ -16,8 +17,3 @@ jobs:
   code-checks:
     if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
     uses: ./.github/workflows/checks.yml
-  build-docs:
-    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
-    uses: ./.github/workflows/docs.yml
-    with:
-      deploy: false
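The `branches` filter in `ci.yml` combines a catch-all pattern with `!` exclusions, and the order matters: a later matching pattern overrides an earlier one. The sketch below approximates that ordered evaluation in Python. It uses `fnmatch` as a stand-in for GitHub's glob syntax, which is an assumption (the two differ in how `*` treats `/`), and `branch_matches` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Patterns as listed under `on.push.branches` in ci.yml.
CI_BRANCHES = ["**", "!main", "!gh-pages"]


def branch_matches(branch, patterns):
    """Approximate ordered include/exclude evaluation of branch filters."""
    matched = False
    for pattern in patterns:
        if pattern.startswith("!"):
            # A negated pattern excludes a branch that matched earlier.
            if fnmatchcase(branch, pattern[1:]):
                matched = False
        elif fnmatchcase(branch, pattern):
            matched = True
    return matched


for name in ("feature/docs", "main", "gh-pages"):
    print(name, branch_matches(name, CI_BRANCHES))
```

Under these patterns a push to a feature branch triggers the workflow, while pushes to `main` and `gh-pages` do not.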
@@ -22,8 +22,9 @@ Docling parses documents and exports them to the desired format with ease and sp

 ## Features

-* 🗂️ Multi-format support for input (PDF, DOCX etc.) & output (Markdown, JSON etc.)
-* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
+* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
+* 📑 Advanced PDF document understanding including page layout, reading order & table structures
+* 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
 * 📝 Metadata extraction, including title, authors, references & language
 * 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
@@ -7,6 +7,8 @@ pydantic datatype, which can express several features common to documents, such
 * Layout information (i.e. bounding boxes) for all items, if available
 * Provenance information

+The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/DS4SD/docling-core/tree/main/docling_core/types/doc).
+
 It also brings a set of document construction APIs to build up a `DoclingDocument` from scratch.

 ## Example document structures
@@ -19,8 +19,9 @@ Docling parses documents and exports them to the desired format with ease and sp

 ## Features

-* 🗂️ Multi-format support for input (PDF, DOCX etc.) & output (Markdown, JSON etc.)
+* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
+* 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
 * 📝 Metadata extraction, including title, authors, references & language
 * 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs