
Docling can parse various document formats into a unified representation (Docling Document), which it can also export to different formats. See Architecture for more details.
Below you can find a listing of all supported input and output formats.
Supported input formats
Format | Description |
---|---|
DOCX, XLSX, PPTX | Default formats in MS Office 2007+, based on Office Open XML |
Markdown | |
AsciiDoc | |
HTML, XHTML | |
CSV | |
PNG, JPEG, TIFF, BMP | Image formats |
Schema-specific support:
Format | Description |
---|---|
USPTO XML | XML format used for USPTO patents |
JATS XML | XML format used for JATS articles |
Docling JSON | JSON-serialized Docling Document |
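
As a rough illustration of the input tables above, the sketch below maps a file extension to the matching format name from the general-purpose table. The `EXTENSION_TO_FORMAT` table and the `guess_input_format` helper are hypothetical names made up for this example and are not part of Docling; the extension spellings are common conventions, not something the tables specify. The schema-specific formats (USPTO XML, JATS XML, Docling JSON) cannot be told apart by extension alone, so the helper returns `None` for them.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup built from the general-purpose input table above.
# This is NOT Docling's own format detection, only an illustration.
EXTENSION_TO_FORMAT = {
    ".docx": "DOCX",
    ".xlsx": "XLSX",
    ".pptx": "PPTX",
    ".md": "Markdown",
    ".adoc": "AsciiDoc",
    ".asciidoc": "AsciiDoc",
    ".html": "HTML",
    ".htm": "HTML",
    ".xhtml": "XHTML",
    ".csv": "CSV",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".bmp": "BMP",
}


def guess_input_format(path: str) -> Optional[str]:
    """Return the table's format name for `path`, or None if the extension is not listed."""
    # Compare case-insensitively so "Report.DOCX" and "report.docx" match alike.
    return EXTENSION_TO_FORMAT.get(Path(path).suffix.lower())
```

A call such as `guess_input_format("slides.pptx")` looks up `.pptx` in the table; files ending in `.xml` or `.json` need content inspection and fall through to `None` in this sketch.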
Supported output formats
Format | Description |
---|---|
HTML | Both image embedding and referencing are supported |
Markdown | |
JSON | Lossless serialization of Docling Document |
Text | Plain text, i.e. without Markdown markers |
Doctags | |
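
The parse-then-export flow described at the top of this page can be sketched as follows. This assumes the Docling Python package's `DocumentConverter` class and the `export_to_markdown`, `export_to_html` and `export_to_dict` methods on the resulting document; check the API reference for your installed version before relying on these names. The `convert` and `export_outputs` helpers are hypothetical wrappers written for this example.

```python
import json
from typing import Any, Dict


def convert(source: str) -> Any:
    """Parse `source` (a local path or URL) into a Docling Document."""
    # Imported lazily so the rest of this sketch loads without Docling installed.
    # Assumption: the package exposes DocumentConverter at this import path.
    from docling.document_converter import DocumentConverter

    return DocumentConverter().convert(source).document


def export_outputs(document: Any) -> Dict[str, str]:
    """Render a document into three of the output formats listed above."""
    return {
        "markdown": document.export_to_markdown(),
        "html": document.export_to_html(),
        # export_to_dict() is assumed to return a plain dict;
        # json.dumps turns it into JSON text.
        "json": json.dumps(document.export_to_dict()),
    }
```

For example, `export_outputs(convert("report.docx"))` would return the three renderings keyed by format name, where `report.docx` is a placeholder file name. Keeping the export step separate from the conversion step means the same document object can be rendered several times without parsing the input again.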