Mirror of https://github.com/docling-project/docling.git, synced 2025-06-27 05:20:05 +00:00
chore: move to docling-project org (#1160)
* chore: rename org

  Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update docs/faq/index.md

  Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
  Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

* update github pages

  Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* revert test content

  Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
parent f94da44ec5
commit fa16b12316
2
.github/SECURITY.md
vendored
@@ -20,4 +20,4 @@ After the initial reply to your report, the security team will keep you informed

 ## Security Alerts

-We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/DS4SD/docling/discussions/categories/announcements).
+We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/docling-project/docling/discussions/categories/announcements).
2
.github/workflows/ci-docs.yml
vendored
@@ -10,7 +10,7 @@ on:

 jobs:
   build-docs:
-    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
+    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'docling-project/docling' && github.event.pull_request.head.repo.full_name != 'docling-project/docling') }}
     uses: ./.github/workflows/docs.yml
     with:
       deploy: false
2
.github/workflows/ci.yml
vendored
@@ -15,5 +15,5 @@ env:

 jobs:
   code-checks:
-    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
+    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'docling-project/docling' && github.event.pull_request.head.repo.full_name != 'docling-project/docling') }}
     uses: ./.github/workflows/checks.yml
666
CHANGELOG.md
File diff suppressed because it is too large
@@ -2,13 +2,13 @@
 Our project welcomes external contributions. If you have an itch, please feel
 free to scratch it.

-To contribute code or documentation, please submit a [pull request](https://github.com/DS4SD/docling/pulls).
+To contribute code or documentation, please submit a [pull request](https://github.com/docling-project/docling/pulls).

 A good way to familiarize yourself with the codebase and contribution process is
-to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/DS4SD/docling/issues).
+to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/docling-project/docling/issues).
 Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us.

-For general questions or support requests, please refer to the [discussion section](https://github.com/DS4SD/docling/discussions).
+For general questions or support requests, please refer to the [discussion section](https://github.com/docling-project/docling/discussions).

 **Note: We appreciate your effort and want to avoid situations where a contribution
 requires extensive rework (by you or by us), sits in the backlog for a long time, or
@@ -16,14 +16,14 @@ cannot be accepted at all!**

 ### Proposing New Features

-If you would like to implement a new feature, please [raise an issue](https://github.com/DS4SD/docling/issues)
+If you would like to implement a new feature, please [raise an issue](https://github.com/docling-project/docling/issues)
 before sending a pull request so the feature can be discussed. This is to avoid
 you spending valuable time working on a feature that the project developers
 are not interested in accepting into the codebase.

 ### Fixing Bugs

-If you would like to fix a bug, please [raise an issue](https://github.com/DS4SD/docling/issues) before sending a
+If you would like to fix a bug, please [raise an issue](https://github.com/docling-project/docling/issues) before sending a
 pull request so it can be tracked.

 ### Merge Approval
@@ -78,7 +78,7 @@ This project strictly adheres to using dependencies that are compatible with the

 ## Communication

-Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions).
+Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).



28
README.md
@@ -1,6 +1,6 @@
 <p align="center">
-  <a href="https://github.com/ds4sd/docling">
-    <img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/docs/assets/docling_processing.png" width="100%"/>
+  <a href="https://github.com/docling-project/docling">
+    <img loading="lazy" alt="Docling" src="https://github.com/docling-project/docling/raw/main/docs/assets/docling_processing.png" width="100%"/>
   </a>
 </p>

@@ -11,7 +11,7 @@
 </p>

 [](https://arxiv.org/abs/2408.09869)
-[](https://ds4sd.github.io/docling/)
+[](https://docling-project.github.io/docling/)
 [](https://pypi.org/project/docling/)
 [](https://pypi.org/project/docling/)
 [](https://python-poetry.org/)
@@ -19,7 +19,7 @@
 [](https://pycqa.github.io/isort/)
 [](https://pydantic.dev)
 [](https://github.com/pre-commit/pre-commit)
-[](https://opensource.org/licenses/MIT)
+[](https://opensource.org/licenses/MIT)
 [](https://pepy.tech/projects/docling)

 Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
@@ -51,7 +51,7 @@ pip install docling

 Works on macOS, Linux and Windows environments. Both x86_64 and arm64 architectures.

-More [detailed installation instructions](https://ds4sd.github.io/docling/installation/) are available in the docs.
+More [detailed installation instructions](https://docling-project.github.io/docling/installation/) are available in the docs.

 ## Getting started

@@ -66,28 +66,28 @@ result = converter.convert(source)
 print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
 ```

-More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in
+More [advanced usage options](https://docling-project.github.io/docling/usage/) are available in
 the docs.

 ## Documentation

-Check out Docling's [documentation](https://ds4sd.github.io/docling/), for details on
+Check out Docling's [documentation](https://docling-project.github.io/docling/), for details on
 installation, usage, concepts, recipes, extensions, and more.

 ## Examples

-Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/),
+Go hands-on with our [examples](https://docling-project.github.io/docling/examples/),
 demonstrating how to address different application use cases with Docling.

 ## Integrations

 To further accelerate your AI application development, check out Docling's native
-[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks
+[integrations](https://docling-project.github.io/docling/integrations/) with popular frameworks
 and tools.

 ## Get help and support

-Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions).
+Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).

 ## Technical report

@@ -95,7 +95,7 @@ For more details on Docling's inner workings, check out the [Docling Technical R

 ## Contributing

-Please read [Contributing to Docling](https://github.com/DS4SD/docling/blob/main/CONTRIBUTING.md) for details.
+Please read [Contributing to Docling](https://github.com/docling-project/docling/blob/main/CONTRIBUTING.md) for details.

 ## References

@@ -123,6 +123,6 @@ For individual model usage, please refer to the model licenses found in the orig

 Docling has been brought to you by IBM.

-[supported_formats]: https://ds4sd.github.io/docling/usage/supported_formats/
-[docling_document]: https://ds4sd.github.io/docling/concepts/docling_document/
-[integrations]: https://ds4sd.github.io/docling/integrations/
+[supported_formats]: https://docling-project.github.io/docling/usage/supported_formats/
+[docling_document]: https://docling-project.github.io/docling/concepts/docling_document/
+[integrations]: https://docling-project.github.io/docling/integrations/
@@ -121,7 +121,7 @@ def download(
 "Using the CLI:",
 f"`docling --artifacts-path={output_dir} FILE`",
 "\n",
-"Using Python: see the documentation at <https://ds4sd.github.io/docling/usage>.",
+"Using Python: see the documentation at <https://docling-project.github.io/docling/usage>.",
 )


@@ -26,7 +26,7 @@ class OcrMacModel(BaseOcrModel):
 "ocrmac is not correctly installed. "
 "Please install it via `pip install ocrmac` to use this OCR engine. "
 "Alternatively, Docling has support for other OCR engines. See the documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )
 try:
     from ocrmac import ocrmac
@@ -31,14 +31,14 @@ class TesseractOcrModel(BaseOcrModel):
 "Note that tesserocr might have to be manually compiled for working with "
 "your Tesseract installation. The Docling documentation provides examples for it. "
 "Alternatively, Docling has support for other OCR engines. See the documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )
 missing_langs_errmsg = (
 "tesserocr is not correctly configured. No language models have been detected. "
 "Please ensure that the TESSDATA_PREFIX envvar points to tesseract languages dir. "
 "You can find more information how to setup other OCR engines in Docling "
 "documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )

 try:
@@ -7,7 +7,7 @@ pydantic datatype, which can express several features common to documents, such
 * Layout information (i.e. bounding boxes) for all items, if available
 * Provenance information

-The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/DS4SD/docling-core/tree/main/docling_core/types/doc).
+The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/docling-project/docling-core/tree/main/docling_core/types/doc).

 It also brings a set of document construction APIs to build up a `DoclingDocument` from scratch.

@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/backend_xml_rag.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/backend_xml_rag.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -36,7 +36,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This is an example of using [Docling](https://ds4sd.github.io/docling/) for converting structured data (XML) into a unified document\n",
+"This is an example of using [Docling](https://docling-project.github.io/docling/) for converting structured data (XML) into a unified document\n",
 "representation format, `DoclingDocument`, and leverage its riched structured content for RAG applications.\n",
 "\n",
 "Data used in this example consist of patents from the [United States Patent and Trademark Office (USPTO)](https://www.uspto.gov/) and medical\n",
@@ -103,7 +103,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"> 👉 **NOTE**: As you see above, using the `HybridChunker` can sometimes lead to a warning from the transformers library, however this is a \"false alarm\" — for details check [here](https://ds4sd.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)."
+"> 👉 **NOTE**: As you see above, using the `HybridChunker` can sometimes lead to a warning from the transformers library, however this is a \"false alarm\" — for details check [here](https://docling-project.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)."
 ]
 },
 {
@@ -321,7 +321,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "docling-aMWN2FRM-py3.12",
+"display_name": "docling-hgXEfXco-py3.12",
 "language": "python",
 "name": "python3"
 },
@@ -36,7 +36,7 @@
 "## A recipe 🧑🍳 🐥 💚\n",
 "\n",
 "This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:\n",
-"- [Docling](https://ds4sd.github.io/docling/) for document parsing and chunking\n",
+"- [Docling](https://docling-project.github.io/docling/) for document parsing and chunking\n",
 "- [Azure AI Search](https://azure.microsoft.com/products/ai-services/ai-search/?msockid=0109678bea39665431e37323ebff6723) for vector indexing and retrieval\n",
 "- [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service?msockid=0109678bea39665431e37323ebff6723) for embeddings and chat completion\n",
 "\n",
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_haystack.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_haystack.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -247,7 +247,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/Users/pva/work/github.com/DS4SD/docling/.venv/lib/python3.12/site-packages/huggingface_hub/inference/_client.py:2232: FutureWarning: `stop_sequences` is a deprecated argument for `text_generation` task and will be removed in version '0.28.0'. Use `stop` instead.\n",
+"/Users/pva/work/github.com/docling-project/docling/.venv/lib/python3.12/site-packages/huggingface_hub/inference/_client.py:2232: FutureWarning: `stop_sequences` is a deprecated argument for `text_generation` task and will be removed in version '0.28.0'. Use `stop` instead.\n",
 " warnings.warn(\n"
 ]
 }
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_langchain.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_langchain.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -168,7 +168,7 @@
 "source": [
 "> Note: a message saying `\"Token indices sequence length is longer than the specified\n",
 "maximum sequence length...\"` can be ignored in this case — details\n",
-"[here](https://github.com/DS4SD/docling-core/issues/119#issuecomment-2577418826)."
+"[here](https://github.com/docling-project/docling-core/issues/119#issuecomment-2577418826)."
 ]
 },
 {
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_llamaindex.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_llamaindex.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"[](https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_weaviate.ipynb)"
+"[](https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_weaviate.ipynb)"
 ]
 },
 {
@@ -29,7 +29,7 @@
 "\n",
 "## A recipe 🧑🍳 🐥 💚\n",
 "\n",
-"This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://ds4sd.github.io/docling/).\n",
+"This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://docling-project.github.io/docling/).\n",
 "\n",
 "In this notebook, we accomplish the following:\n",
 "* Parse the top machine learning papers on [arXiv](https://arxiv.org/) using Docling\n",
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/hybrid_rag_qdrant\n",
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/hybrid_rag_qdrant\n",
 ".ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
@@ -109,7 +109,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/Users/pva/work/github.com/DS4SD/docling/.venv/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py:155: UserWarning: Cannot enable progress bars: environment variable `HF_HUB_DISABLE_PROGRESS_BARS=1` is set and has priority.\n",
+"/Users/pva/work/github.com/docling-project/docling/.venv/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py:155: UserWarning: Cannot enable progress bars: environment variable `HF_HUB_DISABLE_PROGRESS_BARS=1` is set and has priority.\n",
 " warnings.warn(\n"
 ]
 }
@@ -1,6 +1,6 @@
 # FAQ

-This is a collection of FAQ collected from the user questions on <https://github.com/DS4SD/docling/discussions>.
+This is a collection of FAQ collected from the user questions on <https://github.com/docling-project/docling/discussions>.


 ??? question "Is Python 3.13 supported?"
@@ -41,7 +41,7 @@ This is a collection of FAQ collected from the user questions on <https://github
 ]
 ```

-Source: Issue [#283](https://github.com/DS4SD/docling/issues/283#issuecomment-2465035868)
+Source: Issue [#283](https://github.com/docling-project/docling/issues/283#issuecomment-2465035868)


 ??? question "Are text styles (bold, underline, etc) supported?"
@@ -74,7 +74,7 @@ This is a collection of FAQ collected from the user questions on <https://github
 )
 ```

-Source: Issue [#326](https://github.com/DS4SD/docling/issues/326)
+Source: Issue [#326](https://github.com/docling-project/docling/issues/326)


 ??? question " Which model weights are needed to run Docling?"
@@ -84,7 +84,7 @@ This is a collection of FAQ collected from the user questions on <https://github

 For processing PDF documents, Docling requires the model weights from <https://huggingface.co/ds4sd/docling-models>.

-When OCR is enabled, some engines also require model artifacts. For example EasyOCR, for which Docling has [special pipeline options](https://github.com/DS4SD/docling/blob/main/docling/datamodel/pipeline_options.py#L68) to control the runtime behavior.
+When OCR is enabled, some engines also require model artifacts. For example EasyOCR, for which Docling has [special pipeline options](https://github.com/docling-project/docling/blob/main/docling/datamodel/pipeline_options.py#L68) to control the runtime behavior.


 ??? question "SSL error downloading model weights"
@@ -174,6 +174,6 @@ This is a collection of FAQ collected from the user questions on <https://github
 print(f"Model max length: {tokenizer.model_max_length}")
 ```

-Also see [docling#725](https://github.com/DS4SD/docling/issues/725).
+Also see [docling#725](https://github.com/docling-project/docling/issues/725).

-Source: Issue [docling-core#119](https://github.com/DS4SD/docling-core/issues/119)
+Source: Issue [docling-core#119](https://github.com/docling-project/docling-core/issues/119)
@@ -11,7 +11,7 @@
 [](https://pycqa.github.io/isort/)
 [](https://pydantic.dev)
 [](https://github.com/pre-commit/pre-commit)
-[](https://opensource.org/licenses/MIT)
+[](https://opensource.org/licenses/MIT)
 [](https://pepy.tech/projects/docling)

 Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
@@ -5,7 +5,7 @@ Docling is available as a converter in [Haystack](https://haystack.deepset.ai/):
 - 🧑🏽🍳 [Docling Haystack integration example][example]
 - 📦 [Docling Haystack integration PyPI][pypi]

-[github]: https://github.com/DS4SD/docling-haystack
+[github]: https://github.com/docling-project/docling-haystack
 [docs]: https://haystack.deepset.ai/integrations/docling
 [pypi]: https://pypi.org/project/docling-haystack
 [example]: ../examples/rag_haystack.ipynb
@@ -8,7 +8,7 @@ To get started, check out the [step-by-step guide in LangChain][guide].
 - 📦 [LangChain Docling integration PyPI][pypi]

 [docs]: https://python.langchain.com/docs/integrations/providers/docling/
-[github]: https://github.com/DS4SD/docling-langchain
+[github]: https://github.com/docling-project/docling-langchain
 [guide]: https://python.langchain.com/docs/integrations/document_loaders/docling/
 [example]: ../examples/rag_langchain.ipynb
 [pypi]: https://pypi.org/project/langchain-docling/
@@ -1,7 +1,7 @@
 site_name: Docling
-site_url: https://ds4sd.github.io/docling/
-repo_name: DS4SD/docling
-repo_url: https://github.com/DS4SD/docling
+site_url: https://docling-project.github.io/docling/
+repo_name: docling-project/docling
+repo_url: https://github.com/docling-project/docling

 theme:
   name: material
@@ -13,8 +13,8 @@ authors = [
 ]
 license = "MIT"
 readme = "README.md"
-repository = "https://github.com/DS4SD/docling"
-homepage = "https://github.com/DS4SD/docling"
+repository = "https://github.com/docling-project/docling"
+homepage = "https://github.com/docling-project/docling"
 keywords = [
 "docling",
 "convert",
@@ -179,7 +179,7 @@ def test_guess_format(tmp_path):
 # Non-Docling JSON
 # TODO: Docling JSON is currently the single supported JSON flavor and the pipeline
 # will try to validate *any* JSON (based on suffix/MIME) as Docling JSON; proper
-# disambiguation seen as part of https://github.com/DS4SD/docling/issues/802
+# disambiguation seen as part of https://github.com/docling-project/docling/issues/802
 test_str = "{}"
 stream = DocumentStream(name="test.json", stream=BytesIO(f"{test_str}".encode()))
 assert dci._guess_format(stream) == InputFormat.JSON_DOCLING
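
The hunks in this commit amount to a mechanical substitution of the old organization name in GitHub-hosted references. A minimal sketch of such a rewrite is given here for illustration; the helper `rename_org` and the constants `OLD_ORG` and `NEW_ORG` are hypothetical and not part of the commit. It assumes that only `github.com/...` paths, Colab-style `.../github/...` paths, and `*.github.io` hosts should change. Other hosts, such as the `huggingface.co/ds4sd/docling-models` link that the FAQ hunk leaves untouched, are deliberately not matched, and bare repository names such as `'DS4SD/docling'` in the workflow files are outside the scope of this sketch.

```python
import re

# Hypothetical helper, not taken from the commit: the names OLD_ORG, NEW_ORG
# and rename_org are assumptions made for illustration only.
OLD_ORG = "DS4SD"
NEW_ORG = "docling-project"

_RULES = [
    # github.com/DS4SD/... and Colab-style .../github/DS4SD/... paths
    # (case-insensitive, so github.com/ds4sd/... is covered too).
    (
        re.compile(r"(github(?:\.com)?/)" + OLD_ORG + r"(?=/)", re.IGNORECASE),
        r"\g<1>" + NEW_ORG,
    ),
    # GitHub Pages host: ds4sd.github.io -> docling-project.github.io
    (
        re.compile(r"\b" + OLD_ORG + r"(?=\.github\.io\b)", re.IGNORECASE),
        NEW_ORG,
    ),
]


def rename_org(text: str) -> str:
    """Return text with old-org GitHub references rewritten to the new org."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Restricting the patterns to GitHub hosts is a design choice of this sketch: a blanket replace of the organization name would also rewrite references that live under the same name elsewhere.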