docling/docs/examples/visual_grounding.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/visual_grounding.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Visual grounding"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Step | Tech | Execution | \n",
    "| --- | --- | --- |\n",
    "| Embedding | Hugging Face / Sentence Transformers | 💻 Local |\n",
    "| Vector store | Milvus | 💻 Local |\n",
    "| Gen AI | Hugging Face Inference API | 🌐 Remote | "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example showcases Docling's **visual grounding** capabilities: each retrieved chunk can be traced back\n",
    "to its page and bounding box in the source document. These capabilities can be combined with any agentic AI / RAG framework.\n",
    "\n",
    "Here, we illustrate them using the\n",
    "[LangChain Docling integration](../../integrations/langchain/), together with a Milvus\n",
    "vector store and sentence-transformers embeddings."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 👉 For best conversion speed, use GPU acceleration whenever available; e.g. if running on Colab, use a GPU-enabled runtime.\n",
    "- This notebook uses the Hugging Face Inference API; for an increased LLM quota, a token can be provided via the env var `HF_TOKEN`.\n",
    "- Requirements can be installed as shown below (`--no-warn-conflicts` is meant for Colab's pre-populated Python env; feel free to remove it for stricter usage):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "%pip install -q --progress-bar off --no-warn-conflicts langchain-docling langchain-core langchain-huggingface langchain_milvus langchain matplotlib python-dotenv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from pathlib import Path\n",
    "from tempfile import mkdtemp\n",
    "\n",
    "from dotenv import load_dotenv\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_docling.loader import ExportType\n",
    "\n",
    "\n",
    "def _get_env_from_colab_or_os(key):\n",
    "    try:\n",
    "        from google.colab import userdata\n",
    "\n",
    "        try:\n",
    "            return userdata.get(key)\n",
    "        except userdata.SecretNotFoundError:\n",
    "            pass\n",
    "    except ImportError:\n",
    "        pass\n",
    "    return os.getenv(key)\n",
    "\n",
    "\n",
    "load_dotenv()\n",
    "\n",
    "# https://github.com/huggingface/transformers/issues/5486:\n",
    "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
    "\n",
    "HF_TOKEN = _get_env_from_colab_or_os(\"HF_TOKEN\")\n",
    "SOURCES = [\"https://arxiv.org/pdf/2408.09869\"]  # Docling Technical Report\n",
    "EMBED_MODEL_ID = \"sentence-transformers/all-MiniLM-L6-v2\"\n",
    "GEN_MODEL_ID = \"mistralai/Mixtral-8x7B-Instruct-v0.1\"\n",
    "QUESTION = \"Which are the main AI models in Docling?\"\n",
    "PROMPT = PromptTemplate.from_template(\n",
    "    \"Context information is below.\\n---------------------\\n{context}\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: {input}\\nAnswer:\\n\",\n",
    ")\n",
    "TOP_K = 3\n",
    "MILVUS_URI = str(Path(mkdtemp()) / \"docling.db\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Document loading\n",
    "\n",
    "We first define our converter, in this case including options for keeping page images (for visual grounding)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from docling.datamodel.base_models import InputFormat\n",
    "from docling.datamodel.pipeline_options import PdfPipelineOptions\n",
    "from docling.document_converter import DocumentConverter, PdfFormatOption\n",
    "\n",
    "converter = DocumentConverter(\n",
    "    format_options={\n",
    "        InputFormat.PDF: PdfFormatOption(\n",
    "            pipeline_options=PdfPipelineOptions(\n",
    "                generate_page_images=True,\n",
    "                images_scale=2.0,\n",
    "            ),\n",
    "        )\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We set up a simple doc store for keeping converted documents, as that is needed for visual grounding further below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "doc_store = {}\n",
    "doc_store_root = Path(mkdtemp())\n",
    "for source in SOURCES:\n",
    "    dl_doc = converter.convert(source=source).document\n",
    "    file_path = Path(doc_store_root / f\"{dl_doc.origin.binary_hash}.json\")\n",
    "    dl_doc.save_as_json(file_path)\n",
    "    doc_store[dl_doc.origin.binary_hash] = file_path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can instantiate our loader and load documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Token indices sequence length is longer than the specified maximum sequence length for this model (648 > 512). Running this sequence through the model will result in indexing errors\n"
     ]
    }
   ],
   "source": [
    "from langchain_docling import DoclingLoader\n",
    "\n",
    "from docling.chunking import HybridChunker\n",
    "\n",
    "loader = DoclingLoader(\n",
    "    file_path=SOURCES,\n",
    "    converter=converter,\n",
    "    export_type=ExportType.DOC_CHUNKS,\n",
    "    chunker=HybridChunker(tokenizer=EMBED_MODEL_ID),\n",
    ")\n",
    "\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 👉 **NOTE**: As seen above, using the `HybridChunker` can sometimes trigger a warning from the transformers library; this is a \"false alarm\". For details, see the [FAQ entry](https://docling-project.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Inspecting some sample splits:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "- d.page_content='Docling Technical Report\\nVersion 1.0\\nChristoph Auer Maksym Lysak Ahmed Nassar Michele Dolfi Nikolaos Livathinos Panos Vagenas Cesar Berrospi Ramis Matteo Omenetti Fabian Lindlbauer Kasper Dinkla Lokesh Mishra Yusik Kim Shubham Gupta Rafael Teixeira de Lima Valery Weber Lucas Morin Ingmar Meijer Viktor Kuropiatnyk Peter W. J. Staar\\nAI4K Group, IBM Research R¨ uschlikon, Switzerland'\n",
      "- d.page_content='Abstract\\nThis technical report introduces Docling , an easy to use, self-contained, MITlicensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware in a small resource budget. The code interface allows for easy extensibility and addition of new features and models.'\n",
      "- d.page_content='1 Introduction\\nConverting PDF documents back into a machine-processable format has been a major challenge for decades due to their huge variability in formats, weak standardization and printing-optimized characteristic, which discards most structural features and metadata. With the advent of LLMs and popular application patterns such as retrieval-augmented generation (RAG), leveraging the rich content embedded in PDFs has become ever more relevant. In the past decade, several powerful document understanding solutions have emerged on the market, most of which are commercial software, cloud offerings [3] and most recently, multi-modal vision-language models. As of today, only a handful of open-source tools cover PDF conversion, leaving a significant feature and quality gap to proprietary solutions.\\nWith Docling , we open-source a very capable and efficient document conversion tool which builds on the powerful, specialized AI models and datasets for layout analysis and table structure recognition we developed and presented in the recent past [12, 13, 9]. Docling is designed as a simple, self-contained python library with permissive license, running entirely locally on commodity hardware. Its code architecture allows for easy extensibility and addition of new features and models.\\nHere is what Docling delivers today:\\n· Converts PDF documents to JSON or Markdown format, stable and lightning fast\\n· Understands detailed page layout, reading order, locates figures and recovers table structures\\n· Extracts metadata from the document, such as title, authors, references and language\\n· Optionally applies OCR, e.g. for scanned PDFs\\n· Can be configured to be optimal for batch-mode (i.e high throughput, low time-to-solution) or interactive mode (compromise on efficiency, low time-to-solution)\\n· Can leverage different accelerators (GPU, MPS, etc).'\n",
      "...\n"
     ]
    }
   ],
   "source": [
    "for d in docs[:3]:\n",
    "    print(f\"- {d.page_content=}\")\n",
    "print(\"...\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ingestion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "from langchain_huggingface.embeddings import HuggingFaceEmbeddings\n",
    "from langchain_milvus import Milvus\n",
    "\n",
    "embedding = HuggingFaceEmbeddings(model_name=EMBED_MODEL_ID)\n",
    "\n",
    "# MILVUS_URI (defined in the setup cell) points to a local database file; set as needed\n",
    "vectorstore = Milvus.from_documents(\n",
    "    documents=docs,\n",
    "    embedding=embedding,\n",
    "    collection_name=\"docling_demo\",\n",
    "    connection_args={\"uri\": MILVUS_URI},\n",
    "    index_params={\"index_type\": \"FLAT\"},\n",
    "    drop_old=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## RAG"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.\n"
     ]
    }
   ],
   "source": [
    "from langchain.chains import create_retrieval_chain\n",
    "from langchain.chains.combine_documents import create_stuff_documents_chain\n",
    "from langchain_huggingface import HuggingFaceEndpoint\n",
    "\n",
    "retriever = vectorstore.as_retriever(search_kwargs={\"k\": TOP_K})\n",
    "llm = HuggingFaceEndpoint(\n",
    "    repo_id=GEN_MODEL_ID,\n",
    "    huggingfacehub_api_token=HF_TOKEN,\n",
    ")\n",
    "\n",
    "\n",
    "def clip_text(text, threshold=100):\n",
    "    return f\"{text[:threshold]}...\" if len(text) > threshold else text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/pva/work/github.com/DS4SD/docling/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'post' (from 'huggingface_hub.inference._client') is deprecated and will be removed from version '0.31.0'. Making direct POST requests to the inference server is not supported anymore. Please use task methods instead (e.g. `InferenceClient.chat_completion`). If your use case is not supported, please open an issue in https://github.com/huggingface/huggingface_hub.\n",
      "  warnings.warn(warning_message, FutureWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question:\n",
      "Which are the main AI models in Docling?\n",
      "\n",
      "Answer:\n",
      "The main AI models in Docling are:\n",
      "1. A layout analysis model, an accurate object-detector for page elements.\n",
      "2. TableFormer, a state-of-the-art table structure recognition model.\n"
     ]
    }
   ],
   "source": [
    "from docling.chunking import DocMeta\n",
    "from docling.datamodel.document import DoclingDocument\n",
    "\n",
    "question_answer_chain = create_stuff_documents_chain(llm, PROMPT)\n",
    "rag_chain = create_retrieval_chain(retriever, question_answer_chain)\n",
    "resp_dict = rag_chain.invoke({\"input\": QUESTION})\n",
    "\n",
    "clipped_answer = clip_text(resp_dict[\"answer\"], threshold=200)\n",
    "print(f\"Question:\\n{resp_dict['input']}\\n\\nAnswer:\\n{clipped_answer}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visual grounding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Source 1:\n",
      "  text: \"3.2 AI models\\nAs part of Docling, we initially release two highly capable AI models to the open-source community, which have been developed and published recently by our team. The first model is a layout analysis model, an accurate object-detector for page elements [13]. The second model is TableFormer [12, 9], a state-of-the-art table structure re...\"\n",
      "  page: 3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 1500x1500 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Source 2:\n",
      "  text: \"3 Processing pipeline\\nDocling implements a linear pipeline of operations, which execute sequentially on each given document (see Fig. 1). Each document is first parsed by a PDF backend, which retrieves the programmatic text tokens, consisting of string content and its coordinates on the page, and also renders a bitmap image of each page to support ...\"\n",
      "  page: 2\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 1500x1500 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Source 3:\n",
      "  text: \"6 Future work and contributions\\nDocling is designed to allow easy extension of the model library and pipelines. In the future, we plan to extend Docling with several more models, such as a figure-classifier model, an equationrecognition model, a code-recognition model and more. This will help improve the quality of conversion for specific types of ...\"\n",
      "  page: 5\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 1500x1500 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from PIL import ImageDraw\n",
    "\n",
    "for i, doc in enumerate(resp_dict[\"context\"][:]):\n",
    "    image_by_page = {}\n",
    "    print(f\"Source {i + 1}:\")\n",
    "    print(f\"  text: {json.dumps(clip_text(doc.page_content, threshold=350))}\")\n",
    "    meta = DocMeta.model_validate(doc.metadata[\"dl_meta\"])\n",
    "\n",
    "    # loading the full DoclingDocument from the document store:\n",
    "    dl_doc = DoclingDocument.load_from_json(doc_store.get(meta.origin.binary_hash))\n",
    "\n",
    "    for doc_item in meta.doc_items:\n",
    "        if doc_item.prov:\n",
    "            prov = doc_item.prov[0]  # here we only consider the first provenance item\n",
    "            page_no = prov.page_no\n",
    "            # always look up the page, so its size matches the current item\n",
    "            page = dl_doc.pages[page_no]\n",
    "            img = image_by_page.get(page_no)\n",
    "            if img is None:\n",
    "                print(f\"  page: {page_no}\")\n",
    "                img = page.image.pil_image\n",
    "                image_by_page[page_no] = img\n",
    "            bbox = prov.bbox.to_top_left_origin(page_height=page.size.height)\n",
    "            bbox = bbox.normalized(page.size)\n",
    "            thickness = 2\n",
    "            padding = thickness + 2\n",
    "            bbox.l = round(bbox.l * img.width - padding)\n",
    "            bbox.r = round(bbox.r * img.width + padding)\n",
    "            bbox.t = round(bbox.t * img.height - padding)\n",
    "            bbox.b = round(bbox.b * img.height + padding)\n",
    "            draw = ImageDraw.Draw(img)\n",
    "            draw.rectangle(\n",
    "                xy=bbox.as_tuple(),\n",
    "                outline=\"blue\",\n",
    "                width=thickness,\n",
    "            )\n",
    "    for p in image_by_page:\n",
    "        img = image_by_page[p]\n",
    "        plt.figure(figsize=[15, 15])\n",
    "        plt.imshow(img)\n",
    "        plt.axis(\"off\")\n",
    "        plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}