Data & Data Structure Components

The data & data structure components include:

  • The Document class.
  • The document store.
  • The vector store.

Data Loader

  • PdfLoader

  • Layout-aware PdfLoader with table parsing

    • MathPixLoader: To use this loader, you need a MathPix API key. Refer to the MathPix documentation for more information.

    • OCRLoader: This loader uses lib-table and a Flax pipeline to perform OCR and read the table structure from a PDF file (TODO: add more info about deployment of this module).

    • Output:

      • Document: text + metadata that identifies whether the content is a table or not

        - "source": source file name
        - "type": "table" or "text"
        - "table_origin": original table in markdown format (to be fed to an LLM or visualized using external tools)
        - "page_label": page number in the original PDF document
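
As a rough illustration of the output described above, the sketch below builds two documents carrying the listed metadata keys and separates tables from plain text. The `Document` dataclass here is a hypothetical stand-in, not the project's actual class; the file name, sample text, and the `split_by_type` helper are likewise assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for the Document class: text plus metadata."""

    text: str
    metadata: dict = field(default_factory=dict)


# Assumed loader output for a one-page PDF holding a paragraph and a table.
docs = [
    Document(
        text="Quarterly revenue grew in all regions.",
        metadata={"source": "report.pdf", "type": "text", "page_label": "1"},
    ),
    Document(
        text="Region Revenue North 10 South 12",
        metadata={
            "source": "report.pdf",
            "type": "table",
            "table_origin": "| Region | Revenue |\n|---|---|\n| North | 10 |\n| South | 12 |",
            "page_label": "1",
        },
    ),
]


def split_by_type(documents):
    """Separate table documents from text documents using metadata["type"]."""
    tables = [d for d in documents if d.metadata.get("type") == "table"]
    texts = [d for d in documents if d.metadata.get("type") == "text"]
    return tables, texts


tables, texts = split_by_type(docs)

# For tables, the markdown kept in "table_origin" is what would be passed
# to an LLM prompt or an external renderer.
table_prompts = [d.metadata["table_origin"] for d in tables]
```

Checking `"type"` before reading `"table_origin"` matters in this sketch because only the table entry is assumed to carry that key.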
        

Document Store

  • InMemoryDocumentStore

Vector Store

  • ChromaVectorStore
  • InMemoryVectorStore
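
To give a feel for how an in-memory document store and vector store might work together, here is a minimal sketch that keeps embeddings in a list and looks up document text by id. `TinyVectorStore`, its `add` and `query` methods, and the dict used as a document store are all hypothetical and do not reflect the actual `InMemoryVectorStore` or `InMemoryDocumentStore` interfaces.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory vector store; not the project's API."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, doc_id, vector):
        """Store one embedding under the given document id."""
        self._ids.append(doc_id)
        self._vectors.append(list(vector))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector, top_k=1):
        """Return the ids of the top_k stored vectors by cosine similarity."""
        scored = [
            (self._cosine(vector, v), doc_id)
            for v, doc_id in zip(self._vectors, self._ids)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]


# A plain dict plays the role of the document store: id -> document text.
doc_store = {"a": "text about cats", "b": "text about cars"}

vec_store = TinyVectorStore()
vec_store.add("a", [1.0, 0.0])  # toy 2-D embeddings, chosen by hand
vec_store.add("b", [0.0, 1.0])

# Query the vector store for ids, then resolve them in the document store.
hits = vec_store.query([0.9, 0.1], top_k=1)
results = [doc_store[doc_id] for doc_id in hits]
```

The split shown here, with vectors in one store and full documents in another joined by a shared id, is an assumption about the design rather than something the list above states.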