Flat PDF Loader
This loader extracts text from a local flat PDF file using the PyMuPDF Python package and an image loader. Pass a single local file each time you call load_data.
Usage
To use this loader, you need to:
- Download ImageReader and FlatPdfReader using download_loader
- Init an ImageReader
- Init a FlatPdfReader and pass the ImageReader on init
- Pass a Path to a local file in the load_data method.
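from pathlib import Path
from llama_index import download_loader

# Download and initialize the image loader that FlatPdfReader relies on
ImageReader = download_loader("ImageReader")
imageLoader = ImageReader(text_type="plain_text")

# Download FlatPdfReader and hand it the image loader on init
FlatPdfReader = download_loader("FlatPdfReader")
pdfLoader = FlatPdfReader(image_loader=imageLoader)

# Load a single local PDF file
document = pdfLoader.load_data(file=Path('./file.pdf'))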
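Because load_data takes one file per call, loading a folder of PDFs means calling it in a loop. The helper below is a minimal sketch of that pattern, not part of the loader itself: load_pdfs is a hypothetical name, and it assumes that load_data returns a list of documents for each file.

```python
from pathlib import Path
from typing import Any, Iterable, List


def load_pdfs(loader: Any, paths: Iterable[Path]) -> List[Any]:
    """Call loader.load_data once per PDF path and collect the results.

    Hypothetical helper: `loader` is any object exposing
    load_data(file=Path), and each call is assumed to return a list.
    """
    documents: List[Any] = []
    for path in paths:
        # Skip anything that does not look like a PDF file.
        if path.suffix.lower() != ".pdf":
            continue
        # One call per file, matching the loader's single-file interface.
        documents.extend(loader.load_data(file=path))
    return documents
```

With the pdfLoader from the snippet above, a call such as load_pdfs(pdfLoader, Path('./docs').glob('*.pdf')) would collect the documents from every PDF in a local docs folder, under the same assumption about the return type.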
This loader is designed to load data into LlamaIndex and/or to be used subsequently as a Tool in a LangChain Agent. See here for examples.