Mirror of https://github.com/run-llama/llama-hub.git (synced 2025-08-13 03:01:46 +00:00)
PyMuPDF Loader
This loader extracts text from a local PDF file using the PyMuPDF Python library. It is the fastest of the PDF parsing options available in loader_hub. If metadata is passed as True when calling the load function, the extracted documents include basic metadata such as page numbers, the file path, and the total number of pages in the PDF.
Usage
To use this loader, pass the path of the local file, as a string or a Path, when you call the load function. Metadata is included by default (metadata is set to True). You can also pass extra information as a dict when you call the load function.
from pathlib import Path
from llama_index import download_loader
PyMuPDFReader = download_loader("PyMuPDFReader")
loader = PyMuPDFReader()
documents = loader.load(file_path=Path('./article.pdf'), metadata=True)
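The metadata behaviour described above (page numbers, file path, and total page count, combined with any extra information passed as a dict) can be illustrated with a small standalone sketch. This is not the loader's actual code: build_page_metadata is a hypothetical helper, the key names page, file_path, and total_pages are assumptions made for illustration, and the 1-based page numbering is also an assumption.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


def build_page_metadata(
    file_path: Union[str, Path],
    total_pages: int,
    extra_info: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return one metadata dict per page."""
    if extra_info is not None and not isinstance(extra_info, dict):
        raise TypeError("extra_info must be a dict")
    # Copy the caller's dict so the original is never mutated.
    base = dict(extra_info or {})
    # Key names below are illustrative assumptions, not the loader's schema.
    base["file_path"] = str(file_path)
    base["total_pages"] = total_pages
    # Page numbers start at 1 in this sketch (an assumption).
    return [dict(base, page=number) for number in range(1, total_pages + 1)]


# Example: a three-page PDF with one extra, user-supplied field.
pages = build_page_metadata(Path("./article.pdf"), 3, {"topic": "demo"})
print(pages[0])
```

Copying the extra information before adding the basic fields keeps the caller's dict unchanged, which matters if the same dict is reused across several files.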
This loader is designed to load data into LlamaIndex and/or to be used subsequently as a Tool in a LangChain Agent. See here for examples.