File Loader

This loader takes a local directory of files and extracts Documents from each of the files it finds. By default, the loader uses the specialized loaders in this library to parse common file extensions (e.g. .pdf, .png, .docx). You can optionally pass in your own custom loaders. Note: if no loader is found for a file extension, and the extension is not in the list of extensions to skip, the file's contents are read directly.

Usage

To use this loader, instantiate the SimpleDirectoryReader class with a directory path, along with other optional settings such as whether to ignore hidden files. See the loader's source code for the complete list of options.

from llama_index import download_loader

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")

loader = SimpleDirectoryReader('./data', recursive=True, exclude_hidden=True)
documents = loader.load_data()
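
If you need finer control, additional optional settings can be passed at construction time. The sketch below is illustrative only: parameter names such as required_exts, num_files_limit, and file_extractor (a mapping from file extension to a custom reader) are assumptions; check the loader's source code for the exact signature.

from llama_index import download_loader

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")
PDFReader = download_loader("PDFReader")  # any other llama-hub reader could be substituted here

loader = SimpleDirectoryReader(
    './data',
    recursive=True,                        # descend into subdirectories
    exclude_hidden=True,                   # skip hidden files
    required_exts=['.pdf', '.md'],         # assumed parameter: only load these extensions
    num_files_limit=100,                   # assumed parameter: cap the number of files read
    file_extractor={'.pdf': PDFReader()},  # assumed parameter: override the default .pdf loader
)
documents = loader.load_data()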

Examples

This loader is designed to load data into LlamaIndex and/or to be used as a Tool in a LangChain Agent.

LlamaIndex

from llama_index import GPTVectorStoreIndex, download_loader

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")

loader = SimpleDirectoryReader('./data', recursive=True, exclude_hidden=True)
documents = loader.load_data()
index = GPTVectorStoreIndex.from_documents(documents)
index.query('What are these files about?')

LangChain

Note: Make sure you change the description of the Tool to match your use case.

from llama_index import GPTVectorStoreIndex, download_loader
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from langchain.chains.conversation.memory import ConversationBufferMemory

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")

loader = SimpleDirectoryReader('./data', recursive=True, exclude_hidden=True)
documents = loader.load_data()
index = GPTVectorStoreIndex.from_documents(documents)

tools = [
    Tool(
        name="Local Directory Index",
        func=lambda q: index.query(q),
        description="Useful for answering questions about the files in your local directory.",
    ),
]
llm = OpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = initialize_agent(
    tools, llm, agent="zero-shot-react-description", memory=memory
)

output = agent_chain.run(input="What are these files about?")