# File Loader

This loader reads a local directory and extracts `Document`s from each file in it. By default, the loader uses the specialized loaders in this library to parse common file types (e.g. .pdf, .png, .docx). You can optionally pass in your own custom loaders. Note: if no loader is found for a file extension, and the extension is not in the skip list, the file is read directly.
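
The dispatch rule described above can be illustrated with a small standalone sketch. This is not the loader's actual implementation: the names `extract_text`, `DEFAULT_PARSERS`, `SKIPPED_EXTENSIONS`, and `fake_pdf_parser` are hypothetical, and the skip list contents are assumed purely for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Set

Parser = Callable[[Path], str]


def read_as_text(path: Path) -> str:
    """Fallback: read the file directly as text."""
    return path.read_text(errors="ignore")


def fake_pdf_parser(path: Path) -> str:
    """Hypothetical stand-in for a specialized loader; it does not open the file."""
    return f"<parsed PDF: {path.name}>"


# Assumed defaults, for illustration only.
DEFAULT_PARSERS: Dict[str, Parser] = {".pdf": fake_pdf_parser}
SKIPPED_EXTENSIONS: Set[str] = {".zip"}


def extract_text(
    path: Path, custom_parsers: Optional[Dict[str, Parser]] = None
) -> Optional[str]:
    """Pick a parser by file extension; custom parsers override the defaults."""
    parsers = {**DEFAULT_PARSERS, **(custom_parsers or {})}
    suffix = path.suffix.lower()
    if suffix in parsers:
        return parsers[suffix](path)
    if suffix in SKIPPED_EXTENSIONS:
        return None  # no parser found and the extension is on the skip list
    return read_as_text(path)  # no parser found: read the file directly
```

In this sketch a custom parser passed for an extension takes precedence over the default one, which is one plausible reading of "pass in your own custom loaders"; check the loader's code for its actual precedence rules.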

## Usage

To use this loader, instantiate the `SimpleDirectoryReader` class with a directory, along with any optional settings, such as whether to ignore hidden files. See the code for the complete list of options.

```python
from llama_index import download_loader

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")

loader = SimpleDirectoryReader('./data', recursive=True, exclude_hidden=True)
documents = loader.load_data()
```
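
To make the effect of the `recursive` and `exclude_hidden` options more concrete, here is a rough standalone sketch of the kind of file selection they imply. The helper `list_input_files` is hypothetical and is not part of the loader. It also assumes that any dot-prefixed file or folder name counts as hidden, which may differ from what the loader actually does.

```python
from pathlib import Path
from typing import List


def list_input_files(
    input_dir: str, recursive: bool = False, exclude_hidden: bool = True
) -> List[Path]:
    """Collect files under input_dir, optionally descending into subdirectories."""
    root = Path(input_dir)
    candidates = root.rglob("*") if recursive else root.glob("*")
    files = []
    for path in sorted(candidates):
        if not path.is_file():
            continue
        # Assumption: a dot-prefixed name anywhere below the root means "hidden".
        relative_parts = path.relative_to(root).parts
        if exclude_hidden and any(part.startswith(".") for part in relative_parts):
            continue
        files.append(path)
    return files
```

With `recursive=False`, only files directly inside the directory are returned; with `exclude_hidden=False`, dotfiles such as `.env` are included as well.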

## Examples

This loader is designed to load data into [LlamaIndex](https://github.com/jerryjliu/gpt_index/tree/main/gpt_index) and/or to be used subsequently as a Tool in a [LangChain](https://github.com/hwchase17/langchain) Agent.

### LlamaIndex

```python
from llama_index import GPTVectorStoreIndex, download_loader

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")

loader = SimpleDirectoryReader('./data', recursive=True, exclude_hidden=True)
documents = loader.load_data()
index = GPTVectorStoreIndex.from_documents(documents)
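# Assumes the llama_index 0.6.0+ API, where queries go through a query engine
query_engine = index.as_query_engine()
response = query_engine.query('What are these files about?')
print(response)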
```

### LangChain

Note: Make sure you change the description of the `Tool` to match your use case.

```python
from llama_index import GPTVectorStoreIndex, download_loader
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from langchain.chains.conversation.memory import ConversationBufferMemory

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")

loader = SimpleDirectoryReader('./data', recursive=True, exclude_hidden=True)
documents = loader.load_data()
index = GPTVectorStoreIndex.from_documents(documents)

tools = [
    Tool(
        name="Local Directory Index",
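        # Assumes the llama_index 0.6.0+ query engine API; a Tool function should return a string
        func=lambda q: str(index.as_query_engine().query(q)),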
        description="Useful when you want to answer questions about the files in your local directory.",
    ),
]
llm = OpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = initialize_agent(
    tools, llm, agent="zero-shot-react-description", memory=memory
)

output = agent_chain.run(input="What are these files about?")
```