
# Llama Hub 🦙

Llama Hub is a simple library of the data loaders and readers created by the community. The goal is to make it extremely easy to connect large language models to a wide variety of knowledge sources. These general-purpose utilities are meant to be used in [GPT Index](https://github.com/jerryjliu/gpt_index/tree/main/gpt_index) (e.g. when building an index) and [LangChain](https://github.com/hwchase17/langchain) (e.g. when building the tools an agent can use). For example, there are loaders to parse Google Docs, SQL databases, PDF files, PowerPoints, Notion, Slack, Obsidian, and many more. Because all loaders produce the same type of Document, you can easily combine their output in the same index.

## Usage

These loaders are designed to load data into [GPT Index](https://github.com/jerryjliu/gpt_index/tree/main/gpt_index); the resulting index can then be used as a Tool in a [LangChain](https://github.com/hwchase17/langchain) Agent. **You can use them with `download_loader` from GPT Index in a single line of code!** For example, see the code snippets below, which use the Google Docs Loader.

### GPT Index

```python
from gpt_index import GPTSimpleVectorIndex, download_loader

GoogleDocsReader = download_loader('GoogleDocsReader')

gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTSimpleVectorIndex(documents)
index.query('Where did the author go to school?')
```

### LangChain

Note: Make sure you change the description of the `Tool` to match your use-case.

```python
from gpt_index import GPTSimpleVectorIndex, download_loader
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from langchain.chains.conversation.memory import ConversationBufferMemory

GoogleDocsReader = download_loader('GoogleDocsReader')

gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTSimpleVectorIndex(documents)

tools = [
    Tool(
        name="Google Doc Index",
        func=lambda q: index.query(q),
        description="Useful when you want to answer questions about the Google Documents.",
    ),
]
llm = OpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = initialize_agent(
    tools, llm, agent="zero-shot-react-description", memory=memory
)

output = agent_chain.run(input="Where did the author go to school?")
```

## How to add a loader

Adding a loader simply requires forking this repo and making a Pull Request. The Loader Hub website will update automatically. However, please keep the following guidelines in mind when making your PR.

### Step 1: Create a new directory

In `loader_hub`, create a new directory for your loader. It can be nested within another directory, but give it a unique name, because the directory name becomes the identifier for your loader (e.g. `google_docs`). Inside your new directory, create an `__init__.py` file (which can be empty), a `base.py` file that contains your loader implementation, and, if needed, a `requirements.txt` file that lists your loader's package dependencies.

If you'd like, you can create the new directory and files by running the following script.

```bash
./loader_hub/add_loader.sh [NAME_OF_NEW_DIRECTORY]
```

Make sure to list your dependencies in a `requirements.txt` file in the new directory so that the required packages can be installed automatically when your loader is downloaded.
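As a rough illustration of what `base.py` might contain, here is a minimal sketch of a loader that exposes a `load_data` method, the same method called on `GoogleDocsReader` in the usage examples above. `PlainTextReader` is a hypothetical loader, and the `Document` dataclass is only a stand-in (an assumption) so that the sketch runs without any dependencies; a real loader would use the Document type provided by GPT Index instead.

```python
# base.py (sketch): a hypothetical loader for plain-text files.
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class Document:
    """Stand-in for the GPT Index Document type (an assumption for this sketch)."""

    text: str
    extra_info: Dict[str, str] = field(default_factory=dict)


class PlainTextReader:
    """Hypothetical loader that reads every .txt file under a directory."""

    def __init__(self, encoding: str = "utf-8") -> None:
        self.encoding = encoding

    def load_data(self, directory: str) -> List[Document]:
        """Return one Document per .txt file, ordered by path."""
        documents = []
        for path in sorted(Path(directory).rglob("*.txt")):
            text = path.read_text(encoding=self.encoding)
            documents.append(
                Document(text=text, extra_info={"file_name": path.name})
            )
        return documents
```

Keeping all of the loading logic behind a single `load_data` call means users can swap one loader for another without changing the rest of their indexing code.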

### Step 2: Write your README

Inside your new directory, create a `README.md` that mirrors the existing ones. It should include a summary of what your loader does, its inputs, and how it's used in the context of GPT Index and LangChain.

### Step 3: Add your loader to the library

Finally, add your loader to the `loader_hub/library.json` file so that others can use it. Following the existing entries in that file, add the class name of your loader along with its id, author, etc. This file is referenced by the Loader Hub website and by the download function within GPT Index.
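The exact schema of `library.json` is not reproduced here, so treat the following as a sketch under assumptions: it supposes that each entry is keyed by the loader's class name and carries `id` and `author` fields, as the description above suggests. The helper `add_library_entry` and all sample values are hypothetical; check the existing entries in the file for the real layout before editing it.

```python
import json
from typing import Dict


def add_library_entry(
    library: Dict[str, dict], class_name: str, loader_id: str, author: str
) -> Dict[str, dict]:
    """Return a copy of `library` with a new loader entry (assumed schema)."""
    if class_name in library:
        raise ValueError(f"class name already registered: {class_name}")
    if any(entry.get("id") == loader_id for entry in library.values()):
        raise ValueError(f"loader id already in use: {loader_id}")
    updated = dict(library)
    updated[class_name] = {"id": loader_id, "author": author}
    return updated


# Hypothetical existing content and new entry, for illustration only.
library = {"GoogleDocsReader": {"id": "google_docs", "author": "example_author"}}
library = add_library_entry(
    library, "PlainTextReader", "plain_text", "your_github_handle"
)
print(json.dumps(library, indent=2))
```

Checking for a duplicate id mirrors the requirement from Step 1 that the directory name, which serves as the loader's identifier, be unique.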

## Questions?

Feel free to hop into the [community Discord](https://discord.gg/dGcwcsnxhU) or tag the official [Twitter account](https://twitter.com/gpt_index)!