# Google Doc Loader
This loader takes in IDs of Google Docs and parses their text into `Document`s. You can extract a Google Doc's ID directly from its URL. For example, the ID of `https://docs.google.com/document/d/1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec/edit` is `1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec`.
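If you start from full URLs rather than bare IDs, the extraction described above can be scripted. The following is a minimal sketch, not part of the loader itself: the helper name `extract_gdoc_id` is hypothetical, and it assumes the ID is the path segment that follows `/document/d/`, as in the example URL.

```python
import re
from typing import Optional

# Assumption: the ID is the run of letters, digits, "-" and "_" after "/document/d/".
_DOC_ID_PATTERN = re.compile(r"/document/d/([a-zA-Z0-9_-]+)")


def extract_gdoc_id(url: str) -> Optional[str]:
    """Return the document ID from a Google Docs URL, or None if no ID is found."""
    match = _DOC_ID_PATTERN.search(url)
    return match.group(1) if match else None


url = "https://docs.google.com/document/d/1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec/edit"
gdoc_id = extract_gdoc_id(url)
```

The helper returns `None` for URLs that do not match, so you can filter those out before passing the IDs to the loader.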
As a prerequisite, you will need to register with Google and generate a `credentials.json` file, then place it in the directory where you run this loader. See [the Google Workspace guide to creating credentials](https://developers.google.com/workspace/guides/create-credentials) for instructions.
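Because the loader expects `credentials.json` in the working directory, a quick pre-flight check can surface a missing or malformed file before any Google API call is made. This is a minimal sketch under that assumption; the helper name `require_credentials` is hypothetical and is not provided by the loader.

```python
import json
from pathlib import Path


def require_credentials(directory: str = ".") -> Path:
    """Return the path to credentials.json in `directory`, or raise if it is unusable."""
    path = Path(directory) / "credentials.json"
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected {path}; create it by following Google's credentials guide."
        )
    # Parse the file so that malformed JSON fails here (json.JSONDecodeError)
    # rather than later inside the loader.
    with path.open() as f:
        json.load(f)
    return path
```

Calling `require_credentials()` with no arguments checks the current working directory, which is where this README says the file should live.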
## Usage
To use this loader, pass in a list of Google Doc IDs.
```python
from llama_index import download_loader
GoogleDocsReader = download_loader('GoogleDocsReader')
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
```
## Examples
This loader is designed to load data into [LlamaIndex](https://github.com/jerryjliu/gpt_index/tree/main/gpt_index). The resulting index can then be used as a Tool in a [LangChain](https://github.com/hwchase17/langchain) Agent.
### LlamaIndex
```python
from llama_index import GPTVectorStoreIndex, download_loader
GoogleDocsReader = download_loader('GoogleDocsReader')
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
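# GPTVectorStoreIndex is queried through a query engine rather than index.query()
query_engine = index.as_query_engine()
response = query_engine.query('Where did the author go to school?')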
```
### LangChain
Note: Make sure you change the description of the `Tool` to match your use-case.
```python
from llama_index import GPTVectorStoreIndex, download_loader
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from langchain.chains.conversation.memory import ConversationBufferMemory
GoogleDocsReader = download_loader('GoogleDocsReader')
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
tools = [
    Tool(
        name="Google Doc Index",
        # Query through a query engine and return a plain string for the agent
        func=lambda q: str(index.as_query_engine().query(q)),
        description="Useful when you want to answer questions about the Google Documents.",
    ),
]
llm = OpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = initialize_agent(
    tools, llm, agent="zero-shot-react-description", memory=memory
)
output = agent_chain.run(input="Where did the author go to school?")
```