Remote Page/File Loader

This loader extracts the text from any remote page or file, given only its URL. If the URL points to a file, the loader downloads it to a temporary location and parses it using SimpleDirectoryReader. It is an all-in-one tool for (almost) any URL.

As a result, any page or file type that SimpleDirectoryReader can parse is supported. For instance, if a .txt URL such as a Project Gutenberg book is passed in, the text is parsed as is. If a hosted .mp3 URL is passed in, the file is downloaded and parsed using AudioTranscriber.
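The "download temporarily, then parse by file type" step described above can be illustrated with a minimal sketch. It is not the loader's actual implementation: the helper names `url_suffix` and `save_to_temp` are hypothetical, and the sketch assumes the file type is inferred from the extension in the URL path.

```python
import os
import tempfile
from urllib.parse import urlparse


def url_suffix(url: str) -> str:
    """Return the lowercase file extension found in the URL path, or '' if none."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lower()


def save_to_temp(data: bytes, url: str) -> str:
    """Write already-downloaded bytes to a temp file that keeps the URL's extension.

    Keeping the extension lets an extension-based parser pick the right
    handler for the file. The caller is responsible for deleting the file.
    """
    with tempfile.NamedTemporaryFile(suffix=url_suffix(url), delete=False) as tmp:
        tmp.write(data)
        return tmp.name
```

The extension alone is only a heuristic: a URL ending in .jpg can still serve an HTML page, so a more robust version would likely also inspect the Content-Type header of the response.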

Usage

To use this loader, pass in the URL of the page or file to load. Optionally, you may specify a file_extractor for the SimpleDirectoryReader to use instead of the default one.

from llama_index import download_loader

RemoteReader = download_loader("RemoteReader")

loader = RemoteReader()
documents = loader.load_data(url="https://en.wikipedia.org/wiki/File:Example.jpg")

This loader is designed to load data into LlamaIndex and/or to be used subsequently as a Tool in a LangChain Agent. See here for examples.