125 Commits

Author SHA1 Message Date
simonManydata
fa52ff2652
adding youtube url in the base remote reader (#65)
* adding youtube url in the base remote reader

* Update base.py

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-26 20:13:58 -08:00
Tommaso Soru
c2a2340c36
Fix imports in RDFReader. (#67) 2023-02-26 20:08:24 -08:00
akmhmgc
4814b6ed27
Fixed trivial typo (#66) 2023-02-26 20:07:24 -08:00
EmptyCrown
c53c487921 Change readme to llama 2023-02-24 23:52:43 -08:00
EmptyCrown
79d492b3e3 Cleanup 2023-02-24 23:47:25 -08:00
EmptyCrown
19ab1afa7d First GH test
2023-02-24 23:44:54 -08:00
ahmetkca
5a27264db1
Add GitHub Repository Reader (#34)
* add github repository, test a new way to download loader

* test imports when downloaded from gpt_index

* Refactor(Github Repo): Move github_client and utils to modules

* Moved github_client.py and utils.py from loader_hub/github_repo to modules/github_repo
* Updated import statements in base.py to reflect the new location

* temp

* Refactor(GithubRepositoryReader): Add github_client argument

- Add github_client argument to GithubRepositoryReader constructor
- Set default value for github_client argument
- Update docstring to reflect changes

* Refactor(Github Repo): Update init file

- Remove imports of base, github_client and utils
- Add imports of GithubRepositoryReader and GithubClient
- Update __all__ to include the new imports

* Fix(library): Update library.json

- Updated library.json to include __init__.py file

* Refactor(GithubRepositoryReader): Add filter for directories and files

- Add filter for directories and files in GithubRepositoryReader
- Ignore directories and files that do not pass the filter
- Print out if directory or file is ignored due to filter

* Refactor(BaseReader): Check filter files

- Refactor `_check_filter_files` to `_check_filter_file_extensions` in `BaseReader`
- Ignoring files due to filter

* Docs(FilterType): Add documentation for FilterType enum

- Add documentation for FilterType enum
- Explain what the enum is used for
- Describe the attributes of the enum

* Add(GPT Index): Add GPT Index example

Add GPT Index example to README
- Set OPENAI_API_KEY environment variable
- Download GithubRepositoryReader module
- Create GithubClient and GithubRepositoryReader
- Load data from Github Repository
- Create GPTSimpleVectorIndex
- Query the index

* change the import path for extras

* change import path for extra files to absolute

* Add test for GithubClient (currently not using mocks, which is not ideal)

* Update test_github_reader.py

* Update test_github_reader.py

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-24 23:41:48 -08:00
EmptyCrown
457e7888e9 Cleanup 2023-02-24 23:39:32 -08:00
Tommaso Soru
049c3f1896
Add RDF file loader (#63)
* RDF file loader.

* Add RDF file loader to json.

* Update base.py

* Update base.py

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-24 23:30:56 -08:00
simonManydata
3cc026f98f
Add RemoteDepth to llamahub (#62)
* adding remote multiple

* remove the lazy-loaded imports and add requirements.txt for the libs

* updating description
2023-02-24 22:47:23 -08:00
EmptyCrown
9ed101e30b Fix substack 2023-02-24 10:49:50 -08:00
Jerry Liu
8b85a49d60
add chroma to llamahub (#61)
* add chroma to llamahub

* cr

---------

Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-02-24 10:01:56 -08:00
Edwin Ong
0cd691322e
Add Spotify Loader (#59)
* Add SpotifyReader

* Add a more interesting example

* Update README.md

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-23 16:19:05 -08:00
Edwin Ong
4d9e9a39c6
Add Google Calendar Loader (#58)
* Google Calendar reader

* Add GoogleCalendarReader to library.json

* Add README for GoogleCalendarReader

* Fix repo link

* Add an optional start_date to allow retrieval of past events

* Update README to include the start_date argument
2023-02-23 13:16:18 -08:00
Ari
c8d4172590
change pinecone to weaviate (#57) 2023-02-22 16:53:50 -08:00
Jerry Liu
e631266036
cr (#56)
Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-02-22 09:43:29 -08:00
k4d
5d22855368
fix MoreComments NameError (#54) 2023-02-21 19:04:34 -08:00
Bubu
ff1389b3f5
Add Memos loader (#53)
* add memos loader

* handling networking errors and update README

* add keyword

* fix error

* fix

* fix

* Update README.md

* Update README.md

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-20 22:21:17 -08:00
Jerry Liu
e97bb81915
swap out gpt_index imports for llama_index imports (#49)
* cr

* cr

* cr

---------

Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-20 21:46:58 -08:00
Jesse Zhang
fd987c3f96
Update README.md 2023-02-20 13:19:31 -08:00
Jesse Zhang
193ae3aa41
README updates for LlamaIndex 2023-02-20 13:19:15 -08:00
Ji
423ea3cc3d
add Transcript Loader for Bilibili (#50)
* add Transcript Loader for Bilibili

This loader utilizes the `bilibili_api` to fetch the text transcript from Bilibili, one of the most beloved long-form video sites in China.

With this, users can easily obtain the transcript and general information from Bilibili.

* add loader to library.json

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-20 00:38:03 -08:00
EmptyCrown
70aeb75867 nit 2023-02-20 00:33:36 -08:00
EmptyCrown
8a9c54546e README change to llama 2023-02-20 00:33:01 -08:00
EmptyCrown
db927efaab README updates 2023-02-20 00:30:50 -08:00
vanessahlyan
af27337478
Add Reddit Reader (#52)
* Add reddit loader

* Improve descriptions
2023-02-20 00:22:54 -08:00
Jerry Liu
3a0493ccb0
cr (#51)
Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-02-20 00:15:44 -08:00
batmanscode
00dff499f4
Add Whatsapp loader (#48)
* Whatsapp loader (#1)

* create whatsapp loader

* update readme

* update base.py and readme

added ":" in front of `author`

* update readme to say what verbose does

* use logging instead of print

- update readme
- add loader to `library.json`
2023-02-19 13:34:51 -08:00
Shimajiro
5c2315ff3d
fix typo (#47) 2023-02-17 21:21:23 -08:00
EmptyCrown
5a5d94bcdc Polish 2023-02-17 19:28:21 -08:00
Ravi Theja
b6d4b6b1a7
Add tesseract model for plain text image (#46)
* Add tesseract model for plain text image

* Update recommended changes
2023-02-17 19:18:00 -08:00
EmptyCrown
1937c19587 Added keywords for db loader 2023-02-16 22:47:35 -08:00
Sid Uppal
c67c2ca13a
Dad Jokes 🤓 (#43)
* Dad Jokes 🤓

* PR feedback

* Fix test issue

* fix test issue
2023-02-16 22:45:47 -08:00
lukasfrischknecht
b195fbb3bb
add option to pass languages to YoutubeTranscriptReader (#39)
* add option to pass languages to YoutubeTranscriptReader

* Add optional type

---------

Co-authored-by: lukas frischknecht <lukas.frischknecht@srf.ch>
Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-16 08:49:54 -08:00
EmptyCrown
615c21a2c8 Cleanup 2023-02-15 20:19:37 -08:00
David Bloomin
55377ef57a
Add AsanaReader to load documents from an Asana workspace (#38)
* Add AsanaReader to load documents from an Asana workspace

* always assume asana package is installed

* add asana to library.json
2023-02-15 20:18:31 -08:00
Jesse Zhang
3c497ac430
Better tests, including download_loader (#36)
* Better tests including download_loader

* Fix file name

* Import loaders locally

* Fix relative imports

* Update sys path

* Update sys path

* Import path

* Import path
2023-02-15 17:18:21 -08:00
EmptyCrown
6fb47cf7f6 Update bs4 readme 2023-02-15 09:20:06 -08:00
Smyja
23ae4928cb
Extended BeautifulSoup Reader loader. (#37)
* Extended BeautifulSoup Reader loader.

* removed link slice

* removed broken docs link list.

* lint and added metadata back

* fixed urljoin issue

* resolved the import issue and added typing
2023-02-15 09:17:34 -08:00
EmptyCrown
a1ab2d7738 Updated CJK library and readme 2023-02-15 09:06:30 -08:00
Shimajiro
6b6d93bc8f
Add Japanese PDF reader (#35)
* create

* add JapanesePDFReader

* Improved text extraction stability. Fixed bug that caused some PDFs to fail.

* modify class name and comment
2023-02-15 09:03:03 -08:00
Jesse Zhang
2d9c0f3580
Add optional setting for whether to caption images in pptx (#33)
* Add setting for whether to run HF model to caption pptx images. Default to false

* Update readme

* Lint
2023-02-13 08:55:35 -08:00
Jesse Zhang
20aba8b8c5
Update README.md 2023-02-12 21:39:51 -08:00
Jerry Liu
46176a1829
cr (#31)
Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-02-12 21:38:00 -08:00
Ravi Theja
6b278bb978
Add google drive reader for text files (#25)
* Add google drive reader for files

* Update Readme

* Update base.py file

* Update with all format files

* Update Readme and comments

* Update metadata, download to temporary dir

* Refactor google drive and address the google auth recurring

* Minor

---------

Co-authored-by: EmptyCrown <jessetanzhang@gmail.com>
2023-02-12 18:55:09 -08:00
Jesse Zhang
1d620b6c62
Update README.md 2023-02-11 12:46:34 -08:00
Jesse Zhang
88f4890e6a
Update README.md 2023-02-11 12:44:38 -08:00
EmptyCrown
8948cff28d skip folders in s3 2023-02-10 16:09:05 -08:00
Jesse Zhang
adaa2d78a6
Update README.md 2023-02-10 15:57:04 -08:00
EmptyCrown
e5dc38be2b Account for nested files in s3 reader 2023-02-10 15:44:56 -08:00