278 Commits

Author SHA1 Message Date
alexbowe
3c6edba93d convert iter to list 2023-03-03 18:10:41 -08:00
alexbowe
73602130fd debug 2023-03-03 18:09:26 -08:00
Smyja
f082d4608d added include_url_in_text parameter 2023-03-03 23:07:10 +01:00
alexbowe
28b67f25c1 convert to jsonstring 2023-03-02 19:04:18 -08:00
alexbowe
cb36cea403 revert dump to json 2023-03-02 19:00:41 -08:00
alexbowe
0289512187 dump to json string 2023-03-02 18:57:42 -08:00
alexbowe
36486cef29 use Document type 2023-03-02 18:56:52 -08:00
alexbowe
631e1812d1 fix examples and doc type 2023-03-02 18:40:20 -08:00
alexbowe
890d6b0884 remove unneeded import 2023-03-02 18:28:43 -08:00
alexbowe
fb8866a738 remove incomplete line 2023-03-02 18:28:26 -08:00
alexbowe
901260c887 raise for status 2023-03-02 18:19:22 -08:00
alexbowe
091ed4aeef update comment and import requests 2023-03-02 18:18:38 -08:00
alexbowe
44333124da update docstring 2023-03-02 18:15:06 -08:00
alexbowe
009d2ecc98 remove unneeded comment 2023-03-02 18:14:44 -08:00
alexbowe
5dce7d2ca8 fix member var name 2023-03-02 18:14:02 -08:00
Alex Bowe
ef366e32ac
Delete requirements.txt
Not needed for this - we just use requests.
2023-03-02 18:13:13 -08:00
alexbowe
0b178799b2 update library index 2023-03-02 18:07:45 -08:00
alexbowe
5f989e1d35 add readwise reader 2023-03-02 18:05:43 -08:00
Smyja
f9c7f31f5f added logging 2023-03-02 22:08:17 +01:00
Smyja
ef9a6a2c07 added reference links 2023-03-02 19:52:14 +01:00
Jerry Liu
35fb446853
Merge pull request #76 from SidU/main 2023-03-01 22:55:17 -08:00
Sid Uppal
d659378d9b
Fix typos in ReadMe
Typo fix
2023-03-01 22:43:28 -08:00
Jerry Liu
4d7b15fc85
cr (#74)
Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-03-01 16:53:10 -08:00
ahmetkca
37c8bb8563 cleanup 2023-03-01 03:38:39 -05:00
ahmetkca
2ae0021d9d import llama_index 2023-03-01 03:36:50 -05:00
simonManydata
ac45380966
Add more patterns url for YoutubeTranscriptReader (#64)
* adding more patterns to extract youtube video id from url on the youtube loader

* updating patterns correspondingly
2023-02-28 21:32:49 -08:00
ahmetkca
92124b6bb5 fix import 2023-02-28 20:32:39 -05:00
reletreby
075367e721
Read Slack conversations in chronological order (#70)
- The current code reads conversations from a Slack channel from most recent to oldest.
- This adds a parameter to the `load_data` method that controls the reading order.
2023-02-27 22:35:05 -08:00
ahmetkca
6e7d49dc38 fix import 2023-02-28 01:05:20 -05:00
ahmetkca
94e318af99 fix import 2023-02-28 00:53:23 -05:00
ahmetkca
e90fc59cfc correctly import from llamahub_modules 2023-02-28 00:42:24 -05:00
ahmetkca
327350d2da conditional import for llama_index and gpt_index 2023-02-28 00:30:49 -05:00
ahmetkca
1543509329 Merge remote-tracking branch 'upstream/main' into github-reader-test-and-fix 2023-02-28 00:28:39 -05:00
ahmetkca
62ea978d6c Add more test for GithubRepositoryReader and fix
Fix for filtering file extensions and directories.
Partial test coverage for GithubRepositoryReader.
Conditional import for llama_index and gpt_index
2023-02-28 00:08:01 -05:00
MarkusOdenthal
65a1f5e6eb
Fix Notion Loader: For a Database we're only loading the first 100 page_ids (#68)
* Fix Notion Loader: for a database, the loader only loads the first 100 page_ids

While testing the Notion Loader today, I noticed that I only received the first 100 pages when loading all pages from a database. The Notion API returns a `has_more` attribute, and when it is true, we must request more pages. This commit implements that fix.

* Implement suggestions from @emptycrown

* Update base.py

---------

Co-authored-by: Markus Odenthal <markus.odenthal@real-digital.de>
Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-27 11:18:30 -08:00
EmptyCrown
3f6e5af8f9 Fix test 2023-02-26 20:25:35 -08:00
simonManydata
fa52ff2652
adding youtube url in the base remote reader (#65)
* adding youtube url in the base remote reader

* Update base.py

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-26 20:13:58 -08:00
Tommaso Soru
c2a2340c36
Fix imports in RDFReader. (#67) 2023-02-26 20:08:24 -08:00
akmhmgc
4814b6ed27
Fixed trivial typo (#66) 2023-02-26 20:07:24 -08:00
EmptyCrown
c53c487921 Change readme to llama 2023-02-24 23:52:43 -08:00
EmptyCrown
79d492b3e3 Cleanup 2023-02-24 23:47:25 -08:00
EmptyCrown
19ab1afa7d First GH test
2023-02-24 23:44:54 -08:00
ahmetkca
5a27264db1
Add GitHub Repository Reader (#34)
* add github repository, test a new way to download loader

* test imports when downloaded from gpt_index

* Refactor(Github Repo): Move github_client and utils to modules

* Moved github_client.py and utils.py from loader_hub/github_repo to modules/github_repo
* Updated import statements in base.py to reflect the new location

* temp

* Refactor(GithubRepositoryReader): Add github_client argument

- Add github_client argument to GithubRepositoryReader constructor
- Set default value for github_client argument
- Update docstring to reflect changes

* Refactor(Github Repo): Update init file

- Remove imports of base, github_client and utils
- Add imports of GithubRepositoryReader and GithubClient
- Update __all__ to include the new imports

* Fix(library): Update library.json

- Updated library.json to include __init__.py file

* Refactor(GithubRepositoryReader): Add filter for directories and files

- Add filter for directories and files in GithubRepositoryReader
- Ignore directories and files that do not pass the filter
- Print out if directory or file is ignored due to filter

* Refactor(BaseReader): Check filter files

- Refactor `_check_filter_files` to `_check_filter_file_extensions` in `BaseReader`
- Ignore files that are excluded by the filter

* Docs(FilterType): Add documentation for FilterType enum

- Add documentation for FilterType enum
- Explain what the enum is used for
- Describe the attributes of the enum

* Add(GPT Index): Add GPT Index example

Add GPT Index example to README
- Set OPENAI_API_KEY environment variable
- Download GithubRepositoryReader module
- Create GithubClient and GithubRepositoryReader
- Load data from Github Repository
- Create GPTSimpleVectorIndex
- Query the index

* change the import path for extras

* change import path for extra files to absolute

* Add test for GithubClient currently not using mocks which is not ideal

* Update test_github_reader.py

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-24 23:41:48 -08:00
EmptyCrown
457e7888e9 Cleanup 2023-02-24 23:39:32 -08:00
Tommaso Soru
049c3f1896
Add RDF file loader (#63)
* RDF file loader.

* Add RDF file loader to json.

* Update base.py

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-24 23:30:56 -08:00
ahmetkca
2b8673142e Add test for GithubClient currently not using mocks which is not ideal 2023-02-25 02:10:09 -05:00
simonManydata
3cc026f98f
Add RemoteDepth to llamahub (#62)
* adding remote multiple

* removing the lazy-loaded imports and adding a requirements.txt for the libs

* updating description
2023-02-24 22:47:23 -08:00
ahmetkca
a068fbb67e Merge remote-tracking branch 'upstream/main' into github-reader 2023-02-24 23:35:55 -05:00
EmptyCrown
9ed101e30b Fix substack 2023-02-24 10:49:50 -08:00
Jerry Liu
8b85a49d60
add chroma to llamahub (#61)
* add chroma to llamahub

* cr

---------

Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-02-24 10:01:56 -08:00