368 Commits

Author SHA1 Message Date
Emmanuel
e43264cbbc
feat: Added an option to choose the language in Wikipedia texts (#142) 2023-03-27 18:56:54 -07:00
Pandazki
d0185fc543
feat: ReadabilityWebPageReader add normalize func and custom TextSplitter (#138) 2023-03-26 18:23:33 -07:00
Jerry Liu
f29c41c8de
update readme example (#140) 2023-03-26 09:34:13 -07:00
Xuanwo
b944a50276
feat: Add OpendalReader and add s3|azblob|gcs support (#137)
Signed-off-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Jerry Liu <jerryjliu98@gmail.com>
2023-03-24 18:05:14 -07:00
Jerry Liu
1705c875f4
add chatgpt plugin loader (#139) 2023-03-24 15:36:05 -07:00
William Li
70985eda15
added confluence dataloader (#135)
Co-authored-by: William Li <twelvehertz@Williams-MacBook-Air.local>
2023-03-21 16:17:57 -07:00
Daisuke Hirata
883e472329
Add file id to Google drive file field meta data (#134) 2023-03-21 11:27:33 -07:00
akmhmgc
7d6763fa40
Modify sample code (#133) 2023-03-21 11:03:11 -07:00
Giuseppe
ba3806b587
Adding gitbook.io scraper (#130) 2023-03-20 09:58:57 -07:00
ikaruga
d8a05a767e
Add locale settings for Zendesk loader (#131) 2023-03-20 09:56:43 -07:00
Shoya SHIRAKI
a63116f473
Add Hatena Blog Loader (#129) 2023-03-18 21:37:28 -07:00
EmptyCrown
8d986fdd11 Rate limit fixes for pubmed 2023-03-18 15:15:55 -07:00
EmptyCrown
df26b14312 Fix pubmed 2023-03-18 15:00:44 -07:00
Jesse Zhang
767518d6c5
Small addition to allow extension filtering (#127) 2023-03-18 09:25:29 -07:00
Jerry Liu
94a0650527
add gpt repo loader (#126) 2023-03-17 00:05:40 -07:00
Smyja
3948ec0d22
No variable called file. (#124) 2023-03-16 22:56:24 -07:00
Smyja
9320136f6a
Airtable reader (#125) 2023-03-16 22:55:57 -07:00
Chris
f3d461c2c9
decode messages by their encoding and then force 'utf-8' (#120) 2023-03-16 12:14:05 -07:00
gocampo
312a5a62b5
Fix bug #121 (#122)
Fixed the regex to take hyphens into account.
2023-03-16 11:56:11 -07:00
cdstrachan
4a1ccb8d06
Added domain lock parameter (#115)
* Update base.py
2023-03-16 01:00:26 -07:00
reletreby
6262be21c9
Bug fix (#116)
To resolve the following error message:
`AttributeError: 'SlackReader' object has no attribute 'earliest_date_timestamp'`
2023-03-15 19:29:24 -07:00
Jerry Liu
fdb8c86e42
Update image parser (#111) 2023-03-14 23:13:43 -07:00
ahmetkca
6072567da2
Add unittests for GHRepo reader and fix filter logic. (#104) 2023-03-13 21:09:34 -07:00
Rishav Dash
7c1e06a2f6
Update for MongoDB Atlas (#71)
* Update for MongoDB Atlas

Previously the class took `host` and `port` as parameters. MongoDB Atlas connections use a single URL, so the host and port cannot be provided separately; instead, pass the URL directly to the MongoClient.

* added mongo db url as a condition to connect to DB

* syntax error

---------

Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-03-13 18:26:22 -07:00
Jesse Zhang
f220fefc94
New paged csv reader (#105)
* New paged csv reader. Makes more sense

* CR
2023-03-13 18:22:08 -07:00
Jerry Liu
44d1f59d2a
Merge pull request #108 from reletreby/patch-7
Improvements to Gmail Reader
2023-03-13 17:43:18 -07:00
reletreby
039216197c
Update base.py 2023-03-13 18:16:09 -04:00
Jerry Liu
2ac2c42424
Merge pull request #106 from emptycrown/jerry/fix_db_reader 2023-03-13 13:23:40 -07:00
Jerry Liu
71578498f3 fix db reader 2023-03-13 13:22:21 -07:00
Jerry Liu
8c5d7ab2dd
Merge pull request #101 from emptycrown/jerry/fix_gmail_decoding 2023-03-12 00:05:00 -08:00
Jerry Liu
6fd2b17e7a cr 2023-03-12 00:01:12 -08:00
Jerry Liu
75480e16d0
Merge pull request #100 from emptycrown/jerry/revert_gh_changes
Revert "Merge pull request #73 from ahmetkca/github-reader-test-and-fix"
2023-03-11 19:14:27 -08:00
Jerry Liu
fdc44a79cb
Merge pull request #92 from AgentHQ/main 2023-03-11 19:08:10 -08:00
Bruno Bornsztein
77d5d9473c update readme 2023-03-11 21:02:25 -06:00
Bruno Bornsztein
68eb3b3483 fix readme typos 2023-03-11 21:02:25 -06:00
Bruno Bornsztein
564e98c40a add gmail reader requirements 2023-03-11 21:02:25 -06:00
Bruno Bornsztein
7c87cdc3ca readme and lazy imports 2023-03-11 21:02:24 -06:00
Bruno Bornsztein
eac86b14fe gmail reader
update
2023-03-11 21:02:22 -06:00
Jerry Liu
819437af8e Revert "Merge pull request #73 from ahmetkca/github-reader-test-and-fix"
This reverts commit 78bc97e9ed5e84e20d70156634b4e0ee7d612768, reversing
changes made to 13131d3e98c8be23b8a61a72098d68c5829b9a1f.
2023-03-11 17:04:01 -08:00
Jerry Liu
27ef2f0963
Merge pull request #98 from emptycrown/jerry/migrate_slack_changes 2023-03-11 15:35:59 -08:00
Jerry Liu
78bc97e9ed
Merge pull request #73 from ahmetkca/github-reader-test-and-fix 2023-03-11 15:32:24 -08:00
Jerry Liu
5b62821038 cr 2023-03-11 15:30:42 -08:00
Jerry Liu
92906aadc2 cr 2023-03-11 15:27:06 -08:00
Jerry Liu
13131d3e98
Merge pull request #94 from pandazki/main
feat: add readability webpage loader
2023-03-11 14:55:36 -08:00
Jerry Liu
206eb492c8 cr 2023-03-11 14:48:20 -08:00
Jerry Liu
692bbab132 cr 2023-03-11 14:41:42 -08:00
ahmetkca
179acb1b7c Merge remote-tracking branch 'upstream/main' into github-reader-test-and-fix 2023-03-11 14:35:39 -05:00
ahmetkca
ab63daf7ae add minor addition to test cases for filtering directories and file extensions 2023-03-11 14:34:34 -05:00
ahmetkca
d66823b0db Update README.md for github_repo 2023-03-11 14:33:27 -05:00
ahmetkca
fc255d595f Use if-block instead of match for backward compatibility. 2023-03-11 14:06:36 -05:00