Alon Diament
faca98e1dd
Added content arg to MarkdownReader ( #296 )
2023-05-28 11:42:42 -07:00
bhavishpahwa
f4ff2f95a6
Fix base.py for flat_pdf ( #285 )
...
Fixed a bug introduced by commit a37df8c221, which changed the ImageReader output (diff-d4cdba126dd16c86090107c1c66621a520d20d8962f7f4dd461146cedbb1135d)
2023-05-22 20:50:36 -07:00
Anoop Sharma
0e2d5c6ba2
File reference metadata ( #280 )
2023-05-21 09:13:35 -07:00
Logan
62f94d0eba
add concat rows to pandas excel ( #262 )
...
Co-authored-by: Jerry Liu <jerryjliu98@gmail.com>
2023-05-17 09:16:05 -07:00
Simon Suo
a37df8c221
Update after refactoring away parsers in LlamaIndex, also update docs to 0.6.0 API ( #264 )
2023-05-16 23:26:33 -04:00
Jerry Liu
4682373984
fix deepdoctection reqs ( #258 )
2023-05-14 23:40:14 -04:00
Arun Brahma
1fe452119c
Updated gpt_index to llama_index in ReadMe files ( #252 )
2023-05-10 23:26:04 -07:00
Arun Brahma
fc6a8c04f4
updated extra_info dict and readme file ( #250 )
2023-05-10 12:16:29 -07:00
bhavishpahwa
49adaaa19b
Change gpt_index to llama_index in pdf loader readme ( #241 )
2023-05-05 16:14:08 -07:00
Arun Brahma
88827f9305
updated params in load func and readme file ( #236 )
2023-05-03 17:35:39 -07:00
Arun Brahma
d2a2b58497
feat: Added PyMuPDF loader for PDF files ( #227 )
...
Co-authored-by: Jerry Liu <jerryjliu98@gmail.com>
2023-05-03 10:19:57 -07:00
Sam S. Yu
4c521b0b72
Update pandas_excel loader ( #216 )
2023-04-29 21:49:31 -07:00
Raghav Mecheri
8fffb8462e
Quick patch to get mbox reading working again ( #222 )
2023-04-29 21:03:09 -07:00
Jerry Liu
410f233d94
add deepdoctection reader ( #217 )
2023-04-27 01:26:26 -07:00
Jerry Liu
190069a6ed
add more data loaders (image and ipynb) ( #214 )
2023-04-26 10:43:55 -07:00
Sivasurya Santhanam
1b2e016a36
utf-8 encoding fixes UnicodeDecodeError ( #196 )
...
Opening the file with encoding='utf-8' fixes the following UnicodeDecodeError:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1094: character maps to <undefined>
OS: Windows 10 x64
2023-04-15 00:09:11 -07:00
YasuhiroKume
bef079f726
fix flat_pdf readme ( #184 )
2023-04-12 13:46:05 -07:00
Ravi Theja
fcf9e87f90
Add transcription using gladia ( #164 )
...
Co-authored-by: Jerry Liu <jerryjliu98@gmail.com>
2023-04-07 22:58:04 -07:00
Sarmad Qadri
8312da4ee8
Rename input_file to file to match other file reader load_data function signatures ( #160 )
2023-04-02 16:53:31 -07:00
Emmanuel
842412a3a2
feat: Created a loader for flattened pdfs ( #148 )
2023-04-01 10:25:17 -07:00
Smyja
3948ec0d22
No variable called file. ( #124 )
2023-03-16 22:56:24 -07:00
Jerry Liu
fdb8c86e42
Update image parser ( #111 )
2023-03-14 23:13:43 -07:00
Jesse Zhang
f220fefc94
New paged csv reader ( #105 )
...
* New paged csv reader. Makes more sense
* CR
2023-03-13 18:22:08 -07:00
Jerry Liu
5adde59e4f
Merge remote-tracking branch 'upstream/main' into add_excel_loader
2023-03-10 16:37:47 -08:00
Jerry Liu
fe60d7ce59
add json reader ( #86 )
...
* cr
* cr
2023-03-06 16:38:11 -08:00
Lucas Maccarini
c661e6ee59
Added excel loader
2023-03-06 19:49:08 -03:00
EmptyCrown
3f6e5af8f9
Fix test
2023-02-26 20:25:35 -08:00
Tommaso Soru
c2a2340c36
Fix imports in RDFReader. ( #67 )
2023-02-26 20:08:24 -08:00
EmptyCrown
457e7888e9
Cleanup
2023-02-24 23:39:32 -08:00
Tommaso Soru
049c3f1896
Add RDF file loader ( #63 )
...
* RDF file loader.
* Add RDF file loader to json.
* Update base.py
* Update base.py
---------
Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-24 23:30:56 -08:00
Jerry Liu
e631266036
cr ( #56 )
...
Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-02-22 09:43:29 -08:00
Jerry Liu
e97bb81915
swap out gpt_index imports for llama_index imports ( #49 )
...
* cr
* cr
* cr
---------
Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-02-20 21:46:58 -08:00
Shimajiro
5c2315ff3d
fix typo ( #47 )
2023-02-17 21:21:23 -08:00
EmptyCrown
5a5d94bcdc
Polish
2023-02-17 19:28:21 -08:00
Ravi Theja
b6d4b6b1a7
Add tesseract model for plain text image ( #46 )
...
* Add tesseract model for plain text image
* Update recommended changes
2023-02-17 19:18:00 -08:00
EmptyCrown
615c21a2c8
Cleanup
2023-02-15 20:19:37 -08:00
EmptyCrown
a1ab2d7738
Updated CJK library and readme
2023-02-15 09:06:30 -08:00
Shimajiro
6b6d93bc8f
Add Japanese PDF reader ( #35 )
...
* create
* add JapanesePDFReader
* Improved text extraction stability. Fixed bug that caused some PDFs to fail.
* modify class name and comment
2023-02-15 09:03:03 -08:00
Jesse Zhang
2d9c0f3580
Add optional setting for whether to caption images in pptx ( #33 )
...
* Add setting for whether to run HF model to caption pptx images. Default to false
* Update readme
* Lint
2023-02-13 08:55:35 -08:00
EmptyCrown
cd3219de68
Remove mentions of verbose
2023-02-10 14:47:37 -08:00
EmptyCrown
7869c1a331
Update README
2023-02-10 09:04:27 -08:00
Jesse Zhang
a70e60c94b
Remote reader ( #17 )
...
* Small bug fixes
* Remote loader for pages/files
* Add to library
2023-02-09 17:27:20 -08:00
EmptyCrown
9fbe79afa3
Unstructured readme
2023-02-09 08:34:27 -08:00
EmptyCrown
0ff027d210
Fix bug with unstructured
2023-02-09 08:30:33 -08:00
EmptyCrown
6ec52ecd2f
Small bug fixes
2023-02-09 00:55:35 -08:00
Jerry Liu
b53ab52c84
cr ( #16 )
...
Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-02-08 23:12:44 -08:00
Ajinkya Indulkar
6c194fad66
🐛 fix file loading in PandasCSVReader ( #14 )
2023-02-08 12:58:27 -08:00
Jesse Zhang
e0fe338fe6
Unstructured.io loader ( #12 )
...
* Unstructured.io loader
* Formatting python in readme
* Added split_documents arg
* Readme tweak
2023-02-07 22:12:24 -08:00
EmptyCrown
4a418c02fe
Refactors mbox loader
2023-02-06 23:54:33 -08:00
Jerry Liu
12d5ce89b8
cr ( #8 )
...
Co-authored-by: Jerry Liu <jerry@robustintelligence.com>
2023-02-06 23:34:00 -08:00