3803 Commits

Author SHA1 Message Date
Bob van Luijt
c0cc8bc80f
Bump Weaviate version to 1.7.0 (#1412)
* Bump Weaviate

* Bump Weaviate

* Bump Weaviate client

* Bump Weaviate

* Revert client version

There is a change in the client API that needs to be addressed before bumping its version
2021-09-05 09:28:55 +02:00
Malte Pietsch
f3e7074c13
Remove stale bot 2021-09-03 17:39:24 +02:00
Malte Pietsch
f3d1df1664
Enable docker-compose for GPUs & Add public UI image (#1406)
* add docker-compose-gpu file

* Update README.md

* Update docker-compose.yml

* Update docker-compose-gpu.yml

* Update docker-compose.yml

* Update docker-compose-gpu.yml
2021-09-02 17:39:21 +02:00
Malte Pietsch
bb9ec90d3c
Fix tesseract installation in Dockerfile (#1405)
* Fix Dockerfile

* Update Dockerfile-GPU
2021-09-02 11:09:30 +02:00
bogdankostic
38128c6734
Ensure num_hard_negatives is 0 when embedding passages (#1402) 2021-09-02 10:46:02 +02:00
Julian Risch
b552bf9b4d
Add sentence-transformers as mandatory dependency and remove from dev… (#1387)
* Add sentence-transformers as mandatory dependency and remove from dev dependency

* Pin sentence-transformers version
2021-09-02 09:54:13 +02:00
Branden Chan
980d88a0f2
Update faq model (#1401) 2021-09-01 18:39:06 +02:00
Malte Pietsch
e4c3c3d423
Fix CI (introduced by OCR PR #1349) (#1399)
* satisfy mypy

* add import
2021-09-01 17:16:05 +02:00
Malte Pietsch
6093bf9ff6
Fix Github action 2021-09-01 16:50:29 +02:00
Shahrukh Khan
4822536886
Add ImageToTextConverter and PDFToTextOCRConverter that utilize OCR (#1349)
* add image.py converter

* add PDFtoImageConverter

* add init to PDFtoImageConverter and classes to __init__

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* revert change in base.py in file_conv

* Update base.py

* Update pdf.py

* add ocr file_converter testcase & update dockerfile

* fix tesseract exception message typo

* fix _image_to_text docstring

* add tesseract installation to CI

* add tesseract installation to CI

* add content test for PDF OCR converter

* update PDFToTextOCRConverter constructor docstring

* replace image files with tmp paths for image.py convert

* replace image files with tmp paths for image.py convert

* Update README.md

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-01 16:42:25 +02:00
oryx1729
1d2252e96d
docker-compose always pull REST API Image (#1385) 2021-09-01 16:28:25 +02:00
Ikram Ali
3fc7f3f695
[docs] crawler api docs updated. (#1388) 2021-09-01 12:07:32 +02:00
Branden Chan
4021eb838e
Add weaviate to init (#1379) 2021-08-31 15:23:06 +02:00
Branden Chan
1938fb001b
Add support for no Docker envs in Tutorial 13 (#1365)
* Add support for no docker envs e.g. colab

* Generate md
2021-08-31 15:22:51 +02:00
oryx1729
a71180a2ca
Refactor replicas config for Ray Pipelines (#1378) 2021-08-31 10:14:55 +02:00
Ikram Ali
da5ed43734
Catch Elastic's search_phase_execution and raise with descriptive message. (#1371)
* [document_store] Catch Elastic's search_phase_execution_exception (dense retrieval if not all documents have an embedding) closes #1135

* change error msg

* remove unused import

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-30 19:38:07 +02:00
Jeff Hammerbacher
1c8a03aaa2
Rag tutorial fixes (#1375)
* Update Tutorial7_RAG_Generator.ipynb

`delete_all_documents` --> `delete_documents` (cf. #1045)

* Update Tutorial7_RAG_Generator.py

`delete_all_documents` --> `delete_documents` (cf. #1045)
2021-08-30 15:27:18 +02:00
cambiumproject
4ca97dd5be
Fix behavior of delete_documents() with filters for Milvus (#1354)
* Fix behavior of delete_documents()

Delete filtered set of vectors rather than the whole collection

* Update milvus.py

* Update milvus.py
2021-08-30 15:22:53 +02:00
ramgarg102
51f0a56e5d
delete_all_documents() replaced by delete_documents() (#1377)
* [UPDT] delete_all_documents() replaced by delete_documents()

* [UPDT] warning logs to be fixed

* [UPDT] delete_all_documents() renamed and the same method added

Co-authored-by: Ram Garg <ramgarg102@gmai.com>
2021-08-30 15:18:28 +02:00
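
The message for #1377 says the old method was renamed and "the same method added", which we read as keeping `delete_all_documents()` as a deprecated alias for `delete_documents()`. A minimal sketch of that pattern follows. The `InMemoryStore` class and the filter format (a dict mapping a field name to a list of allowed values) are assumptions made for illustration, not Haystack's actual implementation.

```python
import warnings


class InMemoryStore:
    """Hypothetical stand-in for a document store (not Haystack's class)."""

    def __init__(self):
        self.documents = []

    def delete_documents(self, filters=None):
        """Delete documents matching `filters`, or all documents if none are given."""
        if not filters:
            self.documents = []
            return
        # Keep only the documents that fail at least one filter condition.
        self.documents = [
            doc for doc in self.documents
            if not all(doc.get(key) in allowed for key, allowed in filters.items())
        ]

    def delete_all_documents(self, filters=None):
        """Deprecated alias so that existing callers keep working."""
        warnings.warn(
            "delete_all_documents() is deprecated; use delete_documents() instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        self.delete_documents(filters)
```

Forwarding from the old name rather than duplicating the logic means a filter fix only has to be made in one place.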
Markus Paff
be8d305190
Editing docs read.me for new docs website workflow (#1372)
* editing docs read.me for new docs website workflow

* added new links to docs
2021-08-30 14:59:40 +02:00
Shahrukh Khan
c3d8aa0643
Add query classifier usage docs (#1348)
* Create query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-24 15:56:11 +02:00
Ikram Ali
ead96730d3
Add Crawler support for indexing pipeline (#1360) 2021-08-24 14:25:22 +02:00
Markus Paff
cac15310bd
adding tutorial 13 and 14 (#1364) 2021-08-23 11:37:06 +02:00
Malte Pietsch
2a226daac4
Add simple docs2answer node to allow FAQ style QA / Doc search in API (#1361)
* minimal docs2answer node

* enable logs again
2021-08-20 17:01:55 +02:00
Markus Paff
ff2049cd45
updated tutorials (#1359) 2021-08-19 21:16:56 +02:00
annagruendler
a3c746abf5
Update test documentation in readme (#1355) 2021-08-19 10:36:21 +02:00
Ikram Ali
ef27f0d386
Add tests for Crawler (#1339) 2021-08-18 14:05:44 +02:00
Branden Chan
a023f0a32a
Support OpenDistro init (#1334)
* Support OpenDistro init

* Fix docstring
2021-08-17 12:07:36 +02:00
Julian Risch
eb990c9688
Removing probability field from answers in favor of score field (#1340)
* Removing probability field from reader and from test cases

* Add switch to FARMReader to choose score/probability

* Remove probability field from doc returned by doc store

* Relax assertion testing joined es and dpr predictions

* Use switch for confidence scores also for no_answer

* Add test that checks switching to old answer scores > 10

* Normalize score in elastic doc store and reset reader.md

* Scale weights of JoinDocuments to sum to 1 and adapt test case
2021-08-17 10:27:11 +02:00
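
The last bullet of #1340 mentions scaling the `JoinDocuments` weights so that they sum to 1. A rough sketch of what such a normalization and weighted merge could look like is given here. The function names `normalize_weights` and `join_scores`, and the representation of each retriever's results as a dict from document id to score, are hypothetical and are not taken from the Haystack code base.

```python
from collections import defaultdict


def normalize_weights(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [w / total for w in weights]


def join_scores(results, weights):
    """Merge per-retriever scores into one weighted score per document.

    `results` holds one dict per retriever, mapping document id to score
    (an assumed format, chosen only for this illustration).
    """
    scaled = normalize_weights(weights)
    combined = defaultdict(float)
    for scores, weight in zip(results, scaled):
        for doc_id, score in scores.items():
            combined[doc_id] += weight * score
    return dict(combined)
```

With normalized weights, the merged score stays within the range of the input scores, so it remains comparable across different weight settings.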
Julian Risch
e7b3e2764c
Add link to arxiv paper on SAS (#1344) 2021-08-16 10:47:27 +02:00
Tanay Pant
79df82aec6
Remove empty bullet points (#1342) 2021-08-12 20:09:18 +02:00
Timo Moeller
07bd3c50ea
Add new QA eval metric: Semantic Answer Similarity (SAS) (#1338)
* init

* Add type annotation

* Add test case, fix mypy

* Add german model to docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-12 14:31:48 +02:00
Bob van Luijt
ba071cc052
Bump Weaviate version (#1336) 2021-08-12 09:54:09 +02:00
Markus Paff
7569ab97dd
Add faq annotation (#1333)
* add annotation faq to read.me

* design fix

* add faq to docs page

* changed format
2021-08-10 14:55:31 +02:00
Malte Pietsch
be9d19afa5
Remove Finder from tutorials (#1329) 2021-08-10 11:50:59 +02:00
Ikram Ali
d94674c5b6
Remove finder class from tutorial 1 (#1328) 2021-08-10 11:41:07 +02:00
Malte Pietsch
5e16ec4d76
Fix installation in Colab Tutorial 11 2021-08-10 08:50:04 +02:00
Malte Pietsch
a0921f0c35
Remove Finder (#1326)
* deprecate finder

* remove import

* add doc section for moving from finder to pipelines
2021-08-09 13:41:40 +02:00
Bishal gaire
4198dc6feb
Update docstring for RAG (#1149)
* Update 7.md

Initialize retriever in RAG generator

* update docstring

* Update 7.md

* Update 7.md

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-09 11:52:45 +02:00
Malte Pietsch
66b10a508b
Update TOC of readme 2021-08-09 11:40:20 +02:00
Malte Pietsch
fb4d6e0381
Update README.md 2021-08-09 11:25:47 +02:00
Malte Pietsch
5a3ea5843f
Fix Tutorial Links 2021-08-09 11:22:19 +02:00
Shahrukh Khan
f99c14268a
Update README.md for new tutorials 13 and 14 (#1325)
* Update README.md

* Update README.md
2021-08-09 10:44:42 +02:00
Shahrukh Khan
cc43502e7e
Add Tutorial about Query Classifier (#1324)
* add query classifier colab and jupyter notebook

* Delete Tutorial13_Query_Classifier.ipynb

* add query classifier tutorial with updated number

* add query classifier tutorial script

* Rename tutorial14_query_classifier.py to Tutorial14_Query_Classifier.py
2021-08-09 10:43:50 +02:00
Markus Sagen
60cce4e579
Allow to upload multiple files simultaneously in Haystack UI (#1323)
* Fix only one file upload for Haystack UI

When using the Haystack UI with Streamlit, the default behavior is to upload one file at a time and overwrite the file you have already uploaded. Streamlit now supports uploading multiple files, which may be a more intuitive default for Haystack users.

The return type of `st.sidebar.file_uploader` when `accept_multiple_files=True` is a list of the files, which is empty if no files are provided

* Update requirements.txt

* Update requirements.txt

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-06 17:45:36 +02:00
David Silva
fb4a417e53
Integrate filters with knn queries in OpenDistroElasticsearchDocumentStore (#1301)
* Integrate filters with knn queries in ODFE

Allows the use of filters coupled with knn similarity search on
OpenDistroElasticsearchDocumentStore instances. Fixes #1139.

* Satisfy mypy

Co-authored-by: David Silva <44328951+DavidSilva98@users.noreply.github.com>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-08-03 16:31:13 +02:00
Ikram Ali
6829446b0e
Transformer summarizer truncation bug fixed (#1309)
* [Summarizer] truncation bug fixed. fix #1296

* [Summarizer] Fixes #1296

* [Summarizer] warning added Fixes #1296

* [Summarizer] code refactor. Fixes #1296

* Avoid repeating warning message

* Add type check

* Format string

Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-08-02 18:02:39 +02:00
oryx1729
bafa1b46de
Add Ray integration for Pipelines (#1255) 2021-08-02 14:51:24 +02:00
oryx1729
3eaf9dfbca
Suppress FAISS logs & apex warnings (#1315) 2021-07-29 14:32:50 +02:00
oryx1729
849cf27e74
Pin Weaviate version (#1306) 2021-07-27 16:44:07 +02:00