2539 Commits

Author SHA1 Message Date
Julian Risch
90f826e95e
Add links to tutorial 12 to readme (#1274) 2021-07-13 11:23:10 +02:00
Julian Risch
2a90471c73
Encapsulate tutorial code in method (#1266) 2021-07-09 17:08:19 +02:00
Julian Risch
dbb9efbd39
Add SentenceTransformersRanker with pre-trained Cross-Encoder (#1209)
* Add SentenceTransformersRanker with pre-trained Cross-Encoder

* Add test cases for Ranker nodes and update documentation

* update docstring

* Update docstring

* Update __init__.py

* update import for test

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-07-07 17:31:45 +02:00
Moshe Berchansky
495f98deba
Add global_loss_buffer_size to the DensePassageRetriever, in order to fix 'encoded data exceeds max_size' error with DDP. (#1245) 2021-07-06 13:56:41 +02:00
Ikram Ali
f5a8d3cf45
Add id in write_labels() for SQLDocumentStore (#1253) 2021-07-05 14:13:21 +02:00
Ikram Ali
8e117f5e11
ElasticsearchDocumentStore get_label_count() bug fixed. (#1252) 2021-07-03 20:51:33 +02:00
Ikram Ali
04a470f890
SQLDocumentStore get_label_count() bug fixed. (#1251) 2021-07-03 14:02:44 +02:00
Michaël Bitard
aaed22304d
Fix integer conversion of CONCURRENT_REQUEST_PER_WORKER (#1247) 2021-07-02 20:38:15 +02:00
Ikram Ali
29e140196b
[pipeline] Allow for batch indexing when using Pipelines fix #1168 (#1231)
* [pipeline] Allow for batch indexing when using Pipelines fix #1168

* [pipeline] Test case fixed fix #1168

* [file_converter] Path.suffix updated #1168

* [file_converter] meta can be one of these three cases:
    - A single dict that is applied to all files
    - One dict for each file being converted
    - None #1168

* [file_converter] mypy error fixed.

* [file_converter] mypy error fixed.

* [rest_api] batch file upload introduced in indexing API.

* [test_case] Test_api file upload parameter name updated.

* [ui] Streamlit file upload parameter updated.
2021-06-30 14:13:46 +02:00
Malte Pietsch
5e23e72f31
Update issue templates 2021-06-30 12:12:07 +02:00
Guillim
73a4f9825a
Add env var CONCURRENT_REQUEST_PER_WORKER (#1235)
* Create an env var `CONCURRENT_REQUEST_PER_WORKER` following your naming convention (I went back a few commits to find the original name)

* default to 4
2021-06-29 07:44:25 +02:00
Malte Pietsch
2c964db62d
Relax typing for meta data in REST API (#1224) 2021-06-24 12:34:42 +02:00
Malte Pietsch
2caeea000e
Small UI and REST API fixes (#1223)
* small fixes

* change default question
2021-06-24 09:53:08 +02:00
Julian Risch
17dcb8c23e
Use Reader's device by default (#1208)
* Use Reader's device by default

* Replace get_device with initialize_device_settings

* Add import statements for init_device_settings

* Remove unused get_device method
2021-06-24 09:22:34 +02:00
Branden Chan
10e332dabb
Fix Links (#1199)
* Fix link highlight

* Regen md files

* Remove duplicate

* Fix whitespace

* fixing strings for website

* Fix link

Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-06-23 19:07:54 +02:00
Branden Chan
efc03f72db
Make PreProcessor.process() work on lists of documents (#1163)
* Add process_batch method

* Rename methods

* Fix doc string, satisfy mypy

* Fix mypy CI

* Fix typo

* Update tutorial

* Fix argument name

* Change arg name

* Incorporate reviewer feedback
2021-06-23 18:13:51 +02:00
oryx1729
afee4f36ce
Add scaffold for defining custom components for Pipelines (#1205) 2021-06-23 12:01:54 +02:00
vblagoje
02fc4c7783
Improve document stores unit test parametrization (#1202) 2021-06-22 16:08:23 +02:00
Markus Paff
a8f3601e6a
Pin docs for 0.9.0 2021-06-22 10:38:08 +02:00
Ikram Ali
d835a9cdc5
[setup] version tag added to Haystack fix #1175 (#1216) 2021-06-22 09:43:26 +02:00
Stefano
66049abff0
Add arg to support different languages in PreProcessor's sentence segmentation (#1160)
* Add PreProcessor optional language parameter.

* Add iso639 to nltk languages.

* Update docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-21 18:53:19 +02:00
Julian Risch
9e4d7bf9be
Increase Haystack version to 0.9.0 (#1215) v0.9.0 2021-06-21 18:39:00 +02:00
oryx1729
0168f04385
Remove unused function _get_pseudo_prob (#1201) 2021-06-17 10:28:48 +02:00
C V Goudar
f9c4083006
Bugfix setting of device by defaulting to "cpu" (#1182)
* Defaulting the device to cpu in case gpu is not available and use_gpu is set to True

Co-authored-by: C V Goudar <cv.goudar@emplay.et>
2021-06-16 10:26:29 +02:00
Markus Paff
6cd49105e7
update api markdown files and add markdown file for ranker (#1198)
* update api markdown files and add markdown file for ranker

* added docstrings for weaviate

* new version of pydoc-markdown does not render arguments correctly. We used pydoc-markdown==3.11.0
2021-06-15 17:50:08 +02:00
Julian Risch
215c45eb8a
Remove quickfix from reader and ranker (#1196)
* Remove quickfix from ranker

* remove quickfix from reader

* Use inferencer's model instead of reloaded model
2021-06-15 09:46:11 +02:00
Branden Chan
7dbd58f6be
Add about sections (#1195) 2021-06-14 18:37:00 +02:00
vblagoje
2a5882578a
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086)
* Integrate LFQA with Haystack

* Integrate LFQA with Haystack - unit tests

* Properly initialize conftest default value for vector_dim

* Update PR after initial feedback

* Fix conftest.py import

* Seq2SeqGenerator uses Callables instead of subclasses for custom model input

* Update docstring

* Fix Callable use

* Add LFQA tutorials

* Improve type error reporting for invalid input converter Callable

* Generate docstrings

* Format comments in tutorial script

* Generate tutorial md

* Add usage page

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-06-14 17:53:43 +02:00
venuraja
ae55927f58
Weaviate: Update Embeddings - Use update instead of replace (#1181)
* Update Embeddings logic improved

* Update Embeddings logic improved
2021-06-14 17:50:55 +02:00
Shahrukh Khan
1a3b4b9c74
Fix typo in Query Classifier Exception Message (#1190) 2021-06-14 17:40:35 +02:00
Julian Risch
f6e70f0f3d
Removed single_model_path; added infer_tokenizer to dpr load() (#1060) 2021-06-14 14:14:46 +02:00
Julian Risch
1c31589b43
Bump to FARM 0.8.0, torch 1.8.1 and transformers 4.6.1 (#1192)
* bump to FARM 0.8.0, which in turn bumps torch 1.8.1 and transformers 4.6.1 (#1192)

* Replace deprecated force_bos_token_to_be_generated parameter
2021-06-14 13:00:41 +02:00
Bob van Luijt
f583d0bfaf
Minor change with a link to the Weaviate docs (#1180)
Super minor change, but in line with other DocumentStores
2021-06-11 21:20:23 +02:00
Branden Chan
e7937ac5d7
Reformat FAQ page (#1177)
* Add faq page

* Update faq.md

* Fix mypy CI

* Add question

* Reformat faq
2021-06-11 11:59:52 +02:00
Branden Chan
783893c3d2
Tutorial update (#1166)
* Add header / footer

* Add Milvus example

* Generate md files

* Fix mypy CI
2021-06-11 11:09:15 +02:00
Branden Chan
13edff109d
Documentation update (#1162)
* Add content

* Add German BERT references

* Mention preprocessor language

* Fix mypy CI

* Add document length recommendation

* Add more languages
2021-06-11 11:06:57 +02:00
Branden Chan
41b537affe
Add FAQ page (#1151)
* Add faq page

* Update faq.md

* Fix mypy CI

* Add question
2021-06-10 17:29:14 +02:00
venuraja79
49886f88f0
Integrate Weaviate as another DocumentStore (#1064)
* Annotation Tool: data is not persisted when using local version #853

* First version of weaviate

* First version of weaviate

* First version of weaviate

* Updated comments

* Updated comments

* ran query, get and write tests

* update embeddings, dynamic schema and filters implemented

* Initial set of tests and fixes

* Tests added for update_embeddings and delete documents

* introduced duplicate documents fix

* fixed mypy errors

* Added Weaviate to requirements

* Fix the weaviate docker env variables

* Fixing test dependencies for now

* Created weaviate test marker and fixed query

* Update docstring

* Add documentation

* Bump up weaviate version

* Bump up weaviate version in documentation

* Bump up weaviate version in documentation

* Upgrade weaviate version

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-10 09:43:53 +02:00
Lalit Pagaria
db17d73a82
Fixing issues caused due to mypy upgrade (#1165) 2021-06-09 16:24:39 +02:00
Branden Chan
5f0f85989a
Refresh API docs (#1152) 2021-06-09 16:13:58 +02:00
Shahrukh Khan
545c625a37
Add QueryClassifier incl. baseline models (#1099)
* restructure query classifier code and add s3 based pickles

* make model and vectorizer optional in query classifier

* update query classifier as per init style

* add query classifiers sklearn/hf

* update docstrings for query classifiers

* add unit test for query classifier

* add type patch for sklearn classifier

* fix mypy type issue

* revert to pure formatting

* add query classifiers

* resolve conflict

* add output names for query classifier

* revert output and update docstring queryclassifier

* Update docstring for SklearnQueryClassifier

* update transformer query classifier docstring

* fix typo

* change arg names in query classifier classes

* add set_config(). rename attributes

* fix set_config()

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-08 15:20:13 +02:00
Malte Pietsch
600636e77b
Update README.md 2021-06-08 09:23:56 +02:00
Branden Chan
59e3c55c47
Add More top_k handling to EvalDocuments (#1133)
* Improve top_k support

* Adjust warning

* Satisfy mypy

* Reinit eval counts if top_k has changed

* Incorporate reviewer feedback
2021-06-07 12:11:00 +02:00
Branden Chan
c513865566
Add L2 support for FAISS HNSW (#1138) 2021-06-04 11:05:18 +02:00
Julian Risch
580e28344d
Add docu of confidence scores and calibration method (#1131)
* Add docu of confidence scores and calibration method
2021-06-03 15:49:07 +02:00
Malte Pietsch
a1472b040c
Add badges (#1136) 2021-06-03 14:47:08 +02:00
Malte Pietsch
b41719b7c8
Add config to JoinDocuments node to allow yaml export in pipelines (#1134)
* add config to JoinNode to allow yaml export

* remove test print
2021-06-03 11:03:25 +02:00
Julian Risch
8e3d0d1287
Distinguish labels for calculating similarity scores (#1124)
* Distinguish labels for calculating similarity scores

* Explain label "0" and "1" of TextPairClassifier in Ranker
2021-06-02 17:33:36 +02:00
Branden Chan
b555bc525c
Remove duplicate run (#1132) 2021-06-02 13:58:55 +02:00
Branden Chan
09ba75073c
Improve Milvus HNSW Performance (#1127)
* Add simplified script

* Optimize HNSW index creation

* Adjust benchmark order

* Rename script
2021-06-02 13:17:35 +02:00