3803 Commits

Author SHA1 Message Date
Branden Chan
937247d628
Add QuestionGenerator (#1267)
* Create basic Question Generation

* Split texts into 50 word chunks

* Allow prompt to be changed

* Implement iteration functionality in DS

* Add docstrings, create pipelines

* Make pipelines work

* Add comments

* Add tests

* Add tutorials and docs

* Add doc string
2021-07-26 17:20:43 +02:00
Branden Chan
363be65a78
Implement OpenSearch ANN (#1225)
* Simplify ODES init

* Add arguments to ES init and create script

* Rename similarity_fn_name and add util fn

* Create OpenSearchDocumentStore

* Specify params of Open Search HNSW

* Add better argument handling

* Update opensearch index mapping

* Edit opensearch default port

* Fix HNSW mapping

* Force small HNSW params

* Implement auto start and stopping of document store services

* Fix starting and stopping of ds service

* Restore HNSW params

* Add opensearch query benchmarks

* Add write wait time

* Revert wait time

* Add timeout

* Update benchmarks

* Update benchmarks

* Update benchmarks json

* Update documentation

* Update documentation

* Fix similarity name

* Improve argument passing

* Improve stopping and starting of service
2021-07-26 10:52:52 +02:00
Malte Pietsch
4c2a0b914a
Remove pipeline eval example script (#1297) 2021-07-21 11:12:04 +02:00
Srevin Saju
7d6548100a
Add support for elasticsearch to connect without any authentication (#1294) 2021-07-21 10:47:52 +02:00
oryx1729
e857233313
Add Header in sample REST API Search Request (#1293) 2021-07-19 12:57:43 +02:00
oryx1729
3f58d4c13b
Fix SQLAlchemy relationship warnings (#1289) 2021-07-15 17:59:59 +02:00
Bob van Luijt
8dae844447
Bump Weaviate version to 1.5 (#1287)
* bump Weaviate version to 1.5

* bump Weaviate version to 1.5
2021-07-15 08:26:22 +02:00
Ikram Ali
97c1e2cc90
[document_store] Raise warning when labels are overwritten (#1257)
* [document_store]SQLDocumentStore write_labels() overwrite warning added.

* [document_store]SQLDocumentStore write_labels() overwrite warning added.

* [document_store] bug fixed. #1140

* [document_store] bug fixed. #1140

* [document_store] get_labels_by_id() method removed. #1140

* [document_store] Code refactor. fix #1140

* [document_store] Code refactor. fix #1140

* [document_store] elasticsearch document store Code refactor. fix #1140

* [document_store] elasticsearch document store Code refactor. fix #1140

* [document_store] elasticsearch document store Code refactor. fix #1140

* [document_store] Code refactor for better visibility. fix #1140

* [document_store] Inmemory document store duplicate labels warning added fix #1140
2021-07-14 16:21:04 +02:00
Branden Chan
da97d81305
Change variable names (#1286) 2021-07-14 14:03:34 +02:00
Branden Chan
7717e81ecc
Improve preprocessing logging (#1263)
* Improve preprocessing logging

* Change variable names

* Change variable names

* Satisfy mypy
2021-07-14 14:03:13 +02:00
oryx1729
c318b5853b
Serialize crawler output to JSON (#1284) 2021-07-14 13:16:27 +02:00
Julian Risch
4e6f7f349d
Add FARMClassifier node for Document Classification (#1265)
* Add FARM classification node

* Add classification output to meta field of document

* Update usage example

* Add test case for FARMClassifier

* Replace FARMRanker with FARMClassifier in documentation strings

* Remove base method not implemented by any child class, etc.
2021-07-13 21:44:26 +02:00
Antonio De Marinis
f79d9bdca6
Upgrade streamlit and adjust height of result texts dynamically (#1279)
* update to latest streamlit and st-annotated-text

* improve ui results by passing dynamic height to annotated-text
2021-07-13 18:59:39 +02:00
threepointsomeone
2f93c2ddd5
Added explicit refresh call when refresh_type is false in update embedding. (#1259)
Co-authored-by: vishwaspai <vishwas.pai@emplay.net>
2021-07-13 16:59:09 +02:00
Julian Risch
90f826e95e
Add links to tutorial 12 to readme (#1274) 2021-07-13 11:23:10 +02:00
Julian Risch
2a90471c73
Encapsulate tutorial code in method (#1266) 2021-07-09 17:08:19 +02:00
Julian Risch
dbb9efbd39
Add SentenceTransformersRanker with pre-trained Cross-Encoder (#1209)
* Add SentenceTransformersRanker with pre-trained Cross-Encoder

* Add test cases for Ranker nodes and update documentation

* update docstring

* Update docstring

* Update __init__.py

* update import for test

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-07-07 17:31:45 +02:00
Moshe Berchansky
495f98deba
Add global_loss_buffer_size to the DensePassageRetriever, in order to fix 'encoded data exceeds max_size' error with DDP. (#1245) 2021-07-06 13:56:41 +02:00
Ikram Ali
f5a8d3cf45
Add id in write_labels() for SQLDocumentStore (#1253) 2021-07-05 14:13:21 +02:00
Ikram Ali
8e117f5e11
ElasticsearchDocumentStore get_label_count() bug fixed. (#1252) 2021-07-03 20:51:33 +02:00
Ikram Ali
04a470f890
SQLDocumentStore get_label_count() bug fixed. (#1251) 2021-07-03 14:02:44 +02:00
Michaël Bitard
aaed22304d
Fix integer conversion of CONCURRENT_REQUEST_PER_WORKER (#1247) 2021-07-02 20:38:15 +02:00
Ikram Ali
29e140196b
[pipeline] Allow for batch indexing when using Pipelines fix #1168 (#1231)
* [pipeline] Allow for batch indexing when using Pipelines fix #1168

* [pipeline] Test case fixed fix #1168

* [file_converter] Path.suffix updated #1168

* [file_converter] meta can be one of three cases: a single dict that is applied to all files, one dict for each file being converted, or None #1168

* [file_converter] mypy error fixed.

* [file_converter] mypy error fixed.

* [rest_api] batch file upload introduced in indexing API.

* [test_case] Test_api file upload parameter name updated.

* [ui] Streamlit file upload parameter updated.
2021-06-30 14:13:46 +02:00
Malte Pietsch
5e23e72f31
Update issue templates 2021-06-30 12:12:07 +02:00
Guillim
73a4f9825a
Add env var CONCURRENT_REQUEST_PER_WORKER (#1235)
* Create an env var `CONCURRENT_REQUEST_PER_WORKER` following the existing naming convention (the original name was found a few commits back)

* default to 4
2021-06-29 07:44:25 +02:00
Malte Pietsch
2c964db62d
Relax typing for meta data in REST API (#1224) 2021-06-24 12:34:42 +02:00
Malte Pietsch
2caeea000e
Small UI and REST API fixes (#1223)
* small fixes

* change default question
2021-06-24 09:53:08 +02:00
Julian Risch
17dcb8c23e
Use Reader's device by default (#1208)
* Use Reader's device by default

* Replace get_device with initialize_device_settings

* Add import statements for init_device_settings

* Remove unused get_device method
2021-06-24 09:22:34 +02:00
Branden Chan
10e332dabb
Fix Links (#1199)
* Fix link highlight

* Regen md files

* Remove duplicate

* Fix whitespace

* fixing strings for website

* Fix link

Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-06-23 19:07:54 +02:00
Branden Chan
efc03f72db
Make PreProcessor.process() work on lists of documents (#1163)
* Add process_batch method

* Rename methods

* Fix doc string, satisfy mypy

* Fix mypy CI

* Fix typo

* Update tutorial

* Fix argument name

* Change arg name

* Incorporate reviewer feedback
2021-06-23 18:13:51 +02:00
oryx1729
afee4f36ce
Add scaffold for defining custom components for Pipelines (#1205) 2021-06-23 12:01:54 +02:00
vblagoje
02fc4c7783
Improve document stores unit test parametrization (#1202) 2021-06-22 16:08:23 +02:00
Markus Paff
a8f3601e6a
Pin docs for 0.9.0 2021-06-22 10:38:08 +02:00
Ikram Ali
d835a9cdc5
[setup] version tag added to Haystack fix #1175 (#1216) 2021-06-22 09:43:26 +02:00
Stefano
66049abff0
Add arg to support different languages in PreProcessor's sentence segmentation (#1160)
* Add PreProcessor optional language parameter.

* Add iso639 to nltk languages.

* Update docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-21 18:53:19 +02:00
Julian Risch
9e4d7bf9be
Increase Haystack version to 0.9.0 (#1215) v0.9.0 2021-06-21 18:39:00 +02:00
oryx1729
0168f04385
Remove unused function _get_pseudo_prob (#1201) 2021-06-17 10:28:48 +02:00
C V Goudar
f9c4083006
Bugfix setting of device by defaulting to "cpu" (#1182)
* Defaulting the device to cpu in case gpu is not available and use_gpu is set to True

Co-authored-by: C V Goudar <cv.goudar@emplay.et>
2021-06-16 10:26:29 +02:00
Markus Paff
6cd49105e7
update api markdown files and add markdown file for ranker (#1198)
* update api markdown files and add markdown file for ranker

* added docstrings for weaviate

* new version of pydoc-markdown does not render arguments correctly. We used pydoc-markdown==3.11.0
2021-06-15 17:50:08 +02:00
Julian Risch
215c45eb8a
Remove quickfix from reader and ranker (#1196)
* Remove quickfix from ranker

* remove quickfix from reader

* Use inferencer's model instead of reloaded model
2021-06-15 09:46:11 +02:00
Branden Chan
7dbd58f6be
Add about sections (#1195) 2021-06-14 18:37:00 +02:00
vblagoje
2a5882578a
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086)
* Integrate LFQA with Haystack

* Integrate LFQA with Haystack - unit tests

* Properly initialize conftest default value for vector_dim

* Update PR after initial feedback

* Fix conftest.py import

* Seq2SeqGenerator uses Callables instead of subclasses for custom model input

* Update docstring

* Fix Callable use

* Add LFQA tutorials

* Improve type error reporting for invalid input converter Callable

* Generate docstrings

* Format comments in tutorial script

* Generate tutorial md

* Add usage page

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-06-14 17:53:43 +02:00
venuraja
ae55927f58
Weaviate: Update Embeddings - Use update instead of replace (#1181)
* Update Embeddings logic improved

* Update Embeddings logic improved
2021-06-14 17:50:55 +02:00
Shahrukh Khan
1a3b4b9c74
Fix typo in Query Classifier Exception Message (#1190) 2021-06-14 17:40:35 +02:00
Julian Risch
f6e70f0f3d
Removed single_model_path; added infer_tokenizer to dpr load() (#1060) 2021-06-14 14:14:46 +02:00
Julian Risch
1c31589b43
Bump to FARM 0.8.0, torch 1.8.1 and transformers 4.6.1 (#1192)
* bump to FARM 0.8.0, which in turn bumps torch 1.8.1 and transformers 4.6.1 (#1192)

* Replace deprecated force_bos_token_to_be_generated parameter
2021-06-14 13:00:41 +02:00
Bob van Luijt
f583d0bfaf
Minor change with a link to the Weaviate docs (#1180)
Super minor change, but in line with other DocumentStores
2021-06-11 21:20:23 +02:00
Branden Chan
e7937ac5d7
Reformat FAQ page (#1177)
* Add faq page

* Update faq.md

* Fix mypy CI

* Add question

* Reformat faq
2021-06-11 11:59:52 +02:00
Branden Chan
783893c3d2
Tutorial update (#1166)
* Add header / footer

* Add Milvus example

* Generate md files

* Fix mypy CI
2021-06-11 11:09:15 +02:00
Branden Chan
13edff109d
Documentation update (#1162)
* Add content

* Add German BERT references

* Mention preprocessor language

* Fix mypy CI

* Add document length recommendation

* Add more languages
2021-06-11 11:06:57 +02:00