21 Commits

Author SHA1 Message Date
Malte Pietsch
183fd5ae5a
Simplify tests & allow running on individual doc stores (#1487)
* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* update readme and contributing.md

* update contributing

* adjust example

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-27 10:52:07 +02:00
Ikram Ali
f186d6327d
Add MostSimilarDocumentsPipeline (#1413)
* [pipeline] MostSimilarDocumentsPipeline added

* [pipeline] mypy bug fixed.

* [pipeline] mypy bug fixed.

* [pipeline] test cases added.

* [pipeline] test cases added.

* [pipeline] set return_embedding back to false.

* [pipeline] return a list of Documents

* [pipeline] define the ids

* [pipeline] code refactor.

* [pipeline] code refactor.

* [pipeline] test case improved.

* Update docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-13 12:43:45 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00
Julian Risch
eb990c9688
Removing probability field from answers in favor of score field (#1340)
* Removing probability field from reader and from test cases

* Add switch to FARMReader to choose score/probability

* Remove probability field from doc returned by doc store

* Relax assertion testing joined es and dpr predictions

* Use switch for confidence scores also for no_answer

* Add test that checks switching to old answer scores > 10

* Normalize score in elastic doc store and reset reader.md

* Scale weights of JoinDocuments to sum to 1 and adapt test case
2021-08-17 10:27:11 +02:00
oryx1729
bafa1b46de
Add Ray integration for Pipelines (#1255) 2021-08-02 14:51:24 +02:00
Ikram Ali
29e140196b
[pipeline] Allow for batch indexing when using Pipelines fix #1168 (#1231)
* [pipeline] Allow for batch indexing when using Pipelines fix #1168

* [pipeline] Test case fixed fix #1168

* [file_converter] Path.suffix updated #1168

* [file_converter] meta can be one of these three cases:
                 A single dict that is applied to all files
                 One dict for each file being converted
                 None #1168

* [file_converter] mypy error fixed.

* [file_converter] mypy error fixed.

* [rest_api] batch file upload introduced in indexing API.

* [test_case] Test_api file upload parameter name updated.

* [ui] Streamlit file upload parameter updated.
2021-06-30 14:13:46 +02:00
Shahrukh Khan
545c625a37
Add QueryClassifier incl. baseline models (#1099)
* restructure query classifier code and add s3 based pickles

* make model and vectorizer optional in query classifier

* update query classifier as per init style

* add query classifiers sklearn/hf

* update docstrings for query classifiers

* add unit test for query classifier

* add type patch for sklearn classifier

* fix mypy type issue

* revert to pure formatting

* add query classifiers

* resolve conflict

* add output names for query classifier

* revert output and update docstring queryclassifier

* Update docstring for SklearnQueryClassifier

* update transformer query classifier docstring

* fix typo

* change arg names in query classifier classes

* add set_config(). rename attributes

* fix set_config()

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-08 15:20:13 +02:00
oryx1729
99990e7249
Add export of Pipeline YAML config (#1003) 2021-04-30 12:23:29 +02:00
oryx1729
7269530e45
Add validation for root node in Pipeline (#987) 2021-04-21 12:18:33 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines (#922) 2021-04-07 17:53:32 +02:00
oryx1729
e9f0076dbd
Fix execution of Pipelines with parallel nodes (#901) 2021-03-18 12:41:30 +01:00
oryx1729
e0a118fd9a
Add support for parallel paths in Pipeline (#884) 2021-03-10 18:17:23 +01:00
Tanay Soni
07907f9eac
Add support for indexing pipelines (#816) 2021-02-16 16:24:28 +01:00
Lalit Pagaria
5bd94ac5f7
Adding Translator (standalone component & wrapper for pipelines) (#782)
* Adding translator with many generic input parameter support

* Making dict_key as generic

* Fixing mypy issue

* Adding pipeline and using opus models

* Add latest docstring and tutorial changes

* Adding test cases for end-to-end translation for generator, summerizer etc

* raise error join and merge nodes

* Fix test failure

* add docstrings. add usage documentation. rm skip_special_tokens param

* Add latest docstring and tutorial changes

* fix code snippets in md

* Adding few extra configuration parameters and fixing tests

* Fixingmypy issue and updating usage document

* fix for mypy issue in pipeline.py

* reverting renaming of pytest_collection_modifyitems method

* Addressing review comments

* setting skip_special_tokens to True

* removing model_max_length argument as None type is not supported to many models

* Removing padding parameter. Better to leave it as default otherwise it cause tensor size miss match error. If this option required by used then it can be added later.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 15:58:26 +01:00
Tanay Soni
8a5dc8f826
Load Pipeline with YAML config file (#785) 2021-02-02 17:32:17 +01:00
Lalit Pagaria
9f7f95221f
Milvus integration (#771)
* Initial commit for Milvus integration

* Add latest docstring and tutorial changes

* Updating implementation of Milvus document store

* Add latest docstring and tutorial changes

* Adding tests and updating doc string

* Add latest docstring and tutorial changes

* Fixing issue caught by tests

* Addressing review comments

* Fixing mypy detected issue

* Fixing issue caught in test about sorting of vector ids

* fixing test

* Fixing generator test failure

* update docstrings

* Addressing review comments about multiple network call while fetching embedding from milvus server

* Add latest docstring and tutorial changes

* Ignoring mypy issue while converting vector_id to int

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-29 13:29:12 +01:00
Tanay Soni
4c2804e38e
Add support for aggregating scores in JoinDocuments node (#683) 2020-12-16 15:54:58 +01:00
Tanay Soni
33fe597949
Cleanup Pytest Fixtures (#639) 2020-12-14 18:15:44 +01:00
Tanay Soni
8e52b48e1d
Add pipelines for GenerativeQA & FAQs (#645) 2020-12-03 10:27:06 +01:00
Tanay Soni
5e62e54875
Rename question parameter to query (#614) 2020-11-30 17:50:04 +01:00
Tanay Soni
e3a68aedaf
Add support for building custom Search Pipelines (#596) 2020-11-20 17:41:08 +01:00