601 Commits

Author SHA1 Message Date
tstadel
fd46a42130
Allow to deploy and undeploy Pipelines on Deepset Cloud (#2285)
* add deploy_on_deepset_cloud and undeploy_on_deepset_cloud

* increase polling interval to 5 seconds

* Update Documentation & Code Style

* improve logging

* move transitioning logic to PipelineClient

* use enum for Pipeline states

* improve docstrings

* Update Documentation & Code Style

* tests added

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-10 09:49:28 +01:00
Sara Zan
e85b948a4c
Fix PreProcessor test (#2290)
* Adding Document import, missing from recent PR

* Fix mypy signature warning too

* reduce diff to minimum

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-09 13:46:47 +01:00
Dmitry Goryunov
ecec9b4e2c
Remove substrings basic implementation (#2152)
* Remove substrings basic implementation

* Update Documentation & Code Style

* Remove substrings basic tests

* Simplify test
2022-03-08 15:49:56 +01:00
Vladimir Blagojevic
6c0094b5ad
Update LFQA with the latest LFQA seq2seq and retriever models (#2210)
* Register BartEli5Converter for vblagoje/bart_lfqa model

* Update LFQA unit tests

* Update LFQA tutorials
2022-03-08 15:11:41 +01:00
tstadel
4b46f2047b
save_to_deepset_cloud: automatically convert document stores (#2283)
* automatically convert to DeepsetCloudDocumentStore

* shorten info text.

* fix typo

* the -> this

* add test

* ensure request body has only DeepsetCloudDocumentStores

* mark test as elasticsearch to fix milvus1 ci
2022-03-07 22:35:15 +01:00
tstadel
dde9d59271
fix pip backtracking issue (#2281)
* fix pip backtracking issue

* restrict azure-core version

* Remove the trailing comma

* Add skip_magic_trailing_comma in pyproject.toml for pydoc compatibility

* Pin pydoc-markdown _again_

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-07 19:25:33 +01:00
bogdankostic
447baf77ef
Fix skipping of tests using document stores (#2268)
* Fix skipping document store tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Remove SQL from tests requiring embeddings

* Update Documentation & Code Style

* Fix get_embedding_count of Milvus2

* Make sure to start Milvus2 tests with a new collection

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-03 15:19:27 +01:00
tstadel
f7a01624e0
Refactor Pipeline peripherals (#2253)
* move peripheral stuff to utils, add more and better tests

* Update Documentation & Code Style

* move config related peripherals to config module, fix tests

* Update Documentation & Code Style

* remove unnecessary list comprehensions

* apply ZanSara's feedback

* remove classes in pipeline utils

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-03 14:14:42 +01:00
bogdankostic
c5542bd3fb
Add RouteDocuments and JoinAnswers nodes (#2256)
* Add SplitDocumentList and JoinAnswer nodes

* Update Documentation & Code Style

* Add tests + adapt tutorial

* Update Documentation & Code Style

* Remove branch from installation path in Tutorial

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* Change name of SplitDocumentList to RouteDocuments

* Update Documentation & Code Style

* Adapt tutorials to new name

* Add test for JoinAnswers

* Update Documentation & Code Style

* Adapt name of test for JoinAnswers node

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-01 17:42:11 +01:00
MichelBartels
2c423ba063
Introduce support for pymilvus>=2.0.0 (#2126)
* update remaining occurences of get_connection

* fix milvus2 import and fix wrong extra references

* change MilvusDocumentStore to Milvus1DocumentStore

* update milvus docstrings to reflect updated dependency management

* enable milvus 2 tests

* fix milvus2 env variable processing

* fix dropping collections for each milvus 2 test

* make Milvus 2 doc store tests work

* allow user to specify consistency level

* Fist attempt at running Milvus2 in the CI

* Install the correct pymilvus

* add batch deletion for milvus2

* change default from milvus 1 to milvus 2

* make milvus2 the default in the docstores extra

* Switch milvus1 and milvus2 in base test run on CI

* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'

* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py

* Enable autogenerated docs for Milvus1 and 2 separately

* Partial fix to docstring of Milvus2DocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-24 17:43:38 +01:00
bogdankostic
b03e9f5872
Fix surrounding context extraction in ParsrConverter (#2162)
* Fix surrounding context extraction

* Update Documentation & Code Style

* Unify Parsr and Azure + add test

* Update Documentation & Code Style

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-24 14:58:36 +01:00
tstadel
e20f2e0d54
Generate code from pipeline (pipeline.to_code()) (#2214)
* pipeline.to_code() with jupyter support

* Update Documentation & Code Style

* add imports

* refactoring

* Update Documentation & Code Style

* docstrings added and refactoring

* Update Documentation & Code Style

* improve imports code generation

* add comment param

* Update Documentation & Code Style

* add simple test

* add to_notebook_cell()

* Update Documentation & Code Style

* introduce helper classes for code gen and eval report gen

* add more tests

* Update Documentation & Code Style

* fix Dict typings

* Update Documentation & Code Style

* validate user input before code gen

* enable urls for to_code()

* Update Documentation & Code Style

* remove all chars except colon from validation regex

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-23 11:08:57 +01:00
bogdankostic
4bad21e961
Add Brownfield Support of existing Elasticsearch indices (#2229)
* Add method to transform existing ES index

* Add possibility to regularly add new records

* Fix types

* Restructure import statement

* Add use_system_proxy param

* Update Documentation & Code Style

* Change location and name + add test

* Update Documentation & Code Style

* Add test cases for metadata fields

* Fix linter

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 20:58:57 +01:00
tstadel
965cc86b24
Fix ef_search param for hnsw in OpenSearchDocumentStore (#2227)
* fix ef_search param for hnsw

* Update Documentation & Code Style

* adjust ef_search param if index exists

* run black

* Fix label index recreation

* fix merge conflict

* merge source branch 'master' into fix_ef_search_param

* fix pylint issue

* fix test_pipeline_components test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 20:33:21 +01:00
MichelBartels
6918d5b79e
Fix missing embeddings not skipped if filters are used (#2230)
* fix skip embeddings param for elasticsearch when filters are specified

* Update Documentation & Code Style

* add test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 18:40:34 +01:00
MichelBartels
f4efc008f4
Adding extended meta data filtering support for InMemoryDocumenStore (#2120)
* add filter classes

* update filter comments

* Add util classes for converting filters (#2123)

* Apply Black

* reintroduce eval functions to filter ops

* Update documentation

* update to latest pymilvus version

* Apply Black

* fixing type hints

* Apply Black

* update write_documents method of milvus2 doc store

* remove unnecessary method

* update init

* remove changes to milvus 2 as they are part of other PR

* remove changes to milvus 2 as they are part of other PR

* updating doc strings to match elastic search filter doc

* Update Documentation & Code Style

* add support for case where there is no meta data defined for key

* update behaviour in case of field not existing in entry

* Update Documentation & Code Style

* add test for InMemoryDocumentStore extended meta data filtering

* make type hint more precise

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-02-22 17:44:58 +01:00
tstadel
fe03ca70de
Fix Pipeline.components (#2215)
* add components property, improve get_document_store()

* Update Documentation & Code Style

* use pipeline.get_document_store() instead of retriever.document_store

* add tests

* Update Documentation & Code Style

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 15:01:07 +01:00
MichelBartels
116fe2db26
Extend meta data support for SQLDocumentStore (#2199)
* update remaining occurences of get_connection

* first commit to add extended metadata filtering support to sql

* fix bugs

* adding sql doc store instead of milvus

* removing updates to milvus2 from other PR

* fixing not operator

* delete left over line

* remove unnecessary import

* Update Documentation & Code Style

* fix circular import

* fix left over merge conflict

* Update Documentation & Code Style

* fix abstract class

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 20:40:32 +01:00
Sara Zan
2a840ee248
YAML versioning (#2209)
* Make YAML files get the same version as Haystack and throw warning at load in case of mismatch

* Update version of most YAMLs in the codebase (aesthethic chamge, only to avoid the warning).

* Remove quotes from version in tests

* Fix version in generate_json_schema.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 12:22:37 +01:00
bogdankostic
2a674eaff7
Support more data types and extended filters in WeaviateDocStore (#2143)
* Support more data types and extended filters in WeaviateDocStore

* Adapt types to extended filters

* Update Documentation & Code Style

* Fix mypy

* Fix type of filters

* Update Documentation & Code Style

* Add Docstrings for BaseDocStore

* Update Documentation & Code Style

* Add + prettify DocStrings

* Update Documentation & Code Style

* Fix types

* Update Documentation & Code Style

* Remove import of TypedDict

* Fix tests

* Update Documentation & Code Style

* Fix circular import

* Fix inversion of not operation + add test case

* Fix mypy

* Update Documentation & Code Style

* Apply black

* Use convert_date_to_rfc3339 instead of datetime.fromisoformat

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-18 08:55:17 +01:00
tstadel
ed6e64494e
Fix typo in save_to_deepset_cloud() (#2189)
* fix typo in method name

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 09:07:58 +01:00
tstadel
db4d6f43ba
Add tests on MultiLabel's meta and filter aggregation (#2169) 2022-02-11 17:42:47 +01:00
tstadel
9e18239e3b
pipeline.save_to_deepset_cloud() (#2145)
* add list_pipelines_on_deepset_cloud()

* add Pipeline.save_to_deepset_cloud()

* apply black

* fix imports

* Update Documentation & Code Style

* add load_from_config

* Update Documentation & Code Style

* fix pipeline name for indexing pipeline

* add tests

* Update Documentation & Code Style

* handle deployed pipelines

* make single pipeline config info requests instead of loading all infos

* make ROOT_NODE_TO_PIPELINE_NAME global

* better response validation for saving and updating pipeline configs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-11 12:50:53 +01:00
mathislucka
f11494a9d0
Join node should allow reciprocal rank fusion as additional merging method (#2133)
* join node should allow reciprocal rank fusion

* Update Documentation & Code Style

* add missing merging mode

* tuples are immutable

* take correct results from pipeline

* Update Documentation & Code Style

* Simple docstrings, use ValueError

* Use K=60

* Minor refactoring

* precalculate expected result in test

* Update Documentation & Code Style

* refactor to make more clear

* rm unused imports

* tests should test only one thing

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dmigo <d.f.goryunov@gmail.com>
2022-02-10 16:58:40 +01:00
tstadel
1bdd1f48fd
Fix windows ci tests (#2144)
* move commandline args to global conftest

* correct test exclude paths

* Update Documentation & Code Style

* exclude test_generator_pipeline_with_translator from windows ci

* exclude further oom tests

* enable log_cli

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 21:29:05 +01:00
Julian Risch
7fab027bf0
Evaluating a pipeline consisting only of a reader node (#2132)
* pass documents as extra param to eval

* pass documents via labels to eval

* rename param in docs

* Update Documentation & Code Style

* Revert "rename param in docs"

This reverts commit 2f4c2ec79575e9dd33a8300785f789a327df36f4.

* Revert "pass documents via labels to eval"

This reverts commit dcc51e41f2637d093d81c7d193b873c17c36b174.

* simplify iterating through labels and docs

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 09:18:58 +01:00
tstadel
1e3edef803
List all pipeline(_configs) on Deepset Cloud (#2102)
* add list_pipelines_on_deepset_cloud()

* Apply Black

* refactor auto paging and throw DeepsetCloudErrors

* Apply Black

* fix mypy findings

* Update documentation

* Fix merge error on pipelines.md

* Update Documentation & Code Style

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-08 20:35:25 +01:00
Sara Zan
ffbba90323
Move pytest configuration into pyproject.toml (#2141)
* Move pytest configuration into pyproject.toml

* Fix markers format

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-08 17:23:59 +01:00
Sara Zan
957e78ed9e
Upgrade pydoc-markdown & refactor GitHub Actions (#2117)
* Upgrade pydoc-markdown and fix the YAMLs to work with it

* Pin pydoc-markdown to major version

* Generalize pydoc-markdown workflow

* Make a single Action to perform all tasks that require committing into the local branch

* Merge the code updates and the docs in the Linux CI to prevent the bot from always show the pipeline as green

* Installing Jupyter deps for Black

* Build cache before running generation tasks

* Add check not to run the code generation on master

* Simplify push action

* Add more test deps in setup.cfg and remove from GH Action workflow

* Remove forced upgrades on pip install

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 15:45:09 +01:00
bogdankostic
f062911040
Extend metadata filtering support in ElasticsearchDocumentStore (#2108)
* Add extended filtering to ESDocumentStore

* Add Docstrings

* Fix definition of filter queries

* Fix mypy

* Add tests

* Add latest docstring and tutorial changes

* Adapt Docstrings

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Add filtering utils for same representation in all doc stores

* Apply balck formatting

* Update documentation

* Fix mypy

* Apply Black

* Fix mypy

* Adopt Doc Strings

* Add more tests

* Apply Black

* Allow filtering in OpenSearchDocStore

* Update documentation

* Adapt Docstrings

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 13:43:12 +01:00
mathislucka
34f9308e1a
Simplify SQuAD data to df conversion (#2124)
* Conversion to df does not need initialization

* Apply Black

* fix test case

* Apply Black

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 12:37:56 +01:00
Julian Risch
53decdcefb
Allow different filters per query in pipeline evaluation (#2068)
* add filters attribute to labels and use in eval

* Add latest docstring and tutorial changes

* overwrite params if None

* populate filters from Label to MultiLabel

* add query_id in eval df and deepcopy params for each label

* fix mypy

* add test for aggregating filters in multilabel

* use query ids also in answers df

* loop through unique query_ids

* hash filters and query text as id

* Add latest docstring and tutorial changes

* fix top_k reader eval

* Apply Black

* rename query_id to id/multilabel_id

* Apply Black

* json dump filters in dataframe

* add filters and id to wrong_examples()

* Apply Black

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-03 19:19:05 +01:00
Sara Zan
a59bca3661
Apply black formatting (#2115)
* Testing black on ui/

* Applying black on docstores

* Add latest docstring and tutorial changes

* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too

* Remove comments

* Relax constraints on pydoc-markdown

* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade

* Fix a couple of bugs

* Add a type: ignore that was missing somehow

* Give path to black

* Apply Black

* Apply Black

* Relocate a couple of type: ignore

* Update documentation

* Make Linux CI run after applying Black

* Triggering Black

* Apply Black

* Remove dependency, does not work well

* Remove manually double trailing commas

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 13:43:18 +01:00
tstadel
9974593c5e
Fix Seq2SeqGenerator return type (#2099)
* return proper Answer objs

* fix docstrings

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 00:20:24 +01:00
Sara Zan
3a6e64b2a3
Make FileTypeClassifier more flexible (#2101)
* Make FileTypeClassifier more flexible

* Make supported_types a init parameter

* Add tests and fix a couple of bugs

* Formatting

* Fix mypy

* Implement feedback
2022-02-02 17:51:04 +01:00
mathislucka
88771b2bee
Provide option to recreate es doc store on initialization (#2084)
* provide option to recreate es doc store on initialization

* Add latest docstring and tutorial changes

* Label expects more arguments

* Label expects also an answer

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-02 11:03:15 +01:00
Sowmiya Jaganathan
7d769d8bf1
Fixed the Search Field mapping in ElasticSearch DocumentStore (#2080)
* Review changes

* Added the synonym analyser for search fields

* Added the review requests.

* Added the synonyms the OpenSearchDocumentStore and review requests.
2022-01-31 11:11:20 +01:00
Kristof Herrmann
7764b6992c
DC SDK - load pipeline from deepset cloud (#2013)
* initial load_from_dc

* typo

* adjusted api endpoint

* removed kwargs

* added _load_from_dict

* refactor pipeline loading mechanism

* renaming load_from_dc api

* renaming

* fixed errors

* fix comments and environment variable overrides

* Add latest docstring and tutorial changes

* fix outdated YAML examples

* Add latest docstring and tutorial changes

* Introduce readonly DCDocumentStore (without labels support) (#1991)

* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle not existing documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and quey_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and quey_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>

* introduce DeepsetCloudAdapter

* Add latest docstring and tutorial changes

* introduce DeepsetCloudClient

* Add latest docstring and tutorial changes

* use json api for pipeline_config

* indexing pipeline test added

* pseudo change to force cache eviction

* revert pseudo change to force cache eviction

* remove conftest duplicates

* minor formatting and docstring fixes

* fix tests when MOCK_DC=False

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-01-28 17:32:56 +01:00
Sara Zan
d470b9d0bd
Improve dependency management (#1994)
* Fist attempt at using setup.cfg for dependency management

* Trying the new package on the CI and in Docker too

* Add composite extras_require

* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports

* Fix bug on class import and rephrase error message

* Introduce typing for optional modules and add type: ignore in sparse.py

* Include importlib_metadata backport for py3.7

* Add colab group to extra_requires

* Fix pillow version

* Fix grpcio

* Separate out the crawler as another extra

* Make paths relative in rest_api and ui

* Update the test matrix in the CI

* Add try catch statements around the optional imports too to account for direct imports

* Never mix direct deps with self-references and add ES deps to the base install

* Refactor several paths in tests to make them insensitive to the execution path

* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix

* Wrap pdf conversion utils into safe_import

* Update some tutorials and rever Milvus1 as default for now, see #2067

* Fix mypy config


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 18:12:55 +01:00
Sowmiya Jaganathan
c4fff19018
Supported Highlighting in Elasticsearch (#1930)
* Supported Highlighting

* Review changes

* add example to docstrings

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-01-26 17:35:33 +01:00
Adrien Wald
2edc421a09
Add top_k_join parameter to JoinDocuments.run (#2065)
* add top_k_join parameter to JoinDocuments.run

* test JoinDocuments concatenate with top_k_join parameter

* test two different top_k_join parameters
2022-01-26 17:30:16 +01:00
mathislucka
5b7e906e85
fix: get_documents_by_id should return docs for all passed ids (#2064)
* doc store should return all documents matching ids passed to get_documents_by_id

* test for get_document_by_id should be named correctly

* add test for get_documents_by_id

* Add latest docstring and tutorial changes

* document es query limit

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 12:39:04 +01:00
tstadel
8a32d8da92
Introduce readonly DCDocumentStore (without labels support) (#1991)
* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle not existing documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and quey_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and quey_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
2022-01-25 20:36:28 +01:00
MichelBartels
5b6b0cef77
Add UnlabeledTextProcessor (#2054)
* add UnlabeledTextProcessor

* allow choosing processor when finetuning or distilling

* fix type hint

* Add latest docstring and tutorial changes

* improve segment id computation for UnlabeledTextProcessor

* add text and documentation

* change batch size parameter for intermediate layer distillation

* Add latest docstring and tutorial changes

* fix distillation dim mapping

* remove unnecessary changes

* removed confusing parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-25 14:54:34 +01:00
MichelBartels
e8cd5ea943
Add distillation to finetuning tutorial (#2025)
* Add finetuning tutorial

* Add latest docstring and tutorial changes

* fix typo

* Add latest docstring and tutorial changes

* improve distillation explanation in finetuning tutorial

* Add latest docstring and tutorial changes

* allow augment_squad.py to be easier to call from within python

* Update Tutorial2_Finetune_a_model_on_your_data.py

* fix squad augmentation test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-20 12:18:32 +01:00
MichelBartels
0cca2b97cd
distinguish intermediate layer & prediction layer distillation phases with different parameters (#2001)
* add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation

* Add latest docstring and tutorial changes

* improve default parameters

* Add latest docstring and tutorial changes

* split up distillation method

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 20:40:38 +01:00
tstadel
f42d2e8ba0
Add nDCG to pipeline.eval()'s document metrics (#2008)
* add ndcg metric

* fix merge

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 18:36:41 +01:00
Julian Risch
a3147cae47
Add isolated node eval mode in pipeline eval (#1962)
* run predictions on ground-truth docs in reader

* build dataframe for closed/open domain eval

* fix looping through multilabel

* fix looping through multilabel's list of labels

* simplify collecting relevant docs

* switch closed-domain eval off by default

* Add latest docstring and tutorial changes

* handle edge case params not given

* renaming & generate pipeline eval report

* add test case for closed-domain eval metrics

* Add latest docstring and tutorial changes

* test  report of closed-domain eval

* report closed-domain metrics only for answer metrics not doc metrics

* refactoring

* fix mypy & remove comment

* add second for-loop & use answer as method input

* renaming & add separate loop building docs eval df

* Add latest docstring and tutorial changes

* source /home/tstad/miniconda3/bin/activatechange column order for evaluatation dataframe (#1957)
conda activate haystack-dev2

* change column order for evaluatation dataframe

* added missing eval column node_input

* generic order for both document and answer returning nodes; ensure no columns get lost

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>

* fix column reordering after renaming of node_input

* simplify tests &  add docu

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ju-gu <87523290+ju-gu@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-01-14 14:37:16 +01:00
Sara Zan
e28bf618d7
Implement proper FK in MetaDocumentORM and MetaLabelORM to work on PostgreSQL (#1990)
* Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints

* update_document_meta() was not using index properly

* Exclude ES and Memory from the cosine_sanity_check test

* move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 13:48:58 +01:00
MichelBartels
3e4dbbb32c
Align similarity scores across document stores (#1967)
* align document store similarity functions

* remove unnecessary imports

* undone accidental change

* stopped weaviate from pretending to support dot product similarity

* stopped weaviate from pretending to support dot product similarity

* Add latest docstring and tutorial changes

* fix fixture params for document stores

* use cosine similarity for most tests

* fix cosine similarity test

* fix faiss test

* fix weaviate test

* fix accidental deletion

* fix document_store fixture

* test fix; shouldn't be merged

* fix test_normalize_embeddings_diff_shapes

* probably a better fix

* fix for parameter combinations

* revert new pytest_generate_tests functionality

* simplify pytest_generate_tests

* normalize embeddings for test_dpr_embedding

* add to faiss doc that embeddings are normalized

* Add latest docstring and tutorial changes

* remove unnecessary parameters and add comments

* simplify two lines of memory.py into one

* test similarity scores with smaller language model

* fix test_similarity_score


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-12 19:28:20 +01:00