1077 Commits

Author SHA1 Message Date
Sara Zan
13a9bc6a99
Fix bug on REST API for queries on empty document stores (#2161)
* Handle no answers and no documents scenarios in '_process_request'

* Fix tests

* Change return type in '_process_request'

* Return to use dicts

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 11:10:59 +01:00
Sara Zan
00795bd71e
Add type check for meta on REST API & add tests (#2184)
* Add type check for meta & add tests

* Improve tests

* Properly handle the ValueError as an HTTPException

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 10:32:22 +01:00
tstadel
ed6e64494e
Fix typo in save_to_deepset_cloud() (#2189)
* fix typo in method name

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 09:07:58 +01:00
Dmitry Goryunov
548c285f8d
Add who uses Haystack section (#1975)
* Add Airbus, Alcatel-Lucent, Etlab, Deepset
* Add BetterUp, Sooth.ai, and Infineon as users

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2022-02-14 16:21:41 +01:00
Julian Risch
25d0f96ae2
Apply filter in eval only if no gold docs are given as input (#2154)
* Apply filter in eval only if no gold documents are given as input

* change type annotation of input documents in eval

* Update Documentation & Code Style

* fix mypy

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-14 15:43:12 +01:00
Sara Zan
be8f50c9e3
Add DELETE /feedback for testing and make the label's id generate server-side (#2159)
* Add DELETE /feedback for testing and make the ID generate server-side

* Make sure to delete only user generated labels

* Reduce fixture scope, was too broad

* Make test a bit more generic

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-14 11:43:26 +01:00
tstadel
db4d6f43ba
Add tests on MultiLabel's meta and filter aggregation (#2169) 2022-02-11 17:42:47 +01:00
Sara Zan
fdc36292f1
Align REST API and Haystack versions (#2164)
* Align REST API and Haystack versions

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-11 14:17:26 +01:00
tstadel
9e18239e3b
pipeline.save_to_deepset_cloud() (#2145)
* add list_pipelines_on_deepset_cloud()

* add Pipeline.save_to_deepset_cloud()

* apply black

* fix imports

* Update Documentation & Code Style

* add load_from_config

* Update Documentation & Code Style

* fix pipeline name for indexing pipeline

* add tests

* Update Documentation & Code Style

* handle deployed pipelines

* make single pipeline config info requests instead of loading all infos

* make ROOT_NODE_TO_PIPELINE_NAME global

* better response validation for saving and updating pipeline configs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-11 12:50:53 +01:00
tstadel
85d309f05e
Fix MultiLabel creation with aggregate_by_meta (#2165)
* fix MultiLabel population with aggregate_by_meta

* fix comments

* Update Documentation & Code Style

* fix docid in filters and improve naming

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-10 21:47:58 +01:00
mathislucka
f11494a9d0
Join node should allow reciprocal rank fusion as additional merging method (#2133)
* join node should allow reciprocal rank fusion

* Update Documentation & Code Style

* add missing merging mode

* tuples are immutable

* take correct results from pipeline

* Update Documentation & Code Style

* Simple docstrings, use ValueError

* Use K=60

* Minor refactoring

* precalculate expected result in test

* Update Documentation & Code Style

* refactor to make more clear

* rm unused imports

* tests should test only one thing

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dmigo <d.f.goryunov@gmail.com>
2022-02-10 16:58:40 +01:00
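Note on the commit above: reciprocal rank fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in, and the commit message mentions K=60. The following is only a minimal sketch of that general technique, not the code from the commit; the function name `reciprocal_rank_fusion` and the use of plain document-ID strings are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Hypothetical helper for illustration; not the Haystack API.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # enumerate() is 0-based, so add 1 to get the usual 1-based rank.
        for position, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + position + 1)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    merged = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
    print(merged)
```

A larger k flattens the difference between adjacent ranks, so a document that appears in several lists tends to beat one that ranks very high in a single list.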
Sara Zan
3cfdf88063
Make openapi.json multiline so the diff is parsable (#2163) 2022-02-10 16:25:00 +01:00
Sara Zan
795c7c8a47
Fix dependency management in Tutorial 6 (#2148)
* Fix dependency issue in Tutorial 6

* Remove faiss from first install block

* move faiss group back to main installation step

* Comment out Milvus cell

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-10 15:07:02 +01:00
Branden Chan
9551523ebb
Update README.md (#2160)
Add rest api and ui info
2022-02-10 15:00:09 +01:00
Branden Chan
287314b2d2
Update Readme to reflect changes to installation procedure (#2157)
* Update README.md

* change milvus to milvus1
2022-02-10 11:54:06 +01:00
tstadel
1bdd1f48fd
Fix windows ci tests (#2144)
* move commandline args to global conftest

* correct test exclude paths

* Update Documentation & Code Style

* exclude test_generator_pipeline_with_translator from windows ci

* exclude further oom tests

* enable log_cli

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 21:29:05 +01:00
Sara Zan
40328a57b6
Introduce pylint & other improvements on the CI (#2130)
* Make mypy check also ui and rest_api, fix ui

* Remove explicit type packages from extras, mypy now downloads them

* Make pylint and mypy run on every file except tests

* Rename tasks

* Change cache key

* Fix mypy errors in rest_api

* Normalize python versions to avoid cache misses

* Add all exclusions to make pylint pass

* Run mypy on rest_api and ui as well

* test if installing the package really changes outcome

* Comment out installation of packages

* Experiment: randomize tests

* Add fallback installation steps on cache misses

* Remove randomization

* Add comment on cache

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 18:27:12 +01:00
Sara Zan
9dc89d2bd2
Fix dependency related build issues in Dockerfiles (#2135)
* Fix a path issue in Dockerfile-GPU

* Fix paths in Dockerfile-GPU

* Add workflow_dispatch to docker build task

* Remove reference to optional component from ui/, not needed anymore

* Move pytorch installation last to avoid replacing it later

* Remove optional import from rest_api too, no longer needed

* Change path in ui/Dockerfile

* ui container works again

* Complete review of import paths

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 17:35:18 +01:00
Sara Zan
aca52ea39c
Add aiorwlock to 'ray' extra & fix maximum version for some dependencies (#2140)
* Add aiorwlock to 'ray' extra

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 16:32:52 +01:00
Julian Risch
4b0ff830ca
fix type annotation (#2147)
* fix type annotation

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 09:50:37 +01:00
Julian Risch
7fab027bf0
Evaluating a pipeline consisting only of a reader node (#2132)
* pass documents as extra param to eval

* pass documents via labels to eval

* rename param in docs

* Update Documentation & Code Style

* Revert "rename param in docs"

This reverts commit 2f4c2ec79575e9dd33a8300785f789a327df36f4.

* Revert "pass documents via labels to eval"

This reverts commit dcc51e41f2637d093d81c7d193b873c17c36b174.

* simplify iterating through labels and docs

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 09:18:58 +01:00
tstadel
1e3edef803
List all pipeline(_configs) on Deepset Cloud (#2102)
* add list_pipelines_on_deepset_cloud()

* Apply Black

* refactor auto paging and throw DeepsetCloudErrors

* Apply Black

* fix mypy findings

* Update documentation

* Fix merge error on pipelines.md

* Update Documentation & Code Style

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-08 20:35:25 +01:00
Sara Zan
ffbba90323
Move pytest configuration into pyproject.toml (#2141)
* Move pytest configuration into pyproject.toml

* Fix markers format

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-08 17:23:59 +01:00
Sara Zan
692cde11e7
Change docstores_gpu into docstores-gpu in Dockerfile-GPU (#2129) 2022-02-07 15:10:17 +01:00
Sara Zan
a095aea21e
Reintroduce push on master trigger for Linux CI (#2127)
* Reintroduce push on master trigger with Linux CI

* Reintroduce trigger for freshly opened PRs too
2022-02-04 18:06:23 +01:00
Sara Zan
859a87f71a
Remove requirements file (#2128) 2022-02-04 18:05:47 +01:00
Buruk Aregawi
d3c776843f
Speed up query_by_embedding in InMemoryDocumentStore. (#2091)
* Speed up query_by_embedding in InMemoryDocumentStore.

* Make sure query and document embeddings are of the same dtype since they can vary.

* Handle cases where there are 0 and 1 documents.

* Don't put entire embedding matrix on GPU at once. Use separate get_score
functions for the CPU and GPU.

* Norm the vectors in get_scores_numpy in a safer way.

* Apply Black

* Incorporate missing factor of 4 in memory use calculation.

* Apply Black

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 17:17:17 +01:00
tstadel
518a439482
OpenSearchDocumentStore: Extend similarity support (#2070)
* get rid of global space_type setting

* full_similarity_support

* fallback to exact vector similarity

* clone_embedding_field() instead of full_similarity_support

* multiple embedding fields handling

* update documentation and messages

* revert unnecessary changes

* Add latest docstring and tutorial changes

* typo

* Add latest docstring and tutorial changes

* update docs

* Add latest docstring and tutorial changes

* improve messages

* further improve messages

* support l2 in ElasticsearchDocumentStore

* Apply Black

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-04 16:37:08 +01:00
Sara Zan
c6bfb1c1d4
Remove rest_api extra from Dockerfile-GPU (#2122) 2022-02-04 16:06:40 +01:00
Sara Zan
957e78ed9e
Upgrade pydoc-markdown & refactor GitHub Actions (#2117)
* Upgrade pydoc-markdown and fix the YAMLs to work with it

* Pin pydoc-markdown to major version

* Generalize pydoc-markdown workflow

* Make a single Action to perform all tasks that require committing into the local branch

* Merge the code updates and the docs in the Linux CI to prevent the bot from always showing the pipeline as green

* Installing Jupyter deps for Black

* Build cache before running generation tasks

* Add check not to run the code generation on master

* Simplify push action

* Add more test deps in setup.cfg and remove from GH Action workflow

* Remove forced upgrades on pip install

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 15:45:09 +01:00
bogdankostic
f062911040
Extend metadata filtering support in ElasticsearchDocumentStore (#2108)
* Add extended filtering to ESDocumentStore

* Add Docstrings

* Fix definition of filter queries

* Fix mypy

* Add tests

* Add latest docstring and tutorial changes

* Adapt Docstrings

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Add filtering utils for same representation in all doc stores

* Apply Black formatting

* Update documentation

* Fix mypy

* Apply Black

* Fix mypy

* Adopt Doc Strings

* Add more tests

* Apply Black

* Allow filtering in OpenSearchDocStore

* Update documentation

* Adapt Docstrings

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 13:43:12 +01:00
mathislucka
34f9308e1a
Simplify SQuAD data to df conversion (#2124)
* Conversion to df does not need initialization

* Apply Black

* fix test case

* Apply Black

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 12:37:56 +01:00
Julian Risch
53decdcefb
Allow different filters per query in pipeline evaluation (#2068)
* add filters attribute to labels and use in eval

* Add latest docstring and tutorial changes

* overwrite params if None

* populate filters from Label to MultiLabel

* add query_id in eval df and deepcopy params for each label

* fix mypy

* add test for aggregating filters in multilabel

* use query ids also in answers df

* loop through unique query_ids

* hash filters and query text as id

* Add latest docstring and tutorial changes

* fix top_k reader eval

* Apply Black

* rename query_id to id/multilabel_id

* Apply Black

* json dump filters in dataframe

* add filters and id to wrong_examples()

* Apply Black

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-03 19:19:05 +01:00
Buruk Aregawi
1fa682ac73
Fixed performance bug. Using a list where a set is needed. (#2125) 2022-02-03 18:58:28 +01:00
Sara Zan
a59bca3661
Apply black formatting (#2115)
* Testing black on ui/

* Applying black on docstores

* Add latest docstring and tutorial changes

* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too

* Remove comments

* Relax constraints on pydoc-markdown

* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade

* Fix a couple of bugs

* Add a type: ignore that was missing somehow

* Give path to black

* Apply Black

* Apply Black

* Relocate a couple of type: ignore

* Update documentation

* Make Linux CI run after applying Black

* Triggering Black

* Apply Black

* Remove dependency, does not work well

* Remove manually double trailing commas

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 13:43:18 +01:00
tstadel
9974593c5e
Fix Seq2SeqGenerator return type (#2099)
* return proper Answer objs

* fix docstrings

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 00:20:24 +01:00
Sara Zan
3a6e64b2a3
Make FileTypeClassifier more flexible (#2101)
* Make FileTypeClassifier more flexible

* Make supported_types a init parameter

* Add tests and fix a couple of bugs

* Formatting

* Fix mypy

* Implement feedback
2022-02-02 17:51:04 +01:00
Sara Zan
767f0025c6
Make ui and rest proper packages (#2098)
* Adding simple setup.py to ui/ and rest_api and remove respective extras from main setup.cfg

* Make 'pip install rest_api/' fetch the local Haystack instead of downloading from pypi

* Add some comments to the new setup.py files and fix the Dockerfiles

* Add version info to 'farm-haystack-ui'

* Fix the OpenAPI Specs workflow

* Install rest_api and ui properly on the CI too

* Make the workflow see changes on every setup file

* Fix workflow cache keys

* Add license to rest_api and ui
2022-02-02 16:14:12 +01:00
Sara Zan
009c89fc53
Revert "Make the docstring bot work only on master" (#2114)
* Revert "Make the docstring bot work only on master (#2078)"

This reverts commit 649d07405770cd59696d0120107a3b2f0aafe7c2.

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-02 16:08:34 +01:00
Sebastián Ramírez
3c768071d5
Add JSON Schema autogeneration for Pipeline YAML files (#2020)
* 🎨 Update type annotations to allow their extraction for JSON Schema

* Add main script doing all the work to generate the JSON Schema

* Add GitHub Action dependency to generate JSON Schema

* Update JSON Schema generation script to allow easily generating the schema without making a PR

* 👷 Add GitHub Action to generate JSON Schema

* 💚 Fix CI GitHub Action

* 💚 Update GitHub Action environment variables

* Add initial JSON Schema

* Add latest docstring and tutorial changes

* 🐛 Do not allow extra params not defined in each model

* ♻️ Make any additional properties invalid

* Make other additional properties invalid in all the levels in pipelines

* ♻️ Do not include Base classes as possible nodes

* 🍱 Update JSON Schema

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-02 15:00:41 +01:00
Julian Risch
3245cdef1d
Add faiss dependency to tutorial 12 (#2109) 2022-02-02 14:19:08 +01:00
mathislucka
88771b2bee
Provide option to recreate es doc store on initialization (#2084)
* provide option to recreate es doc store on initialization

* Add latest docstring and tutorial changes

* Label expects more arguments

* Label expects also an answer

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-02 11:03:15 +01:00
Sara Zan
649d074057
Make the docstring bot work only on master (#2078) 2022-02-01 14:09:55 +01:00
MichelBartels
525884e4cf
do not apply data parallel twice (#2095) 2022-02-01 12:24:51 +01:00
MichelBartels
e0c072d6fd
Distribute intermediate layer distillation loss calculation over multiple GPUs (#2090)
* distribute tinybert loss calculation

* improve doc string

* undo unnecessary change

* fix for only one gpu

* adding type hints

* making sure model distillation still works without gpu

* fix bug

* fixing type hints
2022-02-01 09:47:00 +01:00
Sowmiya Jaganathan
7d769d8bf1
Fixed the Search Field mapping in ElasticSearch DocumentStore (#2080)
* Review changes

* Added the synonym analyser for search fields

* Added the review requests.

* Added the synonyms to the OpenSearchDocumentStore and review requests.
2022-01-31 11:11:20 +01:00
bogdankostic
bbb65a19bd
Add Tapas reader with scores (#1997)
* Add Tapas reader with scores

* Adapt possible answer spans

* Add latest docstring and tutorial changes

* Remove unused imports

* Adapt scoring

* Add latest docstring and tutorial changes

* Fix mypy

* Infer model architecture from config

* Adapt answer score calculation

* Add latest docstring and tutorial changes

* Fix mypy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-31 10:23:12 +01:00
Malte Pietsch
ee6b8d0688
Add ADR template for transparent architecture decisions (#2072)
* add adr template for decisions

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-01-28 17:33:53 +01:00
Kristof Herrmann
7764b6992c
DC SDK - load pipeline from deepset cloud (#2013)
* initial load_from_dc

* typo

* adjusted api endpoint

* removed kwargs

* added _load_from_dict

* refactor pipeline loading mechanism

* renaming load_from_dc api

* renaming

* fixed errors

* fix comments and environment variable overrides

* Add latest docstring and tutorial changes

* fix outdated YAML examples

* Add latest docstring and tutorial changes

* Introduce readonly DCDocumentStore (without labels support) (#1991)

* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle not existing documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and query_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and query_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>

* introduce DeepsetCloudAdapter

* Add latest docstring and tutorial changes

* introduce DeepsetCloudClient

* Add latest docstring and tutorial changes

* use json api for pipeline_config

* indexing pipeline test added

* pseudo change to force cache eviction

* revert pseudo change to force cache eviction

* remove conftest duplicates

* minor formatting and docstring fixes

* fix tests when MOCK_DC=False

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-01-28 17:32:56 +01:00
Sara Zan
07cf3c614a
Disable cache on the CI (#2083)
* Disable cache on the CI

* Reintroduce paths

* Add most files to the cache key

* remove date and path from cache key

* Try double install with cache

* Try to cache more stuff, on a per-commit basis

* Fix windows CI too

* Add comment on how to speed up the CI with better caching
2022-01-28 17:21:23 +01:00