454 Commits

Author SHA1 Message Date
Tuana Celik
6fb58d09a9
'os' wrapper to function for brownfield support (#2282)
* 'os' wrapper to function for brownfield support

* Changing function names and fixing default parameter values

* Including parameter keys

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-16 11:53:55 +01:00
Sara Zan
11cf94a965
Pipeline's YAML: syntax validation (#2226)
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes

* Make error composition work properly

* Clarify typing

* Help mypy a bit more

* Update Documentation & Code Style

* Enable autogenerated docs for Milvus1 and 2 separately

* Revert "Enable autogenerated docs for Milvus1 and 2 separately"

This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.

* Update Documentation & Code Style

* Re-enable 'additionalProperties: False'

* Add pipeline.type to JSON Schema, was somehow forgotten

* Disable additionalProperties on the pipeline properties too

* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)

* Call super in PipelineValidationError

* Improve _read_pipeline_config_from_yaml's error handling

* Fix generate_json_schema.py to include document stores

* Fix json schemas (retro-fix 1.1.0 again)

* Improve custom errors printing, add link to docs

* Add function in BaseComponent to list its subclasses in a module

* Make some document stores base classes abstract

* Add marker 'integration' in pytest flags

* Slightly improve validation of pipelines at load

* Adding tests for YAML loading and validation

* Make custom_query Optional for validation issues

* Fix bug in _read_pipeline_config_from_yaml

* Improve error handling in BasePipeline and Pipeline and add DAG check

* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)

* Simplify errors slightly

* Add some YAML validation tests

* Remove load_from_config from BasePipeline, it was never used anyway

* Improve tests

* Include json-schemas in package

* Fix conftest imports

* Make BasePipeline abstract

* Improve mocking by making the test independent from the YAML version

* Add exportable_to_yaml decorator to forget about set_config on mock nodes

* Fix mypy errors

* Comment out one monkeypatch

* Fix typing again

* Improve error message for validation

* Add required properties to pipelines

* Fix YAML version for REST API YAMLs to 1.2.0

* Fix load_from_yaml call in load_from_deepset_cloud

* fix HaystackError.__getattr__

* Add super().__init__() in most nodes and docstore, comment set_config

* Remove type from REST API pipelines

* Remove useless init from doc2answers

* Call super in Seq2SeqGenerator

* Typo in deepsetcloud.py

* Fix rest api indexing error mismatch and mock version of JSON schema in all tests

* Working on pipeline tests

* Improve errors printing slightly

* Add back test_pipeline.yaml

* _json_schema.py supports different versions with identical schemas

* Add type to 0.7 schema for backwards compatibility

* Fix small bug in _json_schema.py

* Try alternative to generate json schemas on the CI

* Update Documentation & Code Style

* Make linux CI match autoformat CI

* Fix super-init-not-called

* Accidentally committed file

* Update Documentation & Code Style

* fix test_summarizer_translation.py's import

* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args

* Fix json schema for ray tests too

* Update Documentation & Code Style

* Reintroduce validation

* Use unstable version in tests and rest api

* Make unstable support the latest versions

* Update Documentation & Code Style

* Remove needless fixture

* Make type in pipeline optional in the strings validation

* Fix schemas

* Fix string validation for pipeline type

* Improve validate_config_strings

* Remove type from test pipelines

* Update Documentation & Code Style

* Fix test_pipeline

* Removing more type from pipelines

* Temporary CI patch

* Fix issue with exportable_to_yaml never invoking the wrapped init

* rm stray file

* pipeline tests are green again

* Linux CI now needs .[all] to generate the schema

* Bugfixes, pipeline tests seem to be green

* Typo in version after merge

* Implement missing methods in Weaviate

* Trying to avoid FAISS tests from running in the Milvus1 test suite

* Fix some stray test paths and faiss index dumping

* Fix pytest markers list

* Temporarily disable cache to be able to see tests failures

* Fix pyproject.toml syntax

* Use only tmp_path

* Fix preprocessor signature after merge

* Fix faiss bug

* Fix Ray test

* Fix documentation issue by removing quotes from faiss type

* Update Documentation & Code Style

* use document properly in preprocessor tests

* Update Documentation & Code Style

* make preprocessor capable of handling documents

* import document

* Revert support for documents in preprocessor, do later

* Fix bug in _json_schema.py that was breaking validation

* re-enable cache

* Update Documentation & Code Style

* Simplify calling _json_schema.py from the CI

* Remove redundant ABC inheritance

* Ensure exportable_to_yaml works only on implementations

* Rename subclass to class_ in Meta

* Make run() and get_config() abstract in BasePipeline

* Revert unintended change in preprocessor

* Move outgoing_edges_input_node check inside try block

* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX

* Add check for a RecursionError on validate_config_strings

* Address usages of _pipeline_config in data silo and elasticsearch

* Rename _pipeline_config into _init_parameters

* Fix pytest marker and remove unused imports

* Remove most redundant ABCs

* Rename _init_parameters into _component_configuration

* Remove set_config and type from _component_configuration's dict

* Remove last instances of set_config and replace with super().__init__()

* Implement __init_subclass__ approach

* Simplify checks on the existence of _component_configuration

* Fix faiss issue

* Dynamic generation of node schemas & weed out old schemas

* Add debatable test

* Add docstring to debatable test

* Positive diff between schemas implemented

* Improve diff printing

* Rename REST API YAML files to trigger IDE validation

* Fix typing issues

* Fix more typing

* Typo in YAML filename

* Remove needless type:ignore

* Add tests

* Fix tests & validation feedback for accessory classes in custom nodes

* Refactor RAGeneratorType out

* Fix broken import in conftest

* Improve source error handling

* Remove unused import in test_eval.py breaking tests

* Fix changed error message in tests matches too

* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions

* Fix path to generate_openapi_specs.py in autoformat.yml

* Update Documentation & Code Style

* Add test for FAISSDocumentStore-like situations (superclass with init params)

* Update Documentation & Code Style

* Fix indentation

* Remove commented set_config

* Store model_name_or_path in FARMReader to use in DistillationDataSilo

* Rename _component_configuration into _component_config

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-15 11:17:26 +01:00
tstadel
65295bc386
Prevent preprocessor from changing existing documents (#2297)
2022-03-10 14:56:51 +01:00
tstadel
fd46a42130
Allow to deploy and undeploy Pipelines on Deepset Cloud (#2285)
* add deploy_on_deepset_cloud and undeploy_on_deepset_cloud

* increase polling interval to 5 seconds

* Update Documentation & Code Style

* improve logging

* move transitioning logic to PipelineClient

* use enum for Pipeline states

* improve docstrings

* Update Documentation & Code Style

* tests added

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-10 09:49:28 +01:00
Sara Zan
e85b948a4c
Fix PreProcessor test (#2290)
* Adding Document import, missing from recent PR

* Fix mypy signature warning too

* reduce diff to minimum

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-09 13:46:47 +01:00
Dmitry Goryunov
ecec9b4e2c
Remove substrings basic implementation (#2152)
* Remove substrings basic implementation

* Update Documentation & Code Style

* Remove substrings basic tests

* Simplify test
2022-03-08 15:49:56 +01:00
Vladimir Blagojevic
6c0094b5ad
Update LFQA with the latest LFQA seq2seq and retriever models (#2210)
* Register BartEli5Converter for vblagoje/bart_lfqa model

* Update LFQA unit tests

* Update LFQA tutorials
2022-03-08 15:11:41 +01:00
tstadel
4b46f2047b
save_to_deepset_cloud: automatically convert document stores (#2283)
* automatically convert to DeepsetCloudDocumentStore

* shorten info text.

* fix typo

* the -> this

* add test

* ensure request body has only DeepsetCloudDocumentStores

* mark test as elasticsearch to fix milvus1 ci
2022-03-07 22:35:15 +01:00
tstadel
dde9d59271
fix pip backtracking issue (#2281)
* fix pip backtracking issue

* restrict azure-core version

* Remove the trailing comma

* Add skip_magic_trailing_comma in pyproject.toml for pydoc compatibility

* Pin pydoc-markdown _again_

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-07 19:25:33 +01:00
bogdankostic
447baf77ef
Fix skipping of tests using document stores (#2268)
* Fix skipping document store tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Remove SQL from tests requiring embeddings

* Update Documentation & Code Style

* Fix get_embedding_count of Milvus2

* Make sure to start Milvus2 tests with a new collection

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-03 15:19:27 +01:00
tstadel
f7a01624e0
Refactor Pipeline peripherals (#2253)
* move peripheral stuff to utils, add more and better tests

* Update Documentation & Code Style

* move config related peripherals to config module, fix tests

* Update Documentation & Code Style

* remove unnecessary list comprehensions

* apply ZanSara's feedback

* remove classes in pipeline utils

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-03 14:14:42 +01:00
bogdankostic
c5542bd3fb
Add RouteDocuments and JoinAnswers nodes (#2256)
* Add SplitDocumentList and JoinAnswer nodes

* Update Documentation & Code Style

* Add tests + adapt tutorial

* Update Documentation & Code Style

* Remove branch from installation path in Tutorial

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* Change name of SplitDocumentList to RouteDocuments

* Update Documentation & Code Style

* Adapt tutorials to new name

* Add test for JoinAnswers

* Update Documentation & Code Style

* Adapt name of test for JoinAnswers node

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-01 17:42:11 +01:00
MichelBartels
2c423ba063
Introduce support for pymilvus>=2.0.0 (#2126)
* update remaining occurrences of get_connection

* fix milvus2 import and fix wrong extra references

* change MilvusDocumentStore to Milvus1DocumentStore

* update milvus docstrings to reflect updated dependency management

* enable milvus 2 tests

* fix milvus2 env variable processing

* fix dropping collections for each milvus 2 test

* make Milvus 2 doc store tests work

* allow user to specify consistency level

* First attempt at running Milvus2 in the CI

* Install the correct pymilvus

* add batch deletion for milvus2

* change default from milvus 1 to milvus 2

* make milvus2 the default in the docstores extra

* Switch milvus1 and milvus2 in base test run on CI

* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'

* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py

* Enable autogenerated docs for Milvus1 and 2 separately

* Partial fix to docstring of Milvus2DocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-24 17:43:38 +01:00
bogdankostic
b03e9f5872
Fix surrounding context extraction in ParsrConverter (#2162)
* Fix surrounding context extraction

* Update Documentation & Code Style

* Unify Parsr and Azure + add test

* Update Documentation & Code Style

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-24 14:58:36 +01:00
tstadel
e20f2e0d54
Generate code from pipeline (pipeline.to_code()) (#2214)
* pipeline.to_code() with jupyter support

* Update Documentation & Code Style

* add imports

* refactoring

* Update Documentation & Code Style

* docstrings added and refactoring

* Update Documentation & Code Style

* improve imports code generation

* add comment param

* Update Documentation & Code Style

* add simple test

* add to_notebook_cell()

* Update Documentation & Code Style

* introduce helper classes for code gen and eval report gen

* add more tests

* Update Documentation & Code Style

* fix Dict typings

* Update Documentation & Code Style

* validate user input before code gen

* enable urls for to_code()

* Update Documentation & Code Style

* remove all chars except colon from validation regex

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-23 11:08:57 +01:00
bogdankostic
4bad21e961
Add Brownfield Support of existing Elasticsearch indices (#2229)
* Add method to transform existing ES index

* Add possibility to regularly add new records

* Fix types

* Restructure import statement

* Add use_system_proxy param

* Update Documentation & Code Style

* Change location and name + add test

* Update Documentation & Code Style

* Add test cases for metadata fields

* Fix linter

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 20:58:57 +01:00
tstadel
965cc86b24
Fix ef_search param for hnsw in OpenSearchDocumentStore (#2227)
* fix ef_search param for hnsw

* Update Documentation & Code Style

* adjust ef_search param if index exists

* run black

* Fix label index recreation

* fix merge conflict

* merge source branch 'master' into fix_ef_search_param

* fix pylint issue

* fix test_pipeline_components test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 20:33:21 +01:00
MichelBartels
6918d5b79e
Fix missing embeddings not skipped if filters are used (#2230)
* fix skip embeddings param for elasticsearch when filters are specified

* Update Documentation & Code Style

* add test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 18:40:34 +01:00
MichelBartels
f4efc008f4
Adding extended meta data filtering support for InMemoryDocumentStore (#2120)
* add filter classes

* update filter comments

* Add util classes for converting filters (#2123)

* Apply Black

* reintroduce eval functions to filter ops

* Update documentation

* update to latest pymilvus version

* Apply Black

* fixing type hints

* Apply Black

* update write_documents method of milvus2 doc store

* remove unnecessary method

* update init

* remove changes to milvus 2 as they are part of other PR

* remove changes to milvus 2 as they are part of other PR

* updating doc strings to match elastic search filter doc

* Update Documentation & Code Style

* add support for case where there is no meta data defined for key

* update behaviour in case of field not existing in entry

* Update Documentation & Code Style

* add test for InMemoryDocumentStore extended meta data filtering

* make type hint more precise

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-02-22 17:44:58 +01:00
tstadel
fe03ca70de
Fix Pipeline.components (#2215)
* add components property, improve get_document_store()

* Update Documentation & Code Style

* use pipeline.get_document_store() instead of retriever.document_store

* add tests

* Update Documentation & Code Style

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 15:01:07 +01:00
MichelBartels
116fe2db26
Extend meta data support for SQLDocumentStore (#2199)
* update remaining occurrences of get_connection

* first commit to add extended metadata filtering support to sql

* fix bugs

* adding sql doc store instead of milvus

* removing updates to milvus2 from other PR

* fixing not operator

* delete left over line

* remove unnecessary import

* Update Documentation & Code Style

* fix circular import

* fix left over merge conflict

* Update Documentation & Code Style

* fix abstract class

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 20:40:32 +01:00
Sara Zan
2a840ee248
YAML versioning (#2209)
* Make YAML files get the same version as Haystack and throw warning at load in case of mismatch

* Update version of most YAMLs in the codebase (aesthetic change, only to avoid the warning).

* Remove quotes from version in tests

* Fix version in generate_json_schema.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 12:22:37 +01:00
bogdankostic
2a674eaff7
Support more data types and extended filters in WeaviateDocStore (#2143)
* Support more data types and extended filters in WeaviateDocStore

* Adapt types to extended filters

* Update Documentation & Code Style

* Fix mypy

* Fix type of filters

* Update Documentation & Code Style

* Add Docstrings for BaseDocStore

* Update Documentation & Code Style

* Add + prettify DocStrings

* Update Documentation & Code Style

* Fix types

* Update Documentation & Code Style

* Remove import of TypedDict

* Fix tests

* Update Documentation & Code Style

* Fix circular import

* Fix inversion of not operation + add test case

* Fix mypy

* Update Documentation & Code Style

* Apply black

* Use convert_date_to_rfc3339 instead of datetime.fromisoformat

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-18 08:55:17 +01:00
tstadel
ed6e64494e
Fix typo in save_to_deepset_cloud() (#2189)
* fix typo in method name

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 09:07:58 +01:00
tstadel
db4d6f43ba
Add tests on MultiLabel's meta and filter aggregation (#2169)
2022-02-11 17:42:47 +01:00
tstadel
9e18239e3b
pipeline.save_to_deepset_cloud() (#2145)
* add list_pipelines_on_deepset_cloud()

* add Pipeline.save_to_deepset_cloud()

* apply black

* fix imports

* Update Documentation & Code Style

* add load_from_config

* Update Documentation & Code Style

* fix pipeline name for indexing pipeline

* add tests

* Update Documentation & Code Style

* handle deployed pipelines

* make single pipeline config info requests instead of loading all infos

* make ROOT_NODE_TO_PIPELINE_NAME global

* better response validation for saving and updating pipeline configs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-11 12:50:53 +01:00
mathislucka
f11494a9d0
Join node should allow reciprocal rank fusion as additional merging method (#2133)
* join node should allow reciprocal rank fusion

* Update Documentation & Code Style

* add missing merging mode

* tuples are immutable

* take correct results from pipeline

* Update Documentation & Code Style

* Simple docstrings, use ValueError

* Use K=60

* Minor refactoring

* precalculate expected result in test

* Update Documentation & Code Style

* refactor to make more clear

* rm unused imports

* tests should test only one thing

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dmigo <d.f.goryunov@gmail.com>
2022-02-10 16:58:40 +01:00
tstadel
1bdd1f48fd
Fix windows ci tests (#2144)
* move commandline args to global conftest

* correct test exclude paths

* Update Documentation & Code Style

* exclude test_generator_pipeline_with_translator from windows ci

* exclude further oom tests

* enable log_cli

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 21:29:05 +01:00
Julian Risch
7fab027bf0
Evaluating a pipeline consisting only of a reader node (#2132)
* pass documents as extra param to eval

* pass documents via labels to eval

* rename param in docs

* Update Documentation & Code Style

* Revert "rename param in docs"

This reverts commit 2f4c2ec79575e9dd33a8300785f789a327df36f4.

* Revert "pass documents via labels to eval"

This reverts commit dcc51e41f2637d093d81c7d193b873c17c36b174.

* simplify iterating through labels and docs

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 09:18:58 +01:00
tstadel
1e3edef803
List all pipeline(_configs) on Deepset Cloud (#2102)
* add list_pipelines_on_deepset_cloud()

* Apply Black

* refactor auto paging and throw DeepsetCloudErrors

* Apply Black

* fix mypy findings

* Update documentation

* Fix merge error on pipelines.md

* Update Documentation & Code Style

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-08 20:35:25 +01:00
Sara Zan
ffbba90323
Move pytest configuration into pyproject.toml (#2141)
* Move pytest configuration into pyproject.toml

* Fix markers format

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-08 17:23:59 +01:00
Sara Zan
957e78ed9e
Upgrade pydoc-markdown & refactor GitHub Actions (#2117)
* Upgrade pydoc-markdown and fix the YAMLs to work with it

* Pin pydoc-markdown to major version

* Generalize pydoc-markdown workflow

* Make a single Action to perform all tasks that require committing into the local branch

* Merge the code updates and the docs in the Linux CI to prevent the bot from always showing the pipeline as green

* Installing Jupyter deps for Black

* Build cache before running generation tasks

* Add check not to run the code generation on master

* Simplify push action

* Add more test deps in setup.cfg and remove from GH Action workflow

* Remove forced upgrades on pip install

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 15:45:09 +01:00
bogdankostic
f062911040
Extend metadata filtering support in ElasticsearchDocumentStore (#2108)
* Add extended filtering to ESDocumentStore

* Add Docstrings

* Fix definition of filter queries

* Fix mypy

* Add tests

* Add latest docstring and tutorial changes

* Adapt Docstrings

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Add filtering utils for same representation in all doc stores

* Apply Black formatting

* Update documentation

* Fix mypy

* Apply Black

* Fix mypy

* Adopt Doc Strings

* Add more tests

* Apply Black

* Allow filtering in OpenSearchDocStore

* Update documentation

* Adapt Docstrings

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 13:43:12 +01:00
mathislucka
34f9308e1a
Simplify SQuAD data to df conversion (#2124)
* Conversion to df does not need initialization

* Apply Black

* fix test case

* Apply Black

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 12:37:56 +01:00
Julian Risch
53decdcefb
Allow different filters per query in pipeline evaluation (#2068)
* add filters attribute to labels and use in eval

* Add latest docstring and tutorial changes

* overwrite params if None

* populate filters from Label to MultiLabel

* add query_id in eval df and deepcopy params for each label

* fix mypy

* add test for aggregating filters in multilabel

* use query ids also in answers df

* loop through unique query_ids

* hash filters and query text as id

* Add latest docstring and tutorial changes

* fix top_k reader eval

* Apply Black

* rename query_id to id/multilabel_id

* Apply Black

* json dump filters in dataframe

* add filters and id to wrong_examples()

* Apply Black

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-03 19:19:05 +01:00
Sara Zan
a59bca3661
Apply black formatting (#2115)
* Testing black on ui/

* Applying black on docstores

* Add latest docstring and tutorial changes

* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too

* Remove comments

* Relax constraints on pydoc-markdown

* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade

* Fix a couple of bugs

* Add a type: ignore that was missing somehow

* Give path to black

* Apply Black

* Apply Black

* Relocate a couple of type: ignore

* Update documentation

* Make Linux CI run after applying Black

* Triggering Black

* Apply Black

* Remove dependency, does not work well

* Remove manually double trailing commas

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 13:43:18 +01:00
tstadel
9974593c5e
Fix Seq2SeqGenerator return type (#2099)
* return proper Answer objs

* fix docstrings

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 00:20:24 +01:00
Sara Zan
3a6e64b2a3
Make FileTypeClassifier more flexible (#2101)
* Make FileTypeClassifier more flexible

* Make supported_types an init parameter

* Add tests and fix a couple of bugs

* Formatting

* Fix mypy

* Implement feedback
2022-02-02 17:51:04 +01:00
mathislucka
88771b2bee
Provide option to recreate es doc store on initialization (#2084)
* provide option to recreate es doc store on initialization

* Add latest docstring and tutorial changes

* Label expects more arguments

* Label expects also an answer

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-02 11:03:15 +01:00
Sowmiya Jaganathan
7d769d8bf1
Fixed the Search Field mapping in ElasticSearch DocumentStore (#2080)
* Review changes

* Added the synonym analyser for search fields

* Added the review requests.

* Added the synonyms to the OpenSearchDocumentStore and review requests.
2022-01-31 11:11:20 +01:00
Kristof Herrmann
7764b6992c
DC SDK - load pipeline from deepset cloud (#2013)
* initial load_from_dc

* typo

* adjusted api endpoint

* removed kwargs

* added _load_from_dict

* refactor pipeline loading mechanism

* renaming load_from_dc api

* renaming

* fixed errors

* fix comments and environment variable overrides

* Add latest docstring and tutorial changes

* fix outdated YAML examples

* Add latest docstring and tutorial changes

* Introduce readonly DCDocumentStore (without labels support) (#1991)

* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle not existing documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and query_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and query_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>

* introduce DeepsetCloudAdapter

* Add latest docstring and tutorial changes

* introduce DeepsetCloudClient

* Add latest docstring and tutorial changes

* use json api for pipeline_config

* indexing pipeline test added

* pseudo change to force cache eviction

* revert pseudo change to force cache eviction

* remove conftest duplicates

* minor formatting and docstring fixes

* fix tests when MOCK_DC=False

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-01-28 17:32:56 +01:00
Sara Zan
d470b9d0bd
Improve dependency management (#1994)
* First attempt at using setup.cfg for dependency management

* Trying the new package on the CI and in Docker too

* Add composite extras_require

* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports

* Fix bug on class import and rephrase error message

* Introduce typing for optional modules and add type: ignore in sparse.py

* Include importlib_metadata backport for py3.7

* Add colab group to extras_require

* Fix pillow version

* Fix grpcio

* Separate out the crawler as another extra

* Make paths relative in rest_api and ui

* Update the test matrix in the CI

* Add try catch statements around the optional imports too to account for direct imports

* Never mix direct deps with self-references and add ES deps to the base install

* Refactor several paths in tests to make them insensitive to the execution path

* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix

* Wrap pdf conversion utils into safe_import

* Update some tutorials and revert Milvus1 as default for now, see #2067

* Fix mypy config


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 18:12:55 +01:00
Sowmiya Jaganathan
c4fff19018
Supported Highlighting in Elasticsearch (#1930)
* Supported Highlighting

* Review changes

* add example to docstrings

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-01-26 17:35:33 +01:00
Adrien Wald
2edc421a09
Add top_k_join parameter to JoinDocuments.run (#2065)
* add top_k_join parameter to JoinDocuments.run

* test JoinDocuments concatenate with top_k_join parameter

* test two different top_k_join parameters
2022-01-26 17:30:16 +01:00
mathislucka
5b7e906e85
fix: get_documents_by_id should return docs for all passed ids (#2064)
* doc store should return all documents matching ids passed to get_documents_by_id

* test for get_document_by_id should be named correctly

* add test for get_documents_by_id

* Add latest docstring and tutorial changes

* document es query limit

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 12:39:04 +01:00
tstadel
8a32d8da92
Introduce readonly DCDocumentStore (without labels support) (#1991)
* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle non-existent documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and query_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and query_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
2022-01-25 20:36:28 +01:00
MichelBartels
5b6b0cef77
Add UnlabeledTextProcessor (#2054)
* add UnlabeledTextProcessor

* allow choosing processor when finetuning or distilling

* fix type hint

* Add latest docstring and tutorial changes

* improve segment id computation for UnlabeledTextProcessor

* add text and documentation

* change batch size parameter for intermediate layer distillation

* Add latest docstring and tutorial changes

* fix distillation dim mapping

* remove unnecessary changes

* removed confusing parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-25 14:54:34 +01:00
MichelBartels
e8cd5ea943
Add distillation to finetuning tutorial (#2025)
* Add finetuning tutorial

* Add latest docstring and tutorial changes

* fix typo

* Add latest docstring and tutorial changes

* improve distillation explanation in finetuning tutorial

* Add latest docstring and tutorial changes

* allow augment_squad.py to be easier to call from within python

* Update Tutorial2_Finetune_a_model_on_your_data.py

* fix squad augmentation test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-20 12:18:32 +01:00
MichelBartels
0cca2b97cd
distinguish intermediate layer & prediction layer distillation phases with different parameters (#2001)
* add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation

* Add latest docstring and tutorial changes

* improve default parameters

* Add latest docstring and tutorial changes

* split up distillation method

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 20:40:38 +01:00
tstadel
f42d2e8ba0
Add nDCG to pipeline.eval()'s document metrics (#2008)
* add ndcg metric

* fix merge

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 18:36:41 +01:00