1524 Commits

Author SHA1 Message Date
Sara Zan
d98883b79d
Add tests for missing __init__ and super().__init__() in custom nodes (#2350)
* Add tests for missing init and super

* Update Documentation & Code Style

* change in with endswith

* Move test in pipeline.py and change test in pipeline_yaml.py

* Update Documentation & Code Style

* Use caplog to test the warning

* Update Documentation & Code Style

* move tests into test_pipeline and use get_config

* Update Documentation & Code Style

* Unmock version name

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-13 14:29:05 +02:00
Sara Zan
96a538b182
Pylint (import related warnings) and REST API improvements (#2326)
* remove duplicate imports

* fix ungrouped-imports

* Fix wrong-import-position

* Fix unused-import

* pyproject.toml

* Working on wrong-import-order

* Solve wrong-import-order

* fix Pool import

* Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py

* remove Converter from modeling

* Fix mypy issues on adaptive_model.py

* create es_converter.py

* remove converter import

* change import path in tests

* Restructure REST API to not rely on global vars from search.apy and improve tests

* Fix openapi generator

* Move variable initialization

* Change type of FilterRequest.filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-12 16:41:05 +02:00
Branden Chan
75dcfd3fab
Delete files in docs/_src (#2322)
* Delete files in _src

* Filter unused images and re-add images that were in use in docs/img

* Remove all usages of user-images.githubusercontent.com

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-04-12 16:19:03 +02:00
Michele Pangrazzi
dd4361c129
Print warning in EmbeddingRetriever if sentence-transformers model used with different model format (#2377)
* ensure correct embedding_encoder is loaded when embedding_model is a sentence-transformers model but model_format is missing or wrong

* minor refactoring

* do not update model_format and ensure a warning is logged when it could be wrong

* Apply black

* Apply black

Co-authored-by: Michele Pangrazzi <michele@wonderflow.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-04-12 11:52:27 +02:00
tstadel
8342a6c1d6
Fix eval discrepancies (#2381)
* fix eval discrepancies

* Update Documentation & Code Style

* fix reader eval comparison

* Update Documentation & Code Style

* slightly improve messed up top_n_f1 func

* add no_answer hint to reader.eval metrics

* fix tut5

* Update Documentation & Code Style

* correct doc_relevance_col in tests

* Update Documentation & Code Style

* redefine recall metrics for no_answers

* fix bugs in EvalAnswers

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-12 09:24:22 +02:00
Giannis Kitsos Kalyvianakis
b94d9effaf
extract extension based on file's content (#2330)
* extract extension based on file's content

* Add python-magic dependency

* fix the _estimate_extension function and lowercase the file extensions

* check if the FileTypeClassifier can be imported

* add test and new file types

* fix typing

* import Optional

* revert Optional and make sure a string is always returned

* fix test so that it skips markdown files

* Emulate Code & Docs action

* Generate schemas

* Tidy up test code & extensioness files

* Improve error messages

* Revert schema changes

* Emulate black and docs CI again
2022-04-11 09:16:30 +02:00
Sara Zan
ae712fe6bf
Upgrade weaviate-client to 3.3.3 and fix get_all_documents (#1895)
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents

* Add type

* Update Weaviate version on the CI

* Fix bug on get_document_count where there are no documents

* Add more info in the docstrings of get_all_documents and get_all_documents_generator

* Add latest docstring and tutorial changes

* Apply Black

* Update Documentation & Code Style

* Trigger pipeline

* Update Documentation & Code Style

* Include StefanBogdan feedback

* Fix mypy issues and LogicalFilterClause

* Add more types

* Update Documentation & Code Style

* update setup.cfg

* Upgrade weaviate containers too

* Allow to filter for content field in Weaviate

* Use convert_to_weaviate instead of convert_to_pinecone

* Fix _get_all_documents_in_index

* Update docstrings and docs

* Catching an exception in get_document(s)_by_id

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-04-01 15:37:34 +03:00
tstadel
3561037e82
Use cache for hf requests during CI (#2379)
* increase all_close tolerance for milvus2, improve assertion infos

* use request-cache for huggingface
2022-03-31 12:36:45 +02:00
tstadel
5b52690c5c
Increase all_close tolerance for milvus2, improve assertion infos (#2375) 2022-03-31 11:41:13 +02:00
Florian Hardow
a273c3a51d
EvaluationSetClient for deepset cloud to fetch evaluation sets and la… (#2345)
* EvaluationSetClient for deepset cloud to fetch evaluation sets and labels for one specific evaluation set

* make DeepsetCloudDocumentStore able to fetch uploaded evaluation set names

* fix missing renaming of get_evaluation_set_names in DeepsetCloudDocumentStore

* update documentation for evaluation set functionality in deepset cloud document store

* DeepsetCloudDocumentStore tests for evaluation set functionality

* rename index to evaluation_set_name for DeepsetCloudDocumentStore evaluation set functionality

* raise DeepsetCloudError when no labels were found for evaluation set

* make use of .get_with_auto_paging in EvaluationSetClient

* Return result of get_with_auto_paging() as it parses the response already

* Make schema import source more specific

* fetch all evaluation sets for a workspace in deepset Cloud

* Rename evaluation_set_name to label_index

* make use of generator functionality for fetching labels

* Update Documentation & Code Style

* Adjust function input for DeepsetCloudDocumentStore.get_all_labels, adjust tests for it, fix typos, make linter happy

* Match error message with pytest.raises

* Update Documentation & Code Style

* DeepsetCloudDocumentStore.get_labels_count raises DeepsetCloudError when no evaluation set was found to count labels on

* remove unneeded import in tests

* DeepsetCloudDocumentStore tests, make reponse bodies a string through json.dumps

* DeepsetcloudDocumentStore.get_label_count - move raise to return

* stringify uuid before json.dump as uuid is not serilizable

* DeepsetcloudDocumentStore - adjust response mocking in tests

* DeepsetcloudDocumentStore - json dump response body in test

* DeepsetCloudDocumentStore introduce label_index, EvaluationSetClient rename label_index to evaluation_set

* Update Documentation & Code Style

* DeepsetCloudDocumentStore rename evaluation_set to evaluation_set_response as there is a name clash with the input variable

* DeepsetCloudDocumentStore - rename missed variable in test

* DeepsetCloudDocumentStore - rename missed label_index to index in doc string, rename label_index to evaluation_set in EvaluationSetClient

* Update Documentation & Code Style

* DeepsetCloudDocumentStore - update docstrings for EvaluationSetClient

* DeepsetCloudDocumentStore - fix typo in doc string

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-31 08:59:58 +02:00
bogdankostic
ca988917c9
Fix TableReader for tables without rows (#2369)
* Skip tables without rows

* Update Documentation & Code Style

* Add tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-30 17:02:39 +02:00
bogdankostic
834f8c4902
Change return types of indexing pipeline nodes (#2342)
* Change return types of file converters

* Change return types of preprocessor

* Change return types of crawler

* Adapt utils to functions to new return types

* Adapt __init__.py to new method names

* Prevent circular imports

* Update Documentation & Code Style

* Let DocStores' run method accept Documents

* Adapt tests to new return types

* Update Documentation & Code Style

* Put "# type: ignore" to right place

* Remove id_hash_keys property from Document primitive

* Update Documentation & Code Style

* Adapt tests to new return types and missing id_hash_keys property

* Fix mypy

* Fix mypy

* Adapt PDFToTextOCRConverter

* Remove id_hash_keys from RestAPI tests

* Update Documentation & Code Style

* Rename tests

* Remove redundant setting of content_type="text"

* Add DeprecationWarning

* Add id_hash_keys to elasticsearch_index_to_document_store

* Change document type from dict to Docuemnt in PreProcessor test

* Fix file path in Tutorial 5

* Remove added output in Tutorial 5

* Update Documentation & Code Style

* Fix file_paths in Tutorial 9 + fix gz files in fetch_archive_from_http

* Adapt tutorials to new return types

* Adapt tutorial 14 to new return types

* Update Documentation & Code Style

* Change assertions to HaystackErrors

* Import HaystackError correctly

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-29 13:53:35 +02:00
tstadel
a73717b2ea
Support conjunctive queries in sparse retrieval (#2361)
* support conjunctive queries in sparse retrieval

* fix typo

* test added

* Update Documentation & Code Style

* fix test_DeepsetCloudDocumentStore_query

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-28 22:10:50 +02:00
tstadel
b20a1f874b
Fix sparse retrieval with filters returns results without any text-match (#2359)
* use "must" instead of "should" for query-matching

* Update Documentation & Code Style

* fix mypy issue

* fix finding of new pylint version

* add test

* fix test_retrieval

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-25 17:53:42 +01:00
tstadel
ca86cc834d
Integrate BEIR (#2333)
* introduce eval_beir() to Pipeline

* add beir dependency

* Update Documentation & Code Style

* top_k_values added + refactoring

* Update Documentation & Code Style

* enable titles during beir eval

* Update Documentation & Code Style

* raise HaystackError instead of PipelineError

* get rid of forced dedicated index

* minor docstring and comment fixes

* show warning on default index deletion

* Update Documentation & Code Style

* add delete_index to MockDocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 19:04:28 +01:00
James Briggs
8cd73a9d20
Add PineconeDocumentStore (#2254)
* added core install and functionality of pinecone doc store (init, upsert, query, delete)

* implemented core functionality of Pinecone doc store

* Update Documentation & Code Style

* updated filtering to use Haystack filtering and reduced default batch_size

* Update Documentation & Code Style

* removed debugging code

* updated Pinecone filtering to use filter_utils

* removed uneeded methods and minor tweaks to current methods

* fixed typing issues

* Update Documentation & Code Style

* Allow filters in al methods except get_embedding_count

* Fix skipping document store tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Remove SQL from tests requiring embeddings

* Update Documentation & Code Style

* Fix get_embedding_count of Milvus2

* Make sure to start Milvus2 tests with a new collection

* Add pinecone to test suite

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* Add pinecone to docstores dependendcy

* Add PineconeDocStore to API Documentation

* Add missing comma

* Update Documentation & Code Style

* Adapt format of doc strings

* Update Documentation & Code Style

* Set API key as environment variable

* Skip Pinecone tests in forks

* Add sleep after deleting index

* Add sleep after deleting index

* Add sleep after creating index

* Add check if index ready

* Remove printing of index stats

* Create new index for each pinecone test

* Use RestAPI instead of Python API for describe_index_stats

* Fix accessing describe_index_stats

* Remove usages of describe_index_stats

* Run pinecone tests separately

* Update Documentation & Code Style

* Add pdftotext to pinecone tests

* Remove sleep from doc store fixture

* Add describe_index_stats

* Remove unused imports

* Use pull_request_target trigger

* Revert use pull_request_target trigger

* Remove set_config

* Add os to conftest

* Integrate review comments

* Set include_values to False

* Remove quotation marks from pinecone.Index type

* Update Documentation & Code Style

* Update Documentation & Code Style

* Fix number of args in error messages

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-03-21 16:24:09 +01:00
Sara Zan
7261377643
Improve error message for nodes failing validation (#2313)
* Similar test case seems to pass

* Update Documentation & Code Style

* Improve error message

* Slightly clarify info message

* Fix mismatch between node and node_class in the schema generation

* Remove condition that node class names cannot begin with Base and update tests

* Indentation

* Update Documentation & Code Style

* feedback

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 14:47:24 +01:00
Julian Risch
ac5617e757
Add basic telemetry features (#2314)
* add basic telemetry features

* change pipeline_config to _component_config

* Update Documentation & Code Style

* add super().__init__() calls to error classes

* make posthog mock work with python 3.7

* Update Documentation & Code Style

* update link to docs web page

* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)

* add comment on send_event in BaseComponent.init() and fix mypy

* mock NonPrivateParameters and fix pylint undefined-variable

* Update Documentation & Code Style

* check model path contains multiple /

* add test for writing to file

* add test for en-/disable telemetry

* Update Documentation & Code Style

* merge file deletion methods and ignore pylint global statement

* Update Documentation & Code Style

* set env variable in demo to activate telemetry

* fix mock of HAYSTACK_TELEMETRY_ENABLED

* fix mypy and linter

* add CI as env variable to execution contexts

* remove threading, add test for custom error event

* Update Documentation & Code Style

* simplify config/log file deletion

* add test for final event being sent

* force writing config file in test

* make test compatible with python 3.7

* switch to posthog production server

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 11:58:51 +01:00
tstadel
e13df4b22b
Implement Context Matching (#2293)
* first context_matching impl

* Update Documentation & Code Style

* sort matches

* fix matching bugs

* Update Documentation & Code Style

* add match_contexts

* min_words added

* Update Documentation & Code Style

* rename matching.py to context_matching.py

* fix mypy

* added tests and heuristic for one-sided overlaps

* Update Documentation & Code Style

* add another noise test

* Update Documentation & Code Style

* improve boosting split overlaps

* add non parallel versions of match_context and match_contexts

* Update Documentation & Code Style

* fix pylint finding

* add tests for match_context and match_contexts

* fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 10:35:12 +01:00
tstadel
8f7dd13eb9
Fix dependency graph for indexing pipelines during codegen (#2311)
* fix dependency graph for indexing pipelines

* Update Documentation & Code Style

* add test and fix get_config for existing components

* Update Documentation & Code Style

* fix mypy finding

* refactored Pipeline.get_config

* Update Documentation & Code Style

* split to_code test into get_config test and generate_code test

* fix child component handling in get_config()

* Update Documentation & Code Style

* fix get_params

* make get_config fully recursive

* add multi level dependency test

* Update Documentation & Code Style

* add some review feedback

* fix multiple dependent components of same type

* fix mypy finding

* rename dependencies to utilized_components

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-17 22:03:39 +01:00
Tuana Celik
6fb58d09a9
'os' wrapper to function for brownfield support (#2282)
* 'os' wrapper to function for brownfield support

* Changing function names and fixing default parameter values

* Including parameter keys

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-16 11:53:55 +01:00
Sara Zan
11cf94a965
Pipeline's YAML: syntax validation (#2226)
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes

* Make error composition work properly

* Clarify typing

* Help mypy a bit more

* Update Documentation & Code Style

* Enable autogenerated docs for Milvus1 and 2 separately

* Revert "Enable autogenerated docs for Milvus1 and 2 separately"

This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.

* Update Documentation & Code Style

* Re-enable 'additionalProperties: False'

* Add pipeline.type to JSON Schema, was somehow forgotten

* Disable additionalProperties on the pipeline properties too

* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)

* Cal super in PipelineValidationError

* Improve _read_pipeline_config_from_yaml's error handling

* Fix generate_json_schema.py to include document stores

* Fix json schemas (retro-fix 1.1.0 again)

* Improve custom errors printing, add link to docs

* Add function in BaseComponent to list its subclasses in a module

* Make some document stores base classes abstract

* Add marker 'integration' in pytest flags

* Slighly improve validation of pipelines at load

* Adding tests for YAML loading and validation

* Make custom_query Optional for validation issues

* Fix bug in _read_pipeline_config_from_yaml

* Improve error handling in BasePipeline and Pipeline and add DAG check

* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)

* Simplify errors slightly

* Add some YAML validation tests

* Remove load_from_config from BasePipeline, it was never used anyway

* Improve tests

* Include json-schemas in package

* Fix conftest imports

* Make BasePipeline abstract

* Improve mocking by making the test independent from the YAML version

* Add exportable_to_yaml decorator to forget about set_config on mock nodes

* Fix mypy errors

* Comment out one monkeypatch

* Fix typing again

* Improve error message for validation

* Add required properties to pipelines

* Fix YAML version for REST API YAMLs to 1.2.0

* Fix load_from_yaml call in load_from_deepset_cloud

* fix HaystackError.__getattr__

* Add super().__init__()in most nodes and docstore, comment set_config

* Remove type from REST API pipelines

* Remove useless init from doc2answers

* Call super in Seq3SeqGenerator

* Typo in deepsetcloud.py

* Fix rest api indexing error mismatch and mock version of JSON schema in all tests

* Working on pipeline tests

* Improve errors printing slightly

* Add back test_pipeline.yaml

* _json_schema.py supports different versions with identical schemas

* Add type to 0.7 schema for backwards compatibility

* Fix small bug in _json_schema.py

* Try alternative to generate json schemas on the CI

* Update Documentation & Code Style

* Make linux CI match autoformat CI

* Fix super-init-not-called

* Accidentally committed file

* Update Documentation & Code Style

* fix test_summarizer_translation.py's import

* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args

* Fix json schema for ray tests too

* Update Documentation & Code Style

* Reintroduce validation

* Usa unstable version in tests and rest api

* Make unstable support the latest versions

* Update Documentation & Code Style

* Remove needless fixture

* Make type in pipeline optional in the strings validation

* Fix schemas

* Fix string validation for pipeline type

* Improve validate_config_strings

* Remove type from test p[ipelines

* Update Documentation & Code Style

* Fix test_pipeline

* Removing more type from pipelines

* Temporary CI patc

* Fix issue with exportable_to_yaml never invoking the wrapped init

* rm stray file

* pipeline tests are green again

* Linux CI now needs .[all] to generate the schema

* Bugfixes, pipeline tests seems to be green

* Typo in version after merge

* Implement missing methods in Weaviate

* Trying to avoid FAISS tests from running in the Milvus1 test suite

* Fix some stray test paths and faiss index dumping

* Fix pytest markers list

* Temporarily disable cache to be able to see tests failures

* Fix pyproject.toml syntax

* Use only tmp_path

* Fix preprocessor signature after merge

* Fix faiss bug

* Fix Ray test

* Fix documentation issue by removing quotes from faiss type

* Update Documentation & Code Style

* use document properly in preprocessor tests

* Update Documentation & Code Style

* make preprocessor capable of handling documents

* import document

* Revert support for documents in preprocessor, do later

* Fix bug in _json_schema.py that was breaking validation

* re-enable cache

* Update Documentation & Code Style

* Simplify calling _json_schema.py from the CI

* Remove redundant ABC inheritance

* Ensure exportable_to_yaml works only on implementations

* Rename subclass to class_ in Meta

* Make run() and get_config() abstract in BasePipeline

* Revert unintended change in preprocessor

* Move outgoing_edges_input_node check inside try block

* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX

* Add check for a RecursionError on validate_config_strings

* Address usages of _pipeline_config in data silo and elasticsearch

* Rename _pipeline_config into _init_parameters

* Fix pytest marker and remove unused imports

* Remove most redundant ABCs

* Rename _init_parameters into _component_configuration

* Remove set_config and type from _component_configuration's dict

* Remove last instances of set_config and replace with super().__init__()

* Implement __init_subclass__ approach

* Simplify checks on the existence of _component_configuration

* Fix faiss issue

* Dynamic generation of node schemas & weed out old schemas

* Add debatable test

* Add docstring to debatable test

* Positive diff between schemas implemented

* Improve diff printing

* Rename REST API YAML files to trigger IDE validation

* Fix typing issues

* Fix more typing

* Typo in YAML filename

* Remove needless type:ignore

* Add tests

* Fix tests & validation feedback for accessory classes in custom nodes

* Refactor RAGeneratorType out

* Fix broken import in conftest

* Improve source error handling

* Remove unused import in test_eval.py breaking tests

* Fix changed error message in tests matches too

* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions

* Fix path to generate_openapi_specs.py in autoformat.yml

* Update Documentation & Code Style

* Add test for FAISSDocumentStore-like situations (superclass with init params)

* Update Documentation & Code Style

* Fix indentation

* Remove commented set_config

* Store model_name_or_path in FARMReader to use in DistillationDataSilo

* Rename _component_configuration into _component_config

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-15 11:17:26 +01:00
tstadel
65295bc386
Prevent preprocessor from changing existing documents (#2297) 2022-03-10 14:56:51 +01:00
tstadel
fd46a42130
Allow to deploy and undeploy Pipelines on Deepset Cloud (#2285)
* add deploy_on_deepset_cloud and undeploy_on_deepset_cloud

* increase polling interval to 5 seconds

* Update Documentation & Code Style

* improve logging

* move transitioning logic to PipelineClient

* use enum for Pipeline states

* improve docstrings

* Update Documentation & Code Style

* tests added

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-10 09:49:28 +01:00
Sara Zan
e85b948a4c
Fix PreProcessor test (#2290)
* Adding Document import, missing from recent PR

* Fix mypy signature warning too

* reduce diff to minimum

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-09 13:46:47 +01:00
Dmitry Goryunov
ecec9b4e2c
Remove substrings basic implementation (#2152)
* Remove substrings basic implementation

* Update Documentation & Code Style

* Remove substrings basic tests

* Simplify test
2022-03-08 15:49:56 +01:00
Vladimir Blagojevic
6c0094b5ad
Update LFQA with the latest LFQA seq2seq and retriever models (#2210)
* Register BartEli5Converter for vblagoje/bart_lfqa model

* Update LFQA unit tests

* Update LFQA tutorials
2022-03-08 15:11:41 +01:00
tstadel
4b46f2047b
save_to_deepset_cloud: automatically convert document stores (#2283)
* automatically convert to DeepsetCloudDocumentStore

* shorten info text.

* fix typo

* the -> this

* add test

* ensure request body has only DeepsetCloudDocumentStores

* mark test as elasticsearch to fix milvus1 ci
2022-03-07 22:35:15 +01:00
tstadel
dde9d59271
fix pip backtracking issue (#2281)
* fix pip backtracking issue

* restrict azure-core version

* Remove the trailing comma

* Add skip_magic_trailing_comma in pyproject.toml for pydoc compatibility

* Pin pydoc-markdown _again_

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-07 19:25:33 +01:00
bogdankostic
447baf77ef
Fix skipping of tests using document stores (#2268)
* Fix skipping document store tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Remove SQL from tests requiring embeddings

* Update Documentation & Code Style

* Fix get_embedding_count of Milvus2

* Make sure to start Milvus2 tests with a new collection

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-03 15:19:27 +01:00
tstadel
f7a01624e0
Refactor Pipeline peripherals (#2253)
* move peripheral stuff to utils, add more and better tests

* Update Documentation & Code Style

* move config related peripherals to config module, fix tests

* Update Documentation & Code Style

* remove unnecessary list comprehensions

* apply ZanSara's feedback

* remove classes in pipeline utils

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-03 14:14:42 +01:00
bogdankostic
c5542bd3fb
Add RouteDocuments and JoinAnswers nodes (#2256)
* Add SplitDocumentList and JoinAnswer nodes

* Update Documentation & Code Style

* Add tests + adapt tutorial

* Update Documentation & Code Style

* Remove branch from installation path in Tutorial

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* Change name of SplitDocumentList to RouteDocuments

* Update Documentation & Code Style

* Adapt tutorials to new name

* Add test for JoinAnswers

* Update Documentation & Code Style

* Adapt name of test for JoinAnswers node

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-01 17:42:11 +01:00
MichelBartels
2c423ba063
Introduce support for pymilvus>=2.0.0 (#2126)
* update remaining occurences of get_connection

* fix milvus2 import and fix wrong extra references

* change MilvusDocumentStore to Milvus1DocumentStore

* update milvus docstrings to reflect updated dependency management

* enable milvus 2 tests

* fix milvus2 env variable processing

* fix dropping collections for each milvus 2 test

* make Milvus 2 doc store tests work

* allow user to specify consistency level

* Fist attempt at running Milvus2 in the CI

* Install the correct pymilvus

* add batch deletion for milvus2

* change default from milvus 1 to milvus 2

* make milvus2 the default in the docstores extra

* Switch milvus1 and milvus2 in base test run on CI

* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'

* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py

* Enable autogenerated docs for Milvus1 and 2 separately

* Partial fix to docstring of Milvus2DocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-24 17:43:38 +01:00
bogdankostic
b03e9f5872
Fix surrounding context extraction in ParsrConverter (#2162)
* Fix surrounding context extraction

* Update Documentation & Code Style

* Unify Parsr and Azure + add test

* Update Documentation & Code Style

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-24 14:58:36 +01:00
tstadel
e20f2e0d54
Generate code from pipeline (pipeline.to_code()) (#2214)
* pipeline.to_code() with jupyter support

* Update Documentation & Code Style

* add imports

* refactoring

* Update Documentation & Code Style

* docstrings added and refactoring

* Update Documentation & Code Style

* improve imports code generation

* add comment param

* Update Documentation & Code Style

* add simple test

* add to_notebook_cell()

* Update Documentation & Code Style

* introduce helper classes for code gen and eval report gen

* add more tests

* Update Documentation & Code Style

* fix Dict typings

* Update Documentation & Code Style

* validate user input before code gen

* enable urls for to_code()

* Update Documentation & Code Style

* remove all chars except colon from validation regex

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-23 11:08:57 +01:00
bogdankostic
4bad21e961
Add Brownfield Support of existing Elasticsearch indices (#2229)
* Add method to transform existing ES index

* Add possibility to regularly add new records

* Fix types

* Restructure import statement

* Add use_system_proxy param

* Update Documentation & Code Style

* Change location and name + add test

* Update Documentation & Code Style

* Add test cases for metadata fields

* Fix linter

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 20:58:57 +01:00
tstadel
965cc86b24
Fix ef_search param for hnsw in OpenSearchDocumentStore (#2227)
* fix ef_search param for hnsw

* Update Documentation & Code Style

* adjust ef_search param if index exists

* run black

* Fix label index recreation

* fix merge conflict

* merge source branch 'master' into fix_ef_search_param

* fix pylint issue

* fix test_pipeline_components test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 20:33:21 +01:00
MichelBartels
6918d5b79e
Fix missing embeddings not skipped if filters are used (#2230)
* fix skip embeddings param for elasticsearch when filters are specified

* Update Documentation & Code Style

* add test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 18:40:34 +01:00
MichelBartels
f4efc008f4
Adding extended meta data filtering support for InMemoryDocumenStore (#2120)
* add filter classes

* update filter comments

* Add util classes for converting filters (#2123)

* Apply Black

* reintroduce eval functions to filter ops

* Update documentation

* update to latest pymilvus version

* Apply Black

* fixing type hints

* Apply Black

* update write_documents method of milvus2 doc store

* remove unnecessary method

* update init

* remove changes to milvus 2 as they are part of other PR

* remove changes to milvus 2 as they are part of other PR

* updating doc strings to match elastic search filter doc

* Update Documentation & Code Style

* add support for case where there is no meta data defined for key

* update behaviour in case of field not existing in entry

* Update Documentation & Code Style

* add test for InMemoryDocumentStore extended meta data filtering

* make type hint more precise

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-02-22 17:44:58 +01:00
tstadel
fe03ca70de
Fix Pipeline.components (#2215)
* add components property, improve get_document_store()

* Update Documentation & Code Style

* use pipeline.get_document_store() instead of retriever.document_store

* add tests

* Update Documentation & Code Style

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 15:01:07 +01:00
MichelBartels
116fe2db26
Extend meta data support for SQLDocumentStore (#2199)
* update remaining occurences of get_connection

* first commit to add extended metadata filtering support to sql

* fix bugs

* adding sql doc store instead of milvus

* removing updates to milvus2 from other PR

* fixing not operator

* delete left over line

* remove unnecessary import

* Update Documentation & Code Style

* fix circular import

* fix left over merge conflict

* Update Documentation & Code Style

* fix abstract class

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 20:40:32 +01:00
Sara Zan
2a840ee248
YAML versioning (#2209)
* Make YAML files get the same version as Haystack and throw warning at load in case of mismatch

* Update version of most YAMLs in the codebase (aesthethic chamge, only to avoid the warning).

* Remove quotes from version in tests

* Fix version in generate_json_schema.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 12:22:37 +01:00
bogdankostic
2a674eaff7
Support more data types and extended filters in WeaviateDocStore (#2143)
* Support more data types and extended filters in WeaviateDocStore

* Adapt types to extended filters

* Update Documentation & Code Style

* Fix mypy

* Fix type of filters

* Update Documentation & Code Style

* Add Docstrings for BaseDocStore

* Update Documentation & Code Style

* Add + prettify DocStrings

* Update Documentation & Code Style

* Fix types

* Update Documentation & Code Style

* Remove import of TypedDict

* Fix tests

* Update Documentation & Code Style

* Fix circular import

* Fix inversion of not operation + add test case

* Fix mypy

* Update Documentation & Code Style

* Apply black

* Use convert_date_to_rfc3339 instead of datetime.fromisoformat

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-18 08:55:17 +01:00
tstadel
ed6e64494e
Fix typo in save_to_deepset_cloud() (#2189)
* fix typo in method name

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 09:07:58 +01:00
tstadel
db4d6f43ba
Add tests on MultiLabel's meta and filter aggregation (#2169) 2022-02-11 17:42:47 +01:00
tstadel
9e18239e3b
pipeline.save_to_deepset_cloud() (#2145)
* add list_pipelines_on_deepset_cloud()

* add Pipeline.save_to_deepset_cloud()

* apply black

* fix imports

* Update Documentation & Code Style

* add load_from_config

* Update Documentation & Code Style

* fix pipeline name for indexing pipeline

* add tests

* Update Documentation & Code Style

* handle deployed pipelines

* make single pipeline config info requests instead of loading all infos

* make ROOT_NODE_TO_PIPELINE_NAME global

* better response validation for saving and updating pipeline configs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-11 12:50:53 +01:00
mathislucka
f11494a9d0
Join node should allow reciprocal rank fusion as additional merging method (#2133)
* join node should allow reciprocal rank fusion

* Update Documentation & Code Style

* add missing merging mode

* tuples are immutable

* take correct results from pipeline

* Update Documentation & Code Style

* Simple docstrings, use ValueError

* Use K=60

* Minor refactoring

* precalculate expected result in test

* Update Documentation & Code Style

* refactor to make more clear

* rm unused imports

* tests should test only one thing

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dmigo <d.f.goryunov@gmail.com>
2022-02-10 16:58:40 +01:00
tstadel
1bdd1f48fd
Fix windows ci tests (#2144)
* move commandline args to global conftest

* correct test exclude paths

* Update Documentation & Code Style

* exclude test_generator_pipeline_with_translator from windows ci

* exclude further oom tests

* enable log_cli

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 21:29:05 +01:00
Julian Risch
7fab027bf0
Evaluating a pipeline consisting only of a reader node (#2132)
* pass documents as extra param to eval

* pass documents via labels to eval

* rename param in docs

* Update Documentation & Code Style

* Revert "rename param in docs"

This reverts commit 2f4c2ec79575e9dd33a8300785f789a327df36f4.

* Revert "pass documents via labels to eval"

This reverts commit dcc51e41f2637d093d81c7d193b873c17c36b174.

* simplify iterating through labels and docs

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 09:18:58 +01:00
tstadel
1e3edef803
List all pipeline(_configs) on Deepset Cloud (#2102)
* add list_pipelines_on_deepset_cloud()

* Apply Black

* refactor auto paging and throw DeepsetCloudErrors

* Apply Black

* fix mypy findings

* Update documentation

* Fix merge error on pipelines.md

* Update Documentation & Code Style

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-08 20:35:25 +01:00