106 Commits

Author SHA1 Message Date
Zoltan Fedor
f4128d3581
Adding support for additional distance/similarity metrics for Weaviate (#3001)
* Adding support for additional distance metrics for Weaviate

Fixes #3000

* Updating the docs

* Fixing error texts

* Fixing issues raised by the review

* Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity`

* [EMPTY] Re-trigger CI

* Fixing things based on review

* [EMPTY] Re-trigger CI
2022-08-11 09:48:21 +02:00
Massimiliano Pippi
e7627c3f8b
Use opensearch-py in OpenSearchDocumentStore (#2691)
* add Opensearch extras

* let OpenSearchDocumentStore use opensearch-py

* Update Documentation & Code Style

* fix a bug found after adding tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-07-28 10:04:49 +02:00
Sara Zan
6b39fbd39c
Mocking Pinecone tests (#2778)
* Integrating the mock into conftest.py

* re-enable workflow

* delete_all

* Update Documentation & Code Style

* remove ValueError

* Add empty response

* wrong condition

* return response

* revert removal of delete_all

* change mock

* Update Documentation & Code Style

* test for rest api, to revert

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-14 20:03:33 +02:00
Malte Pietsch
ba08fc86f5
Add node to use OpenAI's GPT-3 for QA (#2605)
* first draft of openai node for QA

* Update Documentation & Code Style

* fix mypy. add node to inits

* Update Documentation & Code Style

* fix linter

* Adapt OpenAIGenerator to completions endpoint

* Update Documentation & Code Style

* Fix pylint

* Fix doc strings

* Make use of temperature

* Make use of api key in tests

* Adapt doc strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-07-08 13:59:27 +02:00
bogdankostic
195aed942f
Add update_document_meta to InMemoryDocumentStore (#2689)
* Add update_document_meta to InMemoryDocumentStore

* Fix typo

* Update Documentation & Code Style

* Add update_document_meta to BaseDocumentStore

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Add update_document_meta to MockDocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-07 15:44:07 +02:00
Patrick Deutschmann
1db3fd0942
Add support for Multi-Hop Dense Retrieval (#2571)
* Implement MDR

* Adapt conftest to new MDR signature

* Update Documentation & Code Style

* Change signature of queries param in batch methods of MDR like in #2575

* Update Documentation & Code Style

* Rename MultihopDenseRetriever to MultihopEmbeddingRetriever

* Fix filters in retrieve_batch

* Add docstring for MultihopEmbeddingRetriever.__init__

* Update Documentation & Code Style

* Revert forward signature of TextSimilarityHead

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-05 11:31:11 +02:00
Sara Zan
54518ac790
[CI Refactoring] Refactor Document fixtures in tests (#2577)
* Refactor document fixtures

* Add embedding files

* Update Documentation & Code Style

* Indentation issue

* Update Documentation & Code Style

* Fix type conversion in conftest.py

* Update Documentation & Code Style

* mypy on sql.py

* mypy on crawler.py

* mypy on pinecone.py

* Adapt retriever tests

* Update Documentation & Code Style

* mypy on crawler.py

* Update Documentation & Code Style

* mypy on crawler.py again

* Update Documentation & Code Style

* mypy fix was too rough

* Fix some more tests

* Update Documentation & Code Style

* Skip meaningless test on FilterRetriever

* Make embedding values less specific

* Update Documentation & Code Style

* Use stable IDs in retriever tests that depend on it

* Remove needless fixtures

* docs_with_ids

* Update Documentation & Code Style

* Typo

* Fix retriever tests

* Fix reader tests

* Update Documentation & Code Style

* Workaround #2626

* Update Documentation & Code Style

* Fix label generator tests

* Reorder vectors

* remove print

* Update Documentation & Code Style

* Update Documentation & Code Style

* git tags leftover

* Update Documentation & Code Style

* fix last failing test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 18:22:48 +02:00
Sara Zan
59608ca474
[CI Refactoring] Workflow refactoring (#2576)
* Unify CI tests (from #2466)

* Update Documentation & Code Style

* Change folder names

* Fix markers list

* Remove marker 'slow', replaced with 'integration'

* Soften children check

* Start ES first so it has time to boot while Python is setup

* Run the full workflow

* Try to make pip upgrade on Windows

* Set KG tests as integration

* Update Documentation & Code Style

* typo

* faster pylint

* Make Pylint use the cache

* filter diff files for pylint

* debug pylint statement

* revert pylint changes

* Remove path from asserted log (fails on Windows)

* Skip preprocessor test on Windows

* Tackling Windows specific failures

* Fix pytest command for windows suites

* Remove \ from command

* Move poppler test into integration

* Skip opensearch test on windows

* Add tolerance in reader sas score for Windows

* Another pytorch approx

* Raise time limit for unit tests :(

* Skip poppler test on Windows CI

* Specify to pull with FF only in docs check

* temporarily run the docs check immediately

* Allow merge commit for now

* Try without fetch depth

* Accelerating test

* Accelerating test

* Add repository and ref alongside fetch-depth

* Separate out code&docs check from tests

* Use setup-python cache

* Delete custom action

* Remove the pull step in the docs check, will find a way to run on bot commits

* Add requirements.txt in .github for caching

* Actually install dependencies

* Change deps group for pylint

* Unclear why the requirements.txt is still required :/

* Fix the code check python setup

* Install all deps for pylint

* Make the autoformat check depend on tests and doc updates workflows

* Try installing dependencies in another order

* Try again to install the deps

* quoting the paths

* Ad back the requirements

* Try again to install rest_api and ui

* Change deps group

* Duplicate haystack install line

* See if the cache is the problem

* Disable also in mypy, who knows

* split the install step

* Split install step everywhere

* Revert "Separate out code&docs check from tests"

This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.

* Add back the action

* Proactive support for audio (see text2speech branch)

* Fix label generator tests

* Remove install of libsndfile1 on win temporarily

* exclude audio tests on win

* install ffmpeg for integration tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-07 09:23:03 +02:00
Vladimir Blagojevic
e10a3fba74
Add Generative Pseudo Labeling (#2388) 2022-06-02 10:12:47 -04:00
bogdankostic
61d9429c25
Simplify loading of EmbeddingRetriever (#2619)
* Infer model format for EmbeddingRetriever automatically

* Update Documentation & Code Style

* Adapt conftest to automatic inference of model_format

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Adapt tutorials

* Update Documentation & Code Style

* Add test for similarity scores with sentence transformers

* Adapt doc string and warning message

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 15:05:29 +02:00
bogdankostic
867695ad0c
Change signature of queries param in batch methods (#2575)
* Change signature of queries param in batch methods

* Update Documentation & Code Style

* Fix mypy

* Remove unused import

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 12:33:45 +02:00
bogdankostic
738e008020
Add run_batch method to all nodes and Pipeline to allow batch querying (#2481)
* Add run_batch methods for batch querying

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Fix mypy

* Fix linter

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix mypy

* Fix rest api test

* Update Documentation & Code Style

* Add Doc strings

* Update Documentation & Code Style

* Add batch_size as attribute to nodes supporting batching

* Adapt error messages

* Adapt type of filters in retrievers

* Revert change about truncation_warning in summarizer

* Unify multiple_doc_lists tests

* Use smaller models in extractor tests

* Add return types to JoinAnswers and RouteDocuments

* Adapt return statements in reader's run_batch method

* Allow list of filters

* Adapt error messages

* Update Documentation & Code Style

* Fix tests

* Fix mypy

* Adapt print_questions

* Remove disabling warning about too many public methods

* Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py

* Add type check

* Update Documentation & Code Style

* Adapt tutorial 11

* Update Documentation & Code Style

* Add query_batch method for DCDocStore

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 11:11:00 +02:00
bogdankostic
4581b91e83
Make DeepsetCloudDocumentStore work with non-existing index (#2513)
* Make DeepsetCloudDocumentStore work with non-existing index

* Update Documentation & Code Style

* Add tests

* Update Documentation & Code Style

* Fix tests, adapt warning messages + lowercase deepset

* Update Documentation & Code Style

* Fix typo in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 15:21:35 +02:00
Sara Zan
f8e02310bf
Validate YAML files without loading the nodes (#2438)
* Remove BasePipeline and make a module for RayPipeline

* Can load pipelines from yaml, plenty of issues left

* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it

* Fix pipeline tests

* Move some tests out of test_pipeline.py and create MockDenseRetriever

* myoy and pylint (silencing too-many-public-methods)

* Fix issue found in some yaml files and in schema files

* Fix paths to YAML and fix some typos in Ray

* Fix eval tests

* Simplify MockDenseRetriever

* Fix Ray test

* Accidentally pushed merge coinflict, fixed

* Typo in schemas

* Typo in _json_schema.py

* Slightly reduce noisyness of version validation warnings

* Fix version logs tests

* Fix version logs tests again

* remove seemingly unused file

* Add check and test to avoid adding the same node to the pipeline twice

* Update Documentation & Code Style

* Revert config to pipeline_config

* Remo0ve unused import

* Complete reverting to pipeline_config

* Some more stray config=

* Update Documentation & Code Style

* Feedback

* Move back other_nodes tests into pipeline tests temporarily

* Update Documentation & Code Style

* Fixing tests

* Update Documentation & Code Style

* Fixing ray and standard pipeline tests

* Rename colliding load() methods in dense retrievers and faiss

* Update Documentation & Code Style

* Fix mypy on ray.py as well

* Add check for no root node

* Fix tests to use load_from_directory and load_index

* Try to workaround the disabled add_node of RayPipeline

* Update Documentation & Code Style

* Fix Ray test

* Fix FAISS tests

* Relax class check in _add_node_to_pipeline_graph

* Update Documentation & Code Style

* Try to fix mypy in ray.py

* unused import

* Try another fix for Ray

* Fix connector tests

* Update Documentation & Code Style

* Fix ray

* Update Documentation & Code Style

* use BaseComponent.load() in pipelines/base.py

* another round of feedback

* stray BaseComponent.load()

* Update Documentation & Code Style

* Fix FAISS tests too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-04 17:39:06 +02:00
Tuana Celik
e2b85e2913
Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever (#2461)
* Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever

* adding missed init file

* Update Documentation & Code Style

* fixed docstring

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-29 10:16:02 +02:00
tstadel
7498c7c6fb
Fix and use delete_index instead of delete_documents in tests (#2453)
* use delete_index instead of delete_documents in tests

* fix delete_index

* fix  delete_index() in memory and milvus

* fix imports

* fix memory keyerrors

* Update Documentation & Code Style

* increase timeout for pinecone tests to 60 minutes

* clean get_document_store()

* use recreate_index in tests

* Update Documentation & Code Style

* fix tests

* fix remaining tests

* log index deleted

* fix test_eval_pipeline

* simplify existing index detection in weaviate

* delete label_index on recreate_index for pinecone and milvus

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 19:06:30 +02:00
Tuana Celik
d49e92e21c
ElasticsearchRetriever to BM25Retriever (#2423)
* change class names to bm25

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update Documentation & Code Style

* Add back all_terms_must_match

* fix syntax

* Update Documentation & Code Style

* Update Documentation & Code Style

* Creating a wrapper for old ES retriever with deprecated wrapper

* Update Documentation & Code Style

* New method for deprecating old ESRetriever

* New attempt for deprecating the ESRetriever

* Reverting to the simplest solution - warning logged

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-04-26 16:09:39 +02:00
Sara Zan
929c685cda
Forbid usage of *args and **kwargs in any node's __init__ (#2362)
* Add failing test

* Remove `**kwargs` from docstores' `__init__` functions (#2407)

* Remove kwargs from ESDocStore subclasses

* Remove kwargs from subclasses of SQLDocumentStore

* Remove kwargs from Weaviate

* Revert change in pinecone

* Fix tests

* Fix retriever test wirh weaviate

* Change Exception into DocumentStoreError

* Update Documentation & Code Style

* Remove `**kwargs` from `FARMReader` (#2413)

* Remove FARMReader kwargs without trying to replace them functionally

* Update Documentation & Code Style

* enforce same index values before and after saving/loading eval dataframes (#2398)

* Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350)

* Add tests for missing init and super

* Update Documentation & Code Style

* change in with endswith

* Move test in pipeline.py and change test in pipeline_yaml.py

* Update Documentation & Code Style

* Use caplog to test the warning

* Update Documentation & Code Style

* move tests into test_pipeline and use get_config

* Update Documentation & Code Style

* Unmock version name

* Improve variadic args test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-14 16:42:02 +02:00
Sara Zan
ae712fe6bf
Upgrade weaviate-client to 3.3.3 and fix get_all_documents (#1895)
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents

* Add type

* Update Weaviate version on the CI

* Fix bug on get_document_count where there are no documents

* Add more info in the docstrings of get_all_documents and get_all_documents_generator

* Add latest docstring and tutorial changes

* Apply Black

* Update Documentation & Code Style

* Trigger pipeline

* Update Documentation & Code Style

* Include StefanBogdan feedback

* Fix mypy issues and LogicalFilterClause

* Add more types

* Update Documentation & Code Style

* update setup.cfg

* Upgrade weaviate containers too

* Allow to filter for content field in Weaviate

* Use convert_to_weaviate instead of convert_to_pinecone

* Fix _get_all_documents_in_index

* Update docstrings and docs

* Catching an exception in get_document(s)_by_id

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-04-01 15:37:34 +03:00
tstadel
3561037e82
Use cache for hf requests during CI (#2379)
* increase all_close tolerance for milvus2, improve assertion infos

* use request-cache for huggingface
2022-03-31 12:36:45 +02:00
tstadel
ca86cc834d
Integrate BEIR (#2333)
* introduce eval_beir() to Pipeline

* add beir dependency

* Update Documentation & Code Style

* top_k_values added + refactoring

* Update Documentation & Code Style

* enable titles during beir eval

* Update Documentation & Code Style

* raise HaystackError instead of PipelineError

* get rid of forced dedicated index

* minor docstring and comment fixes

* show warning on default index deletion

* Update Documentation & Code Style

* add delete_index to MockDocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 19:04:28 +01:00
James Briggs
8cd73a9d20
Add PineconeDocumentStore (#2254)
* added core install and functionality of pinecone doc store (init, upsert, query, delete)

* implemented core functionality of Pinecone doc store

* Update Documentation & Code Style

* updated filtering to use Haystack filtering and reduced default batch_size

* Update Documentation & Code Style

* removed debugging code

* updated Pinecone filtering to use filter_utils

* removed uneeded methods and minor tweaks to current methods

* fixed typing issues

* Update Documentation & Code Style

* Allow filters in al methods except get_embedding_count

* Fix skipping document store tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Remove SQL from tests requiring embeddings

* Update Documentation & Code Style

* Fix get_embedding_count of Milvus2

* Make sure to start Milvus2 tests with a new collection

* Add pinecone to test suite

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* Add pinecone to docstores dependendcy

* Add PineconeDocStore to API Documentation

* Add missing comma

* Update Documentation & Code Style

* Adapt format of doc strings

* Update Documentation & Code Style

* Set API key as environment variable

* Skip Pinecone tests in forks

* Add sleep after deleting index

* Add sleep after deleting index

* Add sleep after creating index

* Add check if index ready

* Remove printing of index stats

* Create new index for each pinecone test

* Use RestAPI instead of Python API for describe_index_stats

* Fix accessing describe_index_stats

* Remove usages of describe_index_stats

* Run pinecone tests separately

* Update Documentation & Code Style

* Add pdftotext to pinecone tests

* Remove sleep from doc store fixture

* Add describe_index_stats

* Remove unused imports

* Use pull_request_target trigger

* Revert use pull_request_target trigger

* Remove set_config

* Add os to conftest

* Integrate review comments

* Set include_values to False

* Remove quotation marks from pinecone.Index type

* Update Documentation & Code Style

* Update Documentation & Code Style

* Fix number of args in error messages

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-03-21 16:24:09 +01:00
Julian Risch
ac5617e757
Add basic telemetry features (#2314)
* add basic telemetry features

* change pipeline_config to _component_config

* Update Documentation & Code Style

* add super().__init__() calls to error classes

* make posthog mock work with python 3.7

* Update Documentation & Code Style

* update link to docs web page

* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)

* add comment on send_event in BaseComponent.init() and fix mypy

* mock NonPrivateParameters and fix pylint undefined-variable

* Update Documentation & Code Style

* check model path contains multiple /

* add test for writing to file

* add test for en-/disable telemetry

* Update Documentation & Code Style

* merge file deletion methods and ignore pylint global statement

* Update Documentation & Code Style

* set env variable in demo to activate telemetry

* fix mock of HAYSTACK_TELEMETRY_ENABLED

* fix mypy and linter

* add CI as env variable to execution contexts

* remove threading, add test for custom error event

* Update Documentation & Code Style

* simplify config/log file deletion

* add test for final event being sent

* force writing config file in test

* make test compatible with python 3.7

* switch to posthog production server

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 11:58:51 +01:00
Sara Zan
11cf94a965
Pipeline's YAML: syntax validation (#2226)
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes

* Make error composition work properly

* Clarify typing

* Help mypy a bit more

* Update Documentation & Code Style

* Enable autogenerated docs for Milvus1 and 2 separately

* Revert "Enable autogenerated docs for Milvus1 and 2 separately"

This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.

* Update Documentation & Code Style

* Re-enable 'additionalProperties: False'

* Add pipeline.type to JSON Schema, was somehow forgotten

* Disable additionalProperties on the pipeline properties too

* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)

* Cal super in PipelineValidationError

* Improve _read_pipeline_config_from_yaml's error handling

* Fix generate_json_schema.py to include document stores

* Fix json schemas (retro-fix 1.1.0 again)

* Improve custom errors printing, add link to docs

* Add function in BaseComponent to list its subclasses in a module

* Make some document stores base classes abstract

* Add marker 'integration' in pytest flags

* Slighly improve validation of pipelines at load

* Adding tests for YAML loading and validation

* Make custom_query Optional for validation issues

* Fix bug in _read_pipeline_config_from_yaml

* Improve error handling in BasePipeline and Pipeline and add DAG check

* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)

* Simplify errors slightly

* Add some YAML validation tests

* Remove load_from_config from BasePipeline, it was never used anyway

* Improve tests

* Include json-schemas in package

* Fix conftest imports

* Make BasePipeline abstract

* Improve mocking by making the test independent from the YAML version

* Add exportable_to_yaml decorator to forget about set_config on mock nodes

* Fix mypy errors

* Comment out one monkeypatch

* Fix typing again

* Improve error message for validation

* Add required properties to pipelines

* Fix YAML version for REST API YAMLs to 1.2.0

* Fix load_from_yaml call in load_from_deepset_cloud

* fix HaystackError.__getattr__

* Add super().__init__()in most nodes and docstore, comment set_config

* Remove type from REST API pipelines

* Remove useless init from doc2answers

* Call super in Seq3SeqGenerator

* Typo in deepsetcloud.py

* Fix rest api indexing error mismatch and mock version of JSON schema in all tests

* Working on pipeline tests

* Improve errors printing slightly

* Add back test_pipeline.yaml

* _json_schema.py supports different versions with identical schemas

* Add type to 0.7 schema for backwards compatibility

* Fix small bug in _json_schema.py

* Try alternative to generate json schemas on the CI

* Update Documentation & Code Style

* Make linux CI match autoformat CI

* Fix super-init-not-called

* Accidentally committed file

* Update Documentation & Code Style

* fix test_summarizer_translation.py's import

* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args

* Fix json schema for ray tests too

* Update Documentation & Code Style

* Reintroduce validation

* Usa unstable version in tests and rest api

* Make unstable support the latest versions

* Update Documentation & Code Style

* Remove needless fixture

* Make type in pipeline optional in the strings validation

* Fix schemas

* Fix string validation for pipeline type

* Improve validate_config_strings

* Remove type from test p[ipelines

* Update Documentation & Code Style

* Fix test_pipeline

* Removing more type from pipelines

* Temporary CI patc

* Fix issue with exportable_to_yaml never invoking the wrapped init

* rm stray file

* pipeline tests are green again

* Linux CI now needs .[all] to generate the schema

* Bugfixes, pipeline tests seems to be green

* Typo in version after merge

* Implement missing methods in Weaviate

* Trying to avoid FAISS tests from running in the Milvus1 test suite

* Fix some stray test paths and faiss index dumping

* Fix pytest markers list

* Temporarily disable cache to be able to see tests failures

* Fix pyproject.toml syntax

* Use only tmp_path

* Fix preprocessor signature after merge

* Fix faiss bug

* Fix Ray test

* Fix documentation issue by removing quotes from faiss type

* Update Documentation & Code Style

* use document properly in preprocessor tests

* Update Documentation & Code Style

* make preprocessor capable of handling documents

* import document

* Revert support for documents in preprocessor, do later

* Fix bug in _json_schema.py that was breaking validation

* re-enable cache

* Update Documentation & Code Style

* Simplify calling _json_schema.py from the CI

* Remove redundant ABC inheritance

* Ensure exportable_to_yaml works only on implementations

* Rename subclass to class_ in Meta

* Make run() and get_config() abstract in BasePipeline

* Revert unintended change in preprocessor

* Move outgoing_edges_input_node check inside try block

* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX

* Add check for a RecursionError on validate_config_strings

* Address usages of _pipeline_config in data silo and elasticsearch

* Rename _pipeline_config into _init_parameters

* Fix pytest marker and remove unused imports

* Remove most redundant ABCs

* Rename _init_parameters into _component_configuration

* Remove set_config and type from _component_configuration's dict

* Remove last instances of set_config and replace with super().__init__()

* Implement __init_subclass__ approach

* Simplify checks on the existence of _component_configuration

* Fix faiss issue

* Dynamic generation of node schemas & weed out old schemas

* Add debatable test

* Add docstring to debatable test

* Positive diff between schemas implemented

* Improve diff printing

* Rename REST API YAML files to trigger IDE validation

* Fix typing issues

* Fix more typing

* Typo in YAML filename

* Remove needless type:ignore

* Add tests

* Fix tests & validation feedback for accessory classes in custom nodes

* Refactor RAGeneratorType out

* Fix broken import in conftest

* Improve source error handling

* Remove unused import in test_eval.py breaking tests

* Fix changed error message in tests matches too

* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions

* Fix path to generate_openapi_specs.py in autoformat.yml

* Update Documentation & Code Style

* Add test for FAISSDocumentStore-like situations (superclass with init params)

* Update Documentation & Code Style

* Fix indentation

* Remove commented set_config

* Store model_name_or_path in FARMReader to use in DistillationDataSilo

* Rename _component_configuration into _component_config

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-15 11:17:26 +01:00
Vladimir Blagojevic
6c0094b5ad
Update LFQA with the latest LFQA seq2seq and retriever models (#2210)
* Register BartEli5Converter for vblagoje/bart_lfqa model

* Update LFQA unit tests

* Update LFQA tutorials
2022-03-08 15:11:41 +01:00
tstadel
dde9d59271
fix pip backtracking issue (#2281)
* fix pip backtracking issue

* restrict azure-core version

* Remove the trailing comma

* Add skip_magic_trailing_comma in pyproject.toml for pydoc compatibility

* Pin pydoc-markdown _again_

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-07 19:25:33 +01:00
bogdankostic
447baf77ef
Fix skipping of tests using document stores (#2268)
* Fix skipping document store tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Remove SQL from tests requiring embeddings

* Update Documentation & Code Style

* Fix get_embedding_count of Milvus2

* Make sure to start Milvus2 tests with a new collection

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-03 15:19:27 +01:00
MichelBartels
2c423ba063
Introduce support for pymilvus>=2.0.0 (#2126)
* update remaining occurences of get_connection

* fix milvus2 import and fix wrong extra references

* change MilvusDocumentStore to Milvus1DocumentStore

* update milvus docstrings to reflect updated dependency management

* enable milvus 2 tests

* fix milvus2 env variable processing

* fix dropping collections for each milvus 2 test

* make Milvus 2 doc store tests work

* allow user to specify consistency level

* Fist attempt at running Milvus2 in the CI

* Install the correct pymilvus

* add batch deletion for milvus2

* change default from milvus 1 to milvus 2

* make milvus2 the default in the docstores extra

* Switch milvus1 and milvus2 in base test run on CI

* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'

* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py

* Enable autogenerated docs for Milvus1 and 2 separately

* Partial fix to docstring of Milvus2DocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-24 17:43:38 +01:00
bogdankostic
2a674eaff7
Support more data types and extended filters in WeaviateDocStore (#2143)
* Support more data types and extended filters in WeaviateDocStore

* Adapt types to extended filters

* Update Documentation & Code Style

* Fix mypy

* Fix type of filters

* Update Documentation & Code Style

* Add Docstrings for BaseDocStore

* Update Documentation & Code Style

* Add + prettify DocStrings

* Update Documentation & Code Style

* Fix types

* Update Documentation & Code Style

* Remove import of TypedDict

* Fix tests

* Update Documentation & Code Style

* Fix circular import

* Fix inversion of not operation + add test case

* Fix mypy

* Update Documentation & Code Style

* Apply black

* Use convert_date_to_rfc3339 instead of datetime.fromisoformat

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-18 08:55:17 +01:00
tstadel
1bdd1f48fd
Fix windows ci tests (#2144)
* move commandline args to global conftest

* correct test exclude paths

* Update Documentation & Code Style

* exclude test_generator_pipeline_with_translator from windows ci

* exclude further oom tests

* enable log_cli

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 21:29:05 +01:00
Sara Zan
957e78ed9e
Upgrade pydoc-markdown & refactor GitHub Actions (#2117)
* Upgrade pydoc-markdown and fix the YAMLs to work with it

* Pin pydoc-markdown to major version

* Generalize pydoc-markdown workflow

* Make a single Action to perform all tasks that require committing into the local branch

* Merge the code updates and the docs in the Linux CI to prevent the bot from always show the pipeline as green

* Installing Jupyter deps for Black

* Build cache before running generation tasks

* Add check not to run the code generation on master

* Simplify push action

* Add more test deps in setup.cfg and remove from GH Action workflow

* Remove forced upgrades on pip install

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 15:45:09 +01:00
bogdankostic
f062911040
Extend metadata filtering support in ElasticsearchDocumentStore (#2108)
* Add extended filtering to ESDocumentStore

* Add Docstrings

* Fix definition of filter queries

* Fix mypy

* Add tests

* Add latest docstring and tutorial changes

* Adapt Docstrings

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Adapt tests to added test_docs

* Add filtering utils for same representation in all doc stores

* Apply balck formatting

* Update documentation

* Fix mypy

* Apply Black

* Fix mypy

* Adopt Doc Strings

* Add more tests

* Apply Black

* Allow filtering in OpenSearchDocStore

* Update documentation

* Adapt Docstrings

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 13:43:12 +01:00
Sara Zan
a59bca3661
Apply black formatting (#2115)
* Testing black on ui/

* Applying black on docstores

* Add latest docstring and tutorial changes

* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too

* Remove comments

* Relax constraints on pydoc-markdown

* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade

* Fix a couple of bugs

* Add a type: ignore that was missing somehow

* Give path to black

* Apply Black

* Apply Black

* Relocate a couple of type: ignore

* Update documentation

* Make Linux CI run after applying Black

* Triggering Black

* Apply Black

* Remove dependency, does not work well

* Remove manually double trailing commas

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 13:43:18 +01:00
Sara Zan
3a6e64b2a3
Make FileTypeClassifier more flexible (#2101)
* Make FileTypeClassifier more flexible

* Make supported_types a init parameter

* Add tests and fix a couple of bugs

* Formatting

* Fix mypy

* Implement feedback
2022-02-02 17:51:04 +01:00
Sara Zan
d470b9d0bd
Improve dependency management (#1994)
* Fist attempt at using setup.cfg for dependency management

* Trying the new package on the CI and in Docker too

* Add composite extras_require

* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports

* Fix bug on class import and rephrase error message

* Introduce typing for optional modules and add type: ignore in sparse.py

* Include importlib_metadata backport for py3.7

* Add colab group to extra_requires

* Fix pillow version

* Fix grpcio

* Separate out the crawler as another extra

* Make paths relative in rest_api and ui

* Update the test matrix in the CI

* Add try catch statements around the optional imports too to account for direct imports

* Never mix direct deps with self-references and add ES deps to the base install

* Refactor several paths in tests to make them insensitive to the execution path

* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix

* Wrap pdf conversion utils into safe_import

* Update some tutorials and rever Milvus1 as default for now, see #2067

* Fix mypy config


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 18:12:55 +01:00
tstadel
8a32d8da92
Introduce readonly DCDocumentStore (without labels support) (#1991)
* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle not existing documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and quey_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and quey_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
2022-01-25 20:36:28 +01:00
Sara Zan
e28bf618d7
Implement proper FK in MetaDocumentORM and MetaLabelORM to work on PostgreSQL (#1990)
* Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints

* update_document_meta() was not using index properly

* Exclude ES and Memory from the cosine_sanity_check test

* move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 13:48:58 +01:00
MichelBartels
3e4dbbb32c
Align similarity scores across document stores (#1967)
* align document store similarity functions

* remove unnecessary imports

* undone accidental change

* stopped weaviate from pretending to support dot product similarity

* stopped weaviate from pretending to support dot product similarity

* Add latest docstring and tutorial changes

* fix fixture params for document stores

* use cosine similarity for most tests

* fix cosine similarity test

* fix faiss test

* fix weaviate test

* fix accidental deletion

* fix document_store fixture

* test fix; shouldn't be merged

* fix test_normalize_embeddings_diff_shapes

* probably a better fix

* fix for parameter combinations

* revert new pytest_generate_tests functionality

* simplify pytest_generate_tests

* normalize embeddings for test_dpr_embedding

* add to faiss doc that embeddings are normalized

* Add latest docstring and tutorial changes

* remove unnecessary parameters and add comments

* simplify two lines of memory.py into one

* test similarity scores with smaller language model

* fix test_similarity_score


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-12 19:28:20 +01:00
Mathew Kuriakose
a44b6c18c0
Unify vector_dim and embedding_dim parameter in Document Store (#1922)
* Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores

* Unit test cases updated to use `embedding_dim` instead of `vector_dim`

* Unit test case update to use embedding_dim instead of vector_dim

* Add latest docstring and tutorial changes

* Put usage of `vector_dim` param in same if-block as corresponding warning

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-01-10 18:10:32 +01:00
bogdankostic
45df18c416
Add RCIReader for TableQA (#1909)
* Add RCIReader

* Add latest docstring and tutorial changes

* Add Doc Strings

* Add latest docstring and tutorial changes

* Add Tests

* Add Doc Strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 16:59:24 +01:00
tstadel
fc8df2163d
Fix Windows CI OOM (#1878)
* set fixture scope to "function"

* run FARMReader without multiprocessing

* dispose off ray after tests

* run most expensive tasks first in test files

* run expensive tests first

* run garbage collector between tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-22 17:20:23 +01:00
KUNPENG GUO
160f81aaa3
Fix bug ranker: wrong lambda function (#1824)
* Fix bug ranker: wrong lambda function

The zip function used in line 110 intends to choose the logits array to be the key for the lambda function while it should be the first/second logit of the logit array which corresponds to the classification label (has_answer)

* Use label 1 as has_answer label

* generic ranker (add if-cond for logits vector shape)

* remove test code

* remove test code...

* add two_logits test case for ranker module.

* complete the documentation of ranker, support rankers with 1 or 2 logits as output
2021-12-06 17:13:57 +01:00
bogdankostic
5e36988b31
Support Tables in all DocumentStores (#1744)
* Add support for tables in SQLDocumentStore, FAISSDocumentStore and MilvuDocumentStore

* Add support for WeaviateDocumentStore

* Make sure that embedded meta fields are strings + add embedding_dim to WeaviateDocStore in test config

* Add latest docstring and tutorial changes

* Represent tables in WeaviateDocumentStore as nested lists

* Fix mypy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-17 16:41:04 +01:00
Sara Zan
ea3abd305b
Fix a few details of some tutorials (#1733)
* Make Tutorial10 use print instead of logs and fix a typo in Tutoria15

* Add a type check in 'print_answers'

* Add same checks to print_documents and print_questions

* Make RAGenerator return Answers instead of dictionaries

* Fix RAGenerator tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-12 16:44:28 +01:00
tstadel
14515a861b
Tutorial for DocumentClassifier at Index Time (#1697)
* basic example of document classifier in preprocessing logic

* add batch_size to TransformersDocumentClassifier

* complete tutorial16

* Add latest docstring and tutorial changes

* fix missing batch_size

* add notebook

* test for batch_size use added

* add tutorial 16 to headers.py

* Add latest docstring and tutorial changes

* make DocumentClassifier indexing pipeline rdy

* Add latest docstring and tutorial changes

* flexibility improvements for DocumentClassifier in Pipelines

* Add latest docstring and tutorial changes

* fix index time usage

* remove query from documentclassifier tests

* improve classification_field resolving + minor fixes

* Add latest docstring and tutorial changes

* tutorial 16 extended with zero shot and pipelines

* Add latest docstring and tutorial changes

* install graphviz in notebook

* Add latest docstring and tutorial changes

* remove convert_to_dicts

* Add latest docstring and tutorial changes

* Fix typo

* Add latest docstring and tutorial changes

* remove retriever from indexing pipeline

* Add latest docstring and tutorial changes

* fix save_to_yaml when using FileTypeClassifier

* emphasize the impact with zero shot classification

* Add latest docstring and tutorial changes

* adjust use_gpu to boolean in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-09 18:43:00 +01:00
Julian Risch
892ce4a760
Make weaviate more compliant to other doc stores (UUIDs and dummy embedddings) (#1656)
* create uuid and dummy embeddding in weaviate doc store

* handle and test for duplicate non-uuid-formatted ids in weaviate

* add uuid and dummy embedding to doc strings

* Add latest docstring and tutorial changes

* Upgrade weaviate

* Include weaviate in common doc store test cases

* Add latest docstring and tutorial changes

* Exclude weaviate doc store from eval tests

* Incorporate index name in uuid generation

* Ignore mypy error

* Fix typo

* Restore DOCS without uuid and embeddings generated by weaviate

* Supply docs for retriever tests as fixture

* Limit scope of fixture to function instead of session

* Add comments

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-04 09:27:12 +01:00
fingoldo
27793814cf
Cosine similarity for the rest of DocStores. (#1569)
* Added uniform normalization method to each of the DocStores (implemented), so that now Milvus and Weaviate doc stores can use cosine similarity, plus future method for making existing embeddings normaziled (empty for now).

* Fixed a typo.

* Fixed lots of stuff. Performed local tests.

* Fixed scores representation for cosine. Assuming Weavieate's rep needs no change.

* fixes as per discussion

* Trigger CI

* resolving conflicts

* small typo

* fixed a param type

* cleaned some conflicts resolving left overs

* commented out fastmath for a moment

* fixing tests

* added docstore for small vectors

* test

* fixed document_store_cosine_small

* cosine tests fixes

* fixed document_store_cosine_small

* fixed weaviate index name and lowered rtol for ES

* increased rtol

* added explicit doc_ids for weaviate, excluded ES, included Inmemory

* resolving mismatch

* fixing a typo

* flatten normalize_embedding()

* fix import for test

* standardize normalize_embeddings across doc stores

* Add latest docstring and tutorial changes

* going for the faster plain dot prod

Co-authored-by: fingoldo <fingoldo@gmail.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-01 13:42:32 +01:00
Lalit Pagaria
e5b4b62d75
Add CI for windows runner (#1458)
* Feat: Removing use of temp file while downloading archive from url along with adding CI for windows and mac platform

* Windows CI by default installing pytorch gpu hence updating CI to pick cpu version

* fixing mac cache build issue

* updating windows pip install command for torch

* another attempt

* updating ci

* Adding sudo

* fixing ls failure on windows

* another attempt to fix build issue

* Saving env variable of test files

* Adding debug log

* Github action differ on windows

* adding debug

* anohter attempt

* Windows have different ways to receive env

* fixing template

* minor fx

* Adding debug

* Removing use of json

* Adding back fromJson

* addin toJson

* removing print

* anohter attempt

* disabling parallel run at least for testing

* installing docker for mac runner

* correcting docker install command

* Linux dockers are not suported in windows

* Removing mac changes

* Upgrading pytorch

* using lts pytorch

* Separating win and ubuntu

* Install java 11

* enabling linux container env

* docker cli command

* docker cli command

* start elastic service

* List all service

* correcting service name

* Attempt to fix multiple test run

* convert to json

* another attempt to check

* Updating build cache step

* attempt

* Add tika

* Separating windows CI

* Changing CI name

* Skipping test which does not work in windows

* Skipping tests for windows

* create cleanup function in conftest

* adding skipif marker on tests

* Run windows PR on only push to master

* Addressing review comments

* Enabling windows ci for this PR

* Tika init is being called when importing tika function

* handling tika import issue

* handling tika import issue in test

* Fixing import issue

* removing tika fixure

* Removing fixture from tests

* Disable windows ci on pull request

* Add back extra pytorch install step

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-29 10:22:28 +02:00
Sara Zan
13510aa753
Refactoring of the haystack package (#1624)
* Files moved, imports all broken

* Fix most imports and docstrings into

* Fix the paths to the modules in the API docs

* Add latest docstring and tutorial changes

* Add a few pipelines that were lost in the inports

* Fix a bunch of mypy warnings

* Add latest docstring and tutorial changes

* Create a file_classifier module

* Add docs for file_classifier

* Fixed most circular imports, now the REST API can start

* Add latest docstring and tutorial changes

* Tackling more mypy issues

* Reintroduce  from FARM and fix last mypy issues hopefully

* Re-enable old-style imports

* Fix some more import from the top-level  package in an attempt to sort out circular imports

* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests

* Change document_store into document_stores

* Update imports in tutorials

* Add latest docstring and tutorial changes

* Probably fixes summarizer tests

* Improve the old-style import allowing module imports (should work)

* Try to fix the docs

* Remove dedicated KnowledgeGraph page from autodocs

* Remove dedicated GraphRetriever page from autodocs

* Fix generate_docstrings.sh with an updated list of yaml files to look for

* Fix some more modules in the docs

* Fix the document stores docs too

* Fix a small issue on Tutorial14

* Add latest docstring and tutorial changes

* Add deprecation warning to old-style imports

* Remove stray folder and import Dict into dense.py

* Change import path for MLFlowLogger

* Add old loggers path to the import path aliases

* Fix debug output of convert_ipynb.py

* Fix circular import on BaseRetriever

* Missed one merge block

* re-run tutorial 5

* Fix imports in tutorial 5

* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base

* Add latest docstring and tutorial changes

* Fix typo in utils __init__

* Fix a few more imports

* Fix benchmarks too

* New-style imports in test_knowledge_graph

* Rollback setup.py

* Rollback squad_to_dpr too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
bogdankostic
51acf779f2
Add TableTextRetriever (#1529)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* renam label.question -> label.query. Allow sorting of Answers.

* WIP primitives

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* Add MultimodalRetriever

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add methods to MultimodalRetriever

* Add latest docstring and tutorial changes

* revert to dataclass-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* keep meta data of answers in reader.run()

* fix meta handling

* adress review feedback

* Add latest docstring and tutorial changes

* make document=None for open domain labels

* add import

* fix print utils

* fix rest api

* Add methods and tests

* Add latest docstring and tutorial changes

* Fix mypy

* Add latest docstring and tutorial changes

* Add type hints and doc strings

* Make use of initialize_device_settings

* Move serialization of pd.DataFrame to schema.py

* Fix mypy

* Adapt Document's from_dict method

* Update docstrings

* Add latest docstring and tutorial changes

* Fix mypy

* Fix mypy

* Fix Document's from_dict method

* Fix Document's to_dict method

* Change handling of table metadata

* Add latest docstring and tutorial changes

* Change naming from Multimodal to TableText

* Turn off tokenizers_parallelism in retriever tests

* Add latest docstring and tutorial changes

* Remove turning off tokenizers_parallelism in retriever tests

* Adapt convert_es_hit_to_document

* Change embed_surrounding_context to embed_meta_fields

* Add latest docstring and tutorial changes

* Add check if torch.distributed is available

* Set n_gpu to 0 in training test

* Set HIP_LAUNCH_BLOCKING to 1

* Set HIP_LAUNCH_BLOCKING to "1"

* Set use_gpu to False

* Use DataParallel only if more than one device

* Remove --find-links=https://download.pytorch.org/whl/torch_stable.html

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 12:27:02 +02:00