188 Commits

Author SHA1 Message Date
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
ZanSara
90c877a559
bug: mypy should ignore files in test/ (#3894)
* exclude files in test/

* verify that the CI ignores test files

* don't fail in case of no files
2023-01-19 18:12:26 +01:00
ZanSara
9e457db2e9
test: add version deprecation fixture (#3851)
* add fixture

* Update test/conftest.py

* remove +2 and add tests

* few typos

* more cases

* Update test/conftest.py
2023-01-16 15:36:14 +01:00
Stefano Fiorucci
136928714c
refactor: remove deprecated parameters from Summarizer (#3740)
* remove deprecated parameters

* remove deprecation/removal test
2022-12-29 15:37:47 +05:30
Vladimir Blagojevic
9ebf164cfd
feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate (#3667)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-12-20 11:21:26 +01:00
Sebastian
25bf95d47f
Update table reader tests to include checking the score of answers. (#3641)
2022-12-07 07:30:49 -08:00
Stefano Fiorucci
3040e59c63
feat: add support for BM25Retriever in InMemoryDocumentStore (#3561)
* very first draft

* implement query and query_batch

* add more bm25 parameters

* add rank_bm25 dependency

* fix mypy

* remove tokenizer callable parameter

* remove unused import

* only json serializable attributes

* try to fix: pylint too-many-public-methods / R0904

* bm25 attribute always present

* convert errors into warnings to make the tutorial 1 work

* add docstrings; tests

* try to make tests run

* better docstrings; revert not running tests

* some suggestions from review

* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25

* exclude tests with filters

* change elasticsearch to bm25 retriever in test_summarizer

* add tests

* try to improve tests

* better type hint

* adapt test_table_text_retriever_embedding

* handle non-textual docs

* query only textual documents
2022-11-22 09:24:52 +01:00
Stefano Fiorucci
dc26e6d43e
fix: Flatten DocumentClassifier output in SQLDocumentStore; remove _sql_session_rollback hack in tests (#3273)
* first draft

* fix

* fix

* move test to test_sql
2022-11-16 12:20:57 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x (#3552)
* remove milvus1

* leftover

* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests (#3541)
* refactor tests

* fix job

* revert

* revert

* revert

* use latest weaviate

* fix abstract methods signatures

* pass class_name to all the CRUD methods

* finish moving all the tests

* bump weaviate version

* raise, don't pass
2022-11-14 09:57:30 +01:00
Sara Zan
43b24fd1a7
fix: strip whitespaces safely from FARMReader's answers (#3526)
* remove .strip()

* check for right-side offset

* return the whitespace-cleaned answer

* lstrip, not rstrip :D

* remove int

* left_offset

* slightly refactor reader fixture

* extend test_output
2022-11-08 09:26:47 +01:00
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job (#3529)
* move dC tests to their own module and job

* restore global var

* revert
2022-11-04 17:05:10 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests (#3517)
* port SQL tests

* cleanup document_store_tests.py from sql tests

* leftover

* Update .github/workflows/tests.yml

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_base.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Sara Zan
f0be78c6a6
bug: remove useless import in conftest.py (#3362)
* Remove useless milvus import in conftest

* schemas

* schemas
2022-11-02 19:22:24 +05:30
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring (#3449)
* add new marker

* start using test hierarchies

* move ES tests into their own class

* refactor test workflow

* job steps

* add more tests

* move more tests

* more tests

* test labels

* add more tests

* Update tests.yml

* Update tests.yml

* fix

* typo

* fix es image tag

* map es ports

* try

* fix

* default port

* remove opensearch from the markers sorcery

* revert

* skip new tests in old jobs

* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Sebastian
8db7dfb884
refactor: TableReader (#3456)
* Refactoring table reader
2022-10-26 20:57:28 +02:00
Sebastian
59857cb492
feat: Speed up reader tests (#3476)
* Use a smaller reader where possible

* Change scope to module of reader to get faster load times
2022-10-26 19:04:18 +02:00
Sara Zan
05c68b6624
feat: add document_store to all BaseRetriever.retrieve() and BaseRetriever.retrieve_batch() implementations (#3379)
* add document_store to retrieve()

* mypy & pylint

* pass docstore to embedding encoders

* schemas

* mypy and pylint

* fix tfidfretriever

* pylint

* mypy

* pylint

* fix tfidf

* mypy

* pylint

* schemas

* another fix for tfidf

* fix question generation tests

* remove docstore from embedding encoder signature

* pylint

* revert accidental test changes

* Apply suggestions from code review

* check for docstore similarity function only if the docstore is present

* check for docstore similarity function only if the docstore is present
2022-10-26 15:47:06 +02:00
Vladimir Blagojevic
5ca96357ff
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever (#3453)
2022-10-25 17:52:29 +02:00
Sebastian
93817f63b4
feat: Speed up integration tests (nodes) (#3408)
* Changed summarizer model to a smaller one (2GB to 500MB) to save on space and speed up the tests.

* Removed google pegasus from cache
2022-10-18 16:23:57 +02:00
Vladimir Blagojevic
159cd5a666
feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever (#3356)
2022-10-14 15:01:03 +02:00
Stefano Fiorucci
7290196c32
fix: allow same vector_id in different indexes for SQL-based Document stores (#3383)
* fix_multiple_indexes

* improve test names
2022-10-14 09:55:56 +02:00
Vladimir Blagojevic
6cb4e93965
refactor: remove Inferencer multiprocessing (#3283)
2022-10-04 14:08:23 +02:00
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine (#3217)
* support cosine similarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similarity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial 4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* split PR

* update docs

* manually revert tutorial doc change

* Fix embedding type

* set integration marker correctly

* make BaseDocumentStore.normalize_embedding static

* format

* fix handling of opensearch_faiss param

* fix merge

* add DenseRetriever typing

* organize imports in conftest.py

* organize imports in conftest.py (2)

* fix DenseRetriever import

* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running (#3263)
* fix milvus and faiss tests not running

* fix schema manually

* fix test_dpr_embedding test for milvus

* pip freeze on milvus tests

* fix milvus1 tests being executed: fix all_doc_stores order

* Revert "pip freeze on milvus tests"

This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.

* make infer_required_doc_store more robust

* don't skip tests without docstore requirements

* use markers for docstore tests
2022-09-22 17:46:49 +02:00
tstadel
b10e2c392e
chore: add DenseRetriever abstraction (#3252)
* support cosine similarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similarity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial 4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* update docs

* fix imports

* import DenseRetriever normally

* update docs

* fix deepcopy of documents

* update schema

* Revert "update schema"

This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f.

* fix schema for ci manually
2022-09-21 19:08:54 +02:00
Vladimir Blagojevic
938e6fda5b
Classify pipeline's type based on its components (#3132)
* Add pipeline get_type method

* Add pipeline uptime

* Add pipeline telemetry event sending

* Send pipeline telemetry once a day (at most)

* Add pipeline invocation counter, change invocation counter logic

* Update allowed telemetry parameters - allow pipeline parameters

* PR review: add unit test
2022-09-21 14:53:42 +02:00
Stefano Fiorucci
89247b804c
refactor: make TransformersDocumentClassifier output consistent between different types of classification (#3224)
* make output consistent

* make output consistent

* added tests for details

* better tests

* Update test_document_classifier.py

* make black happy

* Update test_document_classifier.py

* Update test_document_classifier.py
2022-09-21 13:16:03 +02:00
Sara Zan
dcb132ba59
chore: remove f-strings from logs for performance reasons (#3212)
* Use the %s syntax on all debug messages

* Use the %s syntax on some more debug messages

* Use the %s syntax on info messages

* Use the %s syntax on warning messages

* Use the %s syntax on error and exception messages

* mypy

* pylint

* trigger tutorials execution in CI

* trigger tutorials execution on CI

* black

* remove embeddings from repr

* fix Document `__repr__`

* address feedback

* mypy
2022-09-19 18:18:32 +02:00
Daniel Bichuetti
e1f399284f
refactor: update dependencies and remove pins (#3147)
* refactor: remove azure-core, pydoc and hf-hub pins

* fix: remove extra-comma

* fix: force minimum version of azure forms recognizer

* refactor: allow newer ocr libs

* refactor: update more dependencies and container versions

* refactor: remove extra comment

* docs: pre-commit manual run

* refactor: remove unnecessary dependency

* tests: update weaviate container image version
2022-09-05 14:30:35 +02:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db (#2749)
* update to PineconeDocumentStore to remove dependency on SQL db

* Update Documentation & Code Style

* typing fixes

* Update Documentation & Code Style

* fixed embedding generator to yield Documents

* Update Documentation & Code Style

* fixes for final typing issues

* fixes for pylint

* Update Documentation & Code Style

* uncomment pinecone tests

* added new params to docstrings

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* changes based on comments, updated errors and install

* Update Documentation & Code Style

* mypy

* implement simple filtering in pinecone mock

* typo

* typo in reverse

* account for missing meta key in filtering

* typo

* added metadata filtering to describe index

* added handling for users switching indexes in same doc store, and handling duplicate docs in write

* syntax tweaks

* added index option to document/embedding count calls

* labels implementation in progress

* added metadata fields to be indexed for pinecone tests

* further changes to mock

* WIP implementation of labels+multilabels

* switched to rely on labels namespace rather than filter

* simpler delete_labels

* label fixes, remove debug code

* Apply docstring fixes

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* mypy

* pylint

* docs

* temporarily un-mock Pinecone

* Small Pinecone test suite

* pylint

* Add fake test key to pass the None check

* Add again fake test key to pass the None check

* Add Pinecone to default docstores and fix filters

* Fix field name

* Change field name

* Change field value

* Remove comments

* forgot to upgrade pyproject.toml

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
James Briggs
26c938a8e6
test: add meta fields for meta_config to be used during testing (#3021)
* added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore

* Add documentation on metadata filtering in docstring

* docs

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-12 10:27:56 +02:00
Zoltan Fedor
f4128d3581
Adding support for additional distance/similarity metrics for Weaviate (#3001)
* Adding support for additional distance metrics for Weaviate

Fixes #3000

* Updating the docs

* Fixing error texts

* Fixing issues raised by the review

* Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity`

* [EMPTY] Re-trigger CI

* Fixing things based on review

* [EMPTY] Re-trigger CI
2022-08-11 09:48:21 +02:00
Massimiliano Pippi
e7627c3f8b
Use opensearch-py in OpenSearchDocumentStore (#2691)
* add Opensearch extras

* let OpenSearchDocumentStore use opensearch-py

* Update Documentation & Code Style

* fix a bug found after adding tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-07-28 10:04:49 +02:00
Sara Zan
6b39fbd39c
Mocking Pinecone tests (#2778)
* Integrating the mock into conftest.py

* re-enable workflow

* delete_all

* Update Documentation & Code Style

* remove ValueError

* Add empty response

* wrong condition

* return response

* revert removal of delete_all

* change mock

* Update Documentation & Code Style

* test for rest api, to revert

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-14 20:03:33 +02:00
Malte Pietsch
ba08fc86f5
Add node to use OpenAI's GPT-3 for QA (#2605)
* first draft of openai node for QA

* Update Documentation & Code Style

* fix mypy. add node to inits

* Update Documentation & Code Style

* fix linter

* Adapt OpenAIGenerator to completions endpoint

* Update Documentation & Code Style

* Fix pylint

* Fix doc strings

* Make use of temperature

* Make use of api key in tests

* Adapt doc strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-07-08 13:59:27 +02:00
bogdankostic
195aed942f
Add update_document_meta to InMemoryDocumentStore (#2689)
* Add update_document_meta to InMemoryDocumentStore

* Fix typo

* Update Documentation & Code Style

* Add update_document_meta to BaseDocumentStore

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Add update_document_meta to MockDocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-07 15:44:07 +02:00
Patrick Deutschmann
1db3fd0942
Add support for Multi-Hop Dense Retrieval (#2571)
* Implement MDR

* Adapt conftest to new MDR signature

* Update Documentation & Code Style

* Change signature of queries param in batch methods of MDR like in #2575

* Update Documentation & Code Style

* Rename MultihopDenseRetriever to MultihopEmbeddingRetriever

* Fix filters in retrieve_batch

* Add docstring for MultihopEmbeddingRetriever.__init__

* Update Documentation & Code Style

* Revert forward signature of TextSimilarityHead

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-05 11:31:11 +02:00
Sara Zan
54518ac790
[CI Refactoring] Refactor Document fixtures in tests (#2577)
* Refactor document fixtures

* Add embedding files

* Update Documentation & Code Style

* Indentation issue

* Update Documentation & Code Style

* Fix type conversion in conftest.py

* Update Documentation & Code Style

* mypy on sql.py

* mypy on crawler.py

* mypy on pinecone.py

* Adapt retriever tests

* Update Documentation & Code Style

* mypy on crawler.py

* Update Documentation & Code Style

* mypy on crawler.py again

* Update Documentation & Code Style

* mypy fix was too rough

* Fix some more tests

* Update Documentation & Code Style

* Skip meaningless test on FilterRetriever

* Make embedding values less specific

* Update Documentation & Code Style

* Use stable IDs in retriever tests that depend on it

* Remove needless fixtures

* docs_with_ids

* Update Documentation & Code Style

* Typo

* Fix retriever tests

* Fix reader tests

* Update Documentation & Code Style

* Workaround #2626

* Update Documentation & Code Style

* Fix label generator tests

* Reorder vectors

* remove print

* Update Documentation & Code Style

* Update Documentation & Code Style

* git tags leftover

* Update Documentation & Code Style

* fix last failing test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 18:22:48 +02:00
Sara Zan
59608ca474
[CI Refactoring] Workflow refactoring (#2576)
* Unify CI tests (from #2466)

* Update Documentation & Code Style

* Change folder names

* Fix markers list

* Remove marker 'slow', replaced with 'integration'

* Soften children check

* Start ES first so it has time to boot while Python is setup

* Run the full workflow

* Try to make pip upgrade on Windows

* Set KG tests as integration

* Update Documentation & Code Style

* typo

* faster pylint

* Make Pylint use the cache

* filter diff files for pylint

* debug pylint statement

* revert pylint changes

* Remove path from asserted log (fails on Windows)

* Skip preprocessor test on Windows

* Tackling Windows specific failures

* Fix pytest command for windows suites

* Remove \ from command

* Move poppler test into integration

* Skip opensearch test on windows

* Add tolerance in reader sas score for Windows

* Another pytorch approx

* Raise time limit for unit tests :(

* Skip poppler test on Windows CI

* Specify to pull with FF only in docs check

* temporarily run the docs check immediately

* Allow merge commit for now

* Try without fetch depth

* Accelerating test

* Accelerating test

* Add repository and ref alongside fetch-depth

* Separate out code&docs check from tests

* Use setup-python cache

* Delete custom action

* Remove the pull step in the docs check, will find a way to run on bot commits

* Add requirements.txt in .github for caching

* Actually install dependencies

* Change deps group for pylint

* Unclear why the requirements.txt is still required :/

* Fix the code check python setup

* Install all deps for pylint

* Make the autoformat check depend on tests and doc updates workflows

* Try installing dependencies in another order

* Try again to install the deps

* quoting the paths

* Add back the requirements

* Try again to install rest_api and ui

* Change deps group

* Duplicate haystack install line

* See if the cache is the problem

* Disable also in mypy, who knows

* split the install step

* Split install step everywhere

* Revert "Separate out code&docs check from tests"

This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.

* Add back the action

* Proactive support for audio (see text2speech branch)

* Fix label generator tests

* Remove install of libsndfile1 on win temporarily

* exclude audio tests on win

* install ffmpeg for integration tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-07 09:23:03 +02:00
Vladimir Blagojevic
e10a3fba74
Add Generative Pseudo Labeling (#2388)
2022-06-02 10:12:47 -04:00
bogdankostic
61d9429c25
Simplify loading of EmbeddingRetriever (#2619)
* Infer model format for EmbeddingRetriever automatically

* Update Documentation & Code Style

* Adapt conftest to automatic inference of model_format

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Adapt tutorials

* Update Documentation & Code Style

* Add test for similarity scores with sentence transformers

* Adapt doc string and warning message

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 15:05:29 +02:00
bogdankostic
867695ad0c
Change signature of queries param in batch methods (#2575)
* Change signature of queries param in batch methods

* Update Documentation & Code Style

* Fix mypy

* Remove unused import

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 12:33:45 +02:00
bogdankostic
738e008020
Add run_batch method to all nodes and Pipeline to allow batch querying (#2481)
* Add run_batch methods for batch querying

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Fix mypy

* Fix linter

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix mypy

* Fix rest api test

* Update Documentation & Code Style

* Add Doc strings

* Update Documentation & Code Style

* Add batch_size as attribute to nodes supporting batching

* Adapt error messages

* Adapt type of filters in retrievers

* Revert change about truncation_warning in summarizer

* Unify multiple_doc_lists tests

* Use smaller models in extractor tests

* Add return types to JoinAnswers and RouteDocuments

* Adapt return statements in reader's run_batch method

* Allow list of filters

* Adapt error messages

* Update Documentation & Code Style

* Fix tests

* Fix mypy

* Adapt print_questions

* Remove disabling warning about too many public methods

* Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py

* Add type check

* Update Documentation & Code Style

* Adapt tutorial 11

* Update Documentation & Code Style

* Add query_batch method for DCDocStore

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 11:11:00 +02:00
bogdankostic
4581b91e83
Make DeepsetCloudDocumentStore work with non-existing index (#2513)
* Make DeepsetCloudDocumentStore work with non-existing index

* Update Documentation & Code Style

* Add tests

* Update Documentation & Code Style

* Fix tests, adapt warning messages + lowercase deepset

* Update Documentation & Code Style

* Fix typo in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 15:21:35 +02:00
Sara Zan
f8e02310bf
Validate YAML files without loading the nodes (#2438)
* Remove BasePipeline and make a module for RayPipeline

* Can load pipelines from yaml, plenty of issues left

* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it

* Fix pipeline tests

* Move some tests out of test_pipeline.py and create MockDenseRetriever

* mypy and pylint (silencing too-many-public-methods)

* Fix issue found in some yaml files and in schema files

* Fix paths to YAML and fix some typos in Ray

* Fix eval tests

* Simplify MockDenseRetriever

* Fix Ray test

* Accidentally pushed merge conflict, fixed

* Typo in schemas

* Typo in _json_schema.py

* Slightly reduce noisiness of version validation warnings

* Fix version logs tests

* Fix version logs tests again

* remove seemingly unused file

* Add check and test to avoid adding the same node to the pipeline twice

* Update Documentation & Code Style

* Revert config to pipeline_config

* Remove unused import

* Complete reverting to pipeline_config

* Some more stray config=

* Update Documentation & Code Style

* Feedback

* Move back other_nodes tests into pipeline tests temporarily

* Update Documentation & Code Style

* Fixing tests

* Update Documentation & Code Style

* Fixing ray and standard pipeline tests

* Rename colliding load() methods in dense retrievers and faiss

* Update Documentation & Code Style

* Fix mypy on ray.py as well

* Add check for no root node

* Fix tests to use load_from_directory and load_index

* Try to workaround the disabled add_node of RayPipeline

* Update Documentation & Code Style

* Fix Ray test

* Fix FAISS tests

* Relax class check in _add_node_to_pipeline_graph

* Update Documentation & Code Style

* Try to fix mypy in ray.py

* unused import

* Try another fix for Ray

* Fix connector tests

* Update Documentation & Code Style

* Fix ray

* Update Documentation & Code Style

* use BaseComponent.load() in pipelines/base.py

* another round of feedback

* stray BaseComponent.load()

* Update Documentation & Code Style

* Fix FAISS tests too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-04 17:39:06 +02:00
Tuana Celik
e2b85e2913
Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever (#2461)
* Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever

* adding missed init file

* Update Documentation & Code Style

* fixed docstring

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-29 10:16:02 +02:00
tstadel
7498c7c6fb
Fix and use delete_index instead of delete_documents in tests (#2453)
* use delete_index instead of delete_documents in tests

* fix delete_index

* fix delete_index() in memory and milvus

* fix imports

* fix memory keyerrors

* Update Documentation & Code Style

* increase timeout for pinecone tests to 60 minutes

* clean get_document_store()

* use recreate_index in tests

* Update Documentation & Code Style

* fix tests

* fix remaining tests

* log index deleted

* fix test_eval_pipeline

* simplify existing index detection in weaviate

* delete label_index on recreate_index for pinecone and milvus

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 19:06:30 +02:00
Tuana Celik
d49e92e21c
ElasticsearchRetriever to BM25Retriever (#2423)
* change class names to bm25

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update Documentation & Code Style

* Add back all_terms_must_match

* fix syntax

* Update Documentation & Code Style

* Update Documentation & Code Style

* Creating a wrapper for old ES retriever with deprecated wrapper

* Update Documentation & Code Style

* New method for deprecating old ESRetriever

* New attempt for deprecating the ESRetriever

* Reverting to the simplest solution - warning logged

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-04-26 16:09:39 +02:00
Sara Zan
929c685cda
Forbid usage of *args and **kwargs in any node's __init__ (#2362)
* Add failing test

* Remove `**kwargs` from docstores' `__init__` functions (#2407)

* Remove kwargs from ESDocStore subclasses

* Remove kwargs from subclasses of SQLDocumentStore

* Remove kwargs from Weaviate

* Revert change in pinecone

* Fix tests

* Fix retriever test with weaviate

* Change Exception into DocumentStoreError

* Update Documentation & Code Style

* Remove `**kwargs` from `FARMReader` (#2413)

* Remove FARMReader kwargs without trying to replace them functionally

* Update Documentation & Code Style

* enforce same index values before and after saving/loading eval dataframes (#2398)

* Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350)

* Add tests for missing init and super

* Update Documentation & Code Style

* change in with endswith

* Move test in pipeline.py and change test in pipeline_yaml.py

* Update Documentation & Code Style

* Use caplog to test the warning

* Update Documentation & Code Style

* move tests into test_pipeline and use get_config

* Update Documentation & Code Style

* Unmock version name

* Improve variadic args test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-14 16:42:02 +02:00