226 Commits

Author SHA1 Message Date
Julian Risch
3c6fcc3e42
Bump version to next release candidate (#2627)
* bump version to next release candidate

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 18:58:44 +02:00
Julian Risch
4ca331c0a7
Bump version to v1.5.0 and copy docs folder (#2625)
* bump version to v1.5.0 and copy docs folder

* Update Documentation & Code Style

* update links to v1.5.0

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 17:20:42 +02:00
Vladimir Blagojevic
e10a3fba74
Add Generative Pseudo Labeling (#2388)
2022-06-02 10:12:47 -04:00
bogdankostic
61d9429c25
Simplify loading of EmbeddingRetriever (#2619)
* Infer model format for EmbeddingRetriever automatically

* Update Documentation & Code Style

* Adapt conftest to automatic inference of model_format

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Adapt tutorials

* Update Documentation & Code Style

* Add test for similarity scores with sentence transformers

* Adapt doc string and warning message

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 15:05:29 +02:00
bogdankostic
0395533a78
Add run_batch for standard pipelines (#2595)
* Add run_batch for standard pipelines

* Update Documentation & Code Style

* Fix mypy

* Remove code duplication

* Fix linter

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-27 10:42:48 +02:00
tstadel
7caca41c5d
Support context matching in pipeline.eval() (#2482)
* calculate context pred metrics

* Update Documentation & Code Style

* extend doc_relevance_col values

* fix import order

* Update Documentation & Code Style

* fix mypy

* fix typings literal import

* add option for custom document_id_field

* Update Documentation & Code Style

* fix tests and dataframe col-order

* Update Documentation & Code Style

* rename content to context in eval dataframe

* add backward compatibility to EvaluationResult.load()

* Update Documentation & Code Style

* add docstrings

* Update Documentation & Code Style

* support sas

* Update Documentation & Code Style

* add answer_scope param

* Update Documentation & Code Style

* rework doc_relevance_col and keep document_id col in case of custom_document_id_field

* Update Documentation & Code Style

* improve docstrings

* Update Documentation & Code Style

* rename document_relevance_criterion into document_scope

* Update Documentation & Code Style

* add document_scope and answer_scope to print_eval_report

* support all new features in execute_eval_run()

* fix imports

* fix mypy

* Update Documentation & Code Style

* rename pred_label_sas_grid into pred_label_matrix

* update dataframe schema and sorting

* Update Documentation & Code Style

* pass through context_matching params and extend document_scope test

* Update Documentation & Code Style

* add answer_scope tests

* fix context_matching_threshold for document metrics

* shorten dataframe apply calls

* Update Documentation & Code Style

* fix queries getting lost if nothing was retrieved

* Update Documentation & Code Style

* Update Documentation & Code Style

* use document_id scopes

* Update Documentation & Code Style

* fix answer_scope literal

* Update Documentation & Code Style

* update the docs (lg changes)

* Update Documentation & Code Style

* update tutorial 5

* Update Documentation & Code Style

* fix tests

* Add minor lg updates

* final docstring changes

* fix single quotes in docstrings

* Update Documentation & Code Style

* dataframe scopes added for each column

* better docstrings for context_matching params

* Update Documentation & Code Style

* fix summarizer eval test

* Update Documentation & Code Style

* fix test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2022-05-24 18:11:52 +02:00
bogdankostic
867695ad0c
Change signature of queries param in batch methods (#2575)
* Change signature of queries param in batch methods

* Update Documentation & Code Style

* Fix mypy

* Remove unused import

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 12:33:45 +02:00
Julian Risch
075ed7fbcb
Remove encoding option from PDFToTextOCRConverter (#2553)
* remove encoding option from PDFToTextOCRConverter

* Update Documentation & Code Style

* add unused 'encoding' param to PDFToTextOCRConverter

* Update Documentation & Code Style

* call run instead of convert to use ligature replacing

* Update Documentation & Code Style

* add text to check installed poppler version

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 11:31:32 +02:00
dimitrisna
5bda63a6c0
Add training checkpoint in retriever trainer (#2543)
* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 09:51:26 +02:00
Agnieszka Marzec
ebd54b225b
Update Ray pipeline docs with validation info (#2590)
* Update Ray pipeline docs

* Add Sara's suggestion

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 09:29:52 +02:00
tstadel
0e83535108
Show search endpoint after deepset Cloud deployment (#2569)
* show try-out-message after deployment

* better messages

* Update Documentation & Code Style

* tests added

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-23 14:19:31 +02:00
tstadel
f6e3a63906
Prevent losing names of utilized components when loaded from config (#2525)
* Prevent losing names of utilized components when loaded from config

* Update Documentation & Code Style

* update test

* fix failing tests

* Update Documentation & Code Style

* fix even more tests

* Update Documentation & Code Style

* incorporate review feedback

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-18 14:17:54 +02:00
tstadel
110b9c2b0a
Warnings for write operations of DeepsetCloudDocumentStore (#2565)
* log inputs to write operations

* Update Documentation & Code Style

* adjust tests

* simplify by using decorator for write operation functions

* Update Documentation & Code Style

* fix comma

* fix comma in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 17:53:55 +02:00
MichelBartels
a952ba240f
Include meta data when computing embeddings in EmbeddingRetriever (#2559)
* include meta data when calculating embeddings in EmbeddingRetriever

* Update Documentation & Code Style

* fix None meta field

* remove default values

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 12:37:04 +02:00
MichelBartels
686e9d24ef
Documenting output score of JoinDocuments when using concatenation (#2561)
* add documentation regarding the score of JoinDocuments when using concatenation

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-16 18:30:07 +02:00
Agnieszka Marzec
2d03a26045
Minor lg changes (#2533)
* Minor lg change

* Update Documentation & Code Style

* Fix missing articles

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-13 16:12:22 +02:00
Agnieszka Marzec
1ae5a1449b
Update run() and run_batch() params descriptions in API (#2542)
* Update run() and run_batch() params descriptions

* Update Documentation & Code Style

* Update api params descriptions

* Update Documentation & Code Style

* Fix typo

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Add Bogdan's suggestions

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-05-13 15:11:01 +02:00
bogdankostic
738e008020
Add run_batch method to all nodes and Pipeline to allow batch querying (#2481)
* Add run_batch methods for batch querying

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Fix mypy

* Fix linter

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix mypy

* Fix rest api test

* Update Documentation & Code Style

* Add Doc strings

* Update Documentation & Code Style

* Add batch_size as attribute to nodes supporting batching

* Adapt error messages

* Adapt type of filters in retrievers

* Revert change about truncation_warning in summarizer

* Unify multiple_doc_lists tests

* Use smaller models in extractor tests

* Add return types to JoinAnswers and RouteDocuments

* Adapt return statements in reader's run_batch method

* Allow list of filters

* Adapt error messages

* Update Documentation & Code Style

* Fix tests

* Fix mypy

* Adapt print_questions

* Remove disabling warning about too many public methods

* Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py

* Add type check

* Update Documentation & Code Style

* Adapt tutorial 11

* Update Documentation & Code Style

* Add query_batch method for DCDocStore

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 11:11:00 +02:00
bogdankostic
4581b91e83
Make DeepsetCloudDocumentStore work with non-existing index (#2513)
* Make DeepsetCloudDocumentStore work with non-existing index

* Update Documentation & Code Style

* Add tests

* Update Documentation & Code Style

* Fix tests, adapt warning messages + lowercase deepset

* Update Documentation & Code Style

* Fix typo in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 15:21:35 +02:00
Branden Chan
43bfea6f3d
Add sort arg to JoinAnswers (#2436)
* Add sort arg to JoinAnswers

* Update Documentation & Code Style

* Change naming and docstring

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 11:47:00 +02:00
Sara Zan
3d8bdf3cb6
Remove safe import from ElasticsearchDocumentStore (#2522)
* Update version to 1.4.1rc0

* Elasticsearch is not an optional dependency

* Fix import path

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-09 18:07:42 +02:00
Gabriel Altay
988568882a
fix small typo in Document doc string (#2520)
* fix small typo in Document doc string

Was going through the tutorial, then digging through the code and just noticed a small typo

* generate markdown file changes from docstrings

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-09 18:04:33 +02:00
Branden Chan
caf1336424
Adjust pydoc markdown config so methods shown with classes (#2511)
* add_member_class_prefix: true

* Update Documentation & Code Style

* Trigger redeploy

* Trigger redeploy

* Fix pydoc param

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 16:00:08 +02:00
Sara Zan
1ed407cb5a
Update version to 1.4.1rc0 (#2509)
* Update version to 1.4.1rc0

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 11:46:31 +02:00
Julian Risch
081b886aa1
Release v1.4.0 (#2502)
* delete unneeded files of last release

* add v1.4.0 docs with updated links

* upgrade version number

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 12:24:45 +02:00
Julian Risch
1418f0c603
change milvus links from 2.0.0 to 2.0.x (#2496)
* change milvus links from 2.0.0 to 2.0.x

* Update Documentation & Code Style

* fix two broken links

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 18:30:50 +02:00
Sara Zan
f8e02310bf
Validate YAML files without loading the nodes (#2438)
* Remove BasePipeline and make a module for RayPipeline

* Can load pipelines from yaml, plenty of issues left

* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it

* Fix pipeline tests

* Move some tests out of test_pipeline.py and create MockDenseRetriever

* mypy and pylint (silencing too-many-public-methods)

* Fix issue found in some yaml files and in schema files

* Fix paths to YAML and fix some typos in Ray

* Fix eval tests

* Simplify MockDenseRetriever

* Fix Ray test

* Accidentally pushed merge conflict, fixed

* Typo in schemas

* Typo in _json_schema.py

* Slightly reduce noisiness of version validation warnings

* Fix version logs tests

* Fix version logs tests again

* remove seemingly unused file

* Add check and test to avoid adding the same node to the pipeline twice

* Update Documentation & Code Style

* Revert config to pipeline_config

* Remove unused import

* Complete reverting to pipeline_config

* Some more stray config=

* Update Documentation & Code Style

* Feedback

* Move back other_nodes tests into pipeline tests temporarily

* Update Documentation & Code Style

* Fixing tests

* Update Documentation & Code Style

* Fixing ray and standard pipeline tests

* Rename colliding load() methods in dense retrievers and faiss

* Update Documentation & Code Style

* Fix mypy on ray.py as well

* Add check for no root node

* Fix tests to use load_from_directory and load_index

* Try to workaround the disabled add_node of RayPipeline

* Update Documentation & Code Style

* Fix Ray test

* Fix FAISS tests

* Relax class check in _add_node_to_pipeline_graph

* Update Documentation & Code Style

* Try to fix mypy in ray.py

* unused import

* Try another fix for Ray

* Fix connector tests

* Update Documentation & Code Style

* Fix ray

* Update Documentation & Code Style

* use BaseComponent.load() in pipelines/base.py

* another round of feedback

* stray BaseComponent.load()

* Update Documentation & Code Style

* Fix FAISS tests too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-04 17:39:06 +02:00
Sara Zan
01ea4bf21f
Change default encoding for PDFToTextConverter from Latin 1 to UTF-8 (#2420)
* Change default encoding for PDFToTextConverter

* Update Documentation & Code Style

* Improve docstring

* Update Documentation & Code Style

* Add list of ligatures to ignore and add the possibility to modify such list at need

* Add docstring

* Add tests

* Rename parameter

* Update Documentation & Code Style

* Move implementation into the base converter to make mypy happier

* Update Documentation & Code Style

* mypy and pylint

* mypy

* move encoding parameter to init of PDFToTextConverter

* Update Documentation & Code Style

* make utf8 default and fix mypy

* Update Documentation & Code Style

* Update Documentation & Code Style

* remove note on encoding in tutorial8

* Update Documentation & Code Style

* skip OCRConverter and test converter.run

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-04 17:01:45 +02:00
bogdankostic
a4e603ce87
Deprecate Milvus1DocumentStore (#2495)
* Add warning message

* Update doc string

* Update Documentation & Code Style

* Change DeprecationWarning to FutureWarning

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 15:09:57 +02:00
Julian Risch
970c476615
Align TransformersReader defaults with FARMReader (#2490)
* Align TransformersReader defaults with FARMReader

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 10:04:18 +02:00
tstadel
7d6b3fe954
Add flag to disable scaling scores to probabilities (#2454)
* add scale_scores_to_probabilities flag

* Update Documentation & Code Style

* fix tests

* fix sql mypy

* Update Documentation & Code Style

* fix responses

* Update Documentation & Code Style

* rename to scale_score_to_probability + docstrings

* use BaseDocumentStore.score_to_probability in elasticsearch and milvus2

* Update Documentation & Code Style

* fix tests

* Update Documentation & Code Style

* add tests

* improve naming

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-02 13:35:07 +02:00
Tuana Celik
e2b85e2913
Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever (#2461)
* Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever

* adding missed init file

* Update Documentation & Code Style

* fixed docstring

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-29 10:16:02 +02:00
Malte Pietsch
766e75370c
Update docs of DeepsetCloudDocumentStore (#2460)
* Update docs of DeepsetCloudDocumentStore

* Update Documentation & Code Style

* Update docstring

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>

* Update Documentation & Code Style

* move DEFAULT_API_ENDPOINT

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-04-27 19:40:39 +02:00
tstadel
7498c7c6fb
Fix and use delete_index instead of delete_documents in tests (#2453)
* use delete_index instead of delete_documents in tests

* fix delete_index

* fix delete_index() in memory and milvus

* fix imports

* fix memory keyerrors

* Update Documentation & Code Style

* increase timeout for pinecone tests to 60 minutes

* clean get_document_store()

* use recreate_index in tests

* Update Documentation & Code Style

* fix tests

* fix remaining tests

* log index deleted

* fix test_eval_pipeline

* simplify existing index detection in weaviate

* delete label_index on recreate_index for pinecone and milvus

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 19:06:30 +02:00
Tuana Celik
d49e92e21c
ElasticsearchRetriever to BM25Retriever (#2423)
* change class names to bm25

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update Documentation & Code Style

* Add back all_terms_must_match

* fix syntax

* Update Documentation & Code Style

* Update Documentation & Code Style

* Creating a wrapper for old ES retriever with deprecated wrapper

* Update Documentation & Code Style

* New method for deprecating old ESRetriever

* New attempt for deprecating the ESRetriever

* Reverting to the simplest solution - warning logged

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-04-26 16:09:39 +02:00
tstadel
60ff46e4e1
Log evaluation results to MLflow (#2337)
* track eval results in mlflow

* Update Documentation & Code Style

* add pipeline.yaml and environment info

* improve logging to mlflow

* Update Documentation & Code Style

* introduce ExperimentTracker

* Update Documentation & Code Style

* move modeling.utils.logger to utils.experiment_tracking

* renaming: tracker and TrackingHead

* Update Documentation & Code Style

* refactor env tracking

* fix pylint findings

* Update Documentation & Code Style

* rename MLFlowTrackingHead to MLflowTrackingHead

* implement dataset hash

* Update Documentation & Code Style

* set docstrings

* Update Documentation & Code Style

* introduce PipelineBundle and Corpus

* Update Documentation & Code Style

* support reusing index

* Update Documentation & Code Style

* rename Corpus to FileCorpus

* fix Corpus -> FileCorpus

* Update Documentation & Code Style

* resolve cyclic dependencies

* fix linter issues

* Update Documentation & Code Style

* remove helper classes

* Update Documentation & Code Style

* fix imports

* fix another unused import

* update docstrings

* Update Documentation & Code Style

* simplify usage of experiment tracking tools

* fix Literal import

* revert schema changes

* Update Documentation & Code Style

* always end run

* Update Documentation & Code Style

* fix mypy issue

* rename to execute_eval_run

* Update Documentation & Code Style

* fix merge of get_or_create_env_meta_data

* improve docstrings

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-25 20:14:48 +02:00
Adrien Wald
c401e86099
Use ElasticsearchDocumentStore.get_all_documents in ElasticsearchFilterOnlyRetriever.retrieve (#2151)
* use get_all_documents in ElasticsearchFilterOnlyRetriever.retrieve

* Update Documentation & Code Style

* add test case for es_filter_only retriever

* Update Documentation & Code Style

* fix test by adding empty string for query

* Update Documentation & Code Style

* add explicit name of argument "query"

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-04-25 09:53:48 +02:00
tstadel
25475a68c7
Match answer sorting in QuestionAnsweringHead with FARMReader (#2414)
* match no_answer confidence

* Update Documentation & Code Style

* test added

* Update Documentation & Code Style

* fix tests

* Update Documentation & Code Style

* apply penalties of scores to confidences too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-21 11:24:39 +02:00
Sara Zan
07d7ecbff1
Make python-magic fully optional (#2412)
* Add windows specific package for python-magic

* Disable some tests on Windows and add explanatory warning in case of issues with libmagic

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-20 09:18:02 +02:00
tstadel
e862400256
Prevent Stackoverflow on Windows CI (#2426)
* prevent stackoverflow on windows ci

* Update Documentation & Code Style

* fix is_windows condition

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-04-19 16:10:39 +02:00
Sara Zan
4eec2dc45e
Change YAML version exception into a warning (#2385)
* Change exception into warning, add strict_version param, and remove compatibility between schemas

* Simplify update_json_schema

* Rename unstable into master

* Prevent validate_config from changing the config to validate

* Fix version validation and add tests

* Rename master into ignore

* Complete parameter rename

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 16:08:08 +02:00
Sara Zan
ba9c976bfe
Update pdftotext link (#2432)
* Update pdftotext link

* Update Documentation & Code Style

* Update Tutorial8_Preprocessing.ipynb

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 14:30:18 +02:00
Sara Zan
929c685cda
Forbid usage of *args and **kwargs in any node's __init__ (#2362)
* Add failing test

* Remove `**kwargs` from docstores' `__init__` functions (#2407)

* Remove kwargs from ESDocStore subclasses

* Remove kwargs from subclasses of SQLDocumentStore

* Remove kwargs from Weaviate

* Revert change in pinecone

* Fix tests

* Fix retriever test with weaviate

* Change Exception into DocumentStoreError

* Update Documentation & Code Style

* Remove `**kwargs` from `FARMReader` (#2413)

* Remove FARMReader kwargs without trying to replace them functionally

* Update Documentation & Code Style

* enforce same index values before and after saving/loading eval dataframes (#2398)

* Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350)

* Add tests for missing init and super

* Update Documentation & Code Style

* change in with endswith

* Move test in pipeline.py and change test in pipeline_yaml.py

* Update Documentation & Code Style

* Use caplog to test the warning

* Update Documentation & Code Style

* move tests into test_pipeline and use get_config

* Update Documentation & Code Style

* Unmock version name

* Improve variadic args test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-14 16:42:02 +02:00
Sara Zan
96a538b182
Pylint (import related warnings) and REST API improvements (#2326)
* remove duplicate imports

* fix ungrouped-imports

* Fix wrong-import-position

* Fix unused-import

* pyproject.toml

* Working on wrong-import-order

* Solve wrong-import-order

* fix Pool import

* Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py

* remove Converter from modeling

* Fix mypy issues on adaptive_model.py

* create es_converter.py

* remove converter import

* change import path in tests

* Restructure REST API to not rely on global vars from search.py and improve tests

* Fix openapi generator

* Move variable initialization

* Change type of FilterRequest.filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-12 16:41:05 +02:00
Sara Zan
4862bbcd73
Add devices alongside use_gpu in FARMReader (#2294)
* Make initialize_device_settings take a devices list, and change signature of FARMReader

* reintroduce use_gpu and propagate devices to other methods

* fix typing for initialize_device_settings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-12 14:21:25 +02:00
tstadel
8342a6c1d6
Fix eval discrepancies (#2381)
* fix eval discrepancies

* Update Documentation & Code Style

* fix reader eval comparison

* Update Documentation & Code Style

* slightly improve messed up top_n_f1 func

* add no_answer hint to reader.eval metrics

* fix tut5

* Update Documentation & Code Style

* correct doc_relevance_col in tests

* Update Documentation & Code Style

* redefine recall metrics for no_answers

* fix bugs in EvalAnswers

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-12 09:24:22 +02:00
Sara Zan
ae712fe6bf
Upgrade weaviate-client to 3.3.3 and fix get_all_documents (#1895)
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents

* Add type

* Update Weaviate version on the CI

* Fix bug on get_document_count where there are no documents

* Add more info in the docstrings of get_all_documents and get_all_documents_generator

* Add latest docstring and tutorial changes

* Apply Black

* Update Documentation & Code Style

* Trigger pipeline

* Update Documentation & Code Style

* Include StefanBogdan feedback

* Fix mypy issues and LogicalFilterClause

* Add more types

* Update Documentation & Code Style

* update setup.cfg

* Upgrade weaviate containers too

* Allow to filter for content field in Weaviate

* Use convert_to_weaviate instead of convert_to_pinecone

* Fix _get_all_documents_in_index

* Update docstrings and docs

* Catching an exception in get_document(s)_by_id

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-04-01 15:37:34 +03:00
Timo Moeller
3459020600
Add confidence filtering to FARMReader (#2376)
Add confidence filtering to FARMReader
2022-03-31 15:18:05 +02:00
MichelBartels
fc1cb63bcc
Fix RouteDocuments documentation (#2380)
* fix RouteDocuments documentation

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-31 11:45:02 +02:00
Florian Hardow
a273c3a51d
EvaluationSetClient for deepset cloud to fetch evaluation sets and la… (#2345)
* EvaluationSetClient for deepset cloud to fetch evaluation sets and labels for one specific evaluation set

* make DeepsetCloudDocumentStore able to fetch uploaded evaluation set names

* fix missing renaming of get_evaluation_set_names in DeepsetCloudDocumentStore

* update documentation for evaluation set functionality in deepset cloud document store

* DeepsetCloudDocumentStore tests for evaluation set functionality

* rename index to evaluation_set_name for DeepsetCloudDocumentStore evaluation set functionality

* raise DeepsetCloudError when no labels were found for evaluation set

* make use of .get_with_auto_paging in EvaluationSetClient

* Return result of get_with_auto_paging() as it parses the response already

* Make schema import source more specific

* fetch all evaluation sets for a workspace in deepset Cloud

* Rename evaluation_set_name to label_index

* make use of generator functionality for fetching labels

* Update Documentation & Code Style

* Adjust function input for DeepsetCloudDocumentStore.get_all_labels, adjust tests for it, fix typos, make linter happy

* Match error message with pytest.raises

* Update Documentation & Code Style

* DeepsetCloudDocumentStore.get_labels_count raises DeepsetCloudError when no evaluation set was found to count labels on

* remove unneeded import in tests

* DeepsetCloudDocumentStore tests, make response bodies a string through json.dumps

* DeepsetCloudDocumentStore.get_label_count - move raise to return

* stringify uuid before json.dump as uuid is not serializable

* DeepsetCloudDocumentStore - adjust response mocking in tests

* DeepsetCloudDocumentStore - json dump response body in test

* DeepsetCloudDocumentStore introduce label_index, EvaluationSetClient rename label_index to evaluation_set

* Update Documentation & Code Style

* DeepsetCloudDocumentStore rename evaluation_set to evaluation_set_response as there is a name clash with the input variable

* DeepsetCloudDocumentStore - rename missed variable in test

* DeepsetCloudDocumentStore - rename missed label_index to index in doc string, rename label_index to evaluation_set in EvaluationSetClient

* Update Documentation & Code Style

* DeepsetCloudDocumentStore - update docstrings for EvaluationSetClient

* DeepsetCloudDocumentStore - fix typo in doc string

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-31 08:59:58 +02:00