1491 Commits

Author SHA1 Message Date
bogdankostic
738e008020
Add run_batch method to all nodes and Pipeline to allow batch querying (#2481)
* Add run_batch methods for batch querying

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Fix mypy

* Fix linter

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix mypy

* Fix rest api test

* Update Documentation & Code Style

* Add Doc strings

* Update Documentation & Code Style

* Add batch_size as attribute to nodes supporting batching

* Adapt error messages

* Adapt type of filters in retrievers

* Revert change about truncation_warning in summarizer

* Unify multiple_doc_lists tests

* Use smaller models in extractor tests

* Add return types to JoinAnswers and RouteDocuments

* Adapt return statements in reader's run_batch method

* Allow list of filters

* Adapt error messages

* Update Documentation & Code Style

* Fix tests

* Fix mypy

* Adapt print_questions

* Remove disabling warning about too many public methods

* Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py

* Add type check

* Update Documentation & Code Style

* Adapt tutorial 11

* Update Documentation & Code Style

* Add query_batch method for DCDocStore

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 11:11:00 +02:00
bogdankostic
5378a9ab48
Fix tutorials 4, 7 and 8 (#2526)
* Fix tutorials 4, 7 and 8

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 09:17:05 +02:00
bogdankostic
4581b91e83
Make DeepsetCloudDocumentStore work with non-existing index (#2513)
* Make DeepsetCloudDocumentStore work with non-existing index

* Update Documentation & Code Style

* Add tests

* Update Documentation & Code Style

* Fix tests, adapt warning messages + lowercase deepset

* Update Documentation & Code Style

* Fix typo in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 15:21:35 +02:00
Massimiliano Pippi
7595bb49ab
rearrange contributing guidelines (#2515)
* rearrange contributing guidelines

* revert unneeded change to README
2022-05-10 14:59:46 +02:00
Branden Chan
43bfea6f3d
Add sort arg to JoinAnswers (#2436)
* Add sort arg to JoinAnswers

* Update Documentation & Code Style

* Change naming and docstring

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 11:47:00 +02:00
Sara Zan
15a9ff6f67
PR template mention of enabling Actions (#2523)
* Update version to 1.4.1rc0

* Add hint of enabling action on the fork in the PR template

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 09:46:09 +02:00
Sara Zan
3d8bdf3cb6
Remove safe import from ElasticsearchDocumentStore (#2522)
* Update version to 1.4.1rc0

* Elasticsearch is not an optional dependency

* Fix import path

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-09 18:07:42 +02:00
Gabriel Altay
988568882a
fix small typo in Document doc string (#2520)
* fix small typo in Document doc string

Was going through the tutorial, then digging through the code and just noticed a small typo

* generate markdown file changes from docstrings

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-09 18:04:33 +02:00
bogdankostic
bce84577c6
Upgrade transformers version to 4.18.0 (#2514)
* Upgrade transformers version to 4.18.0

* Adapt tokenization test to upgrade

* Adapt tokenization test to upgrade
2022-05-06 16:57:13 +02:00
Branden Chan
caf1336424
Adjust pydoc markdown config so methods shown with classes (#2511)
* add_member_class_prefix: true

* Update Documentation & Code Style

* Trigger redeploy

* Trigger redeploy

* Fix pydoc param

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 16:00:08 +02:00
Sara Zan
1ed407cb5a
Update version to 1.4.1rc0 (#2509)
* Update version to 1.4.1rc0

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 11:46:31 +02:00
Julian Risch
081b886aa1
Release v1.4.0 (#2502)
* delete unneeded files of last release

* add v1.4.0 docs with updated links

* upgrade version number

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
v1.4.0
2022-05-05 12:24:45 +02:00
Sara Zan
f3e0ba4be9
Fix OpenSearchDocumentStore's __init__ (#2498)
* Move super in OpenSearchDocumentStore and add small test

* Update Documentation & Code Style

* Add Opensearch container to the CI

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:38:09 +02:00
MichelBartels
c7e39e5225
Replace TableTextRetriever with EmbeddingRetriever in Tutorial 15 (#2479)
* replace TableTextRetriever with EmbeddingRetriever in Tutorial 15

* Update Documentation & Code Style

* fix bug

* Update Documentation & Code Style

* update tutorial 15 outputs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-20-212.eu-west-1.compute.internal>
2022-05-05 10:12:44 +02:00
MichelBartels
5d98810a17
Raise error if torch-scatter is not installed or wrong version is installed (#2486)
* automatically download correct torch-scatter version

* raise error if torch-scatter is not installed

* Update Documentation & Code Style

* catch all import errors and fix linter

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:12:10 +02:00
Julian Risch
1418f0c603
change milvus links from 2.0.0 to 2.0.x (#2496)
* change milvus links from 2.0.0 to 2.0.x

* Update Documentation & Code Style

* fix two broken links

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 18:30:50 +02:00
Julian Risch
fa277bcea8
Upgrade xpdf to 4.04 in Exception text (#2488)
* Upgrade xpdf to 4.04 in Exception text

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 17:42:32 +02:00
Sara Zan
f8e02310bf
Validate YAML files without loading the nodes (#2438)
* Remove BasePipeline and make a module for RayPipeline

* Can load pipelines from yaml, plenty of issues left

* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it

* Fix pipeline tests

* Move some tests out of test_pipeline.py and create MockDenseRetriever

* myoy and pylint (silencing too-many-public-methods)

* Fix issue found in some yaml files and in schema files

* Fix paths to YAML and fix some typos in Ray

* Fix eval tests

* Simplify MockDenseRetriever

* Fix Ray test

* Accidentally pushed merge coinflict, fixed

* Typo in schemas

* Typo in _json_schema.py

* Slightly reduce noisyness of version validation warnings

* Fix version logs tests

* Fix version logs tests again

* remove seemingly unused file

* Add check and test to avoid adding the same node to the pipeline twice

* Update Documentation & Code Style

* Revert config to pipeline_config

* Remo0ve unused import

* Complete reverting to pipeline_config

* Some more stray config=

* Update Documentation & Code Style

* Feedback

* Move back other_nodes tests into pipeline tests temporarily

* Update Documentation & Code Style

* Fixing tests

* Update Documentation & Code Style

* Fixing ray and standard pipeline tests

* Rename colliding load() methods in dense retrievers and faiss

* Update Documentation & Code Style

* Fix mypy on ray.py as well

* Add check for no root node

* Fix tests to use load_from_directory and load_index

* Try to workaround the disabled add_node of RayPipeline

* Update Documentation & Code Style

* Fix Ray test

* Fix FAISS tests

* Relax class check in _add_node_to_pipeline_graph

* Update Documentation & Code Style

* Try to fix mypy in ray.py

* unused import

* Try another fix for Ray

* Fix connector tests

* Update Documentation & Code Style

* Fix ray

* Update Documentation & Code Style

* use BaseComponent.load() in pipelines/base.py

* another round of feedback

* stray BaseComponent.load()

* Update Documentation & Code Style

* Fix FAISS tests too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-04 17:39:06 +02:00
Sara Zan
01ea4bf21f
Change default encoding for PDFToTextConverter from Latin 1 to UTF-8 (#2420)
* Change default encoding for PDFToTextConverter

* Update Documentation & Code Style

* Improve docstring

* Update Documentation & Code Style

* Add list of ligatures to ignore and add the possibility to modify such list at need

* Add docstring

* Add tests

* Rename parameter

* Update Documentation & Code Style

* Move implementation into the base converter to make mypy happier

* Update Documentation & Code Style

* mypy and pylint

* mypy

* move encoding parameter to init of PDFToTextConverter

* Update Documentation & Code Style

* make utf8 default and fix mypy

* Update Documentation & Code Style

* Update Documentation & Code Style

* remove note on encoding in tutorial8

* Update Documentation & Code Style

* skip OCRConverter and test converter.run

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-04 17:01:45 +02:00
bogdankostic
a4e603ce87
Deprecate Milvus1DocumentStore (#2495)
* Add warning message

* Update doc string

* Update Documentation & Code Style

* Change DeprecationWarning to FutureWarning

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 15:09:57 +02:00
Julian Risch
970c476615
Align TransformersReader defaults with FARMReader (#2490)
* Align TransformersReader defaults with vFARMReader

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 10:04:18 +02:00
James Briggs
a0bf34036f
fix dot_product metric in pinecone (#2494) 2022-05-04 09:24:10 +02:00
Ahmed Nabil
9cdd719a6d
Update xpdfreader package installation (#2491)
This Update will fix this exception `Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite. ` Now, converting PDFs wouldn't have any issues.
2022-05-03 18:09:41 +02:00
Tuana Celik
b6e369d1ca
changing the name of the retrievers from es_retriever to retriever (#2487)
* changing the name of the retrievers from es_retriever to retriever

* Update Documentation & Code Style

* name fix 2

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-03 18:08:23 +02:00
tstadel
509944f47d
Add support for positional args in pipeline.get_config() (#2478)
* add support for positional args in pipeline.get_config()

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-02 14:41:07 +02:00
tstadel
7d6b3fe954
Add flag to disable scaling scores to probabilities (#2454)
* add scale_scores_to_probabilities flag

* Update Documentation & Code Style

* fix tests

* fix sql mypy

* Update Documentation & Code Style

* fix responses

* Update Documentation & Code Style

* rename to scale_score_to_probability + docstrings

* use BaseDocumentStore.score_to_probability in elasticsearch and milvus2

* Update Documentation & Code Style

* fix tests

* Update Documentation & Code Style

* add tests

* improve naming

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-02 13:35:07 +02:00
tstadel
32ee271225
fix reader.eval and reader.eval_on_file output (#2476) 2022-04-29 13:52:24 +02:00
Tuana Celik
e2b85e2913
Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever (#2461)
* Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever

* adding missed init file

* Update Documentation & Code Style

* fixed docstring

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-29 10:16:02 +02:00
MichelBartels
0a80cc452c
Linearize tables in EmbeddingRetriever (#2462)
* Linearize tables in EmbeddingRetriever

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* simplify table linearization method + make it private

* Update Documentation & Code Style

* fix typing

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-28 12:06:59 +02:00
Jonathan Gallon
25b87e8cf0
Add support for aliases in elasticsearch document store (#2448)
* Add support for aliases in elasticsearch document store

* Add alias support for OpenSearch

* Missing variable index

* Update Documentation & Code Style

* Add unit test for elasticsearch alias support

* Fix unit test when index is not compatible with haystack

* Fix auto format conflict

* Add comment explaining for loop for alias

* Update Documentation & Code Style

Co-authored-by: Jonathan Gallon <jonathan.gallon@totalenergies.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-04-28 10:10:37 +02:00
tstadel
2a44840f82
Ignore mypy issues regarding files param of requests.post (#2468)
* ignore mypy issues regarding files param of requests.post

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-28 08:36:29 +02:00
Malte Pietsch
766e75370c
Update docs of DeepsetCloudDocumentStore (#2460)
* Update docs of DeepsetCloudDocumentStore

* Update Documentation & Code Style

* Update docstring

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>

* Update Documentation & Code Style

* move DEFAULT_API_ENDPOINT

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-04-27 19:40:39 +02:00
tstadel
7a1e0bb7bc
rename dataset to evaluation_set when logging to mlflow (#2457) 2022-04-26 20:28:48 +02:00
tstadel
cc1bb9ad73
Disable telemetry logs per default (#2463) 2022-04-26 20:28:12 +02:00
tstadel
7498c7c6fb
Fix and use delete_index instead of delete_documents in tests (#2453)
* use delete_index instead of delete_documents in tests

* fix delete_index

* fix  delete_index() in memory and milvus

* fix imports

* fix memory keyerrors

* Update Documentation & Code Style

* increase timeout for pinecone tests to 60 minutes

* clean get_document_store()

* use recreate_index in tests

* Update Documentation & Code Style

* fix tests

* fix remaining tests

* log index deleted

* fix test_eval_pipeline

* simplify existing index detection in weaviate

* delete label_index on recreate_index for pinecone and milvus

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 19:06:30 +02:00
Tuana Celik
d49e92e21c
ElasticsearchRetriever to BM25Retriever (#2423)
* change class names to bm25

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update Documentation & Code Style

* Add back all_terms_must_match

* fix syntax

* Update Documentation & Code Style

* Update Documentation & Code Style

* Creating a wrapper for old ES retriever with deprecated wrapper

* Update Documentation & Code Style

* New method for deprecating old ESRetriever

* New attempt for deprecating the ESRetriever

* Reverting to the simplest solution - warning logged

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-04-26 16:09:39 +02:00
Sara Zan
34bca2ba83
Make sure that debug=True and params={'debug': True} behaves the same way (#2442)
* Make sure that debug=True and params={'debug': True} behaves the same way

* Update Documentation & Code Style

* Account for the possibility of node_input being None

* Fix condition

* Avoid situation where params=None

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 09:24:19 +02:00
tstadel
60ff46e4e1
Log evaluation results to MLflow (#2337)
* track eval results in mlflow

* Update Documentation & Code Style

* add pipeline.yaml and environment info

* improve logging to mlflow

* Update Documentation & Code Style

* introduce ExperimentTracker

* Update Documentation & Code Style

* move modeling.utils.logger to utils.experiment_tracking

* renaming: tracker and TrackingHead

* Update Documentation & Code Style

* refactor env tracking

* fix pylint findings

* Update Documentation & Code Style

* rename MLFlowTrackingHead to MLflowTrackingHead

* implement dataset hash

* Update Documentation & Code Style

* set docstrings

* Update Documentation & Code Style

* introduce PipelineBundle and Corpus

* Update Documentation & Code Style

* support reusing index

* Update Documentation & Code Style

* rename Corpus to FileCorpus

* fix Corpus -> FileCorpus

* Update Documentation & Code Style

* resolve cyclic dependencies

* fix linter issues

* Update Documentation & Code Style

* remove helper classes

* Update Documentation & Code Style

* fix imports

* fix another unused import

* update docstrings

* Update Documentation & Code Style

* simplify usage of experiment tracking tools

* fix Literal import

* revert schema changes

* Update Documentation & Code Style

* always end run

* Update Documentation & Code Style

* fix mypy issue

* rename to execute_eval_run

* Update Documentation & Code Style

* fix merge of get_or_create_env_meta_data

* improve docstrings

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-25 20:14:48 +02:00
Adrien Wald
c401e86099
Use ElasticsearchDocumentStore.get_all_documents in ElasticsearchFilterOnlyRetriever.retrieve (#2151)
* use get_all_documents in ElasticsearchFilterOnlyRetriever.retrieve

* Update Documentation & Code Style

* add test case for es_filter_only retriever

* Update Documentation & Code Style

* fix test by adding empty string for query

* Update Documentation & Code Style

* add explicit name of argument "query"

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-04-25 09:53:48 +02:00
tstadel
25475a68c7
Match answer sorting in QuestionAnsweringHead with FARMReader (#2414)
* match no_answer confidence

* Update Documentation & Code Style

* test added

* Update Documentation & Code Style

* fix tests

* Update Documentation & Code Style

* apply penalties of scores to confidences too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-21 11:24:39 +02:00
Malte Pietsch
4bf470286b
Upgrade xpdf to 4.0.4 (#2443)
* Update minimal gpu docker image to xpdf 4.0.4

* Update Dockerfile-GPU

* Update Dockerfile

* Update Dockerfile-GPU

* Update Dockerfile-GPU-minimal
2022-04-21 10:27:56 +02:00
Malte Pietsch
133a76229b
Add info about execution env to minimal GPU image (#2441) 2022-04-21 08:30:42 +02:00
Sara Zan
07d7ecbff1
Make python-magic fully optional (#2412)
* Add windows specific package for python-magic

* Disable some tests on Windows and add explanatory warning in case of issues with libmagic

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-20 09:18:02 +02:00
tstadel
e862400256
Prevent Stackoverflow on Windows CI (#2426)
* prevent stackoverflow on windows ci

* Update Documentation & Code Style

* fix is_windows condition

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-04-19 16:10:39 +02:00
Sara Zan
4eec2dc45e
Change YAML version exception into a warning (#2385)
* Change exception into warning, add strict_version param, and remove compatibility between schemas

* Simplify update_json_schema

* Rename unstable into master

* Prevent validate_config from changing the config to validate

* Fix version validation and add tests

* Rename master into ignore

* Complete parameter rename

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 16:08:08 +02:00
Sara Zan
8abf11fbd3
Update pdftotext also on pinecone and milvus1 CI jobs (#2433)
* Upgrade pdftotext also on pinecone and milvus1 jobs

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 16:06:27 +02:00
Sara Zan
ba9c976bfe
Update pdftotext link (#2432)
* Update pdftotext link

* Update Documentation & Code Style

* Update Tutorial8_Preprocessing.ipynb

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 14:30:18 +02:00
Sara Zan
929c685cda
Forbid usage of *args and **kwargs in any node's __init__ (#2362)
* Add failing test

* Remove `**kwargs` from docstores' `__init__` functions (#2407)

* Remove kwargs from ESDocStore subclasses

* Remove kwargs from subclasses of SQLDocumentStore

* Remove kwargs from Weaviate

* Revert change in pinecone

* Fix tests

* Fix retriever test wirh weaviate

* Change Exception into DocumentStoreError

* Update Documentation & Code Style

* Remove `**kwargs` from `FARMReader` (#2413)

* Remove FARMReader kwargs without trying to replace them functionally

* Update Documentation & Code Style

* enforce same index values before and after saving/loading eval dataframes (#2398)

* Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350)

* Add tests for missing init and super

* Update Documentation & Code Style

* change in with endswith

* Move test in pipeline.py and change test in pipeline_yaml.py

* Update Documentation & Code Style

* Use caplog to test the warning

* Update Documentation & Code Style

* move tests into test_pipeline and use get_config

* Update Documentation & Code Style

* Unmock version name

* Improve variadic args test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-14 16:42:02 +02:00
tstadel
46a50fb979
Fix python-magic making Windows CI stuck (#2425)
* revert python-magic PR #2330

* Revert "revert python-magic PR #2330"

This reverts commit 23fa2cc836e36daecd9e77d340dde6e32e25c82b.

* remove python-magic dep

* use python-magic-bin only

* add comment about python-magic-bin
2022-04-14 16:08:55 +02:00
Sebastian
3d42b70fbb
Added macos version of xpdf in tutorial 8 (#2424)
* Added macos version of xpdf in tutorial 8

* mini-error
2022-04-14 15:31:40 +02:00