1461 Commits

Author SHA1 Message Date
Massimiliano Pippi
a9a4156731
[Weaviate] Exit the while loop when we query fewer documents than available (#2537)
* exit the while loop when we query fewer documents than available in Weaviate

* use monkeypatch fixture, remove unused markers

* we know key is there, use brackets to get the value

* use custom exception

* add warning message when we hit the QUERY_MAXIMUM_RESULTS problem

* restore pytest marker

* removed unused import

* make the warning message more clear
2022-05-20 09:07:03 +02:00
Sara Zan
fd2ca359fe
Validation for Ray pipelines (#2545)
* Ray pipelines now validate

* Update Documentation & Code Style

* rename Ray pipeline in tests

* Add extras:ray to the test pipeline

* pylint

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-19 19:40:03 +02:00
Sara Zan
89bb1ca139
[CI refactoring] Improve autoformat.yml (#2556)
* Restructure autoformat to run a single script

* Reduce diff for autoformat.yml

* Reduce diff on linux_ci.yml
2022-05-18 20:02:43 +02:00
tstadel
f6e3a63906
Prevent losing names of utilized components when loaded from config (#2525)
* Prevent losing names of utilized components when loaded from config

* Update Documentation & Code Style

* update test

* fix failing tests

* Update Documentation & Code Style

* fix even more tests

* Update Documentation & Code Style

* incorporate review feedback

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-18 14:17:54 +02:00
tstadel
110b9c2b0a
Warnings for write operations of DeepsetCloudDocumentStore (#2565)
* log inputs to write operations

* Update Documentation & Code Style

* adjust tests

* simplify by using decorator for write operation functions

* Update Documentation & Code Style

* fix comma

* fix comma in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 17:53:55 +02:00
Stefano Fiorucci
686a19b35d
added launch_tika method (#2567)
* added launch_tika method

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 17:53:04 +02:00
Julian Risch
5a1e98e3ff
Update scriptrunner module path for streamlit ui (#2566)
* Pin streamlit version to <1.9.0

* update scriptrunner module path for streamlit ui
2022-05-17 16:06:44 +02:00
Julian Risch
70ca1e9fc6
Smaller demo instance type (#2564)
This PR changes the instance type of the public Haystack demo from p3.2xlarge to g4dn.2xlarge.
g4dn.2xlarge has 1 GPU, 8 vCPUs, 32 GiB of memory
p3.2xlarge had 1 GPU, 8 vCPUs, 61 GiB of memory
which results in 75% lower costs with g4dn.2xlarge.

I also tried out the even smaller g4dn.xlarge, which has 1 GPU, 4 vCPUs, 16 GiB of memory. However, its memory was not enough to run the demo. I tried sending multiple requests at the same time and it worked well with g4dn.2xlarge. Requests are slightly slower than with the more powerful instance type, but the difference is hard to notice.
2022-05-17 12:47:15 +02:00
MichelBartels
a952ba240f
Include meta data when computing embeddings in EmbeddingRetriever (#2559)
* include meta data when calculating embeddings in EmbeddingRetriever

* Update Documentation & Code Style

* fix None meta field

* remove default values

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 12:37:04 +02:00
Sara Zan
ff4303c51b
[CI refactoring] Categorize tests into folders (#2554)
* Categorize tests into folders

* Fix linux_ci.yml and an import

* Wrong path
2022-05-17 09:55:53 +01:00
Sara Zan
81223f8cd1
[CI refactoring] Avoid ray==1.12.0 on Windows (#2562)
* Avoid ray 1.12.0 on windows (bug)

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 09:55:16 +01:00
MichelBartels
686e9d24ef
Documenting output score of JoinDocuments when using concatenation (#2561)
* add documentation regarding the score of JoinDocuments when using concatenation

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-16 18:30:07 +02:00
Ivan Lopez
a2a99f79b1
Fix docker image tag with semantic version for releases (#2548)
* Fix docker tag with semantic version for releases

* Prepend latest docker tag with tagprefix in cache-from
2022-05-16 13:26:33 +02:00
ClaMnc
2b11981b08
set top_k to 5 in SAS to be consistent (#2550)
* set top_k to 5 in SAS to be consistent

* set top_k to 5 in SAS to be consistent
2022-05-16 10:29:03 +02:00
Sara Zan
00aa1f41d7
convert_files_to_docs typo (#2546)
2022-05-13 16:38:43 +02:00
Agnieszka Marzec
2d03a26045
Minor lg changes (#2533)
* Minor lg change

* Update Documentation & Code Style

* Fix missing articles

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-13 16:12:22 +02:00
Agnieszka Marzec
1ae5a1449b
Update run() and run_batch() params descriptions in API (#2542)
* Update run() and run_batch() params descriptions

* Update Documentation & Code Style

* Update api params descriptions

* Update Documentation & Code Style

* Fix typo

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Add Bogdan's suggestions

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-05-13 15:11:01 +02:00
bogdankostic
300ee1ac83
Upgrade torch version to 1.11 (#2538)
* Bump torch version

* Upgrade torch version in torch-scatter
2022-05-13 14:45:53 +02:00
MichelBartels
4f22942cb0
Handle transformers pipeline flattening lists of length 1 (#2531)
* Handle transformers pipeline flattening lists of length 1

* Consider case where only one document is passed

* Change position of fix

* Make use of top_k_per_candidate in predict method

* Fix predict method

Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-05-12 16:11:38 +02:00
tstadel
771ed0bb1d
Remove wrong retriever top_1 metrics from print_eval_report (#2510)
* remove wrong retriever top_1 metrics

* Update Documentation & Code Style

* don't show wrong examples frame when n_wrong_examples is 0

* Update Documentation & Code Style

* Update Documentation & Code Style

* only use farm reader during eval tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-12 12:34:11 +02:00
bogdankostic
738e008020
Add run_batch method to all nodes and Pipeline to allow batch querying (#2481)
* Add run_batch methods for batch querying

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Fix mypy

* Fix linter

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix mypy

* Fix rest api test

* Update Documentation & Code Style

* Add Doc strings

* Update Documentation & Code Style

* Add batch_size as attribute to nodes supporting batching

* Adapt error messages

* Adapt type of filters in retrievers

* Revert change about truncation_warning in summarizer

* Unify multiple_doc_lists tests

* Use smaller models in extractor tests

* Add return types to JoinAnswers and RouteDocuments

* Adapt return statements in reader's run_batch method

* Allow list of filters

* Adapt error messages

* Update Documentation & Code Style

* Fix tests

* Fix mypy

* Adapt print_questions

* Remove disabling warning about too many public methods

* Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py

* Add type check

* Update Documentation & Code Style

* Adapt tutorial 11

* Update Documentation & Code Style

* Add query_batch method for DCDocStore

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 11:11:00 +02:00
bogdankostic
5378a9ab48
Fix tutorials 4, 7 and 8 (#2526)
* Fix tutorials 4, 7 and 8

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 09:17:05 +02:00
bogdankostic
4581b91e83
Make DeepsetCloudDocumentStore work with non-existing index (#2513)
* Make DeepsetCloudDocumentStore work with non-existing index

* Update Documentation & Code Style

* Add tests

* Update Documentation & Code Style

* Fix tests, adapt warning messages + lowercase deepset

* Update Documentation & Code Style

* Fix typo in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 15:21:35 +02:00
Massimiliano Pippi
7595bb49ab
rearrange contributing guidelines (#2515)
* rearrange contributing guidelines

* revert unneeded change to README
2022-05-10 14:59:46 +02:00
Branden Chan
43bfea6f3d
Add sort arg to JoinAnswers (#2436)
* Add sort arg to JoinAnswers

* Update Documentation & Code Style

* Change naming and docstring

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 11:47:00 +02:00
Sara Zan
15a9ff6f67
PR template mention of enabling Actions (#2523)
* Update version to 1.4.1rc0

* Add a hint about enabling Actions on the fork in the PR template

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 09:46:09 +02:00
Sara Zan
3d8bdf3cb6
Remove safe import from ElasticsearchDocumentStore (#2522)
* Update version to 1.4.1rc0

* Elasticsearch is not an optional dependency

* Fix import path

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-09 18:07:42 +02:00
Gabriel Altay
988568882a
fix small typo in Document doc string (#2520)
* fix small typo in Document doc string

I was going through the tutorial, then digging through the code, and just noticed a small typo.

* generate markdown file changes from docstrings

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-09 18:04:33 +02:00
bogdankostic
bce84577c6
Upgrade transformers version to 4.18.0 (#2514)
* Upgrade transformers version to 4.18.0

* Adapt tokenization test to upgrade

* Adapt tokenization test to upgrade
2022-05-06 16:57:13 +02:00
Branden Chan
caf1336424
Adjust pydoc markdown config so methods shown with classes (#2511)
* add_member_class_prefix: true

* Update Documentation & Code Style

* Trigger redeploy

* Trigger redeploy

* Fix pydoc param

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 16:00:08 +02:00
Sara Zan
1ed407cb5a
Update version to 1.4.1rc0 (#2509)
* Update version to 1.4.1rc0

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 11:46:31 +02:00
Julian Risch
081b886aa1
Release v1.4.0 (#2502)
* delete unneeded files of last release

* add v1.4.0 docs with updated links

* upgrade version number

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
v1.4.0
2022-05-05 12:24:45 +02:00
Sara Zan
f3e0ba4be9
Fix OpenSearchDocumentStore's __init__ (#2498)
* Move super in OpenSearchDocumentStore and add small test

* Update Documentation & Code Style

* Add Opensearch container to the CI

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:38:09 +02:00
MichelBartels
c7e39e5225
Replace TableTextRetriever with EmbeddingRetriever in Tutorial 15 (#2479)
* replace TableTextRetriever with EmbeddingRetriever in Tutorial 15

* Update Documentation & Code Style

* fix bug

* Update Documentation & Code Style

* update tutorial 15 outputs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-20-212.eu-west-1.compute.internal>
2022-05-05 10:12:44 +02:00
MichelBartels
5d98810a17
Raise error if torch-scatter is not installed or wrong version is installed (#2486)
* automatically download correct torch-scatter version

* raise error if torch-scatter is not installed

* Update Documentation & Code Style

* catch all import errors and fix linter

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:12:10 +02:00
Julian Risch
1418f0c603
change milvus links from 2.0.0 to 2.0.x (#2496)
* change milvus links from 2.0.0 to 2.0.x

* Update Documentation & Code Style

* fix two broken links

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 18:30:50 +02:00
Julian Risch
fa277bcea8
Upgrade xpdf to 4.04 in Exception text (#2488)
* Upgrade xpdf to 4.04 in Exception text

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 17:42:32 +02:00
Sara Zan
f8e02310bf
Validate YAML files without loading the nodes (#2438)
* Remove BasePipeline and make a module for RayPipeline

* Can load pipelines from yaml, plenty of issues left

* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it

* Fix pipeline tests

* Move some tests out of test_pipeline.py and create MockDenseRetriever

* mypy and pylint (silencing too-many-public-methods)

* Fix issue found in some yaml files and in schema files

* Fix paths to YAML and fix some typos in Ray

* Fix eval tests

* Simplify MockDenseRetriever

* Fix Ray test

* Accidentally pushed merge conflict, fixed

* Typo in schemas

* Typo in _json_schema.py

* Slightly reduce noisiness of version validation warnings

* Fix version logs tests

* Fix version logs tests again

* remove seemingly unused file

* Add check and test to avoid adding the same node to the pipeline twice

* Update Documentation & Code Style

* Revert config to pipeline_config

* Remove unused import

* Complete reverting to pipeline_config

* Some more stray config=

* Update Documentation & Code Style

* Feedback

* Move back other_nodes tests into pipeline tests temporarily

* Update Documentation & Code Style

* Fixing tests

* Update Documentation & Code Style

* Fixing ray and standard pipeline tests

* Rename colliding load() methods in dense retrievers and faiss

* Update Documentation & Code Style

* Fix mypy on ray.py as well

* Add check for no root node

* Fix tests to use load_from_directory and load_index

* Try to workaround the disabled add_node of RayPipeline

* Update Documentation & Code Style

* Fix Ray test

* Fix FAISS tests

* Relax class check in _add_node_to_pipeline_graph

* Update Documentation & Code Style

* Try to fix mypy in ray.py

* unused import

* Try another fix for Ray

* Fix connector tests

* Update Documentation & Code Style

* Fix ray

* Update Documentation & Code Style

* use BaseComponent.load() in pipelines/base.py

* another round of feedback

* stray BaseComponent.load()

* Update Documentation & Code Style

* Fix FAISS tests too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-04 17:39:06 +02:00
Sara Zan
01ea4bf21f
Change default encoding for PDFToTextConverter from Latin 1 to UTF-8 (#2420)
* Change default encoding for PDFToTextConverter

* Update Documentation & Code Style

* Improve docstring

* Update Documentation & Code Style

* Add a list of ligatures to ignore and make it possible to modify this list as needed

* Add docstring

* Add tests

* Rename parameter

* Update Documentation & Code Style

* Move implementation into the base converter to make mypy happier

* Update Documentation & Code Style

* mypy and pylint

* mypy

* move encoding parameter to init of PDFToTextConverter

* Update Documentation & Code Style

* make utf8 default and fix mypy

* Update Documentation & Code Style

* Update Documentation & Code Style

* remove note on encoding in tutorial8

* Update Documentation & Code Style

* skip OCRConverter and test converter.run

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-04 17:01:45 +02:00
bogdankostic
a4e603ce87
Deprecate Milvus1DocumentStore (#2495)
* Add warning message

* Update doc string

* Update Documentation & Code Style

* Change DeprecationWarning to FutureWarning

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 15:09:57 +02:00
Julian Risch
970c476615
Align TransformersReader defaults with FARMReader (#2490)
* Align TransformersReader defaults with FARMReader

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 10:04:18 +02:00
James Briggs
a0bf34036f
fix dot_product metric in pinecone (#2494)
2022-05-04 09:24:10 +02:00
Ahmed Nabil
9cdd719a6d
Update xpdfreader package installation (#2491)
This update fixes the exception `Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite. ` so that converting PDFs no longer fails.
2022-05-03 18:09:41 +02:00
Tuana Celik
b6e369d1ca
changing the name of the retrievers from es_retriever to retriever (#2487)
* changing the name of the retrievers from es_retriever to retriever

* Update Documentation & Code Style

* name fix 2

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-03 18:08:23 +02:00
tstadel
509944f47d
Add support for positional args in pipeline.get_config() (#2478)
* add support for positional args in pipeline.get_config()

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-02 14:41:07 +02:00
tstadel
7d6b3fe954
Add flag to disable scaling scores to probabilities (#2454)
* add scale_scores_to_probabilities flag

* Update Documentation & Code Style

* fix tests

* fix sql mypy

* Update Documentation & Code Style

* fix responses

* Update Documentation & Code Style

* rename to scale_score_to_probability + docstrings

* use BaseDocumentStore.score_to_probability in elasticsearch and milvus2

* Update Documentation & Code Style

* fix tests

* Update Documentation & Code Style

* add tests

* improve naming

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-02 13:35:07 +02:00
tstadel
32ee271225
fix reader.eval and reader.eval_on_file output (#2476)
2022-04-29 13:52:24 +02:00
Tuana Celik
e2b85e2913
Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever (#2461)
* Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever

* adding missed init file

* Update Documentation & Code Style

* fixed docstring

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-29 10:16:02 +02:00
MichelBartels
0a80cc452c
Linearize tables in EmbeddingRetriever (#2462)
* Linearize tables in EmbeddingRetriever

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* simplify table linearization method + make it private

* Update Documentation & Code Style

* fix typing

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-28 12:06:59 +02:00
Jonathan Gallon
25b87e8cf0
Add support for aliases in elasticsearch document store (#2448)
* Add support for aliases in elasticsearch document store

* Add alias support for OpenSearch

* Missing variable index

* Update Documentation & Code Style

* Add unit test for elasticsearch alias support

* Fix unit test when index is not compatible with haystack

* Fix auto format conflict

* Add comment explaining for loop for alias

* Update Documentation & Code Style

Co-authored-by: Jonathan Gallon <jonathan.gallon@totalenergies.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-04-28 10:10:37 +02:00