MichelBartels
a952ba240f
Include meta data when computing embeddings in EmbeddingRetriever ( #2559 )
* include meta data when calculating embeddings in EmbeddingRetriever
* Update Documentation & Code Style
* fix None meta field
* remove default values
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 12:37:04 +02:00
Sara Zan
ff4303c51b
[CI refactoring] Categorize tests into folders ( #2554 )
* Categorize tests into folders
* Fix linux_ci.yml and an import
* Wrong path
2022-05-17 09:55:53 +01:00
Sara Zan
81223f8cd1
[CI refactoring] Avoid ray==1.12.0 on Windows ( #2562 )
* Avoid ray 1.12.0 on windows (bug)
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 09:55:16 +01:00
MichelBartels
686e9d24ef
Documenting output score of JoinDocuments when using concatenation ( #2561 )
* add documentation regarding the score of JoinDocuments when using concatenation
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-16 18:30:07 +02:00
Ivan Lopez
a2a99f79b1
Fix docker image tag with semantic version for releases ( #2548 )
* Fix docker tag with semantic version for releases
* Prepend latest docker tag with tagprefix in cache-from
2022-05-16 13:26:33 +02:00
ClaMnc
2b11981b08
set top_k to 5 in SAS to be consistent ( #2550 )
* set top_k to 5 in SAS to be consistent
* set top_k to 5 in SAS to be consistent
2022-05-16 10:29:03 +02:00
Sara Zan
00aa1f41d7
convert_files_to_docs typo ( #2546 )
2022-05-13 16:38:43 +02:00
Agnieszka Marzec
2d03a26045
Minor lg changes ( #2533 )
* Minor lg change
* Update Documentation & Code Style
* Fix missing articles
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-13 16:12:22 +02:00
Agnieszka Marzec
1ae5a1449b
Update run() and run_batch() params descriptions in API ( #2542 )
* Update run() and run_batch() params descriptions
* Update Documentation & Code Style
* Update api params descriptions
* Update Documentation & Code Style
* Fix typo
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Add Bogdan's suggestions
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-05-13 15:11:01 +02:00
bogdankostic
300ee1ac83
Upgrade torch version to 1.11 ( #2538 )
* Bump torch version
* Upgrade torch version in torch-scatter
2022-05-13 14:45:53 +02:00
MichelBartels
4f22942cb0
Handle transformers pipeline flattening lists of length 1 ( #2531 )
* Handle transformers pipeline flattening lists of length 1
* Consider case where only one document is passed
* Change position of fix
* Make use of top_k_per_candidate in predict method
* Fix predict method
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-05-12 16:11:38 +02:00
tstadel
771ed0bb1d
Remove wrong retriever top_1 metrics from print_eval_report ( #2510 )
* remove wrong retriever top_1 metrics
* Update Documentation & Code Style
* don't show wrong examples frame when n_wrong_examples is 0
* Update Documentation & Code Style
* Update Documentation & Code Style
* only use farm reader during eval tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-12 12:34:11 +02:00
bogdankostic
738e008020
Add run_batch method to all nodes and Pipeline to allow batch querying ( #2481 )
* Add run_batch methods for batch querying
* Update Documentation & Code Style
* Fix mypy
* Update Documentation & Code Style
* Fix mypy
* Fix linter
* Fix tests
* Update Documentation & Code Style
* Fix tests
* Update Documentation & Code Style
* Fix mypy
* Fix rest api test
* Update Documentation & Code Style
* Add Doc strings
* Update Documentation & Code Style
* Add batch_size as attribute to nodes supporting batching
* Adapt error messages
* Adapt type of filters in retrievers
* Revert change about truncation_warning in summarizer
* Unify multiple_doc_lists tests
* Use smaller models in extractor tests
* Add return types to JoinAnswers and RouteDocuments
* Adapt return statements in reader's run_batch method
* Allow list of filters
* Adapt error messages
* Update Documentation & Code Style
* Fix tests
* Fix mypy
* Adapt print_questions
* Remove disabling warning about too many public methods
* Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py
* Add type check
* Update Documentation & Code Style
* Adapt tutorial 11
* Update Documentation & Code Style
* Add query_batch method for DCDocStore
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 11:11:00 +02:00
bogdankostic
5378a9ab48
Fix tutorials 4, 7 and 8 ( #2526 )
* Fix tutorials 4, 7 and 8
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 09:17:05 +02:00
bogdankostic
4581b91e83
Make DeepsetCloudDocumentStore work with non-existing index ( #2513 )
* Make DeepsetCloudDocumentStore work with non-existing index
* Update Documentation & Code Style
* Add tests
* Update Documentation & Code Style
* Fix tests, adapt warning messages + lowercase deepset
* Update Documentation & Code Style
* Fix typo in test
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 15:21:35 +02:00
Massimiliano Pippi
7595bb49ab
rearrange contributing guidelines ( #2515 )
* rearrange contributing guidelines
* revert unneeded change to README
2022-05-10 14:59:46 +02:00
Branden Chan
43bfea6f3d
Add sort arg to JoinAnswers ( #2436 )
* Add sort arg to JoinAnswers
* Update Documentation & Code Style
* Change naming and docstring
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 11:47:00 +02:00
Sara Zan
15a9ff6f67
PR template mention of enabling Actions ( #2523 )
* Update version to 1.4.1rc0
* Add hint of enabling action on the fork in the PR template
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 09:46:09 +02:00
Sara Zan
3d8bdf3cb6
Remove safe import from ElasticsearchDocumentStore ( #2522 )
* Update version to 1.4.1rc0
* Elasticsearch is not an optional dependency
* Fix import path
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-09 18:07:42 +02:00
Gabriel Altay
988568882a
fix small typo in Document doc string ( #2520 )
* fix small typo in Document doc string
Was going through the tutorial, then digging through the code and just noticed a small typo
* generate markdown file changes from docstrings
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-09 18:04:33 +02:00
bogdankostic
bce84577c6
Upgrade transformers version to 4.18.0 ( #2514 )
* Upgrade transformers version to 4.18.0
* Adapt tokenization test to upgrade
* Adapt tokenization test to upgrade
2022-05-06 16:57:13 +02:00
Branden Chan
caf1336424
Adjust pydoc markdown config so methods shown with classes ( #2511 )
* add_member_class_prefix: true
* Update Documentation & Code Style
* Trigger redeploy
* Trigger redeploy
* Fix pydoc param
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 16:00:08 +02:00
Sara Zan
1ed407cb5a
Update version to 1.4.1rc0 ( #2509 )
* Update version to 1.4.1rc0
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 11:46:31 +02:00
Julian Risch
081b886aa1
Release v1.4.0 ( #2502 )
* delete unneeded files of last release
* add v1.4.0 docs with updated links
* upgrade version number
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
v1.4.0
2022-05-05 12:24:45 +02:00
Sara Zan
f3e0ba4be9
Fix OpenSearchDocumentStore's __init__ ( #2498 )
* Move super in OpenSearchDocumentStore and add small test
* Update Documentation & Code Style
* Add Opensearch container to the CI
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:38:09 +02:00
MichelBartels
c7e39e5225
Replace TableTextRetriever with EmbeddingRetriever in Tutorial 15 ( #2479 )
* replace TableTextRetriever with EmbeddingRetriever in Tutorial 15
* Update Documentation & Code Style
* fix bug
* Update Documentation & Code Style
* update tutorial 15 outputs
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-20-212.eu-west-1.compute.internal>
2022-05-05 10:12:44 +02:00
MichelBartels
5d98810a17
Raise error if torch-scatter is not installed or wrong version is installed ( #2486 )
* automatically download correct torch-scatter version
* raise error if torch-scatter is not installed
* Update Documentation & Code Style
* catch all import errors and fix linter
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:12:10 +02:00
Julian Risch
1418f0c603
change milvus links from 2.0.0 to 2.0.x ( #2496 )
* change milvus links from 2.0.0 to 2.0.x
* Update Documentation & Code Style
* fix two broken links
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 18:30:50 +02:00
Julian Risch
fa277bcea8
Upgrade xpdf to 4.04 in Exception text ( #2488 )
* Upgrade xpdf to 4.04 in Exception text
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 17:42:32 +02:00
Sara Zan
f8e02310bf
Validate YAML files without loading the nodes ( #2438 )
* Remove BasePipeline and make a module for RayPipeline
* Can load pipelines from yaml, plenty of issues left
* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it
* Fix pipeline tests
* Move some tests out of test_pipeline.py and create MockDenseRetriever
* mypy and pylint (silencing too-many-public-methods)
* Fix issue found in some yaml files and in schema files
* Fix paths to YAML and fix some typos in Ray
* Fix eval tests
* Simplify MockDenseRetriever
* Fix Ray test
* Accidentally pushed merge conflict, fixed
* Typo in schemas
* Typo in _json_schema.py
* Slightly reduce noisiness of version validation warnings
* Fix version logs tests
* Fix version logs tests again
* remove seemingly unused file
* Add check and test to avoid adding the same node to the pipeline twice
* Update Documentation & Code Style
* Revert config to pipeline_config
* Remove unused import
* Complete reverting to pipeline_config
* Some more stray config=
* Update Documentation & Code Style
* Feedback
* Move back other_nodes tests into pipeline tests temporarily
* Update Documentation & Code Style
* Fixing tests
* Update Documentation & Code Style
* Fixing ray and standard pipeline tests
* Rename colliding load() methods in dense retrievers and faiss
* Update Documentation & Code Style
* Fix mypy on ray.py as well
* Add check for no root node
* Fix tests to use load_from_directory and load_index
* Try to workaround the disabled add_node of RayPipeline
* Update Documentation & Code Style
* Fix Ray test
* Fix FAISS tests
* Relax class check in _add_node_to_pipeline_graph
* Update Documentation & Code Style
* Try to fix mypy in ray.py
* unused import
* Try another fix for Ray
* Fix connector tests
* Update Documentation & Code Style
* Fix ray
* Update Documentation & Code Style
* use BaseComponent.load() in pipelines/base.py
* another round of feedback
* stray BaseComponent.load()
* Update Documentation & Code Style
* Fix FAISS tests too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-04 17:39:06 +02:00
Sara Zan
01ea4bf21f
Change default encoding for PDFToTextConverter from Latin 1 to UTF-8 ( #2420 )
* Change default encoding for PDFToTextConverter
* Update Documentation & Code Style
* Improve docstring
* Update Documentation & Code Style
* Add list of ligatures to ignore and make it possible to modify this list as needed
* Add docstring
* Add tests
* Rename parameter
* Update Documentation & Code Style
* Move implementation into the base converter to make mypy happier
* Update Documentation & Code Style
* mypy and pylint
* mypy
* move encoding parameter to init of PDFToTextConverter
* Update Documentation & Code Style
* make utf8 default and fix mypy
* Update Documentation & Code Style
* Update Documentation & Code Style
* remove note on encoding in tutorial8
* Update Documentation & Code Style
* skip OCRConverter and test converter.run
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-04 17:01:45 +02:00
bogdankostic
a4e603ce87
Deprecate Milvus1DocumentStore ( #2495 )
* Add warning message
* Update doc string
* Update Documentation & Code Style
* Change DeprecationWarning to FutureWarning
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 15:09:57 +02:00
Julian Risch
970c476615
Align TransformersReader defaults with FARMReader ( #2490 )
* Align TransformersReader defaults with FARMReader
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-04 10:04:18 +02:00
James Briggs
a0bf34036f
fix dot_product metric in pinecone ( #2494 )
2022-05-04 09:24:10 +02:00
Ahmed Nabil
9cdd719a6d
Update xpdfreader package installation ( #2491 )
This update fixes the exception `Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite.`, so converting PDFs should no longer fail.
2022-05-03 18:09:41 +02:00
Tuana Celik
b6e369d1ca
changing the name of the retrievers from es_retriever to retriever ( #2487 )
* changing the name of the retrievers from es_retriever to retriever
* Update Documentation & Code Style
* name fix 2
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-03 18:08:23 +02:00
tstadel
509944f47d
Add support for positional args in pipeline.get_config() ( #2478 )
* add support for positional args in pipeline.get_config()
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-02 14:41:07 +02:00
tstadel
7d6b3fe954
Add flag to disable scaling scores to probabilities ( #2454 )
* add scale_scores_to_probabilities flag
* Update Documentation & Code Style
* fix tests
* fix sql mypy
* Update Documentation & Code Style
* fix responses
* Update Documentation & Code Style
* rename to scale_score_to_probability + docstrings
* use BaseDocumentStore.score_to_probability in elasticsearch and milvus2
* Update Documentation & Code Style
* fix tests
* Update Documentation & Code Style
* add tests
* improve naming
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-02 13:35:07 +02:00
tstadel
32ee271225
fix reader.eval and reader.eval_on_file output ( #2476 )
2022-04-29 13:52:24 +02:00
Tuana Celik
e2b85e2913
Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever ( #2461 )
* Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever
* adding missed init file
* Update Documentation & Code Style
* fixed docstring
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-29 10:16:02 +02:00
MichelBartels
0a80cc452c
Linearize tables in EmbeddingRetriever ( #2462 )
* Linearize tables in EmbeddingRetriever
* Update Documentation & Code Style
* Fix typing
* Update Documentation & Code Style
* simplify table linearization method + make it private
* Update Documentation & Code Style
* fix typing
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-28 12:06:59 +02:00
Jonathan Gallon
25b87e8cf0
Add support for aliases in elasticsearch document store ( #2448 )
* Add support for aliases in elasticsearch document store
* Add alias support for OpenSearch
* Missing variable index
* Update Documentation & Code Style
* Add unit test for elasticsearch alias support
* Fix unit test when index is not compatible with haystack
* Fix auto format conflict
* Add comment explaining for loop for alias
* Update Documentation & Code Style
Co-authored-by: Jonathan Gallon <jonathan.gallon@totalenergies.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-04-28 10:10:37 +02:00
tstadel
2a44840f82
Ignore mypy issues regarding files param of requests.post ( #2468 )
* ignore mypy issues regarding files param of requests.post
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-28 08:36:29 +02:00
Malte Pietsch
766e75370c
Update docs of DeepsetCloudDocumentStore ( #2460 )
* Update docs of DeepsetCloudDocumentStore
* Update Documentation & Code Style
* Update docstring
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* Update Documentation & Code Style
* move DEFAULT_API_ENDPOINT
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-04-27 19:40:39 +02:00
tstadel
7a1e0bb7bc
rename dataset to evaluation_set when logging to mlflow ( #2457 )
2022-04-26 20:28:48 +02:00
tstadel
cc1bb9ad73
Disable telemetry logs per default ( #2463 )
2022-04-26 20:28:12 +02:00
tstadel
7498c7c6fb
Fix and use delete_index instead of delete_documents in tests ( #2453 )
* use delete_index instead of delete_documents in tests
* fix delete_index
* fix delete_index() in memory and milvus
* fix imports
* fix memory keyerrors
* Update Documentation & Code Style
* increase timeout for pinecone tests to 60 minutes
* clean get_document_store()
* use recreate_index in tests
* Update Documentation & Code Style
* fix tests
* fix remaining tests
* log index deleted
* fix test_eval_pipeline
* simplify existing index detection in weaviate
* delete label_index on recreate_index for pinecone and milvus
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 19:06:30 +02:00
Tuana Celik
d49e92e21c
ElasticsearchRetriever to BM25Retriever ( #2423 )
* change class names to bm25
* Update Documentation & Code Style
* Update Documentation & Code Style
* Update Documentation & Code Style
* Add back all_terms_must_match
* fix syntax
* Update Documentation & Code Style
* Update Documentation & Code Style
* Creating a wrapper for old ES retriever with deprecated wrapper
* Update Documentation & Code Style
* New method for deprecating old ESRetriever
* New attempt for deprecating the ESRetriever
* Reverting to the simplest solution - warning logged
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-04-26 16:09:39 +02:00
Sara Zan
34bca2ba83
Make sure that debug=True and params={'debug': True} behave the same way ( #2442 )
* Make sure that debug=True and params={'debug': True} behave the same way
* Update Documentation & Code Style
* Account for the possibility of node_input being None
* Fix condition
* Avoid situation where params=None
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 09:24:19 +02:00
tstadel
60ff46e4e1
Log evaluation results to MLflow ( #2337 )
* track eval results in mlflow
* Update Documentation & Code Style
* add pipeline.yaml and environment info
* improve logging to mlflow
* Update Documentation & Code Style
* introduce ExperimentTracker
* Update Documentation & Code Style
* move modeling.utils.logger to utils.experiment_tracking
* renaming: tracker and TrackingHead
* Update Documentation & Code Style
* refactor env tracking
* fix pylint findings
* Update Documentation & Code Style
* rename MLFlowTrackingHead to MLflowTrackingHead
* implement dataset hash
* Update Documentation & Code Style
* set docstrings
* Update Documentation & Code Style
* introduce PipelineBundle and Corpus
* Update Documentation & Code Style
* support reusing index
* Update Documentation & Code Style
* rename Corpus to FileCorpus
* fix Corpus -> FileCorpus
* Update Documentation & Code Style
* resolve cyclic dependencies
* fix linter issues
* Update Documentation & Code Style
* remove helper classes
* Update Documentation & Code Style
* fix imports
* fix another unused import
* update docstrings
* Update Documentation & Code Style
* simplify usage of experiment tracking tools
* fix Literal import
* revert schema changes
* Update Documentation & Code Style
* always end run
* Update Documentation & Code Style
* fix mypy issue
* rename to execute_eval_run
* Update Documentation & Code Style
* fix merge of get_or_create_env_meta_data
* improve docstrings
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-25 20:14:48 +02:00