haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-09 06:13:43 +00:00

Author	SHA1	Message	Date
Aleksander Smywiński-Pohl	642229255f	Use AutoTokenizer by default, to easily adapt to new models and token… (#1902 ) * Use AutoTokenizer by default, to easily adapt to new models and tokenizers * Add missing AutoTokenizer import * Apply Black * Missing import * Fix DPR tests * Remove tests on max length * Update Documentation & Code Style Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-15 13:13:48 +02:00
Sara Zan	584e046642	`AnswerToSpeech` (#2584 ) * Add new audio answer primitives * Add AnswerToSpeech * Add dependency group * Update Documentation & Code Style * Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives * Add tests * Update Documentation & Code Style * Add ability to compress audio and more tests * Add audio group to test, all and all-gpu * fix pylint * Update Documentation & Code Style * Accidental git tag * Try pleasing mypy * Update Documentation & Code Style * fix pylint * Add warning for missing OS library and support in CI * Try fixing mypy * Update Documentation & Code Style * Add docs, simplify args for audio nodes and add tutorials * Fix mypy * Fix run_batch * Feedback on tutorials * fix mypy and pylint * Fix mypy again * Fix mypy yet again * Fix the ci * Fix dicts merge and install ffmpeg on CI * Make the audio nodes import safe * Trying to increase tolerance in audio test * Fix import paths * fix linter * Update Documentation & Code Style * Add audio libs in unit tests * Update _text_to_speech.py * Update answer_to_speech.py * Use dedicated dataset & update telemetry * Remove and use distilled roberta * Revert special primitives so that the nodes run in indexing * Improve tutorials and fix smaller bugs * Update Documentation & Code Style * Fix serialization issue * Update Documentation & Code Style * Improve tutorial * Update Documentation & Code Style * Update _text_to_speech.py * Minor lg updates * Minor lg updates to tutorial * Making indexing work in tutorials * Update Documentation & Code Style * Improve docstrings * Try to use GPU when available * Update Documentation & Code Style * Fixi mypy and pylint * Try to pass the device correctly * Update Documentation & Code Style * Use type of device * use .cpu() * Improve .ipynb * update apt index to be able to download libsndfile1 * Fix SpeechDocument.from_dict() * Change pip URL Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-06-15 10:13:18 +02:00
Sara Zan	54518ac790	[CI Refactoring] Refactor `Document` fixtures in tests (#2577 ) * Refactor document fixtures * Add embedding files * Update Documentation & Code Style * Indentation issue * Update Documentation & Code Style * Fix type conversion in conftest.py * Update Documentation & Code Style * mypy on sql.py * mypy on crawler.py * mypy on pinecone.py * Adapt retriever tests * Update Documentation & Code Style * mypy on crawler.py * Update Documentation & Code Style * mypy on crawler.py again * Update Documentation & Code Style * mypy fix was too rough * Fix some more tests * Update Documentation & Code Style * Skip meaningless test on FilterRetriever * Make embedding values less specific * Update Documentation & Code Style * Use stable IDs in retriever tests that depend on it * Remove needless fixtures * docs_with_ids * Update Documentation & Code Style * Typo * Fix retriever tests * Fix reader tests * Update Documentation & Code Style * Workaround #2626 * Update Documentation & Code Style * Fix label generator tests * Reorder vectors * remove print * Update Documentation & Code Style * Update Documentation & Code Style * git tags leftover * Update Documentation & Code Style * fix last failing test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-10 18:22:48 +02:00
Sara Zan	e5423b1515	Fix markers in GPL tests (#2652 )	2022-06-10 06:42:19 -04:00
Sara Zan	33a51fa915	[CI Refactoring] Move unrelated tests out of `test_pipeline.py` (#2573 ) * move unrelated tests out of test_pipeline.py * Update Documentation & Code Style * fix fixture name * Typo * Make sure all docs are Documents in routedocuments tests * Fix tests * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-10 11:45:13 +02:00
Vladimir Blagojevic	b13c32eb9c	Add GPL API docs, unit tests update (#2634 ) * Update test_label_generator.py * GPL increase default batch size to 16 * GPL - API docs * GPL - split unit tests * Make devs aware of multilingual GPL * Create separate train/save test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-10 05:25:28 -04:00
Stefano Fiorucci	c178f60e3a	Make crawler extract also hidden text (#2642 ) * make crawler extract also hidden text * Update Documentation & Code Style * try to adapt test for extract_hidden_text * Update Documentation & Code Style * fix test bug * fix bug in test * added test for hidden text" * Update Documentation & Code Style * fix bug in test * Update Documentation & Code Style * fix test * Update Documentation & Code Style * fix other test bug Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-10 09:51:41 +02:00
Massimiliano Pippi	374155fd5c	Move Opensearch document store in its own module (#2603 ) * move OpenSearchDocumentStore into its own Python module * Update Documentation & Code Style * mark test with (sigh) elasticsearch * skip opensearch tests on windows Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-08 16:37:23 +02:00
tstadel	df6ebeb087	Do not show success message on failed evalset upload (#2639 ) * Do not show success message on failed evalset upload * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-08 08:31:25 +02:00
Sara Zan	c17969e001	Fix failing `Crawler` test (#2640 ) * Make tests insensntive to ordering of crawled pages * fix docstring	2022-06-07 18:14:43 +02:00
Sara Zan	59608ca474	[CI Refactoring] Workflow refactoring (#2576 ) * Unify CI tests (from #2466) * Update Documentation & Code Style * Change folder names * Fix markers list * Remove marker 'slow', replaced with 'integration' * Soften children check * Start ES first so it has time to boot while Python is setup * Run the full workflow * Try to make pip upgrade on Windows * Set KG tests as integration * Update Documentation & Code Style * typo * faster pylint * Make Pylint use the cache * filter diff files for pylint * debug pylint statement * revert pylint changes * Remove path from asserted log (fails on Windows) * Skip preprocessor test on Windows * Tackling Windows specific failures * Fix pytest command for windows suites * Remove \ from command * Move poppler test into integration * Skip opensearch test on windows * Add tolerance in reader sas score for Windows * Another pytorch approx * Raise time limit for unit tests :( * Skip poppler test on Windows CI * Specify to pull with FF only in docs check * temporarily run the docs check immediately * Allow merge commit for now * Try without fetch depth * Accelerating test * Accelerating test * Add repository and ref alongside fetch-depth * Separate out code&docs check from tests * Use setup-python cache * Delete custom action * Remove the pull step in the docs check, will find a way to run on bot commits * Add requirements.txt in .github for caching * Actually install dependencies * Change deps group for pylint * Unclear why the requirements.txt is still required :/ * Fix the code check python setup * Install all deps for pylint * Make the autoformat check depend on tests and doc updates workflows * Try installing dependencies in another order * Try again to install the deps * quoting the paths * Ad back the requirements * Try again to install rest_api and ui * Change deps group * Duplicate haystack install line * See if the cache is the problem * Disable also in mypy, who knows * split the install step * Split install step everywhere * Revert "Separate out code&docs check from tests" This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd. * Add back the action * Proactive support for audio (see text2speech branch) * Fix label generator tests * Remove install of libsndfile1 on win temporarily * exclude audio tests on win * install ffmpeg for integration tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-07 09:23:03 +02:00
Sara Zan	83648b9bc0	[CI refactoring] Rewrite `Crawler` tests (#2557 ) * Rewrite crawler tests (very slow) and fix small crawler bug * Update Documentation & Code Style * compile the regex only once * Factor out the html files & add content check to most tests * Clarify that even starting URLs can be excluded * Update Documentation & Code Style * Change signature * Fix failing test * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-06 17:52:37 +02:00
Ryan Russell	c1b7948e10	Improve Docs Readability (#2617 ) Signed-off-by: Ryan Russell <git@ryanrussell.org>	2022-06-03 09:57:40 +02:00
Vladimir Blagojevic	e10a3fba74	Add Generative Pseudo Labeling (#2388 )	2022-06-02 10:12:47 -04:00
bogdankostic	61d9429c25	Simplify loading of `EmbeddingRetriever` (#2619 ) * Infer model format for EmbeddingRetriever automatically * Update Documentation & Code Style * Adapt conftest to automatic inference of model_format * Update Documentation & Code Style * Fix tests * Update Documentation & Code Style * Fix tests * Adapt tutorials * Update Documentation & Code Style * Add test for similarity scores with sentence transformers * Adapt doc string and warning message * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-02 15:05:29 +02:00
bogdankostic	a617ab950b	Fix number of returned values in `get_metadata_values_by_key` (#2614 ) * Apply pagination in get_metdata_values_by_key * Update Documentation & Code Style * Adapt test * Fix test_eval.py by using pytest.approx Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-01 10:21:28 +02:00
tstadel	6b78990a38	Fix Pipeline.get_config() for forked pipelines (#2616 ) * Fix Pipeline.get_config() for forked pipelines * exclude root nodes * minor quickfix	2022-05-31 21:26:53 +02:00
tstadel	0efad96e08	DC SDK: Add possibility to upload evaluation sets to DC (#2610 ) * Add possibility to upload evaluation sets to DC * fix test_eval sas comparisons * quickwin docstring feedback changes * Add hint about annotation tool and mark optional and required columns * minor changes to docstrings	2022-05-31 17:08:19 +02:00
tstadel	fc25adf959	Create eval runs on deepset Cloud (#2534 ) * add EvaluationRunClient * Update Documentation & Code Style * temporarily resolve names to ids * Update Documentation & Code Style * add delete and update methods * minor fixes * add experiments facade * dummy implement start_run() * start eval runs added * Update Documentation & Code Style * fix merge * switch to names on api level * add create eval_run test * Update Documentation & Code Style * further tests added * update docstrings * add docstrings * add missing tags param, fix docstrings * refactor _get_evaluation_sets * fix mypy Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-30 18:18:19 +02:00
bogdankostic	0395533a78	Add `run_batch` for standard pipelines (#2595 ) * Add run_batch for standard pipelines * Update Documentation & Code Style * Fix mypy * Remove code duplication * Fix linter Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-27 10:42:48 +02:00
tstadel	7caca41c5d	Support context matching in `pipeline.eval()` (#2482 ) * calculate context pred metrics * Update Documentation & Code Style * extend doc_relevance_col values * fix import order * Update Documentation & Code Style * fix mypy * fix typings literal import * add option for custom document_id_field * Update Documentation & Code Style * fix tests and dataframe col-order * Update Documentation & Code Style * rename content to context in eval dataframe * add backward compatibility to EvaluationResult.load() * Update Documentation & Code Style * add docstrings * Update Documentation & Code Style * support sas * Update Documentation & Code Style * add answer_scope param * Update Documentation & Code Style * rework doc_relevance_col and keep document_id col in case of custom_document_id_field * Update Documentation & Code Style * improve docstrings * Update Documentation & Code Style * rename document_relevance_criterion into document_scope * Update Documentation & Code Style * add document_scope and answer_scope to print_eval_report * support all new features in execute_eval_run() * fix imports * fix mypy * Update Documentation & Code Style * rename pred_label_sas_grid into pred_label_matrix * update dataframe schema and sorting * Update Documentation & Code Style * pass through context_matching params and extend document_scope test * Update Documentation & Code Style * add answer_scope tests * fix context_matching_threshold for document metrics * shorten dataframe apply calls * Update Documentation & Code Style * fix queries getting lost if nothing was retrieved * Update Documentation & Code Style * Update Documentation & Code Style * use document_id scopes * Update Documentation & Code Style * fix answer_scope literal * Update Documentation & Code Style * update the docs (lg changes) * Update Documentation & Code Style * update tutorial 5 * Update Documentation & Code Style * fix tests * Add minor lg updates * final docstring changes * fix single quotes in docstrings * Update Documentation & Code Style * dataframe scopes added for each column * better docstrings for context_matching params * Update Documentation & Code Style * fix summarizer eval test * Update Documentation & Code Style * fix test * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2022-05-24 18:11:52 +02:00
bogdankostic	867695ad0c	Change signature of queries param in batch methods (#2575 ) * Change signature of queries param in batch methods * Update Documentation & Code Style * Fix mypy * Remove unused import * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-24 12:33:45 +02:00
Julian Risch	075ed7fbcb	Remove encoding option from PDFToTextOCRConverter (#2553 ) * remove encoding option from PDFToTextOCRConverter * Update Documentation & Code Style * add unused 'encoding' param to PDFToTextOCRConverter * Update Documentation & Code Style * call run instead of convert to use ligature replacing * Update Documentation & Code Style * add text to check installed poppler version * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-24 11:31:32 +02:00
Sara Zan	7ab0239e31	Do not copy `_component_config` in `get_components_definitions` (#2574 ) * Do not deepcopy in get_components_definitions * Update Documentation & Code Style * comment * unused import * Add test to ensure env vars don't overwrite _component_config * Update Documentation & Code Style * Add test for get_config * Add test to show the rename is not sufficient * Update Documentation & Code Style * copy only if it's strictly necessary * Update Documentation & Code Style * Apply suggestions from code review Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> * review feedback Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-05-24 09:53:59 +02:00
tstadel	3ab4dac58d	Upload files to deepset Cloud (#2570 ) * added upload_files * Update Documentation & Code Style * expose file client via DeepsetCloud facade * Update Documentation & Code Style * tests added * Update Documentation & Code Style * always read file in binary mode and guess mimetype * add delete and list functions * fix method literals Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-23 17:05:56 +02:00
tstadel	0e83535108	Show search endpoint after deepset Cloud deployment (#2569 ) * show try-out-message after deployment * better messages * Update Documentation & Code Style * tests added * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-23 14:19:31 +02:00
Massimiliano Pippi	a9a4156731	[Weaviate] Exit the while loop when we query less documents than available (#2537 ) * exit the while loop when we query less documents than available in Weaviate * use monkeypatch fixture, remove unused markers * we know key is there, use brackets to get the value * use custom exception * add warning message when we hit the QUERY_MAXIMUM_RESULTS problem * restore pytest marker * removed unused import * make the warning message more clear	2022-05-20 09:07:03 +02:00
Sara Zan	fd2ca359fe	Validation for Ray pipelines (#2545 ) * Ray pipelines now validate * Update Documentation & Code Style * rename Ray pipeline in tests * Add extras:ray to the test pipeline * pylint Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-19 19:40:03 +02:00
tstadel	f6e3a63906	Prevent losing names of utilized components when loaded from config (#2525 ) * Prevent losing names of utilized components when loaded from config * Update Documentation & Code Style * update test * fix failing tests * Update Documentation & Code Style * fix even more tests * Update Documentation & Code Style * incorporate review feedback Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-18 14:17:54 +02:00
tstadel	110b9c2b0a	Warnings for write operations of `DeepsetCloudDocumentStore` (#2565 ) * log inputs to write operations * Update Documentation & Code Style * adjust tests * simplify by using decorator for write operation functions * Update Documentation & Code Style * fix comma * fix comma in test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-17 17:53:55 +02:00
Sara Zan	ff4303c51b	[CI refactoring] Categorize tests into folders (#2554 ) * Categorize tests into folders * Fix linux_ci.yml and an import * Wrong path	2022-05-17 09:55:53 +01:00
tstadel	771ed0bb1d	Remove wrong retriever top_1 metrics from `print_eval_report` (#2510 ) * remove wrong retriever top_1 metrics * Update Documentation & Code Style * don't show wrong examples frame when n_wrong_examples is 0 * Update Documentation & Code Style * Update Documentation & Code Style * only use farm reader during eval tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-12 12:34:11 +02:00
bogdankostic	738e008020	Add `run_batch` method to all nodes and `Pipeline` to allow batch querying (#2481 ) * Add run_batch methods for batch querying * Update Documentation & Code Style * Fix mypy * Update Documentation & Code Style * Fix mypy * Fix linter * Fix tests * Update Documentation & Code Style * Fix tests * Update Documentation & Code Style * Fix mypy * Fix rest api test * Update Documentation & Code Style * Add Doc strings * Update Documentation & Code Style * Add batch_size as attribute to nodes supporting batching * Adapt error messages * Adapt type of filters in retrievers * Revert change about truncation_warning in summarizer * Unify multiple_doc_lists tests * Use smaller models in extractor tests * Add return types to JoinAnswers and RouteDocuments * Adapt return statements in reader's run_batch method * Allow list of filters * Adapt error messages * Update Documentation & Code Style * Fix tests * Fix mypy * Adapt print_questions * Remove disabling warning about too many public methods * Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py * Add type check * Update Documentation & Code Style * Adapt tutorial 11 * Update Documentation & Code Style * Add query_batch method for DCDocStore * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-11 11:11:00 +02:00
bogdankostic	4581b91e83	Make `DeepsetCloudDocumentStore` work with non-existing index (#2513 ) * Make DeepsetCloudDocumentStore work with non-existing index * Update Documentation & Code Style * Add tests * Update Documentation & Code Style * Fix tests, adapt warning messages + lowercase deepset * Update Documentation & Code Style * Fix typo in test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-10 15:21:35 +02:00
bogdankostic	bce84577c6	Upgrade transformers version to 4.18.0 (#2514 ) * Upgrade transformers version to 4.18.0 * Adapt tokenization test to upgrade * Adapt tokenization test to upgrade	2022-05-06 16:57:13 +02:00
Sara Zan	f3e0ba4be9	Fix `OpenSearchDocumentStore`'s `__init__` (#2498 ) * Move super in OpenSearchDocumentStore and add small test * Update Documentation & Code Style * Add Opensearch container to the CI Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-05 10:38:09 +02:00
Sara Zan	f8e02310bf	Validate YAML files without loading the nodes (#2438 ) * Remove BasePipeline and make a module for RayPipeline * Can load pipelines from yaml, plenty of issues left * Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it * Fix pipeline tests * Move some tests out of test_pipeline.py and create MockDenseRetriever * myoy and pylint (silencing too-many-public-methods) * Fix issue found in some yaml files and in schema files * Fix paths to YAML and fix some typos in Ray * Fix eval tests * Simplify MockDenseRetriever * Fix Ray test * Accidentally pushed merge coinflict, fixed * Typo in schemas * Typo in _json_schema.py * Slightly reduce noisyness of version validation warnings * Fix version logs tests * Fix version logs tests again * remove seemingly unused file * Add check and test to avoid adding the same node to the pipeline twice * Update Documentation & Code Style * Revert config to pipeline_config * Remo0ve unused import * Complete reverting to pipeline_config * Some more stray config= * Update Documentation & Code Style * Feedback * Move back other_nodes tests into pipeline tests temporarily * Update Documentation & Code Style * Fixing tests * Update Documentation & Code Style * Fixing ray and standard pipeline tests * Rename colliding load() methods in dense retrievers and faiss * Update Documentation & Code Style * Fix mypy on ray.py as well * Add check for no root node * Fix tests to use load_from_directory and load_index * Try to workaround the disabled add_node of RayPipeline * Update Documentation & Code Style * Fix Ray test * Fix FAISS tests * Relax class check in _add_node_to_pipeline_graph * Update Documentation & Code Style * Try to fix mypy in ray.py * unused import * Try another fix for Ray * Fix connector tests * Update Documentation & Code Style * Fix ray * Update Documentation & Code Style * use BaseComponent.load() in pipelines/base.py * another round of feedback * stray BaseComponent.load() * Update Documentation & Code Style * Fix FAISS tests too Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-05-04 17:39:06 +02:00
Sara Zan	01ea4bf21f	Change default encoding for `PDFToTextConverter` from `Latin 1` to `UTF-8` (#2420 ) * Change default encoding for PDFToTextConverter * Update Documentation & Code Style * Improve docstring * Update Documentation & Code Style * Add list of ligatures to ignore and add the possibility to modify such list at need * Add docstring * Add tests * Rename parameter * Update Documentation & Code Style * Move implementation into the base converter to make mypy happier * Update Documentation & Code Style * mypy and pylint * mypy * move encoding parameter to init of PDFToTextConverter * Update Documentation & Code Style * make utf8 default and fix mypy * Update Documentation & Code Style * Update Documentation & Code Style * remove note on encoding in tutorial8 * Update Documentation & Code Style * skip OCRConverter and test converter.run * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2022-05-04 17:01:45 +02:00
tstadel	509944f47d	Add support for positional args in pipeline.get_config() (#2478 ) * add support for positional args in pipeline.get_config() * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-02 14:41:07 +02:00
tstadel	7d6b3fe954	Add flag to disable scaling scores to probabilities (#2454 ) * add scale_scores_to_probabilities flag * Update Documentation & Code Style * fix tests * fix sql mypy * Update Documentation & Code Style * fix responses * Update Documentation & Code Style * rename to scale_score_to_probability + docstrings * use BaseDocumentStore.score_to_probability in elasticsearch and milvus2 * Update Documentation & Code Style * fix tests * Update Documentation & Code Style * add tests * improve naming * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-02 13:35:07 +02:00
Tuana Celik	e2b85e2913	Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever (#2461 ) * Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever * adding missed init file * Update Documentation & Code Style * fixed docstring * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-29 10:16:02 +02:00
Jonathan Gallon	25b87e8cf0	Add support for aliases in elasticsearch document store (#2448 ) * Add support for aliases in elasticsearch document store * Add alias support for OpenSearch * Missing variable index * Update Documentation & Code Style * Add unit test for elasticsearch alias support * Fix unit test when index is not compatible with haystack * Fix auto format conflict * Add comment explaining for loop for alias * Update Documentation & Code Style Co-authored-by: Jonathan Gallon <jonathan.gallon@totalenergies.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2022-04-28 10:10:37 +02:00
tstadel	7498c7c6fb	Fix and use delete_index instead of delete_documents in tests (#2453 ) * use delete_index instead of delete_documents in tests * fix delete_index * fix delete_index() in memory and milvus * fix imports * fix memory keyerrors * Update Documentation & Code Style * increase timeout for pinecone tests to 60 minutes * clean get_document_store() * use recreate_index in tests * Update Documentation & Code Style * fix tests * fix remaining tests * log index deleted * fix test_eval_pipeline * simplify existing index detection in weaviate * delete label_index on recreate_index for pinecone and milvus * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-26 19:06:30 +02:00
Tuana Celik	d49e92e21c	ElasticsearchRetriever to BM25Retriever (#2423 ) * change class names to bm25 * Update Documentation & Code Style * Update Documentation & Code Style * Update Documentation & Code Style * Add back all_terms_must_match * fix syntax * Update Documentation & Code Style * Update Documentation & Code Style * Creating a wrapper for old ES retriever with deprecated wrapper * Update Documentation & Code Style * New method for deprecating old ESRetriever * New attempt for deprecating the ESRetriever * Reverting to the simplest solution - warning logged * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-04-26 16:09:39 +02:00
Adrien Wald	c401e86099	Use `ElasticsearchDocumentStore.get_all_documents` in `ElasticsearchFilterOnlyRetriever.retrieve` (#2151 ) * use get_all_documents in ElasticsearchFilterOnlyRetriever.retrieve * Update Documentation & Code Style * add test case for es_filter_only retriever * Update Documentation & Code Style * fix test by adding empty string for query * Update Documentation & Code Style * add explicit name of argument "query" Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2022-04-25 09:53:48 +02:00
tstadel	25475a68c7	Match answer sorting in `QuestionAnsweringHead` with `FARMReader` (#2414 ) * match no_answer confidence * Update Documentation & Code Style * test added * Update Documentation & Code Style * fix tests * Update Documentation & Code Style * apply penalties of scores to confidences too Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-21 11:24:39 +02:00
Sara Zan	07d7ecbff1	Make `python-magic` fully optional (#2412 ) * Add windows specific package for python-magic * Disable some tests on Windows and add explanatory warning in case of issues with libmagic Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-20 09:18:02 +02:00
tstadel	e862400256	Prevent Stackoverflow on Windows CI (#2426 ) * prevent stackoverflow on windows ci * Update Documentation & Code Style * fix is_windows condition * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2022-04-19 16:10:39 +02:00
Sara Zan	4eec2dc45e	Change YAML version exception into a warning (#2385 ) * Change exception into warning, add strict_version param, and remove compatibility between schemas * Simplify update_json_schema * Rename unstable into master * Prevent validate_config from changing the config to validate * Fix version validation and add tests * Rename master into ignore * Complete parameter rename Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-19 16:08:08 +02:00
Sara Zan	929c685cda	Forbid usage of `args` and `kwargs` in any node's `__init__` (#2362 ) Add failing test * Remove `*kwargs` from docstores' `__init__` functions (#2407) Remove kwargs from ESDocStore subclasses * Remove kwargs from subclasses of SQLDocumentStore * Remove kwargs from Weaviate * Revert change in pinecone * Fix tests * Fix retriever test wirh weaviate * Change Exception into DocumentStoreError * Update Documentation & Code Style * Remove `*kwargs` from `FARMReader` (#2413) Remove FARMReader kwargs without trying to replace them functionally * Update Documentation & Code Style * enforce same index values before and after saving/loading eval dataframes (#2398) * Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350) * Add tests for missing init and super * Update Documentation & Code Style * change in with endswith * Move test in pipeline.py and change test in pipeline_yaml.py * Update Documentation & Code Style * Use caplog to test the warning * Update Documentation & Code Style * move tests into test_pipeline and use get_config * Update Documentation & Code Style * Unmock version name * Improve variadic args test * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-14 16:42:02 +02:00

1524 Commits