haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-12 10:37:58 +00:00

Author	SHA1	Message	Date
Unai Garay Maestre	3a2c8ae3c5	bug: Adds better way of checking `query` in BaseRetriever and Pipeline.run() (#3304 ) * changes how query and queries are checked if they have been passed in BaseRetriever * Fixes checking query properly in Pipeline run * Fixes checking query properly in Pipeline run * Adds test for FilterRetriever using run method when query is empty * Adds mock filter retriever and adapts test * Removes old test, adds MockRetriever to test file and test uses document_store * Logs error when query is not of type string with a new test for run batch * Update test/nodes/test_retriever.py * schemas	2022-10-17 19:00:13 +02:00
Sara Zan	101d2bc86c	feat: `MultiModalRetriever` (#2891 ) * Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly * content_types * Splitting classes into respective folders * small changes * Fix EOF * eof * black * API * EOF * whitespace * api * improve multimodal similarity processor * tokenizer -> feature extractor * Making feature vectors come out of the feature extractor in the similarity head * embed_queries is now self-sufficient * couple trivial errors * Implemented separate language model classes for multimodal inference * Document embedding seems to work * removing batch_encode_plus, is deprecated anyway * Realized the base Data2Vec models are not trained on retrieval tasks * Issue with the generated embeddings * Add batching * Try to fit CLIP in * Stub of CLIP integration * Retrieval goes through but returns noise only * Still working on the scores * Introduce temporary adapter for CLIP models * Image retrieval now works with sentence-transformers * Tidying up the code * Refactoring is now functional * Add MPNet to the supported sentence transformers models * Remove unused classes * pylint * docs * docs * Remove the method renaming * mpyp first pass * docs * tutorial * schema * mypy * Move devices setup into get_model * more mypy * mypy * pylint * Move a few params in HaystackModel's init * make feature extractor work with squadprocessor * fix feature_extractor_kwargs forwarding * Forgotten part of the fix * Revert unrelated ES change * Revert unrelated memdocstore changes * comment * Small corrections * mypy and pylint * mypy * typo * mypy * Refactor the call * mypy * Do not make FARMReader use the new FeatureExtractor * mypy * Detach DPR tests from FeatureExtractor too * Detach processor tests too * Add end2end marker * extract end2end feature extractor tests * temporary disable feature extraction tests * Introduce end2end tests for tokenizer tests * pylint * Fix model loading from folder in FeatureExtractor * 
working o n end2end * end2end keeps failing * Restructuring retriever tests * Restructuring retriever tests * remove covert_dataset_to_dataloader * remove comment * Better check sentence-transformers models * Use embed_meta_fields properly * rename passage into document * Embedding dims can't be found * Add check for models that support it * pylint * Split all retriever tests into suites, running mostly on InMemory only * fix mypy * fix tfidf test * fix weaviate tests * Parallelize on every docstore * Fix schema and specify modality in base retriever suite * tests * Add first image tests * remove comment * Revert to simpler tests * Update docs/_src/api/api/primitives.md Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/__init__.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * get_args * mypy * Update haystack/modeling/model/multimodal/__init__.py * Update haystack/modeling/model/multimodal/base.py * Update haystack/modeling/model/multimodal/base.py Co-authored-by: Agnieszka Marzec 
<97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/sentence_transformers.py * Update haystack/modeling/model/multimodal/sentence_transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/retriever/multimodal/retriever.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * mypy * removing more ContentTypes * more contentypes * pylint * add to __init__ * revert end2end workflow for now * missing integration markers * Update haystack/nodes/retriever/multimodal/embedder.py Co-authored-by: bogdankostic <bogdankostic@web.de> * review feedback, removing HaystackImageTransformerModel * review feedback part 2 * mypy & pylint * mypy * mypy * fix multimodal docs also for Pinecone * add note on internal constants * Fix pinecone write_documents * schemas * keep support for sentence-transformers only * fix pinecone test * schemas * fix pinecone again * temporarily disable some tests, need to understand if they're still relevant Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-10-17 18:58:35 +02:00
Vladimir Blagojevic	159cd5a666	feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever (#3356 )	2022-10-14 15:01:03 +02:00
Vladimir Blagojevic	5ebe3cb33d	fix: QuestionGenerator generates wrong document questions for non-default `num_queries_per_doc` parameter (#3381 )	2022-10-14 12:08:30 +02:00
Stefano Fiorucci	7290196c32	fix: allow same `vector_id` in different indexes for SQL-based Document stores (#3383 ) * fix_multiple_indexes * improve test names	2022-10-14 09:55:56 +02:00
Massimiliano Pippi	31fa75e9fd	feat: add support for Elasticsearch 7.16.2 (#3318 ) * bump elastic to 7.16.2+ * decouple Elasticsearch and Opensearch use method override instead of func variables fix mypy default value fix broken tests update schema * relax version pin * rename the base class * rename module * fix import order * do not run the new tests in the old job * remove outdated TODO	2022-10-13 11:53:27 +02:00
Sebastian	75641dd024	fix: Added checks for DataParallel and WrappedDataParallel (#3366 ) * Added checks for DataParallel and WrappedDataParallel * Update isinstance checks according to pylint recommendation * Using isinstance over types * Added test for dpr training	2022-10-13 08:05:56 +02:00
Malte Pietsch	fb02b61e90	Update README.md (#3247 )	2022-10-11 10:43:17 +02:00
tstadel	7fe5003c97	fix: eval() with `add_isolated_node_eval=True` breaks if no node supports it (#3347 ) * fix isolated eval for pipelines without a node supporting isolated mode * reformat * add test	2022-10-10 20:48:13 +02:00
bogdankostic	84aff5e2b3	fix: Allow less restrictive values for parameters in Pipeline configurations (#3345 ) * fix: Allow arbitrary values for parameters in Pipeline configurations * Add test * Adapt expected error message in tests * Fix bug * Fix bug on checking JSON * Remove test cases that previously tested if error was thrown * Change encoding in test * Restrict possible values * Re-add tests * Re-add tests * Add value flag to list elements	2022-10-10 13:08:45 +02:00
JacdDev	797c20c966	feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch (#3301 ) * Adding filters param to MostSimilarDocumentsPipeline run and run_batch * Adding index param to MostSimilarDocumentsPipeline run and run_batch * Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch * Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch * Adding filters param to MostSimilarDocumentsPipeline run and run_batch * Adding index param to MostSimilarDocumentsPipeline run and run_batch * Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch * Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch	2022-10-10 10:22:14 +02:00
tstadel	b84a6b1716	fix: opensearch script score with filters (#3321 ) * fix opensearch script score filters * add comment * add integration test * update schema	2022-10-06 15:41:29 +02:00
Vladimir Blagojevic	6cb4e93965	refactor: remove Inferencer multiprocessing (#3283 )	2022-10-04 14:08:23 +02:00
Jeff Risberg	ad8fbe56ee	bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node (#3170 ) * don't send the list of inputs back as an output in the running of a node. * updated documentation * Update pydoc-markdown.py * added test case for pipeline join fix Co-authored-by: JeffRisberg <jrisberg@aol.com>	2022-09-30 13:27:17 +02:00
Stefano Fiorucci	e2e6887ee8	Improve TransformersDocumentClassifier tests (#3270 )	2022-09-27 13:25:34 +02:00
Vladimir Blagojevic	9582a423a2	fix: ONNX FARMReader model conversion is broken (#3211 )	2022-09-26 09:18:12 -04:00
Vladimir Blagojevic	9ca3ccae98	fix:MostSimilarDocumentsPipeline doesn't have pipeline property (#3265 ) * Add comments and a unit test * More unit tests for MostSimilarDocumentsPipeline	2022-09-23 09:46:48 -04:00
tstadel	05a86b9d3d	feat: FAISS in OpenSearch: Support HNSW for cosine (#3217 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * split PR * update docs * manually revert tutorial doc change * Fix embedding type * set integration marker correctly * make BaseDocumentStore.normalize_embedding static * format * fix handling of opensearch_faiss param * fix merge * add DenseRetriever typing * organize imports in conftest.py * organize imports in conftest.py (2) * fix DenseRetriever import * add opensearch-tests-linux	2022-09-23 13:26:49 +02:00
tstadel	4fa9d2d8e7	Fix milvus and faiss tests not running (#3263 ) * fix milvus and faiss tests not running * fix schema manually * fix test_dpr_embedding test for milvus * pip freeze on milvus tests * fix milvus1 tests being executed: fix all_doc_stores order * Revert "pip freeze on milvus tests" This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab. * make infer_required_doc_store more robust * don't skip tests without docstore requirements * use markers for docstore tests	2022-09-22 17:46:49 +02:00
tstadel	b10e2c392e	chore: add `DenseRetriever` abstraction (#3252 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * update docs * fix imports * import DenseRetriever normally * update docs * fix deepcopy of documents * update schema * Revert "update schema" This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f. * fix schema for ci manually	2022-09-21 19:08:54 +02:00
Vladimir Blagojevic	938e6fda5b	Classify pipeline's type based on its components (#3132 ) * Add pipeline get_type mehod * Add pipeline uptime * Add pipeline telemetry event sending * Send pipeline telemetry once a day (at most) * Add pipeline invocation counter, change invocation counter logic * Update allowed telemetry parameters - allow pipeline parameters * PR review: add unit test	2022-09-21 14:53:42 +02:00
Stefano Fiorucci	89247b804c	refactor: make `TransformersDocumentClassifier` output consistent between different types of classification (#3224 ) * make output consistent * make output consistent * added tests for details * better tests * Update test_document_classifier.py * make black happy * Update test_document_classifier.py * Update test_document_classifier.py	2022-09-21 13:16:03 +02:00
Malte Pietsch	7e79a48540	bug: reactivate benchmarks with quick fixes (#2766 ) * quick fix benchmark runs to make them work with current haystack version * fix minor typo * update readme. fix minor things to make benchmarks run again * Update Documentation & Code Style * fix typo in readme * update result files for reader and retriever querying * reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs) * change default memory allocation back to normal. add note to readme * add first indexing results * add memory to docker cmd * full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-09-20 10:22:08 +02:00
Sara Zan	dcb132ba59	chore: remove f-strings from logs for performance reasons (#3212 ) * Use the %s syntax on all debug messages * Use the %s syntax on some more debug messages * Use the %s syntax on info messages * Use the %s syntax on warning messages * Use the %s syntax on error and exception messages * mypy * pylint * trogger tutorials execution in CI * trigger tutorials execution on CI * black * remove embeddings from repr * fix Document `__repr__` * address feedback * mypy	2022-09-19 18:18:32 +02:00
Massimiliano Pippi	8fbccbda82	fix: handle Documents containing dataframes in Multilabel constructor (#3237 ) * format * fix docs	2022-09-19 14:59:20 +02:00
Daniel Bichuetti	df1f4205b6	feat: add public layout-base extraction support on PDFToTextConverter (#3137 ) * feat(PDFToTextConverter): add option to get text in physical layout order * test: add physical layout extraction test to PDFToTextConverter * refactor: change layout parameter attribution places * docs: manually trigger pre-commits * docs: generate new docs to comply with pydoc-markdown style	2022-09-13 16:55:21 +02:00
Kristof Herrmann	da1cc577ae	feat: exponential backoff with exp decreasing batch size for opensearch client (#3194 ) * Validate custom_mapping properly as an object * Remove related test * black * feat: exponential backoff with exp dec batch size * added docstring and split doc lsit * fix * fix mypy * fix * catch generic exception * added test * mypy ignore * fixed no attribute * added test * added tests * revert strange merge conflicts * revert merge conflict again * Update haystack/document_stores/elasticsearch.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * done * adjust test * remove not required caplog * fixed comments Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-09-13 14:30:30 +01:00
Sara Zan	96bb9b5905	bug: validate `custom_mapping` as an object (#3189 ) * Validate custom_mapping properly as an object * Remove related test * black	2022-09-09 18:03:29 +02:00
Daniel Bichuetti	621e1af74c	refactor: improve support for dataclasses (#3142 ) * refactor: improve support for dataclasses * refactor: refactor class init * refactor: remove unused import * refactor: testing 3.7 diffs * refactor: checking meta where is Optional * refactor: reverting some changes on 3.7 * refactor: remove unused imports * build: manual pre-commit run * doc: run doc pre-commit manually * refactor: post initialization hack for 3.7-3.10 compat. TODO: investigate another method to improve 3.7 compatibility. * doc: force pre-commit * refactor: refactored for both Python 3.7 and 3.9 * docs: manually run pre-commit hooks * docs: run api docs manually * docs: fix wrong comment * refactor: change no type-checked test code * docs: update primitives * docs: api documentation * docs: api documentation * refactor: minor test refactoring * refactor: remova unused enumeration on test * refactor: remove unneeded dir in gitignore * refactor: exclude all private fields and change meta def * refactor: add pydantic comment * refactor : fix for mypy on Python 3.7 * refactor: revert custom init * docs: update docs to new pydoc-markdown style * Update test/nodes/test_generator.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-09-09 11:31:37 +02:00
Daniel Bichuetti	e1f399284f	refactor: update dependencies and remove pins (#3147 ) * refactor: remove azure-core, pydoc and hf-hub pins * fix: remove extra-comma * fix: force minimum version of azure forms recognizer * refactor: allow newer ocr libs * refactor: update more dependencies and container versions * refactor: remove extra comment * docs: pre-commit manual run * refactor: remove unnecessary dependency * tests: update weaviate container image version	2022-09-05 14:30:35 +02:00
Vladimir Blagojevic	66f3f42a46	fix: Replace multiprocessing tokenization with batched fast tokenization (#3089 ) * Replace multiprocessing tokenization with batched fast tokenization * Replace deprecated tokenization method invocations	2022-08-31 07:33:39 -04:00
Sara Zan	e88f1e2577	Add custom_mapping to the list of fields that can contain string-encoded JSON (#3065 )	2022-08-29 11:10:24 +02:00
Julian Risch	3e3ff33cdd	feat: add batch evaluation method for pipelines (#2942 ) * add basic pipeline.eval_batch for qa without filters * black formatting * pydoc-markdown * remove batch eval tests failing due to bugs * remove comment * explain commented out tests * avoid code duplication * black * mypy * pydoc markdown * add batch option to execute_eval_run * pydoc markdown * Apply documentation suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply documentation suggestion from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * add documentation based on review comments * black * black * schema updates * remove duplicate tests * add separate method for column reordering * merge _build_eval_dataframe methods * pylint ignore in function * change type annotation of queries to list only * one-liner addressing review comment on params dict * markdown files updated Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-25 17:50:57 +02:00
bogdankostic	e2ec0d1c15	feat: FAISS in OpenSearch: check existing index (#3101 ) * Add check for mapping for existing indices * Add test * Check if "method" field exists	2022-08-25 17:33:26 +02:00
Sara Zan	e92ea4fccb	refactor: rename `master` into `main` in documentation and links (#3063 ) * master->main * revert master rename * Revert change to sphinx link and rename master schema	2022-08-24 19:05:12 +02:00
tstadel	92046ce5b5	feat: FAISS in OpenSearch: Support HNSW for dot product and l2 (#3029 ) * support faiss hnsw * blacken * update docs * improve similarity check * add tests * update schema * set ef_search param correctly * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * regenerate docs Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-24 16:43:48 +02:00
James Briggs	9b1b03002f	update to PineconeDocumentStore to remove dependency on SQL db (#2749 ) * update to PineconeDocumentStore to remove dependency on SQL db * Update Documentation & Code Style * typing fixes * Update Documentation & Code Style * fixed embedding generator to yield Documents * Update Documentation & Code Style * fixes for final typing issues * fixes for pylint * Update Documentation & Code Style * uncomment pinecone tests * added new params to docstrings * Update Documentation & Code Style * Update Documentation & Code Style * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update Documentation & Code Style * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * changes based on comments, updated errors and install * Update Documentation & Code Style * mypy * implement simple filtering in pinecone mock * typo * typo in reverse * account for missing meta key in filtering * typo * added metadata filtering to describe index * added handling for users switching indexes in same doc store, and handling duplicate docs in write * syntax tweaks * added index option to document/embedding count calls * labels implementation in progress * added metadata fields to be indexed for pinecone tests * further changes to mock * WIP implementation of labels+multilabels * switched to rely on labels namespace rather than filter * simpler delete_labels * label fixes, remove debug code * Apply dostring fixes Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * pylint * docs * 
temporarily un-mock Pinecone * Small Pinecone test suite * pylint * Add fake test key to pass the None check * Add again fake test key to pass the None check * Add Pinecone to default docstores and fix filters * Fix field name * Change field name * Change field value * Remove comments * forgot to upgrade pyproject.toml Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sarazanzo94@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-24 13:27:15 +02:00
Stefano Fiorucci	891707ecaa	bug: handle `Optional` params in schema validation (#2980 ) * not working draft * first draft * fix * revert json schema * better schema * improvements, support different python versions * little simplification * improvements and more tests * Revert "Merge branch 'handle_optional_params' into origin/main" This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34. * fix git mess * handle optional params; schema * test null values Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-24 10:40:19 +02:00
bogdankostic	b03de53716	Use `random_sample` instead of `ndarray` for random array (#3083 )	2022-08-22 13:19:45 +02:00
Massimiliano Pippi	97a8d30512	feat: Allow exact list matching with field in Elasticsearch filtering (#2988 ) * ES filtering - allow exact list matching with field typing fix Update Documentation & Code Style remove default hit limit in filtering queries Update Documentation & Code Style pytest es list eq filter Update Documentation & Code Style * review feedback * fixed test Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>	2022-08-22 12:42:37 +02:00
Daniel Bichuetti	d5e36ce6b4	fix(translator): write translated text to output documents, while keeping input untouched (#3077 ) * Set translated text on a copy of original document * Return new translated list * Manually generated docs TODO: check pre-commit * Hook generated file * Rename variables for better maintenance * fix(translator): prevent inputs from being changed * fix: manual update translator docs * style(translator): explicit type declaration on List * docs(translator): re-run pre-commit hook * style(translator): ignore mypy wrong type check * docs(translator): re-run pre-commit hook	2022-08-22 04:07:05 -04:00
James Briggs	82c9cff3d9	test: update filtering of Pinecone mock to imitate doc store (#3020 ) * updated filtering of doc store to imitate pinecone * Update test/mocks/pinecone.py	2022-08-18 09:57:08 +02:00
Igor Tarlinskiy	5b06658670	Forbid the key `id` from `Document`s to be written in `WeaviateDocumentStore` (#2846 ) * Raise error upon duplicate document key found within meta info * value error msg fix * Update Documentation & Code Style * Raise exception instead of asserting * Update Documentation & Code Style * add test	2022-08-12 17:50:54 +02:00
Dmitry Goryunov	da7836a931	feat: Support embedding dimensions on DeepsetCloudDocumentStore (#2995 ) * Add embedding_dim to dc store * Remove similarity from query params, it is not used * Remove unused `return_embedding` parameter * Remove unused param * Update the documentation * Update schemas * Revert openapi changes * Revert openapi changes * Fix openapi * Fix json schema * Improve docstrings Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Improve logs Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update the docs * Fix similarity Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-12 11:46:52 +02:00
tstadel	c0fbe45c02	feat: Add `delete_all_files()` to `FileClient` (#3025 ) * add delete_all_files() * rename `file` to `files` * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * streamline "If set to None" and "to the API call" Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-12 11:20:30 +02:00
tstadel	668fd548a6	Fix `embeddings_field_supports_similarity` of `OpenSearchDocumentStore` when creating index (#3030 ) * fix embeddings_field_supports_similarity when creating index * fix test	2022-08-12 11:19:59 +02:00
James Briggs	26c938a8e6	test: add meta fields for meta_config to be used during testing (#3021 ) * added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore * Add documentation on metadata filtering in docstring * docs Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-12 10:27:56 +02:00
Zoltan Fedor	408d8e6ff5	Enable the `JoinDocuments` node to work with documents with `score=None` (#2984 ) * Enable the `JoinDocuments` node to work with documents with `score=None` This fixes #2983 As of now, the `JoinDocuments` node will error out if any of the documents has `score=None` - which is possible, as some retriever are not able to provide a score, like the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate. THe reason for the error is that the `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`. There was a very similar issue for `JoinAnswers` too, which was addressed by this PR: https://github.com/deepset-ai/haystack/pull/2436 This solution applies the same solution to `JoinDocuments` - so both the `JoinAnswers` and `JoinDocuments` now will have the same additional argument to disable sorting when that is requried. The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which allows the user to turn off the sorting of documents by score, but keeps the current functionality of sorting being performed as the default. * Fixing test bug * Addressing PR review comments - Extending unit tests - Simplifying logic * Making the sorting work even with no scores By making the no score being sorted as -Inf * Forgot to commit the change in `join_docs.py` * [EMPTY] Re-trigger CI * Added am INFO log if the `JoinDocuments` is sorting while some of the docs have `score=None` * Adjusting the arguments of `any()` * [EMPTY] Re-trigger CI	2022-08-11 10:43:25 +02:00
Zoltan Fedor	aafa017c17	Refactoring the `Raypipeline.run` method - merging it with the `Pipeline.run` (#2981 ) * Refactoring the `Raypipeline.run` method - merging it with the `Pipeline.run` This is to fix #2968 * Bug: variable `i` was already in use * Removing unused imports * Removing unused import * [EMPTY] Re-trigger CI * Addressing concerns raised pre-review - Removing the attempt to try to make it without the need for `JoinDocuments` - it is okey to fail without `JoinDocuments` for certain pipelines. * Refactoring based on reviews	2022-08-11 09:50:14 +02:00
Zoltan Fedor	f4128d3581	Adding support for additional distance/similarity metrics for Weaviate (#3001 ) * Adding support for additional distance metrics for Weaviate Fixes #3000 * Updating the docs * Fixing error texts * Fixing issues raised by the review * Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity` * [EMPTY] Re-trigger CI * Fixing things based on review * [EMPTY] Re-trigger CI	2022-08-11 09:48:21 +02:00
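Commit #2984 above describes its fix in prose. `JoinDocuments` gains a `sort_by_score: bool` argument. When sorting is on, documents with `score=None` are ordered as if their score were -Inf, and an INFO message is logged. The Python sketch below is minimal and self-contained and covers only that ordering rule. It does not use Haystack. `Doc` and `join_documents` are hypothetical names, and the deduplicate-by-id merge is an assumption standing in for the node's actual join modes.

```python
import logging
import math
from dataclasses import dataclass
from typing import List, Optional

logger = logging.getLogger(__name__)


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not Haystack's Document)."""
    id: str
    score: Optional[float] = None


def join_documents(doc_lists: List[List[Doc]], sort_by_score: bool = True) -> List[Doc]:
    """Merge several retriever result lists (assumed merge: first occurrence of an id wins)."""
    seen = set()
    merged: List[Doc] = []
    for docs in doc_lists:
        for doc in docs:
            if doc.id not in seen:
                seen.add(doc.id)
                merged.append(doc)

    # Opt-out: keep the merged order untouched.
    if not sort_by_score:
        return merged

    if any(doc.score is None for doc in merged):
        logger.info("Sorting by score while some documents have score=None; they are ranked last.")

    # A missing score is treated as -inf, so None is never compared with a float.
    # sorted() is stable, so documents with equal keys keep their merged order.
    return sorted(
        merged,
        key=lambda doc: doc.score if doc.score is not None else -math.inf,
        reverse=True,
    )
```

With `sort_by_score=False`, the merged order is returned as-is, which matches the opt-out the commit message describes. The `-math.inf` key is what lets `sorted` run at all, because comparing `None` with a `float` raises `TypeError` in Python 3.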

1414 Commits