haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-01 18:29:32 +00:00

Author	SHA1	Message	Date
Sara Zan	101d2bc86c	feat: `MultiModalRetriever` (#2891 ) * Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly * content_types * Splitting classes into respective folders * small changes * Fix EOF * eof * black * API * EOF * whitespace * api * improve multimodal similarity processor * tokenizer -> feature extractor * Making feature vectors come out of the feature extractor in the similarity head * embed_queries is now self-sufficient * couple trivial errors * Implemented separate language model classes for multimodal inference * Document embedding seems to work * removing batch_encode_plus, is deprecated anyway * Realized the base Data2Vec models are not trained on retrieval tasks * Issue with the generated embeddings * Add batching * Try to fit CLIP in * Stub of CLIP integration * Retrieval goes through but returns noise only * Still working on the scores * Introduce temporary adapter for CLIP models * Image retrieval now works with sentence-transformers * Tidying up the code * Refactoring is now functional * Add MPNet to the supported sentence transformers models * Remove unused classes * pylint * docs * docs * Remove the method renaming * mpyp first pass * docs * tutorial * schema * mypy * Move devices setup into get_model * more mypy * mypy * pylint * Move a few params in HaystackModel's init * make feature extractor work with squadprocessor * fix feature_extractor_kwargs forwarding * Forgotten part of the fix * Revert unrelated ES change * Revert unrelated memdocstore changes * comment * Small corrections * mypy and pylint * mypy * typo * mypy * Refactor the call * mypy * Do not make FARMReader use the new FeatureExtractor * mypy * Detach DPR tests from FeatureExtractor too * Detach processor tests too * Add end2end marker * extract end2end feature extractor tests * temporary disable feature extraction tests * Introduce end2end tests for tokenizer tests * pylint * Fix model loading from folder in FeatureExtractor * 
working o n end2end * end2end keeps failing * Restructuring retriever tests * Restructuring retriever tests * remove covert_dataset_to_dataloader * remove comment * Better check sentence-transformers models * Use embed_meta_fields properly * rename passage into document * Embedding dims can't be found * Add check for models that support it * pylint * Split all retriever tests into suites, running mostly on InMemory only * fix mypy * fix tfidf test * fix weaviate tests * Parallelize on every docstore * Fix schema and specify modality in base retriever suite * tests * Add first image tests * remove comment * Revert to simpler tests * Update docs/_src/api/api/primitives.md Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/__init__.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * get_args * mypy * Update haystack/modeling/model/multimodal/__init__.py * Update haystack/modeling/model/multimodal/base.py * Update haystack/modeling/model/multimodal/base.py Co-authored-by: Agnieszka Marzec 
<97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/sentence_transformers.py * Update haystack/modeling/model/multimodal/sentence_transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/retriever/multimodal/retriever.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * mypy * removing more ContentTypes * more contentypes * pylint * add to __init__ * revert end2end workflow for now * missing integration markers * Update haystack/nodes/retriever/multimodal/embedder.py Co-authored-by: bogdankostic <bogdankostic@web.de> * review feedback, removing HaystackImageTransformerModel * review feedback part 2 * mypy & pylint * mypy * mypy * fix multimodal docs also for Pinecone * add note on internal constants * Fix pinecone write_documents * schemas * keep support for sentence-transformers only * fix pinecone test * schemas * fix pinecone again * temporarily disable some tests, need to understand if they're still relevant Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-10-17 18:58:35 +02:00
Sebastian	ca2a1e1792	fix: Update how schema is ordered (#3399 ) * Use builtin sort_keys option for json.dump * Order anyof_list which is causing issues	2022-10-17 17:09:32 +02:00
Sara Zan	50f34372e1	fix: stable YAML schema generation (#3388 ) * add key sorting in schema generation * add pre-commit hook * try pre-commit hook * Fixed schemas * trying a simpler version * pylint * ordered dict * reverting to dict * unused import * remove hook	2022-10-14 18:36:47 +02:00
Massimiliano Pippi	7d0f89b6f5	fix: demo won't start through Docker compose (#3337 ) * use new Docker images and add a health check for ES * try * silence streamlit errors * remove CMD override * final touches * leftover * make pylint happy	2022-10-14 18:16:20 +02:00
Vladimir Blagojevic	159cd5a666	feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever (#3356 )	2022-10-14 15:01:03 +02:00
Vladimir Blagojevic	5ebe3cb33d	fix: QuestionGenerator generates wrong document questions for non-default `num_queries_per_doc` parameter (#3381 )	2022-10-14 12:08:30 +02:00
Stefano Fiorucci	7290196c32	fix: allow same `vector_id` in different indexes for SQL-based Document stores (#3383 ) * fix_multiple_indexes * improve test names	2022-10-14 09:55:56 +02:00
tstadel	ba30971d8d	feat: extract label aggregation (#3363 ) * extract label aggregation * refactoring * reformat * add missing param docstrings * fix comment	2022-10-13 19:09:14 +02:00
Massimiliano Pippi	3b0f00a615	[CI] Use VERSION.txt to sync with Readme (#3367 ) * use VERSION.txt to sync with Readme * add docs * force workflow run * unrelated change * Revert "force workflow run" This reverts commit f0aea59afa57c96f374073465629f893031f727a. * make the steps mutually exclusive	2022-10-13 18:39:23 +02:00
Branden Chan	37bd61a48e	Create minor_version_release.yml (#3338 ) * Create minor_version_release.yml * Incorporate reviewer feedback	2022-10-13 14:32:31 +02:00
Stefano Fiorucci	60f678e120	refactor: remove dead code from FAISSDocumentStore (#3372 )	2022-10-13 13:23:01 +02:00
Massimiliano Pippi	31fa75e9fd	feat: add support for Elasticsearch 7.16.2 (#3318 ) * bump elastic to 7.16.2+ * decouple Elasticsearch and Opensearch use method override instead of func variables fix mypy default value fix broken tests update schema * relax version pin * rename the base class * rename module * fix import order * do not run the new tests in the old job * remove outdated TODO	2022-10-13 11:53:27 +02:00
Sebastian	75641dd024	fix: Added checks for DataParallel and WrappedDataParallel (#3366 ) * Added checks for DataParallel and WrappedDataParallel * Update isinstance checks according to pylint recommendation * Using isinstance over types * Added test for dpr training	2022-10-13 08:05:56 +02:00
Massimiliano Pippi	db6e5754cd	add deprecation notice to old dockerfiles (#3317 )	2022-10-11 16:10:57 +02:00
hsm207	c2537dfc28	Update weaviate schema doc link (#3351 ) * Update weaviate schema doc * Update haystack/document_stores/weaviate.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-10-11 15:30:20 +02:00
Massimiliano Pippi	8ddb6d7821	feat: add multi-platform Docker images (#3354 ) * add arm platform to the build * add a note about multi-platforms build * test on current branch * setup qemu on Github actions * better naming * Revert "test on current branch" This reverts commit b0e5ea77b46e3e0bafd579c95e434c6a3c8ef84f.	2022-10-11 12:29:33 +02:00
Malte Pietsch	fb02b61e90	Update README.md (#3247 )	2022-10-11 10:43:17 +02:00
tstadel	7fe5003c97	fix: eval() with `add_isolated_node_eval=True` breaks if no node supports it (#3347 ) * fix isolated eval for pipelines without a node supporting isolated mode * reformat * add test	2022-10-10 20:48:13 +02:00
bogdankostic	84aff5e2b3	fix: Allow less restrictive values for parameters in Pipeline configurations (#3345 ) * fix: Allow arbitrary values for parameters in Pipeline configurations * Add test * Adapt expected error message in tests * Fix bug * Fix bug on checking JSON * Remove test cases that previously tested if error was thrown * Change encoding in test * Restrict possible values * Re-add tests * Re-add tests * Add value flag to list elements	2022-10-10 13:08:45 +02:00
JacdDev	797c20c966	feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch (#3301 ) * Adding filters param to MostSimilarDocumentsPipeline run and run_batch * Adding index param to MostSimilarDocumentsPipeline run and run_batch * Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch * Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch * Adding filters param to MostSimilarDocumentsPipeline run and run_batch * Adding index param to MostSimilarDocumentsPipeline run and run_batch * Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch * Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch	2022-10-10 10:22:14 +02:00
tstadel	b84a6b1716	fix: opensearch script score with filters (#3321 ) * fix opensearch script score filters * add comment * add integration test * update schema	2022-10-06 15:41:29 +02:00
Vladimir Blagojevic	6cb4e93965	refactor: remove Inferencer multiprocessing (#3283 )	2022-10-04 14:08:23 +02:00
Massimiliano Pippi	b49bce97aa	remove test step (#3278 )	2022-10-04 11:34:43 +02:00
nickchomey	e6767fccef	bugfix for TranslationWrapperPipeline (#3290 ) * bugfix for TranslationWrapperPipeline * Update standard_pipelines.py * Update haystack/pipelines/standard_pipelines.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-10-04 09:44:48 +02:00
Jeff Risberg	ad8fbe56ee	bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node (#3170 ) * don't send the list of inputs back as an output in the running of a node. * updated documentation * Update pydoc-markdown.py * added test case for pipeline join fix Co-authored-by: JeffRisberg <jrisberg@aol.com>	2022-09-30 13:27:17 +02:00
Stefano Fiorucci	e2e6887ee8	Improve TransformersDocumentClassifier tests (#3270 )	2022-09-27 13:25:34 +02:00
Taner Topal	24d4591307	docs: Fix a docstring in ray.py	2022-09-27 09:05:04 +02:00
Vladimir Blagojevic	9582a423a2	fix: ONNX FARMReader model conversion is broken (#3211 )	2022-09-26 09:18:12 -04:00
Stefano Fiorucci	b579b9d54a	bug: make `ElasticSearchDocumentStore` use `batch_size` in `get_documents_by_id` (#3166 ) * use batch_size * try to fix git mess * improve docstrings * fix	2022-09-26 13:21:59 +02:00
Vladimir Blagojevic	9ca3ccae98	fix: MostSimilarDocumentsPipeline doesn't have pipeline property (#3265 ) * Add comments and a unit test * More unit tests for MostSimilarDocumentsPipeline	2022-09-23 09:46:48 -04:00
Vladimir Blagojevic	eba7cf51b1	chore: Remove Update API documentation hook (#3271 ) * Remove Update API documentation hook * Remove .github/utils/pydoc-markdown.py file	2022-09-23 08:54:08 -04:00
tstadel	05a86b9d3d	feat: FAISS in OpenSearch: Support HNSW for cosine (#3217 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * split PR * update docs * manually revert tutorial doc change * Fix embedding type * set integration marker correctly * make BaseDocumentStore.normalize_embedding static * format * fix handling of opensearch_faiss param * fix merge * add DenseRetriever typing * organize imports in conftest.py * organize imports in conftest.py (2) * fix DenseRetriever import * add opensearch-tests-linux	2022-09-23 13:26:49 +02:00
tstadel	4fa9d2d8e7	Fix milvus and faiss tests not running (#3263 ) * fix milvus and faiss tests not running * fix schema manually * fix test_dpr_embedding test for milvus * pip freeze on milvus tests * fix milvus1 tests being executed: fix all_doc_stores order * Revert "pip freeze on milvus tests" This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab. * make infer_required_doc_store more robust * don't skip tests without docstore requirements * use markers for docstore tests	2022-09-22 17:46:49 +02:00
Massimiliano Pippi	2b803a265b	run checks on release branches (#3267 )	2022-09-22 16:25:34 +02:00
Vladimir Blagojevic	820742cac7	Fix schema for 1.10.x (#3269 )	2022-09-22 15:20:51 +02:00
tstadel	b10e2c392e	chore: add `DenseRetriever` abstraction (#3252 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * update docs * fix imports * import DenseRetriever normally * update docs * fix deepcopy of documents * update schema * Revert "update schema" This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f. * fix schema for ci manually	2022-09-21 19:08:54 +02:00
Branden Chan	492a8046d8	docs: sync Haystack API with Readme (#3223 ) * First pass at syncing Haystack API with Readme * Reapply changes * Regularize slugs * Regularize slugs * Regularize slugs * Set category id and regen * Trigger workflow * Delete old md files * Test sync * Undo test string * Incorporate reviewer feedback * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Change name of pydoc-markdown scripts * Test on the fly API generation and sync * Remove version tag * Test version tag * Test version tag * Test version tag * Revert test docstring * Revert md file changes * Revert md file changes * Revert script naming * Test on the fly generation and sync * Adjust for on the fly generation and sync * Revert test string * Remove old documentation workflow * Set workflow to work on main * Change readme version name	2022-09-21 17:18:34 +02:00
Massimiliano Pippi	8f76d64f6f	chore: bump release number for unstable version (#3251 ) * bump version for unstable * allow generation of rc schemas * update schemas	2022-09-21 16:58:06 +02:00
Vladimir Blagojevic	938e6fda5b	Classify pipeline's type based on its components (#3132 ) * Add pipeline get_type method * Add pipeline uptime * Add pipeline telemetry event sending * Send pipeline telemetry once a day (at most) * Add pipeline invocation counter, change invocation counter logic * Update allowed telemetry parameters - allow pipeline parameters * PR review: add unit test	2022-09-21 14:53:42 +02:00
Stefano Fiorucci	89247b804c	refactor: make `TransformersDocumentClassifier` output consistent between different types of classification (#3224 ) * make output consistent * make output consistent * added tests for details * better tests * Update test_document_classifier.py * make black happy * Update test_document_classifier.py * Update test_document_classifier.py	2022-09-21 13:16:03 +02:00
Massimiliano Pippi	15bb6c2ea2	remove tutorials from the repo (#3244 )	2022-09-20 18:32:45 +02:00
Tuana Celik	336c144e72	chore: updating colab links in older docs versions (#3250 ) * updating colab links to tutorial 1 * remaining tutorials Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-09-20 18:15:29 +02:00
Vladimir Blagojevic	fe31896fcb	Proper retrieval of answers for batch eval (#3245 ) * Proper retrieval of answers and documents for batch eval	2022-09-20 08:16:03 -04:00
Malte Pietsch	7e79a48540	bug: reactivate benchmarks with quick fixes (#2766 ) * quick fix benchmark runs to make them work with current haystack version * fix minor typo * update readme. fix minor things to make benchmarks run again * Update Documentation & Code Style * fix typo in readme * update result files for reader and retriever querying * reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs) * change default memory allocation back to normal. add note to readme * add first indexing results * add memory to docker cmd * full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-09-20 10:22:08 +02:00
Massimiliano Pippi	9399ddf949	fix pydoc-markdown hook (#3238 )	2022-09-19 18:20:35 +02:00
Sara Zan	dcb132ba59	chore: remove f-strings from logs for performance reasons (#3212 ) * Use the %s syntax on all debug messages * Use the %s syntax on some more debug messages * Use the %s syntax on info messages * Use the %s syntax on warning messages * Use the %s syntax on error and exception messages * mypy * pylint * trigger tutorials execution in CI * trigger tutorials execution on CI * black * remove embeddings from repr * fix Document `__repr__` * address feedback * mypy	2022-09-19 18:18:32 +02:00
Massimiliano Pippi	8fbccbda82	fix: handle Documents containing dataframes in Multilabel constructor (#3237 ) * format * fix docs	2022-09-19 14:59:20 +02:00
banjocustard	19af6f4e40	bug: fix pdftotext installation verification (#3233 )	2022-09-19 11:32:58 +02:00
Massimiliano Pippi	859c303c16	include fontconfig in the final image and fix tagging (#3230 )	2022-09-16 15:33:24 +02:00
Malte Pietsch	3134b0d679	fix: type of `temperature` param and adjust defaults for `OpenAIAnswerGenerator` (#3073 ) * fix: type of temperature param and adjust defaults * update schema * update api docs	2022-09-16 14:11:33 +02:00

1576 Commits