haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-30 00:30:09 +00:00

Author	SHA1	Message	Date
Sebastian	756e0114e6	refactor: Remove duplicate code in TableReader (#3708 ) * Refactor table reader to use util functions to reduce code duplication. * Expanding the tests for the table reader * Adding types * Updating tests to work for RCIReader * Fix bug in RCIReader. Saving the wrong queries list. * Update _flatten_inputs to not change input variable * Remove duplicate code	2022-12-21 14:33:19 +01:00
bogdankostic	12c264603e	fix: Fix number of concurrent requests in RequestLimiter (#3705 )	2022-12-21 11:40:33 +01:00
Stefano Fiorucci	82ad408a74	refactor: remove unused code in `TfidfRetriever` (#3733 )	2022-12-20 17:51:46 +01:00
Vladimir Blagojevic	9ebf164cfd	feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate (#3667 ) Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2022-12-20 11:21:26 +01:00
Stefano Fiorucci	559f6e0569	better compatibility with different versions of sklearn (#3732 )	2022-12-20 09:59:36 +01:00
Zoltan Fedor	e143f7cc36	Fixing broken BM25 support with Weaviate - fixes #3720 (#3723 ) * Fixing broken BM25 support with Weaviate - fixes #3720 Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit. Please see more under issue #3720. * Fixing mypy issue - method signature wasn't matching the base class * Mypy related test fix Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` param is `Optional`, but has NO default value, so it must be provided (as None) at runtime. I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests. * Adding a note regarding an upcoming fix in Weaviate v1.17.0 * Apply suggestions from code review * revert * [EMPTY] Re-trigger CI	2022-12-19 17:24:46 +01:00
Vladimir Blagojevic	56803e5465	feat: Enable text-embedding-ada-002 for EmbeddingRetriever (#3721 ) * Enable text-embedding-ada-002 for EmbeddingRetriever * Easier to understand code, more unit tests	2022-12-19 17:06:48 +01:00
Massimiliano Pippi	8edfd8978e	Update the proposals process (#3718 ) * update the proposals process * add stalebot to manage proposals lifecycle * typo * Update 0000-template.md * clarify PR labelling staying away from implementation details	2022-12-19 14:35:07 +01:00
Sebastian	d7fabb569b	feat: Use torch.inference_mode() for TableQA (#3731 ) * Update to make inference_mode work in TableQA * Update variable names * Added torch.inference_mode() for the RCIReader model forward passes	2022-12-19 13:07:07 +01:00
Stefano Fiorucci	5b9c661155	feat: add `index` parameter to `TfidfRetriever` (#3666 ) * first draft to add index param to tfidf * better mypy handling * Revert "better mypy handling" This reverts commit 91a22516320f9dcbeae53827ec69f9dc51e1785c. * new check in auto_fit * new check also in retrieve * better dict typings * new test and improvements to other test * remove unnecessary lambda * improve test * remove newline from openapi json * fix test * language fix Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 2 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 3 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 4 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 5 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 6 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * explicit index value handling * fix test * better error messages Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-19 12:07:49 +01:00
Agnieszka Marzec	a1d8557c80	Update the readme action version (#3726 ) * Update the readme action version Updated the rdme action version to the latest one. * Update the version	2022-12-19 10:23:05 +01:00
Zoltan Fedor	3990697869	Fixing the `query_batch` method of the deepsetcloud document store - … (#3724 ) * Fixing the `query_batch` method of the deepsetcloud document store - fixes #3722 * Trigger Build * Trigger Build * Trigger CI Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-12-19 09:57:26 +01:00
Julian Risch	adde194b04	build: upgrade torch and let transformers pick the version (#3727 ) * test torch 1.13.1 release * let transformers handle torch version	2022-12-16 21:33:01 +05:30
Vladimir Blagojevic	42926596e4	Update cohere embedding models (#3704 )	2022-12-16 16:49:59 +01:00
Sebastian	4afdbc33b2	fix: Removed overlooked torch scatter references (#3719 ) * Removed torch scatter references * Add back /	2022-12-16 10:36:19 +01:00
Vladimir Blagojevic	c69222faf4	Add PromptNode proposal (#3665 )	2022-12-16 10:27:58 +01:00
Agnieszka Marzec	a23f425877	Fix lg (#3725 )	2022-12-16 09:43:22 +01:00
Sebastian	54bf7ad343	Remove && \ from end of line (#3710 )	2022-12-13 21:29:18 +05:30
Sebastian	d0f786af9f	feat: Bump transformers version to remove torch scatter dependency (#3703 ) * Bump transformers version so we can remove torch scatter dependency * manual re-merge Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2022-12-13 18:33:07 +05:30
Sara Zan	f24cbdbb5d	remove beir from the base GPU image (#3692 )	2022-12-13 11:11:58 +01:00
Stefano Fiorucci	e1401f79b6	refactor: improve Multilabel design (#3658 ) * first try and new test * fix test * fix unused import * remove comments * no more dataclass * add __eq__ and extend test * better design from review * Update schema.py * fix black * fix openapi * fix openapi 2 * new try to fix openapi * remove newline from openapi json	2022-12-13 10:45:56 +01:00
James Briggs	520b23ec1b	fix: pinecone metadata format (#3660 ) * fix for multilevel metadata dictionaries * add metadata dict formatting to update function * typing * added check for labels meta * added more info to input parameters * added test for multilayer metadata * removed todo	2022-12-13 10:11:24 +01:00
github-actions[bot]	5405d9d7f8	Update unstable version and openapi schema (#3700 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2022-12-13 09:59:52 +01:00
Sara Zan	eba518a589	add trailing newlines to make `end-of-file-fixer` happy (#3699 )	2022-12-12 14:42:25 +01:00
tstadel	600dc2d611	refactor: filters type (#3682 ) * consolidate filters type * remove unnecessary optionals * fix mypy * fix pylint * fix pylint * move FilterType to schema * remove Optional from FilterType * move to Dict[str, Any] * Revert "move to Dict[str, Any]" This reverts commit e8c561bb7885949e19825697fa4c469945f90ce5. * fix mypy * fix pylint * revert isort changes in elasticsearch * remove todos in milvus.py * remove todos in sql.py * add aggregate_labels tests * consolidate aggregate_labels tests * remove superfluous type todos * remove ALL superfluous #todos	2022-12-12 14:04:29 +01:00
Sara Zan	8e3c7bc6be	fix: pin `espnet` in the `audio` extra (#3693 ) * downgrade pytorch in the audio extra * pin torch * remove torch pin and pin espnet * add comment	2022-12-12 13:09:26 +01:00
Sara Zan	b1fc912859	refactor: remove `test` extra (#3679 ) * remove test extra, make dev install all * remove all from dev * reduce diff	2022-12-12 11:22:03 +01:00
Sara Zan	642fa3a6b7	fix typing (#3680 )	2022-12-12 11:20:48 +01:00
Vladimir Blagojevic	c28f6688f5	proposal: New EmbeddingRetriever for Haystack 2.0 (#3558 ) * Add EmbeddingRetriever proposal * Update with Sara's feedback * Consistent naming	2022-12-12 10:06:35 +01:00
Unai Garay Maestre	77cea8b140	feat: Adds all_terms_must_match parameter to BM25Retriever at runtime (#3627 ) * Adds all_terms_must_match implementation and tests * Adds all_terms_must_match as Optional Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com> * Avoid mypy error and follow pattern checking var is None * Mypy works ok on this file now * added mypy ignores to BaseRetriever * ignoring all overrides for this file * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * marked elasticsearch Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-08 17:18:43 +05:30
tstadel	c1c1c97bb2	feat: add query_by_embedding_batch (#3546 ) * add query_by_embedding_batch * fix mypy * fix pylint * add test * move query_by_embedding_batch to search_engine * fix and add tests * fix pylint * remove Retriever query logs * add test for multimodal batch retrieval * allow for np.ndarray	2022-12-08 08:28:43 +01:00
Sebastian	25bf95d47f	Update table reader tests to include checking the score of answers. (#3641 )	2022-12-07 07:30:49 -08:00
Stefano Fiorucci	399d8f1668	monkey patch sklearn (#3678 )	2022-12-07 10:31:32 +01:00
Vladimir Blagojevic	18444427da	Use from tqdm.auto import tqdm instead of from tqdm import tqdm (#3672 )	2022-12-06 22:53:41 +01:00
Sara Zan	0c71849e4a	remove beir from all-gpu (#3669 )	2022-12-06 14:56:27 +01:00
Sara Zan	fc89f6ea74	fix: revert Weaviate query with filters and improve tests (#3646 ) * revert weaviate query with filters and improve tests * pylint * upgrade weaviate container * use latest docker tag * fix text * fix text	2022-12-06 14:48:58 +01:00
Vladimir Blagojevic	e4c3817d01	Adjust get_type() method for pipelines (#3657 )	2022-12-02 14:48:47 +01:00
Julian Risch	adb580b6b7	feat: add offsets_in_context to evaluation result (#3640 ) * add offsets_in_context to eval result * extend test case	2022-11-30 11:43:42 +01:00
Massimiliano Pippi	af06519fc4	re-enable hooks (#3629 )	2022-11-29 09:00:45 +01:00
Sebastian	c7c2235874	Move all of the forward pass to under torch.no_grad() (#3636 )	2022-11-29 08:59:49 +01:00
Massimiliano Pippi	b20f808119	refactor: move more tests to the base class (#3637 ) * move more tests to the base class * skip tests where unsupported * do not pass index label explicitly * skip test for Pinecone	2022-11-29 08:43:27 +01:00
Ivan Lopez	839eef6695	fix rest_api paths in docker-compose-gpu.yml (#3532 ) Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2022-11-29 07:47:14 +01:00
Mayank Jobanputra	95cf666a20	refactor: change MultiModal retriever to be of type DenseRetriever (#3598 ) * changed Multimodal retriever to be of type DenseRetriever * format fix * Pylint fix * Added embed_queries and tests	2022-11-28 19:24:22 +01:00
Massimiliano Pippi	6f9a0f2215	use 9200 as the default port in launch_opensearch (#3630 )	2022-11-28 19:06:45 +05:30
Sara Zan	eb7b9452d0	refactor: Weaviate query with filters (#3628 )	2022-11-28 12:26:33 +01:00
Branden Chan	4a83b2049d	docs: Reformat code blocks in docstrings (#3580 ) * Fix docstrings for DocumentStores * Fix docstrings for AnswerGenerator * Fix docstrings for Connector * Fix docstrings for DocumentClassifier * Fix docstrings for LabelGenerator * Fix docstrings for QueryClassifier * Fix docstrings for Ranker * Fix docstrings for Retriever and Summarizer * Fix docstrings for Translator * Fix docstrings for Pipelines * Fix docstrings for Primitives * Fix Python code block spacing * Add line break before code block * Fix code blocks * fix: discard metadata fields if not set in Weaviate (#3578) * fix weaviate bug in returning embeddings and setting empty meta fields * review comment * Update unstable version and openapi schema (#3584) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * fix: Flatten `DocumentClassifier` output in `SQLDocumentStore`; remove `_sql_session_rollback` hack in tests (#3273) * first draft * fix * fix * move test to test_sql * test: add test to check id_hash_keys is not ignored (#3577) * refactor: Generate JSON schema when missing (#3533) * removed unused script * print info logs when generating openapi schema * create json schema only when needed * fix tests * Remove leftover Co-authored-by: ZanSara <sarazanzo94@gmail.com> * move milvus tests to their own module (#3596) * feat: store metadata using JSON in SQLDocumentStore (#3547) * add warnings * make the field cachable * review comment * Pin faiss-cpu as 1.7.3 seems to have problems (#3603) * Update Haystack imports (#3599) * Update Python version (#3602) * fix: `ParsrConverter` fails on pages without text (#3605) * try to fix bug * remove print * leftover * refactor: update Squad data (#3513) * refactor the to_squad data class * fix the validation label * refactor the to_squad data class * fix the validation label * add the test for the to_label object function * fix the tests for to_label_objects * move all the test related to squad data to one file * remove unused imports * revert tiny_augmented.json Co-authored-by: ZanSara <sarazanzo94@gmail.com> * Url fixes (#3592) * add 2 example scripts * fixing faq script * fixing some urls * removing example scripts * black reformatting * add labeler to the repo (#3609) * convert eval metrics to python float (#3612) * feat: add support for `BM25Retriever` in `InMemoryDocumentStore` (#3561) * very first draft * implement query and query_batch * add more bm25 parameters * add rank_bm25 dependency * fix mypy * remove tokenizer callable parameter * remove unused import * only json serializable attributes * try to fix: pylint too-many-public-methods / R0904 * bm25 attribute always present * convert errors into warnings to make the tutorial 1 work * add docstrings; tests * try to make tests run * better docstrings; revert not running tests * some suggestions from review * rename elasticsearch retriever as bm25 in tests; try to test memory_bm25 * exclude tests with filters * change elasticsearch to bm25 retriever in test_summarizer * add tests * try to improve tests * better type hint * adapt test_table_text_retriever_embedding * handle non-textual docs * query only textual documents * Incorporate Reviewer feedback * refactor: replace `torch.no_grad` with `torch.inference_mode` (where possible) (#3601) * try to replace torch.no_grad * revert erroneous change * revert other module breaking * revert training/base * Fix docstrings for DocumentStores * Fix docstrings for AnswerGenerator * Fix docstrings for Connector * Fix docstrings for DocumentClassifier * Fix docstrings for LabelGenerator * Fix docstrings for QueryClassifier * Fix docstrings for Ranker * Fix docstrings for Retriever and Summarizer * Fix docstrings for Translator * Fix docstrings for Pipelines * Fix docstrings for Primitives * Fix Python code block spacing * Add line break before code block * Fix code blocks * Incorporate Reviewer feedback Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai> Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: Espoir Murhabazi <espoir.mur@gmail.com> Co-authored-by: Tuana Celik <tuana.celik@deepset.ai> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-11-28 09:21:07 +01:00
Massimiliano Pippi	c6890c3e86	chore: remove redundant tests (#3620 ) * remove redundant tests * skip test on win * fix missing import * revert mistake * revert	2022-11-25 20:55:21 +05:30
Tuana Celik	ed7d03665d	fixing the url for document merger (#3615 )	2022-11-25 14:40:55 +01:00
Massimiliano Pippi	ddeaf2c98c	clean up colab dependencies (#3626 )	2022-11-24 18:37:57 +01:00
Tuana Celik	0771cf1cce	Update CONTRIBUTING.md (#3624 )	2022-11-24 13:59:49 +00:00

1724 Commits