haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-25 18:00:28 +00:00

Author	SHA1	Message	Date
Stefano Fiorucci	f2b5f123b3	del HF token in tests (#8634)	2024-12-13 09:50:23 +01:00
David S. Batista	b5a2fad642	feat: adding Maximum Margin Relevance Ranker (#8554) * initial import * linting * adding MMR tests * adding release notes * fixing tests * adding linting ignore to cross-encoder ranker * update docstring * refactoring * making strategy Optional instead of Literal * wip: adding unit tests * refactoring MMR algorithm * refactoring tests * cleaning up and updating tests * adding empty line between license + code * bug in tests * using Enum for strategy and similarity metric * adding more tests * adding empty line between license + code * removing run time params * PR comments * PR comments * fixing * fixing serialisation * fixing serialisation tests * Update haystack/components/rankers/sentence_transformers_diversity.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/rankers/sentence_transformers_diversity.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/rankers/sentence_transformers_diversity.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/rankers/sentence_transformers_diversity.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/rankers/sentence_transformers_diversity.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/rankers/sentence_transformers_diversity.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/rankers/sentence_transformers_diversity.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * fixing tests * PR comments * PR comments * PR comments * PR comments --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2024-11-22 14:58:45 +00:00
David S. Batista	e5a80722c2	feat: adding metadata grouper component (#8512) * initial import * making tests more readable; adding docstring * adding release notes * adding LICENSE header * Update test/components/rankers/test_metadata_grouper.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * refactoring * fixing docstring * fixing types * test docstrings * renaming test * handling too-many-arguments * linting * Update haystack/components/rankers/metadata_grouper.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * changing name * Update haystack/components/rankers/metadata_grouper.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/rankers/metadata_grouper.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * assigning value inside function for re-use * improving docstring * updating name to MetaFieldGroupingRanker * adding to pydocs * fixing imports * adding output docstring * Update haystack/components/rankers/meta_field_grouper_ranker.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update haystack/components/rankers/__init__.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update releasenotes/notes/add-metadata-grouper-21ec05fd4a307425.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update test/components/rankers/test_metadata_grouper.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * update docstring tests * fixing imports * rename modules for consistency * fix pydocs * simplification + more tests --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2024-11-12 16:01:53 +01:00
Sebastian Husch Lee	7227bcf9df	feat: TransformerSimilarityRanker add batching across Documents during inference (#8344) * First pass at adding batch support to TransformersSimilarityRanker * Add test * Add reno	2024-09-11 12:47:29 +02:00
Sebastian Husch Lee	c90495c2e8	feat: Add model and tokenizer kwargs to `TransformersSimilarityRanker`, `SentenceTransformersDocumentEmbedder`, `SentenceTransformersTextEmbedder` (#8145) * Start adding model and tokenizer kwargs support * Add model and tokenizer kwargs to doc embedder * Some updates and fixes in tests * Fix more tests * Fix tests * Add release note * Fix test * Add from_dict tests	2024-08-02 10:37:10 +02:00
Sebastian Husch Lee	c121c86c4c	fix: Fix from_dict methods of components using HF models to work with default values (#8003) * Fix from_dict to work if device isn't provided in init params * Minor refactoring of from_dict for components that load HF models * Add tests * Update tests to test loading with all default parameters * Add more tests * Add release notes * Add unit test for whisper local * Update reno * Add fix for ExtractiveReader * Fix NamedEntityExtractor	2024-07-10 12:18:05 +02:00
Vladimir Blagojevic	535a281eec	feat: Add option to use `HF_TOKEN` as env var for authentication across all HF components (#7942) * Read both HF_API_TOKEN and HF_TOKEN env vars in all HF related components * Add reno note * Test fixes * More test updates * More test updates	2024-06-27 10:31:58 +02:00
Rob Pasternak	28dd0f5596	feat: Add options for what to do with missing metadata fields in `MetaFieldRanker` (#7700) * Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation. * Implement `missing_meta` functionality in `run()`. * Finish first draft of revised `MetaFieldRanker` functionality. * Add tests for `MetaFieldRanker` `missing_meta` functionality. * Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation. * Implement `missing_meta` functionality in `run()`. * Finish first draft of revised `MetaFieldRanker` functionality. * Add tests for `MetaFieldRanker` `missing_meta` functionality. * Add release notes for new `missing_meta` param of `MetaFieldRanker` * Move part of docs_missing_meta_field warning string outside of `if...elif...else`.	2024-06-12 10:42:02 +02:00
Massimiliano Pippi	0ceeb733ba	chore: make `warm_up()` usage consistent (#7752) * make usage consistent * fix error type * release notes * pylint fix * change of plan * revert * fix test * revert * fix HF tests * Apply suggestions from code review Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * fix formatting * reformat * fix regex match with the new error message * fix integration test --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-05-29 10:54:21 +02:00
Sebastian Husch Lee	a2be90b95a	fix: Update device deserialization for components that use local models (#7686) * fix: Update device deserialization for SentenceTransformersTextEmbedder * Add unit test * Fix unit test * Make same change to doc embedder * Add release notes * Add same change to Diversity Ranker and Named Entity Extractor * Add unit test * Add the same for whisper local * Update release notes	2024-05-14 08:36:14 +02:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Silvano Cerza	ff269db12d	Fix unit tests failing if HF_API_TOKEN is set (#7491)	2024-04-05 18:05:43 +02:00
Mohit Lal	280719339c	bug: run parameter "ranking_mode" does not override init param in meta field ranker (#7375) * bug: run parameter ranking_mode does not override init param in metafield ranker * Added a release note * Used pytest.approx for comparing floating point numbers in unit test	2024-03-19 07:53:26 +01:00
Ashwin Mathur	38b3472bb2	feat: Add `SentenceTransformersDiversityRanker` (#7095) * Add Diversity Ranker * Update tests * Add separate suffix, prefix params for query and documents; allow empty query * Update docstrings * Make changes based on review * Add additional tests * Add test for warm up * Update release notes --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-03-11 13:14:59 +01:00
Stefano Fiorucci	38a80b0235	fix: `MetaFieldRanker` - use `weight` if passed in the `run` method (#7305) * fix: - use if passed in the method * reno	2024-03-05 12:13:56 +01:00
Julian Risch	c1c0cbfde4	docs: Update docs of MetaFieldRanker, TransformersSimilarityRanker (#7301) * docs: Update docstrings of MetaFieldRanker and TransformersSimilarityRanker * add warm_up() call to usage example * Apply suggestions from code review Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * show result of usage example --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-03-05 10:20:18 +01:00
Julian Risch	9a0e2e58fd	docs: Added LostInTheMiddleRanker usage example and updated docstrings (#7294) * docs: Added LostInTheMiddleRanker usage example * remove to_dict test * explain LITM in more detail	2024-03-04 15:42:51 +01:00
Varun Mathur	b335b5d723	feat: Add Lost In The Middle Ranker (#6995) * add lost in the middle ranker * update * add release notes * update release notes * fix mypy * Update * fix mypy * fix mypy [union-attr] for content.split * remove e2e tests and negative topk param * remove query param, validate params --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-02-20 19:55:41 +01:00
Sebastian Husch Lee	ea7275955d	feat: Meta field ranker add `meta_value_type` (#6977) * Update MetaFieldRanker to parse string meta values based on meta_value_type * Add some unit tests * Add another unit test * Add release notes * Fix mypy * Fix pylint * Add more unit tests * Update release notes * Update docs * Further improve doc strings	2024-02-13 13:08:35 +01:00
Madeesh Kannan	27d1af3068	feat!: Use `Secret` for passing authentication secrets to components (#6887) * feat!: Use `Secret` for passing authentication secrets to components * Add comment to clarify type ignore	2024-02-05 13:17:01 +01:00
Sebastian Husch Lee	ceda4cd655	feat: Add support for `device_map` (#6679) * Getting device_map working to support 8bit loading and multi device inference * Update to take account the device specified by the user * add release notes * Add device_map support for ExtractiveReader * Update test * Update to model that doesn't have issues * Update test * Update pytest approx * Update release notes * Start supporting device map * Update ExtractiveReader to use new ComponentDevice * Update similarity ranker to follow extractive reader implementation * Fixing pylint * Make mypy mostly happy * Add new unit test to test device_map * Adding unit tests * Some refactoring * Add more tests * Add more tests * Add another unit test * Update first_device property to return a ComponentDevice to be able to use the to methods * Updating tests for test_device * Update tests and now explicitly modify device_map in model_kwargs * Update haystack/utils/hf.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Make mypy happy * mypy * Remove unneeded optional flag * Update ExtractiveReader with new logic * Update ranker to follow new logic * Removing unneeded code * Make mypy happy * fix pylint * Fix test * Adding unit tests for device_map="auto" * Add unit tests for ranker * PR comments * Make util method * Adding unit tests * Fix type annotation * Fix pylint * Fix test --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-01-30 13:47:57 +01:00
Sebastian Husch Lee	3bea3b1714	feat: Add query and document prefix options for the TransformerSimilarityRanker (#6826) * Add query and doc prefix * Fix some tests * add release notes	2024-01-25 15:29:19 +01:00
Madeesh Kannan	7376838922	feat!: Framework-agnostic device management (#6748) * feat: Framework-agnostic device management * Add release note * Linting * Fix test * Add `first_device` property, expand release notes, validate `ComponentDevice` state	2024-01-17 10:41:34 +01:00
Sebastian Husch Lee	20f04f6054	feat: MetaFieldRanker update (#6742) * Add weight and ranking_mode as params to run for easier experimentation * renaming of metadata to meta * Use logger.warning instead of warnings * Add another unit test * Add support for sort_order and fix formatting of error messages * Make MetaFieldRanker more robust. Doesn't crash pipeline if some Documents are missing keys. * Don't print same warning message twice * Add another test * Making MetaFieldRanker more robust * Move up if return statement to earlier in the function * Setting up infer_type * Remove infer_type for now * Release notes * Add init file * Update releasenotes/notes/metafieldranker_sort-order_refactor-2000d89dc40dc15a.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-01-16 08:52:58 +01:00
ZanSara	0616197b44	feat!: Rename `model_name_or_path` to `model` in `TransformersSimilarityRanker` (#6734) * rename model parameter in transformers ranker * fix tests for transformers ranker * reno * reno * typo	2024-01-12 17:09:12 +01:00
Stefano Fiorucci	80c3e6825a	fix: serialize/deserialize torch dtype in the components that need it (#6713) * first draft for ranker * same for the reader * consider also bnb_4bit_compute_dtype * dtype serialization in hugging_face_local_generator * add release note * address dtype defined in huggingface_pipeline_kwargs * test quantization options in reader * fix * serialize quantization_config * test quantization_config serialization * address feedback * fix typo	2024-01-12 12:22:45 +01:00
Sebastian Husch Lee	beade1cef9	feat: Add scaling and thresholding of the similarity ranker scores (#6683) * Add scale_score functionality to the TransformersSimilarityRanker * Updated test to check scores * Use pytest approx when comparing floats * Updated how scale score works and added calibration factor. Started to add score threshold. * Add support for score_threshold * Add some parameters to the run method * Add release notes * Fix mypy * Be more tolerant on the score values * Adding unit test for scale_score=False * Add unit test for score threshold * Update tests * Rename test * Fix typo * PR comments	2024-01-08 09:05:24 +01:00
Stefano Fiorucci	c773c30c66	refactor!: rename all remaining `metadata` to `meta` (#6650) * change metadata to meta * release note	2023-12-28 12:18:15 +01:00
Sebastian Husch Lee	c294b8ac8c	feat: Add auto device checks and `model_kwargs` to `TransformersSimilarityRanker` (#6561) * Add device checking and model_kwargs like we do in ExtractiveReader * Add release notes * Make a utility function for the device checking * Better warning message and updated ExtractiveReader to use the util function * Add unit tests for get_device * Fix pylint	2023-12-18 15:13:42 +01:00
Sebastian Husch Lee	3e0e81b1e0	feat: Add `meta_fields_to_embed` to `TransformersSimilarityRanker` (#6564) * Add initial implementation following SentenceTransformersDocumentEmbedder * Add test for embedding metadata * Add release notes * Update name * Fix tests and to dict * Fix release notes	2023-12-18 11:28:16 +01:00
bogdankostic	728383a149	fix: Make `TransformersSimilarityRanker` run with single document list (#6503) * Make `TransformersSimilarityRanker` run with single document list * Add release note * Remove unused import in test	2023-12-08 16:18:46 +01:00
Massimiliano Pippi	7c05f37a53	remove unit marker (#6450)	2023-11-29 19:24:25 +01:00
Silvano Cerza	e6637f5ec2	Fix all tests	2023-11-24 14:48:43 +01:00
Massimiliano Pippi	8adb8bbab8	Remove preview folder in test/ --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:52:55 +01:00

34 Commits