34 Commits

Author SHA1 Message Date
Stefano Fiorucci
f2b5f123b3
del HF token in tests (#8634) 2024-12-13 09:50:23 +01:00
David S. Batista
b5a2fad642
feat: adding Maximum Margin Relevance Ranker (#8554)
* initial import

* linting

* adding MRR tests

* adding release notes

* fixing tests

* adding linting ignore to cross-encoder ranker

* update docstring

* refactoring

* making strategy Optional instead of Literal

* wip: adding unit tests

* refactoring MMR algorithm

* refactoring tests

* cleaning up and updating tests

* adding empty line between license + code

* bug in tests

* using Enum for strategy and similarity metric

* adding more tests

* adding empty line between license + code

* removing run time params

* PR comments

* PR comments

* fixing

* fixing serialisation

* fixing serialisation tests

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* fixing tests

* PR comments

* PR comments

* PR comments

* PR comments

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-22 14:58:45 +00:00
David S. Batista
e5a80722c2
feat: adding metadata grouper component (#8512)
* initial import

* making tests more readable; adding docstring

* adding release notes

* adding LICENSE header

* Update test/components/rankers/test_metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* refactoring

* fixing docstring

* fixing types

* test docstrings

* renaming test

* handling too-many-arguments

* liting

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* changing name

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* assiging value inside function for re-use

* improving docstring

* updating name to MetaFieldGroupingRanker

* adding to pydocs

* fixing imports

* adding output docstring

* Update haystack/components/rankers/meta_field_grouper_ranker.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/rankers/__init__.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/add-metadata-grouper-21ec05fd4a307425.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/rankers/test_metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* update docstring tests

* fixing imports

* rename modules for consistency

* fix pydocs

* simplification + more tests

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-12 16:01:53 +01:00
Sebastian Husch Lee
7227bcf9df
feat: TransformerSimilarityRanker add batching across Documents during inference (#8344)
* First pass at adding batch support to TransformersSimilarityRanker

* Add test

* Add reno
2024-09-11 12:47:29 +02:00
Sebastian Husch Lee
c90495c2e8
feat: Add model and tokenizer kwargs to TransformersSimilarityRanker, SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder (#8145)
* Start adding model and tokenizer kwargs support

* Add model and tokenizer kwargs to doc embedder

* Some updates and fixes in tests

* Fix more tests

* Fix tests

* Add release note

* Fix test

* Add from_dict tests
2024-08-02 10:37:10 +02:00
Sebastian Husch Lee
c121c86c4c
fix: Fix from_dict methods of components using HF models to work with default values (#8003)
* Fix from_dict to work if device isn't provided in init params

* Minor refactoring of from_dict for components that load HF models

* Add tests

* Update tests to test loading with all default parameters

* Add more tests

* Add release notes

* Add unit test for whisper local

* Update reno

* Add fix for ExtractiveReader

* Fix NamedEntityExtractor
2024-07-10 12:18:05 +02:00
Vladimir Blagojevic
535a281eec
feat: Add option to use HF_TOKEN as env var for authentication across all HF components (#7942)
* Read both HF_API_TOKEN and HF_TOKEN env vars in all HF related components

* Add reno note

* Test fixes

* More test updates

* More test updates
2024-06-27 10:31:58 +02:00
Rob Pasternak
28dd0f5596
feat: Add options for what to do with missing metadata fields in MetaFieldRanker (#7700)
* Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation.

* Implement `missing_meta` functionality in `run()`.

* Finish first draft of revised `MetaFieldRanker` functionality.

* Add tests for `MetaFieldRanker` `missing_meta` functionality.

* Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation.

* Implement `missing_meta` functionality in `run()`.

* Finish first draft of revised `MetaFieldRanker` functionality.

* Add tests for `MetaFieldRanker` `missing_meta` functionality.

* Add release notes for new `missing_meta` param of `MetaFieldRanker`

* Move part of docs_missing_meta_field warning string outside of `if...elif...else`.
2024-06-12 10:42:02 +02:00
Massimiliano Pippi
0ceeb733ba
chore: make warm_up() usage consistent (#7752)
* make  usage consistent

* fix error type

* release notes

* pylint fix

* change of plan

* revert

* fix test

* revert

* fix HF tests

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* fix formatting

* reformat

* fix regex match with the new error message

* fix integration test

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-05-29 10:54:21 +02:00
Sebastian Husch Lee
a2be90b95a
fix: Update device deserialization for components that use local models (#7686)
* fix: Update device deserializtion for SentenceTransformersTextEmbedder

* Add unit test

* Fix unit test

* Make same change to doc embedder

* Add release notes

* Add same change to Diversity Ranker and Named Entity Extractor

* Add unit test

* Add the same for whisper local

* Update release notes
2024-05-14 08:36:14 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Silvano Cerza
ff269db12d
Fix unit tests failing if HF_API_TOKEN is set (#7491) 2024-04-05 18:05:43 +02:00
Mohit Lal
280719339c
bug: run parameter "ranking_mode" does not override init param in meta field ranker (#7375)
* bug: run parameter ranking_mode does not override init param in metafield ranker

* Added a release note

* Used pytest.approx for comparing floating point numbers in unit test
2024-03-19 07:53:26 +01:00
Ashwin Mathur
38b3472bb2
feat: Add SentenceTransformersDiversityRanker (#7095)
* Add Diversity Ranker

* Update tests

* Add separate suffix, prefix params for query and documents; allow empty query

* Update docstrings

* Make changes based on review

* Add additional tests

* Add test for warm up

* Update release notes

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-03-11 13:14:59 +01:00
Stefano Fiorucci
38a80b0235
fix: MetaFieldRanker - use weight if passed in the run method (#7305)
* fix:  - use  if passed in the  method

* reno
2024-03-05 12:13:56 +01:00
Julian Risch
c1c0cbfde4
docs: Update docs of MetaFieldRanker, TransformersSimilarityRanker (#7301)
* docs: Update docstrings of MetaFieldRanker and TransformersSimilarityRanker

* add warm_up() call to usage example

* Apply suggestions from code review

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* show result of usage example

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-03-05 10:20:18 +01:00
Julian Risch
9a0e2e58fd
docs: Added LostInTheMiddleRanker usage example and updated docstrings (#7294)
* docs: Added LostInTheMiddleRanker usage example

* remove to_dict test

* explain LITM in more detail
2024-03-04 15:42:51 +01:00
Varun Mathur
b335b5d723
feat: Add Lost In The Middle Ranker (#6995)
* add lost in the middle ranker

* update

* add release notes

* update release notes

* fix mypy

* Update

* fix mypy

* fix mypy [union-attr] for content.split

* remove e2e tests and negative topk param

* remove query param, validate params

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-02-20 19:55:41 +01:00
Sebastian Husch Lee
ea7275955d
feat: Meta field ranker add meta_value_type (#6977)
* Update MetaFieldRanker to parse string meta values based on meta_value_type

* Add some unit tests

* Add another unit test

* Add release notes

* Fix mypy

* Fix pylint

* Add more unit tests

* Update release notes

* Update docs

* Further improve doc strings
2024-02-13 13:08:35 +01:00
Madeesh Kannan
27d1af3068
feat!: Use Secret for passing authentication secrets to components (#6887)
* feat!: Use `Secret` for passing authentication secrets to components

* Add comment to clarify type ignore
2024-02-05 13:17:01 +01:00
Sebastian Husch Lee
ceda4cd655
feat: Add support for device_map (#6679)
* Getting device_map working to support 8bit loading and multi device inference

* Update to take account the device specified by the user

* add release notes

* Add device_map support for ExtractiveReader

* Update test

* Update to model that doesn't have issues

* Update test

* Update pytest approx

* Update release notes

* Start supporting device map

* Update ExtractiveReader to use new ComponentDevice

* Update similarity ranker to follow extractive reader implementation

* Fixing pylint

* Make mypy mostly happy

* Add new unit test to test device_map

* Adding unit tests

* Some refactoring

* Add more tests

* Add more tests

* Add another unit test

* Update first_device property to return a ComponentDevice to be able to use the to methods

* Updating tests for test_device

* Update tests and now explicitly modify device_map in model_kwargs

* Update haystack/utils/hf.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Make mypy happy

* mypy

* Remove unneeded optional flag

* Update ExtractiveReader with new logic

* Update ranker to follow new logic

* Removing unneeded code

* Make mypy happy

* fxi pylint

* Fix test

* Adding unit tests for device_map="auto"

* Add unit tests for ranker

* PR comments

* Make util method

* Adding unit tests

* Fix type annotation

* Fix pylint

* Fix test

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-01-30 13:47:57 +01:00
Sebastian Husch Lee
3bea3b1714
feat: Add query and document prefix options for the TransformerSimilarityRanker (#6826)
* Add query and doc prefix

* Fix some tests

* add release notes
2024-01-25 15:29:19 +01:00
Madeesh Kannan
7376838922
feat!: Framework-agnostic device management (#6748)
* feat: Framework-agnostic device management

* Add release note

* Linting

* Fix test

* Add `first_device` property, expand release notes, validate `ComponentDevice` state
2024-01-17 10:41:34 +01:00
Sebastian Husch Lee
20f04f6054
feat: MetaFieldRanker update (#6742)
* Add weight and ranking_mode as params to run for easier experimentation

* renaming of metadata to meta

* User logger.warning instead of warnings

* Add another unit test

* Add support for sort_order and fix formatting of error messages

* Make MetaFieldRanker more robust. Doesn't crash pipeline if some Documents are missing keys.

* Don't print same warning message twice

* Add another test

* Making MetaFieldRanker more robust

* Move up if return statement to earlier in the function

* Setting up infer_type

* Remove infer_type for now

* Release notes

* Add init file

* Update releasenotes/notes/metafieldranker_sort-order_refactor-2000d89dc40dc15a.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-01-16 08:52:58 +01:00
ZanSara
0616197b44
feat!: Rename model_name_or_path to model in TransformersSimilarityRanker (#6734)
* rename model parameter in transformers ranker

* fix tests for transformers ranker

* reno

* reno

* typo
2024-01-12 17:09:12 +01:00
Stefano Fiorucci
80c3e6825a
fix: serialize/deserialize torch dtype in the components that need it (#6713)
* first draft for ranker

* same for the reader

* consider also bnb_4bit_compute_dtype

* dtype serialization in hugging_face_local_generator

* add release note

* address dtype defined in huggingface_pipeline_kwargs

* test quantization options in reader

* fix

* serialize quantization_config

* test quantization_config serialization

* address feedback

* fix typo
2024-01-12 12:22:45 +01:00
Sebastian Husch Lee
beade1cef9
feat: Add scaling and thresholding of the similarity ranker scores (#6683)
* Add scale_score functionality to the TransformersSimilarityRanker

* Updated test to check scores

* Use pytest approx when comparing floats

* Updated how scale score works and added calibration factor. Started to add score threshold.

* Add support for score_threshold

* Add some parameters to the run method

* Add release notes

* Fix mypy

* Be more tolerant on the score values

* Adding unit test for scale_score=False

* Add unit test for score threshold

* Update tests

* Rename test

* Fix typo

* PR comments
2024-01-08 09:05:24 +01:00
Stefano Fiorucci
c773c30c66
refactor!: rename all remaining metadata to meta (#6650)
* change metadata to meta

* release note
2023-12-28 12:18:15 +01:00
Sebastian Husch Lee
c294b8ac8c
feat: Add auto device checks and model_kwargs to TransformersSimilarityRanker (#6561)
* Add device checking and model_kwargs like we do in ExtractiveReader

* Add release notes

* Make a utility function for the device checking

* Better warning message and updated ExtractiveReader to use the util function

* Add unit tests for get_device

* Fix pylint
2023-12-18 15:13:42 +01:00
Sebastian Husch Lee
3e0e81b1e0
feat: Add meta_fields_to_embed to TransformersSimilarityRanker (#6564)
* Add initial implementation following SentenceTransformersDocumentEmbedder

* Add test for embedding metadata

* Add release notes

* Update name

* Fix tests and to dict

* Fix release notes
2023-12-18 11:28:16 +01:00
bogdankostic
728383a149
fix: Make TransformersSimilarityRanker run with single document list (#6503)
* Make `TransformersSimilarityRanker` run with single document list

* Add release note

* Remove unused import in test
2023-12-08 16:18:46 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker (#6450) 2023-11-29 19:24:25 +01:00
Silvano Cerza
e6637f5ec2 Fix all tests 2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00