Stefano Fiorucci
5ae94886b2
fix: fix test failures with Transformers models in PRs from forks ( #8809 )
...
* trigger
* try pinning sentence transformers
* make integr tests run right away
* pin transformers instead
* older transformers version
* rm transformers pin
* try ignoring cache
* change ubuntu version
* try removing token
* try again
* more HF_API_TOKEN local deletions
* restore test priority
* rm leftover
* more deletions
* moreee
* more
* deletions
* restore jobs order
2025-02-04 19:08:37 +01:00
Ajit Singh
6cf13e8b98
enhancement: reduced usage of numpy and substituted built-in libraries ( #8418 )
...
* reduced usage of numpy and substituted built-in libraries
* added release note
* edited expit function to support both float as well as list (this case was giving error CI)
* revert code , numpy can't be removed here
* more cleaning
* fix relnote
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-10-18 15:42:19 +02:00
Julian Risch
08686d90af
feat: Add DocumentNDCGEvaluator component ( #8419 )
...
* draft new component and tests
* draft new component and tests
* fix tests, replace usage of get_attr
* improve docstrings, refactor tests
* add test for mixed documents w/wo scores
* add test with multiple lists and update docstring
* validate inputs, add tests, make methods static
* change fallback to binary relevance
* rename validate_init_parameters to validate_inputs
2024-10-01 16:15:02 +02:00
Sriniketh J
066e2e3ec5
Make api_key param optional in LLMEvaluator ( #8340 )
2024-09-20 10:47:13 +02:00
Madeesh Kannan
672bcf7e03
fix: Add constraints to `set_input_type(s)` based on `run` method ( #8358 )
...
* fix: Prevent the usage of `set_input_type(s)` when the `run` method doesn't have kwargs,
raise if `set_input_type(s)` overrides `run` method parameters
* fix: update components and tests
* reno
2024-09-12 15:58:16 +02:00
David S. Batista
0c9dc008f0
fix: improve context relevancy metric ( #7964 )
...
* fixing tests
* fixing tests
* updating tests
* updating tests
* updating docstring
* adding release notes
* making the insufficient information more robust
* updating docstring and release notes
* empty list instead of informative string
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* fixing tests
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* reverting commit
* reverting again commit
* fixing docstrings
* removing deprecation warning
* removing warning import
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-22 15:13:46 +02:00
Ulises M
6f8834d036
feat: add and expose api_params for OpenAIGenerator in LLMEvaluator based classes ( #7987 )
...
* initial support for api_params
* add tests and reno
* resolve suggestions and add integration test
* fix mypy
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-07-11 13:14:03 +02:00
David S. Batista
186512459d
feat: LLM-based evaluators return meta info from OpenAI ( #7947 )
...
* LLM-Evaluator returns metadata from OpenAI
* adding tests
* adding release notes
* updating test
* updating release notes
* fixing live tests
* attending PR comments
* fixing tests
* Update releasenotes/notes/adding-metadata-info-from-OpenAI-f5309af5f59bb6a7.yaml
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update llm_evaluator.py
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-02 11:31:51 +02:00
Vladimir Blagojevic
535a281eec
feat: Add option to use `HF_TOKEN` as env var for authentication across all HF components ( #7942 )
...
* Read both HF_API_TOKEN and HF_TOKEN env vars in all HF related components
* Add reno note
* Test fixes
* More test updates
* More test updates
2024-06-27 10:31:58 +02:00
David S. Batista
8b9eddcd94
fix: explicitly tell `ContextRelevanceEvaluator` that each statement should be scored ( #7904 )
...
* initial import
* adding release notes
* adding pytest decorator for live test
* make examples more readable
* updating tests
* reverting progress_bar = False
2024-06-25 16:59:37 +02:00
Amna Mubashar
fc011d7b04
bug: fix MRR and MAP calculations ( #7841 )
...
* bug: fix MRR and MAP calculations
2024-06-25 12:07:11 +02:00
Ulises M
9c45203a76
fix: check for None in SAS eval input ( #7909 )
...
* check for None in SAS input
* Update releasenotes/notes/check-for-None-SAS-eval-0b982ccc1491ee83.yaml
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-06-21 14:22:33 +02:00
Madeesh Kannan
fe60eedee9
fix: Fix deserialization of pipelines that contain `LLMEvaluator` subclasses ( #7891 )
2024-06-19 13:47:38 +02:00
Madeesh Kannan
63226dad34
fix: Fix `LLMEvaluator` serialization ( #7818 )
...
* fix: Fix `LLMEvaluator` serialization
* `reno`
2024-06-07 12:49:23 +02:00
David S. Batista
38747ff7a3
fix: failsafe for non-valid json and failed LLM calls ( #7723 )
...
* wip
* initial import
* adding tests
* adding params
* adding safeguards for nan in evaluators
* adding docstrings
* fixing tests
* removing unused imports
* adding tests to context and faithfulness evaluators
* fixing docstrings
* nit
* removing unused imports
* adding release notes
* attending PR comments
* fixing tests
* fixing tests
* adding types
* removing unused imports
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Update haystack/components/evaluators/faithfulness.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* attending PR comments
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-05-23 15:41:29 +00:00
David S. Batista
a4fc2b66e6
style: adding progress bar to llm-based evaluators ( #7726 )
...
* adding progress bar
* fixing typo
* fixing tests
* Update test_llm_evaluator.py
* fixing missing colon
* passing directly to parent
* adding docstrings
2024-05-23 09:22:14 +02:00
David S. Batista
798dc4a4a5
fix: avoid FaithfulnessEvaluator and ContextRelevanceEvaluator return Nan ( #7685 )
...
* initial import
* fixing tests
* relaxing condition
* adding safeguard for ContextRelevanceEvaluator as well
* adding release notes
2024-05-14 17:08:51 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules ( #7675 )
...
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Stefano Fiorucci
94467149c1
fix: fix serialization of `DocumentRecallEvaluator` ( #7662 )
...
* fix serialization of DocumentRecallEvaluator
* add requested tests
2024-05-08 16:00:49 +02:00
Julian Risch
2509eeea7e
refactor: Rename FaithfulnessEvaluator input responses to predicted_answers ( #7621 )
2024-04-30 16:30:57 +02:00
Madeesh Kannan
a881451d3a
refactor: Refactor `EvaluationResult` into `BaseEvaluationRunResult` and `EvaluationRunResult` ( #7594 )
...
The new `EvaluationRunResult` has slightly different semantics - it separates the previous `data` parameter into `inputs` and `results` and expects aggregate scores to be provided in the latter.
2024-04-25 12:16:48 +02:00
Julian Risch
9c56dbe288
test: Make ContextRelevanceEvaluator integration test more robust ( #7584 )
2024-04-23 16:01:25 +00:00
Julian Risch
07307709ee
test: Make FaithfulnessEvaluator integration test more robust ( #7582 )
2024-04-23 15:44:00 +00:00
Julian Risch
d7638cfd4b
refactor: FaithfulnessEvaluator specifies inputs explicitly ( #7548 )
...
* specify inputs explicitly. move out examples
* Update haystack/components/evaluators/faithfulness.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-22 12:52:10 +00:00
Julian Risch
b12e0db134
feat: Add ContextRelevanceEvaluator component ( #7519 )
...
* feat: Add ContextRelevanceEvaluator component
* reno
* fix expected inputs and example docstring
* remove responses parameter from tests
* specify inputs explicitly
* add new evaluator to api reference docs
2024-04-22 14:10:00 +02:00
Massimiliano Pippi
2bad5bcb96
refactor: AnswerExactMatchEvaluator component inputs ( #7536 )
...
* refactor component inputs
* release notes
* Update class docstring
* pylint
* update existing note instead of creating a new one
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-04-12 06:59:16 +00:00
David S. Batista
9a9c8aa1c8
feat: implementing evaluation results API ( #7520 )
...
* initial import
* adding tests
* attending PR comments
* fixing tests
* updating tests
* updating tests and code
* renaming
* fixing linting issues
* adding release notes
* adding docstrings
* latest fixes
2024-04-10 13:34:03 +00:00
Julian Risch
e974a23fa3
docs: Fix eval metric examples in docstrings ( #7505 )
...
* fix eval metric docstrings, change type of individual scores
* change import order
* change exactmatch docstring to single ground truth answer
* change exactmatch comment to single ground truth answer
* reverted changing docs to single ground truth
* add warm up in SASEvaluator example
* fix FaithfulnessEvaluator docstring example
* extend FaithfulnessEvaluator docstring example
* Update FaithfulnessEvaluator init docstring
* Remove outdated default from LLMEvaluator docstring
* Add examples param to LLMEvaluator docstring example
* Add import and print to LLMEvaluator docstring example
2024-04-10 11:00:20 +02:00
David S. Batista
aae2b31359
fix: typo in sas_evaluator arg ( #7486 )
...
* fixing typo on SAS arg
* fixing tests
* fixing tests
2024-04-08 10:21:37 +02:00
Julian Risch
9d02dc607a
feat: Add FaithfulnessEvaluator component ( #7424 )
...
* draft FaithfulnessEvaluator
* reno
* calculate score per statement and aggregate
* Update release note
* update default values in tests and fix import path
* remove instructions, inputs, outputs params
* remove unused imports
* add expected format example to docstring
* remove name 'llm' from tests and docstring
2024-04-04 16:33:59 +00:00
Julian Risch
8ef6062748
refactor: Remove name 'llm' from LLMEvaluator output ( #7479 )
2024-04-04 15:19:30 +00:00
Silvano Cerza
8b8a93bc0d
refactor: Rename `DocumentMeanAveragePrecision` and `DocumentMeanReciprocalRank` ( #7470 )
...
* Rename DocumentMeanAveragePrecision and DocumentMeanReciprocalRank
* Update releasenotes
* Simplify names
2024-04-04 17:04:59 +02:00
Silvano Cerza
bdc25ca2a0
feat: Add `DocumentMeanReciprocalRank` ( #7468 )
...
* Add DocumentMeanReciprocalRank
* Fix float precision error
2024-04-04 14:55:37 +02:00
Silvano Cerza
7799909069
feat: Add `DocumentMeanAveragePrecision` ( #7461 )
...
* Add DocumentMeanAveragePrecision
* Remove questions input
* Update docstrings
* Update haystack/components/evaluators/document_map.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-04 14:15:45 +02:00
Silvano Cerza
dc87f51759
refactor: Remove `questions` inputs from evaluators ( #7466 )
...
* Remove questions input from AnswerExactMatchEvaluator
* Remove questions input from DocumentRecallEvaluator
2024-04-04 14:14:18 +02:00
Silvano Cerza
12acb3f12e
feat: Add `SASEvaluator` ( #7428 )
...
* Add SASEvaluator
* Add release notes
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Simplify similarity calculation with bi-encoders models
* Fix linting
* Update docstrings
* Move tensor to CPU after calculating cosine similarity
* Fix CI failing
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-04 10:10:41 +02:00
Silvano Cerza
685343d13f
feat: Add `DocumentRecallEvaluator` ( #7399 )
...
* Add DocumentRecallEvaluator
* Fix mypy error
* Simplify recall logic and change output for single hit mode
* Remove unused import
* Add comment for RecallMode fields
* Reword RecallMode comments
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-03-26 16:15:03 +01:00
Silvano Cerza
f398b29e7f
feat: Change outputs of AnswerExactMatchEvaluator ( #7390 )
...
* Change outputs of AnswerExactMatchEvaluator
* Changes scores to return the number of matches per question
* Revert "Changes scores to return the number of matches per question"
This reverts commit e4358720793d4584b0b961402d4557c50c4c2381.
* Change output names
2024-03-26 10:57:59 +01:00
Julian Risch
bfd0d3eacd
feat: Add new LLMEvaluator component ( #7401 )
...
* draft llm evaluator
* docstrings
* flexible inputs; validate inputs and outputs
* add tests
* add release note
* remove example
* docstrings
* make outputs parameter optional. default:
* validate init parameters
* linting
* remove mention of binary scores from template
* make examples and outputs params non-optional
* removed leftover from optional outputs param
* simplify building examples section for template
* validate inputs and outputs in examples are dict with str as key
* fix pylint too-many-boolean-expressions
* increase test coverage
2024-03-25 07:05:27 +01:00
Silvano Cerza
610ad6f6b2
Add `AnswerExactMatchEvaluator` ( #7381 )
...
* Add AnswerExactMatchEvaluator
* Add release notes
* Fix linting
* Update docstrings
* Update docstrings
* Remove to_dict and from_dict
* Fix linting
2024-03-19 16:58:01 +01:00
Silvano Cerza
0a7dfc1b32
Revert "Add `AnswerExactMatchEvaluator` ( #7050 )" ( #7075 )
...
This reverts commit b4011af8e9bc4ae2f72e51db254bfda69e20b651.
2024-02-23 14:05:57 +01:00
Silvano Cerza
b4011af8e9
Add `AnswerExactMatchEvaluator` ( #7050 )
...
* Add AnswerExactMatchEvaluator
* Add release notes
* Fix linting
* Update docstrings
2024-02-23 10:37:18 +01:00
Silvano Cerza
8ca4bf405b
Remove all evaluator components ( #7053 )
2024-02-21 18:24:14 +01:00
Ashwin Mathur
327c2d260d
feat: Add Mean Reciprocal Rank (MRR) metric to `StatisticalEvaluator` ( #7042 )
...
* Add MRR Metric
* Add release notes
* Update logic
2024-02-20 13:58:48 +01:00
Silvano Cerza
9215882779
Add Recall Multi Hit and Single Hit metric ( #7038 )
2024-02-19 18:00:39 +01:00
Silvano Cerza
6fe1d3b595
refactor: Clean eval components ( #7005 )
...
* Remove preprocess.py
* Rename eval components to evaluators
2024-02-15 17:17:59 +01:00