14 Commits

Author SHA1 Message Date
David S. Batista
7d51793727
chore: cleaning up unused imports in tests (#8887)
2025-02-20 16:56:16 +00:00
Hemanth Taduka
b5fb0d3ff8
fix: make pandas DataFrame optional in EvaluationRunResult (#8838)
* feat: AsyncPipeline that can schedule components to run concurrently (#8812)

* add component checks

* pipeline should run deterministically

* add FIFOQueue

* add agent tests

* add order dependent tests

* run new tests

* remove code that is not needed

* test: intermediate from cycle outputs are available outside cycle

* add tests for component checks (Claude)

* adapt tests for component checks (o1 review)

* chore: format

* remove tests that aren't needed anymore

* add _calculate_priority tests

* revert accidental change in pyproject.toml

* test format conversion

* adapt to naming convention

* chore: proper docstrings and type hints for PQ

* format

* add more unit tests

* rm unneeded comments

* test input consumption

* lint

* fix: docstrings

* lint

* format

* format

* fix license header

* fix license header

* add component run tests

* fix: pass correct input format to tracing

* fix types

* format

* format

* types

* add defaults from Socket instead of signature

- otherwise components with dynamic inputs would fail

* fix test names

* still wait for optional inputs on greedy variadic sockets

- mirrors previous behavior

* fix format

* wip: warn for ambiguous running order

* wip: alternative warning

* fix license header

* make code more readable

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Introduce content tracing to a behavioral test

* Fixing linting

* Remove debug print statements

* Fix tracer tests

* remove print

* test: test for component inputs

* test: remove testing for run order

* chore: update component checks from experimental

* chore: update pipeline and base from experimental

* refactor: remove unused method

* refactor: remove unused method

* refactor: outdated comment

* refactor: inputs state is updated as side effect

- to prepare for AsyncPipeline implementation

* format

* test: add file conversion test

* format

* fix: original implementation deepcopies outputs

* lint

* fix: from_dict was updated

* fix: format

* fix: test

* test: add test for thread safety

* remove unused imports

* format

* test: FIFOPriorityQueue

* chore: add release note

* feat: add AsyncPipeline

* chore: Add release notes

* fix: format

* debug: switch run order to debug ubuntu and windows tests

* fix: consider priorities of other components while waiting for DEFER

* refactor: simplify code

* fix: resolve merge conflict with mermaid changes

* fix: format

* fix: remove unused import

* refactor: rename to avoid accidental conflicts

* fix: track pipeline type

* fix: and extend test

* fix: format

* style: sort alphabetically

* Update test/core/pipeline/features/conftest.py

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Update test/core/pipeline/features/conftest.py

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Update releasenotes/notes/feat-async-pipeline-338856a142e1318c.yaml

* fix: indentation, do not close loop

* fix: use asyncio.run

* fix: format

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>

* updated changes for refactoring evaluations without pandas package

* added release notes for eval_run_result.py for refactoring EvaluationRunResult to work without pandas

* wip: cleaning and refactoring

* removing BaseEvaluationRunResult

* wip: fixing tests

* fixing tests and docstrings

* updating release notes

* fixing typing

* pylint fix

* adding deprecation warning

* fixing tests

* fixing type consistency

* adding stacklevel=2 to warning messages

* fixing docstrings

* fixing docstrings

* updating release notes

---------

Co-authored-by: mathislucka <mathis.lucka@gmail.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-17 14:43:54 +01:00
David S. Batista
0c9dc008f0
fix: improve context relevancy metric (#7964)
* fixing tests

* fixing tests

* updating tests

* updating tests

* updating docstring

* adding release notes

* making the insufficient information more robust

* updating docstring and release notes

* empty list instead of informative string

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* fixing tests

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* reverting commit

* reverting commit again

* fixing docstrings

* removing deprecation warning

* removing warning import

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-22 15:13:46 +02:00
David S. Batista
55513f7521
feat: EvaluationRunResult add parameter to specify columns to keep in the comparative Dataframe (#7879)
* adding param to explicitly state which cols to keep

* adding param to explicitly state which cols to keep

* adding param to explicitly state which cols to keep

* updating tests

* adding release notes

* Update haystack/evaluation/eval_run_result.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update releasenotes/notes/add-keep-columns-to-EvalRunResult-comparative-be3e15ce45de3e0b.yaml

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* updating docstring

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-06-17 18:08:52 +02:00
David S. Batista
ce9b0ecb19
fix: EvaluationRunResult.score_report() is missing the metrics column (#7817)
* fixing the DataFrame with the aggregated scores

* fixing tests
2024-06-06 14:33:45 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Madeesh Kannan
a881451d3a
refactor: Refactor EvaluationResult into BaseEvaluationRunResult and EvaluationRunResult (#7594)
The new `EvaluationRunResult` has slightly different semantics: it separates the previous `data` parameter into `inputs` and `results`, and expects aggregate scores to be provided in the latter.
2024-04-25 12:16:48 +02:00
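The `inputs`/`results` split described in the commit message above can be pictured with a minimal sketch. It uses plain dictionaries and does not call the library. The helper name `check_run_result`, the `"score"` and `"individual_scores"` keys, and the sample values are assumptions for illustration only, not the library's confirmed layout.

```python
from typing import Any, Dict, List


def check_run_result(
    inputs: Dict[str, List[Any]],
    results: Dict[str, Dict[str, Any]],
) -> Dict[str, float]:
    """Hypothetical helper: validate the two-part layout and
    return the aggregate score reported for each metric."""
    # Every input column should describe the same number of samples.
    lengths = {len(values) for values in inputs.values()}
    if len(lengths) > 1:
        raise ValueError("all input columns must have the same length")

    aggregates: Dict[str, float] = {}
    for metric, output in results.items():
        # The aggregate score is expected to live in `results`
        # (key name "score" is an assumption in this sketch).
        if "score" not in output:
            raise ValueError(f"aggregate score missing for metric '{metric}'")
        aggregates[metric] = float(output["score"])
    return aggregates


# Illustrative data only: pipeline inputs on one side, metric outputs on the other.
inputs = {"question": ["q1", "q2"], "answer": ["a1", "a2"]}
results = {"exact_match": {"score": 0.5, "individual_scores": [1, 0]}}
aggregates = check_run_result(inputs, results)
```

Keeping the aggregate next to the per-sample scores in `results` means a report can be built without recomputing anything from `inputs`.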
Silvano Cerza
cf221a9701
Delete old eval API (#6983)
2024-02-14 17:11:08 +01:00
Silvano Cerza
36ab23d360
feat: Add StatisticalEvaluator component (#6982)
* Add StatisticalEvaluator component

* Remove F1 and Exact Metric from old API

* Add release notes

* Update docstrings
2024-02-14 16:48:03 +01:00
Silvano Cerza
9297fca520
feat: Add SASEvaluator component (#6980)
* Add SASEvaluator component

* Add release notes

* Delete old tests

* Remove SAS metric in old API

* Avoid importing whole numpy package
2024-02-14 16:16:22 +01:00
Ashwin Mathur
393a7993c3
feat: Add Semantic Answer Similarity metric (#6877)
* Add SAS metric

* Add release notes

* Round similarity scores for precision consistency

* Add tolerance to tests

* Update haystack/evaluation/eval.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Add types for preprocess_text; Add additional types for f1 and em methods

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-02-02 17:07:52 +01:00
Ashwin Mathur
7217f9d9f0
feat: Add F1 metric (#6822)
* Add F1 metric

* Add release notes
2024-01-26 11:04:43 +01:00
Ashwin Mathur
a238c6dd51
feat: Add Exact Match metric (#6696)
* Add exact match metric

* Add release notes

* Cleanup comments in test_eval_exact_match.py

* Create separate preprocessing function; Add output_key parameter

* Update release note

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-01-22 09:57:04 +01:00
Ashwin Mathur
374a937663
feat: Add calculate_metrics and MetricsResult (#6680)
* Add calculate_metrics, MetricsResult, Exact Match

* Add additional tests for metric calculation

* Add release notes

* Add docstring for Exact Match metric

* Remove Exact Match Implementation

* Update release notes

* Remove unnecessary metrics implementation

* Simplify logic to run supported metrics

* Add some evaluation tests

* Fix linting

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-01-10 10:26:44 +01:00