93 Commits

Author SHA1 Message Date
Sebastian Husch Lee
b4fd38dcbe
remove unneeded test (#10221) 2025-12-11 11:11:38 +01:00
Abdelrahman Kaseb
b9a34dfebf
Fix: prevent in-place mutation of documents in Document Classifiers and Extractors (#9703)
* modify Documents Classifiers and Extractors to not make in-place changes

* Add e2e test for NER

* Add unit test for NER

* fixes + refinements

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-12 15:20:44 +02:00
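The fix above follows a common pattern: components return annotated copies instead of writing into the caller's objects. Below is a minimal sketch of that pattern using `dataclasses.replace`. It assumes a hypothetical `Document` dataclass and uses a toy ASCII heuristic in place of a real classifier, so it is not the Haystack implementation.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass
class Document:
    # Hypothetical stand-in for a document with text content and a metadata dict
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def classify_by_script(documents: list[Document]) -> list[Document]:
    """Return annotated copies of the input Documents; the inputs are left untouched."""
    annotated = []
    for doc in documents:
        # Toy heuristic (assumption): pure-ASCII text is "ascii", anything else "non-ascii"
        label = "ascii" if doc.content.isascii() else "non-ascii"
        # Build a new meta dict instead of writing into doc.meta in place
        annotated.append(replace(doc, meta={**doc.meta, "script": label}))
    return annotated


originals = [Document("Hello world"), Document("Grüße")]
results = classify_by_script(originals)
print(originals[0].meta)  # {}
print(results[0].meta)    # {'script': 'ascii'}
```

`replace` only copies shallowly. Without the fresh `meta` dict, the copy and the original would share one dictionary.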
Abdelrahman Kaseb
5f3c37d287
chore: adopt PEP 585 type hints (#9678)
* chore(lint): enforce and apply PEP 585 type hinting

* Run fmt fixes

* Fix all typing imports using some regex

* Fix all typing written in string in tests

* undo changes in the e2e tests

* make e2e test use list instead of List

* type fixes

* remove type:ignore

* pylint

* Remove typing from Usage example comments

* Remove typing from most of comments

* try to fix e2e tests on comm PRs

* fix

* Add typing.List in tests to adjust test compatibility
- test/components/agents/test_state_class.py
- test/components/converters/test_output_adapter.py
- test/components/joiners/test_list_joiner.py

* simplify pyproject

* improve relnote

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-07 10:23:14 +02:00
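PEP 585 (Python 3.9+) allows builtin collections to be subscripted directly in annotations, replacing the `typing` aliases. The short illustration below puts the two spellings side by side. The function names are hypothetical.

```python
from typing import List, get_args, get_origin


# Pre-PEP 585 spelling: generic alias imported from typing
def join_legacy(parts: List[str]) -> str:
    return " ".join(parts)


# PEP 585 spelling (Python 3.9+): subscript the builtin type directly
def join_modern(parts: list[str]) -> str:
    return " ".join(parts)


# Both annotations share the same origin and arguments, so type checkers treat them alike
print(get_origin(list[str]), get_args(list[str]))  # <class 'list'> (<class 'str'>,)
print(get_origin(List[str]), get_args(List[str]))  # <class 'list'> (<class 'str'>,)
```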
Stefano Fiorucci
d059cf2c23
feat: add skip_empty_documents init parameter to DocumentSplitter (#9649)
* feat: add skip_empty_documents init parameter to DocumentSplitter

* improve test

* fix + relnote
2025-07-24 11:26:11 +02:00
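The sketch below shows one way such a flag could behave, using a hypothetical word splitter and `Document` type. It does not reproduce the real `DocumentSplitter` options or its handling of empty inputs. In this sketch, empty documents are dropped by default and passed through unchanged when the flag is `False`.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical minimal document type
    content: str


def split_by_word(
    documents: list[Document], split_length: int, skip_empty_documents: bool = True
) -> list[Document]:
    """Hypothetical word-based splitter illustrating a skip_empty_documents flag."""
    chunks: list[Document] = []
    for doc in documents:
        if not doc.content.strip():
            # Empty or whitespace-only content: drop it, or pass it through unchanged
            if not skip_empty_documents:
                chunks.append(doc)
            continue
        words = doc.content.split()
        for start in range(0, len(words), split_length):
            chunks.append(Document(" ".join(words[start : start + split_length])))
    return chunks


docs = [Document("a b c d e"), Document("   ")]
print([d.content for d in split_by_word(docs, 2)])  # ['a b', 'c d', 'e']
```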
Stefano Fiorucci
bcaef53cbc
test: export HF_TOKEN env var in e2e environment (#9551)
* try to fix e2e tests for private NER models

* explanatory comment

* extend skipif condition
2025-06-25 15:00:28 +02:00
Stefano Fiorucci
de5c7ea3d2
feat: add py.typed; adjust Component protocol (#9329)
* experimenting with py.typed

* try changing run method in protocol

* Trigger Build

* better docstring + release note

* remove type:ignore where possible

* Removed a few more type: ignores

---------

Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2025-05-07 09:34:31 +02:00
David S. Batista
03505678e2
removing unused imports (#9172) 2025-04-04 11:16:44 +02:00
Stefano Fiorucci
019c238dd0
test: stop drawing pipelines in e2e tests (#9164) 2025-04-04 10:50:05 +02:00
David S. Batista
ed931b4c2b
fix: adding pylint disable for EvalRunResult end2endtest (#9054) 2025-03-18 11:20:11 +01:00
David S. Batista
de76d20f12
fix: updating end2end evaluation tests (#9053)
* updating tests

* fixing tests, the default is now a JSON object and no longer a DataFrame

* cleaning up leftovers
2025-03-18 10:52:05 +01:00
Michele Pangrazzi
c192488bf6
Named entity extractor private models (#8658)
* add 'token' support to NamedEntityExtractor to enable using private models on HF backend

* fix existing error message format

* add release note

* add HF_API_TOKEN to e2e workflow

* add informative comment

* Updated to_dict / from_dict to handle 'token' correctly ; Added tests

* Fix lint

* Revert unwanted change
2024-12-20 11:15:55 +01:00
David S. Batista
db89b9a2e5
fix: removing unused import (#8636) 2024-12-13 12:35:58 +01:00
David S. Batista
176db5dbf9
initial import (#8635) 2024-12-13 12:12:40 +01:00
David S. Batista
97126eb544
fix: changing default model to gpt-4o-mini on OpenAI API calls (#8360)
* changing default model to gpt-4o-mini

* adding release notes

* fixing some missed tests

* fixing some more missed tests

* fixing one last missed test

* fixing linting issues

* making pylint happy about an end2end test

* changing if test to walrus operator

* fixing azure embedder from ada to text-embedding-ada-002
2024-09-17 10:36:42 +02:00
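The walrus-operator change refers to an assignment expression (PEP 572, Python 3.8+), which binds a value and tests it in a single `if`. The hedged illustration below is not the actual test code. `MODEL_OVERRIDE` and `resolve_model` are hypothetical names.

```python
import os
from collections.abc import Mapping


def resolve_model(env: Mapping[str, str], default: str = "gpt-4o-mini") -> str:
    # Assignment expression: bind the lookup result and test it in one condition.
    # MODEL_OVERRIDE is a hypothetical variable name used only for this illustration.
    if override := env.get("MODEL_OVERRIDE"):
        return override
    return default


print(resolve_model(os.environ))
```

An empty string is falsy, so an empty `MODEL_OVERRIDE` falls back to the default.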
David S. Batista
276ff3c104
test evaluation pipeline failing (#7823) 2024-06-07 11:26:18 +02:00
Silvano Cerza
26b263e349
Fix InMemoryDocumentStore not sharing some document stats with other instances (#7792) 2024-06-04 10:15:50 +02:00
Julian Risch
6723dc3801
check for RuntimeError instead of ComponentError in test (#7769) 2024-05-31 08:42:40 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Julian Risch
48c7c6ad26
test: Rename responses and use preds instead of ground truth answers in e2e eval test (#7640)
* rename responses, use preds instead of ground truth answers

* fix typo in component name
2024-05-03 12:48:42 +02:00
David S. Batista
8d04e530da
test: end2end evaluation tests (#7601)
* initial import

* wip

* cleaning up tests

* fixing tests

* adding context relevance

* reverting some wrong changes due to a PyCharm error in refactoring

* building eval pipeline only once

* handling mypy issues
2024-04-26 14:07:05 +00:00
Silvano Cerza
d66b5358a1
Remove eval end to end tests (#7093) 2024-02-26 12:27:15 +01:00
Vladimir Blagojevic
d2497d54e8
Update to use the default Secret.from_env_var(OPENAI_API_KEY) approach (#6941) 2024-02-09 14:15:45 +01:00
Ashwin Mathur
393a7993c3
feat: Add Semantic Answer Similarity metric (#6877)
* Add SAS metric

* Add release notes

* Round similarity scores for precision consistency

* Add tolerance to tests

* Update haystack/evaluation/eval.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Add types for preprocess_text; Add additional types for f1 and em methods

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-02-02 17:07:52 +01:00
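A bi-encoder flavour of Semantic Answer Similarity scores each prediction against its ground truth by the cosine similarity of their embeddings. The sketch below assumes the embeddings are already computed, with toy 2-D vectors standing in for a real sentence-embedding model. It rounds scores as the commit describes. It does not cover a cross-encoder variant and is not the Haystack code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_answer_similarity(
    prediction_embeddings: list[list[float]],
    ground_truth_embeddings: list[list[float]],
    decimals: int = 2,
) -> list[float]:
    """Pairwise cosine scores, rounded so repeated runs compare consistently."""
    if len(prediction_embeddings) != len(ground_truth_embeddings):
        raise ValueError("Expected one ground-truth embedding per prediction")
    return [
        round(cosine_similarity(p, g), decimals)
        for p, g in zip(prediction_embeddings, ground_truth_embeddings)
    ]


# Toy 2-D vectors standing in for sentence embeddings from a real model
preds = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
truths = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(semantic_answer_similarity(preds, truths))  # [1.0, 0.0, 0.71]
```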
Ashwin Mathur
7217f9d9f0
feat: Add F1 metric (#6822)
* Add F1 metric

* Add release notes
2024-01-26 11:04:43 +01:00
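Token-level F1, in the style popularised by SQuAD, is the harmonic mean of token precision and recall between a prediction and a reference answer. The minimal sketch below assumes whitespace tokenization and lowercasing only. The committed metric's preprocessing options may differ.

```python
from collections import Counter


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between two strings (whitespace tokens, case-insensitive)."""
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most min(count_a, count_b) times
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(f1_score("the capital is Paris", "Paris"))  # 0.4
```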
Ashwin Mathur
a238c6dd51
feat: Add Exact Match metric (#6696)
* Add exact match metric

* Add release notes

* Cleanup comments in test_eval_exact_match.py

* Create separate preprocessing function; Add output_key parameter

* Update release note

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-01-22 09:57:04 +01:00
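Exact Match counts a prediction as correct only when it equals the reference after normalization, which is why a separate preprocessing step matters. The sketch below uses a hypothetical `preprocess_text` with case and punctuation options. The parameter names are assumptions, not the committed signature.

```python
import string


def preprocess_text(
    texts: list[str], ignore_case: bool = True, ignore_punctuation: bool = True
) -> list[str]:
    """Normalize strings before comparison (hypothetical options)."""
    strip_punct = str.maketrans("", "", string.punctuation)
    normalized = []
    for text in texts:
        if ignore_case:
            text = text.lower()
        if ignore_punctuation:
            text = text.translate(strip_punct)
        normalized.append(text.strip())
    return normalized


def exact_match(
    predictions: list[str],
    ground_truths: list[str],
    ignore_case: bool = True,
    ignore_punctuation: bool = True,
) -> float:
    """Fraction of predictions identical to their reference after preprocessing."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        return 0.0
    preds = preprocess_text(predictions, ignore_case, ignore_punctuation)
    truths = preprocess_text(ground_truths, ignore_case, ignore_punctuation)
    return sum(p == t for p, t in zip(preds, truths)) / len(preds)


print(exact_match(["Paris.", "Berlin"], ["paris", "Rome"]))  # 0.5
```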
Madeesh Kannan
6a1514550e
test: Update E2E tests to use Pipeline.dump/load (#6756) 2024-01-17 15:09:27 +01:00
Madeesh Kannan
7376838922
feat!: Framework-agnostic device management (#6748)
* feat: Framework-agnostic device management

* Add release note

* Linting

* Fix test

* Add `first_device` property, expand release notes, validate `ComponentDevice` state
2024-01-17 10:41:34 +01:00
Madeesh Kannan
d6cafeaff3
test: Rename RAG E2E test file (#6750)
Before this change, identical test names in this file and in the integration/unit test file broke `pytest` workflows in VSCode.
2024-01-16 13:40:22 +01:00
ZanSara
96c0b59aaa
feat!: Rename model_name_or_path to model in ExtractiveReader (#6736)
* rename model parameter and internal model attribute in ExtractiveReader

* fix tests for ExtractiveReader

* fix e2e

* reno

* another fix

* review feedback

* Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml
2024-01-15 14:48:33 +01:00
ZanSara
b236ea49e3
fix: hybrid pipeline e2e test (#6740)
* fix hybrid pipeline e2e test

* warmup

* write to the right docstore
2024-01-15 14:20:02 +01:00
ZanSara
288ed150c9
feat!: Rename model_name or model_name_or_path to model in all Embedder classes (#6733)
* rename model parameter in the openai doc embedder

* fix tests for openai doc embedder

* rename model parameter in the openai text embedder

* fix tests for openai text embedder

* rename model parameter in the st doc embedder

* fix tests for st doc embedder

* rename model parameter in the st backend

* fix tests for st backend

* rename model parameter in the st text embedder

* fix tests for st text embedder

* fix docstring

* fix pipeline utils

* fix e2e

* reno

* fix the indexing pipeline _create_embedder function

* fix e2e eval rag pipeline

* pytest
2024-01-12 15:30:17 +01:00
ZanSara
3156343dce
fix leftover model_name_or_path param (#6737) 2024-01-12 15:03:06 +01:00
Massimiliano Pippi
e1ec4e5e4d
refact!: Remove symbols under the haystack.document_stores namespace (#6714)
* remove symbols under the haystack.document_stores namespace

* Update haystack/document_stores/types/protocol.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* fix

* same for retrievers

* leftovers

* more leftovers

* add relnote

* leftovers

* one more

* fix examples

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-01-10 21:20:42 +01:00
Ashwin Mathur
374a937663
feat: Add calculate_metrics and MetricsResult (#6680)
* Add calculate_metrics, MetricsResult, Exact Match

* Add additional tests for metric calculation

* Add release notes

* Add docstring for Exact Match metric

* Remove Exact Match Implementation

* Update release notes

* Remove unnecessary metrics implementation

* Simplify logic to run supported metrics

* Add some evaluation tests

* Fix linting

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-01-10 10:26:44 +01:00
Madeesh Kannan
e6d6ce1c73
feat: Add NamedEntityExtractor component (#6689)
* feat: Add `NamedEntityExtractor` component

This component accepts a list of `Document`s which it annotates with named entities. The annotations are stored in the `meta` dictionary of each `Document` under a specific key.

The component currently supports two backends for the annotation models: Hugging Face `transformers` and spaCy.

* Address comments

* Expand release note

* Add the `[torch]` extra package specifier to the lazy import

* Remove dead code

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-01-09 17:56:20 +01:00
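As described, the extractor writes its annotations into each `Document`'s `meta` under a single key. The sketch below mimics that storage layout, with a toy gazetteer in place of a Hugging Face or spaCy model. `NamedEntityAnnotation`, `ANNOTATION_KEY`, and the gazetteer are hypothetical stand-ins, not the component's real types or key name.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class NamedEntityAnnotation:
    # Hypothetical annotation: entity label plus character offsets into the content
    entity: str
    start: int
    end: int


@dataclass
class Document:
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


ANNOTATION_KEY = "named_entities"  # hypothetical meta key
GAZETTEER = {"Berlin": "LOC", "Ada Lovelace": "PER"}  # toy stand-in for a real NER model


def annotate(documents: list[Document]) -> list[Document]:
    annotated = []
    for doc in documents:
        spans = []
        for surface, label in GAZETTEER.items():
            # Record every occurrence of the surface form
            start = doc.content.find(surface)
            while start != -1:
                spans.append(NamedEntityAnnotation(label, start, start + len(surface)))
                start = doc.content.find(surface, start + 1)
        spans.sort(key=lambda a: a.start)
        # Store all annotations under one meta key on a copy of the Document
        annotated.append(replace(doc, meta={**doc.meta, ANNOTATION_KEY: spans}))
    return annotated


source = Document("Ada Lovelace visited Berlin.")
result = annotate([source])[0]
print(result.meta[ANNOTATION_KEY])
```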
Massimiliano Pippi
93b2aaee09
chore: move DocumentJoiner to new joiners package (#6692)
* move DocumentJoiner to new joiners package

* relnote

* leftovers

* fix docstrings generation

* fix unrelated pydoc misconfiguration

* more unrelated work, yay!

* fix assertions
2024-01-08 22:06:27 +01:00
Vladimir Blagojevic
506ab81d26
chore: Rename GPT generators, deprecate old names (#6626) 2023-12-22 19:37:29 +01:00
Julian Risch
d90f95be2e
test: Check only top answer in extractive QA e2e test (#6614) 2023-12-22 11:11:24 +01:00
Stefano Fiorucci
7cc6080dfa
chore: replace metadata w meta in tests/examples (#6612)
* replace metadata w meta in tests/examples

* do not touch already broken e2e tests

* Revert "do not touch already broken e2e tests"

This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.
2023-12-21 14:09:31 +01:00
Ashwin Mathur
46b395eec3
feat: Add Eval and EvaluationResult (#6505)
* Add initial implementation for Eval and EvaluationResult

* Add release notes

* Update files with suggestions from review

* Remove serialization

* Add eval e2e tests

* Update eval e2e tests
2023-12-18 11:29:09 +01:00
Silvano Cerza
18dbce25fc
refactor: Refactor answer dataclasses (#6523)
* Refactor answer dataclasses

* Add release notes

* Fix tests

* Fix end to end tests

* Enhance ExtractiveReader
2023-12-11 18:50:49 +01:00
Silvano Cerza
e6637f5ec2 Fix all tests 2023-11-24 14:48:43 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
Silvano Cerza
fd16ec63cb
refactor: Add support for new filters declaration (#6397)
* Rework filter logic for InMemoryDocumentStore to support new filters declaration

* Fix legacy filters tests

* Simplify logic and handle dates comparison

* Rework MetadataRouter to support new filters

* Update docstrings

* Add release notes

* Fix linting

* Avoid duplicating filters specifications

* Handle corner case

* Simplify docstring

* Fix filters logic and tests

* Fix Document Store testing legacy filters tests
2023-11-24 11:22:46 +01:00
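The filter shape used below is assumed to follow the Haystack 2.x syntax: comparison dicts with `field`, `operator` and `value`, and logical dicts with `operator` plus a list of `conditions`. The evaluator itself is a hypothetical sketch, not the InMemoryDocumentStore code. It skips the date handling mentioned in the commit, and its treatment of missing fields and `NOT` is an assumption.

```python
import operator
from typing import Any

COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, target: value in target,
    "not in": lambda value, target: value not in target,
}
ORDERING = {">", ">=", "<", "<="}


def resolve_field(document: dict[str, Any], path: str) -> Any:
    # Follow a dotted path such as "meta.year"; a missing key resolves to None
    value: Any = document
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(document: dict[str, Any], filters: dict[str, Any]) -> bool:
    op = filters["operator"]
    if "conditions" in filters:
        # Logical node: evaluate every nested condition recursively
        results = [matches(document, c) for c in filters["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            # In this sketch NOT negates the conjunction of its conditions
            return not all(results)
        raise ValueError(f"Unknown logical operator: {op}")
    # Comparison node
    value = resolve_field(document, filters["field"])
    if value is None and op in ORDERING:
        # Missing fields never satisfy an ordering comparison
        return False
    return COMPARISONS[op](value, filters["value"])


example_doc = {"content": "...", "meta": {"type": "article", "year": 2023}}
recent_articles = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
print(matches(example_doc, recent_articles))  # True
```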
Julian Risch
67780a62d5
test: Add end-to-end test for dense doc search 2.0 (#6102)
* draft e2e test for dense doc search

* fix import path

* add DocumentJoiner

* update converter import; fix getting filled doc store

* add text embedder

* add sample txt and pdf for preview e2e tests

* run the query pipeline before serializing

* define samples path

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-23 16:59:02 +01:00
Vladimir Blagojevic
cfff0d5212
Rename file_converters to converters (#6390) 2023-11-23 10:28:40 +01:00
Julian Risch
4ef2a680bb
feat: Add DocumentJoiner component 2.0 (#6105)
* draft DocumentJoiner

* implement merge and rrf

* draft end-to-end test with DocumentJoiner in hybrid doc search pipeline

* adjust for variadics Canals PR #122

* fix text_embedder input

* adapt to the new Document class

* adapt to new doc id

* specify documents input as Variadic in run method

* compare doc ids instead of full docs

* rename text_file_converter input to sources

* update docstring

* Update haystack/preview/components/routers/document_joiner.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from docstring review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* capitalize Documents and Retrievers in docstrings

* fix log message in test

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-20 10:56:56 +01:00
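Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in, so documents ranked well by several retrievers rise to the top. The minimal sketch below assumes 1-based ranks and the conventional k = 60. The constants and tie-breaking in `DocumentJoiner` may differ, and the `merge` mode is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several rankings of document ids into one, highest fused score first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; each list contributes 1 / (k + rank) to a document's score
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ids = ["a", "b", "c"]
dense_ids = ["b", "c", "d"]
print([doc_id for doc_id, _ in reciprocal_rank_fusion([bm25_ids, dense_ids])])  # ['b', 'c', 'a', 'd']
```

Document `b` wins because it ranks near the top of both lists, even though `a` is first in one of them.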
ZanSara
dfc1d452bb
feat: upgrade canals to 0.10.1 (#6309)
* upgrade canals

* reno

* trigger preview e2e

* bump canals

* fix decorator

* fix test

* test factory

* tests inmemory

* tests writer

* test audio

* tests builders

* tests caching

* tests embedders

* tests converters

* tests generators

* tests rankers

* tests retrievers

* fix pipeline and telemetry tests

* remove trigger
2023-11-17 14:46:23 +01:00
Julian Risch
1c85e44156
test: Add langdetect installation to e2e tests (#6327)
* Add langdetect installation to e2e tests

* compare doc content and id only
2023-11-17 10:12:05 +01:00
Julian Risch
8b092a90c0
test: Add MetadataRouter to preprocessing pipeline in e2e test (#6321)
* add MetadataRouter to preprocessing pipeline

* replace mimetype check with language check
2023-11-16 11:22:37 +01:00
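The router step sends each document to an output chosen by its metadata, here the language set by an upstream classifier. Below is a hypothetical sketch using plain dicts. The output names, including the `unmatched` fallback, are illustrative assumptions rather than the `MetadataRouter` API.

```python
from typing import Any


def route_by_language(
    documents: list[dict[str, Any]], languages: list[str]
) -> dict[str, list[dict[str, Any]]]:
    """Group documents by meta['language']; anything else goes to 'unmatched'."""
    routes: dict[str, list[dict[str, Any]]] = {lang: [] for lang in languages}
    routes["unmatched"] = []
    for doc in documents:
        # An upstream language classifier is assumed to have set meta["language"]
        language = doc.get("meta", {}).get("language")
        routes[language if language in languages else "unmatched"].append(doc)
    return routes


docs = [
    {"content": "Hello", "meta": {"language": "en"}},
    {"content": "Hallo", "meta": {"language": "de"}},
    {"content": "???", "meta": {}},
]
routed = route_by_language(docs, ["en"])
print({name: [d["content"] for d in group] for name, group in routed.items()})
# {'en': ['Hello'], 'unmatched': ['Hallo', '???']}
```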