Sebastian Husch Lee
b4fd38dcbe
remove unneeded test ( #10221 )
2025-12-11 11:11:38 +01:00
Abdelrahman Kaseb
b9a34dfebf
Fix: prevent in-place mutation of documents in Document Classifiers and Extractors ( #9703 )
...
* modify Documents Classifiers and Extractors to not make in-place changes
* Add e2e test for NER
* Add unit test for NER
* fixes + refinements
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-12 15:20:44 +02:00
Abdelrahman Kaseb
5f3c37d287
chore: adopt PEP 585 type hints ( #9678 )
...
* chore(lint): enforce and apply PEP 585 type hinting
* Run fmt fixes
* Fix all typing imports using some regex
* Fix all typing written in string in tests
* undo changes in the e2e tests
* make e2e test use list instead of List
* type fixes
* remove type:ignore
* pylint
* Remove typing from Usage example comments
* Remove typing from most of comments
* try to fix e2e tests on comm PRs
* fix
* Add typing.List in tests to adjust test compatibility
- test/components/agents/test_state_class.py
- test/components/converters/test_output_adapter.py
- test/components/joiners/test_list_joiner.py
* simplify pyproject
* improve relnote
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-07 10:23:14 +02:00
Stefano Fiorucci
d059cf2c23
feat: add skip_empty_documents init parameter to DocumentSplitter ( #9649 )
...
* feat: add skip_empty_documents init parameter to DocumentSplitter
* improve test
* fix + relnote
2025-07-24 11:26:11 +02:00
Stefano Fiorucci
bcaef53cbc
test: export HF_TOKEN env var in e2e environment ( #9551 )
...
* try to fix e2e tests for private NER models
* explanatory comment
* extend skipif condition
2025-06-25 15:00:28 +02:00
Stefano Fiorucci
de5c7ea3d2
feat: add py.typed; adjust Component protocol ( #9329 )
...
* experimenting with py.typed
* try changing run method in protocol
* Trigger Build
* better docstring + release note
* remove type:ignore where possible
* Removed a few more type: ignores
---------
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2025-05-07 09:34:31 +02:00
David S. Batista
03505678e2
removing unused imports ( #9172 )
2025-04-04 11:16:44 +02:00
Stefano Fiorucci
019c238dd0
test: stop drawing pipelines in e2e tests ( #9164 )
2025-04-04 10:50:05 +02:00
David S. Batista
ed931b4c2b
fix: adding pylint disable for EvalRunResult end2endtest ( #9054 )
2025-03-18 11:20:11 +01:00
David S. Batista
de76d20f12
fix: updating end2end evaluation tests ( #9053 )
...
* updating tests
* fixing tests, default now is JSON object and no longer dataframe
* cleaning up leftovers
2025-03-18 10:52:05 +01:00
Michele Pangrazzi
c192488bf6
Named entity extractor private models ( #8658 )
...
* add 'token' support to NamedEntityExtractor to enable using private models on HF backend
* fix existing error message format
* add release note
* add HF_API_TOKEN to e2e workflow
* add informative comment
* Updated to_dict / from_dict to handle 'token' correctly; Added tests
* Fix lint
* Revert unwanted change
2024-12-20 11:15:55 +01:00
David S. Batista
db89b9a2e5
fix: removing unused import ( #8636 )
2024-12-13 12:35:58 +01:00
David S. Batista
176db5dbf9
initial import ( #8635 )
2024-12-13 12:12:40 +01:00
David S. Batista
97126eb544
fix: changing default model to gpt-4o-mini on OpenAI API calls ( #8360 )
...
* changing default model to gpt-4o-mini
* adding release notes
* fixing some missed tests
* fixing some more missed tests
* fixing one last missed test
* fixing linting issues
* making pylint happy about an end2end test
* changing if test to walrus operator
* fixing azure embedder from ada to text-embedding-ada-002
2024-09-17 10:36:42 +02:00
David S. Batista
276ff3c104
test evaluation pipeline failing ( #7823 )
2024-06-07 11:26:18 +02:00
Silvano Cerza
26b263e349
Fix InMemoryDocumentStore not sharing some document stats with other instances ( #7792 )
2024-06-04 10:15:50 +02:00
Julian Risch
6723dc3801
check for RuntimeError instead of ComponentError in test ( #7769 )
2024-05-31 08:42:40 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules ( #7675 )
...
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Julian Risch
48c7c6ad26
test: Rename responses and use preds instead of ground truth answers in e2e eval test ( #7640 )
...
* rename responses, use preds instead of ground truth answers
* fix typo in component name
2024-05-03 12:48:42 +02:00
David S. Batista
8d04e530da
test: end2end evaluation tests ( #7601 )
...
* initial import
* wip
* cleaning up tests
* fixing tests
* adding context relevance
* reverting some wrong changes due to a PyCharm error in refactoring
* building eval pipeline only once
* handling mypy issues
2024-04-26 14:07:05 +00:00
Silvano Cerza
d66b5358a1
Remove eval end to end tests ( #7093 )
2024-02-26 12:27:15 +01:00
Vladimir Blagojevic
d2497d54e8
Update to use the default Secret.from_env_var(OPENAI_API_KEY) approach ( #6941 )
2024-02-09 14:15:45 +01:00
Ashwin Mathur
393a7993c3
feat: Add Semantic Answer Similarity metric ( #6877 )
...
* Add SAS metric
* Add release notes
* Round similarity scores for precision consistency
* Add tolerance to tests
* Update haystack/evaluation/eval.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Add types for preprocess_text; Add additional types for f1 and em methods
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-02-02 17:07:52 +01:00
Ashwin Mathur
7217f9d9f0
feat: Add F1 metric ( #6822 )
...
* Add F1 metric
* Add release notes
2024-01-26 11:04:43 +01:00
Ashwin Mathur
a238c6dd51
feat: Add Exact Match metric ( #6696 )
...
* Add exact match metric
* Add release notes
* Cleanup comments in test_eval_exact_match.py
* Create separate preprocessing function; Add output_key parameter
* Update release note
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-01-22 09:57:04 +01:00
Madeesh Kannan
6a1514550e
test: Update E2E tests to use Pipeline.dump/load ( #6756 )
2024-01-17 15:09:27 +01:00
Madeesh Kannan
7376838922
feat!: Framework-agnostic device management ( #6748 )
...
* feat: Framework-agnostic device management
* Add release note
* Linting
* Fix test
* Add `first_device` property, expand release notes, validate `ComponentDevice` state
2024-01-17 10:41:34 +01:00
Madeesh Kannan
d6cafeaff3
test: Rename RAG E2E test file ( #6750 )
...
Prior to this change, identical test names in this file and the integration/unit test file broke `pytest` workflows in VSCode.
2024-01-16 13:40:22 +01:00
ZanSara
96c0b59aaa
feat!: Rename model_name_or_path to model in ExtractiveReader ( #6736 )
...
* rename model parameter and internal model attribute in ExtractiveReader
* fix tests for ExtractiveReader
* fix e2e
* reno
* another fix
* review feedback
* Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml
2024-01-15 14:48:33 +01:00
ZanSara
b236ea49e3
fix: hybrid pipeline e2e test ( #6740 )
...
* fix hybrid pipeline e2e test
* warmup
* write to the right docstore
2024-01-15 14:20:02 +01:00
ZanSara
288ed150c9
feat!: Rename model_name or model_name_or_path to model in all Embedder classes ( #6733 )
...
* rename model parameter in the openai doc embedder
* fix tests for openai doc embedder
* rename model parameter in the openai text embedder
* fix tests for openai text embedder
* rename model parameter in the st doc embedder
* fix tests for st doc embedder
* rename model parameter in the st backend
* fix tests for st backend
* rename model parameter in the st text embedder
* fix tests for st text embedder
* fix docstring
* fix pipeline utils
* fix e2e
* reno
* fix the indexing pipeline _create_embedder function
* fix e2e eval rag pipeline
* pytest
2024-01-12 15:30:17 +01:00
ZanSara
3156343dce
fix leftover model_name_or_path param ( #6737 )
2024-01-12 15:03:06 +01:00
Massimiliano Pippi
e1ec4e5e4d
refact!: Remove symbols under the haystack.document_stores namespace ( #6714 )
...
* remove symbols under the haystack.document_stores namespace
* Update haystack/document_stores/types/protocol.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* fix
* same for retrievers
* leftovers
* more leftovers
* add relnote
* leftovers
* one more
* fix examples
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-01-10 21:20:42 +01:00
Ashwin Mathur
374a937663
feat: Add calculate_metrics and MetricsResult ( #6680 )
...
* Add calculate_metrics, MetricsResult, Exact Match
* Add additional tests for metric calculation
* Add release notes
* Add docstring for Exact Match metric
* Remove Exact Match Implementation
* Update release notes
* Remove unnecessary metrics implementation
* Simplify logic to run supported metrics
* Add some evaluation tests
* Fix linting
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-01-10 10:26:44 +01:00
Madeesh Kannan
e6d6ce1c73
feat: Add NamedEntityExtractor component ( #6689 )
...
* feat: Add `NamedEntityExtractor` component
This component accepts a list of `Document`s which it annotates with named entities. The annotations are stored in the `meta` dictionary of each `Document` under a specific key.
The component currently supports two backends for the annotation models: Hugging Face `transformers` and spaCy.
* Address comments
* Expand release note
* Add the `[torch]` extra package specifier to the lazy import
* Remove dead code
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-01-09 17:56:20 +01:00
Massimiliano Pippi
93b2aaee09
chore: move DocumentJoiner to new joiners package ( #6692 )
...
* move DocumentJoiner to new joiners package
* relnote
* leftovers
* fix docstrings generation
* fix unrelated pydoc misconfiguration
* more unrelated work, yay!
* fix assertions
2024-01-08 22:06:27 +01:00
Vladimir Blagojevic
506ab81d26
chore: Rename GPT generators, deprecate old names ( #6626 )
2023-12-22 19:37:29 +01:00
Julian Risch
d90f95be2e
test: Check only top answer in extractive QA e2e test ( #6614 )
2023-12-22 11:11:24 +01:00
Stefano Fiorucci
7cc6080dfa
chore: replace metadata w meta in tests/examples ( #6612 )
...
* replace metadata w meta in tests/examples
* do not touch already broken e2e tests
* Revert "do not touch already broken e2e tests"
This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.
2023-12-21 14:09:31 +01:00
Ashwin Mathur
46b395eec3
feat: Add Eval and EvaluationResult ( #6505 )
...
* Add initial implementation for Eval and EvaluationResult
* Add release notes
* Update files with suggestions from review
* Remove serialization
* Add eval e2e tests
* Update eval e2e tests
2023-12-18 11:29:09 +01:00
Silvano Cerza
18dbce25fc
refactor: Refactor answer dataclasses ( #6523 )
...
* Refactor answer dataclasses
* Add release notes
* Fix tests
* Fix end to end tests
* Enhance ExtractiveReader
2023-12-11 18:50:49 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
Silvano Cerza
fd16ec63cb
refactor: Add support for new filters declaration ( #6397 )
...
* Rework filter logic for InMemoryDocumentStore to support new filters declaration
* Fix legacy filters tests
* Simplify logic and handle dates comparison
* Rework MetadataRouter to support new filters
* Update docstrings
* Add release notes
* Fix linting
* Avoid duplicating filters specifications
* Handle corner case
* Simplify docstring
* Fix filters logic and tests
* Fix Document Store testing legacy filters tests
2023-11-24 11:22:46 +01:00
Julian Risch
67780a62d5
test: Add end-to-end test for dense doc search 2.0 ( #6102 )
...
* draft e2e test for dense doc search
* fix import path
* add DocumentJoiner
* update converter import; fix getting filled doc store
* add text embedder
* add sample txt and pdf for preview e2e tests
* run the query pipeline before serializing
* define samples path
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-23 16:59:02 +01:00
Vladimir Blagojevic
cfff0d5212
Rename file_converters to converters ( #6390 )
2023-11-23 10:28:40 +01:00
Julian Risch
4ef2a680bb
feat: Add DocumentJoiner component 2.0 ( #6105 )
...
* draft DocumentJoiner
* implement merge and rrf
* draft end-to-end test with DocumentJoiner in hybrid doc search pipeline
* adjust for variadics Canals PR #122
* fix text_embedder input
* adapt to the new Document class
* adapt to new doc id
* specify documents input as Variadic in run method
* compare doc ids instead of full docs
* rename text_file_converter input to sources
* update docstring
* Update haystack/preview/components/routers/document_joiner.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply suggestions from docstring review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* capitalize Documents and Retrievers in docstrings
* fix log message in test
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-20 10:56:56 +01:00
ZanSara
dfc1d452bb
feat: upgrade canals to 0.10.1 ( #6309 )
...
* upgrade canals
* reno
* trigger preview e2e
* bump canals
* fix decorator
* fix test
* test factory
* tests inmemory
* tests writer
* test audio
* tests builders
* tests caching
* tests embedders
* tests converters
* tests generators
* tests rankers
* tests retrievers
* fix pipeline and telemetry tests
* remove trigger
2023-11-17 14:46:23 +01:00
Julian Risch
1c85e44156
test: Add langdetect installation to e2e tests ( #6327 )
...
* Add langdetect installation to e2e tests
* compare doc content and id only
2023-11-17 10:12:05 +01:00
Julian Risch
8b092a90c0
test: Add MetadataRouter to preprocessing pipeline in e2e test ( #6321 )
...
* add MetadataRouter to preprocessing pipeline
* replace mimetype check with language check
2023-11-16 11:22:37 +01:00