haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-05 11:38:20 +00:00

Author	SHA1	Message	Date
Sebastian Husch Lee	b4fd38dcbe	remove unneeded test (#10221 )	2025-12-11 11:11:38 +01:00
Abdelrahman Kaseb	b9a34dfebf	Fix: prevent in-place mutation of documents in Document Classifiers and Extractors (#9703 ) * modify Documents Classifiers and Extractors to not make in-place changes * Add e2e test for NER * Add unit test for NER * fixes + refinements --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2025-08-12 15:20:44 +02:00
Abdelrahman Kaseb	5f3c37d287	chore: adopt PEP 585 type hints (#9678 ) * chore(lint): enforce and apply PEP 585 type hinting * Run fmt fixes * Fix all typing imports using some regex * Fix all typing written in string in tests * undo changes in the e2e tests * make e2e test use list instead of List * type fixes * remove type:ignore * pylint * Remove typing from Usage example comments * Remove typing from most of comments * try to fix e2e tests on comm PRs * fix * Add tests typing.List in to adjust test compatiplity - test/components/agents/test_state_class.py - test/components/converters/test_output_adapter.py - test/components/joiners/test_list_joiner.py * simplify pyproject * improve relnote --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2025-08-07 10:23:14 +02:00
Stefano Fiorucci	d059cf2c23	feat: add `skip_empty_documents` init parameter to `DocumentSplitter` (#9649 ) * feat: add skip_empty_documents init parameter to DocumentSplitter * improve test * fix + relnote	2025-07-24 11:26:11 +02:00
Stefano Fiorucci	bcaef53cbc	test: export `HF_TOKEN` env var in e2e environment (#9551 ) * try to fix e2e tests for private NER models * explanatory comment * extend skipif condition	2025-06-25 15:00:28 +02:00
Stefano Fiorucci	de5c7ea3d2	feat: add py.typed; adjust `Component` protocol (#9329 ) * experimenting with py.typed * try changing run method in protocol * Trigger Build * better docstring + release note * remove type:ignore where possible * Removed a few more type: ignores --------- Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>	2025-05-07 09:34:31 +02:00
David S. Batista	03505678e2	removing unused imports (#9172 )	2025-04-04 11:16:44 +02:00
Stefano Fiorucci	019c238dd0	test: stop drawing pipelines in e2e tests (#9164 )	2025-04-04 10:50:05 +02:00
David S. Batista	ed931b4c2b	fix: adding pylint disable for EvalRunResult end2endtest (#9054 )	2025-03-18 11:20:11 +01:00
David S. Batista	de76d20f12	fix: updating end2end evaluation tests (#9053 ) * updating tests * fixing tests, default now is JSON object and no longer dataframe * cleaning up leftovers	2025-03-18 10:52:05 +01:00
Michele Pangrazzi	c192488bf6	Named entity extractor private models (#8658 ) * add 'token' support to NamedEntityExtractor to enable using private models on HF backend * fix existing error message format * add release note * add HF_API_TOKEN to e2e workflow * add informative comment * Updated to_dict / from_dict to handle 'token' correctly ; Added tests * Fix lint * Revert unwanted change	2024-12-20 11:15:55 +01:00
David S. Batista	db89b9a2e5	fix: removing unused import (#8636 )	2024-12-13 12:35:58 +01:00
David S. Batista	176db5dbf9	initial import (#8635 )	2024-12-13 12:12:40 +01:00
David S. Batista	97126eb544	fix: changing default model to `gpt-4o-mini` on OpenAI API calls (#8360 ) * chaning default model to gpt-4o-mini * adding release notes * fixing some missed tests * fixing some more missed tests * fixing one last missed test * fixing linting issues * making pylint happy about an end2end test * chaning if test to walruss operator * fixing azure embedder from ada to text-embedding-ada-002	2024-09-17 10:36:42 +02:00
David S. Batista	276ff3c104	test evaluation pipeline failing (#7823 )	2024-06-07 11:26:18 +02:00
Silvano Cerza	26b263e349	Fix InMemoryDocumentStore not sharing some document stats with other instances (#7792 )	2024-06-04 10:15:50 +02:00
Julian Risch	6723dc3801	check for RuntimeError instead of ComponentError in test (#7769 )	2024-05-31 08:42:40 +02:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675 ) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Julian Risch	48c7c6ad26	test: Rename `responses` and use preds instead of ground truth answers in e2e eval test (#7640 ) * rename responses, use preds instead of ground truth answers * fix typo in component name	2024-05-03 12:48:42 +02:00
David S. Batista	8d04e530da	test: end2end evaluation tests (#7601 ) * initial import * wip * cleaning up tests * fixing tests * adding context relevance * reverting some wrong changes to due PyCharm error in refactoring * building eval pipeline only once * handling mypy issues	2024-04-26 14:07:05 +00:00
Silvano Cerza	d66b5358a1	Remove eval end to end tests (#7093 )	2024-02-26 12:27:15 +01:00
Vladimir Blagojevic	d2497d54e8	Update to use the default Secret.from_env_var(OPENAI_API_KEY) approach (#6941 )	2024-02-09 14:15:45 +01:00
Ashwin Mathur	393a7993c3	feat: Add Semantic Answer Similarity metric (#6877 ) * Add SAS metric * Add release notes * Round similarity scores for precision consistency * Add tolerance to tests * Update haystack/evaluation/eval.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Add types for preprocess_text; Add additional types for f1 and em methods --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-02-02 17:07:52 +01:00
Ashwin Mathur	7217f9d9f0	feat: Add F1 metric (#6822 ) * Add F1 metric * Add release notes	2024-01-26 11:04:43 +01:00
Ashwin Mathur	a238c6dd51	feat: Add Exact Match metric (#6696 ) * Add exact match metric * Add release notes * Cleanup comments in test_eval_exact_match.py * Create separate preprocessing function; Add output_key parameter * Update release note --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-01-22 09:57:04 +01:00
Madeesh Kannan	6a1514550e	test: Update E2E tests to use `Pipeline.dump/load` (#6756 )	2024-01-17 15:09:27 +01:00
Madeesh Kannan	7376838922	feat!: Framework-agnostic device management (#6748 ) * feat: Framework-agnostic device management * Add release note * Linting * Fix test * Add `first_device` property, expand release notes, validate `ComponentDevice` state	2024-01-17 10:41:34 +01:00
Madeesh Kannan	d6cafeaff3	test: Rename RAG E2E test file (#6750 ) Prior to this change, this broke `pytest` workflows in VSCode due to identical test names in this file and the integration/unit test file.	2024-01-16 13:40:22 +01:00
ZanSara	96c0b59aaa	feat!: Rename `model_name_or_path` to `model` in `ExtractiveReader` (#6736 ) * rename model parameter and internam model attribute in ExtractiveReader * fix tests for ExtractiveReader * fix e2e * reno * another fix * review feedback * Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml	2024-01-15 14:48:33 +01:00
ZanSara	b236ea49e3	fix: hybrid pipeline e2e test (#6740 ) * fix hybrid pipeline e2e test * warmup * write to the right docstore	2024-01-15 14:20:02 +01:00
ZanSara	288ed150c9	feat!: Rename `model_name` or `model_name_or_path` to `model` in all Embedder classes (#6733 ) * rename model parameter in the openai doc embedder * fix tests for openai doc embedder * rename model parameter in the openai text embedder * fix tests for openai text embedder * rename model parameter in the st doc embedder * fix tests for st doc embedder * rename model parameter in the st backend * fix tests for st backend * rename model parameter in the st text embedder * fix tests for st text embedder * fix docstring * fix pipeline utils * fix e2e * reno * fix the indexing pipeline _create_embedder function * fix e2e eval rag pipeline * pytest	2024-01-12 15:30:17 +01:00
ZanSara	3156343dce	fix leftover model_name_or_path param (#6737 )	2024-01-12 15:03:06 +01:00
Massimiliano Pippi	e1ec4e5e4d	refact!: Remove symbols under the `haystack.document_stores` namespace (#6714 ) * remove symbols under the haystack.document_stores namespace * Update haystack/document_stores/types/protocol.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * fix * same for retrievers * leftovers * more leftovers * add relnote * leftovers * one more * fix examples --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-01-10 21:20:42 +01:00
Ashwin Mathur	374a937663	feat: Add `calculate_metrics` and `MetricsResult` (#6680 ) * Add calculate_metrics, MetricsResult, Exact Match * Add additional tests for metric calculation * Add release notes * Add docstring for Exact Match metric * Remove Exact Match Implementation * Update release notes * Remove unnecessary metrics implementation * Simplify logic to run supported metrics * Add some evaluation tests * Fix linting --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-01-10 10:26:44 +01:00
Madeesh Kannan	e6d6ce1c73	feat: Add `NamedEntityExtractor`component (#6689 ) * feat: Add `NamedEntityExtractor`component This component accepts a list of `Document`s which it annotates with named entities. The annotations are stored in the `meta` dictionary of each `Document` under a specific key. The component currently support two backends for the annotation models: Hugging Face `transformers` and spaCy. * Address comments * Expand release note * Add the `[torch]` extra package specifier to the lazy import * Remove dead code --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-01-09 17:56:20 +01:00
Massimiliano Pippi	93b2aaee09	chore: move `DocumentJoiner` to new `joiners` package (#6692 ) * move DocumentJoiner to new joiners package * relnote * leftovers * fix docstrings generation * fix unrelated pydoc misconfiguration * more unrelated work, yay! * fix assertions	2024-01-08 22:06:27 +01:00
Vladimir Blagojevic	506ab81d26	chore: Rename GPT generators, deprecate old names (#6626 )	2023-12-22 19:37:29 +01:00
Julian Risch	d90f95be2e	test: Check only top answer in extractive QA e2e test (#6614 )	2023-12-22 11:11:24 +01:00
Stefano Fiorucci	7cc6080dfa	chore: replace metadata w meta in tests/examples (#6612 ) * replace metadata w meta in tests/examples * do not touch already broken e2e tests * Revert "do not touch already broken e2e tests" This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.	2023-12-21 14:09:31 +01:00
Ashwin Mathur	46b395eec3	feat: Add Eval and EvaluationResult (#6505 ) * Add initial implementation for Eval and EvaluationResult * Add release notes * Update files with suggestions from review * Remove serialization * Add eval e2e tests * Update eval e2e tests	2023-12-18 11:29:09 +01:00
Silvano Cerza	18dbce25fc	refacotr: Refactor answer dataclasses (#6523 ) * Refactor answer dataclasses * Add release notes * Fix tests * Fix end to end tests * Enhance ExtractiveReader	2023-12-11 18:50:49 +01:00
Silvano Cerza	e6637f5ec2	Fix all tests	2023-11-24 14:48:43 +01:00
Massimiliano Pippi	09e7831f60	clean up 1.x code --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:47:47 +01:00
Silvano Cerza	fd16ec63cb	refactor: Add support for new filters declaration (#6397 ) * Rework filter logic for InMemoryDocumentStore to support new filters declaration * Fix legacy filters tests * Simplify logic and handle dates comparison * Rework MetadataRouter to support new filters * Update docstrings * Add release notes * Fix linting * Avoid duplicating filters specifications * Handle corner case * Simplify docstring * Fix filters logic and tests * Fix Document Store testing legacy filters tests	2023-11-24 11:22:46 +01:00
Julian Risch	67780a62d5	test: Add end-to-end test for dense doc search 2.0 (#6102 ) * draft e2e test for dense doc search * fix import path * add DocumentJoiner * update converter import; fix getting filled doc store * add text embedder * add sample txt and pdf for preview e2e tests * run the query pipeline before serializing * define samples path --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-11-23 16:59:02 +01:00
Vladimir Blagojevic	cfff0d5212	Rename file_converters to converters (#6390 )	2023-11-23 10:28:40 +01:00
Julian Risch	4ef2a680bb	feat: Add DocumentJoiner component 2.0 (#6105 ) * draft DocumentJoiner * implement merge and rrf * draft end-to-end test with DocumentJoiner in hybrid doc search pipeline * adjust for variadics Canals PR #122 * fix text_embedder input * adapt to the new Document class * adapt to new doc id * specify documents input as Variadic in run method * compare doc ids instead of full docs * rename text_file_converter input to sources * update docstring * Update haystack/preview/components/routers/document_joiner.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from docstring review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * capitalize Documents and Retrievers in docstrings * fix log message in test --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: anakin87 <stefanofiorucci@gmail.com> Co-authored-by: ZanSara <sara.zanzottera@deepset.ai> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2023-11-20 10:56:56 +01:00
ZanSara	dfc1d452bb	feat: upgrade canals to 0.10.1 (#6309 ) * upgrade canals * reno * trigger preview e2e * bump canals * fix decorator * fix test * test factory * tests inmemory * tests writer * test audio * tests builders * tests caching * tests embedders * tests converters * tests generators * tests rankers * tests retrievers * fix pipeline and telemetry tests * remove trigger	2023-11-17 14:46:23 +01:00
Julian Risch	1c85e44156	test: Add langdetect installation to e2e tests (#6327 ) * Add langdetect installation to e2e tests * compare doc content and id only	2023-11-17 10:12:05 +01:00
Julian Risch	8b092a90c0	test: Add MetadataRouter to preprocessing pipeline in e2e test (#6321 ) * add MetadataRouter to preprocessing pipeline * replace mimetype check with language check	2023-11-16 11:22:37 +01:00

93 Commits