haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-24 17:30:38 +00:00

Author	SHA1	Message	Date
Michele Pangrazzi	c192488bf6	Named entity extractor private models (#8658 ) * add 'token' support to NamedEntityExtractor to enable using private models on HF backend * fix existing error message format * add release note * add HF_API_TOKEN to e2e workflow * add informative comment * Updated to_dict / from_dict to handle 'token' correctly ; Added tests * Fix lint * Revert unwanted change	2024-12-20 11:15:55 +01:00
David S. Batista	db89b9a2e5	fix: removing unused import (#8636 )	2024-12-13 12:35:58 +01:00
David S. Batista	176db5dbf9	initial import (#8635 )	2024-12-13 12:12:40 +01:00
David S. Batista	97126eb544	fix: changing default model to `gpt-4o-mini` on OpenAI API calls (#8360 ) * chaning default model to gpt-4o-mini * adding release notes * fixing some missed tests * fixing some more missed tests * fixing one last missed test * fixing linting issues * making pylint happy about an end2end test * chaning if test to walruss operator * fixing azure embedder from ada to text-embedding-ada-002	2024-09-17 10:36:42 +02:00
David S. Batista	276ff3c104	test evaluation pipeline failing (#7823 )	2024-06-07 11:26:18 +02:00
Silvano Cerza	26b263e349	Fix InMemoryDocumentStore not sharing some document stats with other instances (#7792 )	2024-06-04 10:15:50 +02:00
Julian Risch	6723dc3801	check for RuntimeError instead of ComponentError in test (#7769 )	2024-05-31 08:42:40 +02:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675 ) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Julian Risch	48c7c6ad26	test: Rename `responses` and use preds instead of ground truth answers in e2e eval test (#7640 ) * rename responses, use preds instead of ground truth answers * fix typo in component name	2024-05-03 12:48:42 +02:00
David S. Batista	8d04e530da	test: end2end evaluation tests (#7601 ) * initial import * wip * cleaning up tests * fixing tests * adding context relevance * reverting some wrong changes to due PyCharm error in refactoring * building eval pipeline only once * handling mypy issues	2024-04-26 14:07:05 +00:00
Silvano Cerza	d66b5358a1	Remove eval end to end tests (#7093 )	2024-02-26 12:27:15 +01:00
Vladimir Blagojevic	d2497d54e8	Update to use the default Secret.from_env_var(OPENAI_API_KEY) approach (#6941 )	2024-02-09 14:15:45 +01:00
Ashwin Mathur	393a7993c3	feat: Add Semantic Answer Similarity metric (#6877 ) * Add SAS metric * Add release notes * Round similarity scores for precision consistency * Add tolerance to tests * Update haystack/evaluation/eval.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Add types for preprocess_text; Add additional types for f1 and em methods --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-02-02 17:07:52 +01:00
Ashwin Mathur	7217f9d9f0	feat: Add F1 metric (#6822 ) * Add F1 metric * Add release notes	2024-01-26 11:04:43 +01:00
Ashwin Mathur	a238c6dd51	feat: Add Exact Match metric (#6696 ) * Add exact match metric * Add release notes * Cleanup comments in test_eval_exact_match.py * Create separate preprocessing function; Add output_key parameter * Update release note --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-01-22 09:57:04 +01:00
Madeesh Kannan	6a1514550e	test: Update E2E tests to use `Pipeline.dump/load` (#6756 )	2024-01-17 15:09:27 +01:00
Madeesh Kannan	7376838922	feat!: Framework-agnostic device management (#6748 ) * feat: Framework-agnostic device management * Add release note * Linting * Fix test * Add `first_device` property, expand release notes, validate `ComponentDevice` state	2024-01-17 10:41:34 +01:00
Madeesh Kannan	d6cafeaff3	test: Rename RAG E2E test file (#6750 ) Prior to this change, this broke `pytest` workflows in VSCode due to identical test names in this file and the integration/unit test file.	2024-01-16 13:40:22 +01:00
ZanSara	96c0b59aaa	feat!: Rename `model_name_or_path` to `model` in `ExtractiveReader` (#6736 ) * rename model parameter and internam model attribute in ExtractiveReader * fix tests for ExtractiveReader * fix e2e * reno * another fix * review feedback * Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml	2024-01-15 14:48:33 +01:00
ZanSara	b236ea49e3	fix: hybrid pipeline e2e test (#6740 ) * fix hybrid pipeline e2e test * warmup * write to the right docstore	2024-01-15 14:20:02 +01:00
ZanSara	288ed150c9	feat!: Rename `model_name` or `model_name_or_path` to `model` in all Embedder classes (#6733 ) * rename model parameter in the openai doc embedder * fix tests for openai doc embedder * rename model parameter in the openai text embedder * fix tests for openai text embedder * rename model parameter in the st doc embedder * fix tests for st doc embedder * rename model parameter in the st backend * fix tests for st backend * rename model parameter in the st text embedder * fix tests for st text embedder * fix docstring * fix pipeline utils * fix e2e * reno * fix the indexing pipeline _create_embedder function * fix e2e eval rag pipeline * pytest	2024-01-12 15:30:17 +01:00
ZanSara	3156343dce	fix leftover model_name_or_path param (#6737 )	2024-01-12 15:03:06 +01:00
Massimiliano Pippi	e1ec4e5e4d	refact!: Remove symbols under the `haystack.document_stores` namespace (#6714 ) * remove symbols under the haystack.document_stores namespace * Update haystack/document_stores/types/protocol.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * fix * same for retrievers * leftovers * more leftovers * add relnote * leftovers * one more * fix examples --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-01-10 21:20:42 +01:00
Ashwin Mathur	374a937663	feat: Add `calculate_metrics` and `MetricsResult` (#6680 ) * Add calculate_metrics, MetricsResult, Exact Match * Add additional tests for metric calculation * Add release notes * Add docstring for Exact Match metric * Remove Exact Match Implementation * Update release notes * Remove unnecessary metrics implementation * Simplify logic to run supported metrics * Add some evaluation tests * Fix linting --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-01-10 10:26:44 +01:00
Madeesh Kannan	e6d6ce1c73	feat: Add `NamedEntityExtractor`component (#6689 ) * feat: Add `NamedEntityExtractor`component This component accepts a list of `Document`s which it annotates with named entities. The annotations are stored in the `meta` dictionary of each `Document` under a specific key. The component currently support two backends for the annotation models: Hugging Face `transformers` and spaCy. * Address comments * Expand release note * Add the `[torch]` extra package specifier to the lazy import * Remove dead code --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-01-09 17:56:20 +01:00
Massimiliano Pippi	93b2aaee09	chore: move `DocumentJoiner` to new `joiners` package (#6692 ) * move DocumentJoiner to new joiners package * relnote * leftovers * fix docstrings generation * fix unrelated pydoc misconfiguration * more unrelated work, yay! * fix assertions	2024-01-08 22:06:27 +01:00
Vladimir Blagojevic	506ab81d26	chore: Rename GPT generators, deprecate old names (#6626 )	2023-12-22 19:37:29 +01:00
Julian Risch	d90f95be2e	test: Check only top answer in extractive QA e2e test (#6614 )	2023-12-22 11:11:24 +01:00
Stefano Fiorucci	7cc6080dfa	chore: replace metadata w meta in tests/examples (#6612 ) * replace metadata w meta in tests/examples * do not touch already broken e2e tests * Revert "do not touch already broken e2e tests" This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.	2023-12-21 14:09:31 +01:00
Ashwin Mathur	46b395eec3	feat: Add Eval and EvaluationResult (#6505 ) * Add initial implementation for Eval and EvaluationResult * Add release notes * Update files with suggestions from review * Remove serialization * Add eval e2e tests * Update eval e2e tests	2023-12-18 11:29:09 +01:00
Silvano Cerza	18dbce25fc	refacotr: Refactor answer dataclasses (#6523 ) * Refactor answer dataclasses * Add release notes * Fix tests * Fix end to end tests * Enhance ExtractiveReader	2023-12-11 18:50:49 +01:00
Silvano Cerza	e6637f5ec2	Fix all tests	2023-11-24 14:48:43 +01:00
Massimiliano Pippi	09e7831f60	clean up 1.x code --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:47:47 +01:00
Silvano Cerza	fd16ec63cb	refactor: Add support for new filters declaration (#6397 ) * Rework filter logic for InMemoryDocumentStore to support new filters declaration * Fix legacy filters tests * Simplify logic and handle dates comparison * Rework MetadataRouter to support new filters * Update docstrings * Add release notes * Fix linting * Avoid duplicating filters specifications * Handle corner case * Simplify docstring * Fix filters logic and tests * Fix Document Store testing legacy filters tests	2023-11-24 11:22:46 +01:00
Julian Risch	67780a62d5	test: Add end-to-end test for dense doc search 2.0 (#6102 ) * draft e2e test for dense doc search * fix import path * add DocumentJoiner * update converter import; fix getting filled doc store * add text embedder * add sample txt and pdf for preview e2e tests * run the query pipeline before serializing * define samples path --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-11-23 16:59:02 +01:00
Vladimir Blagojevic	cfff0d5212	Rename file_converters to converters (#6390 )	2023-11-23 10:28:40 +01:00
Julian Risch	4ef2a680bb	feat: Add DocumentJoiner component 2.0 (#6105 ) * draft DocumentJoiner * implement merge and rrf * draft end-to-end test with DocumentJoiner in hybrid doc search pipeline * adjust for variadics Canals PR #122 * fix text_embedder input * adapt to the new Document class * adapt to new doc id * specify documents input as Variadic in run method * compare doc ids instead of full docs * rename text_file_converter input to sources * update docstring * Update haystack/preview/components/routers/document_joiner.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from docstring review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * capitalize Documents and Retrievers in docstrings * fix log message in test --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: anakin87 <stefanofiorucci@gmail.com> Co-authored-by: ZanSara <sara.zanzottera@deepset.ai> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2023-11-20 10:56:56 +01:00
ZanSara	dfc1d452bb	feat: upgrade canals to 0.10.1 (#6309 ) * upgrade canals * reno * trigger preview e2e * bump canals * fix decorator * fix test * test factory * tests inmemory * tests writer * test audio * tests builders * tests caching * tests embedders * tests converters * tests generators * tests rankers * tests retrievers * fix pipeline and telemetry tests * remove trigger	2023-11-17 14:46:23 +01:00
Julian Risch	1c85e44156	test: Add langdetect installation to e2e tests (#6327 ) * Add langdetect installation to e2e tests * compare doc content and id only	2023-11-17 10:12:05 +01:00
Julian Risch	8b092a90c0	test: Add MetadataRouter to preprocessing pipeline in e2e test (#6321 ) * add MetadataRouter to preprocessing pipeline * replace mimetype check with language check	2023-11-16 11:22:37 +01:00
Vladimir Blagojevic	5497ca2a45	feat: Adapt `GPTGenerator` to use str input/output format in Haystack 2.x (#6214 ) * Adapt GPTGenerator to string input/output * Finishing touches * punctuation upd * PR feedback * Small naming fixes * Update haystack/preview/components/generators/openai.py Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> * Update class pydoc with a printed response --------- Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2023-11-07 18:00:43 +01:00
Stefano Fiorucci	982ac3df01	fix: fix failing e2e test (after moving classifiers) (#6243 ) * mv classifiers * release note * fix e2e test	2023-11-06 17:08:20 +01:00
Stefano Fiorucci	063d27c522	refactor!: rename `TextDocumentSplitter` to `DocumentSplitter` (#6223 ) * rename TextDocumentSplitter to DocumentSplitter * reno * fix init	2023-11-03 11:33:20 +01:00
Julian Risch	29b1fefaa4	feat: Add DocumentLanguageClassifier 2.0 (#6037 ) * add DocumentLanguageClassifier and tests * reno * fix import, rename DocumentCleaner * mark example usage as python code * add assertions to e2e test * use deserialized document_store * Apply suggestions from code review Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * remove from/to_dict * use renamed InMemoryDocumentStore * adapt to Document refactoring * improve docstring * fix test for new Document --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com> Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2023-10-31 15:35:05 +01:00
Silvano Cerza	7287657f0e	refactor: Rename `Document`'s `text` field to `content` (#6181 ) * Rework Document serialisation Make Document backward compatible Fix InMemoryDocumentStore filters Fix InMemoryDocumentStore.bm25_retrieval Add release notes Fix pylint failures Enhance Document kwargs handling and docstrings Rename Document's text field to content Fix e2e tests Fix SimilarityRanker tests Fix typo in release notes Rename Document's metadata field to meta (#6183) * fix bugs * make linters happy * fix * more fix * match regex --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-10-31 12:44:04 +01:00
Nripesh Niketan	708d33a657	feat: add apple silicon GPU acceleration (#6151 ) * feat: add apple silicon GPU acceleration * add release notes * small fix * Update utils.py * Update utils.py * ci fix mps * Revert "ci fix mps" This reverts commit 783ae503940d9ff8270a970a321549fb9e69dce7. * mps fix * Update experiment_tracking.py * try removing upper watermark limit * disable mps CI * Use xl runner * initialise env * small fix * black linting --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-10-30 11:26:46 +01:00
Stefano Fiorucci	4e4af99a5e	refactor!: rename `MemoryDocumentStore` and related Retrievers (#6076 ) * rename doc store and retrievers * release note * fix patch	2023-10-17 16:15:16 +02:00
ZanSara	71f2430fd1	test: enhance e2e tests to also draw and serialize/deserialize the test pipelines (#5910 ) * add draw and serialization/deserialization to e2e pipeline examples * add comment about json serialization * fix a small gptgenerator bug and move indexing in tests * to json * review feedback	2023-10-09 13:54:17 +02:00
Stefano Fiorucci	c8398eeb6d	test: e2e test for Extractive QA Pipeline (#5879 ) * e2e test for e. qa pipeline	2023-09-26 15:44:34 +02:00
Silvano Cerza	cf7f0ebc22	Add Pipelines async run (#5864 ) * Add Pipeline.arun() * Sleeper node * Fix async running * Add e2e tests To run a Pipeline that doesn't have any async node in async mode: pytest e2e/pipelines/test_standard_pipelines.py::test_query_and_indexing_pipeline To run a Pipeline that has a single async node in concurrent mode: pytest e2e/pipelines/test_standard_pipelines.py::test_async_concurrent_complex_pipeline To run a Pipeline that has a single async node in sequential mode: pytest e2e/pipelines/test_standard_pipelines.py::test_async_sequential_complex_pipeline * Remove unused _adispatch_run method * Make Pipeline.run work with async nodes * Revert "Make Pipeline.run work with async nodes" This reverts commit 22d7a94e4d41aca1b59dad18c0b366fbb6e8f431. * Rename Pipeline.arun to Pipeline._arun * Enhance docstring * Add Sleeper docstring * Add release notes * ignore typing across the node * make pylint happy * skip pylint on needed unused import * fix * if a node has an arun method, use it --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-09-26 15:37:27 +02:00

83 Commits