haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-23 17:00:41 +00:00

Author	SHA1	Message	Date
Michele Pangrazzi	c192488bf6	Named entity extractor private models (#8658 ) * add 'token' support to NamedEntityExtractor to enable using private models on HF backend * fix existing error message format * add release note * add HF_API_TOKEN to e2e workflow * add informative comment * Updated to_dict / from_dict to handle 'token' correctly ; Added tests * Fix lint * Revert unwanted change	2024-12-20 11:15:55 +01:00
David S. Batista	db89b9a2e5	fix: removing unused import (#8636 )	2024-12-13 12:35:58 +01:00
David S. Batista	176db5dbf9	initial import (#8635 )	2024-12-13 12:12:40 +01:00
David S. Batista	97126eb544	fix: changing default model to `gpt-4o-mini` on OpenAI API calls (#8360 ) * chaning default model to gpt-4o-mini * adding release notes * fixing some missed tests * fixing some more missed tests * fixing one last missed test * fixing linting issues * making pylint happy about an end2end test * chaning if test to walruss operator * fixing azure embedder from ada to text-embedding-ada-002	2024-09-17 10:36:42 +02:00
David S. Batista	276ff3c104	test evaluation pipeline failing (#7823 )	2024-06-07 11:26:18 +02:00
Silvano Cerza	26b263e349	Fix InMemoryDocumentStore not sharing some document stats with other instances (#7792 )	2024-06-04 10:15:50 +02:00
Julian Risch	6723dc3801	check for RuntimeError instead of ComponentError in test (#7769 )	2024-05-31 08:42:40 +02:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675 ) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Julian Risch	48c7c6ad26	test: Rename `responses` and use preds instead of ground truth answers in e2e eval test (#7640 ) * rename responses, use preds instead of ground truth answers * fix typo in component name	2024-05-03 12:48:42 +02:00
David S. Batista	8d04e530da	test: end2end evaluation tests (#7601 ) * initial import * wip * cleaning up tests * fixing tests * adding context relevance * reverting some wrong changes to due PyCharm error in refactoring * building eval pipeline only once * handling mypy issues	2024-04-26 14:07:05 +00:00
Silvano Cerza	d66b5358a1	Remove eval end to end tests (#7093 )	2024-02-26 12:27:15 +01:00
Vladimir Blagojevic	d2497d54e8	Update to use the default Secret.from_env_var(OPENAI_API_KEY) approach (#6941 )	2024-02-09 14:15:45 +01:00
Ashwin Mathur	393a7993c3	feat: Add Semantic Answer Similarity metric (#6877 ) * Add SAS metric * Add release notes * Round similarity scores for precision consistency * Add tolerance to tests * Update haystack/evaluation/eval.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Add types for preprocess_text; Add additional types for f1 and em methods --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-02-02 17:07:52 +01:00
Ashwin Mathur	7217f9d9f0	feat: Add F1 metric (#6822 ) * Add F1 metric * Add release notes	2024-01-26 11:04:43 +01:00
Ashwin Mathur	a238c6dd51	feat: Add Exact Match metric (#6696 ) * Add exact match metric * Add release notes * Cleanup comments in test_eval_exact_match.py * Create separate preprocessing function; Add output_key parameter * Update release note --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-01-22 09:57:04 +01:00
Madeesh Kannan	6a1514550e	test: Update E2E tests to use `Pipeline.dump/load` (#6756 )	2024-01-17 15:09:27 +01:00
Madeesh Kannan	7376838922	feat!: Framework-agnostic device management (#6748 ) * feat: Framework-agnostic device management * Add release note * Linting * Fix test * Add `first_device` property, expand release notes, validate `ComponentDevice` state	2024-01-17 10:41:34 +01:00
Madeesh Kannan	d6cafeaff3	test: Rename RAG E2E test file (#6750 ) Prior to this change, this broke `pytest` workflows in VSCode due to identical test names in this file and the integration/unit test file.	2024-01-16 13:40:22 +01:00
ZanSara	96c0b59aaa	feat!: Rename `model_name_or_path` to `model` in `ExtractiveReader` (#6736 ) * rename model parameter and internam model attribute in ExtractiveReader * fix tests for ExtractiveReader * fix e2e * reno * another fix * review feedback * Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml	2024-01-15 14:48:33 +01:00
ZanSara	b236ea49e3	fix: hybrid pipeline e2e test (#6740 ) * fix hybrid pipeline e2e test * warmup * write to the right docstore	2024-01-15 14:20:02 +01:00
ZanSara	288ed150c9	feat!: Rename `model_name` or `model_name_or_path` to `model` in all Embedder classes (#6733 ) * rename model parameter in the openai doc embedder * fix tests for openai doc embedder * rename model parameter in the openai text embedder * fix tests for openai text embedder * rename model parameter in the st doc embedder * fix tests for st doc embedder * rename model parameter in the st backend * fix tests for st backend * rename model parameter in the st text embedder * fix tests for st text embedder * fix docstring * fix pipeline utils * fix e2e * reno * fix the indexing pipeline _create_embedder function * fix e2e eval rag pipeline * pytest	2024-01-12 15:30:17 +01:00
ZanSara	3156343dce	fix leftover model_name_or_path param (#6737 )	2024-01-12 15:03:06 +01:00
Massimiliano Pippi	e1ec4e5e4d	refact!: Remove symbols under the `haystack.document_stores` namespace (#6714 ) * remove symbols under the haystack.document_stores namespace * Update haystack/document_stores/types/protocol.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * fix * same for retrievers * leftovers * more leftovers * add relnote * leftovers * one more * fix examples --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-01-10 21:20:42 +01:00
Ashwin Mathur	374a937663	feat: Add `calculate_metrics` and `MetricsResult` (#6680 ) * Add calculate_metrics, MetricsResult, Exact Match * Add additional tests for metric calculation * Add release notes * Add docstring for Exact Match metric * Remove Exact Match Implementation * Update release notes * Remove unnecessary metrics implementation * Simplify logic to run supported metrics * Add some evaluation tests * Fix linting --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-01-10 10:26:44 +01:00
Madeesh Kannan	e6d6ce1c73	feat: Add `NamedEntityExtractor`component (#6689 ) * feat: Add `NamedEntityExtractor`component This component accepts a list of `Document`s which it annotates with named entities. The annotations are stored in the `meta` dictionary of each `Document` under a specific key. The component currently support two backends for the annotation models: Hugging Face `transformers` and spaCy. * Address comments * Expand release note * Add the `[torch]` extra package specifier to the lazy import * Remove dead code --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-01-09 17:56:20 +01:00
Massimiliano Pippi	93b2aaee09	chore: move `DocumentJoiner` to new `joiners` package (#6692 ) * move DocumentJoiner to new joiners package * relnote * leftovers * fix docstrings generation * fix unrelated pydoc misconfiguration * more unrelated work, yay! * fix assertions	2024-01-08 22:06:27 +01:00
Vladimir Blagojevic	506ab81d26	chore: Rename GPT generators, deprecate old names (#6626 )	2023-12-22 19:37:29 +01:00
Julian Risch	d90f95be2e	test: Check only top answer in extractive QA e2e test (#6614 )	2023-12-22 11:11:24 +01:00
Stefano Fiorucci	7cc6080dfa	chore: replace metadata w meta in tests/examples (#6612 ) * replace metadata w meta in tests/examples * do not touch already broken e2e tests * Revert "do not touch already broken e2e tests" This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.	2023-12-21 14:09:31 +01:00
Ashwin Mathur	46b395eec3	feat: Add Eval and EvaluationResult (#6505 ) * Add initial implementation for Eval and EvaluationResult * Add release notes * Update files with suggestions from review * Remove serialization * Add eval e2e tests * Update eval e2e tests	2023-12-18 11:29:09 +01:00
Silvano Cerza	18dbce25fc	refacotr: Refactor answer dataclasses (#6523 ) * Refactor answer dataclasses * Add release notes * Fix tests * Fix end to end tests * Enhance ExtractiveReader	2023-12-11 18:50:49 +01:00
Silvano Cerza	e6637f5ec2	Fix all tests	2023-11-24 14:48:43 +01:00
Massimiliano Pippi	09e7831f60	clean up 1.x code --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:47:47 +01:00
Silvano Cerza	cf7f0ebc22	Add Pipelines async run (#5864 ) * Add Pipeline.arun() * Sleeper node * Fix async running * Add e2e tests To run a Pipeline that doesn't have any async node in async mode: pytest e2e/pipelines/test_standard_pipelines.py::test_query_and_indexing_pipeline To run a Pipeline that has a single async node in concurrent mode: pytest e2e/pipelines/test_standard_pipelines.py::test_async_concurrent_complex_pipeline To run a Pipeline that has a single async node in sequential mode: pytest e2e/pipelines/test_standard_pipelines.py::test_async_sequential_complex_pipeline * Remove unused _adispatch_run method * Make Pipeline.run work with async nodes * Revert "Make Pipeline.run work with async nodes" This reverts commit 22d7a94e4d41aca1b59dad18c0b366fbb6e8f431. * Rename Pipeline.arun to Pipeline._arun * Enhance docstring * Add Sleeper docstring * Add release notes * ignore typing across the node * make pylint happy * skip pylint on needed unused import * fix * if a node has an arun method, use it --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-09-26 15:37:27 +02:00
Christian Clauss	bf6d306d68	ci: Simplify Python code with ruff rules SIM (#5833 ) * ci: Simplify Python code with ruff rules SIM * Revert #5828 * ruff --select=I --fix haystack/modeling/infer.py --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-09-20 08:32:44 +02:00
Julian Risch	4ae0924ea0	feat!: Remove SklearnQueryClassifier (#5779 ) * remove SklearnQueryClassifier * reno	2023-09-13 12:55:33 +02:00
Stefano Fiorucci	d860a5c604	make tests more robust (#5747 )	2023-09-08 15:50:56 +02:00
ZanSara	ce06268990	test: fix e2e test failures (#5685 ) * fix test errors * fix pipeline yaml * disable cache * fix errors * remove stray fixture	2023-08-30 12:24:03 +02:00
ZanSara	5985b6d358	chore: refactor pipeline tests for e2e testing (#5576 ) * enable pipeline filder in e2e * merge standard pipeline tests with stanrdard pipeline batch tests * merge summarization tests into standard pipelines tests * Update test_standard_pipelines.py * black	2023-08-29 11:22:39 +02:00
Stefano Fiorucci	637433841e	chore: remove deprecated `Seq2SeqGenerator` and `RAGenerator` (#5180 ) * first draft of removal * more removals * don't download unused models	2023-06-21 16:38:45 +02:00
Silvano Cerza	5ac3dffbef	test: Rework conftest (#4614 ) * Split root conftest into multiple ones and remove unused fixtures * Remove some constants and make them fixtures * Remove unnecessary fixture scoping * Fix failing whisper tests * Fix image_file_paths fixture	2023-04-11 10:33:43 +02:00
Vladimir Blagojevic	be25655663	feat: Add agent tools (#4437 ) * Initial commit, add search_engine * Add TopPSampler * Add more TopPSampler unit tests * Remove SearchEngineSampler (converted to TopPSampler) * Add some basic WebSearch unit tests * Rename unit tests * Add WebRetriever into agent_tools * Adjust to WebRetriever * Add WebRetriever mode [snippet\|document] * Minor changes * SerperDev: add peopleAlsoAsk search results * First agent for hotpotqa * Making WebRetriever work on hotpotqa * refactor: minor WebRetriever improvements (#4377) * refactor: remove doc ids rebuild + antecipate cache * refactor: improve caching, fix Document ids * Minor WebRetriever improvements * Overlooked minor fixes * feat: add Bing API as search engine * refactor: let kwargs pass-through * feat: increase search context * check sampler result, improve batch typing * refactor: increase mypy compliance * Initial commit, add search_engine * Add TopPSampler * Add more TopPSampler unit tests * Remove SearchEngineSampler (converted to TopPSampler) * Add some basic WebSearch unit tests * Rename unit tests * Add WebRetriever into agent_tools * Adjust to WebRetriever * Add WebRetriever mode [snippet\|document] * Minor changes * SerperDev: add peopleAlsoAsk search results * First agent for hotpotqa * Making WebRetriever work on hotpotqa * refactor: minor WebRetriever improvements (#4377) * refactor: remove doc ids rebuild + antecipate cache * refactor: improve caching, fix Document ids * Minor WebRetriever improvements * Overlooked minor fixes * feat: add Bing API as search engine * refactor: let kwargs pass-through * feat: increase search context * check sampler result, improve batch typing * refactor: increase mypy compliance * Fix mypy * Minor example fixes * Fix the descriptions * PR feedback updates * More fixes * TopPSampler: handle top p None value, add unit test * Add top_k to WebSearch * Use boilerpy3 instead trafilatura * Remove date finding * Add more WebRetriever docs * Refactor long methods * making the preprocessor optional * hide WebSearch and make NeuralWebSearch a pipeline * remove unused imports * add WebQAPipeline and split example into two * change example search engine to SerperDev * Turn off progress bars in WebRetriever's PreProcesssor * Agent tool examples - final updates * Add webqa test, search results ranking scores * Better answer box handling for SerperDev and SerpAPI * Minor fixes * pylint * pylint fixes * extract TopPSampler from WebRetriever * use sampler only for WebRetriever modes other than snippet * add web retriever tests * add web retriever tests * exclude rdflib@6.3.2 due to license issues * add test for preprocessed docs and kwargs examples in docstrings * Move test_webqa_pipeline to test/pipelines * change docstring for join_documents_and_scores * Use WebQAPipeline in examples/web_lfqa.py * Use WebQAPipeline in examples/web_lfqa.py * Move test_webqa_pipeline to e2e * Updated lg * Updated lg * Sampler added automatically in WebQAPipeline, no need to add it * Updated lg * Updated lg * :ignore Update agent tools examples to new templates (#4503) * Update examples to new templates * Add print back * fix linting and black format issues --------- Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-03-27 18:14:58 +02:00
ZanSara	c802305ccf	test: move tests on standard pipelines in `e2e/` (#4309 ) * move out standard pipelines e2e * fixing unit tests * add test data * feedback * pylint * black	2023-03-06 17:26:19 +01:00
ZanSara	ae04ce3c6a	test: mock all Summarizer tests and move a few into e2e (#4299 ) * stub e2e folders * simplify pipeline test * mocking * unit tests fixed * clean up e2e * pipeline tests work * pylint * leftover * small fix from #2994 and additional tests * review feedback * change summaries * black * revert models and summaries	2023-03-01 17:30:55 +01:00
ZanSara	165a0a5faa	test: mock all `Translator` tests and move one to `e2e` (#4290 ) * mock all translator tests and move one to e2e * typo * extract pipeline tests using translator * remove duplicate test * move generator test in e2e * Update e2e/pipelines/test_extractive_qa.py * pytest.mark.unit * black * remove model name as well * remove unused fixture * rename original and improve pipeline tests * fixes * pylint	2023-03-01 14:52:05 +01:00

45 Commits