haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-02 09:56:55 +00:00

Author	SHA1	Message	Date
Stefano Fiorucci	7181f6b7e9	feat: change HTML conversion backend from boilerpy3 to Trafilatura (#7705 ) * change HTML conversion backend to Trafilatura * rm unused var	2024-05-17 10:38:47 +02:00
Carlos Fernández	57af95d7ea	add keep-id to DocumentCleaner (#7703 )	2024-05-16 19:18:48 +02:00
Carlos Fernández	686a4999cf	feat: widen support of env vars in OpenAI components (#7653 ) * add environment variables to the _environment.py file * add support for two of the three variables * Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' on OpenAIDocumentEmbedder. * Replicate support for env vars in OpenAITextEmbedder. * Add support for env vars in OpenAIGenerator. * Add support for env vars in OpenAIChatGenerator. * add docstrings and reno * add params to __init__ in OpenAIDocumentEmbedder * add params to __init__ in OpenAITextEmbedder * make fully functional implementation of env vars and unit tests * update reno * Update haystack/components/embedders/openai_text_embedder.py * reverse changes to telemetry/_environment.py * Update haystack/components/embedders/openai_text_embedder.py --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-05-15 21:58:41 +00:00
Sebastian Husch Lee	af53e8430d	feat: Add inference mode to ExtractiveReader (#7699 ) * Add inference mode to ExtractiveReader * Add release notes	2024-05-15 19:33:57 +00:00
Vladimir Blagojevic	c8d53b3ebf	fix: Adjust serialization to handle PEP-585 generic types (#7690 ) * Adjust serialization to handle PEP-585 generic types * Add reno note * Simplify * PEP 585 serialization handling in sys.version_info < (3, 9)	2024-05-15 14:25:19 +02:00
David S. Batista	96b9d3e32a	fix: Adding missing `component` decorator to AzureOpenAIGenerator (#7698 ) * initial import * adding release notes * tests avoiding I/O operations * Update fix-azure-generators-serialization-18fcdc9cbcb3732e.yaml	2024-05-15 10:00:38 +02:00
Massimiliano Pippi	cc1d4b1c80	chore: Simplify `Pipeline.run` method by moving code to the base class (#7680 ) * move graph initialization to the base class * simplify data normalization * deepcopy data in base class * initialize inputs state * move to_run preparation to the base class * Test Pipeline._init_to_run() * Test Pipeline._init_inputs_state() * Test Pipeline._prepare_component_input_data() --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-05-14 23:25:46 +02:00
David S. Batista	798dc4a4a5	fix: avoid FaithfulnessEvaluator and ContextRelevanceEvaluator return `Nan` (#7685 ) * initial import * fixing tests * relaxing condition * adding safeguard for ContextRelevanceEvaluator as well * adding release notes	2024-05-14 17:08:51 +02:00
Daria Fokina	cc869b10ad	add pdfminer (#7688 )	2024-05-14 13:42:29 +02:00
Madeesh Kannan	2428bc2a92	fix: `Pipeline.run` correctly returns all outputs when the `include_outputs_from` parameter is used (#7697 ) * fix: `Pipeline.run` correctly returns all outputs when the `include_outputs_from` parameter is used * Add release note	2024-05-14 12:29:41 +02:00
Vladimir Blagojevic	4352b1688e	fix: Fix NamedEntityExtractor serde (#7684 ) * Fix NamedEntityExtractor serde * Add release note * Linting, remove unit markers	2024-05-14 12:24:55 +02:00
David S. Batista	75cf35c743	fix: forcing response format to be JSON valid (#7692 ) * forcing response format to be JSON valid * adding release notes * cleaning up * Update haystack/components/evaluators/llm_evaluator.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-05-14 10:22:38 +00:00
Sebastian Husch Lee	a2be90b95a	fix: Update device deserialization for components that use local models (#7686 ) * fix: Update device deserialization for SentenceTransformersTextEmbedder * Add unit test * Fix unit test * Make same change to doc embedder * Add release notes * Add same change to Diversity Ranker and Named Entity Extractor * Add unit test * Add the same for whisper local * Update release notes	2024-05-14 08:36:14 +02:00
Vladimir Blagojevic	811b93db91	feat: Set ByteStream's mime_type attribute for web based resources (#7681 )	2024-05-13 19:44:02 +02:00
Massimiliano Pippi	1d20ac3c5e	chore: extract BasePipeline (#7673 ) * extract BasePipeline * release note * add missing headers * move __eq__ to the base class * proper check type equality, bless the tests	2024-05-10 11:35:15 +02:00
DL	27acb3c970	Update pipeline.py (#7679 )	2024-05-09 18:51:48 +00:00
Silvano Cerza	0e1a5a65e8	Make SparseEmbedding a dataclass (#7678 )	2024-05-09 15:11:43 +00:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675 ) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Massimiliano Pippi	02b8a07e31	re-enable linting for the core package (#7677 ) * re-enable linting for the core package * fix docstring	2024-05-09 13:00:16 +00:00
Stefano Fiorucci	dd95def0d1	introduce any-of-labels (#7676 )	2024-05-09 11:36:45 +02:00
Massimiliano Pippi	78e11bf764	Remove leftover from Haystack 1.x (#7664 )	2024-05-08 17:34:21 +02:00
Massimiliano Pippi	c07cedf168	chore: Stop labelling PRs with 2.x, assuming it's default now (#7665 )	2024-05-08 17:34:05 +02:00
Stefano Fiorucci	7c9532b200	fix broken serialization of HFAPI components (#7661 )	2024-05-08 17:14:37 +02:00
Stefano Fiorucci	94467149c1	fix: fix serialization of `DocumentRecallEvaluator` (#7662 ) * fix serialization of DocumentRecallEvaluator * add requested tests	2024-05-08 16:00:49 +02:00
Bilge Yücel	f14bc5330f	Add "SentenceTransformersDiversityRanker" api reference (#7659 )	2024-05-07 19:16:05 +02:00
Guest400123064	cd66a80ba2	perf: enhanced `InMemoryDocumentStore` BM25 query efficiency with incremental indexing (#7549 ) * incorporating better bm25 impl without breaking interface * all three bm25 algos * 1. setting algo post-init not allowed; 2. remove extra underscore for naming consistency; 3. remove unused import * 1. rename attribute name for IDF computation 2. organize document statistics as a dataclass instead of tuple to improve readability * fix score type initialization (int -> float) to pass mypy check * release note included * fixing linting issues and mypy * fixing tests * removing heapq import and cleaning up logging * changing indexing order * adding more tests * increasing tests * removing rank_bm25 from pyproject.toml --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2024-05-03 12:10:15 +00:00
Julian Risch	48c7c6ad26	test: Rename `responses` and use preds instead of ground truth answers in e2e eval test (#7640 ) * rename responses, use preds instead of ground truth answers * fix typo in component name	2024-05-03 12:48:42 +02:00
Silvano Cerza	34a79e368e	Enhance version bump PR body description (#7644 )	2024-05-03 12:45:18 +02:00
Haystack Bot	489349bcae	Update unstable version to 2.2.0-rc0 (#7643 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> v2.2.0-rc0	2024-05-03 11:55:43 +02:00
Silvano Cerza	3b64664eb7	Fix call to git in minor_version_release.yml v2.1.0-rc0	2024-05-03 11:49:00 +02:00
Silvano Cerza	c0fe5e660b	Rework minor_version_release.yml to create PR that bumps version (#7642 )	2024-05-03 11:45:25 +02:00
Vladimir Blagojevic	5f813373eb	chore: Update huggingface_hub classes used after library upgrade (#7631 ) * Update huggingface_hub classes used after library upgrade * Fix chat tests * Update lazy import guard and other references to huggingface_hub>=0.23.0 * In huggingface_hub 0.23.0 TextGenerationOutput property details is now optional * More fixes * Add reno note	2024-05-03 10:14:54 +02:00
Silvano Cerza	db87074e68	Fix minor_version_release.yml workflow to work with both 1.x and 2.x (#7630 )	2024-05-02 15:23:07 +02:00
Julian Risch	b0284977db	feat: Add document page number of ExtractedAnswer to meta (#7572 ) * calculate page number of answer and add to meta * fix mypy, add reno * add test * simplify unit test * update release note * undo @patch updates * extend tests, check page_number type	2024-05-02 14:48:27 +02:00
Mo	2e35f13085	feat: add converter based on pdfminer (#7607 ) * Initial commit pdfminer converter * Revert back naming of argument all_text per pdfminer documentation * Add the component decorator * Add release notes * Reformat code with black * Remove LTPage and comments * Update dependencies in pyproject.toml * Added some tests and incorporated reference doc in docstring * Added some tests and incorporated reference doc in docstring	2024-05-02 10:36:54 +02:00
Julian Risch	2509eeea7e	refactor: Rename FaithfulnessEvaluator input responses to predicted_answers (#7621 )	2024-04-30 16:30:57 +02:00
evanderiel	5de5619abd	Add instance argument to code samples in docstrings for component.py (#7622 )	2024-04-30 16:04:06 +02:00
Vladimir Blagojevic	8cb3cecf34	feat: Trace pipeline run input/output data (#7590 ) * Trace pipeline run * Add reno note * Update tracing tests to check input_data and output_data * empty --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com> Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-04-29 17:29:27 +02:00
Tobias Wochinger	451fae880e	ci: fix catch-all (#7215 ) * ci: trigger separate workflow * ci: temporary use current branch * ci: fix workflow name * ci: try with same job name * ci: try with dispatch * Revert "ci: try with dispatch" This reverts commit bd66e56c0697ae97fc2599eebaceff417d9be65c. * Revert "ci: try with same job name" This reverts commit 9e2ae5b402758c14a9f812c2e06f820bd3ece767. * ci: try with workflow call in both cases * ci: introduce change to trigger CI * Revert "ci: introduce change to trigger CI" This reverts commit e3ec07c5e26f114364babea69535183253c801b7. * ci: add name * Revert "Revert "ci: introduce change to trigger CI"" This reverts commit 6718585fd24069112e0f773e010056e1d96e3eee. * ci: improve naming * ci: further improve naming * Unset reusable workflow version and use relative path * Remove CI trigger --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-04-29 14:54:12 +02:00
Bohan Qu	40360e44ff	feat: add required flag for prompt builder inputs (#7553 )	2024-04-29 14:21:53 +02:00
Carlos Fernández	d2c87b2fd9	feat: add page_number to metadata in DocumentSplitter (#7599 ) * Add the implementation for page counting used in the v1.25.x branch. It should work as expected in issue #6705. * Add tests that reflect the desired behaviour. This behaviour is inferred from the one it had on Haystack 1.x Solve some minor bugs spotted by tests. * Update docstrings. * Add reno. * Update haystack/components/preprocessors/document_splitter.py Update docstring from suggestion Co-authored-by: David S. Batista <dsbatista@gmail.com> * solve suggestion to improve readability * fragment tests * Update haystack/components/preprocessors/document_splitter.py Co-authored-by: David S. Batista <dsbatista@gmail.com> * Update .gitignore * Update .gitignore * Update add-page-number-to-document-splitter-162e9dc7443575f0.yaml * blackening --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2024-04-29 12:51:18 +02:00
David S. Batista	8d04e530da	test: end2end evaluation tests (#7601 ) * initial import * wip * cleaning up tests * fixing tests * adding context relevance * reverting some wrong changes due to a PyCharm error in refactoring * building eval pipeline only once * handling mypy issues	2024-04-26 14:07:05 +00:00
David S. Batista	0047cd115e	fix: `EvaluationRunResult` should compare only input keys (#7603 ) * initial import * Update haystack/evaluation/eval_run_result.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-04-26 13:44:18 +00:00
Stefano Fiorucci	704293d491	add pydoc config for evaluation (#7602 )	2024-04-26 12:30:21 +02:00
Vladimir Blagojevic	36b9a05212	Rollback macos-latest to macos-12 (#7597 )	2024-04-26 10:44:49 +02:00
Madeesh Kannan	a881451d3a	refactor: Refactor `EvaluationResult` into `BaseEvaluationRunResult` and `EvaluationRunResult` (#7594 ) The new `EvaluationRunResult` has slightly different semantics - it separates the previous `data` parameter into `inputs` and `results` and expects aggregate scores to be provided in the latter.	2024-04-25 12:16:48 +02:00
Madeesh Kannan	ec0e22265a	feat: Expand `Pipeline.inputs` and `Pipeline.outputs` to include connected sockets (#7586 )	2024-04-24 12:27:18 +02:00
Stefano Fiorucci	19a46af9da	add `__eq__` method to `SparseEmbedding` (#7574 ) * add __eq__ method to SparseEmbedding * reno * improve reno	2024-04-23 19:03:41 +02:00
David S. Batista	958f1eb3a3	doc: adding `docstring` linting based on `ruff` (#7463 ) * wip: docstrings linting * set ruff rules	2024-04-23 18:43:09 +02:00
Julian Risch	9c56dbe288	test: Make ContextRelevanceEvaluator integration test more robust (#7584 )	2024-04-23 16:01:25 +00:00


3489 Commits