haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-08 21:28:00 +00:00

Author	SHA1	Message	Date
Stefano Fiorucci	8de639bd70	DocxDocument forward reference (#7852 )	2024-06-13 11:29:31 +02:00
Carlos Fernández	c1c339923f	feat: add DocxToDocument converter (#7838 ) * first fucntioning DocxFileToDocument * fix lazy import message * add reno * Add license headder Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * change DocxFileToDocument to DocxToDocument * Update library install to the maintained version Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * clan try-exvept to only take non haystack errors into account * Add wanring on docstring of component ignoring page brakes, mark test as skip * make warnings lazy evaluations Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * make warnings lazy evaluations Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Make warnings lazy evaluated Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Solve f bug * Get more metadata from docx files * add 'python-docx' dependency and docs * Change logging import Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Fix typo Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * remake metadata extraction for docx * solve bug regarding _get_docx_metadata method * Update haystack/components/converters/docx.py Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Update haystack/components/converters/docx.py Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * Delete unused test --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-06-12 11:58:36 +02:00
Rob Pasternak	28dd0f5596	feat: Add options for what to do with missing metadata fields in `MetaFieldRanker` (#7700 ) * Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation. * Implement `missing_meta` functionality in `run()`. * Finish first draft of revised `MetaFieldRanker` functionality. * Add tests for `MetaFieldRanker` `missing_meta` functionality. * Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation. * Implement `missing_meta` functionality in `run()`. * Finish first draft of revised `MetaFieldRanker` functionality. * Add tests for `MetaFieldRanker` `missing_meta` functionality. * Add release notes for new `missing_meta` param of `MetaFieldRanker` * Move part of docs_missing_meta_field warning string outside of `if...elif...else`.	2024-06-12 10:42:02 +02:00
Madeesh Kannan	63226dad34	fix: Fix `LLMEvaluator` serialization (#7818 ) * fix: Fix `LLMEvaluator` serialization * `reno`	2024-06-07 12:49:23 +02:00
Sebastian Husch Lee	2c2c7c9f56	feat: Add PPTXToDocument converter (#7808 ) * Add first pass at PPTXToDocument converter * Add test and update code * Add doc string * Update docstrings * Add release notes * remove unused imports, add to api docs, update pyproject.toml * Add a new test * Add dep so tests can run	2024-06-07 09:43:29 +00:00
Sebastian Husch Lee	d815c78198	feat: Add `TransformersTextRouter` component (#7801 ) * First pass at adding TransformerTextRouter * Fix tests * Add release notes * Add optional labels param * Add verification in the warm_up * Fix tests * Add labels to to_dict * Feedback from review * Add component to docs * Added extra tests	2024-06-05 15:28:53 +02:00
Vladimir Blagojevic	678f193f10	feat: Add filter_policy init parameter to in memory retrievers (#7795 ) * Add filter_policy init parameter to in-memory retrievers	2024-06-04 17:51:16 +02:00
Silvano Cerza	854c4173f2	feat: Add memory sharing between different instances of `InMemoryDocumentStore` (#7781 ) * Add memory sharing between different instances of InMemoryDocumentStore * Fix FilterRetriever tests * Fix InMemoryBM25Retriever tests	2024-05-31 16:44:14 +02:00
Massimiliano Pippi	8d80ff86d9	Add BranchJoiner and deprecate Multiplexer (#7765 )	2024-05-30 15:34:52 +02:00
Massimiliano Pippi	0ceeb733ba	chore: make `warm_up()` usage consistent (#7752 ) * make usage consistent * fix error type * release notes * pylint fix * change of plan * revert * fix test * revert * fix HF tests * Apply suggestions from code review Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * fix formatting * reformat * fix regex match with the new error message * fix integration test --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-05-29 10:54:21 +02:00
Alessio Cesaretti	d0da31a047	feat: Add split_threshold to DocumentSplitter to avoid excessively short splits (#7721 ) * feat: add split_threshold to document splitter to avoid excessively small splits * Update haystack/components/preprocessors/document_splitter.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update haystack/components/preprocessors/document_splitter.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * extend release note --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-05-27 14:48:38 +02:00
tstadel	98fd270428	feat: add ChatPromptBuilder, deprecate DynamicChatPromptBuilder (#7663 )	2024-05-23 19:04:55 +02:00
David S. Batista	38747ff7a3	fix: failsafe for non-valid json and failed LLM calls (#7723 ) * wip * initial import * adding tests * adding params * adding safeguards for nan in evaluators * adding docstrings * fixing tests * removing unused imports * adding tests to context and faithfullness evaluators * fixing docstrings * nit * removing unused imports * adding release notes * attending PR comments * fixing tests * fixing tests * adding types * removing unused imports * Update haystack/components/evaluators/context_relevance.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Update haystack/components/evaluators/faithfulness.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * attending PR comments --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-05-23 15:41:29 +00:00
Massimiliano Pippi	e3dccf4406	add timeout to AzureOpenAIGenerator (#7724 ) * add timeout to AzureOpenAIGenerator * add to chat also * Update azure-openai-generator-timeout-c39ecd6d4b0cdb4b.yaml	2024-05-23 16:28:24 +02:00
tstadel	83d3970405	feat: extend PromptBuilder and deprecate DynamicPromptBuilder (#7655 ) * feat: add default template to DynamicPromptBuilder * fix mypy * fix mypy * extend PromptBuilder and deprecate DynamicPromptBuilder * make backward-compatible: optional -> required * make backward-compatible: _template_string * make backward-compatible: missing_required_vars error * add test for no template case * better docstrings * some chors * some chors * add reno * revert test_dynamic_prompt_builder.py * better docstring * make backward-compatible: reorder init args * fix tests * add raises docstring * make default template required and rework docstrings * docs chores * keep to_dict in place for easier review * remove unnecessary logger * update docstring	2024-05-23 16:03:39 +02:00
Varun Krishnan	badb05b3ab	feat: allow DocumentJoiner to accept top_k parameter in run method (#7709 ) * feat: allow DocumentJoiner to accept top_k parameter in run method * Added release note for DocumentJoiner top_k fix	2024-05-23 16:03:26 +02:00
Massimiliano Pippi	482f60ec99	fix: exit early if the component receives no documents (#7732 ) * exit early if the component receives no documents * relnote	2024-05-23 09:35:10 +02:00
David S. Batista	a4fc2b66e6	style: adding progress bar to llm-based evaluators (#7726 ) * adding progress bar * fixing typo * fixing tests * Update test_llm_evaluator.py * fixing missing colon * passing directly to parent * adding docstrings	2024-05-23 09:22:14 +02:00
Massimiliano Pippi	76224fc781	make SerperDevWebSearch more robust (#7725 )	2024-05-22 13:14:39 +02:00
Stefano Fiorucci	7181f6b7e9	feat: change HTML conversion backend from boilerpy3 to Trafilatura (#7705 ) * change HTML conversion backed to Trafilatura * rm unused var	2024-05-17 10:38:47 +02:00
Carlos Fernández	57af95d7ea	add keep-id to DocumentCleaner (#7703 )	2024-05-16 19:18:48 +02:00
Carlos Fernández	686a4999cf	feat: widen support of env vars in OpenAI components (#7653 ) * add enviroment variables to the _enviroment.py file * add support for two of the three variables * Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' on OpenAIDocument Ebedder. * Replicate support for env vars in OpenAITextEmbedder. * Add support for env vars in OpenAIGenerator.. * Add support for env vars in OpenAIChatGenerator. * add docstrings and reno * add params to __init__ in OpenAIDocumentEmbedder * add params to __init__ in OpenAITextEmbedder * make fully functional implementation of env vars and unit tests * update reno * Update haystack/components/embedders/openai_text_embedder.py * reverse changes to telemetry/_enviroment.py * Update haystack/components/embedders/openai_text_embedder.py --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-05-15 21:58:41 +00:00
David S. Batista	96b9d3e32a	fix: Adding missing `component` decorator to AzureOpenAIGenerator (#7698 ) * initial import * adding release notes * tests avoiding I/O operations * Update fix-azure-generators-serialization-18fcdc9cbcb3732e.yaml	2024-05-15 10:00:38 +02:00
David S. Batista	798dc4a4a5	fix: avoid FaithfulnessEvaluator and ContextRelevanceEvaluator return `Nan` (#7685 ) * initial import * fixing tests * relaxing condition * adding safeguard for ContextRelevanceEvaluator as well * adding release notes	2024-05-14 17:08:51 +02:00
Vladimir Blagojevic	4352b1688e	fix: Fix NamedEntityExtractor serde (#7684 ) * Fix NamedEntityExtractor serde * Add release note * Linting, remove unit markers	2024-05-14 12:24:55 +02:00
Sebastian Husch Lee	a2be90b95a	fix: Update device deserialization for components that use local models (#7686 ) * fix: Update device deserializtion for SentenceTransformersTextEmbedder * Add unit test * Fix unit test * Make same change to doc embedder * Add release notes * Add same change to Diversity Ranker and Named Entity Extractor * Add unit test * Add the same for whisper local * Update release notes	2024-05-14 08:36:14 +02:00
Vladimir Blagojevic	811b93db91	feat: Set ByteStream's mime_type attribute for web based resources (#7681 )	2024-05-13 19:44:02 +02:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675 ) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Stefano Fiorucci	7c9532b200	fix broken serialization of HFAPI components (#7661 )	2024-05-08 17:14:37 +02:00
Stefano Fiorucci	94467149c1	fix: fix serialization of `DocumentRecallEvaluator` (#7662 ) * fix serialization of DocumentRecallEvaluator * add requested tests	2024-05-08 16:00:49 +02:00
Vladimir Blagojevic	5f813373eb	chore: Update huggingface_hub classes used after library upgrade (#7631 ) * Update huggingface_hub classes used after library upgrade * Fix chat tests * Update lazy import guard and other references to huggingface_hub>=0.23.0 * In huggingface_hub 0.23.0 TextGenerationOutput property details is now optional * More fixes * Add reno note	2024-05-03 10:14:54 +02:00
Julian Risch	b0284977db	feat: Add document page number of ExtractedAnswer to meta (#7572 ) * calculate page number of answer and add to meta * fix mypy, add reno * add test * simplify unit test * update release note * undo @patch updates * extend tests, check page_number type	2024-05-02 14:48:27 +02:00
Mo	2e35f13085	feat: add converter based on pdfminer (#7607 ) * Initial commit pdfminer converter * Revert back naming of argument all_text per pdfminer documentation * Add the component decorator * Add release notes * Reformat code with black * Remove LTPage and comments * Update dependencies in pyproject.toml * Added some tests and incorporated reference doc in docstring * Added some tests and incorporated reference doc in docstring	2024-05-02 10:36:54 +02:00
Julian Risch	2509eeea7e	refactor: Rename FaithfulnessEvaluator input responses to predicted_answers (#7621 )	2024-04-30 16:30:57 +02:00
Bohan Qu	40360e44ff	feat: add required flag for prompt builder inputs (#7553 )	2024-04-29 14:21:53 +02:00
Carlos Fernández	d2c87b2fd9	feat: add page_number to metadata in DocumentSplitter (#7599 ) * Add the implementation for page counting used in the v1.25.x branch. It should work as expected in issue #6705. * Add tests that reflect the desired behabiour. This behabiour is inffered from the one it had on Haystack 1.x Solve some minor bugs spotted by tests. * Update docstrings. * Add reno. * Update haystack/components/preprocessors/document_splitter.py Update docstring from suggestion Co-authored-by: David S. Batista <dsbatista@gmail.com> * solve suggestion to improve readability * fragment tests * Update haystack/components/preprocessors/document_splitter.py Co-authored-by: David S. Batista <dsbatista@gmail.com> * Update .gitignore * Update .gitignore * Update add-page-number-to-document-splitter-162e9dc7443575f0.yaml * blackening --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2024-04-29 12:51:18 +02:00
Madeesh Kannan	a881451d3a	refactor: Refactor `EvaluationResult` into `BaseEvaluationRunResult` and `EvaluationRunResult` (#7594 ) The new `EvaluationRunResult` has slightly different semantics - it separates the previous `data` parameter into `inputs` and `results` and expects aggregate scores to be provided in the latter.	2024-04-25 12:16:48 +02:00
Julian Risch	9c56dbe288	test: Make ContextRelevanceEvaluator integration test more robust (#7584 )	2024-04-23 16:01:25 +00:00
Julian Risch	07307709ee	test: Make FaithfulnessEvaluator integration test more robust (#7582 )	2024-04-23 15:44:00 +00:00
Stefano Fiorucci	081757c6b9	test: replace mistral-7b with zephyr-7b-beta in tests (#7576 ) * replace mistral-7b with gemma-2b-it in tests * rm wrong comment * change model	2024-04-23 13:56:07 +02:00
Julian Risch	d7638cfd4b	refactor: FaithfulnessEvaluator specifies inputs explicitly (#7548 ) * specify inputs explicitly. move out examples * Update haystack/components/evaluators/faithfulness.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-04-22 12:52:10 +00:00
Julian Risch	b12e0db134	feat: Add ContextRelevanceEvaluator component (#7519 ) * feat: Add ContextRelevanceEvaluator component * reno * fix expected inputs and example docstring * remove responses parameter from tests * specify inputs explicitly * add new evaluator to api reference docs	2024-04-22 14:10:00 +02:00
Massimiliano Pippi	3a80c866c9	fix: do not use reserved attributes in the logger (#7545 ) * avoid using reserved keywords in the logger * make the tests independent from the log level * relnotes	2024-04-12 14:07:18 +00:00
Massimiliano Pippi	2bad5bcb96	refactor: AnswerExactMatchEvaluator component inputs (#7536 ) * refactor component inputs * release notes * Update class docstring * pylint * update existing note instead of creating a new one --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-04-12 06:59:16 +00:00
David S. Batista	9a9c8aa1c8	feat: implementing evalualtion results API (#7520 ) * initial import * adding tests * attending PR comments * fixing tests * updating tests * updating tests and code * renaming * fixing linting issues * adding release notes * adding docstrings * latest fixes	2024-04-10 13:34:03 +00:00
Julian Risch	e974a23fa3	docs: Fix eval metric examples in docstrings (#7505 ) * fix eval metric docstrings, change type of individual scores * change import order * change exactmatch docstring to single ground truth answer * change exactmatch comment to single ground truth answer * reverted changing docs to single ground truth * add warm up in SASEvaluator example * fix FaithfulnessEvaluator docstring example * extend FaithfulnessEvaluator docstring example * Update FaithfulnessEvaluator init docstring * Remove outdated default from LLMEvaluator docstring * Add examples param to LLMEvaluator docstring example * Add import and print to LLMEvaluator docstring example	2024-04-10 11:00:20 +02:00
Stefano Fiorucci	39be515ba6	skip HF integrations tests if running from fork (#7517 )	2024-04-09 17:47:13 +02:00
Vladimir Blagojevic	988c360b6d	feat: Azure converter updates (#7409 ) * Initial commit * Remove old mock tests * Fix current_last_page_number calculation * Carry over unit tests from the other side * Update pydocs, skip failing tests * Fix pylint and mypy * Minor adjustments * Add release note * Minor touch ups * Resolve Document unique id issue by using custom id calculation * Better hashing, add unit tests * Small fixes	2024-04-09 09:45:06 +02:00
Stefano Fiorucci	eff53a9131	feat: `HuggingFaceAPIDocumentEmbedder` (#7485 ) * add HuggingFaceAPITextEmbedder * add HuggingFaceAPITextEmbedder * rm unneeded else * wip * small fixes * deprecation; reno * Apply suggestions from code review Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * make params mandatory * changes requested * fix test * fix test --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-04-08 15:06:26 +02:00
Stefano Fiorucci	c91bd49cae	feat: `HuggingFaceAPITextEmbedder` (#7484 ) * add HuggingFaceAPITextEmbedder * add HuggingFaceAPITextEmbedder * rm unneeded else * small fixes * changes requested * fix test	2024-04-08 14:22:54 +02:00

200 Commits