haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-09 06:13:43 +00:00

Author	SHA1	Message	Date
Silvano Cerza	3e3f79b928	feat: Add `unsafe` init arg in `ConditionalRouter` and `OutputAdapter` to enable previous behaviour (#8176 ) * Add unsafe behaviour to OutputAdapter * Add unsafe behaviour to ConditionalRouter * Add release notes * Fix mypy * Add documentation links --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-09-02 14:14:54 +00:00
Alper	e614fa0c62	refactor: Rename deserialize_document_store_in_init_parameters (#8302 ) * 8259 * update function name * rename and update docstring * fix linting * add a release note	2024-09-02 11:42:23 +02:00
Stefano Fiorucci	842a7b80a8	rm sentence_window_retrieval (#8303 )	2024-08-28 10:51:07 +02:00
David S. Batista	2f3257b77a	chore: removing deprecated `SentenceWindowRetrieval` (#8294 ) * removing deprecated SentenceWindowRetrieval * adding release notes * Rename TestSentenceWindowRetrieval to TestSentenceWindowRetriever --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-08-28 10:04:52 +02:00
Madeesh Kannan	f0b45c873f	feat: Extend core component machinery to support an async run method (experimental) (#8279 ) * feat: Extend core component machinery to support an async run method * Add reno * Fix incorrect docstring * Make `async_run` a coroutine * Make `supports_async` a dunder field	2024-08-27 14:20:13 +02:00
Madeesh Kannan	1fa30d4aaa	chore: Remove deprecated `debug` param from `Pipeline.run` (#8288 ) * chore: Remove deprecated `debug` param from `Pipeline.run` * Fix tests	2024-08-27 11:27:38 +02:00
David S. Batista	b411c14414	feat: The SentenceWindowRetriever has now an extra output key containing all the documents belonging to the context window (#8283 ) * initial import * adding release notes * linting * improving docs and release notes * updating example	2024-08-27 10:30:12 +02:00
Stefano Fiorucci	2e619f06c8	fix: make meta produced by `DOCXToDocument` JSON serializable (#8263 ) * make meta from DOCXToDocument JSON serializable * unused import * update docstrings	2024-08-22 12:24:32 +00:00
Jon Strutz	471f07c8fe	fix: extract page breaks from .docx files (#8232 ) * fix: extract page breaks from .docx files Context: Currently, DOCXToDocument does not extract page breaks from word documents. This makes it impossible to do things like split by page or get correct page number metadata after using something like DocumentSplitter. For example, if you split by word, the 'page_number' metadata field will be 1 for all documents. Solution: Added a method to DOCXToDocument that extracts page breaks from word documents as '\f' characters so that they are recognized by DocumentSplitter. Caveat: Due to the way the python-docx library is set up, you can only accurately determine the location of the first page break for a given paragraph. In the rare case that a paragraph contains more than one page break (which means it is an extremely long paragraph spanning multiple pages), the 2nd, 3rd, etc. page break locations are not known. To sort of fix this, I just appended the page break characters to the end of the paragraph text to keep the overall page number values for the document consistent. * Apply suggestions from code review --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-08-21 09:48:02 +00:00
Sebastian Husch Lee	7fd0b6a013	feat: Add `min_top_k` to TopPSampler (#8228 ) * Add feature to Top P Sampler * Add release notes * Fix zip call * Fix mypy * Restore doc string and make mypy happy hopefully * Make mypy happy * PR comment * Revert change to make mypy happy * Add back type ignore * try to fix typing * Update haystack/components/samplers/top_p.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/samplers/top_p.py --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com> Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2024-08-21 11:29:23 +02:00
Madeesh Kannan	cf5fd2a821	chore: Remove deprecated `ChatMessage.to_openai_format` (#8242 ) * chore: Remove deprecated `ChatMessage.to_openai_format` * lint	2024-08-16 10:34:44 +02:00
Stefano Fiorucci	bcc4104729	refactor: utility function for docstore deserialization (#8226 ) * refactor docstore deserialization * more tests * reno; headers * expose key	2024-08-14 13:29:27 +02:00
Silvano Cerza	ab7eb25856	Add utility then step in feature testing to draw pipeline to file (#8209 )	2024-08-13 14:49:13 +02:00
Vladimir Blagojevic	3318d894c0	Add sede_with_list_output_type_in_pipeline unit test (#8196 )	2024-08-13 14:37:24 +02:00
Amna Mubashar	373de97426	Deprecate SentenceWindowRetrieval (#8206 )	2024-08-13 13:49:41 +02:00
Vladimir Blagojevic	21c507331c	feat: Implement apply_filter_policy and FilterPolicy.MERGE for the new filters (#8042 )	2024-08-09 12:04:24 +02:00
Nicola Procopio	4c798470b2	added `precision` parameter to sentence transformers embeddings (#8179 ) * added `precision` parameter to sentence transformers embeddings * fixed test * Update haystack/components/embedders/sentence_transformers_document_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update test/components/embedders/test_sentence_transformers_text_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update test/components/embedders/test_sentence_transformers_text_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * fix format * Update sentence_transformers_text_embedder.py --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-08-09 11:38:47 +02:00
Marie-Luise Klaus	ec02817f14	fix: OutputAdapter from_dict with custom_filters None (#8173 ) Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>	2024-08-08 14:02:40 +02:00
Corentin Meyer	58517014ec	fix: DocumentCleaner: keep the \f in text (#8078 ) * Keep the \f in Document Cleaner * Add Reno * Add Test * Simplified _remove_empty_lines() code	2024-08-07 14:50:14 +02:00
Marie-Luise Klaus	031b0bfbd8	fix: ChatPromptBuilder from_dict if template is None (#8165 ) * fix ChatPromptBuilder from dict if template=None * fix ChatPromptBuilder from dict if template=None * leave template None --------- Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>	2024-08-06 14:48:04 +02:00
Tim Wellbrock	2e2f5f17bb	feat: add unicode normalization & ascii_only mode for DocumentCleaner (#8103 ) * feat: add unicode normalization & ascii_only mode for DocumentCleaner. * feat: add unicode_normalization parameter validation to DocumentCleaner. * test: fix the unit test to work after code linting.	2024-08-05 13:00:39 +02:00
Stefano Fiorucci	e17d0c4192	chore: deprecate `to_openai_format` and create similar utility functions (#8146 ) * deprecate and add new specific functions * reno	2024-08-02 16:47:17 +02:00
Sebastian Husch Lee	c90495c2e8	feat: Add model and tokenizer kwargs to `TransformersSimilarityRanker`, `SentenceTransformersDocumentEmbedder`, `SentenceTransformersTextEmbedder` (#8145 ) * Start adding model and tokenizer kwargs support * Add model and tokenizer kwargs to doc embedder * Some updates and fixes in tests * Fix more tests * Fix tests * Add release note * Fix test * Add from_dict tests	2024-08-02 10:37:10 +02:00
Vladimir Blagojevic	25d3520f5a	feat: Add `AnswerJoiner` new component (#8122 ) * Initial AnswerJoiner * Initial tests * Add release note * Resolve mypy warning * Add custom join function * Serialize custom join function * Handle all Answer types, add integration test, improve pydoc * Make fixes * Add to API docs * Add more tests * Update haystack/components/joiners/answer_joiner.py Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com> * Update docstrings and release notes * update docstrings --------- Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com> Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com> Co-authored-by: Darja Fokina <daria.fokina@deepset.ai>	2024-08-01 12:51:17 +02:00
Stefano Fiorucci	3d1ad10385	fix html test (#8127 )	2024-07-31 10:59:53 +02:00
Silvano Cerza	c7e29a83c1	fix: Fix infinite loop when running Pipeline (#8123 ) * Fix infinite loop when running Pipeline * Simplify if	2024-07-30 15:00:12 +02:00
Corentin Meyer	1c53aae8f0	fix: Tika converter not yielding page break tags (`\f`) (#8082 ) * Fix TikaConverter not having \f page tag by using HTML mode of parsing and then parsing the HTML to text using the old Haystack 1.X integration as template. * Add Reno * Fix test by making Mock Tika return XML (before parsing) * refinements and test --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2024-07-26 20:13:47 +02:00
Amna Mubashar	e0de423ee0	Rename SentenceWindowRetrieval to SentenceWindowRetriever	2024-07-26 17:46:44 +02:00
Silvano Cerza	3fed1366c4	fix: Fix issue that could lead to RCE if using unsecure Jinja templates (#8095 ) * Fix issue that could lead to RCE if using unsecure Jinja templates * Add comment explaining exception suppression * Update release note * Update release note	2024-07-26 14:02:09 +00:00
Nicola Procopio	47f4db8698	added truncate_dim to sentence transformers embedder (#8077 ) * added truncate_dim to sentence transformers embedder * Update haystack/components/embedders/sentence_transformers_document_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update releasenotes/notes/release-note-2b603a123cd36214.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * fixed parameter description * added test for truncation to text embedder * fix format --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-07-26 10:39:48 +02:00
Madeesh Kannan	b2aef217da	chore: Remove deprecated `DynamicPromptBuilder` and `DynamicChatPromptBuilder` components (#8085 )	2024-07-26 10:00:59 +02:00
Tobias Wochinger	4dde6fbaec	build: unpin structlog (#8071 )	2024-07-24 20:58:34 +02:00
Amna Mubashar	b374c528b2	Assign streaming_callback to OpenAIGenerator and OpenAIChatGenerator in run() method (#8054 ) * Add optional parameter for streaming_callback in run() method	2024-07-24 15:49:19 +02:00
Sebastian Husch Lee	baed478f23	fix: Fix `split_start_idx` and `_split_overlap` information in `DocumentSplitter` (#8046 ) * Fix bug in DocumentSplitter and expand tests to catch said bug * Fix split overlap information calc and actually test it * Add release notes * Remove comments * Same fix in SentenceWindowRetrieval --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2024-07-24 15:15:36 +02:00
David S. Batista	0c9dc008f0	fix: improve context relevancy metric (#7964 ) * fixing tests * fixing tests * updating tests * updating tests * updating docstring * adding release notes * making the insufficient information more robust * updating docstring and release notes * empty list instead of informative string * Update haystack/components/evaluators/context_relevance.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Update haystack/components/evaluators/context_relevance.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * fixing tests * Update haystack/components/evaluators/context_relevance.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * reverting commit * reverting again commit * fixing docstrings * removing deprecation warning * removing warning import --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-07-22 15:13:46 +02:00
Vladimir Blagojevic	a59de1d7b3	chore: Combined main unblock (#8045 ) * Pin structlog to 24.2.0 due to unit test failures * Remove object init parameter in huggingface_hub unit tests * Use less restrictive structlog pin * Add release note	2024-07-19 10:39:10 +02:00
David S. Batista	431aa4a406	updating sentence window retriever tests (#8034 ) * updating sentence window retriever tests * fix	2024-07-16 22:10:55 +02:00
Amna Mubashar	499fbcc59f	Remove Multiplexer and related tests (#8020 )	2024-07-16 15:39:40 +02:00
Silvano Cerza	0411cd938a	Fix bug in Pipeline.run() executing Components in a wrong and unexpected order (#8021 ) * Fix bug in Pipeline.run() executing Components in a wrong and unexpected order * Update haystack/core/pipeline/base.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-07-12 15:30:10 +00:00
Madeesh Kannan	94b806815c	refactor: Improve error messages shown during pipeline deserialization (#8016 ) * refactor: Improve error messages shown during pipeline deserialization * Add link to release notes * Update release notes link	2024-07-12 14:47:00 +00:00
Anushree Bannadabhavi	1f05e633a9	refactor: refactor DocumentJoiner to follow enum pattern for join_mode parameter (#8010 ) * refactor document joiner to follow enum pattern for join mode * Added to_dict and from_dict	2024-07-12 11:29:44 +02:00
Silvano Cerza	0cec82e55e	refactor: Pipeline.run() (#8019 ) * Move utility functions from _enqueue_next_runnable_component (#7895) * Isolate logic to check if we're stuck in a loop * Simplify for else * Add missing return in docstring * Emit warning when stuck in a loop * Fix docstring Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Add utility function to move Components in queues * Add function to find next Component to run * Comment update * Add missing break in loop * Make _add_missing_input_defaults less error prone and add tests * Fix tests * Update docstring * Simplify enqueue logic * Remove unused _enqueue_next_runnable_component function * Add method to find Component with lazy variadic input or all inputs with defaults * Simplify _find_next_runnable_lazy_variadic_or_default_component * Remove unnecessary type ignore * Split _dequeue_components_that_received_no_input into separate functions * Fix linting * Simplify variadic check when running Component * Simplify code * Reorganize functions used by Pipeline.run * Rename variables used in Pipeline.run() for clarity * Add comment clarifying last_waiting_queue and before_last_waiting_queue * Add functions to easily update waiting_queue --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-07-12 08:35:23 +00:00
Madeesh Kannan	8faa3fa465	Revert "fix: make PyPDF backward compatible (#7996 )" (#8014 ) This reverts commit 58b48e36eb56a896365133ab4a9d8e327989948c.	2024-07-11 13:06:08 +00:00
Ulises M	6f8834d036	feat: add and expose api_params for OpenAIGenerator in LLMEvaluator based classes (#7987 ) * initial support for api_params * add tests and reno * resolve suggestions and add integration test * fix mypy --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-07-11 13:14:03 +02:00
David S. Batista	ebfeb571d7	feat: add sentence window retrieval (#7997 ) * initial import * adding tests * adding license and release notes * adding missing release notes * working with any type of doc store * nit * adding get_class_object to serialization package * nit * refactoring get_class_object() * refactoring get_class_object() * changing type and var names * more refactoring * Update haystack/core/serialization.py Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> * Update haystack/core/serialization.py Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> * Update test/core/test_serialization.py Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> * more refactoring * more refactoring * Pydoc syntax --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2024-07-10 13:13:46 +00:00
Sebastian Husch Lee	c121c86c4c	fix: Fix from_dict methods of components using HF models to work with default values (#8003 ) * Fix from_dict to work if device isn't provided in init params * Minor refactoring of from_dict for components that load HF models * Add tests * Update tests to test loading with all default parameters * Add more tests * Add release notes * Add unit test for whisper local * Update reno * Add fix for ExtractiveReader * Fix NamedEntityExtractor	2024-07-10 12:18:05 +02:00
tstadel	7e35280d4f	fix: LinkContentFetcher html text encoding (#7975 ) * fix: content encoding of LinkContentFetcher * fix tests * add reno * only touch html	2024-07-09 15:28:49 +02:00
Sebastian Husch Lee	583eb8a293	fix: `TransformersZeroShotTextRouter` and `TransformersTextRouter` from_dict to work with default value for huggingface_pipeline_kwargs (#8002 ) * Fix default value for huggingface_pipeline_kwargs * Add reno note * Update HuggingFaceLocalGenerator.from_dict to use the same logic as HuggingFaceLocalChatGenerator.from_dict * Update tests slightly * Update release note	2024-07-09 13:32:44 +02:00
Tobias Wochinger	58b48e36eb	fix: make PyPDF backward compatible (#7996 ) * fix: make PyPDF backward compatible * Add release note --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2024-07-09 10:08:37 +02:00
Nitanshu Vashistha	cd8a5b98fe	feat: Configure max_retries & timeout for AzureOpenAITextEmbedder (#7993 ) max_retries: if not set is read from the OPENAI_MAX_RETRIES env variable or set to 5. timeout: if not set is read from the OPENAI_TIMEOUT env variable or set to 30. Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>	2024-07-09 09:56:46 +02:00
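
The fallback order described in the message of cd8a5b98fe (#7993) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from that commit: the helper name `resolve_client_settings` is hypothetical, and only the environment variable names and the defaults (5 retries, 30 seconds) are taken from the commit message.

```python
import os
from typing import Optional, Tuple


def resolve_client_settings(
    max_retries: Optional[int] = None,
    timeout: Optional[float] = None,
) -> Tuple[int, float]:
    """Hypothetical helper: explicit argument first, then env variable, then default."""
    if max_retries is None:
        # Not passed explicitly: read OPENAI_MAX_RETRIES, else fall back to 5.
        max_retries = int(os.environ.get("OPENAI_MAX_RETRIES", "5"))
    if timeout is None:
        # Not passed explicitly: read OPENAI_TIMEOUT, else fall back to 30.
        timeout = float(os.environ.get("OPENAI_TIMEOUT", "30"))
    return max_retries, timeout
```

An explicit argument always wins in this sketch, so a value set in code is never silently overridden by the environment.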


1524 Commits