haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-03 18:36:04 +00:00

Author	SHA1	Message	Date
Ajit Singh	2dd8089409	chore: Removed deprecated max_loops_allowed argument from Pipeline init (#8409 ) * Added equality check for sender and receiver in connection function of pipeline * Update base.py irrelevant changes reverted * added release note * removed deprecated param max_loops_allowed from pipeline init * added release note * revert non relevant test * Delete releasenotes/notes/remove-support-to-connect-component-to-self-6eedfb287f2a2a02.yaml * revert non relevant change * Remove unused test_pipeline_deprecated.yaml * Remove PipelineMaxLoops error * Update release notes --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-09-30 15:58:05 +02:00
Ajit Singh	7ba30d5691	feat: `Pipeline.connect()` will now raise a `PipelineConnectError` if `sender` and `receiver` are the same Component (#8403 ) * Added equality check for sender and receiver in connection function of pipeline * Update base.py irrelevant changes reverted * added release note * altered a walk with cycle test * added a test to verify that pipeline raises PipelineConnectError when adding a component to itself * Update release notes * Remove self connection feature tests * Tidy up connect unit test --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-09-30 15:52:36 +02:00
Silvano Cerza	29672d4b42	feat: Add `JSONConverter` Component (#8397 ) * Add JSONConverter Component * Handle some corner cases * Add JSONConverter to pydoc config * Add a way to extract all non content fields as metadata * Small fix in docstring * Fix tests * docstrings upd * Update json.py --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2024-09-25 12:34:51 +02:00
Silvano Cerza	0df379e6a2	feat: Deprecate `@component` decorator `is_greedy` argument (#8400 ) * Deprecate @component decorator is_greedy argument * Fix some typos and docstrings * Add _is_lazy_variadic test	2024-09-25 11:28:30 +02:00
Sebastian Husch Lee	74f7c6fdfb	Set max_runs_per_component to 1 for pipelines that are linear (#8393 )	2024-09-24 14:44:45 +02:00
Vladimir Blagojevic	09b95746a2	feat: HuggingFaceAPIChatGenerator add token `usage` data (#8375 ) * Ensure HuggingFaceAPIChatGenerator has token usage data * Add reno note * Fix release note	2024-09-23 15:40:50 +02:00
Sriniketh J	066e2e3ec5	Make api_key param optional in LLMEvaluator (#8340 )	2024-09-20 10:47:13 +02:00
Sebastian Husch Lee	2235ce673f	test: Move pipeline test to behavioral tests (#8377 )	2024-09-19 16:59:35 +02:00
Vladimir Blagojevic	514e0abc39	fix: Fix nltk imports (#8381 )	2024-09-18 11:25:21 +00:00
Madeesh Kannan	b22014b915	fix: Prevent `set_output_types` from being called when the `output_types` decorator is used (#8376 )	2024-09-18 13:05:31 +02:00
Vladimir Blagojevic	badd0594cc	feat: Port NLTKDocumentSplitter from dC to Haystack (#8350 ) * Port NLTKDocumentSplitter from dC to Haystack * Improve pydocs * Use haystack logging * Add NLTKDocumentSplitter to __init__.py * Use haystack logging, rename test classes * Fixing _needs_join return * Linting * PR feedback * More static methods * Increase test coverage * Compile pattern --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-09-17 13:59:19 +02:00
Madeesh Kannan	5071e47843	refactor: Rename `Component.async_run` to `Component.run_async` for better readability (#8370 ) Using a suffix will keep names logically sorted, less noisy and relegate the async aspect to an implementation/API detail.	2024-09-17 10:10:34 +00:00
David S. Batista	97126eb544	fix: changing default model to `gpt-4o-mini` on OpenAI API calls (#8360 ) * changing default model to gpt-4o-mini * adding release notes * fixing some missed tests * fixing some more missed tests * fixing one last missed test * fixing linting issues * making pylint happy about an end2end test * changing if test to walrus operator * fixing azure embedder from ada to text-embedding-ada-002	2024-09-17 10:36:42 +02:00
Giovanni Alzetta, PhD	4106e7e8d1	feat : DocumentSplitter, adding the option to split_by function (#8336 ) * Adding splitting function * Adding test for split by function * Adding release note for feat adding split by function * Fixing release note for split_by_function * Fixing issue with splitting_function non callable * nit: fixing value error in documentsplitter for split_by * Add custom serde --------- Co-authored-by: Giovanni Alzetta <giovannialzetta@gmail.com> Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2024-09-12 16:38:37 +02:00
Vladimir Blagojevic	7e9f153e78	chore: Remove all references to old filter syntax (#8342 ) * Remove all references to old filter syntax * More removals * Lint * Do not remove test_filter_retriever.py * Add reno note * Update ValueError text to match text in haystack-core-integrations	2024-09-12 16:28:31 +02:00
Madeesh Kannan	672bcf7e03	fix: Add constraints to `set_input_type(s)` based on `run` method (#8358 ) * fix: Prevent the usage of `set_input_type(s)` when the `run` method doesn't have kwargs, raise if `set_input_type(s)` overrides `run` method parameters * fix: update components and tests * reno	2024-09-12 15:58:16 +02:00
Silvano Cerza	5514676b5e	feat: Deprecate `max_loops_allowed` in favour of new argument `max_runs_per_component` (#8354 ) * Deprecate max_loops_allowed in favour of new argument max_runs_per_component * Add missing test file * Some enhancements * Add version that will remove deprecate stuff	2024-09-12 11:00:12 +02:00
Sebastian Husch Lee	7227bcf9df	feat: TransformersSimilarityRanker add batching across Documents during inference (#8344 ) * First pass at adding batch support to TransformersSimilarityRanker * Add test * Add reno	2024-09-11 12:47:29 +02:00
Silvano Cerza	4d67b552e1	Fix Pipeline skipping a Component with Variadic input (#8347 ) * Fix Pipeline skipping a Component with Variadic input * Simplify _find_components_that_will_receive_no_input	2024-09-10 14:59:53 +02:00
Ulises M	145ca89a3f	feat: Expose default_headers and add kwargs for Azure Client (#8244 ) * default_headers and azure_kwargs added * update docstrings * dont forget about chat generator * Remove azure_kwargs argument --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-09-10 10:29:56 +00:00
jpatra72	b126c14e51	feat: Adds support for zero-shot document classification (#7669 ) (#8193 ) * feat: adds support for zero-shot document classification (#7669) Also, supports multi-label classification * pytests for zero-shot document classification * release note * added licence info to py scripts * updated the format of licence info * Added doc string and example code * added review points highlighted in the PR * Applied suggestions from doc string review Co-authored-by: Daria Fokina <daria.f93@gmail.com> * fixed pytest for init * added output type * added test for pipeline (de-) serialization --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Daria Fokina <daria.f93@gmail.com>	2024-09-10 11:00:05 +02:00
Silvano Cerza	da49e782e2	chore: Make `arrow` an optional dependency (#8345 ) * Make arrow an optional dependency * Fix imports	2024-09-09 16:09:51 +02:00
ArzelaAscoIi	720e54970f	fix: make from dict conditional router more resilient (#8343 ) * fix: make from dict conditional router more resilient * refactor: remove * docs: add release notes * fix: format	2024-09-09 15:11:52 +02:00
Mo Sriha	75955922b9	feat: Add current date in UTC to PromptBuilder (#8233 ) * initial commit * add unit tests * add release notes * update function name	2024-09-09 09:47:03 +02:00
Sebastian Husch Lee	06dd5c2f37	feat (v2): Update so `model_max_length` updates `max_seq_length` for Sentence Transformers (#8334 ) * Update so model_max_length does what is expected * Add release notes * Some fixes * Another test	2024-09-06 11:37:56 +02:00
Sriniketh J	e98a6fea04	Converter: CSVToDocument (#8328 ) * carry forwarded initial commit * fix: doc strings * fix: update docstrings * fix: docstring update * fix: csv encoding in actions * fix: line endings through hooks * fix: converter docs addition	2024-09-06 10:59:12 +02:00
Vladimir Blagojevic	b2c19a8c7a	feat: `ChatPromptBuilder` copies entire `ChatMessage` rather than copying content field only (#8317 ) * Initial implementation of ChatMessage copy and deepcopy * Add reno release note * Satisfy hawkeye * Remove copy and deepcopy, no need to complicate things * Add new reno note * Add unit test	2024-09-02 18:06:38 +02:00
Silvano Cerza	3e3f79b928	feat: Add `unsafe` init arg in `ConditionalRouter` and `OutputAdapter` to enable previous behaviour (#8176 ) * Add unsafe behaviour to OutputAdapter * Add unsafe behaviour to ConditionalRouter * Add release notes * Fix mypy * Add documentation links --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-09-02 14:14:54 +00:00
Alper	e614fa0c62	refactor: Rename deserialize_document_store_in_init_parameters (#8302 ) * 8259 * update function name * rename and update docstring * fix linting * add a release note	2024-09-02 11:42:23 +02:00
Stefano Fiorucci	842a7b80a8	rm sentence_window_retrieval (#8303 )	2024-08-28 10:51:07 +02:00
David S. Batista	2f3257b77a	chore: removing deprecated `SentenceWindowRetrieval` (#8294 ) * removing deprecated SentenceWindowRetrieval * adding release notes * Rename TestSentenceWindowRetrieval to TestSentenceWindowRetriever --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-08-28 10:04:52 +02:00
Madeesh Kannan	f0b45c873f	feat: Extend core component machinery to support an async run method (experimental) (#8279 ) * feat: Extend core component machinery to support an async run method * Add reno * Fix incorrect docstring * Make `async_run` a coroutine * Make `supports_async` a dunder field	2024-08-27 14:20:13 +02:00
Madeesh Kannan	1fa30d4aaa	chore: Remove deprecated `debug` param from `Pipeline.run` (#8288 ) * chore: Remove deprecated `debug` param from `Pipeline.run` * Fix tests	2024-08-27 11:27:38 +02:00
David S. Batista	b411c14414	feat: The SentenceWindowRetriever has now an extra output key containing all the documents belonging to the context window (#8283 ) * initial import * adding release notes * linting * improving docs and release notes * updating example	2024-08-27 10:30:12 +02:00
Stefano Fiorucci	2e619f06c8	fix: make meta produced by `DOCXToDocument` JSON serializable (#8263 ) * make meta from DOCXToDocument JSON serializable * unused import * update docstrings	2024-08-22 12:24:32 +00:00
Jon Strutz	471f07c8fe	fix: extract page breaks from .docx files (#8232 ) * fix: extract page breaks from .docx files Context: Currently, DOCXToDocument does not extract page breaks from word documents. This makes it impossible to do things like split by page or get correct page number metadata after using something like DocumentSplitter. For example, if you split by word, the 'page_number' metadata field will be 1 for all documents. Solution: Added a method to DOCXToDocument that extracts page breaks from word documents as '\f' characters so that they are recognized by DocumentSplitter. Caveat: Due to the way the python-docx library is set up, you can only accurately determine the location of the first page break for a given paragraph. In the rare case that a paragraph contains more than one page break (which means it is an extremely long paragraph spanning multiple pages), the 2nd, 3rd, etc. page break locations are not known. To sort of fix this, I just appended the page break characters to the end of the paragraph text to keep the overall page number values for the document consistent. * Apply suggestions from code review --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-08-21 09:48:02 +00:00
Sebastian Husch Lee	7fd0b6a013	feat: Add `min_top_k` to TopPSampler (#8228 ) * Add feature to Top P Sampler * Add release notes * Fix zip call * Fix mypy * Restore doc string and make mypy happy hopefully * Make mypy happy * PR comment * Revert change to make mypy happy * Add back type ignore * try to fix typing * Update haystack/components/samplers/top_p.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/samplers/top_p.py --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com> Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2024-08-21 11:29:23 +02:00
Madeesh Kannan	cf5fd2a821	chore: Remove deprecated `ChatMessage.to_openai_format` (#8242 ) * chore: Remove deprecated `ChatMessage.to_openai_format` * lint	2024-08-16 10:34:44 +02:00
Stefano Fiorucci	bcc4104729	refactor: utility function for docstore deserialization (#8226 ) * refactor docstore deserialization * more tests * reno; headers * expose key	2024-08-14 13:29:27 +02:00
Silvano Cerza	ab7eb25856	Add utility then step in feature testing to draw pipeline to file (#8209 )	2024-08-13 14:49:13 +02:00
Vladimir Blagojevic	3318d894c0	Add sede_with_list_output_type_in_pipeline unit test (#8196 )	2024-08-13 14:37:24 +02:00
Amna Mubashar	373de97426	Deprecate SentenceWindowRetrieval (#8206 )	2024-08-13 13:49:41 +02:00
Vladimir Blagojevic	21c507331c	feat: Implement apply_filter_policy and FilterPolicy.MERGE for the new filters (#8042 )	2024-08-09 12:04:24 +02:00
Nicola Procopio	4c798470b2	added `precision` parameter to sentence transformers embeddings (#8179 ) * added `precision` parameter to sentence transformers embeddings * fixed test * Update haystack/components/embedders/sentence_transformers_document_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update test/components/embedders/test_sentence_transformers_text_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update test/components/embedders/test_sentence_transformers_text_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * fix format * Update sentence_transformers_text_embedder.py --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-08-09 11:38:47 +02:00
Marie-Luise Klaus	ec02817f14	fix: OutputAdapter from_dict with custom_filters None (#8173 ) Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>	2024-08-08 14:02:40 +02:00
Corentin Meyer	58517014ec	fix: DocumentCleaner: keep the \f in text (#8078 ) * Keep the \f in Document Cleaner * Add Reno * Add Test * Simplified _remove_empty_lines() code	2024-08-07 14:50:14 +02:00
Marie-Luise Klaus	031b0bfbd8	fix: ChatPromptBuilder from_dict if template is None (#8165 ) * fix ChatPromptBuilder from dict if template=None * fix ChatPromptBuilder from dict if template=None * leave template None --------- Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>	2024-08-06 14:48:04 +02:00
Tim Wellbrock	2e2f5f17bb	feat: add unicode normalization & ascii_only mode for DocumentCleaner (#8103 ) * feat: add unicode normalization & ascii_only mode for DocumentCleaner. * feat: add unicode_normalization parameter validation to DocumentCleaner. * test: fix the unit test to work after code linting.	2024-08-05 13:00:39 +02:00
Stefano Fiorucci	e17d0c4192	chore: deprecate `to_openai_format` and create similar utility functions (#8146 ) * deprecate and add new specific functions * reno	2024-08-02 16:47:17 +02:00
Sebastian Husch Lee	c90495c2e8	feat: Add model and tokenizer kwargs to `TransformersSimilarityRanker`, `SentenceTransformersDocumentEmbedder`, `SentenceTransformersTextEmbedder` (#8145 ) * Start adding model and tokenizer kwargs support * Add model and tokenizer kwargs to doc embedder * Some updates and fixes in tests * Fix more tests * Fix tests * Add release note * Fix test * Add from_dict tests	2024-08-02 10:37:10 +02:00

1451 Commits
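
Commit 471f07c8fe above describes page breaks being emitted as '\f' characters so that a splitter can derive 'page_number' metadata. The idea can be sketched in plain Python as below. This is only an illustration under stated assumptions, not Haystack's implementation: the function name `split_by_word_with_pages` and the dict layout are hypothetical, and the text is split on single spaces only, because `str.split()` with no argument would treat '\f' as whitespace and drop it.

```python
def split_by_word_with_pages(text: str, words_per_chunk: int) -> list[dict]:
    """Split text into word chunks and tag each with a 1-based page number.

    Hypothetical helper: page breaks are assumed to be encoded as form
    feed characters ('\f') inside the text, as the commit message describes.
    """
    # Split on single spaces only, so '\f' stays attached to its neighbours.
    words = text.split(" ")
    chunks = []
    page = 1
    for start in range(0, len(words), words_per_chunk):
        content = " ".join(words[start:start + words_per_chunk])
        # The chunk starts on the current page.
        chunks.append({"content": content, "page_number": page})
        # Every form feed inside this chunk pushes later chunks one page on.
        page += content.count("\f")
    return chunks


if __name__ == "__main__":
    sample = "alpha beta\fgamma delta epsilon\fzeta"
    for chunk in split_by_word_with_pages(sample, words_per_chunk=2):
        print(chunk["page_number"], repr(chunk["content"]))
```

If the text contains no '\f' at all, every chunk is reported as page 1, which matches the behaviour the commit message describes for .docx files before the fix.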