haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-06 12:53:35 +00:00

Author	SHA1	Message	Date
Madeesh Kannan	33675b4caf	chore: Remove deprecated `DefaultConverter` for `PyPDFToDocument` (#8501 ) * chore: Remove deprecated `DefaultConverter` for `PyPDFToDocument` * Remove unused imports --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-10-29 16:42:48 +00:00
Bohan Qu	081b143aae	feat!: tracing with concurrency (#8489 )	2024-10-29 17:39:41 +01:00
Stefano Fiorucci	2045f6f16a	try test jsonschema (#8496 )	2024-10-29 16:21:51 +01:00
Vladimir Blagojevic	28161f7bb9	feat: DOCXToDocument: add table extraction (#8457 ) * DOCXToDocument: add table extraction * Add reno note * mypy fixes * add unit tests * Add csv table support * Update release note * Add TableFormat enum * Add table_format as str init param * Update docx.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * PR feedback * PR feedback --------- Co-authored-by: medsriha <medsriha@gmail.com> Co-authored-by: Mo Sriha <22803208+medsriha@users.noreply.github.com> Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-10-29 16:20:27 +01:00
Silvano Cerza	8205724395	feat: Rework `Pipeline.run()` to better handle cycles (#8431 ) * draft * Enhance * Almost works * Simplify some parts and handle intermediate outputs * Handle connections with default * Handle cycles with multiple connections from two components * Update distributed outputs at the correct time * Remove Component inputs after it runs * Add agent pipeline test case * Fix infite loop test * Handle some corner cases with loops checking and inputs deletion * Fix tests * Add new behavioral test * Remove unused code in behavioural test * Fix behavioural test * Fix max run check * Simplify outputs distribution * Simplify subgraph run check * Remove unused _init_run_queue function * Remove commented code * Add some missing type hints * Simplify cycles breaking * Fix _distribute_output test * Fix _find_components_that_will_receive_no_input test * Fix validation test * Fix tracer losing Component inputs * Fix some linting issues * Remove ignore pylint rule * Rename method that break cycles and make it raise * Add docstring to _run_subgraph * Update Pipeline.run() docstring * Update comment to clarify cycles execution * Remove SelfLoop sample Component * Add behavioural test for unsupported cycles * Rename behavioural test to be more specific * Add new behavioural test * Add release notes * Remove commented out code and random pass * Use more efficient function to find cycles * Simplify _break_supported_cycles_in_graph by using defaultdict * Stop breaking edges as soon as we make the graph acyclic * Fix docstring and add some more comments * Fix _distribute_output docstring * Fix _find_receivers_from docstring * More detailed release notes * Minimize calls to networkx.is_directed_acyclic_graph * Add some more info on edges keys * Adjust components_in_cycles comment * Add new Pipeline behavioural test * Enhance _find_components_that_will_receive_no_input to cover more cases * Explain why run_queue is reset after running a subgraph cycle * Rename _init_inputs_state to _normalize_input_data * Better explain the subgraph output distribution * Remove for else * Fix some comments and docstrings * Fix linting * Add missing return type * Fix typo * Rename _normalize_input_data to _normalize_varidiac_input_data and add more documentation * Remove unused import --------- Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>	2024-10-29 15:43:16 +01:00
tstadel	d430833f8f	feat: streaming_callback as run param from HF generators (#8406 ) * feat: streaming_callback as run param from HF generators * apply feedback * add reno * fix test * fix test * fix mypy * fix excessive linting rule	2024-10-29 15:32:06 +01:00
Stefano Fiorucci	78292422f0	feat: allow passing `meta` in the `run` method of `FileTypeRouter` (#8486 ) * initial refactoring * progress * refinements * serde methods + tests * release note * comment * make additional_mimetypes internal attribute	2024-10-24 16:21:15 +02:00
Madeesh Kannan	906177329b	fix: Enforce basic Python types restriction on serialized component data (#8473 )	2024-10-22 17:08:36 +02:00
Alper	a556e11bf1	fix: window_size set during run instead of construction (#8463 ) * window_size set during runtime * revert init and update run with window_size * improved doc, removed print * adding release notes * updating tests * reverting docstring example * Update haystack/components/retrievers/sentence_window_retriever.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update haystack/components/retrievers/sentence_window_retriever.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * Update haystack/components/retrievers/sentence_window_retriever.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> --------- Co-authored-by: David S. Batista <dsbatista@gmail.com> Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-10-22 14:01:26 +00:00
David S. Batista	3a50d35f06	feat: allow `Generators` to run with a system prompt defined at run time (#8423 ) * initial import * Update haystack/components/generators/openai.py Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * docs: fixing * supporting the three use cases: no system prompt, using system prompt defined at init, using system prompt defined at run time * renaming 'run_time_system_prompt' to 'system_prompt' * adding tests, converting methods to static --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-10-22 11:21:10 +02:00
Stefano Fiorucci	322f63de6d	feat: Logging Tracer (#8447 ) * logging tracer: first draft * progress * more tests * license header * avoid interference with other tests * release note * incorporate feedback from review * Update haystack/tracing/logging_tracer.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-10-21 09:47:46 +02:00
Ajit Singh	6cf13e8b98	enhancement: reduced usage of numpy and substituted built-in libraries (#8418 ) * reduced usage of numpy and substituted built-in libraries * added release note * edited expit function to support both float as well as list (this case was giving error CI) * revert code , numpy can't be removed here * more cleaning * fix relnote --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2024-10-18 15:42:19 +02:00
Stefano Fiorucci	dfd339ca2d	ensure compatibility with huggingface_hub==0.26.0 (#8464 )	2024-10-18 08:38:48 +00:00
tstadel	8613bb7653	fix: logs containing JSON getting lost (#8456 ) * fix: logs getting lost * add test * add reno	2024-10-15 14:11:14 +02:00
Alper	b40f0c8b5d	feat: SentenceTransformersTextEmbedder supports `config_kwargs` (#8432 ) * add config_kwargs * disable PLR0913 for a specific function * add a release note * refer to AutoConfig in config_kwargs docstring --------- Co-authored-by: David S. Batista <dsbatista@gmail.com> Co-authored-by: Julian Risch <julianrisch@gmx.de>	2024-10-14 16:08:53 +00:00
David S. Batista	b81abc0c85	feat: SentenceTransformersDocumentEmbedder supports `config_kwargs` (#8433 ) * initial import * adding release notes	2024-10-14 17:43:04 +02:00
David S. Batista	5867fa1f34	fix: whisper transcription test use github url + update test (#8455 ) * adding audio file * changing URL * updating tests * temporary removing failing test * updating tests * removing failing test * typo * linting * fixing URL * updating tests	2024-10-14 16:24:52 +02:00
David S. Batista	a50593ede0	fix: whisper tests using audio file from our github repo (#8454 ) * adding audio file * temporary removing failing test * removing failing test	2024-10-14 12:56:37 +02:00
Madeesh Kannan	e7bfd80f3b	fix: (Temporarily) Re-add suport for pre-2.6.0 YAMLs with `PyPDFConverter` (#8443 )	2024-10-08 14:35:43 +02:00
Madeesh Kannan	ee89f6ad57	fix: `PyPDFToDocument` correctly serializes custom converters, deprecate `DefaultConverter` (#8430 ) * fix: `PyPDFToDocument` correctly serializes custom converters, deprecate `DefaultConverter` * Remove `auto` prefix from serde util function names, add unit tests	2024-10-01 16:35:38 +02:00
Julian Risch	08686d90af	feat: Add DocumentNDCGEvaluator component (#8419 ) * draft new component and tests * draft new component and tests * fix tests, replace usage of get_attr * improve docstrings, refactor tests * add test for mixed documents w/wo scores * add test with multiple lists and update docstring * validate inputs, add tests, make methods static * change fallback to binary relevance * rename validate_init_parameters to validate_inputs	2024-10-01 16:15:02 +02:00
Silvano Cerza	d6f073f9b3	Revert "fix: make pypdf converter more robust (#8427 )" (#8428 ) This reverts commit d234c75168dcb49866a6714aa232f37d56f72cab.	2024-10-01 11:55:25 +02:00
Tobias Wochinger	d234c75168	fix: make pypdf converter more robust (#8427 ) * fix: make `from_dict` of `PyPDFToDocument` more robust * chore: drop trailing space * converting method to static and making the comment shorter * reverting method to static --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2024-09-30 16:47:23 +00:00
Ajit Singh	2dd8089409	chore: Removed deprecated max_loop_allowed argument from Pipeline init (#8409 ) * Added equality check for sender and receiver in connection function of pipeline * Update base.py irrelevant changes reverted * added release note * removed deprecated param max_loops_allowed from pipeline init * added release note * revert non relevant test * Delete releasenotes/notes/remove-support-to-connect-component-to-self-6eedfb287f2a2a02.yaml * revery non relevant change * Remove unused test_pipeline_deprecated.yaml * Remove PipelineMaxLoops error * Update release notes --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-09-30 15:58:05 +02:00
Ajit Singh	7ba30d5691	feat: `Pipeline.connect()` will now raise a `PipelineConnectError` if `sender` and `receiver` are the same Component (#8403 ) * Added equality check for sender and receiver in connection function of pipeline * Update base.py irrelevant changes reverted * added release note * altered a walk with cycle test * added a test to verify that pipeline raises PipelineConnectError when adding a component to itself * Update release notes * Remove self connection feature tests * Tidy up connect unit test --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-09-30 15:52:36 +02:00
Silvano Cerza	29672d4b42	feat: Add `JSONConverter` Component (#8397 ) * Add JSONConverter Component * Handle some corner cases * Add JSONConverter to pydoc config * Add a way to extract all non content fields as metadata * Small fix in docstring * Fix tests * docstrings upd * Update json.py --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2024-09-25 12:34:51 +02:00
Silvano Cerza	0df379e6a2	feat: Deprecate `@component` decorator `is_greedy` argument (#8400 ) * Deprecate @component decorator is_greedy argument * Fix some typos and docstrings * Add _is_lazy_variadic test	2024-09-25 11:28:30 +02:00
Sebastian Husch Lee	74f7c6fdfb	Set max_runs_per_component to 1 for pipelines that are linear (#8393 )	2024-09-24 14:44:45 +02:00
Vladimir Blagojevic	09b95746a2	feat: HuggingFaceAPIChatGenerator add token `usage` data (#8375 ) * Ensure HuggingFaceAPIChatGenerator has token usage data * Add reno note * Fix release note	2024-09-23 15:40:50 +02:00
Sriniketh J	066e2e3ec5	Make api_key param optional in LLMEvaluator (#8340 )	2024-09-20 10:47:13 +02:00
Sebastian Husch Lee	2235ce673f	test: Move pipeline test to behavorials (#8377 )	2024-09-19 16:59:35 +02:00
Vladimir Blagojevic	514e0abc39	fix: Fix nltk imports (#8381 )	2024-09-18 11:25:21 +00:00
Madeesh Kannan	b22014b915	fix: Prevent `set_output_types` from being called when the `output_types` decorator is used (#8376 )	2024-09-18 13:05:31 +02:00
Vladimir Blagojevic	badd0594cc	feat: Port NLTKDocumentSplitter from dC to Haystack (#8350 ) * Port NLTKDocumentSplitter from dC to Haystack * Improve pydocs * Use haystack logging * Add NLTKDocumentSplitter to __init__.py * Use haystack logging, rename test classes * Fixing _needs_join return * Linting * PR feedback * More static methods * Increase test coverage * Compile pattern --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-09-17 13:59:19 +02:00
Madeesh Kannan	5071e47843	refactor: Rename `Component.async_run` to `Component.run_async` for better readablility (#8370 ) Using a suffix will keep names logically sorted, less noisy and relegate the async aspect to an implementation/API detail.	2024-09-17 10:10:34 +00:00
David S. Batista	97126eb544	fix: changing default model to `gpt-4o-mini` on OpenAI API calls (#8360 ) * chaning default model to gpt-4o-mini * adding release notes * fixing some missed tests * fixing some more missed tests * fixing one last missed test * fixing linting issues * making pylint happy about an end2end test * chaning if test to walruss operator * fixing azure embedder from ada to text-embedding-ada-002	2024-09-17 10:36:42 +02:00
Giovanni Alzetta, PhD	4106e7e8d1	feat : DocumentSplitter, adding the option to split_by function (#8336 ) * Adding splitting function * Adding test for split by function * Adding release note for feat adding split by function * Fixing release note for split_by_function * Fixing issue with splitting_function non callable * nit: fixing value error in documentsplitter for split_by * Add custom serde --------- Co-authored-by: Giovanni Alzetta <giovannialzetta@gmail.com> Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2024-09-12 16:38:37 +02:00
Vladimir Blagojevic	7e9f153e78	chore: Remove all references to old filter syntax (#8342 ) * Remove all references to old filter syntax * More removals * Lint * Do not remove test_filter_retriever.py * Add reno note * Update ValueError text to match text in haystack-core-integrations	2024-09-12 16:28:31 +02:00
Madeesh Kannan	672bcf7e03	fix: Add constraints to `set_input_type(s)` based on `run` method (#8358 ) * fix: Prevent the usage of `set_input_type(s)` when the `run` method doesn't have kwargs, raise if `set_input_type(s)` overrides `run` method parameters * fix: update components and tests * reno	2024-09-12 15:58:16 +02:00
Silvano Cerza	5514676b5e	feat: Deprecate `max_loops_allowed` in favour of new argument `max_runs_per_component` (#8354 ) * Deprecate max_loops_allowed in favour of new argument max_runs_per_component * Add missing test file * Some enhancements * Add version that will remove deprecate stuff	2024-09-12 11:00:12 +02:00
Sebastian Husch Lee	7227bcf9df	feat: TransformerSimilarityRanker add batching across Documents during inference (#8344 ) * First pass at adding batch support to TransformersSimilarityRanker * Add test * Add reno	2024-09-11 12:47:29 +02:00
Silvano Cerza	4d67b552e1	Fix Pipeline skipping a Component with Variadic input (#8347 ) * Fix Pipeline skipping a Component with Variadic input * Simplify _find_components_that_will_receive_no_input	2024-09-10 14:59:53 +02:00
Ulises M	145ca89a3f	feat: Expose default_headers and add kwargs for Azure Client (#8244 ) * default_headers and azure_kwargs added * update docstrings * dont forget about chat generator * Remove azure_kwargs argument --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-09-10 10:29:56 +00:00
jpatra72	b126c14e51	feat: Adds support for zero-shot document classification (#7669 ) (#8193 ) * feat: adds support for zero short document classification (#7669) Also, supports multi-label classification * pytests for zero shot document classification * release note * added licence info to py scripts * updated the format of licence info * Added doc string and example code * added review points highlighted in the PR * feat: adds support for zero short document classification (#7669) Also, supports multi-label classification * pytests for zero shot document classification * release note * added licence info to py scripts * updated the format of licence info * Added doc string and example code * added review points highlighted in the PR * Applied suggestions from doc string review Co-authored-by: Daria Fokina <daria.f93@gmail.com> * fixed pytest for init * added output type * added test for pipeline (de-) serialization --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Daria Fokina <daria.f93@gmail.com>	2024-09-10 11:00:05 +02:00
Silvano Cerza	da49e782e2	chore: Make `arrow` an optional dependency (#8345 ) * Make arrow an optional dependency * Fix imports	2024-09-09 16:09:51 +02:00
ArzelaAscoIi	720e54970f	fix: make from dict conditional router more resilient (#8343 ) * fix: make from dict conditional router more resilient * refactor: remove * dos: add release notes * fix: format	2024-09-09 15:11:52 +02:00
Mo Sriha	75955922b9	feat: Add current date in UTC to PromptBuilder (#8233 ) * initial commit * add unit tests * add release notes * update function name	2024-09-09 09:47:03 +02:00
Sebastian Husch Lee	06dd5c2f37	feat (v2): Update so `model_max_length` updates `max_seq_length` for Sentence Transformers (#8334 ) * Update so model_max_length does what is expected * Add release notes * Some fixes * Another test	2024-09-06 11:37:56 +02:00
Sriniketh J	e98a6fea04	Convertor: CSVToDocument (#8328 ) * carry forwarded initial commit * fix: doc strings * fix: update docstrings * fix: docstring update * fix: csv encoding in actions * fix: line endings through hooks * fix: converter docs addition	2024-09-06 10:59:12 +02:00
Vladimir Blagojevic	b2c19a8c7a	feat: `ChatPromptBuilder` copies entire `ChatMessage` rather than copying content field only (#8317 ) * Initial implementation of ChatMessage copy and deepcopy * Add reno release note * Satisfy hawkeye * Remove copy and deepcopy, no need to complicate things * Add new reno note * Add unit test	2024-09-02 18:06:38 +02:00

1524 Commits