haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-07 00:51:22 +00:00

Author	SHA1	Message	Date
Vladimir Blagojevic	6e86f4e26a	Update embedding integration tests (#6823)	2024-01-24 15:22:47 +01:00
Vladimir Blagojevic	c47b82c54f	Remove pipeline_utils package and dependent code (#6806)	2024-01-23 18:40:43 +01:00
Ashwin Mathur	a238c6dd51	feat: Add Exact Match metric (#6696) * Add exact match metric * Add release notes * Cleanup comments in test_eval_exact_match.py * Create separate preprocessing function; Add output_key parameter * Update release note --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2024-01-22 09:57:04 +01:00
Silvano Cerza	d4f6531c52	feat: Refactor `Pipeline.run()` (#6729) * First rough implementation of refactored run * Further improve run logic * Properly handle variadic input in run * Further work * Enhance names and add more documentation * Fix issue with output distribution * This works * Enhance run comments * Mark Multiplexer as greedy * Remove MergeLoop in favour of Multiplexer in tests * Remove FirstIntSelector in favour of Multiplexer * Handle corner when waiting for input is stuck * Remove unused import * Handle mutable input data in run and misbehaving components * Handle run input validation * Test validation * Fix pylint * Fix mypy * Call warm_up in run to fix tests	2024-01-18 17:53:47 +01:00
Vladimir Blagojevic	0b177b3bc6	feat: Improve OpenAPIServiceConnector service response serialization (#6772) * Better service response json -> str serialization * Add unit test	2024-01-18 16:49:48 +01:00
Vladimir Blagojevic	fea1428e84	feat: Add `HuggingFaceLocalChatGenerator` (#6751)	2024-01-18 15:53:12 +01:00
Madeesh Kannan	5d66d040cc	feat: Add serde methods to `HTMLToDocument` (#6758)	2024-01-18 10:02:01 +01:00
Sebastian Husch Lee	c0b67432e4	feat: Add page breaks to default PDF to Document converter (#6755) * Speedup tests for PyPDFToDocument * Added unit test and removed skipping of empty pages * add release note * Add back some integration marks	2024-01-18 08:54:59 +01:00
sahusiddharth	a7ac4edd07	feat: added split by page to `DocumentSplitter` (#6753) * feat-added-split-by-page-to-DocumentSplitter * added test case and the suggested changes * Update document_splitter.py * Update haystack/components/preprocessors/document_splitter.py * Update test_document_splitter.py --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-01-17 15:36:29 +01:00
Madeesh Kannan	7376838922	feat!: Framework-agnostic device management (#6748) * feat: Framework-agnostic device management * Add release note * Linting * Fix test * Add `first_device` property, expand release notes, validate `ComponentDevice` state	2024-01-17 10:41:34 +01:00
ZanSara	b8b8b5d5c6	feat!: rename `model_name_or_path` to `model` in `NamedEntityExtractor` (#6744) * rename model_name_or_path to simply model * fix tests * reno	2024-01-16 15:32:48 +01:00
Sebastian Husch Lee	20f04f6054	feat: MetaFieldRanker update (#6742) * Add weight and ranking_mode as params to run for easier experimentation * renaming of metadata to meta * User logger.warning instead of warnings * Add another unit test * Add support for sort_order and fix formatting of error messages * Make MetaFieldRanker more robust. Doesn't crash pipeline if some Documents are missing keys. * Don't print same warning message twice * Add another test * Making MetaFieldRanker more robust * Move up if return statement to earlier in the function * Setting up infer_type * Remove infer_type for now * Release notes * Add init file * Update releasenotes/notes/metafieldranker_sort-order_refactor-2000d89dc40dc15a.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-01-16 08:52:58 +01:00
Vladimir Blagojevic	8cafff0645	refactor: Extract HF stop words handling in `hf_utils.py` (#6745) * Move StopWordsCriteria to hf_utils.py * Raise ValueError for invalid StopWordsCriteria tokenizer * StopWordsCriteria, make sure padding token exists * Use proper torch types * Update unit tests	2024-01-15 17:42:29 +01:00
ZanSara	96c0b59aaa	feat!: Rename `model_name_or_path` to `model` in `ExtractiveReader` (#6736) * rename model parameter and internam model attribute in ExtractiveReader * fix tests for ExtractiveReader * fix e2e * reno * another fix * review feedback * Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml	2024-01-15 14:48:33 +01:00
Stefano Fiorucci	8eba053dbc	fix pipeline test (#6741)	2024-01-15 13:59:11 +01:00
Madeesh Kannan	a5189dd035	fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0 (#6717) * fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0 Also update tests to accommodate the new behavior. * Remove superfluous code	2024-01-12 17:50:55 +01:00
Madeesh Kannan	4647f2a506	fix: `ComponentMeta.__call__` handles keyword- and positional-only parameters correctly (#6701) * fix: `ComponentMeta.__call__` handles keyword- and positional-only parameters correctly * Update release note	2024-01-12 17:16:03 +01:00
ZanSara	0616197b44	feat!: Rename `model_name_or_path` to `model` in `TransformersSimilarityRanker` (#6734) * rename model parameter in transformers ranker * fix tests for transformers ranker * reno * reno * typo	2024-01-12 17:09:12 +01:00
ZanSara	288ed150c9	feat!: Rename `model_name` or `model_name_or_path` to `model` in all Embedder classes (#6733) * rename model parameter in the openai doc embedder * fix tests for openai doc embedder * rename model parameter in the openai text embedder * fix tests for openai text embedder * rename model parameter in the st doc embedder * fix tests for st doc embedder * rename model parameter in the st backend * fix tests for st backend * rename model parameter in the st text embedder * fix tests for st text embedder * fix docstring * fix pipeline utils * fix e2e * reno * fix the indexing pipeline _create_embedder function * fix e2e eval rag pipeline * pytest	2024-01-12 15:30:17 +01:00
ZanSara	ce7abc9bde	feat!: Rename `model_name` or `model_name_or_path` to `model` in all Transcriber classes (#6731) * rename model parameter in local transcriber * fix tests for local transcriber * rename model parameter in remote transcriber * fix tests for remote transcriber * reno --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-01-12 14:40:30 +01:00
Stefano Fiorucci	24c71bd221	rename model_name_or_path to model in test (#6732)	2024-01-12 13:56:14 +01:00
sahusiddharth	dbdeb8259e	feat: rename `model_name` or `model_name_or_path` to `model` in generators (#6715) * renamed model_name or model_name_or_path to model * added release notes * Update releasenotes/notes/renamed-model_name-or-model_name_or_path-to-model-184490cbb66c4d7c.yaml --------- Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2024-01-12 12:58:01 +01:00
Stefano Fiorucci	80c3e6825a	fix: serialize/deserialize torch dtype in the components that need it (#6713) * first draft for ranker * same for the reader * consider also bnb_4bit_compute_dtype * dtype serialization in hugging_face_local_generator * add release note * address dtype defined in huggingface_pipeline_kwargs * test quantization options in reader * fix * serialize quantization_config * test quantization_config serialization * address feedback * fix typo	2024-01-12 12:22:45 +01:00
ZanSara	60780ce897	feat: Tweak `CacheChecker` output type (#6719) * specify cache checker output type * (de)serialization * tests * add default value for type * reno * mypy * feedback * reduce diff * reduce diff * reno	2024-01-11 12:33:26 +01:00
Massimiliano Pippi	e1ec4e5e4d	refact!: Remove symbols under the `haystack.document_stores` namespace (#6714) * remove symbols under the haystack.document_stores namespace * Update haystack/document_stores/types/protocol.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * fix * same for retrievers * leftovers * more leftovers * add relnote * leftovers * one more * fix examples --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-01-10 21:20:42 +01:00
Ashwin Mathur	374a937663	feat: Add `calculate_metrics` and `MetricsResult` (#6680) * Add calculate_metrics, MetricsResult, Exact Match * Add additional tests for metric calculation * Add release notes * Add docstring for Exact Match metric * Remove Exact Match Implementation * Update release notes * Remove unnecessary metrics implementation * Simplify logic to run supported metrics * Add some evaluation tests * Fix linting --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-01-10 10:26:44 +01:00
Madeesh Kannan	e6d6ce1c73	feat: Add `NamedEntityExtractor`component (#6689) * feat: Add `NamedEntityExtractor`component This component accepts a list of `Document`s which it annotates with named entities. The annotations are stored in the `meta` dictionary of each `Document` under a specific key. The component currently support two backends for the annotation models: Hugging Face `transformers` and spaCy. * Address comments * Expand release note * Add the `[torch]` extra package specifier to the lazy import * Remove dead code --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-01-09 17:56:20 +01:00
ZanSara	abd16ab796	feat: support single metadata dictionary in `MarkdownToDocument` (#6629) * support single metadata dict in markdown2document * reno * unwrap list * direct key access * typing * add explicit test	2024-01-09 14:44:39 +01:00
Massimiliano Pippi	9ace6bf63d	feat: store input's default value in `InputSocket` (#6651) * track default value in sockets * remove dead code * include default value in socket description * add unit test * add relnote * unused import * clarify	2024-01-09 12:17:46 +01:00
ZanSara	175b5baf45	feat: support single metadata dictionary in `AzureOCRDocumentConverter` (#6635) * support single metadata dict in azureconverter * reno * tests * Update releasenotes/notes/single-meta-in-azureconverter-ce1cc196a9b161f3.yaml	2024-01-09 10:49:37 +01:00
ZanSara	974d65f30a	feat: support single metadata dictionary in `TikaDocumentConverter` (#6698) * reno * converter * test * comment	2024-01-09 09:49:47 +01:00
Massimiliano Pippi	93b2aaee09	chore: move `DocumentJoiner` to new `joiners` package (#6692) * move DocumentJoiner to new joiners package * relnote * leftovers * fix docstrings generation * fix unrelated pydoc misconfiguration * more unrelated work, yay! * fix assertions	2024-01-08 22:06:27 +01:00
Silvano Cerza	9445b2d466	Fix skipif with empty env var (#6704)	2024-01-08 19:19:14 +01:00
Silvano Cerza	607e7d1488	Skip integration tests if env var is missing (#6703)	2024-01-08 17:15:10 +01:00
Vladimir Blagojevic	9e0b58784f	feat: Improve UrlCacheChecker, make it more generic (#6699) * Rename UrlCacheChecker to CacheChecker, make it field generic * Add release note	2024-01-08 16:15:27 +01:00
Sebastian Husch Lee	beade1cef9	feat: Add scaling and thresholding of the similarity ranker scores (#6683) * Add scale_score functionality to the TransformersSimilarityRanker * Updated test to check scores * Use pytest approx when comparing floats * Updated how scale score works and added calibration factor. Started to add score threshold. * Add support for score_threshold * Add some parameters to the run method * Add release notes * Fix mypy * Be more tolerant on the score values * Adding unit test for scale_score=False * Add unit test for score threshold * Update tests * Rename test * Fix typo * PR comments	2024-01-08 09:05:24 +01:00
Vladimir Blagojevic	552f0e394b	feat: Add Azure embedders support (#6676) * Add Azure embedders --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-01-05 15:49:25 +01:00
Vladimir Blagojevic	b7159ad7c2	feat: Add AzureOpenAIGenerator and AzureOpenAIChatGenerator (#6648) * Add AzureOpenAIGenerator and AzureOpenAIChatGenerator	2024-01-05 15:48:28 +01:00
Stefano Fiorucci	bb2b1a20f8	refactor: optimize API keys reading (#6655) * centralize API keys handling * fix mypy and pylint * rm utility function, be more explicit	2024-01-05 10:40:03 +01:00
Vladimir Blagojevic	1336456b4f	Update prompt builders examples (#6681)	2024-01-04 16:54:26 +01:00
Vladimir Blagojevic	090d66b531	feat: Update OpenAIChatGenerator to handle both tools and functions calling (#6639) * Handle tools parameter in OpenAIChatGenerator * Handle tools/functions parameter in OpenAIChatGenerator streaming mode * Adjust OpenAPIServiceConnector to handle tools parameter * We never deal with functions/tools in non-chat generator * Add release note	2023-12-28 17:29:47 +01:00
Stefano Fiorucci	c773c30c66	refactor!: rename all remaining `metadata` to `meta` (#6650) * change metadata to meta * release note	2023-12-28 12:18:15 +01:00
Vladimir Blagojevic	ef2f6bd681	feat: Split `DynamicPromptBuilder` and `DynamicChatPromptBuilder` (#6557) * Split DynamicPromptBuilder * Add release note * Julian PR feedback * dynamicchatbuilder lg upd * dynamicpromptbuilder lg upd --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2023-12-26 15:27:43 +01:00
Vladimir Blagojevic	506ab81d26	chore: Rename GPT generators, deprecate old names (#6626)	2023-12-22 19:37:29 +01:00
ZanSara	c0f1dab454	feat: support single metadata dictionary in `PyPDFToDocument` (#6615) * support single metadata dict in pypdf2document * improve tests * tests * remove line	2023-12-22 14:13:11 +01:00
Stefano Fiorucci	8469c7f702	chore: upgrade transformers to 4.36.2 in test requirements (#6610) * Update test_requirements.txt * make tests run when tests requirements change --------- Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-12-21 16:48:24 +01:00
ZanSara	ff55985e2d	feat: support single metadata dictionary in `HTMLToDocument` (#6613) * support single metadata in HTMLToDocument * reno * docstring	2023-12-21 16:45:31 +01:00
Vladimir Blagojevic	4d08be0c2a	feat: Update OpenAI Python Client in Haystack 2.x (#6584) * Update openai python client * Add release note * Consolidate multiple mock_chat_completion into one * Ensure all components have api_base_url, organization params * Update tests * Enable function calling * Oversight * Minor fixes, add streaming test mocks * Apply suggestions from code review Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * metadata -> meta --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2023-12-21 16:21:24 +01:00
ZanSara	cf79aa1485	feat: add support for single meta dict in `TextFileToDocument` (#6606) * add support for single meta dict * reno * reno * mypy * extract to function * docstring * mypy	2023-12-21 14:21:17 +01:00
Stefano Fiorucci	7cc6080dfa	chore: replace metadata w meta in tests/examples (#6612) * replace metadata w meta in tests/examples * do not touch already broken e2e tests * Revert "do not touch already broken e2e tests" This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.	2023-12-21 14:09:31 +01:00


1524 Commits