haystack

Mirror of https://github.com/deepset-ai/haystack.git, synced 2025-10-22 13:28:44 +00:00

Author	SHA1	Message	Date
Stefano Fiorucci	44b5ae291c	specify CPU device in warm_up test (#7014 )	2024-02-16 13:01:57 +01:00
Stefano Fiorucci	0aa788facc	refactor!: LocalWhisperTranscriber - new devices mgmt (#7008 ) * wip * whisper local transcriber: use new device mgmt * better from_dict + test * reno	2024-02-16 11:25:53 +01:00
Silvano Cerza	a7209f6413	Mark OpenAPIServiceConnector integration test as flaky (#7007 )	2024-02-15 19:33:34 +01:00
Tuana Çelik	e2cee468fc	fix: Adding `api_base_url` to `OpenAITextEmbeder` self assignments (#7004 ) * assigning api_base_url This fix resolves issues with the MistralTextEmbedder integration * adding base url to `to_dict` and the tests * adding release note * Update fix-openai-base-url-assignment-0570a494d88fe365.yaml --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-02-15 17:35:28 +01:00
Silvano Cerza	6fe1d3b595	refactor: Clean eval components (#7005 ) * Remove preprocess.py * Rename eval components to evaluators	2024-02-15 17:17:59 +01:00
Silvano Cerza	2b8a606cb8	refactor: Refactor `StatisticalEvaluator` (#6999 ) * Refactor StatisticalEvaluator * Update StatisticalEvaluator * Rename StatisticalMetric.from_string to from_str and change internal logic Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Fix tests --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-02-15 16:47:35 +01:00
Silvano Cerza	c82f787b41	feat: Add `TextCleaner` component (#6997 ) * Add TextCleaner component * Update docstrings and simplify run logic * Update docstrings	2024-02-15 16:10:38 +01:00
Silvano Cerza	2a4e6a1de2	refactor: Refactor `SASEvaluator` (#6998 ) * Remove preprocessing from SASEvaluator and add warm_up method * Update docstrings	2024-02-15 16:05:43 +01:00
Vladimir Blagojevic	5a8d02064b	feat: Add JsonSchemaValidator (#6937 ) * Add JsonSchemaValidator --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-02-15 14:07:01 +01:00
Silvano Cerza	36ab23d360	feat: Add `StatisticalEvaluator` component (#6982 ) * Add StatisticalEvaluator component * Remove F1 and Exact Metric from old API * Add release notes * Update docstrings	2024-02-14 16:48:03 +01:00
Silvano Cerza	9297fca520	feat: Add `SASEvaluator` component (#6980 ) * Add SASEvaluator component * Add release notes * Delete old tests * Remove SAS metric in old API * Avoid importing whole numpy package	2024-02-14 16:16:22 +01:00
Vladimir Blagojevic	8d46a2883e	feat: Make system_messages optional in OpenAPIServiceToFunctions run (#6825 ) * Make system_messages optional in OpenAPIServiceToFunctions run * Adjust unit test * PR feedback Massi	2024-02-14 16:04:35 +01:00
Vladimir Blagojevic	6a776e672f	Add OutputAdapter sede for custom filters (#6985 )	2024-02-13 16:56:43 +01:00
Sebastian Husch Lee	ea7275955d	feat: Meta field ranker add `meta_value_type` (#6977 ) * Update MetaFieldRanker to parse string meta values based on meta_value_type * Add some unit tests * Add another unit test * Add release notes * Fix mypy * Fix pylint * Add more unit tests * Update release notes * Update docs * Further improve doc strings	2024-02-13 13:08:35 +01:00
Vladimir Blagojevic	97a0df66d2	feat: Add OutputAdapter (#6936 ) * Add OutputAdapter component --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-02-13 13:03:50 +01:00
Vladimir Blagojevic	a311d82593	feat: Externalize callable serialization so it can be reused (#6979 ) * Callback (de)serialization * Add unit tests * Replace callback handler sede with callable sede * Remove unused functions --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-02-13 13:00:49 +01:00
Vladimir Blagojevic	37d9de3c4e	feat: Add service_credentials to OpenAPIServiceConnector run (#6962 ) * Add service_credentials to OpenAPIServiceConnector run * PR feedback Silvano	2024-02-09 16:03:27 +01:00
Bijay Gurung	74683fe74d	Feat: Add FilterRetriever (#6836 ) * Add FilterRetriever draft * Implement FilterRetriever and add tests * Update comparison to compare whole docs instead of just contents * Expose FilterRetriever at the retrievers level * Update docstring (add example usage) * Add filter_retriever in the API reference docs config Update retriever search path to start one dir level higher * simplify _documents_equal * improve usage example --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2024-02-08 08:48:46 +01:00
Vladimir Blagojevic	9e6a2e3cf9	fix: HuggingFaceTGIGenerator gets stuck when model is not supported (#6915 ) * HuggingFaceTGIGenerator/HuggingFaceTGIChatGenerator check if model is deployed on free-tier	2024-02-06 16:55:06 +01:00
ZanSara	1182c08daf	fix: Dont filter negative scores when using `BM25Okapi` and `scale_score=False` (#6889 ) * dont filter negatives for unscaled Okapi * change BM25 algorithm default to BM25L * Update haystack/document_stores/in_memory/document_store.py * improve comment	2024-02-06 11:07:27 +01:00
Massimiliano Pippi	7d29ddba42	chore: merge hf utils modules into one (#6921 ) * merge hf utils modules * relnotes * lint * Update releasenotes/notes/merge-hf-utils-modules-5c16e04025123568.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-02-06 09:59:25 +01:00
Silvano Cerza	0191b1e6e4	feat: Change Component's I/O dunder type (#6916 ) * Add Pipeline.get_component_name() method * Add utility class to ease discoverability of Component I/O * Move InputOutput in component package * Rename InputOutput to _InputOutput * Raise if inputs or outputs field already exist * Fix tests * Add release notes * Move InputSocket and OutputSocket in types package * Move _InputOutput in socket package * Rename _InputOutput class to Sockets * Simplify Sockets class * Dictch I/O dunder fields in favour of inputs and outputs fields * Update Sockets docstrings * Update release notes * Fix mypy * Remove unnecessary assignment * Remove unused logging * Change SocketsType to SocketsIOType to avoid confusion * Change sockets type and name * Change Sockets.__repr__ to return component instance * Fix linting * Fix sockets tests * Revert to dunder fields for Component IO * Use singular in IO dunder fields * Delete release notes * Update haystack/core/component/types.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-02-05 17:46:45 +01:00
sahusiddharth	3bd6ba93ca	feat:Add dimensions parameter to OpenAI Embedders to fully support th… (#6841 ) * feat:Add dimensions parameter to OpenAI Embedders to fully support the new models * fixed linting * changed != None to is not None	2024-02-05 16:20:46 +01:00
Madeesh Kannan	27d1af3068	feat!: Use `Secret` for passing authentication secrets to components (#6887 ) * feat!: Use `Secret` for passing authentication secrets to components * Add comment to clarify type ignore	2024-02-05 13:17:01 +01:00
ZanSara	9af6c7e442	add some tolerance to Roberta test (#6880 )	2024-01-31 17:19:07 +01:00
Sebastian Husch Lee	ceda4cd655	feat: Add support for `device_map` (#6679 ) * Getting device_map working to support 8bit loading and multi device inference * Update to take account the device specified by the user * add release notes * Add device_map support for ExtractiveReader * Update test * Update to model that doesn't have issues * Update test * Update pytest approx * Update release notes * Start supporting device map * Update ExtractiveReader to use new ComponentDevice * Update similarity ranker to follow extractive reader implementation * Fixing pylint * Make mypy mostly happy * Add new unit test to test device_map * Adding unit tests * Some refactoring * Add more tests * Add more tests * Add another unit test * Update first_device property to return a ComponentDevice to be able to use the to methods * Updating tests for test_device * Update tests and now explicitly modify device_map in model_kwargs * Update haystack/utils/hf.py Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Make mypy happy * mypy * Remove unneeded optional flag * Update ExtractiveReader with new logic * Update ranker to follow new logic * Removing unneeded code * Make mypy happy * fxi pylint * Fix test * Adding unit tests for device_map="auto" * Add unit tests for ranker * PR comments * Make util method * Adding unit tests * Fix type annotation * Fix pylint * Fix test --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2024-01-30 13:47:57 +01:00
Silvano Cerza	f5e61338ba	chore: Remove all mentions of Canals (#6844 ) * Remove unnecessary Connection class * Remove all mentions of canals * Add release notes	2024-01-29 17:26:11 +01:00
Massimiliano Pippi	acf4cd502f	refact: Rename helper function (#6831 ) * change function name * add api docs * release notes	2024-01-26 16:00:02 +01:00
Sebastian Husch Lee	3bea3b1714	feat: Add query and document prefix options for the TransformerSimilarityRanker (#6826 ) * Add query and doc prefix * Fix some tests * add release notes	2024-01-25 15:29:19 +01:00
Rob Pasternak	7358b910d7	feat: Weights and score normalization for DocumentJoiner with reciprocal rank fusion (#6735 ) * Add weighting and score normalization for DocumentJoiner w/ reciprocal rank fusion (fix trailing whitespace) * Add release notes * Add unit test * Update release note --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2024-01-24 15:45:53 +01:00
Vladimir Blagojevic	6e86f4e26a	Update embedding integration tests (#6823 )	2024-01-24 15:22:47 +01:00
Vladimir Blagojevic	0b177b3bc6	feat: Improve OpenAPIServiceConnector service response serialization (#6772 ) * Better service response json -> str serialization * Add unit test	2024-01-18 16:49:48 +01:00
Vladimir Blagojevic	fea1428e84	feat: Add `HuggingFaceLocalChatGenerator` (#6751 )	2024-01-18 15:53:12 +01:00
Madeesh Kannan	5d66d040cc	feat: Add serde methods to `HTMLToDocument` (#6758 )	2024-01-18 10:02:01 +01:00
Sebastian Husch Lee	c0b67432e4	feat: Add page breaks to default PDF to Document converter (#6755 ) * Speedup tests for PyPDFToDocument * Added unit test and removed skipping of empty pages * add release note * Add back some integration marks	2024-01-18 08:54:59 +01:00
sahusiddharth	a7ac4edd07	feat: added split by page to `DocumentSplitter` (#6753 ) * feat-added-split-by-page-to-DocumentSplitter * added test case and the suggested changes * Update document_splitter.py * Update haystack/components/preprocessors/document_splitter.py * Update test_document_splitter.py --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2024-01-17 15:36:29 +01:00
Madeesh Kannan	7376838922	feat!: Framework-agnostic device management (#6748 ) * feat: Framework-agnostic device management * Add release note * Linting * Fix test * Add `first_device` property, expand release notes, validate `ComponentDevice` state	2024-01-17 10:41:34 +01:00
ZanSara	b8b8b5d5c6	feat!: rename `model_name_or_path` to `model` in `NamedEntityExtractor` (#6744 ) * rename model_name_or_path to simply model * fix tests * reno	2024-01-16 15:32:48 +01:00
Sebastian Husch Lee	20f04f6054	feat: MetaFieldRanker update (#6742 ) * Add weight and ranking_mode as params to run for easier experimentation * renaming of metadata to meta * User logger.warning instead of warnings * Add another unit test * Add support for sort_order and fix formatting of error messages * Make MetaFieldRanker more robust. Doesn't crash pipeline if some Documents are missing keys. * Don't print same warning message twice * Add another test * Making MetaFieldRanker more robust * Move up if return statement to earlier in the function * Setting up infer_type * Remove infer_type for now * Release notes * Add init file * Update releasenotes/notes/metafieldranker_sort-order_refactor-2000d89dc40dc15a.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-01-16 08:52:58 +01:00
Vladimir Blagojevic	8cafff0645	refactor: Extract HF stop words handling in `hf_utils.py` (#6745 ) * Move StopWordsCriteria to hf_utils.py * Raise ValueError for invalid StopWordsCriteria tokenizer * StopWordsCriteria, make sure padding token exists * Use proper torch types * Update unit tests	2024-01-15 17:42:29 +01:00
ZanSara	96c0b59aaa	feat!: Rename `model_name_or_path` to `model` in `ExtractiveReader` (#6736 ) * rename model parameter and internam model attribute in ExtractiveReader * fix tests for ExtractiveReader * fix e2e * reno * another fix * review feedback * Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml	2024-01-15 14:48:33 +01:00
Madeesh Kannan	a5189dd035	fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0 (#6717 ) * fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0 Also update tests to accommodate the new behavior. * Remove superfluous code	2024-01-12 17:50:55 +01:00
ZanSara	0616197b44	feat!: Rename `model_name_or_path` to `model` in `TransformersSimilarityRanker` (#6734 ) * rename model parameter in transformers ranker * fix tests for transformers ranker * reno * reno * typo	2024-01-12 17:09:12 +01:00
ZanSara	288ed150c9	feat!: Rename `model_name` or `model_name_or_path` to `model` in all Embedder classes (#6733 ) * rename model parameter in the openai doc embedder * fix tests for openai doc embedder * rename model parameter in the openai text embedder * fix tests for openai text embedder * rename model parameter in the st doc embedder * fix tests for st doc embedder * rename model parameter in the st backend * fix tests for st backend * rename model parameter in the st text embedder * fix tests for st text embedder * fix docstring * fix pipeline utils * fix e2e * reno * fix the indexing pipeline _create_embedder function * fix e2e eval rag pipeline * pytest	2024-01-12 15:30:17 +01:00
ZanSara	ce7abc9bde	feat!: Rename `model_name` or `model_name_or_path` to `model` in all Transcriber classes (#6731 ) * rename model parameter in local transcriber * fix tests for local transcriber * rename model parameter in remote transcriber * fix tests for remote transcriber * reno --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-01-12 14:40:30 +01:00
Stefano Fiorucci	24c71bd221	rename model_name_or_path to model in test (#6732 )	2024-01-12 13:56:14 +01:00
sahusiddharth	dbdeb8259e	feat: rename `model_name` or `model_name_or_path` to `model` in generators (#6715 ) * renamed model_name or model_name_or_path to model * added release notes * Update releasenotes/notes/renamed-model_name-or-model_name_or_path-to-model-184490cbb66c4d7c.yaml --------- Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2024-01-12 12:58:01 +01:00
Stefano Fiorucci	80c3e6825a	fix: serialize/deserialize torch dtype in the components that need it (#6713 ) * first draft for ranker * same for the reader * consider also bnb_4bit_compute_dtype * dtype serialization in hugging_face_local_generator * add release note * address dtype defined in huggingface_pipeline_kwargs * test quantization options in reader * fix * serialize quantization_config * test quantization_config serialization * address feedback * fix typo	2024-01-12 12:22:45 +01:00
ZanSara	60780ce897	feat: Tweak `CacheChecker` output type (#6719 ) * specify cache checker output type * (de)serialization * tests * add default value for type * reno * mypy * feedback * reduce diff * reduce diff * reno	2024-01-11 12:33:26 +01:00
Massimiliano Pippi	e1ec4e5e4d	refact!: Remove symbols under the `haystack.document_stores` namespace (#6714 ) * remove symbols under the haystack.document_stores namespace * Update haystack/document_stores/types/protocol.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * fix * same for retrievers * leftovers * more leftovers * add relnote * leftovers * one more * fix examples --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-01-10 21:20:42 +01:00

94 Commits