94 Commits

Author SHA1 Message Date
Stefano Fiorucci
44b5ae291c
specify CPU device in warm_up test (#7014) 2024-02-16 13:01:57 +01:00
Stefano Fiorucci
0aa788facc
refactor!: LocalWhisperTranscriber - new devices mgmt (#7008)
* wip

* whisper local transcriber: use new device mgmt

* better from_dict + test

* reno
2024-02-16 11:25:53 +01:00
Silvano Cerza
a7209f6413
Mark OpenAPIServiceConnector integration test as flaky (#7007) 2024-02-15 19:33:34 +01:00
Tuana Çelik
e2cee468fc
fix: Adding api_base_url to OpenAITextEmbedder self assignments (#7004)
* assigning api_base_url

This fix resolves issues with the MistralTextEmbedder integration

* adding base url to `to_dict` and the tests

* adding release note

* Update fix-openai-base-url-assignment-0570a494d88fe365.yaml

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-02-15 17:35:28 +01:00
Silvano Cerza
6fe1d3b595
refactor: Clean eval components (#7005)
* Remove preprocess.py

* Rename eval components to evaluators
2024-02-15 17:17:59 +01:00
Silvano Cerza
2b8a606cb8
refactor: Refactor StatisticalEvaluator (#6999)
* Refactor StatisticalEvaluator

* Update StatisticalEvaluator

* Rename StatisticalMetric.from_string to from_str and change internal logic

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Fix tests

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-02-15 16:47:35 +01:00
Silvano Cerza
c82f787b41
feat: Add TextCleaner component (#6997)
* Add TextCleaner component

* Update docstrings and simplify run logic

* Update docstrings
2024-02-15 16:10:38 +01:00
Silvano Cerza
2a4e6a1de2
refactor: Refactor SASEvaluator (#6998)
* Remove preprocessing from SASEvaluator and add warm_up method

* Update docstrings
2024-02-15 16:05:43 +01:00
Vladimir Blagojevic
5a8d02064b
feat: Add JsonSchemaValidator (#6937)
* Add JsonSchemaValidator
---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-02-15 14:07:01 +01:00
Silvano Cerza
36ab23d360
feat: Add StatisticalEvaluator component (#6982)
* Add StatisticalEvaluator component

* Remove F1 and Exact Metric from old API

* Add release notes

* Update docstrings
2024-02-14 16:48:03 +01:00
Silvano Cerza
9297fca520
feat: Add SASEvaluator component (#6980)
* Add SASEvaluator component

* Add release notes

* Delete old tests

* Remove SAS metric in old API

* Avoid importing whole numpy package
2024-02-14 16:16:22 +01:00
Vladimir Blagojevic
8d46a2883e
feat: Make system_messages optional in OpenAPIServiceToFunctions run (#6825)
* Make system_messages optional in OpenAPIServiceToFunctions run

* Adjust unit test

* PR feedback Massi
2024-02-14 16:04:35 +01:00
Vladimir Blagojevic
6a776e672f
Add OutputAdapter serde for custom filters (#6985) 2024-02-13 16:56:43 +01:00
Sebastian Husch Lee
ea7275955d
feat: Meta field ranker add meta_value_type (#6977)
* Update MetaFieldRanker to parse string meta values based on meta_value_type

* Add some unit tests

* Add another unit test

* Add release notes

* Fix mypy

* Fix pylint

* Add more unit tests

* Update release notes

* Update docs

* Further improve doc strings
2024-02-13 13:08:35 +01:00
Vladimir Blagojevic
97a0df66d2
feat: Add OutputAdapter (#6936)
* Add OutputAdapter component
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-02-13 13:03:50 +01:00
Vladimir Blagojevic
a311d82593
feat: Externalize callable serialization so it can be reused (#6979)
* Callback (de)serialization

* Add unit tests

* Replace callback handler serde with callable serde

* Remove unused functions

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-02-13 13:00:49 +01:00
Vladimir Blagojevic
37d9de3c4e
feat: Add service_credentials to OpenAPIServiceConnector run (#6962)
* Add service_credentials to OpenAPIServiceConnector run
* PR feedback Silvano
2024-02-09 16:03:27 +01:00
Bijay Gurung
74683fe74d
Feat: Add FilterRetriever (#6836)
* Add FilterRetriever draft

* Implement FilterRetriever and add tests

* Update comparison to compare whole docs instead of just contents

* Expose FilterRetriever at the retrievers level

* Update docstring (add example usage)

* Add filter_retriever in the API reference docs config

Update retriever search path to start one dir level higher

* simplify _documents_equal

* improve usage example

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-02-08 08:48:46 +01:00
Vladimir Blagojevic
9e6a2e3cf9
fix: HuggingFaceTGIGenerator gets stuck when model is not supported (#6915)
* HuggingFaceTGIGenerator/HuggingFaceTGIChatGenerator check if model is deployed on free-tier
2024-02-06 16:55:06 +01:00
ZanSara
1182c08daf
fix: Don't filter negative scores when using BM25Okapi and scale_score=False (#6889)
* don't filter negatives for unscaled Okapi

* change BM25 algorithm default to BM25L

* Update haystack/document_stores/in_memory/document_store.py

* improve comment
2024-02-06 11:07:27 +01:00
Massimiliano Pippi
7d29ddba42
chore: merge hf utils modules into one (#6921)
* merge hf utils modules

* relnotes

* lint

* Update releasenotes/notes/merge-hf-utils-modules-5c16e04025123568.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-02-06 09:59:25 +01:00
Silvano Cerza
0191b1e6e4
feat: Change Component's I/O dunder type (#6916)
* Add Pipeline.get_component_name() method

* Add utility class to ease discoverability of Component I/O

* Move InputOutput in component package

* Rename InputOutput to _InputOutput

* Raise if inputs or outputs field already exist

* Fix tests

* Add release notes

* Move InputSocket and OutputSocket in types package

* Move _InputOutput in socket package

* Rename _InputOutput class to Sockets

* Simplify Sockets class

* Ditch I/O dunder fields in favour of inputs and outputs fields

* Update Sockets docstrings

* Update release notes

* Fix mypy

* Remove unnecessary assignment

* Remove unused logging

* Change SocketsType to SocketsIOType to avoid confusion

* Change sockets type and name

* Change Sockets.__repr__ to return component instance

* Fix linting

* Fix sockets tests

* Revert to dunder fields for Component IO

* Use singular in IO dunder fields

* Delete release notes

* Update haystack/core/component/types.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-02-05 17:46:45 +01:00
sahusiddharth
3bd6ba93ca
feat:Add dimensions parameter to OpenAI Embedders to fully support th… (#6841)
* feat:Add dimensions parameter to OpenAI Embedders to fully support the new models

* fixed linting

* changed != None to is not None
2024-02-05 16:20:46 +01:00
Madeesh Kannan
27d1af3068
feat!: Use Secret for passing authentication secrets to components (#6887)
* feat!: Use `Secret` for passing authentication secrets to components

* Add comment to clarify type ignore
2024-02-05 13:17:01 +01:00
ZanSara
9af6c7e442
add some tolerance to Roberta test (#6880) 2024-01-31 17:19:07 +01:00
Sebastian Husch Lee
ceda4cd655
feat: Add support for device_map (#6679)
* Getting device_map working to support 8bit loading and multi device inference

* Update to take into account the device specified by the user

* add release notes

* Add device_map support for ExtractiveReader

* Update test

* Update to model that doesn't have issues

* Update test

* Update pytest approx

* Update release notes

* Start supporting device map

* Update ExtractiveReader to use new ComponentDevice

* Update similarity ranker to follow extractive reader implementation

* Fixing pylint

* Make mypy mostly happy

* Add new unit test to test device_map

* Adding unit tests

* Some refactoring

* Add more tests

* Add more tests

* Add another unit test

* Update first_device property to return a ComponentDevice to be able to use the to methods

* Updating tests for test_device

* Update tests and now explicitly modify device_map in model_kwargs

* Update haystack/utils/hf.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Make mypy happy

* mypy

* Remove unneeded optional flag

* Update ExtractiveReader with new logic

* Update ranker to follow new logic

* Removing unneeded code

* Make mypy happy

* fix pylint

* Fix test

* Adding unit tests for device_map="auto"

* Add unit tests for ranker

* PR comments

* Make util method

* Adding unit tests

* Fix type annotation

* Fix pylint

* Fix test

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-01-30 13:47:57 +01:00
Silvano Cerza
f5e61338ba
chore: Remove all mentions of Canals (#6844)
* Remove unnecessary Connection class

* Remove all mentions of canals

* Add release notes
2024-01-29 17:26:11 +01:00
Massimiliano Pippi
acf4cd502f
refact: Rename helper function (#6831)
* change function name

* add api docs

* release notes
2024-01-26 16:00:02 +01:00
Sebastian Husch Lee
3bea3b1714
feat: Add query and document prefix options for the TransformersSimilarityRanker (#6826)
* Add query and doc prefix

* Fix some tests

* add release notes
2024-01-25 15:29:19 +01:00
Rob Pasternak
7358b910d7
feat: Weights and score normalization for DocumentJoiner with reciprocal rank fusion (#6735)
* Add weighting and score normalization for DocumentJoiner w/ reciprocal rank fusion (fix trailing whitespace)

* Add release notes

* Add unit test

* Update release note

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-01-24 15:45:53 +01:00
Vladimir Blagojevic
6e86f4e26a
Update embedding integration tests (#6823) 2024-01-24 15:22:47 +01:00
Vladimir Blagojevic
0b177b3bc6
feat: Improve OpenAPIServiceConnector service response serialization (#6772)
* Better service response json -> str serialization

* Add unit test
2024-01-18 16:49:48 +01:00
Vladimir Blagojevic
fea1428e84
feat: Add HuggingFaceLocalChatGenerator (#6751) 2024-01-18 15:53:12 +01:00
Madeesh Kannan
5d66d040cc
feat: Add serde methods to HTMLToDocument (#6758) 2024-01-18 10:02:01 +01:00
Sebastian Husch Lee
c0b67432e4
feat: Add page breaks to default PDF to Document converter (#6755)
* Speedup tests for PyPDFToDocument

* Added unit test and removed skipping of empty pages

* add release note

* Add back some integration marks
2024-01-18 08:54:59 +01:00
sahusiddharth
a7ac4edd07
feat: added split by page to DocumentSplitter (#6753)
* feat-added-split-by-page-to-DocumentSplitter

* added test case and the suggested changes

* Update document_splitter.py

* Update haystack/components/preprocessors/document_splitter.py

* Update test_document_splitter.py

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-01-17 15:36:29 +01:00
Madeesh Kannan
7376838922
feat!: Framework-agnostic device management (#6748)
* feat: Framework-agnostic device management

* Add release note

* Linting

* Fix test

* Add `first_device` property, expand release notes, validate `ComponentDevice` state
2024-01-17 10:41:34 +01:00
ZanSara
b8b8b5d5c6
feat!: rename model_name_or_path to model in NamedEntityExtractor (#6744)
* rename model_name_or_path to simply model

* fix tests

* reno
2024-01-16 15:32:48 +01:00
Sebastian Husch Lee
20f04f6054
feat: MetaFieldRanker update (#6742)
* Add weight and ranking_mode as params to run for easier experimentation

* renaming of metadata to meta

* Use logger.warning instead of warnings

* Add another unit test

* Add support for sort_order and fix formatting of error messages

* Make MetaFieldRanker more robust. Doesn't crash pipeline if some Documents are missing keys.

* Don't print same warning message twice

* Add another test

* Making MetaFieldRanker more robust

* Move up if return statement to earlier in the function

* Setting up infer_type

* Remove infer_type for now

* Release notes

* Add init file

* Update releasenotes/notes/metafieldranker_sort-order_refactor-2000d89dc40dc15a.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-01-16 08:52:58 +01:00
Vladimir Blagojevic
8cafff0645
refactor: Extract HF stop words handling in hf_utils.py (#6745)
* Move StopWordsCriteria to hf_utils.py

* Raise ValueError for invalid StopWordsCriteria tokenizer

* StopWordsCriteria, make sure padding token exists

* Use proper torch types

* Update unit tests
2024-01-15 17:42:29 +01:00
ZanSara
96c0b59aaa
feat!: Rename model_name_or_path to model in ExtractiveReader (#6736)
* rename model parameter and internal model attribute in ExtractiveReader

* fix tests for ExtractiveReader

* fix e2e

* reno

* another fix

* review feedback

* Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml
2024-01-15 14:48:33 +01:00
Madeesh Kannan
a5189dd035
fix!: InMemoryBM25Retriever no longer returns documents that have a score of 0.0 (#6717)
* fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0

Also update tests to accommodate the new behavior.

* Remove superfluous code
2024-01-12 17:50:55 +01:00
ZanSara
0616197b44
feat!: Rename model_name_or_path to model in TransformersSimilarityRanker (#6734)
* rename model parameter in transformers ranker

* fix tests for transformers ranker

* reno

* reno

* typo
2024-01-12 17:09:12 +01:00
ZanSara
288ed150c9
feat!: Rename model_name or model_name_or_path to model in all Embedder classes (#6733)
* rename model parameter in the openai doc embedder

* fix tests for openai doc embedder

* rename model parameter in the openai text embedder

* fix tests for openai text embedder

* rename model parameter in the st doc embedder

* fix tests for st doc embedder

* rename model parameter in the st backend

* fix tests for st backend

* rename model parameter in the st text embedder

* fix tests for st text embedder

* fix docstring

* fix pipeline utils

* fix e2e

* reno

* fix the indexing pipeline _create_embedder function

* fix e2e eval rag pipeline

* pytest
2024-01-12 15:30:17 +01:00
ZanSara
ce7abc9bde
feat!: Rename model_name or model_name_or_path to model in all Transcriber classes (#6731)
* rename model parameter in local transcriber

* fix tests for local transcriber

* rename model parameter in remote transcriber

* fix tests for remote transcriber

* reno

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-01-12 14:40:30 +01:00
Stefano Fiorucci
24c71bd221
rename model_name_or_path to model in test (#6732) 2024-01-12 13:56:14 +01:00
sahusiddharth
dbdeb8259e
feat: rename model_name or model_name_or_path to model in generators (#6715)
* renamed model_name or model_name_or_path to model

* added release notes

* Update releasenotes/notes/renamed-model_name-or-model_name_or_path-to-model-184490cbb66c4d7c.yaml

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2024-01-12 12:58:01 +01:00
Stefano Fiorucci
80c3e6825a
fix: serialize/deserialize torch dtype in the components that need it (#6713)
* first draft for ranker

* same for the reader

* consider also bnb_4bit_compute_dtype

* dtype serialization in hugging_face_local_generator

* add release note

* address dtype defined in huggingface_pipeline_kwargs

* test quantization options in reader

* fix

* serialize quantization_config

* test quantization_config serialization

* address feedback

* fix typo
2024-01-12 12:22:45 +01:00
ZanSara
60780ce897
feat: Tweak CacheChecker output type (#6719)
* specify cache checker output type

* (de)serialization

* tests

* add default value for type

* reno

* mypy

* feedback

* reduce diff

* reduce diff

* reno
2024-01-11 12:33:26 +01:00
Massimiliano Pippi
e1ec4e5e4d
refact!: Remove symbols under the haystack.document_stores namespace (#6714)
* remove symbols under the haystack.document_stores namespace

* Update haystack/document_stores/types/protocol.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* fix

* same for retrievers

* leftovers

* more leftovers

* add relnote

* leftovers

* one more

* fix examples

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-01-10 21:20:42 +01:00