262 Commits

Author SHA1 Message Date
Julian Risch
4ae0924ea0
feat!: Remove SklearnQueryClassifier (#5779)
* remove SklearnQueryClassifier

* reno
2023-09-13 12:55:33 +02:00
Stefano Fiorucci
283ecf2760
feat: add prefix and suffix to SentenceTransformersDocumentEmbedder (#5745)
* add prefix and suffix

* fix test
2023-09-13 12:55:06 +02:00
ZanSara
2c4d839b64
feat: GPT4Generator (#5744)
* add gpt4generator

* add e2e

* add tests

* reno

* fix e2e

* Update test/preview/components/generators/openai/test_gpt4_generator.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-09-13 10:07:09 +02:00
Christian Clauss
23f7308bec
ci: pre-commit autoupdate (#5777)
2023-09-12 14:34:41 +02:00
ZanSara
6e70d403f8
feat: Improve Document for Haystack 2.0 (#5738)
* initial draft

* tests

* add proposal

* proposal number

* reno

* fix tests and usage of content and content_type

* update branch & fix more tests

* mypy

* add docstring

* fix more tests

* review feedback

* improve __str__

* Apply suggestions from code review

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/preview/dataclasses/document.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* improve __str__

* fix tests

* fix more tests

* Update haystack/preview/document_stores/memory/document_store.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-11 17:40:00 +02:00
Stefano Fiorucci
2edf85f739
MemoryEmbeddingRetriever (2.0) (#5726)
* MemoryDocumentStore - Embedding retrieval draft

* add release notes

* fix mypy

* better comment

* improve return_embeddings handling

* MemoryEmbeddingRetriever - first draft

* address PR comments

* release note

* update docstrings

* update docstrings

* incorporated feedback

* add return_embedding to __init__

* rm leftover docstring

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-08 15:52:48 +02:00
Stefano Fiorucci
b7bea3ae9c
MemoryDocumentStore - Embedding retrieval (2.0) (#5715)
* MemoryDocumentStore - Embedding retrieval draft

* add release notes

* fix mypy

* better comment

* improve return_embeddings handling

* address PR comments

* update docstrings

* incorporated feedback

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-07 15:44:07 +02:00
ZanSara
63cbde7287
feat: GPT35Generator (#5714)
* chatgpt backend

* fix tests

* reno

* remove print

* helpers tests

* add chatgpt generator

* use openai sdk

* remove backend

* tests are broken

* fix tests

* stray param

* move _check_troncated_answers into the class

* wrong import

* rename function

* typo in test

* add openai deps

* mypy

* improve system prompt docstring

* typos update

* Update haystack/preview/components/generators/openai/chatgpt.py

* pylint

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* review feedback

* fix tests

* review feedback

* reno

* remove tenacity mock

* gpt35generator

* fix naming

* remove stray references to chatgpt

* fix e2e

* Update releasenotes/notes/chatgpt-llm-generator-d043532654efe684.yaml

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* add another test

* test wrong model name

* review feedback

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-07 10:06:57 +02:00
Vladimir Blagojevic
c5edb45c10
feat: Add SerperDevWebSearch Haystack 2.0 component (#5712)
* Add SerperDev

* Add release note

* PR Feedback

* Simplify, remove one-liner

* Update haystack/preview/components/websearch/serper_dev.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Update haystack/preview/components/websearch/serper_dev.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Fix formatting

* PR feedback

* Fix tests

* Function rename

* Remove scoring, update tests

* PR feedback

* Fix return

* small adjustments

* fix tests

* add e2e test

* fix release notes

* fix tests

* fix e2e

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-06 17:31:42 +02:00
bogdankostic
639f7cf888
chore: Rename AnswersBuilder to AnswerBuilder (#5720)
* Add AnswersBuilder

* Add tests for AnswersBuilder

* Add release note

* PR feedback

* Fix mypy

* Remove redundant check for number of groups

* Rename AnswersBuilder to AnswerBuilder

* Update test/preview/components/builders/test_answer_builder.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Rename reno file

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-05 14:34:22 +02:00
Silvano Cerza
2acc41ea85
Add PromptBuilder (#5713)
* Add PromptBuilder

* Update release note

* Add test
2023-09-05 12:22:21 +02:00
bogdankostic
a5b815690e
feat: Add AnswersBuilder component (2.0) (#5701)
* Add AnswersBuilder

* Add tests for AnswersBuilder

* Add release note

* PR feedback

* Fix mypy

* Remove redundant check for number of groups

* docstrings upd

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-04 21:16:20 +02:00
bogdankostic
11440395f4
fix: Set model_max_length in the Tokenizer of DefaultPromptHandler (#5596)
* Set model_max_length in tokenizer in prompt handler

* Add release note
2023-09-01 11:48:41 +02:00
ZanSara
5f1256ac7e
feat: generators (2.0) (#5690)
* add generators module

* add tests for module helper

* reno

* add another test

* move into openai

* improve tests
2023-08-31 17:33:12 +02:00
Fanli Lin
40d9f34e68
feat: enable passing use_fast to the underlying transformers' pipeline (#5655)
* copy instead of deepcopy

* fix pylint

* add use_fast

* add release note

* remove irrelevant changes

* black fix

* fix bug

* black

* bug fix
2023-08-30 10:25:18 +02:00
ZanSara
b1daa7c647
chore: migrate to canals==0.7.0 (#5647)
* add default_to_dict and default_from_dict placeholders to ease migration to canals 0.7.0

* canals==0.7.0

* whisper components

* add to_dict/from_dict stubs

* import serialization methods in init to hide canals imports

* reno

* export deserializationerror too

* Update haystack/preview/__init__.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* serialization methods for LocalWhisperTranscriber (#5648)

* chore: serialization methods for `FileExtensionClassifier` (#5651)

* serialization methods for FileExtensionClassifier

* Update test_file_classifier.py

* chore: serialization methods for `SentenceTransformersDocumentEmbedder` (#5652)

* serialization methods for SentenceTransformersDocumentEmbedder

* fix device management

* serialization methods for SentenceTransformersTextEmbedder (#5653)

* serialization methods for TextFileToDocument (#5654)

* chore: serialization methods for `RemoteWhisperTranscriber` (#5650)

* serialization methods for RemoteWhisperTranscriber

* remove patches

* Add default to_dict and from_dict in document stores built with factory (#5674)

* fix tests (#5671)

* chore: simplify serialization methods for `MemoryDocumentStore` (#5667)

* simplify serialization for MemoryDocumentStore

* remove redundant tests

* pylint

* chore: serialization methods for `MemoryRetriever` (#5663)

* serialization method for MemoryRetriever

* more tests

* remove hash from default_document_store_to_dict

* remove diff in factory.py

* chore: serialization methods for `DocumentWriter` (#5661)

* serialization methods for DocumentWriter

* more tests

* use factory

* black

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-29 18:15:07 +02:00
Vladimir Blagojevic
e5e7bb9654
feat: Allow WebRetrieve to use custom LinkContentFetcher (#5662)
* Allow use of custom LinkContentFetcher

* Add release note
2023-08-29 15:46:48 +02:00
Vladimir Blagojevic
1f7c7b716a
Update release note for #5526 (#5664)
2023-08-29 14:25:52 +02:00
Julian Risch
fa81c611e8
build: Upgrade transformers to v4.32.1 (#5658)
* upgrade transformers to 4.32.1

* added release notes

* upgrade transformers version also for inference extra
2023-08-29 13:46:00 +02:00
Vladimir Blagojevic
f13b37db24
fix: LinkContentFetcher - when no content retrieved (i.e. request blocked), default to snippet text (#5656)
* When no content retrieved (i.e. request blocked), default to snippet

* Add release note
2023-08-29 10:57:47 +02:00
Vladimir Blagojevic
2118f68769
feat: Add domain scoping to WebRetriever (#5587)
* WebSearch: add allowed_domains scoped search

* Add talk to website example

* Add release note

* Add allowed_domains to WebSearch

* Minor fix

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-28 20:02:02 +02:00
Stefano Fiorucci
72fe4fc57b
feat: SentenceTransformersDocumentEmbedder (#5606)
* first draft

* incorporate feedback

* some unit tests

* release notes

* real release notes

* refactored to use a factory class

* allow forcing fresh instances

* first draft

* Update haystack/preview/embedding_backends/sentence_transformers_backend.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* simplify implementation and tests

* add embed_meta_fields implementation

* lg update

* improve meta data embedding; tests

* support non-string metadata

* make factory private

* change return type; improve tests

* warm_up not called in run

* fix typing

* rm unused import

* Remove base test class

* black

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 16:23:41 +02:00
Stefano Fiorucci
89c1813d9f
feat: SentenceTransformersTextEmbedder (#5600)
* first draft

* incorporate feedback

* some unit tests

* release notes

* real release notes

* first draft

* refactored to use a factory class

* adapt to new ST Embedding Backend implementation

* allow forcing fresh instances

* add tests

* release notes

* fix typo

* little improvements in tests

* Update haystack/preview/embedding_backends/sentence_transformers_backend.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* simplify implementation and tests

* lg update

* input check

* better error message

* make factory private

* change return type; improve tests

* warm_up not called in run

* warm_up not called in run

* rm unused import; default model

* fix typing

* rm unused import

* Remove BaseTestComponent

* black

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 16:23:26 +02:00
Stefano Fiorucci
35dfe47186
feat: SentenceTransformersEmbeddingBackend (v2) (#5572)
* first draft

* incorporate feedback

* some unit tests

* release notes

* real release notes

* refactored to use a factory class

* allow forcing fresh instances

* Update haystack/preview/embedding_backends/sentence_transformers_backend.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* simplify implementation and tests

* make factory private

* change return type; improve tests

* fix typing

* rm unused import

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 12:32:37 +02:00
Stefano Fiorucci
8342b6a457
upgrade transformers (#5619)
2023-08-25 16:38:34 +02:00
Silvano Cerza
66f615a3a4
Remove BaseTestComponent (#5613)
* Remove BaseTestComponent

* Add release notes
2023-08-23 17:03:37 +02:00
Silvano Cerza
d5599df029
Fix release notes (#5599)
2023-08-18 17:59:07 +02:00
Silvano Cerza
03ebef7219
Remove DocumentStoreAwareMixin (#5585)
* Remove Pipeline

* Add release notes

* Enhance imports

* Update release note

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Remove Pipeline tests

* Remove DocumentStoreAwareMixin

* Add release notes

* Remove DocumentStoreAwareMixin from __all__

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-18 17:56:24 +02:00
Silvano Cerza
4ef813fc8a
Remove specialised Pipeline (#5584)
* Remove Pipeline

* Add release notes

* Enhance imports

* Update release note

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Remove Pipeline tests

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-18 17:48:13 +02:00
Silvano Cerza
72e0a588db
Rework DocumentWriter (#5583)
* Remove DocumentStoreAwareMixin from DocumentWriter

* Add release notes
2023-08-18 17:03:17 +02:00
Silvano Cerza
4bc68cbc2f
Rework MemoryRetriever (#5582)
* Remove DocumentStoreAwareMixin from MemoryRetriever

* Add release notes

* Update an article

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-18 16:33:35 +02:00
Massimiliano Pippi
7e633c6b0c
chore: change import paths under preview (#5592)
* fix import paths

* add release notes
2023-08-18 12:53:25 +02:00
Massimiliano Pippi
39a1f61326
chore: improve error message in FileExtensionClassifier (#5590)
* output an actionable error

* add release note

* fix matching in raised error

* fix release note category
2023-08-18 12:28:55 +02:00
Stefano Fiorucci
aa8da40820
chore: add preview section to release notes (#5591)
* add preview section to reno config and update existing notes

* Empty commit to trigger CLA
2023-08-18 09:59:01 +02:00
Vladimir Blagojevic
46c9139caf
refactor: Rework WebRetriever caching, adjust tests (#5566)
* Rework WebRetriever caching, adjust tests

* Add release note

* Better pydocs

* Minor improvements

* Update haystack/nodes/retriever/web.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-16 17:41:11 +02:00
ZanSara
a8d4a99db9
feat: copy lazy_imports.py to preview (#5580)
* copy lazy_imports

* reno
2023-08-16 14:27:17 +02:00
Julian Risch
22c7601729
feat: Add DocumentWriter v2 (#5435)
* add draft of WriteToStore and basic test

* add DocumentWriter implementation

* draft unit and integration tests

* add release note

* mock Store in unit tests

* pylint

* Update haystack/preview/components/writers/document_writer.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Remove unnecessary test

* Rework DocumentWriter to support new Component I/O definition

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-16 13:48:33 +02:00
MichelBartels
93b3400440
Add Answer class (#5563)
* add answer class

* inheritance instead of composition

* make answer immutable

* Remove probability field for GenerativeAnswer

* rename Answer classes

* fix name change

* add release notes
2023-08-16 11:56:22 +02:00
Vladimir Blagojevic
8652d00b54
feat: Add FileExtensionClassifier to previews (#5514)
* Add FileExtensionClassifier preview component

* Add release note

* PR feedback
2023-08-15 15:58:55 +02:00
Silvano Cerza
a7416bcf89
Add to_dict and from_dict methods for Stores (#5541)
* Add to_dict and from_dict methods for Stores

* Add release notes

* Add tests with custom init parameters
2023-08-11 14:45:56 +02:00
Vladimir Blagojevic
a75b9dd4bb
feat: LinkContentFetcher - add content-type resolution, user agent switching, PDF handler (#5374)
* Add content type resolution, pdf handler, user agent switching

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-09 18:14:04 +02:00
ZanSara
5ca4874df9
Migrate existing v2 components to Canals 0.4.0 (#5532)
* pin canals==0.4.0

* update audio components

* allow audio components to receive whisper_params in init too

* migrating memoryretriever

* migrate memoryretriever

* migrate TextFileToDocument

* fix TextFileToDocument tests

* fix pipeline tests

* fix defaults management

* reno

* inverted assignments

* Simplify release notes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-09 15:51:32 +02:00
Silvano Cerza
83fce1bd72
Add Store class factory (#5530)
* Add Store class factory

* Add release notes
2023-08-09 13:09:36 +02:00
HP
ff86af576a
fix: TransformersImageToText.generate_captions accepts "str" #5485 (#5491)
* fix: TransformersImageToText.generate_captions accepts "str" #5485 -- fix author email

* fix: TransformersImageToText.generate_captions accepts "str" #5485 - fix mypy, pylint, black issues

* fix: TransformersImageToText.generate_captions accepts "str" #5485 - changes after pr review
2023-08-09 09:54:12 +02:00
Vladimir Blagojevic
227bf6ca39
feat: Remove template variables from PromptNode invocation kwargs (#5526)
* Remove template params from kwargs before passing kwargs to invocation layer

* More unit tests

* Add release note

* Enable simple prompt node pipeline integration test use case
2023-08-08 16:40:23 +02:00
Vladimir Blagojevic
84ed954c8c
feat: Improve performance and add default media support in FileTypeClassifier (#5083)
* feat: add media outgoing edge to FileTypeClassifier

* Add release note

* Update language

---------

Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:51:07 +02:00
tstadel
d46c84bb61
feat: support dynamic filters in custom_query (#5427)
* support filters in custom_query

* better tests

* Update docstrings

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:48:15 +02:00
Stefano Fiorucci
3f472995bb
refactor: update Crawler to support selenium>=4.11.0 and simplify it (#5515)
* refactor crawler

* rm unused imports

* release notes!

* rm outdated mock
2023-08-08 15:13:22 +02:00
Fanli Lin
f6b50cfdf9
fix: StopWordsCriteria doesn't compare the stop word token ids with the input ids in a continuous and sequential order (#5503)
* bug fix

* add release note

* add unit test

* refactor

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-08-08 08:35:10 +02:00
Fanli Lin
4496fc6afd
fix: leading whitespace is missing in the generated text when using stop_words (#5511)
* bug fix

* add release note

* Update releasenotes/notes/fix-stop-words-strip-issue-22ce51306e7b91e4.yaml

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* Update releasenotes/notes/fix-stop-words-strip-issue-22ce51306e7b91e4.yaml

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-04 17:40:19 +02:00