ZanSara
b1daa7c647
chore: migrate to canals==0.7.0 ( #5647 )
...
* add default_to_dict and default_from_dict placeholders to ease migration to canals 0.7.0
* canals==0.7.0
* whisper components
* add to_dict/from_dict stubs
* import serialization methods in init to hide canals imports
* reno
* export deserializationerror too
* Update haystack/preview/__init__.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* serialization methods for LocalWhisperTranscriber (#5648 )
* chore: serialization methods for `FileExtensionClassifier` (#5651 )
* serialization methods for FileExtensionClassifier
* Update test_file_classifier.py
* chore: serialization methods for `SentenceTransformersDocumentEmbedder` (#5652 )
* serialization methods for SentenceTransformersDocumentEmbedder
* fix device management
* serialization methods for SentenceTransformersTextEmbedder (#5653 )
* serialization methods for TextFileToDocument (#5654 )
* chore: serialization methods for `RemoteWhisperTranscriber` (#5650 )
* serialization methods for RemoteWhisperTranscriber
* remove patches
* Add default to_dict and from_dict in document stores built with factory (#5674 )
* fix tests (#5671 )
* chore: simplify serialization methods for `MemoryDocumentStore` (#5667 )
* simplify serialization for MemoryDocumentStore
* remove redundant tests
* pylint
* chore: serialization methods for `MemoryRetriever` (#5663 )
* serialization method for MemoryRetriever
* more tests
* remove hash from default_document_store_to_dict
* remove diff in factory.py
* chore: serialization methods for `DocumentWriter` (#5661 )
* serialization methods for DocumentWriter
* more tests
* use factory
* black
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-29 18:15:07 +02:00
Vladimir Blagojevic
e5e7bb9654
feat: Allow WebRetrieve to use custom LinkContentFetcher ( #5662 )
...
* Allow use of custom LinkContentFetcher
* Add release note
2023-08-29 15:46:48 +02:00
Vladimir Blagojevic
1f7c7b716a
Update release note for #5526 ( #5664 )
2023-08-29 14:25:52 +02:00
Julian Risch
fa81c611e8
build: Upgrade transformers to v4.32.1 ( #5658 )
...
* upgrade transformers to 4.32.1
* added release notes
* upgrade transformers version also for inference extra
2023-08-29 13:46:00 +02:00
Vladimir Blagojevic
f13b37db24
fix: LinkContentFetcher - when no content retrieved (i.e. request blocked), default to snippet text ( #5656 )
...
* When no content retrieved (i.e. request blocked), default to snippet
* Add release note
2023-08-29 10:57:47 +02:00
Vladimir Blagojevic
2118f68769
feat: Add domain scoping to WebRetriever ( #5587 )
...
* WebSearch: add allowed_domains scoped search
* Add talk to website example
* Add release note
* Add allowed_domains to WebSearch
* Minor fix
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-28 20:02:02 +02:00
Stefano Fiorucci
72fe4fc57b
feat: SentenceTransformersDocumentEmbedder ( #5606 )
...
* first draft
* incorporate feedback
* some unit tests
* release notes
* real release notes
* refactored to use a factory class
* allow forcing fresh instances
* first draft
* Update haystack/preview/embedding_backends/sentence_transformers_backend.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* simplify implementation and tests
* add embed_meta_fields implementation
* lg update
* improve meta data embedding; tests
* support non-string metadata
* make factory private
* change return type; improve tests
* warm_up not called in run
* fix typing
* rm unused import
* Remove base test class
* black
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 16:23:41 +02:00
Stefano Fiorucci
89c1813d9f
feat: SentenceTransformersTextEmbedder ( #5600 )
...
* first draft
* incorporate feedback
* some unit tests
* release notes
* real release notes
* first draft
* refactored to use a factory class
* adapt to new ST Embedding Backend implementation
* allow forcing fresh instances
* add tests
* release notes
* fix typo
* little improvements in tests
* Update haystack/preview/embedding_backends/sentence_transformers_backend.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* simplify implementation and tests
* lg update
* input check
* better error message
* make factory private
* change return type; improve tests
* warm_up not called in run
* warm_up not called in run
* rm unused import; default model
* fix typing
* rm unused import
* Remove BaseTestComponent
* black
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 16:23:26 +02:00
Stefano Fiorucci
35dfe47186
feat: SentenceTransformersEmbeddingBackend (v2) ( #5572 )
...
* first draft
* incorporate feedback
* some unit tests
* release notes
* real release notes
* refactored to use a factory class
* allow forcing fresh instances
* Update haystack/preview/embedding_backends/sentence_transformers_backend.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* simplify implementation and tests
* make factory private
* change return type; improve tests
* fix typing
* rm unused import
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 12:32:37 +02:00
Stefano Fiorucci
8342b6a457
upgrade transformers ( #5619 )
2023-08-25 16:38:34 +02:00
Silvano Cerza
66f615a3a4
Remove BaseTestComponent ( #5613 )
...
* Remove BaseTestComponent
* Add release notes
2023-08-23 17:03:37 +02:00
Silvano Cerza
d5599df029
Fix release notes ( #5599 )
2023-08-18 17:59:07 +02:00
Silvano Cerza
03ebef7219
Remove DocumentStoreAwareMixin ( #5585 )
...
* Remove Pipeline
* Add release notes
* Enhance imports
* Update release note
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Remove Pipeline tests
* Remove DocumentStoreAwareMixin
* Add release notes
* Remove DocumentStoreAwareMixin from __all__
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-18 17:56:24 +02:00
Silvano Cerza
4ef813fc8a
Remove specialised Pipeline ( #5584 )
...
* Remove Pipeline
* Add release notes
* Enhance imports
* Update release note
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Remove Pipeline tests
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-18 17:48:13 +02:00
Silvano Cerza
72e0a588db
Rework DocumentWriter ( #5583 )
...
* Remove DocumentStoreAwareMixin from DocumentWriter
* Add release notes
2023-08-18 17:03:17 +02:00
Silvano Cerza
4bc68cbc2f
Rework MemoryRetriever ( #5582 )
...
* Remove DocumentStoreAwareMixin from MemoryRetriever
* Add release notes
* Update an article
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-18 16:33:35 +02:00
Massimiliano Pippi
7e633c6b0c
chore: change import paths under preview ( #5592 )
...
* fix import paths
* add release notes
2023-08-18 12:53:25 +02:00
Massimiliano Pippi
39a1f61326
chore: improve error message in FileExtensionClassifier ( #5590 )
...
* output an actionable error
* add release note
* fix matching in raised error
* fix release note category
2023-08-18 12:28:55 +02:00
Stefano Fiorucci
aa8da40820
chore: add preview section to release notes ( #5591 )
...
* add preview section to reno config and update existing notes
* Empty commit to trigger CLA
2023-08-18 09:59:01 +02:00
Vladimir Blagojevic
46c9139caf
refactor: Rework WebRetriever caching, adjust tests ( #5566 )
...
* Rework WebRetriever caching, adjust tests
* Add release note
* Better pydocs
* Minor improvements
* Update haystack/nodes/retriever/web.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-16 17:41:11 +02:00
ZanSara
a8d4a99db9
feat: copy lazy_imports.py to preview ( #5580 )
...
* copy lazy_imports
* reno
2023-08-16 14:27:17 +02:00
Julian Risch
22c7601729
feat: Add DocumentWriter v2 ( #5435 )
...
* add draft of WriteToStore and basic test
* add DocumentWriter implementation
* draft unit and integration tests
* add release note
* mock Store in unit tests
* pylint
* Update haystack/preview/components/writers/document_writer.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Remove unnecessary test
* Rework DocumentWriter to support new Component I/O definition
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-16 13:48:33 +02:00
MichelBartels
93b3400440
Add Answer class ( #5563 )
...
* add answer class
* inheritance instead of composition
* make answer immutable
* Remove probability field for GenerativeAnswer
* rename Answer classes
* fix name change
* add release notes
2023-08-16 11:56:22 +02:00
Vladimir Blagojevic
8652d00b54
feat: Add FileExtensionClassifier to previews ( #5514 )
...
* Add FileExtensionClassifier preview component
* Add release note
* PR feedback
2023-08-15 15:58:55 +02:00
Silvano Cerza
a7416bcf89
Add to_dict and from_dict methods for Stores ( #5541 )
...
* Add to_dict and from_dict methods for Stores
* Add release notes
* Add tests with custom init parameters
2023-08-11 14:45:56 +02:00
Vladimir Blagojevic
a75b9dd4bb
feat: LinkContentFetcher - add content-type resolution, user agent switching, PDF handler ( #5374 )
...
* Add content type resolution, pdf handler, user agent switching
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-09 18:14:04 +02:00
ZanSara
5ca4874df9
Migrate existing v2 components to Canals 0.4.0 ( #5532 )
...
* pin canals==0.4.0
* update audio components
* allow audio components to receive whisper_params in init too
* migrating memoryretriever
* migrate memoryretriever
* migrate TextFileToDocument
* fix TextFileToDocument tests
* fix pipeline tests
* fix defaults management
* reno
* inverted assignments
* Simplify release notes
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-09 15:51:32 +02:00
Silvano Cerza
83fce1bd72
Add Store class factory ( #5530 )
...
* Add Store class factory
* Add release notes
2023-08-09 13:09:36 +02:00
HP
ff86af576a
fix: TransformersImageToText.generate_captions accepts "str" #5485 ( #5491 )
...
* fix: TransformersImageToText.generate_captions accepts "str" #5485 -- fix author email
* fix: TransformersImageToText.generate_captions accepts "str" #5485 - fix mypy, pylint, black issues
* fix: TransformersImageToText.generate_captions accepts "str" #5485 - changes after pr review
2023-08-09 09:54:12 +02:00
Vladimir Blagojevic
227bf6ca39
feat: Remove template variables from PromptNode invocation kwargs ( #5526 )
...
* Remove template params from kwargs before passing kwargs to invocation layer
* More unit tests
* Add release note
* Enable simple prompt node pipeline integration test use case
2023-08-08 16:40:23 +02:00
Vladimir Blagojevic
84ed954c8c
feat: Improve performance and add default media support in FileTypeClassifier ( #5083 )
...
* feat: add media outgoing edge to FileTypeClassifier
* Add release note
* Update language
---------
Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:51:07 +02:00
tstadel
d46c84bb61
feat: support dynamic filters in custom_query ( #5427 )
...
* support filters in custom_query
* better tests
* Update docstrings
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:48:15 +02:00
Stefano Fiorucci
3f472995bb
refactor: update Crawler to support selenium>=4.11.0 and simplify it ( #5515 )
...
* refactor crawler
* rm unused imports
* release notes!
* rm outdated mock
2023-08-08 15:13:22 +02:00
Fanli Lin
f6b50cfdf9
fix: StopWordsCriteria doesn't compare the stop word token ids with the input ids in a continuous and sequential order ( #5503 )
...
* bug fix
* add release note
* add unit test
* refactor
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-08-08 08:35:10 +02:00
Fanli Lin
4496fc6afd
fix: leading whitespace is missing in the generated text when using stop_words ( #5511 )
...
* bug fix
* add release note
* Update releasenotes/notes/fix-stop-words-strip-issue-22ce51306e7b91e4.yaml
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Update releasenotes/notes/fix-stop-words-strip-issue-22ce51306e7b91e4.yaml
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-04 17:40:19 +02:00
tstadel
d26d4201fc
feat: support search_fields in DeepsetCloudDocumentStore ( #5455 )
...
* feat: support search_fields in DeepsetCloudDocumentStore
* add reno file
* make search_fields plain init arg
* Update lg
* Update releasenotes/notes/deepset-cloud-document-store-search-fields-40b2322466f808a3.yaml
* Update haystack/document_stores/deepsetcloud.py
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-04 11:13:05 +02:00
Vladimir Blagojevic
d96c963bc4
test: Convert two HFLocalInvocationLayer integration to unit tests ( #5446 )
...
* Convert two HFLocalInvocationLayer integration to unit tests
* Simplify unit test
* Improve HFLocalInvocationLayer unit tests
2023-08-03 17:41:32 +02:00
Vladimir Blagojevic
1876c41f07
feat: Add LostInTheMiddleRanker ( #5457 )
...
* Add lost in the middle ranker
* Add release note
* Julian's feedback: more precise version of truncate
* Better comments for the litm algorithm
* Sebastian PR feedback
* Add check for invalid values of word_count_threshold
* Remove _truncate as it is not needed any more
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-08-02 17:05:13 +02:00
Vladimir Blagojevic
0efe0ee7b3
feat: Add top_k parameter to DiversityRanker init method ( #5494 )
...
* Add top_k
* Add release note
2023-08-02 17:04:04 +02:00
Fanli Lin
8d04f28e11
fix: hf agent outputs the prompt text while the openai agent not ( #5461 )
...
* add skil prompt
* fix formatting
* add release note
* add release note
* Update releasenotes/notes/add-skip-prompt-for-hf-model-agent-89aef2838edb907c.yaml
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/nodes/prompt/invocation_layer/handlers.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/prompt/invocation_layer/handlers.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/prompt/invocation_layer/hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* add a unit test
* add a unit test2
* add skil prompt
* Revert "add skil prompt"
This reverts commit b1ba938c94b67a4fd636d321945990aabd2c5b2a.
* add unit test
---------
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 16:34:33 +02:00
Fanli Lin
73fa796735
fix: enable passing max_length for text2text-generation task ( #5420 )
...
* bug fix
* add unit test
* reformatting
* add release note
* add release note
* Update releasenotes/notes/enable-set-max-length-during-runtime-097d65e537bf800b.yaml
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test/prompt/invocation_layer/test_hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test/prompt/invocation_layer/test_hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test/prompt/invocation_layer/test_hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update test/prompt/invocation_layer/test_hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* bug fix
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 14:13:30 +02:00
Vladimir Blagojevic
40a2e9b56a
refactor: Update WebRetriever to use LinkContentFetcher ( #5229 )
...
* Refactor WebRetriever to use LinkContentFetcher
* PR feedback
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-02 12:45:03 +02:00
Fanli Lin
f7fd5eeb4f
feat: enable loading tokenizer for models that are not supported by the transformers library ( #5314 )
...
* add tokenizer load
* change import order
* move imports
* refactor code
* import lib
* remove pretrainedmodel
* fix linting
* update patch
* fix order
* remove tokenizer class
* use tokenizer class
* no copy
* add case for model is an instance
* fix optional
* add ut
* set default to None
* change models
* Update haystack/nodes/prompt/invocation_layer/hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/prompt/invocation_layer/hugging_face.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* add unit tests
* add unit tests
* remove lib
* formatting
* formatting
* formatting
* add release note
* Update releasenotes/notes/load-tokenizer-if-not-load-by-transformers-5841cdc9ff69bcc2.yaml
Co-authored-by: bogdankostic <bogdankostic@web.de>
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 11:42:23 +02:00
Vladimir Blagojevic
540d0fad97
feat: Add DiversityRanker ( #5398 )
...
* Introduce DiversityRanker
* improve most_diverse_order speed
* Compute mean for numerical stability
* Add release note
* Add cosine similarity
* Test both dot product and cosine similarity
* Add pydocs hook
---------
Co-authored-by: Michel Bartels <login@michelbartels.com>
2023-08-01 12:48:34 +02:00
bogdankostic
a51ca19fe4
feat: Add TextFileToDocument component (v2) ( #5467 )
...
* Add TextfileToDocument component
* Add docstrings
* Add unit tests
* Add release note file
* Make use of progress bar
* Add TextfileToDocument to __init__.py
* Use lazy % formatting in logging functions
* Remove f from non-f-string
* Add TextfileToDocument to __init__.py
* Use correct dependency extra
* Compare file path against path object
* PR feedback
* PR feedback
* Update haystack/preview/components/file_converters/txt.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update docstrings
* Add error handling
* Add unit test
* Reintroduce falsely removed caplog
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-01 11:34:52 +02:00
Stefano Fiorucci
6f534873a5
fix: restrict supports method in the OpenAI invocation layer and a similar method in the EmbeddingRetriever ( #5458 )
...
* restrict OpenAI supports method
* better note
* Update releasenotes/notes/restrict-openai-supports-method-fb126583e4beb057.yaml
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-07-31 13:14:22 +02:00
Massimiliano Pippi
363f3edbf7
feat: add reno to manage release notes ( #5397 )
...
* first draft
* add release notes
* remove old settings
* add reno usage instructions
* page the docs team when release notes are added
* add reno to the dev dependencies
* Apply suggestions from code review
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-07-24 17:02:46 +02:00