2813 Commits

Author SHA1 Message Date
Massimiliano Pippi
714b944dc2
chore: rename store to document_store for clarity (#5547)
* store -> document_store

* fix leftovers

* fix import name

* moar leftovers

* rebase on main, update MemoryDocumentStore to the new protocol

* Update haystack/preview/pipeline.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-12 08:44:36 +02:00
Agnieszka Marzec
e7532c49cf
Add cohere ranker api (#5549) 2023-08-11 17:47:36 +02:00
Silvano Cerza
a7416bcf89
Add to_dict and from_dict methods for Stores (#5541)
* Add to_dict and from_dict methods for Stores

* Add release notes

* Add tests with custom init parameters
2023-08-11 14:45:56 +02:00
Vladimir Blagojevic
094d8578bd
feat: Update Docker readme (#5536)
* Update Docker readme

* Update wording

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-11 14:06:12 +02:00
Massimiliano Pippi
d73d443bc0
test: ease testing for 3rd parties (#5539)
* ease testing for 3rd parties

* fix __all__

* uniform error management

* raise the same filter error

* raise the same filter error

* fix circular import
2023-08-10 17:13:15 +02:00
Silvano Cerza
168b7c806c
Add _store_name field to StoreAwareMixin to ease serialisation (#5531) 2023-08-10 15:42:19 +02:00
Tuana Çelik
4bb22c9665
Update weaviate.py (#5469)
Update the Weaviate docstrings to replace the old URL with the correct new one. The old one now returns a 404.
2023-08-10 15:37:55 +02:00
Vladimir Blagojevic
a75b9dd4bb
feat: LinkContentFetcher - add content-type resolution, user agent switching, PDF handler (#5374)
* Add content type resolution, pdf handler, user agent switching
---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-09 18:14:04 +02:00
Stefano Fiorucci
52133d3a81
proposal: Embedders design (#5390)
* first draft

* rename

* refinements

* added clarifications

* improvements

* improvements

* improvements

* further improvements

* fix typo

* Apply suggestions from code review

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* adapt to new Canals I/O

* fix links to previous proposals

* fix

* add migration example: update_embeddings

* rename EmbeddingService to EmbeddingBackend

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-09 17:09:30 +02:00
ZanSara
5ca4874df9
Migrate existing v2 components to Canals 0.4.0 (#5532)
* pin canals==0.4.0

* update audio components

* allow audio components to receive whisper_params in init too

* migrating memoryretriever

* migrate memoryretriever

* migrate TextFileToDocument

* fix TextFileToDocument tests

* fix pipeline tests

* fix defaults management

* reno

* inverted assignments

* Simplify release notes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-09 15:51:32 +02:00
Silvano Cerza
83fce1bd72
Add Store class factory (#5530)
* Add Store class factory

* Add release notes
2023-08-09 13:09:36 +02:00
HP
ff86af576a
fix: TransformersImageToText.generate_captions accepts "str" #5485 (#5491)
* fix: TransformersImageToText.generate_captions accepts "str" #5485 -- fix author email

* fix: TransformersImageToText.generate_captions accepts "str" #5485 - fix mypy, pylint, black issues

* fix: TransformersImageToText.generate_captions accepts "str" #5485 - changes after pr review
2023-08-09 09:54:12 +02:00
ZanSara
c27622e1bc
chore: normalize more optional imports (#5251)
* docstore filters

* modeling metrics

* doc language classifier

* file converter

* docx converter

* tika

* preprocessor

* context matcher

* pylint
2023-08-09 09:27:53 +02:00
Stefano Fiorucci
30e6c7ac43
build: pin safetensors (#5528)
* pin safetensors

* rm unneeded optional pin
2023-08-08 18:05:56 +02:00
Vladimir Blagojevic
227bf6ca39
feat: Remove template variables from PromptNode invocation kwargs (#5526)
* Remove template params from kwargs before passing kwargs to invocation layer

* More unit tests

* Add release note

* Enable simple prompt node pipeline integration test use case
2023-08-08 16:40:23 +02:00
Vladimir Blagojevic
84ed954c8c
feat: Improve performance and add default media support in FileTypeClassifier (#5083)
* feat: add media outgoing edge to FileTypeClassifier

* Add release note

* Update language

---------

Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:51:07 +02:00
tstadel
d46c84bb61
feat: support dynamic filters in custom_query (#5427)
* support filters in custom_query

* better tests

* Update docstrings

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:48:15 +02:00
Stefano Fiorucci
3f472995bb
refactor: update Crawler to support selenium>=4.11.0 and simplify it (#5515)
* refactor crawler

* rm unused imports

* release notes!

* rm outdated mock
2023-08-08 15:13:22 +02:00
Vladimir Blagojevic
37cf1fe49c
Tests in e2e/nodes/test_summarizer.py could be removed as pipeline e2e tests cover SearchSummarizationPipeline already (#5454)
Tests in e2e/nodes/test_translator.py can be removed because unit tests exist for the translator, and the e2e test mostly checks that the model is good, which is not something we should test for
2023-08-08 13:21:11 +02:00
Fanli Lin
f6b50cfdf9
fix: StopWordsCriteria doesn't compare the stop word token ids with the input ids in a continuous and sequential order (#5503)
* bug fix

* add release note

* add unit test

* refactor

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-08-08 08:35:10 +02:00
Daria Fokina
99cb95a63a
docs: separate abstract classes into separate API references (#5501)
* separate_abstractions

* img-to-text parent slug upd

* Apply suggestions from code review

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-07 12:21:25 +02:00
Massimiliano Pippi
ac4e762422
Fix datadog client init (#5524) 2023-08-07 12:18:46 +02:00
Stefano Fiorucci
43d4730b6c
remove reference to the UI directory (#5522) 2023-08-07 11:52:37 +02:00
Fanli Lin
4496fc6afd
fix: leading whitespace is missing in the generated text when using stop_words (#5511)
* bug fix

* add release note

* Update releasenotes/notes/fix-stop-words-strip-issue-22ce51306e7b91e4.yaml

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* Update releasenotes/notes/fix-stop-words-strip-issue-22ce51306e7b91e4.yaml

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-04 17:40:19 +02:00
Vladimir Blagojevic
abc6737e63
feat: Improve LFQA Web Example (#5504)
* Improve web_lfqa example

* Turn off pylint for logging setup

* Another way to turn off logging
2023-08-04 14:20:06 +02:00
Massimiliano Pippi
c079576a87
chore: move base test class into haystack core (#5509)
* move base test class into haystack core

* fix linter

* do not compute coverage of testing code
2023-08-04 12:42:13 +02:00
tstadel
d26d4201fc
feat: support search_fields in DeepsetCloudDocumentStore (#5455)
* feat: support search_fields in DeepsetCloudDocumentStore

* add reno file

* make search_fields plain init arg

* Update lg

* Update releasenotes/notes/deepset-cloud-document-store-search-fields-40b2322466f808a3.yaml

* Update haystack/document_stores/deepsetcloud.py

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-04 11:13:05 +02:00
Vladimir Blagojevic
d96c963bc4
test: Convert two HFLocalInvocationLayer integration tests to unit tests (#5446)
* Convert two HFLocalInvocationLayer integration tests to unit tests

* Simplify unit test

* Improve HFLocalInvocationLayer unit tests
2023-08-03 17:41:32 +02:00
Daria Fokina
1f88cd165f
Update hugging_face.py (#5488) 2023-08-03 13:34:45 +02:00
bogdankostic
56cea8cbbd
test: Add scripts to send benchmark results to datadog (#5432)
* Add config files

* log benchmarks to stdout

* Add top-k and batch size to configs

* Add batch size to configs

* fix: don't download files if they already exist

* Add batch size to configs

* refine script

* Remove configs using 1m docs

* update run script

* update run script

* update run script

* datadog integration

* remove out folder

* gitignore benchmarks output

* test: send benchmarks to datadog

* remove uncommented lines in script

* feat: take branch/tag argument for benchmark setup script

* fix: run.sh should ignore errors

* Remove changes unrelated to datadog

* Apply black

* Update test/benchmarks/utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* PR feedback

* Account for reader benchmarks not doing indexing

* Change key of reader metrics

* Apply PR feedback

* Remove whitespace

---------

Co-authored-by: rjanjua <rohan.janjua@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-03 10:09:00 +02:00
bogdankostic
a26859f065
docs: Add inherited methods to API reference documentation (#5273)
* Add inherited methods to API reference documentation

* Fix typing
2023-08-02 18:54:15 +02:00
Vladimir Blagojevic
1876c41f07
feat: Add LostInTheMiddleRanker (#5457)
* Add lost in the middle ranker

* Add release note

* Julian's feedback: more precise version of truncate

* Better comments for the litm algorithm

* Sebastian PR feedback

* Add check for invalid values of word_count_threshold

* Remove _truncate as it is not needed any more

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-08-02 17:05:13 +02:00
Vladimir Blagojevic
0efe0ee7b3
feat: Add top_k parameter to DiversityRanker init method (#5494)
* Add top_k

* Add release note
2023-08-02 17:04:04 +02:00
Fanli Lin
8d04f28e11
fix: hf agent outputs the prompt text while the openai agent does not (#5461)
* add skip prompt

* fix formatting

* add release note

* add release note

* Update releasenotes/notes/add-skip-prompt-for-hf-model-agent-89aef2838edb907c.yaml

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

* Update haystack/nodes/prompt/invocation_layer/handlers.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/prompt/invocation_layer/handlers.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/prompt/invocation_layer/hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* add a unit test

* add a unit test2

* add skip prompt

* Revert "add skil prompt"

This reverts commit b1ba938c94b67a4fd636d321945990aabd2c5b2a.

* add unit test

---------

Co-authored-by: Daria Fokina <daria.f93@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 16:34:33 +02:00
Fanli Lin
73fa796735
fix: enable passing max_length for text2text-generation task (#5420)
* bug fix

* add unit test

* reformatting

* add release note

* add release note

* Update releasenotes/notes/enable-set-max-length-during-runtime-097d65e537bf800b.yaml

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test/prompt/invocation_layer/test_hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test/prompt/invocation_layer/test_hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test/prompt/invocation_layer/test_hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test/prompt/invocation_layer/test_hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* bug fix

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 14:13:30 +02:00
Vladimir Blagojevic
40a2e9b56a
refactor: Update WebRetriever to use LinkContentFetcher (#5229)
* Refactor WebRetriever to use LinkContentFetcher

* PR feedback

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-02 12:45:03 +02:00
Fanli Lin
f7fd5eeb4f
feat: enable loading tokenizer for models that are not supported by the transformers library (#5314)
* add tokenizer load

* change import order

* move imports

* refactor code

* import lib

* remove pretrainedmodel

* fix linting

* update patch

* fix order

* remove tokenizer class

* use tokenizer class

* no copy

* add case for model is an instance

* fix optional

* add ut

* set default to None

* change models

* Update haystack/nodes/prompt/invocation_layer/hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/prompt/invocation_layer/hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* add unit tests

* add unit tests

* remove lib

* formatting

* formatting

* formatting

* add release note

* Update releasenotes/notes/load-tokenizer-if-not-load-by-transformers-5841cdc9ff69bcc2.yaml

Co-authored-by: bogdankostic <bogdankostic@web.de>

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 11:42:23 +02:00
bogdankostic
97e4522a83
build: Remove upper bound for weaviate client (#5486)
* Set upper bound for boto3 and botocore versions

* Set lower bound for weaviate client

* Remove upper bound for version from weaviate

* Add release note

* Update release note

* Remove release note
2023-08-02 11:08:50 +02:00
Bilge Yücel
37bdfddff5
Fix Agent API (#5483)
* Fix agent.yml for new modules

* Fix ConversationalAgent docstrings
2023-08-01 17:05:13 +03:00
Vladimir Blagojevic
540d0fad97
feat: Add DiversityRanker (#5398)
* Introduce DiversityRanker

* improve most_diverse_order speed

* Compute mean for numerical stability

* Add release note

* Add cosine similarity

* Test both dot product and cosine similarity

* Add pydocs hook

---------

Co-authored-by: Michel Bartels <login@michelbartels.com>
2023-08-01 12:48:34 +02:00
Malte Pietsch
8c017ccc32
Update installation instructions in README.md (#5480) 2023-08-01 12:33:40 +02:00
Silvano Cerza
bc152d953c
Skip running tests in CI when editing docs Python files (#5482) 2023-08-01 12:31:24 +02:00
Silvano Cerza
9a359101fd
chore: Rework docs generation (#5481)
* Change docs generation to use id for parent doc instead of slug

* Rename step
2023-08-01 12:18:33 +02:00
bogdankostic
a51ca19fe4
feat: Add TextFileToDocument component (v2) (#5467)
* Add TextfileToDocument component

* Add docstrings

* Add unit tests

* Add release note file

* Make use of progress bar

* Add TextfileToDocument to __init__.py

* Use lazy % formatting in logging functions

* Remove f from non-f-string

* Add TextfileToDocument to __init__.py

* Use correct dependency extra

* Compare file path against path object

* PR feedback

* PR feedback

* Update haystack/preview/components/file_converters/txt.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update docstrings

* Add error handling

* Add unit test

* Reintroduce falsely removed caplog

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-01 11:34:52 +02:00
Muhammad Bilal
8920fd6939
feat: add optional index selection for endpoints (#5444)
* add index selection

* reformatting

* updated test script
2023-08-01 10:47:46 +02:00
Bilge Yücel
62029ba441
Add AgentStep to api reference (#5402) 2023-07-31 19:26:34 +03:00
Stefano Fiorucci
6f534873a5
fix: restrict supports method in the OpenAI invocation layer and a similar method in the EmbeddingRetriever (#5458)
* restrict OpenAI supports method

* better note

* Update releasenotes/notes/restrict-openai-supports-method-fb126583e4beb057.yaml

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-07-31 13:14:22 +02:00
Massimiliano Pippi
d9fd1ab7bc
feat!: remove original files after indexing (#5459)
* remove original files after indexing

* fix tests
2023-07-31 13:07:16 +02:00
Massimiliano Pippi
5f01391827
add workflow to check presence of release notes (#5449) 2023-07-27 10:40:40 +02:00
Stefano Fiorucci
672813052d
Update invocation-layers.yml (#5445) 2023-07-26 15:39:08 +02:00