3038 Commits

Author SHA1 Message Date
Massimiliano Pippi
c079576a87
chore: move base test class into haystack core (#5509)
* move base test class into haystack core

* fix linter

* do not compute coverage of testing code
2023-08-04 12:42:13 +02:00
tstadel
d26d4201fc
feat: support search_fields in DeepsetCloudDocumentStore (#5455)
* feat: support search_fields in DeepsetCloudDocumentStore

* add reno file

* make search_fields plain init arg

* Update lg

* Update releasenotes/notes/deepset-cloud-document-store-search-fields-40b2322466f808a3.yaml

* Update haystack/document_stores/deepsetcloud.py

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-04 11:13:05 +02:00
Vladimir Blagojevic
d96c963bc4
test: Convert two HFLocalInvocationLayer integration tests to unit tests (#5446)
* Convert two HFLocalInvocationLayer integration tests to unit tests

* Simplify unit test

* Improve HFLocalInvocationLayer unit tests
2023-08-03 17:41:32 +02:00
Daria Fokina
1f88cd165f
Update hugging_face.py (#5488)
2023-08-03 13:34:45 +02:00
bogdankostic
56cea8cbbd
test: Add scripts to send benchmark results to datadog (#5432)
* Add config files

* log benchmarks to stdout

* Add top-k and batch size to configs

* Add batch size to configs

* fix: don't download files if they already exist

* Add batch size to configs

* refine script

* Remove configs using 1m docs

* update run script

* update run script

* update run script

* datadog integration

* remove out folder

* gitignore benchmarks output

* test: send benchmarks to datadog

* remove uncommented lines in script

* feat: take branch/tag argument for benchmark setup script

* fix: run.sh should ignore errors

* Remove changes unrelated to datadog

* Apply black

* Update test/benchmarks/utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* PR feedback

* Account for reader benchmarks not doing indexing

* Change key of reader metrics

* Apply PR feedback

* Remove whitespace

---------

Co-authored-by: rjanjua <rohan.janjua@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-03 10:09:00 +02:00
bogdankostic
a26859f065
docs: Add inherited methods to API reference documentation (#5273)
* Add inherited methods to API reference documentation

* Fix typing
2023-08-02 18:54:15 +02:00
Vladimir Blagojevic
1876c41f07
feat: Add LostInTheMiddleRanker (#5457)
* Add lost in the middle ranker

* Add release note

* Julian's feedback: more precise version of truncate

* Better comments for the litm algorithm

* Sebastian PR feedback

* Add check for invalid values of word_count_threshold

* Remove _truncate as it is not needed any more

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-08-02 17:05:13 +02:00
Vladimir Blagojevic
0efe0ee7b3
feat: Add top_k parameter to DiversityRanker init method (#5494)
* Add top_k

* Add release note
2023-08-02 17:04:04 +02:00
Fanli Lin
8d04f28e11
fix: hf agent outputs the prompt text while the openai agent does not (#5461)
* add skip prompt

* fix formatting

* add release note

* add release note

* Update releasenotes/notes/add-skip-prompt-for-hf-model-agent-89aef2838edb907c.yaml

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

* Update haystack/nodes/prompt/invocation_layer/handlers.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/prompt/invocation_layer/handlers.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/prompt/invocation_layer/hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* add a unit test

* add a unit test2

* add skip prompt

* Revert "add skip prompt"

This reverts commit b1ba938c94b67a4fd636d321945990aabd2c5b2a.

* add unit test

---------

Co-authored-by: Daria Fokina <daria.f93@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 16:34:33 +02:00
Fanli Lin
73fa796735
fix: enable passing max_length for text2text-generation task (#5420)
* bug fix

* add unit test

* reformatting

* add release note

* add release note

* Update releasenotes/notes/enable-set-max-length-during-runtime-097d65e537bf800b.yaml

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test/prompt/invocation_layer/test_hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test/prompt/invocation_layer/test_hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test/prompt/invocation_layer/test_hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test/prompt/invocation_layer/test_hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* bug fix

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 14:13:30 +02:00
Vladimir Blagojevic
40a2e9b56a
refactor: Update WebRetriever to use LinkContentFetcher (#5229)
* Refactor WebRetriever to use LinkContentFetcher

* PR feedback

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-08-02 12:45:03 +02:00
Fanli Lin
f7fd5eeb4f
feat: enable loading tokenizer for models that are not supported by the transformers library (#5314)
* add tokenizer load

* change import order

* move imports

* refactor code

* import lib

* remove pretrainedmodel

* fix linting

* update patch

* fix order

* remove tokenizer class

* use tokenizer class

* no copy

* add case for model is an instance

* fix optional

* add ut

* set default to None

* change models

* Update haystack/nodes/prompt/invocation_layer/hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/prompt/invocation_layer/hugging_face.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* add unit tests

* add unit tests

* remove lib

* formatting

* formatting

* formatting

* add release note

* Update releasenotes/notes/load-tokenizer-if-not-load-by-transformers-5841cdc9ff69bcc2.yaml

Co-authored-by: bogdankostic <bogdankostic@web.de>

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-08-02 11:42:23 +02:00
bogdankostic
97e4522a83
build: Remove upper bound for weaviate client (#5486)
* Set upper bound for boto3 and botocore versions

* Set lower bound for weaviate client

* Remove upper bound for version from weaviate

* Add release note

* Update release note

* Remove release note
2023-08-02 11:08:50 +02:00
Bilge Yücel
37bdfddff5
Fix Agent API (#5483)
* Fix agent.yml for new modules

* Fix ConversationalAgent docstrings
2023-08-01 17:05:13 +03:00
Vladimir Blagojevic
540d0fad97
feat: Add DiversityRanker (#5398)
* Introduce DiversityRanker

* improve most_diverse_order speed

* Compute mean for numerical stability

* Add release note

* Add cosine similarity 

* Test both dot product and cosine similarity

* Add pydocs hook

---------

Co-authored-by: Michel Bartels <login@michelbartels.com>
2023-08-01 12:48:34 +02:00
Malte Pietsch
8c017ccc32
Update installation instructions in README.md (#5480)
2023-08-01 12:33:40 +02:00
Silvano Cerza
bc152d953c
Skip running tests in CI when editing docs Python files (#5482)
2023-08-01 12:31:24 +02:00
Silvano Cerza
9a359101fd
chore: Rework docs generation (#5481)
* Change docs generation to use id for parent doc instead of slug

* Rename step
2023-08-01 12:18:33 +02:00
bogdankostic
a51ca19fe4
feat: Add TextFileToDocument component (v2) (#5467)
* Add TextfileToDocument component

* Add docstrings

* Add unit tests

* Add release note file

* Make use of progress bar

* Add TextfileToDocument to __init__.py

* Use lazy % formatting in logging functions

* Remove f from non-f-string

* Add TextfileToDocument to __init__.py

* Use correct dependency extra

* Compare file path against path object

* PR feedback

* PR feedback

* Update haystack/preview/components/file_converters/txt.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update docstrings

* Add error handling

* Add unit test

* Reintroduce falsely removed caplog

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-01 11:34:52 +02:00
Muhammad Bilal
8920fd6939
feat: add optional index selection for endpoints (#5444)
* add index selection

* reformatting

* updated test script
2023-08-01 10:47:46 +02:00
Bilge Yücel
62029ba441
Add AgentStep to api reference (#5402)
2023-07-31 19:26:34 +03:00
Stefano Fiorucci
6f534873a5
fix: restrict supports method in the OpenAI invocation layer and a similar method in the EmbeddingRetriever (#5458)
* restrict OpenAI supports method

* better note

* Update releasenotes/notes/restrict-openai-supports-method-fb126583e4beb057.yaml

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-07-31 13:14:22 +02:00
Massimiliano Pippi
d9fd1ab7bc
feat!: remove original files after indexing (#5459)
* remove original files after indexing

* fix tests
2023-07-31 13:07:16 +02:00
Massimiliano Pippi
5f01391827
add workflow to check presence of release notes (#5449)
2023-07-27 10:40:40 +02:00
Stefano Fiorucci
672813052d
Update invocation-layers.yml (#5445)
2023-07-26 15:39:08 +02:00
Vladimir Blagojevic
409e3471cb
feat: Enable Support for Meta LLama-2 Models in Amazon Sagemaker (#5437)
* Enable Support for Meta LLama-2 Models in Amazon Sagemaker

* Improve unit test for invocation layers positioning

* Small adjustment, add more unit tests

* mypy fixes

* Improve unit tests

* Update test/prompt/invocation_layer/test_sagemaker_meta.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* PR feedback

* Add pydocs for newly extracted methods

* simplify is_proper_chat_*

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-07-26 15:26:39 +02:00
Silvano Cerza
9ab6298f1d
build: Unpin mlflow, constrain dulwich and botocore (#5441)
* Unpin mlflow

* Pin dulwich

* Pin botocore
2023-07-26 12:59:16 +02:00
Silvano Cerza
7940ec0482
Add @store decorator (#5438)
2023-07-26 09:32:23 +02:00
Vladimir Blagojevic
22897c17a2
fix: Improve log warnings in REST API /health endpoint (#5381)
* Improve warning in REST APIs get_health_status method

* Convert log message

* A better solution and documentation

* Add another nested try/except block

* Simplify
2023-07-25 17:06:03 +02:00
Julian Risch
5bb0a1f57a
Revert "fix: num_return_sequences should be less than num_beams, not top_k (#5280)" (#5434)
This reverts commit 514f93a6eb575d376b21d22e32080fac62cf785f.
2023-07-25 13:27:41 +02:00
Sebastian Husch Lee
2bc7fe1a08
test: reactivate unit tests in test_eval.py (#5255)
* Activate tests that follow unit test and integration test rules

* Adding more integration labels

* Change name to better reflect complexity of test

* Remove mark integration tags, move test to doc store test for add_eval_data

* Removing incorrect integration label

* Deactivated document store test b/c it fails for Weaviate and pinecone

* Remove unit label since test needs to be refactored to be considered a unit test

* Undo changes

* Undo change

* Check every field in the load evaluation result

* Add back label and add skip reason

* Use pytest skip instead of TODO
2023-07-24 17:07:45 +02:00
Massimiliano Pippi
363f3edbf7
feat: add reno to manage release notes (#5397)
* first draft

* add release notes

* remove old settings

* add reno usage instructions

* page the docs team when release notes are added

* add reno to the dev dependencies

* Apply suggestions from code review

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-07-24 17:02:46 +02:00
github-actions[bot]
afabc785c3
Update unstable version (#5424)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2023-07-24 16:59:49 +02:00
bogdankostic
345dbeb638
docs: Add Elasticsearch to API config (#5422)
2023-07-24 16:23:13 +02:00
Nicola Procopio
8a2ab82651
feat: Added hybrid search example (#5376)
* added hybrid search example

Added an example about hybrid search for faq pipeline on covid dataset

* formatted with black formatter

* renamed document

* fixed

* fixed typos

* added test

added test for hybrid search

* fixed whitespaces

* removed test for hybrid search

* fixed pylint

* commented logging
2023-07-24 12:54:21 +02:00
Julian Risch
f38f365682
fix: Error message about weight param in RecentnessRanker (#5409)
* fix: error message about weight param in RecentnessRanker

* trigger GitHub actions

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-07-24 10:41:17 +02:00
Vladimir Blagojevic
597df1414c
feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes (#5406)
* Update Claude support with the latest models, new streaming API, context window sizes

* Use Github Anthropic SDK link for tokenizer, revert _init_tokenizer

* Change example key name to ANTHROPIC_API_KEY
2023-07-21 13:33:07 +02:00
Stefano Fiorucci
1706b662db
build: upgrade transformers to v4.31.0 (#5391)
* Update transformers

* fix the forgotten pin
2023-07-21 09:30:03 +02:00
Massimiliano Pippi
a13ffcf9df
bump pydoc-markdown (#5405)
2023-07-20 16:48:08 +02:00
elundaeva
612c6779fb
feat: RecentnessRanker (#5301)
* recency reranker code

* removed

* readd

* edited code

* edit

* mypy test fix

* adding warnings for score method

* fix

* fix

* adding paper link

* comments implementation

* change to predict and predict_batch

* change to predict and predict_batch 2

* adding unit test

* fixes

* small fixes

* fix for unit test

* table driven test

* small fixes

* small fixes2

* adding predict_batch tests

* add recentness_ranker to api reference docs

* implementing feedback

* implementing feedback2

* implementing feedback3

* implementing feedback4

* implementing feedback5

* remove document_map, remove final check if score is not None

* add final check if doc score is not None for mypy

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-20 16:20:45 +02:00
bogdankostic
c2506866bd
docs: Pin PyYAML to 5.3.1 (#5400)
2023-07-20 15:31:58 +02:00
Julian Risch
eeb29b5686
test: Re-activate end-to-end tests workflow (#5343)
* Install haystack with required extras

* remove whitespaces

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Add sleep

* Add s for seconds

* Move container initialization in workflow

* Update e2e.yml

add nightly run

* use new folder for initial e2e test

* use file hash for caching and trigger on push to branch

* remove \n from model names read from file

* remove trigger on push to branch

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-07-20 11:48:51 +02:00
Sebastian Husch Lee
f7642e83ea
feat: Add embed_meta_fields to Ranker nodes (#5361)
* Adding embed_meta_fields to ranker nodes

* Fix tests by adding case where embed_meta_fields=None

* Adding unit test for _add_meta_fields_to_docs

* Fix pylint

* Add unit test

* Added another unit test. Caught a bug.

* Adding more unit tests

* Add unit test

* Updating some older tests into unit tests using mocking

* Convert another test to unit test

* Test run method

* One last unit test
2023-07-18 09:11:51 +02:00
elundaeva
e0cf1421c6
proposal: Add RecentnessRanker component (#5289)
proposal for adding Recentness Ranker to Haystack
2023-07-17 16:33:47 +02:00
Fanli Lin
09a1d3c0dc
remove duplicate (#5368)
2023-07-17 16:24:02 +02:00
ZanSara
8f3fe85878
feat: extend pipeline.add_component to support stores (#5261)
* add protocol and adapt pipeline

* change API in pipeline.add_component

* adapt pipeline tests

* adapt memoryretriever

* additional checks

* separate protocol and mixin

* review feedback & update tests

* pylint

* Update haystack/preview/document_stores/protocols.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/document_stores/memory/document_store.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* docstring of Store

* adapt memorydocumentstore

* fix tests

* remove direct inheritance

* pylint

* Update haystack/preview/document_stores/mixins.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* test names

* revert suggestion

* private self._stores

* move asserts out

* remove protocols

* review feedback

* review feedback

* fix tests

* mypy

* review feedback

* fix tests & other details

* naming

* mypy

* fix tests

* typing

* partial review feedback

* move .store to input dataclass

* Revert "move .store to input dataclass"

This reverts commit 53f624b99f3414c89d5134711725b31bd94ef77a.

* disable reusing components with stores

* disable sharing components with docstores

* Update mixins.py

* black

* upgrade canals & fix tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-17 15:06:19 +02:00
Vladimir Blagojevic
adfabdd648
Improve token limit tests for OpenAI PromptNode layer (#5351)
2023-07-17 14:03:03 +02:00
Ikko Eltociear Ashimine
35b2c99f43
chore: fix typo in base.py (#5356)
paramters -> parameters
2023-07-13 18:40:21 +02:00
Fanli Lin
9891bfeddd
fix: a small bug in StopWordsCriteria (#5316)
2023-07-13 15:58:06 +02:00
bogdankostic
237d67dbfd
feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 (#5320)
* Check ES server version + add support for ES <= 7.5

* Adapt comment

* PR feedback
2023-07-13 14:50:43 +02:00