3803 Commits

Author SHA1 Message Date
Julian Risch
f38f365682
fix: Error message about weight param in RecentnessRanker (#5409)
* fix: error message about weight param in RecentnessRanker

* trigger GitHub actions

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-07-24 10:41:17 +02:00
Vladimir Blagojevic
597df1414c
feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes (#5406)
* Update Claude support with the latest models, new streaming API, context window sizes

* Use Github Anthropic SDK link for tokenizer, revert _init_tokenizer

* Change example key name to ANTHROPIC_API_KEY
2023-07-21 13:33:07 +02:00
Stefano Fiorucci
1706b662db
build: upgrade transformers to v4.31.0 (#5391)
* Update transformers

* fix the forgotten pin
2023-07-21 09:30:03 +02:00
Massimiliano Pippi
a13ffcf9df
bump pydoc-markdown (#5405) 2023-07-20 16:48:08 +02:00
elundaeva
612c6779fb
feat: RecentnessRanker (#5301)
* recency reranker code

* removed

* readd

* edited code

* edit

* mypy test fix

* adding warnings for score method

* fix

* fix

* adding paper link

* comments implementation

* change to predict and predict_batch

* change to predict and predict_batch 2

* adding unit test

* fixes

* small fixes

* fix for unit test

* table driven test

* small fixes

* small fixes2

* adding predict_batch tests

* add recentness_ranker to api reference docs

* implementing feedback

* implementing feedback2

* implementing feedback3

* implementing feedback4

* implementing feedback5

* remove document_map, remove final check if score is not None

* add final check if doc score is not None for mypy

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-20 16:20:45 +02:00
bogdankostic
c2506866bd
docs: Pin PyYAML to 5.3.1 (#5400) 2023-07-20 15:31:58 +02:00
Julian Risch
eeb29b5686
test: Re-activate end-to-end tests workflow (#5343)
* Install haystack with required extras

* remove whitespaces

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Add sleep

* Add s for seconds

* Move container initialization in workflow

* Update e2e.yml

add nightly run

* use new folder for initial e2e test

* use file hash for caching and trigger on push to branch

* remove \n from model names read from file

* remove trigger on push to branch

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-07-20 11:48:51 +02:00
Sebastian Husch Lee
f7642e83ea
feat: Add embed_meta_fields to Ranker nodes (#5361)
* Adding embed_meta_fields to ranker nodes

* Fix tests by adding case where embed_meta_fields=None

* Adding unit test for _add_meta_fields_to_docs

* Fix pylint

* Add unit test

* Added another unit test. Caught a bug.

* Adding more unit tests

* Add unit test

* Updating some older tests into unit tests using mocking

* Convert another test to unit test

* Test run method

* One last unit test
2023-07-18 09:11:51 +02:00
elundaeva
e0cf1421c6
proposal: Add RecentnessRanker component (#5289)
proposal for adding Recentness Ranker to Haystack
2023-07-17 16:33:47 +02:00
Fanli Lin
09a1d3c0dc
remove duplicate (#5368) 2023-07-17 16:24:02 +02:00
ZanSara
8f3fe85878
feat: extend pipeline.add_component to support stores (#5261)
* add protocol and adapt pipeline

* change API in pipeline.add_component

* adapt pipeline tests

* adapt memoryretriever

* additional checks

* separate protocol and mixin

* review feedback & update tests

* pylint

* Update haystack/preview/document_stores/protocols.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/document_stores/memory/document_store.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* docstring of Store

* adapt memorydocumentstore

* fix tests

* remove direct inheritance

* pylint

* Update haystack/preview/document_stores/mixins.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test/preview/components/retrievers/test_memory_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* test names

* revert suggestion

* private self._stores

* move asserts out

* remove protocols

* review feedback

* review feedback

* fix tests

* mypy

* review feedback

* fix tests & other details

* naming

* mypy

* fix tests

* typing

* partial review feedback

* move .store to input dataclass

* Revert "move .store to input dataclass"

This reverts commit 53f624b99f3414c89d5134711725b31bd94ef77a.

* disable reusing components with stores

* disable sharing components with docstores

* Update mixins.py

* black

* upgrade canals & fix tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-17 15:06:19 +02:00
Vladimir Blagojevic
adfabdd648
Improve token limit tests for OpenAI PromptNode layer (#5351) 2023-07-17 14:03:03 +02:00
Ikko Eltociear Ashimine
35b2c99f43
chore: fix typo in base.py (#5356)
paramters -> parameters
2023-07-13 18:40:21 +02:00
Fanli Lin
9891bfeddd
fix: a small bug in StopWordsCriteria (#5316) 2023-07-13 15:58:06 +02:00
bogdankostic
237d67dbfd
feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 (#5320)
* Check ES server version + add support for ES <= 7.5

* Adapt comment

* PR feedback
2023-07-13 14:50:43 +02:00
Daria Fokina
63fd63ff23
fix: update WebRetriever docstrings and default mode (#5352)
* update WebRetriever docstrings

* snippets as default mode
2023-07-13 13:57:52 +02:00
Vladimir Blagojevic
f21005f8ea
refactor: Extract link retrieval from WebRetriever, introduce LinkContentRetriever (#5227)
* Extract link retrieval from WebRetriever, introduce LinkContentRetriever

* Add example

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-07-13 12:54:40 +02:00
MichelBartels
fd350bbb8f
fix: Run HFLocalInvocationLayer.supports even if inference packages are not installed (#5308)
---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-13 12:52:56 +02:00
ZanSara
7848f00d01
feat: upgrade canals in preview (#5344)
* upgrade nodes

* linting
2023-07-13 12:30:49 +02:00
Stefano Fiorucci
8750d92763
pin scikit-learn>=1.3.0 (#5322) 2023-07-13 11:11:28 +02:00
Daria Fokina
a152a812da
create invocation-layers API reference page (#5262)
* create invocation-layers.yml

* Update invocation-layers.yml
2023-07-12 17:48:27 +02:00
Sebastian Husch Lee
b5aef24a7e
feat: Add support for meta fields that are lists when using embed_meta_fields (#5307)
* Add support for meta fields that are lists when using embed_meta_fields

* Make sure unit test doesn't download model

* Adding more unit tests
2023-07-11 17:32:33 +02:00
Stefano Fiorucci
6632505540
chore: deprecate SklearnQueryClassifier (#5324)
* pin scikit-learn, deprecate SklearnQueryClassifier

* rm scikit-learn pin
2023-07-11 17:07:23 +02:00
Sebastian Husch Lee
22750d342c
test: Refactor some retriever tests into unit tests (#5306)
* Modify and reactivate two unit tests

* Refactor openai embedding tests into unit tests

* Update test_retriever.py

* Changing tests
2023-07-11 13:36:23 +02:00
Fanli Lin
514f93a6eb
fix: num_return_sequences should be less than num_beams, not top_k (#5280)
* formatting

* remove top_k variable

* add pytest

* add numbers

* string formatting

* fix formatting

* revert

* extend tests with assertions for num_return_sequences

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-11 12:20:21 +02:00
bogdankostic
41668f26d6
ci: Update labeler.yml to account for Elasticsearch changes (#5318) 2023-07-11 11:51:49 +02:00
Sebastian Husch Lee
2703c2d483
docs: Small documentation updates to dense.py (#5305)
* Small documentation updates

* Update doc strings
2023-07-10 18:16:49 +02:00
bogdankostic
b7f683bfa4
ci: Add unit test for Elasticsearch8 (#5300)
* Add job for ES8 integration tests

* Add unit test for Elasticsearch 8

* Add tests.yml

* Adapt tests.yml

* Remove added white space

* Adapt tests.yml

* Adapt tests.yml

* Add dependencies to unit test name

* Adapt unit test matrix

* Adapt unit test matrix

* Adapt unit test matrix

* Adapt unit test matrix

* Update tests.yml

* Create separate tests where necessary

* Fix skip

* Adapt tests
2023-07-10 16:03:50 +02:00
bogdankostic
048fc7f640
ci: Add job for ES8 integration tests (#5297)
* Add job for ES8 integration tests

* Remove whitespace

* Fix filename

* Add tests.yml

* Revert "Add tests.yml"

This reverts commit ec12654d4e146b5ef6cba04ad82f5973935d8520.
2023-07-10 10:43:05 +02:00
bogdankostic
206b21816c
chore: Adapt import message for Elasticsearch7 (#5295)
* Adapt import message for es7.ElasticsearchDocumentStore

* Move import statement
2023-07-10 10:21:26 +02:00
bogdankostic
86d1fb5e1c
build: Add elasticsearch7 and elasticsearch8 extra (#5296) 2023-07-10 09:59:51 +02:00
Silvano Cerza
d6f855cbc5
chore: Add support for hierarchical docs (#5278)
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-07-07 17:00:29 +02:00
tstadel
9acb275680
fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests (#5113)
* use _source on opensearch bulk requests

* fix label bulk requests

* add tests

* fix test

* apply feedback

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-07 15:12:50 +02:00
ZanSara
13bed30504
feat: batch mode for MemoryRetriever (v2) (#5287)
* memoryretriever batch mode

* typing of output
2023-07-07 12:10:35 +02:00
ZanSara
f49bd3a12f
feat: introduce Store protocol (v2) (#5259)
* add protocol and adapt pipeline

* review feedback & update tests

* pylint

* Update haystack/preview/document_stores/protocols.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/document_stores/memory/document_store.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* docstring of Store

* adapt memorydocumentstore

* fix tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-07 12:10:08 +02:00
Stefano Fiorucci
90ff3817e7
feat: support OpenAI-Organization for authentication (#5292)
* add openai_organization to invocation layer, generator and retriever

* added tests
2023-07-07 12:02:21 +02:00
bogdankostic
0697f5c63e
fix: Support isolated node eval in run_batch in Generators (#5291)
* Add isolated node eval to BaseGenerator's run_batch

* Add unit tests
2023-07-07 10:32:43 +02:00
Vladimir Blagojevic
395854d823
Add cpu-remote-inference Docker image (#5225)
* Add cpu-remote-inference Docker image

* Add web lfqa pipeline as an example for cpu-remote-inference Docker image

* WebRetriever must have document_store attribute

* Add cpu-remote-inference-latest

* Add image testing in CI

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-07-07 10:23:14 +02:00
MichelBartels
08f1865ddd
fix: Improve robustness of get_task HF pipeline invocations (#5284)
* replace get_task method and change invocation layer order

* add test for invocation layer order

* add test documentation

* make invocation layer test more robust

* fix type annotation

* change hf timeout

* simplify timeout mock and add get_task exception cause

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-06 16:33:44 +02:00
Vladimir Blagojevic
ac412193cc
refactor: Simplify selection of Azure vs OpenAI invocation layers (#5271) 2023-07-06 13:23:13 +02:00
Silvano Cerza
a1a390056a
Remove requests_cache in tests (#5285) 2023-07-06 13:22:52 +02:00
Vladimir Blagojevic
ad6072728d
Add dependencies to build lxml successfully (#5288) 2023-07-06 12:53:28 +02:00
bogdankostic
fd25106c88
test: Adapt batch size in retriever-reader benchmarks (#5281) 2023-07-06 10:42:34 +02:00
Sebastian Husch Lee
da2c9b4799
test: Update test/others/test_utils.py (#5270)
* Add unit test mark for appropriate tests

* Remove deepset Cloud specific tests

* Create pytest fixtures

* Reduce number of checks run for test_match_context_multi_process and test_match_context_single_process

* Increase speed of test_match_contexts_multi_process

* Revert "Remove deepset Cloud specific tests"

This reverts commit b65173665f3e873f17f3613c5fd4fa3174a6d71b.

* Continuing revert commit

* Remove unnecessary comment

* Break down bigger test into smaller tests
2023-07-05 12:00:32 +02:00
Sebastian Husch Lee
12f319b4c9
Remove deprecated return_table_cell from conftest.py (#5264) 2023-07-05 09:37:41 +02:00
Massimiliano Pippi
00efa514ca
refactor: remove Elasticsearch client version 8 deprecation warnings (#5245)
* remove deprecation warnings

* remove leftover
2023-07-04 14:17:34 +02:00
Sebastian Husch Lee
87281b2e10
Fix to_dict and from_dict of Multilabel such that to_dict outputs a json serializable object (using Label.to_dict()) (#5257) 2023-07-04 12:44:11 +02:00
Malte Reimann
195077eca9
fix: import_utils fetch_archive_from_http - improve url parsing for fetching archive from http (#5199)
* Use urlparse to get file extension for urls that contain text after the file extension such as query parameters

* Run pre-commit to fix format

* Reformat import_utils

* Document get_filename_extension_from_url

* Formatting

* Formatting

* Update haystack/utils/import_utils.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* Update haystack/utils/import_utils.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-04 10:20:58 +02:00
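Note on #5199: the commit above says it uses urlparse to get the file extension of URLs that carry text after the extension, such as query parameters. A minimal sketch of that idea follows. The helper name `extension_from_url` and the example URLs are illustrative assumptions, not the code added to haystack/utils/import_utils.py.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def extension_from_url(url: str) -> str:
    """Hypothetical helper: return the extension of the URL path, without the dot."""
    # urlparse splits the URL into components; .path excludes the query
    # string and the fragment, so "?download=1" cannot leak into the result.
    path = urlparse(url).path
    # PurePosixPath.suffix yields only the last suffix (".gz" for "x.tar.gz").
    return PurePosixPath(path).suffix.lstrip(".")


if __name__ == "__main__":
    print(extension_from_url("https://example.com/data/archive.zip?download=1"))
```

Splitting the raw URL string on "." would instead return "zip?download=1" for the URL above, which seems to be the kind of case the commit message describes.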
ZanSara
4b380d8fb0
fix: install inference in REST API tests (#5252)
* install inference in restapi tests

* add workflow dispatch to test the REST API CI in PR

* trigger ci

* tablecell

* tablecell

* revert ci trigger

* mypy
2023-07-03 15:10:14 +02:00
Vladimir Blagojevic
1066e959a2
bug: fix for pinecone not working for per document updates (#5110) 2023-07-03 14:07:52 +02:00