3174 Commits

Author SHA1 Message Date
bogdankostic
206b21816c
chore: Adapt import message for Elasticsearch7 (#5295)
* Adapt import message for es7.ElasticsearchDocumentStore

* Move import statement
2023-07-10 10:21:26 +02:00
bogdankostic
86d1fb5e1c
build: Add elasticsearch7 and elasticsearch8 extra (#5296) 2023-07-10 09:59:51 +02:00
Silvano Cerza
d6f855cbc5
chore: Add support for hierarchical docs (#5278)
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-07-07 17:00:29 +02:00
tstadel
9acb275680
fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests (#5113)
* use _source on opensearch bulk requests

* fix label bulk requests

* add tests

* fix test

* apply feedback

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-07 15:12:50 +02:00
ZanSara
13bed30504
feat: batch mode for MemoryRetriever (v2) (#5287)
* memoryretriever batch mode

* typing of output
2023-07-07 12:10:35 +02:00
ZanSara
f49bd3a12f
feat: introduce Store protocol (v2) (#5259)
* add protocol and adapt pipeline

* review feedback & update tests

* pylint

* Update haystack/preview/document_stores/protocols.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/document_stores/memory/document_store.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* docstring of Store

* adapt memorydocumentstore

* fix tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-07 12:10:08 +02:00
Stefano Fiorucci
90ff3817e7
feat: support OpenAI-Organization for authentication (#5292)
* add openai_organization to invocation layer, generator and retriever

* added tests
2023-07-07 12:02:21 +02:00
bogdankostic
0697f5c63e
fix: Support isolated node eval in run_batch in Generators (#5291)
* Add isolated node eval to BaseGenerator's run_batch

* Add unit tests
2023-07-07 10:32:43 +02:00
Vladimir Blagojevic
395854d823
Add cpu-remote-inference Docker image (#5225)
* Add cpu-remote-inference Docker image

* Add web lfqa pipeline as an example for cpu-remote-inference Docker image

* WebRetriever must have document_store attribute

* Add cpu-remote-inference-latest

* Add image testing in CI

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-07-07 10:23:14 +02:00
MichelBartels
08f1865ddd
fix: Improve robustness of get_task HF pipeline invocations (#5284)
* replace get_task method and change invocation layer order

* add test for invocation layer order

* add test documentation

* make invocation layer test more robust

* fix type annotation

* change hf timeout

* simplify timeout mock and add get_task exception cause

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-06 16:33:44 +02:00
Vladimir Blagojevic
ac412193cc
refactor: Simplify selection of Azure vs OpenAI invocation layers (#5271) 2023-07-06 13:23:13 +02:00
Silvano Cerza
a1a390056a
Remove requests_cache in tests (#5285) 2023-07-06 13:22:52 +02:00
Vladimir Blagojevic
ad6072728d
Add dependencies to build lxml successfully (#5288) 2023-07-06 12:53:28 +02:00
bogdankostic
fd25106c88
test: Adapt batch size in retriever-reader benchmarks (#5281) 2023-07-06 10:42:34 +02:00
Sebastian Husch Lee
da2c9b4799
test: Update test/others/test_utils.py (#5270)
* Add unit test mark for appropriate tests

* Remove deepset Cloud specific tests

* Create pytest fixtures

* Reduce number of checks run for test_match_context_multi_process and test_match_context_single_process

* Increase speed of test_match_contexts_multi_process

* Revert "Remove deepset Cloud specific tests"

This reverts commit b65173665f3e873f17f3613c5fd4fa3174a6d71b.

* Continuing revert commit

* Remove unnecessary comment

* Break down bigger test into smaller tests
2023-07-05 12:00:32 +02:00
Sebastian Husch Lee
12f319b4c9
Remove deprecated return_table_cell from conftest.py (#5264) 2023-07-05 09:37:41 +02:00
Massimiliano Pippi
00efa514ca
refactor: remove Elasticsearch client version 8 deprecation warnings (#5245)
* remove deprecation warnings

* remove leftover
2023-07-04 14:17:34 +02:00
Sebastian Husch Lee
87281b2e10
Fix to_dict and from_dict of Multilabel such that to_dict outputs a json serializable object (using Label.to_dict()) (#5257) 2023-07-04 12:44:11 +02:00
Malte Reimann
195077eca9
fix: import_utils fetch_archive_from_http - improve url parsing for fetching archive from http (#5199)
* Use urlparse to get file extension for urls that contain text after the file extension such as query parameters

* Run pre-commit to fix format

* Reformat import_utils

* Document get_filename_extension_from_url

* Formatting

* Formatting

* Update haystack/utils/import_utils.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* Update haystack/utils/import_utils.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-04 10:20:58 +02:00
ZanSara
4b380d8fb0
fix: install inference in REST API tests (#5252)
* install inference in restapi tests

* add workflow dispatch to test the REST API CI in PR

* trigger ci

* tablecell

* tablecell

* revert ci trigger

* mypy
2023-07-03 15:10:14 +02:00
Vladimir Blagojevic
1066e959a2
bug: fix for pinecone not working for per document updates (#5110) 2023-07-03 14:07:52 +02:00
Stefano Fiorucci
1be39367ac
Fix: FAISSDocumentStore - make write_documents properly work in combination with update_embeddings (#5221)
* Update VERSION.txt

* first draft

* simplify method and test

* rm unnecessary pb.close

* integrate feedback
2023-07-03 10:07:36 +02:00
Massimiliano Pippi
aee862833e
Update pyproject.toml (#5244) 2023-06-30 19:44:08 +02:00
Massimiliano Pippi
6c1d0fbf04
refactor: isolate Elasticsearch client calls (#5241)
* isolate client code

* pass headers

* pass headers

* more adjustments

* revert

* revert

* leftover

* fix opensearch
2023-06-30 18:29:01 +02:00
Massimiliano Pippi
cb638af0ff
refactor: fix method type and add comments (#5235)
* fix method type and add comments

* fix tests
2023-06-30 11:55:52 +02:00
Massimiliano Pippi
037e4f24ce
refactor: add a new Document Store supporting Elasticsearch 8 (#5231)
* introduce es8

* prepare tests

* fix unit tests

* adjust tests

* install elastic_transport package

* make mypy happy

* fix opensearch tests
2023-06-29 16:40:10 +02:00
Massimiliano Pippi
d5c13aa71d
refactor: introduce a base class for ElasticsearchDocumentStore (#5228)
* introduce a base class

* forgot

* fix linting

* try

* try

* schema generation doesn't support aliasing, use the same name
2023-06-29 13:28:49 +02:00
bogdankostic
8c63e295f4
fix: Allow filtering on list fields in InMemoryDocumentStore with all operators (#5208)
* Add support for list fields

* Unskip tests
2023-06-29 12:10:39 +02:00
Massimiliano Pippi
6373e2ea66
refactor: prepare support to Elasticsearch 8 (#5226)
* make a package

* Update haystack/document_stores/elasticsearch/es7.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* do not expose ES types from the package

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-06-29 11:06:20 +02:00
Daria Fokina
181a277f60
Update agent.yml (#5222) 2023-06-28 21:14:11 +02:00
bogdankostic
ed1bad1155
fix: Use add_isolated_node_eval of eval_batch in run_batch (#5223)
* Fix isolated node eval in eval_batch

* Add unit test
2023-06-28 16:51:23 +02:00
Vladimir Blagojevic
bc86f57715
feat: BM25 retrieval for MemoryDocumentStore (#5151) 2023-06-27 17:42:23 +02:00
Massimiliano Pippi
c068e34954
Remove deprecated param return_table_cell (#5218)
* remove deprecated param

* Update haystack/nodes/reader/table.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* try

* remove unused functions and ignore mypy error

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-06-27 16:14:29 +02:00
Stefano Fiorucci
cbc9dcfdad
add inference dependency to docker images (#5215) 2023-06-27 11:47:40 +02:00
ZanSara
462f3a5c99
feat: globally disable progress bars (#5207)
* add SilenceableTqdm and update usage

* pylint

* rename module

* add tests
2023-06-27 11:45:17 +02:00
Vladimir Blagojevic
5ee393226d
fix: Support all SageMaker HF text generation models (other than Falcon) (#5205)
* Create SageMaker base class and two implementation subclasses
---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-06-26 19:59:16 +02:00
github-actions[bot]
ea2ae1887d
Update unstable version (#5211)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2023-06-26 18:13:41 +02:00
bogdankostic
82291b56ad
fix: Send batches of query-doc pairs to inference_from_objects (#5125)
* Send batches of query-doc pairs to inference_from_objects

* Use absolute import path

* Add separate preprocessing_batch_size parameter
2023-06-26 14:26:26 +02:00
Bilge Yücel
f4e18e91a5
Change 'history' to 'memory' (#5203) 2023-06-26 12:43:06 +02:00
ZanSara
7a9cf30063
chore: remove safe_import and all usages (#5139)
* remove safe_import and all usages

* forward references

* fix additional import

* mypy

* mypy

* pylint

* forward reference

* Update haystack/document_stores/opensearch.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* fix except clause

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-06-26 12:42:43 +02:00
Vladimir Blagojevic
eb2255c0dd
Rename SageMakerInvocationLayer -> SageMakerHFTextGenerationInvocationLayer (#5204) 2023-06-26 11:03:30 +02:00
Stefano Fiorucci
25d5dedb46
Fix: FARMReader - Consider the max number of labels/answers during training (#5197)
* first draft

* improve it a bit

* unit tests

* PR review, improved tests

* PR review, improved tests 2
2023-06-26 10:14:21 +02:00
Sebastian
f1932492f1
feat: Add CohereRanker node using Cohere reranking endpoint (#5152)
* Started to add CohereRanker node

* Small refactoring of SentenceTransformersRanker node

* Started to add predict_batch method

* Simplified predict_batch code

* Added missing imports

* Undoing a change

* Fix mypy

* Adding unit tests using mocking

* Updated truncation warning message.

* Update doc strings

* Update to docs

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Updating docs to reflect PR discussion

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-23 16:46:46 +02:00
Malte Pietsch
c9179ed0eb
feat: enable LLMs hosted via AWS SageMaker in PromptNode (#5155)
* Add SageMakerInvocationLayer
---------

Co-authored-by: oryx1729 <78848855+oryx1729@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-06-23 15:33:20 +02:00
ZanSara
31664627eb
feat: hard document length limit at max_chars_check (#5191)
* implement hard cut at max_chars_check

* regenerate ids

* black

* docstring

* black
2023-06-23 12:34:19 +02:00
ZanSara
36192eca72
feat: current_datetime shaper function (#5195)
* current_datetime shaper

* explicitly add current_datetime to the functions allowed in a prompt template
2023-06-23 10:33:34 +02:00
bogdankostic
612c5cd005
chore: Remove add_tool from ToolsManager (#5192)
* Remove add_tool from ToolsManager

* Fix tests
2023-06-23 09:26:06 +02:00
Sebastian
1602f3abdd
test: Adding unit tests to Ranker (#5167)
* adding unit tests for sentence transformers ranker

* Adding more unit tests

* Remove empty line

* Undo static method

* Revert change

* Updated indentation and added match message

* Remove unneeded parentheses
2023-06-22 15:23:23 +02:00
Michael Feil
cfd703fa3e
fix: model_tokenizer in openai text completion tokenization details (#5104)
* fix: model_tokenizer

* Update test

---------

Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2023-06-22 14:23:19 +02:00
bogdankostic
6a5fbb7118
docs: Remove transformers module from AnswerGenerator API docs (#5185) 2023-06-21 17:20:25 +02:00