Sebastian Husch Lee
22750d342c
test: Refactor some retriever tests into unit tests ( #5306 )
...
* Modify and reactivate two unit tests
* Refactor openai embedding tests into unit tests
* Update test_retriever.py
* Changing tests
2023-07-11 13:36:23 +02:00
Fanli Lin
514f93a6eb
fix: num_return_sequences should be less than num_beams, not top_k ( #5280 )
...
* formatting
* remove top_k variable
* add pytest
* add numbers
* string formatting
* fix formatting
* revert
* extend tests with assertions for num_return_sequences
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-07-11 12:20:21 +02:00
bogdankostic
b7f683bfa4
ci: Add unit test for Elasticsearch8 ( #5300 )
...
* Add job for ES8 integration tests
* Add unit test for Elasticsearch 8
* Add tests.yml
* Adapt tests.yml
* Remove added white space
* Adapt tests.yml
* Adapt tests.yml
* Add dependencies to unit test name
* Adapt unit test matrix
* Adapt unit test matrix
* Adapt unit test matrix
* Adapt unit test matrix
* Update tests.yml
* Create separate tests where necessary
* Fix skip
* Adapt tests
2023-07-10 16:03:50 +02:00
tstadel
9acb275680
fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests ( #5113 )
...
* use _source on opensearch bulk requests
* fix label bulk requests
* add tests
* fix test
* apply feedback
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-07 15:12:50 +02:00
ZanSara
13bed30504
feat: batch mode for MemoryRetriever (v2) ( #5287 )
...
* memoryretriever batch mode
* typing of output
2023-07-07 12:10:35 +02:00
ZanSara
f49bd3a12f
feat: introduce Store protocol (v2) ( #5259 )
...
* add protocol and adapt pipeline
* review feedback & update tests
* pylint
* Update haystack/preview/document_stores/protocols.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/document_stores/memory/document_store.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* docstring of Store
* adapt memorydocumentstore
* fix tests
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-07-07 12:10:08 +02:00
Stefano Fiorucci
90ff3817e7
feat: support OpenAI-Organization for authentication ( #5292 )
...
* add openai_organization to invocation layer, generator and retriever
* added tests
2023-07-07 12:02:21 +02:00
bogdankostic
0697f5c63e
fix: Support isolated node eval in run_batch in Generators ( #5291 )
...
* Add isolated node eval to BaseGenerator's run_batch
* Add unit tests
2023-07-07 10:32:43 +02:00
MichelBartels
08f1865ddd
fix: Improve robustness of get_task HF pipeline invocations ( #5284 )
...
* replace get_task method and change invocation layer order
* add test for invocation layer order
* add test documentation
* make invocation layer test more robust
* fix type annotation
* change hf timeout
* simplify timeout mock and add get_task exception cause
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-06 16:33:44 +02:00
Vladimir Blagojevic
ac412193cc
refactor: Simplify selection of Azure vs OpenAI invocation layers ( #5271 )
2023-07-06 13:23:13 +02:00
Silvano Cerza
a1a390056a
Remove requests_cache in tests ( #5285 )
2023-07-06 13:22:52 +02:00
bogdankostic
fd25106c88
test: Adapt batch size in retriever-reader benchmarks ( #5281 )
2023-07-06 10:42:34 +02:00
Sebastian Husch Lee
da2c9b4799
test: Update test/others/test_utils.py ( #5270 )
...
* Add unit test mark for appropriate tests
* Remove deepset Cloud specific tests
* Create pytest fixtures
* Reduce number of checks run for test_match_context_multi_process and test_match_context_single_process
* Increase speed of test_match_contexts_multi_process
* Revert "Remove deepset Cloud specific tests"
This reverts commit b65173665f3e873f17f3613c5fd4fa3174a6d71b.
* Continuing revert commit
* Remove unnecessary comment
* Break down bigger test into smaller tests
2023-07-05 12:00:32 +02:00
Sebastian Husch Lee
12f319b4c9
Remove deprecated return_table_cell from conftest.py ( #5264 )
2023-07-05 09:37:41 +02:00
Sebastian Husch Lee
87281b2e10
Fix to_dict and from_dict of Multilabel such that to_dict outputs a json serializable object (using Label.to_dict()) ( #5257 )
2023-07-04 12:44:11 +02:00
Malte Reimann
195077eca9
fix: import_utils fetch_archive_from_http - improve url parsing for fetching archive from http ( #5199 )
...
* Use urlparse to get file extension for urls that contain text after the file extension such as query parameters
* Run pre-commit to fix format
* Reformat import_utils
* Document get_filename_extension_from_url
* Formatting
* Formatting
* Update haystack/utils/import_utils.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Update haystack/utils/import_utils.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-07-04 10:20:58 +02:00
Vladimir Blagojevic
1066e959a2
bug: fix for pinecone not working for per document updates ( #5110 )
2023-07-03 14:07:52 +02:00
Stefano Fiorucci
1be39367ac
Fix: FAISSDocumentStore - make write_documents properly work in combination w update_embeddings ( #5221 )
...
* Update VERSION.txt
* first draft
* simplify method and test
* rm unnecessary pb.close
* integrate feedback
2023-07-03 10:07:36 +02:00
Massimiliano Pippi
cb638af0ff
refactor: fix method type and add comments ( #5235 )
...
* fix method type and add comments
* fix tests
2023-06-30 11:55:52 +02:00
Massimiliano Pippi
037e4f24ce
refactor: add a new Document Store supporting Elasticsearch 8 ( #5231 )
...
* introduce es8
* prepare tests
* fix unit tests
* adjust tests
* install elastic_transport package
* make mypy happy
* fix opensearch tests
2023-06-29 16:40:10 +02:00
bogdankostic
8c63e295f4
fix: Allow filtering on list fields in InMemoryDocumentStore with all operators ( #5208 )
...
* Add support for list fields
* Unskip tests
2023-06-29 12:10:39 +02:00
Massimiliano Pippi
6373e2ea66
refactor: prepare support to Elasticsearch 8 ( #5226 )
...
* make a package
* Update haystack/document_stores/elasticsearch/es7.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* do not expose ES types from the package
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-06-29 11:06:20 +02:00
bogdankostic
ed1bad1155
fix: Use add_isolated_node_eval of eval_batch in run_batch ( #5223 )
...
* Fix isolated node eval in eval_batch
* Add unit test
2023-06-28 16:51:23 +02:00
Vladimir Blagojevic
bc86f57715
feat: BM25 retrieval for MemoryDocumentStore ( #5151 )
2023-06-27 17:42:23 +02:00
Massimiliano Pippi
c068e34954
Remove deprecated param return_table_cell ( #5218 )
...
* remove deprecated param
* Update haystack/nodes/reader/table.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* try
* remove unused functions and ignore mypy error
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-06-27 16:14:29 +02:00
ZanSara
462f3a5c99
feat: globally disable progress bars ( #5207 )
...
* add SilenceableTqdm and update usage
* pylint
* rename module
* add tests
2023-06-27 11:45:17 +02:00
Vladimir Blagojevic
5ee393226d
fix: Support all SageMaker HF text generation models (other than Falcon) ( #5205 )
...
* Create SageMaker base class and two implementation subclasses
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-06-26 19:59:16 +02:00
bogdankostic
82291b56ad
fix: Send batches of query-doc pairs to inference_from_objects ( #5125 )
...
* Send batches of query-doc pairs to inference_from_objects
* Use absolute import path
* Add separate preprocessing_batch_size parameter
2023-06-26 14:26:26 +02:00
Vladimir Blagojevic
eb2255c0dd
Rename SageMakerInvocationLayer -> SageMakerHFTextGenerationInvocationLayer ( #5204 )
2023-06-26 11:03:30 +02:00
Stefano Fiorucci
25d5dedb46
Fix: FARMReader - Consider the max number of labels/answers during training ( #5197 )
...
* first draft
* improve it a bit
* unit tests
* PR review, improved tests
* PR review, improved tests 2
2023-06-26 10:14:21 +02:00
Sebastian
f1932492f1
feat: Add CohereRanker node using Cohere reranking endpoint ( #5152 )
...
* Started to add CohereRanker node
* Small refactoring of SentenceTransformersRanker node
* Started to add predict_batch method
* Simplified predict_batch code
* Added missing imports
* Undoing a change
* Fix mypy
* Adding unit tests using mocking
* Updated truncation warning message.
* Update doc strings
* Update to docs
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Updating docs to reflect PR discussion
* Update haystack/nodes/ranker/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-23 16:46:46 +02:00
Malte Pietsch
c9179ed0eb
feat: enable LLMs hosted via AWS SageMaker in PromptNode ( #5155 )
...
* Add SageMakerInvocationLayer
---------
Co-authored-by: oryx1729 <78848855+oryx1729@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-06-23 15:33:20 +02:00
ZanSara
31664627eb
feat: hard document length limit at max_chars_check ( #5191 )
...
* implement hard cut at max_chars_check
* regenerate ids
* black
* docstring
* black
2023-06-23 12:34:19 +02:00
ZanSara
36192eca72
feat: current_datetime shaper function ( #5195 )
...
* current_datetime shaper
* explicitly add current_datetime to the functions allowed in a prompt template
2023-06-23 10:33:34 +02:00
bogdankostic
612c5cd005
chore: Remove add_tool from ToolsManager ( #5192 )
...
* Remove add_tool from ToolsManager
* Fix tests
2023-06-23 09:26:06 +02:00
Sebastian
1602f3abdd
test: Adding unit tests to Ranker ( #5167 )
...
* adding unit tests for sentence transformers ranker
* Adding more unit tests
* Remove empty line
* Undo static method
* Revert change
* Updated indentation and added match message
* Remove unneeded parenthesis
2023-06-22 15:23:23 +02:00
Michael Feil
cfd703fa3e
fix: model_tokenizer in openai text completion tokenization details ( #5104 )
...
* fix: model_tokenizer
* Update test
---------
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2023-06-22 14:23:19 +02:00
Stefano Fiorucci
637433841e
chore: remove deprecated Seq2SeqGenerator and RAGenerator ( #5180 )
...
* first draft of removal
* more removals
* don't download unused models
2023-06-21 16:38:45 +02:00
Sebastian
7a140c1524
feat: add ensure token limit for direct prompting of ChatGPT ( #5166 )
...
* Add support for prompt truncation when using chatgpt if direct prompting is used
* Update tests for test token limit for prompt node
* Update warning message to be correct
* Minor cleanup
* Mark back to integration
* Update count_openai_tokens_messages to reflect changes shown in tiktoken
* Use mocking to avoid request call
* Fix test to make it comply with unit test requirements
* Move tests to respective invocation layers
* Moved fixture to one spot
2023-06-21 15:41:28 +02:00
Vladimir Blagojevic
089187ac8b
fix: Check Agent's prompt template variables and prompt resolver parameters are aligned ( #5163 )
...
* Check Agent's prompt template parameters and prompt resolver parameters are aligned
* Lower the logger warning
* Automatically append transcript if needed
* Amend flaky test
2023-06-21 14:34:41 +02:00
Bilge Yücel
6a1b6b1ae3
feat: Update ConversationalAgent ( #5065 )
...
* feat: Update ConversationalAgent
* Add Tools
* Add test
* Change default params
* fix tests
* Fix circular import error
* Update conversational-agent prompt
* Add conversational-agent-without-tools to legacy list
* Add warning to add tools to conversational agent
* Add callable tools
* Add example script
* Fix linter errors
* Update ConversationalAgent depending on the existence of tools
* Initialize the base Agent with different arguments when there's tool
* Inject memory to the prompt in both cases, update prompts accordingly
* Override the add_tools method to prevent adding tools to ConversationalAgent without tools
* Update test
* Fix linter error
* Remove unused import
* Update docstrings and api reference
* Fix imports and doc string code snippet
* docstrings update
* Update conversational.py
* Mock PromptNode
* Prevent circular import error
* Add max_steps to the ConversationalAgent
* Update resolver description
* Add prompt_template as parameter
* Change docstring
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-20 13:09:21 +03:00
Shukri
916e8452f5
feat!: simplify weaviate auth ( #5115 )
...
* feat!: simplify weaviate auth
* docs: explain param precedence
* refactor: simplify _get_embedded_options
2023-06-19 15:46:58 +02:00
Ben Heckmann
1318ac5074
feat: Optional Content Moderation for OpenAI PromptNode & OpenAIAnswerGenerator ( #5017 )
...
* #4071 implemented optional content moderation for OpenAI PromptNode
* added two simple integration tests
* improved documentation & renamed _invoke method to _execute_openai_request
* added a flag to check_openai_policy_violation that will return a full dict of all text violations and their categories
* re-implemented the tests as unit tests & without use of the OpenAI APIs
* removed unused patch
* changed check_openai_policy_violation back to only return a bool
* fixed pylint and test error
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-06-19 13:27:11 +02:00
ZanSara
f52477d31b
fix: small improvement to pipeline v2 tests ( #5153 )
...
* add missing return
* improve test
* docstring
2023-06-16 12:07:00 +02:00
Vladimir Blagojevic
8d8de65492
Add AgentToolLogger, unit test, and example usage ( #5087 )
2023-06-15 08:43:20 +02:00
bogdankostic
7731713a1e
test: Add benchmark config files ( #5093 )
...
* Add config files
* Add top-k and batch size to configs
* Add batch size to configs
* Add batch size to configs
* Remove configs using 1m docs
2023-06-14 18:15:50 +02:00
Ben Heckmann
60e5d73424
fix: changing document scores ( #5090 )
...
* #4653 fix changing scores by returning new document objects from document store queries
* added integration test for InMemoryDocumentStore demonstrating the desired behavior
* Update test/document_stores/test_memory.py
2023-06-14 17:35:46 +02:00
Julian Risch
ce1c9c9ddb
fix: Relax ChatGPT model name check to support gpt-3.5-turbo-0613 ( #5142 )
...
* relax model name checking for chatgpt
* add unit tests
2023-06-14 09:53:00 +02:00
Julian Risch
4c8e0b9d4a
fix: PromptNode falls back to empty list of documents if none are provided but expected ( #5132 )
...
* add warning, default to empty docs list, tests
* pylint
2023-06-13 16:35:19 +02:00
Silvano Cerza
3b8992968d
test: Skip flaky PromptNode test ( #5039 )
...
* Skip flaky PromptNode test
* Add skip reason
* Update test/prompt/test_prompt_node.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-06-13 16:24:29 +02:00