* Added changes from table-qa-pipeline
* Moved classes around to make diff to main look nicer.
* Cleaned things up. Removed the return_no_answer option (not needed), added docs, and added integration marks.
* Remove unneeded code
* Added fix for test
* Add check for document_ids in answer
* Prevent passing of empty list to np.mean
* Batching doesn't work with TableQAPipeline because of an HF (Hugging Face) issue
* Cleanup of table reader tests, added check for document ids.
* Fixing pylint
* More pylint
* PR comments
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Adding execution time to the debug output of pipeline components
* Linting issue fix
* [EMPTY] Re-trigger CI
* fixed test
---------
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
* Refactoring to remove duplicate code when using OpenAI API
* Adding docstrings
* Fix mypy issue
* Moved retry mechanism to openai_request function in openai_utils
* Migrate OpenAI embedding encoder to use the openai_request util function.
* Adding docstrings.
* pylint import errors
* More pylint import errors
* Move header construction into openai_request and take api_key as an input variable.
* Made _openai_text_completion_tokenization_details reusable so it can be shared by PromptNode and OpenAIAnswerGenerator
* Add prompt truncation to the PromptNode.
* Removed commented out test.
* Bump version of tiktoken to 0.2.0 so we can use MODEL_TO_ENCODING to automatically determine correct tokenizer for the requested model
* Change one method back to public
* Fixed bug in token length truncation. Included answer length into truncation amount. Moved truncation higher up to PromptNode level.
* Pylint error
* Improved warning message
* Added _ensure_token_limit for HFLocalInvocationLayer. Had to remove max_length from base PromptModelInvocationLayer to ensure that max_length has a default value.
* Adding tests
* Expanded on doc strings
* Updated tests
* Update docstrings
* Update tests and revert to the previous handling of USE_TIKTOKEN.
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/retriever/_openai_encoder.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/utils/openai_utils.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/utils/openai_utils.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Updated docstrings, and added integration marks
* Remove comment
* Update test
* Fix test
* Update test
* Updated openai_request function to work with the azure api
* Fixed error in _openai_encoder.py
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
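The truncation fix above counts the answer length against the token limit, because the prompt and the generated answer share one context window. A minimal sketch of that budget arithmetic follows; `ensure_token_limit` is a hypothetical name (not the actual `_ensure_token_limit` signature), and whitespace splitting stands in for a real tokenizer such as tiktoken.

```python
import logging

logger = logging.getLogger(__name__)


def ensure_token_limit(prompt: str, max_answer_tokens: int, model_max_tokens: int) -> str:
    """Hypothetical helper: truncate the prompt so prompt + answer fit the model limit."""
    # Assumption: whitespace splitting approximates tokenization; a real
    # implementation would use the model's own tokenizer.
    tokens = prompt.split()
    # Reserve room for the answer inside the shared context window.
    budget = model_max_tokens - max_answer_tokens
    if budget <= 0:
        raise ValueError("max_answer_tokens must be smaller than model_max_tokens")
    if len(tokens) <= budget:
        return prompt
    logger.warning(
        "Prompt has %d tokens; truncating to %d so the answer (%d tokens) fits within %d.",
        len(tokens), budget, max_answer_tokens, model_max_tokens,
    )
    return " ".join(tokens[:budget])
```

For example, a six-token prompt with `max_answer_tokens=2` and `model_max_tokens=6` keeps only the first four tokens.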
* mock all translator tests and move one to e2e
* typo
* extract pipeline tests using translator
* remove duplicate test
* move generator test in e2e
* Update e2e/pipelines/test_extractive_qa.py
* pytest.mark.unit
* black
* remove model name as well
* remove unused fixture
* rename original and improve pipeline tests
* fixes
* pylint
* Added new test for using EntityExtractor in query node and made some fixtures to reduce code duplication.
* Reuse ner_node fixture
* Added pytest unit markings and swapped over to in memory doc store.
* Change to integration tests
* initial Agent implementation
* mypy and pylint fixes
* add missing ABC import
* improved prompt template
* refactor and shorten run method
* refactor and shorten run method
* add tests for extracting
* fix mixed-up tool_input/observation and make tests more robust
* fix bug with max_iterations and update prompt template
* allow setting prompt_template in Agent init
* remove example yml for agent
* add final prediction to transcript
* add transcript to errors and accept PromptTemplate in init
* simplify if else to elif
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* add checks for max_iter<2 and empty list returned by prompt node
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Add JsonConverter node
* Update language
* JsonConverter: Remove id_hash_keys overwrite when it's None
Also, changes in docstring based on review
* Update docstring for JsonConverter
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
* Starting to implement first pass at run_batch
* Started to add _flatten_input function
* First pass at run_batch method.
* Fixed bug
* Adding tests for run_batch
* Update doc strings
* Pylint and mypy
* Pylint
* Fixing mypy
* Restructuring of run_batch tests
* Add minor lg updates
* Adding more tests
* Update dev comments and call static method differently
* Fixed the setting of output variable
* Set output_variable in __init__ of PromptNode
* Make a one-liner
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
* Add IVF and Product Quantization support for OpenSearchDocumentStore
* Remove unused import statement
* Fix mypy
* Adapt doc strings and error messages to account for PQ
* Adapt validation of indices
* Adapt existing tests
* Fix pylint
* Add tests
* Update lg
* Adapt based on PR review comments
* Fix Pylint
* Adapt based on PR review
* Add request_timeout
* Adapt based on PR review
* Adapt based on PR review
* Adapt tests
* Pin tenacity
* Unpin tenacity
* Adapt based on PR comments
* Add match to tests
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
* add e2e tests
* move tests to their own module
* add e2e workflow
* pylint
* remove from job
* fix index field name
* skip test on sql
* removed unused code
* fix embedding tests
* adjust test for pinecone
* adjust assertions to the new documents
* bad copypasta
* test
* fix tests
* fix tests
* fix test
* fix tests
* pylint
* update milvus version
* remove debug
* move graphdb tests under e2e
* added instruction_prompt and update defaults
* Change back max_tokens
* Code formatting
* Starting to update instruction_prompt to be a PromptTemplate
* Using PromptTemplate in OpenAIAnswerGenerator
* Removed hardcoded value
* Fix pylint and make examples and examples_context optional prompt parameters
* Added new test for when prompt length goes past max token limit
* Improve doc strings.
* Make "text-davinci-003" the new default model
* Renaming variable to prompt_template and name to question-answering-with-examples
* Reduced repetitive code.
* Added some comments to explain key logic for future debuggers
* Update docs for max_tokens and increase default
* Updating variable name to prompt_template and docs.
* Updated test and handled Answer case where no documents are used.
* Slight update to docs.
* Adding more doc strings
* lg updates
* Blackify
---------
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
* Deduplicate same Documents in one MultiLabel
* Add tests
* Update label
* Update label
* Update test
* Update test
* Revert change to check CI
* Revert reversion
* Use deepcopy
* Update tests
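Deduplicating the Documents in one MultiLabel amounts to order-preserving deduplication. The sketch below assumes two Documents count as "the same" when their `id` matches; the `Document` dataclass is a hypothetical stand-in, not `haystack.schema.Document`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    # Hypothetical stand-in with only the fields this sketch needs.
    id: str
    content: str


def deduplicate_documents(documents: List[Document]) -> List[Document]:
    """Keep the first occurrence of each document ID, preserving input order."""
    seen = set()
    unique = []
    for doc in documents:
        if doc.id in seen:
            continue  # same Document already attached via another Label
        seen.add(doc.id)
        unique.append(doc)
    return unique
```

Keeping the first occurrence, rather than building a dict keyed by ID, makes the original label order explicit.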
* fix: update kwargs for TriAdaptiveModel
* fix: squeeze batch for TTR inference
* test: add test for ttr + dataframe case
* test: update and reorganise ttr tests
* refactor: make triadaptive model handle shapes
* refactor: remove duplicate reshaping
* refactor: rename test with duplicate name
* fix: add device assignment back to TTR
* fix: remove duplicated vars in test
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Removed double batching around embed_queries
* Add back tests for retrieve_batch for dpr and embedding retrievers
* Updated table-text-retriever to avoid double batching
* Fixing pylint
* Update to test
* Remove code breaking test
* Updating dev comment to be clearer