* added instruction_prompt and update defaults
* Change back max_tokens
* Code formatting
* Starting to update instruction_prompt to be a PromptTemplate
* Using PromptTemplate in OpenAIAnswerGenerator
* Removed hardcoded value
* pylint and make examples and examples_context optional prompt parameters
* Added new test for when prompt length goes past max token limit
* Improve doc strings.
* Make "text-davinci-003" the new default model
* Renaming variable to prompt_template and name to question-answering-with-examples
* Reduced repetitive code.
* Added some comments to explain key logic for future debuggers
* Update docs for max_tokens and increase defaul
* Updating variable name to prompt_template and docs.
* Updated test and handled Answer case where no documents are used.
* Slight update to docs.
* Adding more doc strings
* lg updates
* Blackify
---------
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
* fix: update kwargs for TriAdaptiveModel
* fix: squeeze batch for TTR inference
* test: add test for ttr + dataframe case
* test: update and reorganise ttr tests
* refactor: make triadaptive model handle shapes
* refactor: remove duplicate reshaping
* refactor: rename test with duplicate name
* fix: add device assignment back to TTR
* fix: remove duplicated vars in test
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Removed double batching around embed_queries
* Add back tests for retrieve_batch for dpr and embedding retrievers
* Updated table-text-retriever to not double batch
* Fixing pylint
* Update to test
* Remove code breaking test
* Updating dev comment to be clearer
* Update allowed models to be used with Prompt Node
* Added try except block around the config to skip over OpenAI models.
* Fixing tests
* Adding warning message
* Adding test for different HF models that could be used in prompt node
* fix: Add a verbose option to PromptNode to let users understand the prompts being used #2
* Add comments and refactoring todo note
* Fix logging-fstring-interpolation pylint
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* feat: add start and eng page to PDF converters
* docs: add missing docstrings
* refactor: change list set up, add docstrings and comment
* fix: add missing parameter
* tests: add page range basic test
* tests: test correct page numbers
* tests: remove OCR page range test
*Poppler and Tesseract not installed on CI
* fix: remove mobile change error
* refactor: use weaviate client to build BM25 query
* refactor: remove manual BM25 query building
* refactor: apply BM25 to the content_field only
* test: update weaviate BM25 retrieval test case
update to account for lack of stemming
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
* fix to removing code blocks in markdown converter
* adding a test
* fixing a test
* improving tests
* adding language to code block
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
* fix crawler and try to run CI
* more compact expression
* try to fix
* improve naming regex
* revert regex
* make test_url compatible wirh Windows
* better conditional expression
* Adding model.eval() calls to prediction functions in table reader
* Add unit test to check if model is set in train mode that inference time prediction still works.
* Add table = table.astype(str) to make sure cells are converted into to strings to be compatible witht the TableReader
* Turn more strings into ints
* Make sure answer text is always a string.
* Started making changes to use native Pytorch AMP
* Updated compute_loss functions to use torch.cuda.amp.autocast
* Updating docstrings
* Add use_amp to trainer_checkpoint
* Removed mentions of apex and started to add the necessary warnings
* Removing unused instances of use_amp variable
* Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train
* Make max_query_length optional in FARMReader.train
* Update lg
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
* Refactor table reader to use util functions to reduce code duplication.
* Expanding the tests for the table reader
* Adding types
* Updating tests to work for RCIReader
* Fix bug in RCIReader. Saving the wrong queries list.
* Update _flatten_inputs to not change input variable
* Remove duplicate code
* Fixing broken BM25 support with Weaviate - fixes#3720
Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit.
Please see more under issue #3720.
* Fixing mypy issue - method signature wasn't matching the base class
* Mypy related test fix
Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` parame is `Optional`, but has NO default value, so it must be provided (as None) at runtime.
I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests.
* Adding a note regarding an upcomming fix in Weaviate v1.17.0
* Apply suggestions from code review
* revert
* [EMPTY] Re-trigger CI
* first draft to add index param to tfidf
* better mypy handling
* Revert "better mypy handling"
This reverts commit 91a22516320f9dcbeae53827ec69f9dc51e1785c.
* new check in auto_fit
* new check also in retrieve
* better dict typings
* new test and improvements to other test
* remove unnecessary lambda
* improve test
* remove newline from openapi json
* fix test
* language fix
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 2
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 3
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 4
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 5
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 6
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* explicit index value handling
* fix test
* better error messages
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>