* Add JsonConverter node
* Update language
* JsonConverter: Remove id_hash_keys overwrite when it's None
Also includes docstring changes based on review
* Update docstring for JsonConverter
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
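A minimal sketch of the id_hash_keys handling described in the JsonConverter commits above: the parameter is only forwarded when the caller actually sets it, so a `None` value never overwrites the `Document` default. The file layout and the `Document` signature are assumptions made for illustration.

```python
import json
from typing import List, Optional

from haystack.schema import Document  # signature assumed; adjust to your Haystack version


def convert_json(path: str, id_hash_keys: Optional[List[str]] = None) -> List[Document]:
    """Load a JSON file containing a list of records and turn each record into a Document."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)

    documents = []
    for record in records:
        kwargs = {"content": record["content"], "meta": record.get("meta", {})}
        # Only pass id_hash_keys when it was explicitly provided, so None does not
        # overwrite the Document's own default.
        if id_hash_keys is not None:
            kwargs["id_hash_keys"] = id_hash_keys
        documents.append(Document(**kwargs))
    return documents
```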
* Starting to implement first pass at run_batch
* Started to add _flatten_input function
* First pass at run_batch method.
* Fixed bug
* Adding tests for run_batch
* Update doc strings
* Pylint and mypy
* Pylint
* Fixing mypy
* Restructuring of run_batch tests
* Add minor lg updates
* Adding more tests
* Update dev comments and call static method differently
* Fixed the setting of output variable
* Set output_variable in __init__ of PromptNode
* Make a one-liner
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
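A rough illustration of the input flattening that `run_batch` relies on. `_flatten_input` is the helper named in the commits, but the body below is a simplified stand-in and ignores how the real method maps nested queries back to their documents.

```python
from typing import Any, List, Union


def _flatten_input(inputs: Union[Any, List[Any], List[List[Any]]]) -> List[Any]:
    """Flatten a single item, a flat list, or a list of lists into one flat list."""
    if not isinstance(inputs, list):
        return [inputs]
    flat: List[Any] = []
    for item in inputs:
        flat.extend(item if isinstance(item, list) else [item])
    return flat


# A batch run can then iterate over the flattened queries, e.g.:
# for query in _flatten_input(queries):
#     result = node.run(query=query)
assert _flatten_input([["a", "b"], "c"]) == ["a", "b", "c"]
```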
* added instruction_prompt and update defaults
* Change back max_tokens
* Code formatting
* Starting to update instruction_prompt to be a PromptTemplate
* Using PromptTemplate in OpenAIAnswerGenerator
* Removed hardcoded value
* pylint and make examples and examples_context optional prompt parameters
* Added new test for when prompt length goes past max token limit
* Improve doc strings.
* Make "text-davinci-003" the new default model
* Renaming variable to prompt_template and name to question-answering-with-examples
* Reduced repetitive code.
* Added some comments to explain key logic for future debuggers
* Update docs for max_tokens and increase default
* Updating variable name to prompt_template and docs.
* Updated test and handled Answer case where no documents are used.
* Slight update to docs.
* Adding more doc strings
* lg updates
* Blackify
---------
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
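A hedged sketch of how the reworked OpenAIAnswerGenerator might be configured with the `question-answering-with-examples` PromptTemplate. Parameter names are taken from the commit messages; exact signatures and the PromptTemplate placeholder syntax vary between Haystack releases.

```python
from haystack.nodes import OpenAIAnswerGenerator, PromptTemplate

# Template text is illustrative; the placeholder syntax differs between releases.
qa_template = PromptTemplate(
    name="question-answering-with-examples",
    prompt_text="Please answer the question using the provided examples and context.",
)

generator = OpenAIAnswerGenerator(
    api_key="YOUR_OPENAI_API_KEY",   # placeholder
    model="text-davinci-003",        # the new default model
    prompt_template=qa_template,
    max_tokens=50,
    examples=None,                   # optional since this change
    examples_context=None,           # optional since this change
)
```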
* fix: update kwargs for TriAdaptiveModel
* fix: squeeze batch for TTR inference
* test: add test for ttr + dataframe case
* test: update and reorganise ttr tests
* refactor: make triadaptive model handle shapes
* refactor: remove duplicate reshaping
* refactor: rename test with duplicate name
* fix: add device assignment back to TTR
* fix: remove duplicated vars in test
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
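The shape handling mentioned above, reduced to a minimal PyTorch sketch: a spurious leading batch dimension is squeezed away before the forward pass. Function and tensor names are illustrative, not the TriAdaptiveModel's actual code.

```python
import torch


def normalize_table_inputs(input_ids: torch.Tensor) -> torch.Tensor:
    """Drop an extra leading batch dimension of size 1, e.g. (1, batch, seq_len) -> (batch, seq_len)."""
    if input_ids.dim() == 3 and input_ids.size(0) == 1:
        input_ids = input_ids.squeeze(0)
    return input_ids


example = torch.randint(0, 1000, (1, 4, 128))        # accidental extra dimension
assert normalize_table_inputs(example).shape == (4, 128)
```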
* Removed double batching around embed_queries
* Add back tests for retrieve_batch for dpr and embedding retrievers
* Updated table-text-retriever to not double batch
* Fixing pylint
* Update to test
* Remove code breaking test
* Updating dev comment to be clearer
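The double-batching fix boils down to chunking the queries once and letting `embed_queries` handle each chunk. A simplified sketch, with the retriever interface assumed:

```python
from typing import List


def embed_in_batches(retriever, queries: List[str], batch_size: int = 32) -> List:
    """Batch the queries exactly once instead of wrapping an already-batched call
    in a second batching loop."""
    embeddings: List = []
    for start in range(0, len(queries), batch_size):
        chunk = queries[start : start + batch_size]
        embeddings.extend(retriever.embed_queries(chunk))
    return embeddings
```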
* Update allowed models to be used with Prompt Node
* Added try except block around the config to skip over OpenAI models.
* Fixing tests
* Adding warning message
* Adding test for different HF models that could be used in prompt node
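A sketch of the model check described above: the config lookup is wrapped in try/except so OpenAI model names, which have no Hugging Face config, are skipped with a warning instead of failing. The allow-list and function name are illustrative.

```python
import logging

from transformers import AutoConfig

logger = logging.getLogger(__name__)

SUPPORTED_CONFIG_CLASSES = ("T5Config", "MT5Config")  # illustrative allow-list


def is_supported_hf_model(model_name_or_path: str) -> bool:
    try:
        config = AutoConfig.from_pretrained(model_name_or_path)
    except Exception:  # the exact exception type depends on the transformers version
        logger.warning(
            "Could not fetch a Hugging Face config for %s; skipping the architecture check.",
            model_name_or_path,
        )
        return True
    return config.__class__.__name__ in SUPPORTED_CONFIG_CLASSES
```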
* fix: Add a verbose option to PromptNode to let users understand the prompts being used #2
* Add comments and refactoring todo note
* Fix logging-fstring-interpolation pylint
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
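The verbose option can be pictured as logging the fully rendered prompt before it goes to the model. A toy sketch, not the actual PromptNode code (note the lazy `%s` formatting that keeps pylint's logging-fstring-interpolation check quiet):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def render_prompt(template: str, verbose: bool = False, **variables) -> str:
    """Fill the template and, when verbose is enabled, surface the final prompt."""
    prompt = template.format(**variables)
    if verbose:
        logger.info("Prompt being sent to the model: %s", prompt)
    return prompt


render_prompt("Answer the question: {query}", verbose=True, query="What is Haystack?")
```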
* feat: add start and end page to PDF converters
* docs: add missing docstrings
* refactor: change list set up, add docstrings and comment
* fix: add missing parameter
* tests: add page range basic test
* tests: test correct page numbers
* tests: remove OCR page range test
Poppler and Tesseract are not installed on CI
* fix: remove mobile change error
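One way to picture the start/end page support: keep only the pages inside a 1-based range, replacing the skipped ones with empty strings so page numbering stays intact. The helper below is illustrative, not the converters' actual implementation.

```python
from typing import List, Optional


def select_page_range(pages: List[str], start_page: Optional[int] = None,
                      end_page: Optional[int] = None) -> List[str]:
    """Blank out pages outside the 1-based [start_page, end_page] range."""
    start = (start_page or 1) - 1
    end = end_page or len(pages)
    return [page if start <= i < end else "" for i, page in enumerate(pages)]


pages = ["page 1", "page 2", "page 3", "page 4"]
assert select_page_range(pages, start_page=2, end_page=3) == ["", "page 2", "page 3", ""]
```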
* refactor: use weaviate client to build BM25 query
* refactor: remove manual BM25 query building
* refactor: apply BM25 to the content_field only
* test: update weaviate BM25 retrieval test case
update to account for lack of stemming
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
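A hedged example of delegating BM25 query building to the Weaviate client (v3 query builder assumed) and restricting it to the content field. Class name, properties, and URL are placeholders.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

result = (
    client.query
    .get("Document", ["content", "name"])
    .with_bm25(query="what is haystack", properties=["content"])  # BM25 on the content field only
    .with_limit(10)
    .do()
)
```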
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
* fix removal of code blocks in the markdown converter
* adding a test
* fixing a test
* improving tests
* adding language to code block
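A compact sketch of the two markdown changes: pull YAML frontmatter into the metadata with the python-frontmatter package, and strip fenced code blocks (with or without a language tag) via a regex. The pattern and sample text are illustrative.

```python
import re

import frontmatter  # the python-frontmatter package

# Matches fenced code blocks, with or without a language tag after the opening fence.
CODE_BLOCK_PATTERN = re.compile(r"```[\w-]*\n.*?```", re.DOTALL)

markdown_text = (
    "---\ntitle: Example\n---\n"
    "Some text.\n\n"
    "```python\nprint('removed')\n```\n\n"
    "More text."
)

post = frontmatter.loads(markdown_text)            # splits frontmatter from the body
meta = dict(post.metadata)                         # {'title': 'Example'}
cleaned_text = CODE_BLOCK_PATTERN.sub("", post.content)
```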
* fix crawler and try to run CI
* more compact expression
* try to fix
* improve naming regex
* revert regex
* make test_url compatible with Windows
* better conditional expression
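The crawler's file naming can be illustrated as a URL-to-filename conversion that avoids characters Windows forbids. This helper is hypothetical and only sketches the idea behind the naming-regex and Windows-compatibility commits.

```python
import re


def url_to_file_name(url: str) -> str:
    """Strip the scheme and replace characters that are invalid in Windows file names."""
    name = re.sub(r"^https?://", "", url)
    name = re.sub(r'[<>:"/\\|?*]', "_", name)
    return name or "index"


assert url_to_file_name("https://haystack.deepset.ai/docs/intro") == "haystack.deepset.ai_docs_intro"
```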
* Adding model.eval() calls to prediction functions in table reader
* Add unit test to check if model is set in train mode that inference time prediction still works.
* Add table = table.astype(str) to make sure cells are converted into strings to be compatible with the TableReader
* Turn more strings into ints
* Make sure answer text is always a string.
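The two TableReader fixes in miniature: cast every cell to a string before prediction, and call `model.eval()` so inference still works even if the model was left in train mode. The prediction wrapper is illustrative.

```python
import pandas as pd
import torch

# Cells must be strings for the TableReader, so numeric columns are cast explicitly.
table = pd.DataFrame({"Team": ["A", "B"], "Wins": [10, 7]})
table = table.astype(str)


def predict(model, inputs: dict):
    """eval() disables dropout and similar layers; no_grad() skips gradient tracking."""
    model.eval()
    with torch.no_grad():
        return model(**inputs)
```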
* Started making changes to use native PyTorch AMP
* Updated compute_loss functions to use torch.cuda.amp.autocast
* Updating docstrings
* Add use_amp to trainer_checkpoint
* Removed mentions of apex and started to add the necessary warnings
* Removing unused instances of use_amp variable
* Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train
* Make max_query_length optional in FARMReader.train
* Update lg
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
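The move from apex to native PyTorch AMP amounts to wrapping the forward pass in `torch.cuda.amp.autocast` and scaling the loss with `GradScaler`. A minimal training-loop sketch, with model, optimizer, and data loader supplied by the caller:

```python
import torch


def train_one_epoch(model, optimizer, data_loader, use_amp: bool = True):
    """Native-AMP training step: autocast runs the forward pass in mixed precision,
    GradScaler rescales the loss to avoid fp16 gradient underflow."""
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    for inputs, labels in data_loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = model(inputs, labels)   # stand-in for the compute_loss functions mentioned above
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```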