* Add step to loook up tokenizers by prefix in openai_utils
* Updated tiktoken min version + openai_utils test
* Added test case for GPT-4 and Azure model naming
* Broken down tests
* Added default case
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
* add conversion script
* run job in CI
* typo
* invoke python
* install toml
* fix pylint error
* more exclusions
* add toml to dev dependencies
* fix exclusions list
* fix mypy and remove test clause
* Upgrade to transformers 4.28.1
* Commenting out failing piece of test
* trailing-whitespace
* Adjust regex for error match - it changed between releases
* Remove RAG tests failing with transformers update
* extract elasticsearch
* update pyproject.toml
* make more import optional
* move MockBaseRetriever in conftest
* install es in the es integration tests
* clean up the ES instance in a more robust way
* do not sleep, refresh the index instead
* remove client warnings
* fix unit tests
* fix opensearch compatibility
* fix unit tests
* update ES version
* bump elasticsearch-py
* adjust docs
* use recreate_index param
* use same fixture strategy for Opensearch
* Update lg
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
* isolate file-conversion deps
* pylint
* add to all extra
* chain was missing
* move langdetect into preprocessing and fix tika
* add file-conversion extra
* Initial commit, add search_engine
* Add TopPSampler
* Add more TopPSampler unit tests
* Remove SearchEngineSampler (converted to TopPSampler)
* Add some basic WebSearch unit tests
* Rename unit tests
* Add WebRetriever into agent_tools
* Adjust to WebRetriever
* Add WebRetriever mode [snippet|document]
* Minor changes
* SerperDev: add peopleAlsoAsk search results
* First agent for hotpotqa
* Making WebRetriever work on hotpotqa
* refactor: minor WebRetriever improvements (#4377)
* refactor: remove doc ids rebuild + antecipate cache
* refactor: improve caching, fix Document ids
* Minor WebRetriever improvements
* Overlooked minor fixes
* feat: add Bing API as search engine
* refactor: let kwargs pass-through
* feat: increase search context
* check sampler result, improve batch typing
* refactor: increase mypy compliance
* Initial commit, add search_engine
* Add TopPSampler
* Add more TopPSampler unit tests
* Remove SearchEngineSampler (converted to TopPSampler)
* Add some basic WebSearch unit tests
* Rename unit tests
* Add WebRetriever into agent_tools
* Adjust to WebRetriever
* Add WebRetriever mode [snippet|document]
* Minor changes
* SerperDev: add peopleAlsoAsk search results
* First agent for hotpotqa
* Making WebRetriever work on hotpotqa
* refactor: minor WebRetriever improvements (#4377)
* refactor: remove doc ids rebuild + antecipate cache
* refactor: improve caching, fix Document ids
* Minor WebRetriever improvements
* Overlooked minor fixes
* feat: add Bing API as search engine
* refactor: let kwargs pass-through
* feat: increase search context
* check sampler result, improve batch typing
* refactor: increase mypy compliance
* Fix mypy
* Minor example fixes
* Fix the descriptions
* PR feedback updates
* More fixes
* TopPSampler: handle top p None value, add unit test
* Add top_k to WebSearch
* Use boilerpy3 instead trafilatura
* Remove date finding
* Add more WebRetriever docs
* Refactor long methods
* making the preprocessor optional
* hide WebSearch and make NeuralWebSearch a pipeline
* remove unused imports
* add WebQAPipeline and split example into two
* change example search engine to SerperDev
* Turn off progress bars in WebRetriever's PreProcesssor
* Agent tool examples - final updates
* Add webqa test, search results ranking scores
* Better answer box handling for SerperDev and SerpAPI
* Minor fixes
* pylint
* pylint fixes
* extract TopPSampler from WebRetriever
* use sampler only for WebRetriever modes other than snippet
* add web retriever tests
* add web retriever tests
* exclude rdflib@6.3.2 due to license issues
* add test for preprocessed docs and kwargs examples in docstrings
* Move test_webqa_pipeline to test/pipelines
* change docstring for join_documents_and_scores
* Use WebQAPipeline in examples/web_lfqa.py
* Use WebQAPipeline in examples/web_lfqa.py
* Move test_webqa_pipeline to e2e
* Updated lg
* Sampler added automatically in WebQAPipeline, no need to add it
* Updated lg
* Updated lg
* :ignore Update agent tools examples to new templates (#4503)
* Update examples to new templates
* Add print back
* fix linting and black format issues
---------
Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* add import for canals
* add stores support to canals
* pyproject.toml
* move tests
* add v2 to the extras in ci
* install v2 in action
* pylint
* save and load
* save and load
* codename "Alfalfa"
* workflows
* Refactoring to remove duplicate code when using OpenAI API
* Adding docstrings
* Fix mypy issue
* Moved retry mechanism to openai_request function in openai_utils
* Migrate OpenAI embedding encoder to use the openai_request util function.
* Adding docstrings.
* pylint import errors
* More pylint import errors
* Move construction of headers into openai_request and api_key as input variable.
* Made _openai_text_completion_tokenization_details so can be resued in PromptNode and OpenAIAnswerGenerator
* Add prompt truncation to the PromptNode.
* Removed commented out test.
* Bump version of tiktoken to 0.2.0 so we can use MODEL_TO_ENCODING to automatically determine correct tokenizer for the requested model
* Change one method back to public
* Fixed bug in token length truncation. Included answer length into truncation amount. Moved truncation higher up to PromptNode level.
* Pylint error
* Improved warning message
* Added _ensure_token_limit for HFLocalInvocationLayer. Had to remove max_length from base PromptModelInvocationLayer to ensure that max_length has a default value.
* Adding tests
* Expanded on doc strings
* Updated tests
* Update docstrings
* Update tests, and go back to how USE_TIKTOKEN was used before.
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/retriever/_openai_encoder.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/utils/openai_utils.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/utils/openai_utils.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Updated docstrings, and added integration marks
* Remove comment
* Update test
* Fix test
* Update test
* Updated openai_request function to work with the azure api
* Fixed error in _openai_encodery.py
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Add IVF and Product Quantization support for OpenSearchDocumentStore
* Remove unused import statement
* Fix mypy
* Adapt doc strings and error messages to account for PQ
* Adapt validation of indices
* Adapt existing tests
* Fix pylint
* Add tests
* Update lg
* Adapt based on PR review comments
* Fix Pylint
* Adapt based on PR review
* Add request_timeout
* Adapt based on PR review
* Adapt based on PR review
* Adapt tests
* Pin tenacity
* Unpin tenacity
* Adapt based on PR comments
* Add match to tests
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
* Adding the ability to call the Ray pipeline from concurrent apps with async
This is to fix#2968
* Fixes: mype + pylint (`invalid-overridden-method`)
* Simplifying - no real need for an `AsyncRayPipeline` anymore
* Moving the new `run_async` method to the `RayPipeline`
* Cleanup
* [EMPTY] Re-trigger CI
* feat: add HA support for Weaviate
Adding the `replicationConfig => factor` parameter to the Weaviate class at the time of class creation, allowing the user to have Haystack create a Weaviate "Class" with a replication factor set above 1.
This enables the use of Weaviate in a HA (High Availability) fashion, where the created class is stored on multiple Weaviate nodes increasing Weaviate's throughput and also ensuring high availability.
* Trying out a recommendation from @masci to fix the CI issue
* enable logging-fstring-interpolation
* remove logging-fstring-interpolation from exclusion list
* remove implicit string interpolations added by black
* remove from rest_api too
* fix % sign