* Embedding instructions in EmbeddingRetriever
Query and document embeddings are prefixed with instructions, which is useful
for retrievers fine-tuned on specific tasks, such as Q&A.
* Tests
Check each vector's 0th component against a reference value, using different stores.
* Normalizing vectors
* Release notes
* Add max_tokens to BaseGenerator params
* Make mypy happy
* Rebase and resolve conflicts
* Fix signature issues
* Update lg
* Add a mocked unit test method
* end-of-file-fixer corrected file
* Convert to unit test
* Mark test as integration
* make the test a unit test
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* extract elasticsearch
* update pyproject.toml
* make more imports optional
* move MockBaseRetriever to conftest
* install es in the es integration tests
* Start adding support for TableCell
* Update tests to use row and col
* Added schema test to check that to_dict and from_dict work for Table documents. Also updated Doc.__eq__ to work for tables.
* Update eval test to use TableCell
* Added more schema tests for table docs, labels and answers.
* Add boolean to toggle between Span and TableCell
* Add deprecation message
* Test that table answers work as responses in the rest API
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
* Add pytest fixture to block requests in unit tests
* Mark test correctly as integration
* Fix crawler unit test failing because it tries to install chromedriver
* Rework some PromptNode and PromptModel tests
* Remove duplicate code in PromptNode
* Fix mypy
* Fix test failing because of a missing fixture
* Revert "Fix mypy"
This reverts commit e530295a06cb260d9a8bd89679534958cb3d9776.
* Revert "Remove duplicate code in PromptNode"
This reverts commit 4a678ae81504dcc78a737372c061d12dc8799639.
* Initial commit, add search_engine
* Add TopPSampler
* Add more TopPSampler unit tests
* Remove SearchEngineSampler (converted to TopPSampler)
* Add some basic WebSearch unit tests
* Rename unit tests
* Add WebRetriever into agent_tools
* Adjust to WebRetriever
* Add WebRetriever mode [snippet|document]
* Minor changes
* SerperDev: add peopleAlsoAsk search results
* First agent for hotpotqa
* Making WebRetriever work on hotpotqa
* refactor: minor WebRetriever improvements (#4377)
* refactor: remove doc ids rebuild + anticipate cache
* refactor: improve caching, fix Document ids
* Minor WebRetriever improvements
* Overlooked minor fixes
* feat: add Bing API as search engine
* refactor: let kwargs pass-through
* feat: increase search context
* check sampler result, improve batch typing
* refactor: increase mypy compliance
* Fix mypy
* Minor example fixes
* Fix the descriptions
* PR feedback updates
* More fixes
* TopPSampler: handle top p None value, add unit test
* Add top_k to WebSearch
* Use boilerpy3 instead of trafilatura
* Remove date finding
* Add more WebRetriever docs
* Refactor long methods
* making the preprocessor optional
* hide WebSearch and make NeuralWebSearch a pipeline
* remove unused imports
* add WebQAPipeline and split example into two
* change example search engine to SerperDev
* Turn off progress bars in WebRetriever's PreProcessor
* Agent tool examples - final updates
* Add webqa test, search results ranking scores
* Better answer box handling for SerperDev and SerpAPI
* Minor fixes
* pylint
* pylint fixes
* extract TopPSampler from WebRetriever
* use sampler only for WebRetriever modes other than snippet
* add web retriever tests
* exclude rdflib@6.3.2 due to license issues
* add test for preprocessed docs and kwargs examples in docstrings
* Move test_webqa_pipeline to test/pipelines
* change docstring for join_documents_and_scores
* Use WebQAPipeline in examples/web_lfqa.py
* Move test_webqa_pipeline to e2e
* Updated lg
* Sampler added automatically in WebQAPipeline, no need to add it
* Updated lg
* :ignore Update agent tools examples to new templates (#4503)
* Update examples to new templates
* Add print back
* fix linting and black format issues
---------
Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* mock all translator tests and move one to e2e
* typo
* extract pipeline tests using translator
* remove duplicate test
* move generator test to e2e
* Update e2e/pipelines/test_extractive_qa.py
* pytest.mark.unit
* black
* remove model name as well
* remove unused fixture
* rename original and improve pipeline tests
* fixes
* pylint
* initial Agent implementation
* mypy and pylint fixes
* add missing ABC import
* improved prompt template
* refactor and shorten run method
* add tests for extracting
* fix mixed up tool_input/observation & make tests more robust
* fix bug with max_iterations and update prompt template
* allow setting prompt_template in Agent init
* remove example yml for agent
* add final prediction to transcript
* add transcript to errors and accept PromptTemplate in init
* simplify if else to elif
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* add checks for max_iter<2 and empty list returned by prompt node
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* add e2e tests
* move tests to their own module
* add e2e workflow
* pylint
* remove from job
* fix index field name
* skip test on sql
* removed unused code
* fix embedding tests
* adjust test for pinecone
* adjust assertions to the new documents
* bad copypasta
* test
* fix tests
* fix tests
* fix test
* fix tests
* pylint
* update milvus version
* remove debug
* move graphdb tests under e2e