* changing default model to gpt-4o-mini
* adding release notes
* fixing some missed tests
* fixing some more missed tests
* fixing one last missed test
* fixing linting issues
* making pylint happy about an end2end test
* changing if test to walrus operator
* fixing azure embedder from ada to text-embedding-ada-002
* initial import
* wip
* cleaning up tests
* fixing tests
* adding context relevance
* reverting some wrong changes due to PyCharm error in refactoring
* building eval pipeline only once
* handling mypy issues
* rename model parameter and internal model attribute in ExtractiveReader
* fix tests for ExtractiveReader
* fix e2e
* reno
* another fix
* review feedback
* Update releasenotes/notes/rename-model-param-reader-b8cbb0d638e3b8c2.yaml
* rename model parameter in the openai doc embedder
* fix tests for openai doc embedder
* rename model parameter in the openai text embedder
* fix tests for openai text embedder
* rename model parameter in the st doc embedder
* fix tests for st doc embedder
* rename model parameter in the st backend
* fix tests for st backend
* rename model parameter in the st text embedder
* fix tests for st text embedder
* fix docstring
* fix pipeline utils
* fix e2e
* reno
* fix the indexing pipeline _create_embedder function
* fix e2e eval rag pipeline
* pytest
* feat: Add `NamedEntityExtractor` component
This component accepts a list of `Document`s which it annotates with named entities. The annotations are stored in the `meta` dictionary of each `Document` under a specific key.
The component currently supports two backends for the annotation models: Hugging Face `transformers` and spaCy.
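A minimal sketch of the annotation flow described above, assuming a simplified `Document` class and a toy regex backend in place of the real `transformers` or spaCy models. The names `META_KEY`, `toy_backend`, and `annotate` are hypothetical and are not the component's actual API:

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical key name; the real component stores annotations under its own key.
META_KEY = "named_entities"


@dataclass
class Document:
    """Simplified stand-in for a document with free-form metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def toy_backend(text: str) -> List[Dict[str, Any]]:
    """Toy backend: treats every capitalized word as an entity.

    A real backend would call a Hugging Face or spaCy model instead.
    """
    return [
        {"entity": "MISC", "start": m.start(), "end": m.end()}
        for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
    ]


def annotate(
    documents: List[Document],
    backend: Callable[[str], List[Dict[str, Any]]] = toy_backend,
) -> List[Document]:
    """Annotate each document in place, storing the entities in its `meta`."""
    for doc in documents:
        doc.meta[META_KEY] = backend(doc.content)
    return documents
```

Keeping the backend as a plain callable is one way to let the two model families share the same annotation loop.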
* Address comments
* Expand release note
* Add the `[torch]` extra package specifier to the lazy import
* Remove dead code
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* replace metadata w meta in tests/examples
* do not touch already broken e2e tests
* Revert "do not touch already broken e2e tests"
This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.
* Add Pipeline.arun()
* Sleeper node
* Fix async running
* Add e2e tests
To run a Pipeline that doesn't have any async node in async mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_query_and_indexing_pipeline
To run a Pipeline that has a single async node in concurrent mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_async_concurrent_complex_pipeline
To run a Pipeline that has a single async node in sequential mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_async_sequential_complex_pipeline
* Remove unused _adispatch_run method
* Make Pipeline.run work with async nodes
* Revert "Make Pipeline.run work with async nodes"
This reverts commit 22d7a94e4d41aca1b59dad18c0b366fbb6e8f431.
* Rename Pipeline.arun to Pipeline._arun
* Enhance docstring
* Add Sleeper docstring
* Add release notes
* ignore typing across the node
* make pylint happy
* skip pylint on needed unused import
* fix
* if a node has an arun method, use it
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
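The "if a node has an arun method, use it" rule from the commits above can be sketched as follows. This is an illustration under assumptions, not the actual `Pipeline._arun` code: `SyncNode`, `SleeperNode`, `run_node`, and `run_all` are hypothetical names, and nodes are reduced to a single-value `run`/`arun` interface:

```python
import asyncio
from typing import Any, List


class SyncNode:
    """Hypothetical node exposing only a synchronous `run`."""

    def run(self, value: int) -> int:
        return value + 1


class SleeperNode:
    """Hypothetical node exposing both `run` and an async `arun`."""

    def run(self, value: int) -> int:
        return value * 2

    async def arun(self, value: int) -> int:
        await asyncio.sleep(0.01)  # simulate non-blocking I/O
        return value * 2


async def run_node(node: Any, value: int) -> int:
    """Prefer the node's `arun` coroutine when present, else fall back to `run`."""
    arun = getattr(node, "arun", None)
    if arun is not None and asyncio.iscoroutinefunction(arun):
        return await arun(value)
    return node.run(value)


async def run_all(nodes: List[Any], value: int) -> int:
    """Run the nodes sequentially, feeding each output into the next node."""
    for node in nodes:
        value = await run_node(node, value)
    return value
```

For example, `asyncio.run(run_all([SyncNode(), SleeperNode()], 1))` passes the value through the synchronous node first and then awaits the async one.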
* enable pipeline folder in e2e
* merge standard pipeline tests with standard pipeline batch tests
* merge summarization tests into standard pipelines tests
* Update test_standard_pipelines.py
* black
* Initial commit, add search_engine
* Add TopPSampler
* Add more TopPSampler unit tests
* Remove SearchEngineSampler (converted to TopPSampler)
* Add some basic WebSearch unit tests
* Rename unit tests
* Add WebRetriever into agent_tools
* Adjust to WebRetriever
* Add WebRetriever mode [snippet|document]
* Minor changes
* SerperDev: add peopleAlsoAsk search results
* First agent for hotpotqa
* Making WebRetriever work on hotpotqa
* refactor: minor WebRetriever improvements (#4377)
* refactor: remove doc ids rebuild + anticipate cache
* refactor: improve caching, fix Document ids
* Minor WebRetriever improvements
* Overlooked minor fixes
* feat: add Bing API as search engine
* refactor: let kwargs pass-through
* feat: increase search context
* check sampler result, improve batch typing
* refactor: increase mypy compliance
* Fix mypy
* Minor example fixes
* Fix the descriptions
* PR feedback updates
* More fixes
* TopPSampler: handle top p None value, add unit test
* Add top_k to WebSearch
* Use boilerpy3 instead of trafilatura
* Remove date finding
* Add more WebRetriever docs
* Refactor long methods
* making the preprocessor optional
* hide WebSearch and make NeuralWebSearch a pipeline
* remove unused imports
* add WebQAPipeline and split example into two
* change example search engine to SerperDev
* Turn off progress bars in WebRetriever's PreProcessor
* Agent tool examples - final updates
* Add webqa test, search results ranking scores
* Better answer box handling for SerperDev and SerpAPI
* Minor fixes
* pylint
* pylint fixes
* extract TopPSampler from WebRetriever
* use sampler only for WebRetriever modes other than snippet
* add web retriever tests
* exclude rdflib@6.3.2 due to license issues
* add test for preprocessed docs and kwargs examples in docstrings
* Move test_webqa_pipeline to test/pipelines
* change docstring for join_documents_and_scores
* Use WebQAPipeline in examples/web_lfqa.py
* Move test_webqa_pipeline to e2e
* Updated lg
* Sampler added automatically in WebQAPipeline, no need to add it
* Updated lg
* Updated lg
* :ignore Update agent tools examples to new templates (#4503)
* Update examples to new templates
* Add print back
* fix linting and black format issues
---------
Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* mock all translator tests and move one to e2e
* typo
* extract pipeline tests using translator
* remove duplicate test
* move generator test in e2e
* Update e2e/pipelines/test_extractive_qa.py
* pytest.mark.unit
* black
* remove model name as well
* remove unused fixture
* rename original and improve pipeline tests
* fixes
* pylint