880 Commits

Author SHA1 Message Date
Vladimir Blagojevic
1dd6158244
fix: Add model_max_length model_kwargs parameter to HF PromptNode (#4651) 2023-04-14 15:40:42 +02:00
ZanSara
174d80ab41
skip tests (#4654) 2023-04-13 17:56:51 +02:00
Vladimir Blagojevic
e30bc8fe5a
feat: Add GenerationConfig option to PromptNode's HuggingFace invocation layer (#4649) 2023-04-13 12:15:00 +02:00
ZanSara
f2106ab37b
feat: initial implementation of MemoryDocumentStore for new Pipelines (#4447)
* add stub implementation

* reimplementation

* test files

* docstore tests

* tests for document

* better testing

* remove mmh3

* readme

* only store, no retrieval yet

* linting

* review feedback

* initial filters implementation

* working on filters

* linters

* filtering works and is isolated by document store

* simplify filters

* comments

* improve filters matching code

* review feedback

* pylint

* move logic into _create_id

* mypy
2023-04-13 09:36:23 +02:00
ZanSara
ba11d1c2a8
refactor!: extract evaluation and statistical dependencies (#4457)
* try-catch sklearn and scipy

* haystack imports

* linting

* mypy

* try to import baseretriever

* remove typing

* unused import

* remove more typing

* pylint

* isolate sql imports for postgres, which we don't use anyway

* remove stats

* replace expit

* als inmemory

* mypy

* feedback

* docker

* expit

* re-add njit
2023-04-12 15:38:56 +02:00
Fernando Pereira
5d41e60d89
fix: ParsrConverter list element added (#4562)
* fix: list element and mapping logic around it added to ParsrConverter convert step + unit test covering the specific mapping of list content from Parsr's to Haystack's

* Code review changes

* changed the samples path after conftest changes

* added samples_path to function arg

---------

Co-authored-by: Namoush <fmpereira22@gmail.com>
Co-authored-by: Fernando Pereira <fernando.pereira@criticalsoftware.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-04-12 18:38:21 +05:30
Silvano Cerza
5baf2f5930
refactor: Rework invocation layers (#4615)
* Move invocation layers into separate package

* Fix circular imports

* Fix import
2023-04-11 11:04:29 +02:00
Ben Heckmann
2d65742443
feat: arbitrary crawler_depth for Crawler class (#4623)
* #3674 implemented iterative crawler depth

* #3674 added two tests for increased crawler depth

* removed old comment
2023-04-11 10:39:17 +02:00
Silvano Cerza
5547e85bd5
feat: Add util method to make HTTP requests with configurable retry (#4627)
* Add util method to make HTTP requests with configurable retry

* Fix pylint

* Remove unnecessary optional parameter
2023-04-11 10:35:39 +02:00
Silvano Cerza
5ac3dffbef
test: Rework conftest (#4614)
* Split root conftest into multiple ones and remove unused fixtures

* Remove some constants and make them fixtures

* Remove unnecessary fixture scoping

* Fix failing whisper tests

* Fix image_file_paths fixture
2023-04-11 10:33:43 +02:00
Silvano Cerza
e85dc79eaa
test: Add pytest fixture to block requests in unit tests (#4433)
* Add pytest fixture to block requests in unit tests

* Mark test correctly as integration

* Fix crawler unit test failing cause it tries to install chromedriver
2023-04-06 18:04:57 +02:00
Silvano Cerza
c3abf73332
refactor: Rework prompt tests (#4600)
* Rework some PromptNode and PromptModel tests

* Remove duplicate code in PromptNode

* Fix mypy

* Fix test cause of missing fixture

* Revert "Fix mypy"

This reverts commit e530295a06cb260d9a8bd89679534958cb3d9776.

* Revert "Remove duplicate code in PromptNode"

This reverts commit 4a678ae81504dcc78a737372c061d12dc8799639.
2023-04-06 14:47:44 +02:00
Vladimir Blagojevic
a8d283cfac
Fix HF stop words (single stop word) (#4584) 2023-04-04 14:45:10 +02:00
Silvano Cerza
1cc4c9c651
refactor: Refactor prompt node (#4580)
* Refactor prompt structure

* Refactor prompt tests structure

* Fix pylint

* Move TestPromptTemplateSyntax to test_prompt_template.py
2023-04-03 11:49:49 +02:00
Silvano Cerza
af02803cce
Skip flaky prompt node integration test (#4572) 2023-04-03 09:49:30 +02:00
Julian Risch
57415ef8ab
test: Remove duplicate test and edit docstring (#4567) 2023-03-31 12:39:18 +02:00
Agnieszka Marzec
815dcdebbd
docs: Update PromptNode API docs (#4549)
* Update docstrings

* adapt test to changed logging message

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-30 14:27:44 +02:00
Stefano Fiorucci
57f87e24a3
refactor: OpenAIAnswerGenerator - avoid tokenizing all documents several times (#4504) 2023-03-29 22:38:27 +02:00
Zoltan Fedor
32091d66cb
Adding filtering support for Weaviate when used for BM25 querying (#4385) 2023-03-29 16:51:22 +02:00
Vladimir Blagojevic
7c9f719496
refactor: Adjust WhisperTranscriber to pipeline run methods (#4510)
* Retrofit WhisperTranscriber run methods
* Add pipeline unit test
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-03-28 13:52:21 +02:00
Silvano Cerza
cfb8dfd470
Fix pipeline config and agent tools hashing for telemetry (#4508) 2023-03-28 09:41:50 +02:00
bogdankostic
ed1837c0c9
feat: Deduplicate duplicate Answers resulting from overlapping Documents in FARMReader (#4470)
* Deduplicate answers resulting from document split overlap

* Add tests

* Fix Pylint

* Adapt existing test

* Incorporate PR feedback
2023-03-27 20:04:59 +02:00
Vladimir Blagojevic
be25655663
feat: Add agent tools (#4437)
* Initial commit, add search_engine

* Add TopPSampler

* Add more TopPSampler unit tests

* Remove SearchEngineSampler (converted to TopPSampler)

* Add some basic WebSearch unit tests

* Rename unit tests

* Add WebRetriever into agent_tools

* Adjust to WebRetriever

* Add WebRetriever mode [snippet|document]

* Minor changes

* SerperDev: add peopleAlsoAsk search results

* First agent for hotpotqa

* Making WebRetriever work on hotpotqa

* refactor: minor WebRetriever improvements (#4377)

* refactor: remove doc ids rebuild + anticipate cache

* refactor: improve caching, fix Document ids

* Minor WebRetriever improvements

* Overlooked minor fixes

* feat: add Bing API as search engine

* refactor: let kwargs pass-through

* feat: increase search context

* check sampler result, improve batch typing

* refactor: increase mypy compliance

* Fix mypy

* Minor example fixes

* Fix the descriptions

* PR feedback updates

* More fixes

* TopPSampler: handle top p None value, add unit test

* Add top_k to WebSearch

* Use boilerpy3 instead of trafilatura

* Remove date finding

* Add more WebRetriever docs

* Refactor long methods

* making the preprocessor optional

* hide WebSearch and make NeuralWebSearch a pipeline

* remove unused imports

* add WebQAPipeline and split example into two

* change example search engine to SerperDev

* Turn off progress bars in WebRetriever's PreProcessor

* Agent tool examples - final updates

* Add webqa test, search results ranking scores

* Better answer box handling for SerperDev and SerpAPI

* Minor fixes

* pylint

* pylint fixes

* extract TopPSampler from WebRetriever

* use sampler only for WebRetriever modes other than snippet

* add web retriever tests

* add web retriever tests

* exclude rdflib@6.3.2 due to license issues

* add test for preprocessed docs and kwargs examples in docstrings

* Move test_webqa_pipeline to test/pipelines

* change docstring for join_documents_and_scores

* Use WebQAPipeline in examples/web_lfqa.py

* Use WebQAPipeline in examples/web_lfqa.py

* Move test_webqa_pipeline to e2e

* Updated lg

* Sampler added automatically in WebQAPipeline, no need to add it

* Updated lg

* Updated lg

* :ignore Update agent tools examples to new templates (#4503)

* Update examples to new templates

* Add print back

* fix linting and black format issues

---------

Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-27 18:14:58 +02:00
tstadel
4f90e59796
feat: expose prompts to Answer and EvaluationResult (#4341)
* store prompt in Answer

* store prompt in eval csv

* fix tests

* chore: fix context offset loading

* add tests

* add test from PR #4476

* fix tests after merge
2023-03-27 17:54:20 +02:00
ZanSara
6d578ebf3d
refactor: remove telemetry v1 (#4496)
* remove telemetry v1

* more pipeline methods to take out

* send_event_2

* mypy

* pylint

* mypy

* mypy again

* remove test
2023-03-27 17:38:43 +02:00
Silvano Cerza
3b5223fa1c
refactor: Mark MilvusDocumentStore as deprecated (#4498)
* Mark MilvusDocumentStore as deprecated

* Fix mypy
2023-03-27 15:31:48 +02:00
Silvano Cerza
5b63c2086e
refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever (#4500)
* Deprecate BaseKnowledgeGraph and InMemoryKnowledgeGraph

* Deprecate GraphDBKnowledgeGraph

* Fix mypy

* Deprecate Text2SparqlRetriever
2023-03-27 15:31:22 +02:00
tstadel
f8bb270d62
feat: prompt at query time (#4454)
* use outputshapers in prompttemplate

* fix pylint

* first iteration on regex

* implement new promptnode syntax based on f-strings

* finish fstring implementation

* add additional tests

* add security tests

* fix mypy

* fix pylint

* fix test_prompt_templates

* fix test_prompt_template_repr

* fix test_prompt_node_with_custom_invocation_layer

* fix test_invalid_template

* more security tests

* fix test_complex_pipeline_with_all_features

* fix agent tests

* refactor get_prompt_template

* fix test_prompt_template_syntax_parser

* fix test_complex_pipeline_with_all_features

* allow functions in comprehensions

* break out of fstring test

* fix additional tests

* mark new tests as unit tests

* fix agents tests

* convert missing templates

* proper use of get_prompt_template

* refactor and add docstrings

* fix tests

* fix pylint

* fix agents test

* fix tests

* refactor globals

* make allowed functions configurable via env variable

* better dummy variable

* fix special alias

* don't replace special char variables

* more special chars, better docstrings

* cherrypick fix audio tests

* fix test

* rework shapers

* fix pylint

* fix tests

* add new templates

* add reference parsing

* add more shaper tests

* add tests for join and to_string

* fix pylint

* fix pylint

* fix pylint for real

* auto fill shaper function params

* fix reference parsing for multiple references

* fix output variable inference

* consolidate qa prompt template output and make shaper work per-document

* implement prompt at query time

* support serialized PromptTemplates

* fix tests

* add tests for prompt template at query time

* fix types after merge

* fix types after merge

* improve test

* add test for nested shaper syntax in pipelines

* better docstrings

* Correct copilot errors

* found another copilot error

* Another one

* introduce output_parser

* introduce output_parser

* Fix tests for output_parser update

* fix black

* fix tests

* fix tests

* fix tests

* better docstring

* better docstring

* fix test

* fix mypy

* rename RegexAnswerParser to AnswerParser

* rename RegexAnswerParser to AnswerParser

* better docstrings

* better docstrings

* fix docstring example
2023-03-27 14:10:20 +02:00
tstadel
382ca8094e
feat: PromptTemplate extensions (#4378)
* use outputshapers in prompttemplate

* fix pylint

* first iteration on regex

* implement new promptnode syntax based on f-strings

* finish fstring implementation

* add additional tests

* add security tests

* fix mypy

* fix pylint

* fix test_prompt_templates

* fix test_prompt_template_repr

* fix test_prompt_node_with_custom_invocation_layer

* fix test_invalid_template

* more security tests

* fix test_complex_pipeline_with_all_features

* fix agent tests

* refactor get_prompt_template

* fix test_prompt_template_syntax_parser

* fix test_complex_pipeline_with_all_features

* allow functions in comprehensions

* break out of fstring test

* fix additional tests

* mark new tests as unit tests

* fix agents tests

* convert missing templates

* proper use of get_prompt_template

* refactor and add docstrings

* fix tests

* fix pylint

* fix agents test

* fix tests

* refactor globals

* make allowed functions configurable via env variable

* better dummy variable

* fix special alias

* don't replace special char variables

* more special chars, better docstrings

* cherrypick fix audio tests

* fix test

* rework shapers

* fix pylint

* fix tests

* add new templates

* add reference parsing

* add more shaper tests

* add tests for join and to_string

* fix pylint

* fix pylint

* fix pylint for real

* auto fill shaper function params

* fix reference parsing for multiple references

* fix output variable inference

* consolidate qa prompt template output and make shaper work per-document

* fix types after merge

* introduce output_parser

* fix tests

* better docstring

* rename RegexAnswerParser to AnswerParser

* better docstrings
2023-03-27 12:14:11 +02:00
recrudesce
2a2226d63e
fix: Fix debug on PromptNode (#4483)
* Fix debug on PromptNode

Allow the ability to control debug output on PromptNode

* added tests, simplified code

---------

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-24 19:37:52 +05:30
Silvano Cerza
b70715a74d
Remove retry_with_exponential_backoff in favor of tenacity (#4460) 2023-03-24 11:14:11 +01:00
Vladimir Blagojevic
7bb6499c29
feat: Enable PromptNode to use text-generation models (#4349) 2023-03-22 07:20:36 +01:00
Vladimir Blagojevic
3272e2b9fe
refactor: Add AgentStep (#4431) 2023-03-17 18:21:14 +01:00
Silvano Cerza
d55bac189c
Make version semver compliant (#4456) 2023-03-17 14:21:36 +01:00
Vladimir Blagojevic
53528c96a0
feat: Add ChatGPT PromptNode layer (#4357)
* Initial ChatGPTInvocationLayer
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
2023-03-17 14:16:41 +01:00
Sebastian
f04b2f3cee
Update test to reflect change in max token length (#4451) 2023-03-17 09:43:23 +01:00
ju-gu
a3409c7da6
fix: issue evaluation check for content type (#4181)
* fix: issue evaluation check for content type

Evaluation currently breaks when the content type is not a str.

* add black

* add test table eval

* add black formatting

* Expand integration test

---------

Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-03-16 17:36:53 +01:00
Silvano Cerza
1b5df55dbb
Skip flaky test (#4444) 2023-03-16 16:32:28 +01:00
Silvano Cerza
9802fb159a
Remove unnecessary imports in conftest.py (#4434) 2023-03-16 10:02:01 +01:00
Silvano Cerza
3591fc02e1
Mark Crawler tests correctly (#4435) 2023-03-16 09:26:19 +01:00
Vladimir Blagojevic
2538b4cbc9
Make promptnode test unit (#4420) 2023-03-15 22:17:23 +01:00
Silvano Cerza
b59cf76093
refactor: Remove AnswerToSpeech and DocumentToSpeech nodes (#4391)
* Remove AnswerToSpeech and DocumentToSpeech nodes

* Remove unused dataclasses

* Remove unnecessary dependencies

* Remove unused error class and imports
2023-03-15 19:31:13 +01:00
Vladimir Blagojevic
f13501309e
OpenAI streaming support (#4397) 2023-03-15 18:24:47 +01:00
ZanSara
3ecce5cbeb
refactor: rename v2 package to preview (#4409)
* v2->preview

* fossa -> py3.8

* test matrix

* test matrix

* tests

* test imports

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-03-15 18:02:18 +01:00
Agnieszka Marzec
374d7c9c4f
docs: Update Agent docstrings + add api docs (#4296)
* Update docstrings + add api docs

* Update with reviewer's changes

* Fix category id and blackify

* make max iterations test more robust

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-15 17:26:35 +01:00
Silvano Cerza
b3a659cd4a
test: Fix audio tests failing (#4418)
* Fix audio tests failing

* Disable local whisper tests
2023-03-15 15:26:30 +01:00
kaixuanliu
edf39edda0
fix: when using IVF* indexing, ensure the index is trained first (#4311)
* add protection, in case we use IVF* indexing, we need to train the index first

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>

* fix formatting issue

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>

* just raising error, instead of silently training the index

* fixed mypy issue

* fixed error msg

---------

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-15 08:55:37 +01:00
ZanSara
677fc8badf
feat: new Pipeline (#4368)
* add import for canals

* add stores support to canals

* pyproject.toml

* move tests

* add v2 to the extras in ci

* install v2 in action

* pylint

* save and load

* save and load

* codename "Alfalfa"

* workflows
2023-03-14 17:01:19 +01:00
Massimiliano Pippi
5aa19ffde6
remove deprecated OpenDistroElasticsearchDocumentStore (#4361) 2023-03-14 09:12:49 +01:00
Vladimir Blagojevic
98256ecf57
Add Whisper node (#4335)
* Add Whisper node

* Add support for audio path, improve tests

* Add docs

* Improve tests
2023-03-13 16:17:07 +01:00