3597 Commits

Author SHA1 Message Date
tstadel
4f90e59796
feat: expose prompts to Answer and EvaluationResult (#4341)
* store prompt in Answer

* store prompt in eval csv

* fix tests

* chore: fix context offset loading

* add tests

* add test from PR #4476

* fix tests after merge
2023-03-27 17:54:20 +02:00
ZanSara
6d578ebf3d
refactor: remove telemetry v1 (#4496)
* remove telemetry v1

* more pipeline methods to take out

* send_event_2

* mypy

* pylint

* mypy

* mypy again

* remove test
2023-03-27 17:38:43 +02:00
Silvano Cerza
3b5223fa1c
refactor: Mark MilvusDocumentStore as deprecated (#4498)
* Mark MilvusDocumentStore as deprecated

* Fix mypy
2023-03-27 15:31:48 +02:00
Silvano Cerza
5b63c2086e
refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever (#4500)
* Deprecate BaseKnowledgeGraph and InMemoryKnowledgeGraph

* Deprecate GraphDBKnowledgeGraph

* Fix mypy

* Deprecate Text2SparqlRetriever
2023-03-27 15:31:22 +02:00
tstadel
f8bb270d62
feat: prompt at query time (#4454)
* use outputshapers in prompttemplate

* fix pylint

* first iteration on regex

* implement new promptnode syntax based on f-strings

* finish fstring implementation

* add additional tests

* add security tests

* fix mypy

* fix pylint

* fix test_prompt_templates

* fix test_prompt_template_repr

* fix test_prompt_node_with_custom_invocation_layer

* fix test_invalid_template

* more security tests

* fix test_complex_pipeline_with_all_features

* fix agent tests

* refactor get_prompt_template

* fix test_prompt_template_syntax_parser

* fix test_complex_pipeline_with_all_features

* allow functions in comprehensions

* break out of fstring test

* fix additional tests

* mark new tests as unit tests

* fix agents tests

* convert missing templates

* proper use of get_prompt_template

* refactor and add docstrings

* fix tests

* fix pylint

* fix agents test

* fix tests

* refactor globals

* make allowed functions configurable via env variable

* better dummy variable

* fix special alias

* don't replace special char variables

* more special chars, better docstrings

* cherrypick fix audio tests

* fix test

* rework shapers

* fix pylint

* fix tests

* add new templates

* add reference parsing

* add more shaper tests

* add tests for join and to_string

* fix pylint

* fix pylint

* fix pylint for real

* auto fill shaper function params

* fix reference parsing for multiple references

* fix output variable inference

* consolidate qa prompt template output and make shaper work per-document

* implement prompt at query time

* support serialized PromptTemplates

* fix tests

* add tests for prompt template at query time

* fix types after merge

* fix types after merge

* improve test

* add test for nested shaper syntax in pipelines

* better docstrings

* Correct copilot errors

* found another copilot error

* Another one

* introduce output_parser

* introduce output_parser

* Fix tests for output_parser update

* fix black

* fix tests

* fix tests

* fix tests

* better docstring

* better docstring

* fix test

* fix mypy

* rename RegexAnswerParser to AnswerParser

* rename RegexAnswerParser to AnswerParser

* better docstrings

* better docstrings

* fix docstring example
2023-03-27 14:10:20 +02:00
Silvano Cerza
123dfc1b34
refactor: Remove ElasticsearchRetriever and ElasticsearchFilterOnlyRetriever (#4499) 2023-03-27 13:47:47 +02:00
tstadel
382ca8094e
feat: PromptTemplate extensions (#4378)
* use outputshapers in prompttemplate

* fix pylint

* first iteration on regex

* implement new promptnode syntax based on f-strings

* finish fstring implementation

* add additional tests

* add security tests

* fix mypy

* fix pylint

* fix test_prompt_templates

* fix test_prompt_template_repr

* fix test_prompt_node_with_custom_invocation_layer

* fix test_invalid_template

* more security tests

* fix test_complex_pipeline_with_all_features

* fix agent tests

* refactor get_prompt_template

* fix test_prompt_template_syntax_parser

* fix test_complex_pipeline_with_all_features

* allow functions in comprehensions

* break out of fstring test

* fix additional tests

* mark new tests as unit tests

* fix agents tests

* convert missing templates

* proper use of get_prompt_template

* refactor and add docstrings

* fix tests

* fix pylint

* fix agents test

* fix tests

* refactor globals

* make allowed functions configurable via env variable

* better dummy variable

* fix special alias

* don't replace special char variables

* more special chars, better docstrings

* cherrypick fix audio tests

* fix test

* rework shapers

* fix pylint

* fix tests

* add new templates

* add reference parsing

* add more shaper tests

* add tests for join and to_string

* fix pylint

* fix pylint

* fix pylint for real

* auto fill shaper function params

* fix reference parsing for multiple references

* fix output variable inference

* consolidate qa prompt template output and make shaper work per-document

* fix types after merge

* introduce output_parser

* fix tests

* better docstring

* rename RegexAnswerParser to AnswerParser

* better docstrings
2023-03-27 12:14:11 +02:00
ZanSara
9518bcb7a8
remove env var (#4497) 2023-03-27 10:33:58 +02:00
Julian Risch
45ce87bb48
bug: Exclude rdflib 6.3.2 because of fossa license issues (#4495) 2023-03-27 10:07:03 +02:00
Vladimir Blagojevic
c99b58100d
feat: Add agent event callbacks (#4491)
* Implement agent callbacks with events

* Fix mypy errors

* Fix prompt_params assignment

* PR review fixes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-03-27 10:06:11 +02:00
recrudesce
2a2226d63e
fix: Fix debug on PromptNode (#4483)
* Fix debug on PromptNode

Allow the ability to control debug output on PromptNode

* added tests, simplified code

---------

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-24 19:37:52 +05:30
Mayank Jobanputra
5f72cdc012
fix: stop loading FAISS and InMem doc Store for indexing pipelines (#4396)
* stop loading FAISS and InMem doc Store for indexing pipelines

* pylint fix

* Addressed comments
2023-03-24 19:35:29 +05:30
Silvano Cerza
b70715a74d
Remove retry_with_exponential_backoff in favor of tenacity (#4460) 2023-03-24 11:14:11 +01:00
Jose Pablo Fernandez
dda350088b
feat: add additional params to file upload endpoint (#4445)
* adds additional params to file upload endpoint

* fix mypy

---------

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-23 14:18:16 +01:00
Vladimir Blagojevic
7bb6499c29
feat: Enable PromptNode to use text-generation models (#4349) 2023-03-22 07:20:36 +01:00
Vladimir Blagojevic
3272e2b9fe
refactor: Add AgentStep (#4431) 2023-03-17 18:21:14 +01:00
ZanSara
4d19bd13a5
refactor: consolidate telemetry events (#4275)
* add specific Ray event

* group evaluation and training events

* consolidate pipeline run events

* fix send_event import

* review feedback

* typo

* send uptime

* track embeddingRetriever openai encoder

* track embeddingRetriever openai encoder

* pylint

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-03-17 18:14:35 +01:00
Florian Hardow
462484445d
feat: break retry loop for 401 unauthorized errors in promptnode (#4389)
* feat: break retry loop for 401 unauthorized errors in promptnode

* Fix black, pylint, mypy

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/utils/openai_utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* chore: blackify project

* chore: fix linting error (remove elif after raise)

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-03-17 17:07:08 +01:00
Silvano Cerza
d55bac189c
Make version semver compliant (#4456) 2023-03-17 14:21:36 +01:00
Vladimir Blagojevic
53528c96a0
feat: Add ChatGPT PromptNode layer (#4357)
* Initial ChatGPTInvocationLayer
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
2023-03-17 14:16:41 +01:00
Silvano Cerza
0f605118d9
ci: remove python_cache internal action (#4429) 2023-03-17 13:55:07 +01:00
Agnieszka Marzec
26e0fbb4f8
Docs: Update language classifier docstrings (#4413)
* Update language classifier docstrings

* Apply suggestions from code review

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-03-17 12:40:02 +01:00
Sebastian
f04b2f3cee
Update test to reflect change in max token length (#4451) 2023-03-17 09:43:23 +01:00
Ahmed Nabil
d29342c8bf
feat: Add the New Tokenizer of gpt-3.5-turbo (#4331)
* Updated the tokenizer algorithm and pyproject.toml tiktoken version

* Updated the tokenizer algorithm and pyproject.toml tiktoken version

* Update haystack/utils/openai_utils.py

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update references in openai_utils.py

* Update docs/pydoc/config/extractor.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/document-classifier.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/file-converters.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/file-classifier.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/other.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/pipelines.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/preprocessor.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/primitives.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/translator.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/crawler.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/prompt-node.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/pseudo-label-generator.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/query-classifier.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/question-generator.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/reader.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/ranker.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/retriever.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update docs/pydoc/config/transformers-img-to-text.yml

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* Update openai_utils.py

Adding GPT-4 tokenization handler

* try to fix black

---------

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-03-17 08:20:57 +01:00
ju-gu
a3409c7da6
fix: issue evaluation check for content type (#4181)
* fix: issue evaluation check for content type

Evaluation currently breaks when the content type is not a str.

* add black

* add test table eval

* add black formatting

* Expand integration test

---------

Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-03-16 17:36:53 +01:00
Silvano Cerza
1b5df55dbb
Skip flaky test (#4444) 2023-03-16 16:32:28 +01:00
Silvano Cerza
22c50207c1
Run readme_sync.yml in PRs (#4442) 2023-03-16 15:18:13 +01:00
Massimiliano Pippi
8d4c56720c
do not run tests on osx (#4443) 2023-03-16 15:00:29 +01:00
Agnieszka Marzec
798fba87dd
Fix agent module (#4441) 2023-03-16 10:14:59 +01:00
Silvano Cerza
9802fb159a
Remove unnecessary imports in conftest.py (#4434) 2023-03-16 10:02:01 +01:00
Agnieszka Marzec
3a97e271fc
Fix order and category of agent (#4440) 2023-03-16 09:59:17 +01:00
Silvano Cerza
3591fc02e1
Mark Crawler tests correctly (#4435) 2023-03-16 09:26:19 +01:00
Vladimir Blagojevic
2538b4cbc9
Make promptnode test unit (#4420) 2023-03-15 22:17:23 +01:00
Silvano Cerza
b59cf76093
refactor: Remove AnswerToSpeech and DocumentToSpeech nodes (#4391)
* Remove AnswerToSpeech and DocumentToSpeech nodes

* Remove unused dataclasses

* Remove unnecessary dependencies

* Remove unused error class and imports
2023-03-15 19:31:13 +01:00
Vladimir Blagojevic
f13501309e
OpenAI streaming support (#4397) 2023-03-15 18:24:47 +01:00
ZanSara
3ecce5cbeb
refactor: rename v2 package to preview (#4409)
* v2->preview

* fossa -> py3.8

* test matrix

* test matrix

* tests

* test imports

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-03-15 18:02:18 +01:00
Agnieszka Marzec
374d7c9c4f
docs: Update Agent docstrings + add api docs (#4296)
* Update docstrings + add api docs

* Update with reviewer's changes

* Fix category id and blackify

* make max iterations test more robust

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-15 17:26:35 +01:00
Massimiliano Pippi
d87b310f01
feat: improve is_containerized() (#4412)
* improve is_containerized()

* ignore global-var warning
2023-03-15 17:06:46 +01:00
Silvano Cerza
b3a659cd4a
test: Fix audio tests failing (#4418)
* Fix audio tests failing

* Disable local whisper tests
2023-03-15 15:26:30 +01:00
Silvano Cerza
2c7c4aa04e
Use bigger runner for integration-tests-linux (#4422) 2023-03-15 11:22:16 +01:00
kaixuanliu
edf39edda0
fix: when using IVF* indexing, ensure the index is trained first (#4311)
* add protection, in case we use IVF* indexing, we need to train the index first

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>

* fix formatting issue

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>

* just raising error, instead of silently training the index

* fixed mypy issue

* fixed error msg

---------

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-15 08:55:37 +01:00
ZanSara
677fc8badf
feat: new Pipeline (#4368)
* add import for canals

* add stores support to canals

* pyproject.toml

* move tests

* add v2 to the extras in ci

* install v2 in action

* pylint

* save and load

* save and load

* codename "Alfalfa"

* workflows
2023-03-14 17:01:19 +01:00
Massimiliano Pippi
1498aacc77
chore: make the docs generator runnable without an API key (#4405)
* spit a warning instead of exiting

* print which file is being converted (useful to debug CI)

* pin docspec for the time being
2023-03-14 16:15:19 +01:00
Massimiliano Pippi
5aa19ffde6
remove deprecated OpenDistroElasticsearchDocumentStore (#4361) 2023-03-14 09:12:49 +01:00
Stefano Fiorucci
7d17ca7391
add DocumentLanguageClassifier API (#4401) 2023-03-14 09:12:03 +01:00
Vladimir Blagojevic
98256ecf57
Add Whisper node (#4335)
* Add Whisper node

* Add support for audio path, improve tests

* Add docs

* Improve tests
2023-03-13 16:17:07 +01:00
Daniel Bichuetti
28724e2e25
feat: add automatic OCR detection mechanism and improve performance (#4329)
* feat: add automatic OCR detection mechanism and improve performance

* refactor: add error message

* refactor: ignore pdftoppm bad typing

* refactor: add Tesseract install. docstrings

* fix: check if OCR var. assigned on mp

* tests: add path to windows/linux tests

* tests: add tessdata path

* tests: include matrix ref.

* tests: custom Tesseract matrix install

* refactor: improve user guide

* tests: fix macos path

* tests: remove brew formulae version

* fix: macos paths

* tests: fix macos path

* tests: add Tesseract to Windows Path

* tests: pytesseract path

* tests: macos path

* refactor: fix path message and remove extra path from tests

* refactor: raise exception when path not found

* refactor: expression simplification

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* refactor: check ocr parameter

* tests: mark as integration

* tests: mock deprecation warning

* refactor: simplify code

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* refactor: change deprecation test

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* refactor: add unit patch

* refactor: black formatting

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-13 20:19:22 +05:30
ZanSara
fd3f3143d4
feat: LanguageClassifier (#2994)
* add language classifier node

* Fix a few bugs and general code style

* whitespace

* first draft and refactoring

* draft of classes separation

* improve base class

* fix invisible character; add some tests

* fix and more tests

* more docs and tests

* move __init__ to base

* add transformers node; improve tests

* incorporate feedback; little fix to other node

* labels_to_languages mapping

* better docstrings

* use logger instead of logging

---------

Co-authored-by: Stanislav Zamecnik <stanislav.zamecnik@telekom.com>
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
Co-authored-by: stazam <zamecnik.stanislav@gmail.com>
2023-03-13 10:30:03 +01:00
Mahipal Singh Rathore
405aee0cfa
Update table.py (#4376)
The answer should be checked to be not None before adding an id to it
2023-03-13 10:27:59 +01:00
ZanSara
8ea7ba3a94
proposal: drop BaseComponent and re-implement Pipeline (#4284)
* draft proposal

* pr number

* reminder for an agent pipeline example

* proposal number

* add real query pipeline

* add paragraph on validation

* wording

* add_store

* decorator

* add rollout process and parameter's hierarchy examples

* rename project into application

* feedback from the meeting

* defer evaluation to another proposal

* smaller changes

* remove applications for now

* u-turn on pipeline.connect()

* typo

* connect_from/to

* update with Malte's feedback
2023-03-13 10:05:59 +01:00