Silvano Cerza
d55bac189c
Make version semver compliant ( #4456 )
2023-03-17 14:21:36 +01:00
Vladimir Blagojevic
53528c96a0
feat: Add ChatGPT PromptNode layer ( #4357 )
...
* Initial ChatGPTInvocationLayer
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
2023-03-17 14:16:41 +01:00
Silvano Cerza
0f605118d9
ci: remove python_cache internal action ( #4429 )
2023-03-17 13:55:07 +01:00
Agnieszka Marzec
26e0fbb4f8
Docs: Update language classifier docstrings ( #4413 )
...
* Update language classifier docstrings
* Apply suggestions from code review
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-03-17 12:40:02 +01:00
Sebastian
f04b2f3cee
Update test to reflect change in max token length ( #4451 )
2023-03-17 09:43:23 +01:00
Ahmed Nabil
d29342c8bf
feat: Add the New Tokenizer of gpt-3.5-turbo ( #4331 )
...
* Updated the tokenizer algorithm and pyproject.toml tiktoken version
* Updated the tokenizer algorithm and pyproject.toml tiktoken version
* Update haystack/utils/openai_utils.py
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update references in openai_utils.py
* Update docs/pydoc/config/extractor.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/document-classifier.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/file-converters.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/file-classifier.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/other.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/pipelines.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/preprocessor.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/primitives.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/translator.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/crawler.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/prompt-node.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/pseudo-label-generator.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/query-classifier.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/question-generator.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/reader.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/ranker.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/retriever.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update docs/pydoc/config/transformers-img-to-text.yml
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* Update openai_utils.py
Adding GPT-4 tokenization handler
* try to fix black
---------
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-03-17 08:20:57 +01:00
ju-gu
a3409c7da6
fix: issue evaluation check for content type ( #4181 )
...
* fix: issue evaluation check for content type
Evaluation currently breaks when the content type is not a str.
* add black
* add test table eval
* add black formatting
* Expand integration test
---------
Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-03-16 17:36:53 +01:00
Silvano Cerza
1b5df55dbb
Skip flaky test ( #4444 )
2023-03-16 16:32:28 +01:00
Silvano Cerza
22c50207c1
Run readme_sync.yml in PRs ( #4442 )
2023-03-16 15:18:13 +01:00
Massimiliano Pippi
8d4c56720c
do not run tests on osx ( #4443 )
2023-03-16 15:00:29 +01:00
Agnieszka Marzec
798fba87dd
Fix agent module ( #4441 )
2023-03-16 10:14:59 +01:00
Silvano Cerza
9802fb159a
Remove unnecessary imports in conftest.py ( #4434 )
2023-03-16 10:02:01 +01:00
Agnieszka Marzec
3a97e271fc
Fix order and category of agent ( #4440 )
2023-03-16 09:59:17 +01:00
Silvano Cerza
3591fc02e1
Mark Crawler tests correctly ( #4435 )
2023-03-16 09:26:19 +01:00
Vladimir Blagojevic
2538b4cbc9
Make promptnode test unit ( #4420 )
2023-03-15 22:17:23 +01:00
Silvano Cerza
b59cf76093
refactor: Remove AnswerToSpeech and DocumentToSpeech nodes ( #4391 )
...
* Remove AnswerToSpeech and DocumentToSpeech nodes
* Remove unused dataclasses
* Remove unnecessary dependencies
* Remove unused error class and imports
2023-03-15 19:31:13 +01:00
Vladimir Blagojevic
f13501309e
OpenAI streaming support ( #4397 )
2023-03-15 18:24:47 +01:00
ZanSara
3ecce5cbeb
refactor: rename v2 package to preview ( #4409 )
...
* v2->preview
* fossa -> py3.8
* test matrix
* test matrix
* tests
* test imports
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-03-15 18:02:18 +01:00
Agnieszka Marzec
374d7c9c4f
docs: Update Agent docstrings + add api docs ( #4296 )
...
* Update docstrings + add api docs
* Update with reviewer's changes
* Fix category id and blackify
* make max iterations test more robust
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-15 17:26:35 +01:00
Massimiliano Pippi
d87b310f01
feat: improve is_containerized() ( #4412 )
...
* improve is_containerized()
* ignore global-var warning
2023-03-15 17:06:46 +01:00
Silvano Cerza
b3a659cd4a
test: Fix audio tests failing ( #4418 )
...
* Fix audio tests failing
* Disable local whisper tests
2023-03-15 15:26:30 +01:00
Silvano Cerza
2c7c4aa04e
Use bigger runner for integration-tests-linux ( #4422 )
2023-03-15 11:22:16 +01:00
kaixuanliu
edf39edda0
fix: when using IVF* indexing, ensure the index is trained first ( #4311 )
...
* add protection: when we use IVF* indexing, we need to train the index first
Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>
* fix formatting issue
Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>
* just raising error, instead of silently training the index
* fixed mypy issue
* fixed error msg
---------
Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-15 08:55:37 +01:00
ZanSara
677fc8badf
feat: new Pipeline ( #4368 )
...
* add import for canals
* add stores support to canals
* pyproject.toml
* move tests
* add v2 to the extras in ci
* install v2 in action
* pylint
* save and load
* save and load
* codename "Alfalfa"
* workflows
2023-03-14 17:01:19 +01:00
Massimiliano Pippi
1498aacc77
chore: make the docs generator runnable without an API key ( #4405 )
...
* spit a warning instead of exiting
* print which file is being converted (useful to debug CI)
* pin docspec for the time being
2023-03-14 16:15:19 +01:00
Massimiliano Pippi
5aa19ffde6
remove deprecated OpenDistroElasticsearchDocumentStore ( #4361 )
2023-03-14 09:12:49 +01:00
Stefano Fiorucci
7d17ca7391
add DocumentLanguageClassifier API ( #4401 )
2023-03-14 09:12:03 +01:00
Vladimir Blagojevic
98256ecf57
Add Whisper node ( #4335 )
...
* Add Whisper node
* Add support for audio path, improve tests
* Add docs
* Improve tests
2023-03-13 16:17:07 +01:00
Daniel Bichuetti
28724e2e25
feat: add automatic OCR detection mechanism and improve performance ( #4329 )
...
* feat: add automatic OCR detection mechanism and improve performance
* refactor: add error message
* refactor: ignore pdftoppm bad typing
* refactor: add Tesseract install. docstrings
* fix: check if OCR var. assigned on mp
* tests: add path to windows/linux tests
* tests: add tessdata path
* tests: include matrix ref.
* tests: custom Tesseract matrix install
* refactor: improve user guide
* tests: fix macos path
* tests: remove brew formulae version
* fix: macos paths
* tests: fix macos path
* tests: add Tesseract to Windows Path
* tests: pytesseract path
* tests: macos path
* refactor: fix path message and remove extra path from tests
* refactor: raise exception when path not found
* refactor: expression simplification
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* refactor: check ocr parameter
* tests: mark as integration
* tests: mock deprecation warning
* refactor: simplify code
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* refactor: change deprecation test
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* refactor: add unit patch
* refactor: black formatting
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-13 20:19:22 +05:30
ZanSara
fd3f3143d4
feat: LanguageClassifier ( #2994 )
...
* add language classifier node
* Fix a few bugs and general code style
* whitespace
* first draft and refactoring
* draft of classes separation
* improve base class
* fix invisible character; add some tests
* fix and more tests
* more docs and tests
* move __init__ to base
* add transformers node; improve tests
* incorporate feedback; little fix to other node
* labels_to_languages mapping
* better docstrings
* use logger instead of logging
---------
Co-authored-by: Stanislav Zamecnik <stanislav.zamecnik@telekom.com>
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
Co-authored-by: stazam <zamecnik.stanislav@gmail.com>
2023-03-13 10:30:03 +01:00
Mahipal Singh Rathore
405aee0cfa
Update table.py ( #4376 )
...
The answer should be checked for None before an id is added to it
2023-03-13 10:27:59 +01:00
ZanSara
8ea7ba3a94
proposal: drop BaseComponent and re-implement Pipeline ( #4284 )
...
* draft proposal
* pr number
* reminder for an agent pipeline example
* proposal number
* add real query pipeline
* add paragraph on validation
* wording
* add_store
* decorator
* add rollout process and parameter's hierarchy examples
* rename project into application
* feedback from the meeting
* defer evaluation to another proposal
* smaller changes
* remove applications for now
* u-turn on pipeline.connect()
* typo
* connect_from/to
* update with Malte's feedback
2023-03-13 10:05:59 +01:00
Vladimir Blagojevic
95a48c6c9d
refactor: Simplify agent and tool interaction ( #4362 )
...
* Simplify agent and tool interaction
2023-03-10 18:07:44 +01:00
Stefano Fiorucci
444a3116c4
docs: TransformersImageToText- inform about supported models, better exception handling ( #4310 )
...
* better docs, exception handling and tests
* Update lg
* fix little error
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-03-09 15:35:17 +01:00
Mayank Jobanputra
39a20c37fd
fix: hf-tiny-roberta model loading from disk and mypy errors ( #4363 )
...
* Fix mypy failures
* Fix try 1 hf model on windows
* Fix try 2 hf model on windows
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-03-09 18:06:09 +05:30
Vítor Bernardes
95851b82fb
fix: Fix print_answers for output of query run_batch ( #4273 )
...
* fix: Fix `print_answers` for output of query `run_batch` (#4255 )
* fix: print "Answers" label even with no query list
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* test: add unit tests for `print_answers` on `run`, `run_batch` output (#4255 )
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-03-09 12:10:50 +01:00
bogdankostic
e3503a92c9
build: Use uvicorn instead of gunicorn as server in REST API's Dockerfile ( #4304 )
...
* Use uvicorn instead of gunicorn as server
* Added comments and changed service names
* comments improvised
---------
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-09 01:46:07 +05:30
Stefano Fiorucci
f90ffb6851
increase MetaDocumentORM value length ( #4333 )
2023-03-08 03:15:27 +05:30
Bilge Yücel
9198d5ec42
chore: add topic:promptnode label ( #4347 )
2023-03-07 21:23:40 +01:00
ZanSara
024332f98f
refactor: simplify registration of PromptModelInvocationLayer ( #4339 )
...
* use __init_subclass__ and remove registering functions
2023-03-07 20:53:48 +01:00
Sebastian
7d5e7c089c
refactor: Use TableQuestionAnsweringPipeline from transformers ( #4303 )
...
* Added changes from table-qa-pipeline
* Moved classes around to make diff to main look nicer.
* Cleaned things up. Removed option to return_no_answer (not needed), added docs and added integration marks.
* Remove unneeded code
* Added fix for test
* Add check for document_ids in answer
* Prevent passing of empty list to np.mean
* Batching doesn't work with TableQAPipeline b/c of HF issue
* Cleanup of table reader tests, added check for document ids.
* Fixing pylint
* More pylint
* PR comments
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-03-07 11:46:50 +01:00
tstadel
d096f03230
proposal: Shapers in Prompt Templates ( #4172 )
...
* add proposal
* Update 0000-shaper-in-prompt-template.md
* rename proposal file
* update proposal according to feedback
* add clarification about the number of prompts generated
* add section about parsing logic
* Revert "add section about parsing logic"
This reverts commit 904713558706206637eefe1579420d89663f58b8.
* add section about parsing logic
* fix typo
* improved the detailed design section
* fix code section
* chore formatting
* chore formatting
* updated adoption strategy
* final typo and expression changes
2023-03-07 09:52:18 +01:00
Tuana Çelik
8cd8ff6cbb
Update README.md ( #4340 )
2023-03-07 08:34:21 +01:00
Daniel Bichuetti
af6efbdcb0
refactor: Allow flexible document id generation ( #4326 )
2023-03-07 07:25:27 +01:00
Zoltan Fedor
4dea9db01e
feat: Report execution time for pipeline components in _debug ( #4197 )
...
* Adding execution time to the debug output of pipeline components
* Linting issue fix
* [EMPTY] Re-trigger CI
* fixed test
---------
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-07 04:45:31 +05:30
tstadel
19311119db
fix: EvalResult load migration ( #4289 )
...
* fix evalresult load migration
* handle none values correctly
* better None check
* improve logic and add test
2023-03-06 20:05:02 +01:00
Silvano Cerza
9253990bdf
Add workflow to push CI metrics to Datadog ( #4336 )
2023-03-06 18:02:24 +01:00
ZanSara
c802305ccf
test: move tests on standard pipelines in e2e/ ( #4309 )
...
* move out standard pipelines e2e
* fixing unit tests
* add test data
* feedback
* pylint
* black
2023-03-06 17:26:19 +01:00
Vladimir Blagojevic
348e7d2dfe
refactor: Separate PromptModelInvocationLayers in providers.py ( #4327 )
...
* Refactor PromptNode, separate PromptModelInvocationLayers in providers.py
2023-03-06 16:34:59 +01:00
Daniel Bichuetti
1548c5ba0f
feat: Add Azure OpenAI embeddings support ( #4332 )
...
* feat: add Azure OpenAI as embedding option
* feat: Add Azure OpenAI embeddings support
* refactor: check api key
* refactor: better type checking for Azure
* refactor: enable parallelism + separate and update tests
* refactor: string reformat
* refactor: explicit typing
* refactor: update refs and remove unused code
2023-03-06 13:37:20 +01:00