1919 Commits

Author SHA1 Message Date
Massimiliano Pippi
83d615a32b
feat: include testing facilities into haystack package (#4182) 2023-02-17 19:38:03 +01:00
Sebastian
44509cd6a1
feat: Add OpenAIError to retry mechanism (#4178)
* Add OpenAIError to retry mechanism. Use env variable for timeout for OpenAI request in PromptNode.

* Updated retry in OpenAI embedding encoder as well.

* Empty commit
2023-02-17 13:17:44 +01:00
bogdankostic
7eeb3e07bf
feat: Add IVF and Product Quantization support for OpenSearchDocumentStore (#3850)
* Add IVF and Product Quantization support for OpenSearchDocumentStore

* Remove unused import statement

* Fix mypy

* Adapt doc strings and error messages to account for PQ

* Adapt validation of indices

* Adapt existing tests

* Fix pylint

* Add tests

* Update lg

* Adapt based on PR review comments

* Fix Pylint

* Adapt based on PR review

* Add request_timeout

* Adapt based on PR review

* Adapt based on PR review

* Adapt tests

* Pin tenacity

* Unpin tenacity

* Adapt based on PR comments

* Add match to tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-17 10:28:36 +01:00
Tuana Celik
8370715e7c
chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used (#4155)
* removing old dataset telemetry events

* changing function name

* adding the datasets back for old tutorials

* fixing mini bug

* resolving comments

* quick bug fix

* re-adding docstrings

* removing unnecessary import

* re-adding the telemetry event call for datasets

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-17 00:21:46 +01:00
tstadel
e7bb2487eb
make all OpenAI API params controllable via model_kwargs (#4183)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-16 19:56:08 +01:00
Daniel Bichuetti
9f5a3344d5
fix: Windows amd64 platform repr (#4175) 2023-02-16 19:46:34 +01:00
Tuana Celik
cdb05f0f9a
chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option (#4135)
* fix to include the PromptTemplate object as an option

* small fix
2023-02-16 19:05:04 +01:00
Silvano Cerza
a4407f8f98
Use larger runner for Docker release workflow (#4185) 2023-02-16 18:59:13 +01:00
bogdankostic
fe650b2a3a
fix: Remove logging statement of setting ID manually in Document (#4129)
* Remove logging statement

* update lg

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-16 18:58:21 +01:00
Daniel Bichuetti
5187cc1801
refactor: Remove the pin from the espnet module and fix the audio node tests. (#4128)
* fix: fix audio tests + unbound some dependencies

* fix: update for Python 3.8

* refactor: change numpy assertion

* feat: add voice recog. support on audio tests

* fix: fix var assignment

* chore: dummy commit

* fix: fix sndfile error

* refactor: change skip reason

* refactor: hardcode variable

* refactor: unpin numpy

* fix: pin numpy only for audio
2023-02-16 22:12:17 +05:30
Agnieszka Marzec
e7c32da8d7
Fix code block formatting (#4162) 2023-02-16 16:55:41 +01:00
Agnieszka Marzec
e16f1c8935
Docs: Add filter to hide entity post processor (#4160)
* Add filter to hide entity post processor

* Add missing space
2023-02-16 16:40:42 +01:00
Silvano Cerza
689f2cd250
Update docstring-labeler.yml workflow to safely run in PRs from forks (#4146) 2023-02-16 16:02:41 +01:00
Mayank Jobanputra
d27f372b67
build: cache nltk models into the docker image (#4118)
* separated nltk cache

* separated nltk caching

* fixed pylint lazy log error

* using model name as default value
2023-02-16 16:56:16 +05:30
Massimiliano Pippi
ec72dd73fc
refactor: complete the document stores test refactoring (#4125)
* add e2e tests

* move tests to their own module

* add e2e workflow

* pylint

* remove from job

* fix index field name

* skip test on sql

* removed unused code

* fix embedding tests

* adjust test for pinecone

* adjust assertions to the new documents

* bad copypasta

* test

* fix tests

* fix tests

* fix test

* fix tests

* pylint

* update milvus version

* remove debug

* move graphdb tests under e2e
2023-02-16 09:43:25 +01:00
Sebastian
9a26942952
feat: Add model_kwargs option to PromptNode (#4151)
* Add input option to PromptNode to allow the passing of default kwargs

* Add yaml test for model_kwargs parameter
2023-02-15 18:46:26 +01:00
Stefano Fiorucci
24405f851c
refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors (#4113)
* refactoring and test

* try to replace error with warning

* more expressive and robust get_scores methods

* make get_scores methods internal
2023-02-14 17:43:11 +01:00
Silvano Cerza
d86a511cc1
Fix Docker images test on release (#4153) 2023-02-14 14:18:49 +01:00
bogdankostic
4a88fae1e7
Update annotation tool readme (#4123) 2023-02-14 09:53:27 +01:00
Sebastian
75ef959678
feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode (#4038)
* added instruction_prompt and update defaults

* Change back max_tokens

* Code formatting

* Starting to update instruction_prompt to be a PromptTemplate

* Using PromptTemplate in OpenAIAnswerGenerator

* Removed hardcoded value

* pylint and make examples and examples_context optional prompt parameters

* Added new test for when prompt length goes past max token limit

* Improve doc strings.

* Make "text-davinci-003" the new default model

* Renaming variable to prompt_template and name to question-answering-with-examples

* Reduced repetitive code.

* Added some comments to explain key logic for future debuggers

* Update docs for max_tokens and increase default

* Updating variable name to prompt_template and docs.

* Updated test and handled Answer case where no documents are used.

* Slight update to docs.

* Adding more doc strings

* lg updates

* Blackify

---------

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-12 00:08:07 +01:00
Silvano Cerza
3cdfe9ca40
Revert changes introduced in PR #4124 (#4137) 2023-02-10 17:54:20 +01:00
Silvano Cerza
d9a7e8011f
Add load arg to docker/bake-action before testing Docker images (#4124) 2023-02-10 17:41:27 +01:00
bogdankostic
27aaa92800
docs: Remove some classes regarding PromptNode from API reference docs (#4132) 2023-02-10 15:56:38 +01:00
Vladimir Blagojevic
d839b9314f
Update PromptTemplate tests (#4131) 2023-02-10 15:24:01 +01:00
bogdankostic
05950719ba
fix: Deduplicate same Documents in isolated evaluation of Reader (#4114)
* Deduplicate same Documents in one MultiLabel

* Add tests

* Update label

* Update label

* Update test

* Update test

* Revert change to check CI

* Revert reversion

* Use deepcopy

* Update tests
2023-02-10 13:55:14 +01:00
Agnieszka Marzec
3c793e4edc
Docs: Update docstrings (#4119)
* Update docstrings

* Blackify

* Bring back the template wording

* Blackify
2023-02-10 11:51:51 +01:00
Silvano Cerza
2cc938ff90
ci: Add workflow to label PRs that edit docstrings (#4115)
* Add workflow to label PRs that edit docstrings

* Add python-version arg in setup-python steps

* Run workflow only in haystack and rest_api python files edit

* Fix labeling job

* Fix labeling conditional

* Fix files globbing in docstrings_checksum.py

* Fix typing

* Rework workflow to use a single job
2023-02-09 18:57:30 +01:00
Silvano Cerza
0b23f84205
Exclude .github folder from triggering tests in CI (#4120) 2023-02-09 18:07:27 +01:00
Jack Butler
e6b6f70ae2
fix: Fix TableTextRetriever for input consisting of tables only (#4048)
* fix: update kwargs for TriAdaptiveModel

* fix: squeeze batch for TTR inference

* test: add test for ttr + dataframe case

* test: update and reorganise ttr tests

* refactor: make triadaptive model handle shapes

* refactor: remove duplicate reshaping

* refactor: rename test with duplicate name

* fix: add device assignment back to TTR

* fix: remove duplicated vars in test

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-02-09 11:38:16 +01:00
bogdankostic
986472c26f
feat: Add BM25 support for tables in InMemoryDocumentStore (#4090)
* Add BM25 support for tables in InMemoryDocumentStore

* Add table type to query method

* Fix import order

* Adapt tests
2023-02-09 10:47:35 +01:00
Mayank Jobanputra
93962c09fc
fix: fix torchaudio version (#4102)
* fix torchaudio version

* added comment for keeping torchaudio last

* removed torchaudio from base
2023-02-09 15:14:10 +05:30
oryx1729
8ecadd1cac
fix: query filters in REST API (#4105)
* Remove legacy _format_filters()

* Remove test case
2023-02-09 10:42:31 +01:00
Bijay Gurung
79f57d8460
Proposal: Add a JsonConverter node (#3959)
* Add Proposal: JsonConverter

* Add jsonl support + schema to JsonConverter Proposal

* Remove format option from JsonConverter Proposal

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-02-09 09:57:00 +01:00
hsm207
508d9f6b32
feat: add support for custom headers (#4040) 2023-02-09 07:08:40 +01:00
Silvano Cerza
adf4a3ea2f
Fix pylint CI check running with no files (#4097) 2023-02-08 16:33:07 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
Sebastian
1bbf10a376
Remove double batching in retrieve_batch (#4014)
* Removed double batching around embed_queries

* Add back tests for retrieve_batch for dpr and embedding retrievers

* Updated table-text-retriever to not double batch

* Fixing pylint

* Update to test

* Remove code breaking test

* Updating dev comment to be clearer
2023-02-08 14:39:20 +01:00
Silvano Cerza
c66f855caf
Add missing env vars in rest_api CI tests (#4098) 2023-02-08 12:48:20 +01:00
Sebastian
01d39df863
feat: Update allowed models to be used with Prompt Node (#4018)
* Update allowed models to be used with Prompt Node

* Added try except block around the config to skip over OpenAI models.

* Fixing tests

* Adding warning message

* Adding test for different HF models that could be used in prompt node
2023-02-08 12:47:52 +01:00
Agnieszka Marzec
8135e75139
Add shaper to api docs (#4083) 2023-02-08 12:15:08 +01:00
Stefano Fiorucci
5c009c2a1a
feat: OpenAI - warn users if max_tokens is too short (#4094)
* warn users if max_tokens is too short

* skip test if not API KEY

* add counters

* correctly run precommit
2023-02-08 10:39:40 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
Silvano Cerza
5689c43e7e
ci: Make tests run conditionally in CI (#4086)
* Make tests run conditionally in CI

* Move rest_api test into separate workflow

* Avoid running tests.yml when rest_api is modified
2023-02-07 21:16:56 +01:00
Zoltan Fedor
a3016f065f
feat: Support multiple RayPipelines (#4078) 2023-02-07 11:01:07 +01:00
Silvano Cerza
3e4a2201df
ci: Change actionlint pre-commit hook to use Dockerized tool (#4060)
* Change actionlint pre-commit hook to use Dockerized tool

* Add ignore rule for actionlint

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-07 09:34:25 +01:00
Julian Risch
0e282e5ca4
refactor: replace mutable default arguments (#4070)
* refactor: replace mutable default arguments

* change type annotation in BasePreProcessor to Optional[List]
2023-02-07 09:30:33 +01:00
Vladimir Blagojevic
3273a2714d
fix: Add PromptTemplate __repr__ method (#4058)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-02-07 08:14:32 +01:00
Sebastian
a9f13d4641
feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever (#4026)
* Add additional options to pass to the SentenceTransformers trainer

* Make options accessible to the EmbeddingRetriever.train

* Update file-converters.yml

* Update transformers-img-to-text.yml

* Update 3550-csv-converter.md

* move type: ignore to correct line

* Moving type ignore again

* Fixing pylint and mypy

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Updated docstring to be less misleading.

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-02-07 08:05:21 +01:00
Silvano Cerza
bcf3bfdf79
Fix pylint workflow check running on tests files (#4076) 2023-02-06 19:41:36 +01:00
Julian Risch
51f30487e1
fix: add inner query for mysql compatibility (#4068) 2023-02-06 18:18:25 +01:00