1927 Commits

Author SHA1 Message Date
Silvano Cerza
f5b8835e2c
ci: Fix Dockerfile.base failing because of missing git (#4210)
2023-02-20 18:40:30 +01:00
Silvano Cerza
e6af353530
ci: Add ca-certificates installation to xpdf container (#4206)
2023-02-20 17:47:10 +01:00
abwiersma
7aae4293d7
Check cuda availability before calling (#4174)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-20 17:37:56 +01:00
bogdankostic
18e7b8399b
refactor: Remove id_hash_keys parameter in from_dict method (#4207)
* Remove id_hash_keys parameter in from_dict method

* Remove unused import

* Adapt `from_dict` of `SpeechDocument`

* Revert "Adapt `from_dict` of `SpeechDocument`"

This reverts commit 309cbeb7fbb3094c43be76d9e431db9391913144.

* Adapt `from_dict` of `SpeechDocument`
2023-02-20 17:37:35 +01:00
Silvano Cerza
30cdb81f19
ci: Move xpdf build into separate container (#4199)
* Create Dockerfile and hcl config to build Xpdf

* Create workflow to build Xpdf Docker image

* Update Dockerfile.base to not build Xpdf

* Fix CWD removal and arg casing

* Fix ARG setting
2023-02-20 14:58:11 +01:00
github-actions[bot]
aaa1522c45
Update unstable version and openapi schema (#4205)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2023-02-20 14:57:45 +01:00
tstadel
14578aa54f
feat: add top_k to PromptNode (#4159)
* add top_k to PromptNode

* fix OpenAI

* fix openai test
2023-02-20 14:51:45 +01:00
Sebastian
d129598203
Prompt node/run batch (#4072)
* Starting to implement first pass at run_batch

* Started to add _flatten_input function

* First pass at run_batch method.

* Fixed bug

* Adding tests for run_batch

* Update doc strings

* Pylint and mypy

* Pylint

* Fixing mypy

* Restructuring of run_batch tests

* Add minor lg updates

* Adding more tests

* Update dev comments and call static method differently

* Fixed the setting of output variable

* Set output_variable in __init__ of PromptNode

* Make a one-liner

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-20 11:58:13 +01:00
Massimiliano Pippi
83d615a32b
feat: include testing facilities into haystack package (#4182)
2023-02-17 19:38:03 +01:00
Sebastian
44509cd6a1
feat: Add OpenAIError to retry mechanism (#4178)
* Add OpenAIError to retry mechanism. Use env variable for timeout for OpenAI request in PromptNode.

* Updated retry in OpenAI embedding encoder as well.

* Empty commit
2023-02-17 13:17:44 +01:00
bogdankostic
7eeb3e07bf
feat: Add IVF and Product Quantization support for OpenSearchDocumentStore (#3850)
* Add IVF and Product Quantization support for OpenSearchDocumentStore

* Remove unused import statement

* Fix mypy

* Adapt doc strings and error messages to account for PQ

* Adapt validation of indices

* Adapt existing tests

* Fix pylint

* Add tests

* Update lg

* Adapt based on PR review comments

* Fix Pylint

* Adapt based on PR review

* Add request_timeout

* Adapt based on PR review

* Adapt based on PR review

* Adapt tests

* Pin tenacity

* Unpin tenacity

* Adapt based on PR comments

* Add match to tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-17 10:28:36 +01:00
Tuana Celik
8370715e7c
chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used (#4155)
* removing old dataset telemetry events

* changing function name

* adding the datasets back for old tutorials

* fixing mini bug

* resolving comments

* quick bug fix

* re-adding docstrings

* removing unnecessary import

* re-adding the telemetry event call for datasets

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-17 00:21:46 +01:00
tstadel
e7bb2487eb
make all OpenAI API params controllable via model_kwargs (#4183)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-16 19:56:08 +01:00
Daniel Bichuetti
9f5a3344d5
fix: Windows amd64 platform repr (#4175)
2023-02-16 19:46:34 +01:00
Tuana Celik
cdb05f0f9a
chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option (#4135)
* fix to include the PromptTemplate object as an option

* small fix
2023-02-16 19:05:04 +01:00
Silvano Cerza
a4407f8f98
Use larger runner for Docker release workflow (#4185)
2023-02-16 18:59:13 +01:00
bogdankostic
fe650b2a3a
fix: Remove logging statement of setting ID manually in Document (#4129)
* Remove logging statement

* update lg

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-16 18:58:21 +01:00
Daniel Bichuetti
5187cc1801
refactor: Remove the pin from the espnet module and fix the audio node tests. (#4128)
* fix: fix audio tests + unbound some dependencies

* fix: update for Python 3.8

* refactor: change numpy assertion

* feat: add voice recog. support on audio tests

* fix: fix var assignment

* chore: dummy commit

* fix: fix sndfile error

* refactor: change skip reason

* refactor: hardcode variable

* refactor: unpin numpy

* fix: pin numpy only for audio
2023-02-16 22:12:17 +05:30
Agnieszka Marzec
e7c32da8d7
Fix code block formatting (#4162)
2023-02-16 16:55:41 +01:00
Agnieszka Marzec
e16f1c8935
Docs: Add filter to hide entity post processor (#4160)
* Add filter to hide entity post processor

* Add missing space
2023-02-16 16:40:42 +01:00
Silvano Cerza
689f2cd250
Update docstring-labeler.yml workflow to safely run in PRs from forks (#4146)
2023-02-16 16:02:41 +01:00
Mayank Jobanputra
d27f372b67
build: cache nltk models into the docker image (#4118)
* separated nltk cache

* separated nltk caching

* fixed pylint lazy log error

* using model name as default value
2023-02-16 16:56:16 +05:30
Massimiliano Pippi
ec72dd73fc
refactor: complete the document stores test refactoring (#4125)
* add e2e tests

* move tests to their own module

* add e2e workflow

* pylint

* remove from job

* fix index field name

* skip test on sql

* removed unused code

* fix embedding tests

* adjust test for pinecone

* adjust assertions to the new documents

* bad copypasta

* test

* fix tests

* fix tests

* fix test

* fix tests

* pylint

* update milvus version

* remove debug

* move graphdb tests under e2e
2023-02-16 09:43:25 +01:00
Sebastian
9a26942952
feat: Add model_kwargs option to PromptNode (#4151)
* Add input option to PromptNode to allow the passing of default kwargs

* Add yaml test for model_kwargs parameter
2023-02-15 18:46:26 +01:00
Stefano Fiorucci
24405f851c
refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors (#4113)
* refactoring and test

* try to replace error with warning

* more expressive and robust get_scores methods

* make get_scores methods internal
2023-02-14 17:43:11 +01:00
Silvano Cerza
d86a511cc1
Fix Docker images test on release (#4153)
2023-02-14 14:18:49 +01:00
bogdankostic
4a88fae1e7
Update annotation tool readme (#4123)
2023-02-14 09:53:27 +01:00
Sebastian
75ef959678
feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode (#4038)
* added instruction_prompt and update defaults

* Change back max_tokens

* Code formatting

* Starting to update instruction_prompt to be a PromptTemplate

* Using PromptTemplate in OpenAIAnswerGenerator

* Removed hardcoded value

* pylint and make examples and examples_context optional prompt parameters

* Added new test for when prompt length goes past max token limit

* Improve doc strings.

* Make "text-davinci-003" the new default model

* Renaming variable to prompt_template and name to question-answering-with-examples

* Reduced repetitive code.

* Added some comments to explain key logic for future debuggers

* Update docs for max_tokens and increase default

* Updating variable name to prompt_template and docs.

* Updated test and handled Answer case where no documents are used.

* Slight update to docs.

* Adding more doc strings

* lg updates

* Blackify

---------

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-12 00:08:07 +01:00
Silvano Cerza
3cdfe9ca40
Revert changes introduced in PR #4124 (#4137)
2023-02-10 17:54:20 +01:00
Silvano Cerza
d9a7e8011f
Add load arg to docker/bake-action before testing Docker images (#4124)
2023-02-10 17:41:27 +01:00
bogdankostic
27aaa92800
docs: Remove some classes regarding PromptNode from API reference docs (#4132)
2023-02-10 15:56:38 +01:00
Vladimir Blagojevic
d839b9314f
Update PromptTemplate tests (#4131)
2023-02-10 15:24:01 +01:00
bogdankostic
05950719ba
fix: Deduplicate same Documents in isolated evaluation of Reader (#4114)
* Deduplicate same Documents in one MultiLabel

* Add tests

* Update label

* Update label

* Update test

* Update test

* Revert change to check CI

* Revert reversion

* Use deepcopy

* Update tests
2023-02-10 13:55:14 +01:00
Agnieszka Marzec
3c793e4edc
Docs: Update docstrings (#4119)
* Update docstrings

* Blackify

* Bring back the template wording

* Blackify
2023-02-10 11:51:51 +01:00
Silvano Cerza
2cc938ff90
ci: Add workflow to label PRs that edit docstrings (#4115)
* Add workflow to label PRs that edit docstrings

* Add python-version arg in setup-python steps

* Run workflow only when haystack and rest_api Python files are edited

* Fix labeling job

* Fix labeling conditional

* Fix files globbing in docstrings_checksum.py

* Fix typing

* Rework workflow to use a single job
2023-02-09 18:57:30 +01:00
Silvano Cerza
0b23f84205
Exclude .github folder from triggering tests in CI (#4120)
2023-02-09 18:07:27 +01:00
Jack Butler
e6b6f70ae2
fix: Fix TableTextRetriever for input consisting of tables only (#4048)
* fix: update kwargs for TriAdaptiveModel

* fix: squeeze batch for TTR inference

* test: add test for ttr + dataframe case

* test: update and reorganise ttr tests

* refactor: make triadaptive model handle shapes

* refactor: remove duplicate reshaping

* refactor: rename test with duplicate name

* fix: add device assignment back to TTR

* fix: remove duplicated vars in test

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-02-09 11:38:16 +01:00
bogdankostic
986472c26f
feat: Add BM25 support for tables in InMemoryDocumentStore (#4090)
* Add BM25 support for tables in InMemoryDocumentStore

* Add table type to query method

* Fix import order

* Adapt tests
2023-02-09 10:47:35 +01:00
Mayank Jobanputra
93962c09fc
fix: fix torchaudio version (#4102)
* fix torchaudio version

* added comment for keeping torchaudio last

* removed torchaudio from base
2023-02-09 15:14:10 +05:30
oryx1729
8ecadd1cac
fix: query filters in REST API (#4105)
* Remove legacy _format_filters()

* Remove test case
2023-02-09 10:42:31 +01:00
Bijay Gurung
79f57d8460
Proposal: Add a JsonConverter node (#3959)
* Add Proposal: JsonConverter

* Add jsonl support + schema to JsonConverter Proposal

* Remove format option from JsonConverter Proposal

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-02-09 09:57:00 +01:00
hsm207
508d9f6b32
feat: add support for custom headers (#4040)
2023-02-09 07:08:40 +01:00
Silvano Cerza
adf4a3ea2f
Fix pylint CI check running with no files (#4097)
2023-02-08 16:33:07 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
Sebastian
1bbf10a376
Remove double batching in retrieve_batch (#4014)
* Removed double batching around embed_queries

* Add back tests for retrieve_batch for dpr and embedding retrievers

* Updated table-text-retriever to not double batch

* Fixing pylint

* Update to test

* Remove code breaking test

* Updating dev comment to be clearer
2023-02-08 14:39:20 +01:00
Silvano Cerza
c66f855caf
Add missing env vars in rest_api CI tests (#4098)
2023-02-08 12:48:20 +01:00
Sebastian
01d39df863
feat: Update allowed models to be used with Prompt Node (#4018)
* Update allowed models to be used with Prompt Node

* Added try except block around the config to skip over OpenAI models.

* Fixing tests

* Adding warning message

* Adding test for different HF models that could be used in prompt node
2023-02-08 12:47:52 +01:00
Agnieszka Marzec
8135e75139
Add shaper to api docs (#4083)
2023-02-08 12:15:08 +01:00
Stefano Fiorucci
5c009c2a1a
feat: OpenAI - warn users if max_tokens is too short (#4094)
* warn users if max_tokens is too short

* skip test if no API key

* add counters

* correctly run precommit
2023-02-08 10:39:40 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00