2358 Commits

Author SHA1 Message Date
Agnieszka Marzec
e16f1c8935
Docs: Add filter to hide entity post processor (#4160)
* Add filter to hide entity post processor

* Add missing space
2023-02-16 16:40:42 +01:00
Silvano Cerza
689f2cd250
Update docstring-labeler.yml workflow to safely run in PRs from forks (#4146) 2023-02-16 16:02:41 +01:00
Mayank Jobanputra
d27f372b67
build: cache nltk models into the docker image (#4118)
* separated nltk cache

* separated nltk caching

* fixed pylint lazy log error

* using model name as default value
2023-02-16 16:56:16 +05:30
Massimiliano Pippi
ec72dd73fc
refactor: complete the document stores test refactoring (#4125)
* add e2e tests

* move tests to their own module

* add e2e workflow

* pylint

* remove from job

* fix index field name

* skip test on sql

* removed unused code

* fix embedding tests

* adjust test for pinecone

* adjust assertions to the new documents

* bad copypasta

* test

* fix tests

* fix tests

* fix test

* fix tests

* pylint

* update milvus version

* remove debug

* move graphdb tests under e2e
2023-02-16 09:43:25 +01:00
Sebastian
9a26942952
feat: Add model_kwargs option to PromptNode (#4151)
* Add input option to PromptNode to allow the passing of default kwargs

* Add yaml test for model_kwargs parameter
2023-02-15 18:46:26 +01:00
Stefano Fiorucci
24405f851c
refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors (#4113)
* refactoring and test

* try to replace error with warning

* more expressive and robust get_scores methods

* make get_scores methods internal
2023-02-14 17:43:11 +01:00
Silvano Cerza
d86a511cc1
Fix Docker images test on release (#4153) 2023-02-14 14:18:49 +01:00
bogdankostic
4a88fae1e7
Update annotation tool readme (#4123) 2023-02-14 09:53:27 +01:00
Sebastian
75ef959678
feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode (#4038)
* added instruction_prompt and update defaults

* Change back max_tokens

* Code formatting

* Starting to update instruction_prompt to be a PromptTemplate

* Using PromptTemplate in OpenAIAnswerGenerator

* Removed hardcoded value

* pylint and make examples and examples_context optional prompt parameters

* Added new test for when prompt length goes past max token limit

* Improve doc strings.

* Make "text-davinci-003" the new default model

* Renaming variable to prompt_template and name to question-answering-with-examples

* Reduced repetitive code.

* Added some comments to explain key logic for future debuggers

* Update docs for max_tokens and increase default

* Updating variable name to prompt_template and docs.

* Updated test and handled Answer case where no documents are used.

* Slight update to docs.

* Adding more doc strings

* lg updates

* Blackify

---------

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-12 00:08:07 +01:00
Silvano Cerza
3cdfe9ca40
Revert changes introduced in PR #4124 (#4137) 2023-02-10 17:54:20 +01:00
Silvano Cerza
d9a7e8011f
Add load arg to docker/bake-action before testing Docker images (#4124) 2023-02-10 17:41:27 +01:00
bogdankostic
27aaa92800
docs: Remove some classes regarding PromptNode from API reference docs (#4132) 2023-02-10 15:56:38 +01:00
Vladimir Blagojevic
d839b9314f
Update PromptTemplate tests (#4131) 2023-02-10 15:24:01 +01:00
bogdankostic
05950719ba
fix: Deduplicate same Documents in isolated evaluation of Reader (#4114)
* Deduplicate same Documents in one MultiLabel

* Add tests

* Update label

* Update label

* Update test

* Update test

* Revert change to check CI

* Revert reversion

* Use deepcopy

* Update tests
2023-02-10 13:55:14 +01:00
Agnieszka Marzec
3c793e4edc
Docs: Update docstrings (#4119)
* Update docstrings

* Blackify

* Bring back the template wording

* Blackify
2023-02-10 11:51:51 +01:00
Silvano Cerza
2cc938ff90
ci: Add workflow to label PRs that edit docstrings (#4115)
* Add workflow to label PRs that edit docstrings

* Add python-version arg in setup-python steps

* Run workflow only in haystack and rest_api python files edit

* Fix labeling job

* Fix labeling conditional

* Fix files globbing in docstrings_checksum.py

* Fix typing

* Rework workflow to use a single job
2023-02-09 18:57:30 +01:00
Silvano Cerza
0b23f84205
Exclude .github folder from triggering tests in CI (#4120) 2023-02-09 18:07:27 +01:00
Jack Butler
e6b6f70ae2
fix: Fix TableTextRetriever for input consisting of tables only (#4048)
* fix: update kwargs for TriAdaptiveModel

* fix: squeeze batch for TTR inference

* test: add test for ttr + dataframe case

* test: update and reorganise ttr tests

* refactor: make triadaptive model handle shapes

* refactor: remove duplicate reshaping

* refactor: rename test with duplicate name

* fix: add device assignment back to TTR

* fix: remove duplicated vars in test

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-02-09 11:38:16 +01:00
bogdankostic
986472c26f
feat: Add BM25 support for tables in InMemoryDocumentStore (#4090)
* Add BM25 support for tables in InMemoryDocumentStore

* Add table type to query method

* Fix import order

* Adapt tests
2023-02-09 10:47:35 +01:00
Mayank Jobanputra
93962c09fc
fix: fix torchaudio version (#4102)
* fix torchaudio version

* added comment for keeping torchaudio last

* removed torchaudio from base
2023-02-09 15:14:10 +05:30
oryx1729
8ecadd1cac
fix: query filters in REST API (#4105)
* Remove legacy _format_filters()

* Remove test case
2023-02-09 10:42:31 +01:00
Bijay Gurung
79f57d8460
Proposal: Add a JsonConverter node (#3959)
* Add Proposal: JsonConverter

* Add jsonl support + schema to JsonConverter Proposal

* Remove format option from JsonConverter Proposal

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-02-09 09:57:00 +01:00
hsm207
508d9f6b32
feat: add support for custom headers (#4040) 2023-02-09 07:08:40 +01:00
Silvano Cerza
adf4a3ea2f
Fix pylint CI check running with no files (#4097) 2023-02-08 16:33:07 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
Sebastian
1bbf10a376
Remove double batching in retrieve_batch (#4014)
* Removed double batching around embed_queries

* Add back tests for retrieve_batch for dpr and embedding retrievers

* Updated table-text-retriever to not double batch

* Fixing pylint

* Update to test

* Remove code breaking test

* Updating dev comment to be clearer
2023-02-08 14:39:20 +01:00
Silvano Cerza
c66f855caf
Add missing env vars in rest_api CI tests (#4098) 2023-02-08 12:48:20 +01:00
Sebastian
01d39df863
feat: Update allowed models to be used with Prompt Node (#4018)
* Update allowed models to be used with Prompt Node

* Added try except block around the config to skip over OpenAI models.

* Fixing tests

* Adding warning message

* Adding test for different HF models that could be used in prompt node
2023-02-08 12:47:52 +01:00
Agnieszka Marzec
8135e75139
Add shaper to api docs (#4083) 2023-02-08 12:15:08 +01:00
Stefano Fiorucci
5c009c2a1a
feat: OpenAI - warn users if max_tokens is too short (#4094)
* warn users if max_tokens is too short

* skip test if not API KEY

* add counters

* correctly run precommit
2023-02-08 10:39:40 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
Silvano Cerza
5689c43e7e
ci: Make tests run conditionally in CI (#4086)
* Make tests run conditionally in CI

* Move rest_api test into separate workflow

* Avoid running tests.yml when rest_api is modified
2023-02-07 21:16:56 +01:00
Zoltan Fedor
a3016f065f
feat: Support multiple RayPipelines (#4078) 2023-02-07 11:01:07 +01:00
Silvano Cerza
3e4a2201df
ci: Change actionlint pre-commit hook to use Dockerized tool (#4060)
* Change actionlint pre-commit hook to use Dockerized tool

* Add ignore rule for actionlint

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-07 09:34:25 +01:00
Julian Risch
0e282e5ca4
refactor: replace mutable default arguments (#4070)
* refactor: replace mutable default arguments

* change type annotation in BasePreProcessor to Optional[List]
2023-02-07 09:30:33 +01:00
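Note on #4070: the pitfall behind "replace mutable default arguments" can be shown in plain Python. This is a generic sketch of the pattern, not code taken from the commit. The function names `append_bad` and `append_good` are hypothetical. The `Optional[List]` annotation mirrors the BasePreProcessor change described above.

```python
from typing import List, Optional


def append_bad(item: str, items: List[str] = []) -> List[str]:
    # Pitfall: the default list is created once, when the function is defined,
    # and is shared by every call that omits `items`.
    items.append(item)
    return items


def append_good(item: str, items: Optional[List[str]] = None) -> List[str]:
    # Fix: default to None (hence Optional[List]) and build a fresh list per call.
    if items is None:
        items = []
    items.append(item)
    return items


first = append_bad("a")
second = append_bad("b")
print(first is second)  # both calls returned the same shared list object
print(append_good("a"), append_good("b"))  # independent lists
```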
Vladimir Blagojevic
3273a2714d
fix: Add PromptTemplate __repr__ method (#4058)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-02-07 08:14:32 +01:00
Sebastian
a9f13d4641
feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever (#4026)
* Add additional options to pass to the SentenceTransformers trainer

* Make options accessible to the EmbeddingRetriever.train

* Update file-converters.yml

* Update transformers-img-to-text.yml

* Update 3550-csv-converter.md

* move type: ignore to correct line

* Moving type ignore again

* Fixing pylint and mypy

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Updated docstring to be less misleading.

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-02-07 08:05:21 +01:00
Silvano Cerza
bcf3bfdf79
Fix pylint workflow check running on tests files (#4076) 2023-02-06 19:41:36 +01:00
Julian Risch
51f30487e1
fix: add inner query for mysql compatibility (#4068) 2023-02-06 18:18:25 +01:00
Silvano Cerza
9cd94f3dc3
ci: Move formatting and linting checks out of tests.yml (#4046)
* Move formatting and linting checks out of tests.yml

* Revert "Move formatting and linting checks out of tests.yml"

This reverts commit b88b54b7e6404ce10401f308770348465e44b4fc.

* Move pylint and mypy out of tests.yml

* Fix black version

* Handle skipped but required checks
2023-02-06 16:47:48 +01:00
Zoltan Fedor
f4a30a552a
fix: use correct count of outgoing edges in RayPipeline (#4066) 2023-02-06 10:52:32 +01:00
Julian Risch
d819d6badf
proposal: Add Agents for extended LLM support (#3925)
* draft proposal

* add link to colab notebook (api keys required)

* Add alternative name ideas for MRKLAgent

* Breakdown of agent steps

* Added more sections

* Add even more sections

* simplify tool/action mentions, shorten

* agents as new abstraction instead of BaseComponent

* agent tools can be pipelines or nodes

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-02-06 09:47:10 +01:00
Massimiliano Pippi
5e65905659
fix workflow (#4055) 2023-02-06 08:40:13 +01:00
Stefano Fiorucci
b9ab7b3ca2
fix: make the crawler more robust on Windows (#4049)
* first try

* simplify the code a bit

* fix; better docstrings

* add URL
2023-02-03 16:43:18 +01:00
ZanSara
76db26f228
logging-format-interpolation (#3907) 2023-02-03 13:30:56 +01:00
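Note on #3907 and the related "fixed pylint lazy log error" bullet in #4118: pylint's `logging-format-interpolation` check flags log calls that pre-format the message with `str.format()`. The suggested alternative is to pass the arguments to the logger so formatting happens lazily. The sketch below is a generic illustration, not code from the commit. The `Expensive` class and the `"example"` logger name are hypothetical. It counts how often the object's `__str__` runs when DEBUG output is disabled.

```python
import logging

logger = logging.getLogger("example")
logger.setLevel(logging.INFO)  # DEBUG records are disabled for this logger


class Expensive:
    """Hypothetical object whose string conversion we want to avoid."""

    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive"


obj = Expensive()
# Eager: str.format() builds the message (calling __str__) before logging
# decides the DEBUG record will be dropped.
logger.debug("eager: {}".format(obj))
# Lazy: the arguments are only interpolated if the record is emitted,
# so __str__ is never called here.
logger.debug("lazy: %s", obj)
print(Expensive.calls)
```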
Massimiliano Pippi
8824f3a10a
re-organize pydoc config files (#4042) 2023-02-03 12:51:10 +01:00
Jack Butler
f006eded7d
fix: allow Biadaptive & Triadaptive to work with EarlyStopping (#4033)
* fix: allow str when saving tri/bi-adaptive models

* fix: make trainer model loading class-agnostic

* test: add test for DPR with EarlyStopping

* refactor: simplify model reloading via classmethod

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-02-03 11:13:18 +01:00
Silvano Cerza
a092eac2c7
Add missing env var in PyPi release slack notification (#4052) 2023-02-03 11:03:01 +01:00
Silvano Cerza
6a9cb8651b
Fix pylint version to prevent crash (#4043) 2023-02-02 17:57:39 +01:00
Massimiliano Pippi
76bb105388
chore: remove unneeded files (#4036)
* remove unneeded files

* readme file should stay
2023-02-02 15:38:56 +01:00