3803 Commits

Author SHA1 Message Date
Stefano Fiorucci
24405f851c
refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors (#4113)
* refactoring and test

* try to replace error with warning

* more expressive and robust get_scores methods

* make get_scores methods internal
2023-02-14 17:43:11 +01:00
Silvano Cerza
d86a511cc1
Fix Docker images test on release (#4153) 2023-02-14 14:18:49 +01:00
bogdankostic
4a88fae1e7
Update annotation tool readme (#4123) 2023-02-14 09:53:27 +01:00
Sebastian
75ef959678
feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode (#4038)
* added instruction_prompt and update defaults

* Change back max_tokens

* Code formatting

* Starting to update instruction_prompt to be a PromptTemplate

* Using PromptTemplate in OpenAIAnswerGenerator

* Removed hardcoded value

* pylint and make examples and examples_context optional prompt parameters

* Added new test for when prompt length goes past max token limit

* Improve doc strings.

* Make "text-davinci-003" the new default model

* Renaming variable to prompt_template and name to question-answering-with-examples

* Reduced repetitive code.

* Added some comments to explain key logic for future debuggers

* Update docs for max_tokens and increase default

* Updating variable name to prompt_template and docs.

* Updated test and handled Answer case where no documents are used.

* Slight update to docs.

* Adding more doc strings

* lg updates

* Blackify

---------

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-12 00:08:07 +01:00
Silvano Cerza
3cdfe9ca40
Revert changes introduced in PR #4124 (#4137) 2023-02-10 17:54:20 +01:00
Silvano Cerza
d9a7e8011f
Add load arg to docker/bake-action before testing Docker images (#4124) 2023-02-10 17:41:27 +01:00
bogdankostic
27aaa92800
docs: Remove some classes regarding PromptNode from API reference docs (#4132) 2023-02-10 15:56:38 +01:00
Vladimir Blagojevic
d839b9314f
Update PromptTemplate tests (#4131) 2023-02-10 15:24:01 +01:00
bogdankostic
05950719ba
fix: Deduplicate same Documents in isolated evaluation of Reader (#4114)
* Deduplicate same Documents in one MultiLabel

* Add tests

* Update label

* Update label

* Update test

* Update test

* Revert change to check CI

* Revert reversion

* Use deepcopy

* Update tests
2023-02-10 13:55:14 +01:00
Agnieszka Marzec
3c793e4edc
Docs: Update docstrings (#4119)
* Update docstrings

* Blackify

* Bring back the template wording

* Blackify
2023-02-10 11:51:51 +01:00
Silvano Cerza
2cc938ff90
ci: Add workflow to label PRs that edit docstrings (#4115)
* Add workflow to label PRs that edit docstrings

* Add python-version arg in setup-python steps

* Run workflow only in haystack and rest_api python files edit

* Fix labeling job

* Fix labeling conditional

* Fix files globbing in docstrings_checksum.py

* Fix typing

* Rework workflow to use a single job
2023-02-09 18:57:30 +01:00
Silvano Cerza
0b23f84205
Exclude .github folder from triggering tests in CI (#4120) 2023-02-09 18:07:27 +01:00
Jack Butler
e6b6f70ae2
fix: Fix TableTextRetriever for input consisting of tables only (#4048)
* fix: update kwargs for TriAdaptiveModel

* fix: squeeze batch for TTR inference

* test: add test for ttr + dataframe case

* test: update and reorganise ttr tests

* refactor: make triadaptive model handle shapes

* refactor: remove duplicate reshaping

* refactor: rename test with duplicate name

* fix: add device assignment back to TTR

* fix: remove duplicated vars in test

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-02-09 11:38:16 +01:00
bogdankostic
986472c26f
feat: Add BM25 support for tables in InMemoryDocumentStore (#4090)
* Add BM25 support for tables in InMemoryDocumentStore

* Add table type to query method

* Fix import order

* Adapt tests
2023-02-09 10:47:35 +01:00
Mayank Jobanputra
93962c09fc
fix: fix torchaudio version (#4102)
* fix torchaudio version

* added comment for keeping torchaudio last

* removed torchaudio from base
2023-02-09 15:14:10 +05:30
oryx1729
8ecadd1cac
fix: query filters in REST API (#4105)
* Remove legacy _format_filters()

* Remove test case
2023-02-09 10:42:31 +01:00
Bijay Gurung
79f57d8460
Proposal: Add a JsonConverter node (#3959)
* Add Proposal: JsonConverter

* Add jsonl support + schema to JsonConverter Proposal

* Remove format option from JsonConverter Proposal

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-02-09 09:57:00 +01:00
hsm207
508d9f6b32
feat: add support for custom headers (#4040) 2023-02-09 07:08:40 +01:00
Silvano Cerza
adf4a3ea2f
Fix pylint CI check running with no files (#4097) 2023-02-08 16:33:07 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
Sebastian
1bbf10a376
Remove double batching in retrieve_batch (#4014)
* Removed double batching around embed_queries

* Add back tests for retrieve_batch for dpr and embedding retrievers

* Updated table-text-retriever to not double batch

* Fixing pylint

* Update to test

* Remove code breaking test

* Updating dev comment to be clearer
2023-02-08 14:39:20 +01:00
Silvano Cerza
c66f855caf
Add missing env vars in rest_api CI tests (#4098) 2023-02-08 12:48:20 +01:00
Sebastian
01d39df863
feat: Update allowed models to be used with Prompt Node (#4018)
* Update allowed models to be used with Prompt Node

* Added try except block around the config to skip over OpenAI models.

* Fixing tests

* Adding warning message

* Adding test for different HF models that could be used in prompt node
2023-02-08 12:47:52 +01:00
Agnieszka Marzec
8135e75139
Add shaper to api docs (#4083) 2023-02-08 12:15:08 +01:00
Stefano Fiorucci
5c009c2a1a
feat: OpenAI - warn users if max_tokens is too short (#4094)
* warn users if max_tokens is too short

* skip test if not API KEY

* add counters

* correctly run precommit
2023-02-08 10:39:40 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
Silvano Cerza
5689c43e7e
ci: Make tests run conditionally in CI (#4086)
* Make tests run conditionally in CI

* Move rest_api test into separate workflow

* Avoid running tests.yml when rest_api is modified
2023-02-07 21:16:56 +01:00
Zoltan Fedor
a3016f065f
feat: Support multiple RayPipelines (#4078) 2023-02-07 11:01:07 +01:00
Silvano Cerza
3e4a2201df
ci: Change actionlint pre-commit hook to use Dockerized tool (#4060)
* Change actionlint pre-commit hook to use Dockerized tool

* Add ignore rule for actionlint

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-07 09:34:25 +01:00
Julian Risch
0e282e5ca4
refactor: replace mutable default arguments (#4070)
* refactor: replace mutable default arguments

* change type annotation in BasePreProcessor to Optional[List]
2023-02-07 09:30:33 +01:00
Vladimir Blagojevic
3273a2714d
fix: Add PromptTemplate __repr__ method (#4058)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-02-07 08:14:32 +01:00
Sebastian
a9f13d4641
feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever (#4026)
* Add additional options to pass to the SentenceTransformers trainer

* Make options accessible to the EmbeddingRetriever.train

* Update file-converters.yml

* Update transformers-img-to-text.yml

* Update 3550-csv-converter.md

* move type: ignore to correct line

* Moving type ignore again

* Fixing pylint and mypy

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/retriever/_embedding_encoder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Updated docstring to be less misleading.

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-02-07 08:05:21 +01:00
Silvano Cerza
bcf3bfdf79
Fix pylint workflow check running on tests files (#4076) 2023-02-06 19:41:36 +01:00
Julian Risch
51f30487e1
fix: add inner query for mysql compatibility (#4068) 2023-02-06 18:18:25 +01:00
Silvano Cerza
9cd94f3dc3
ci: Move formatting and linting checks out of tests.yml (#4046)
* Move formatting and linting checks out of tests.yml

* Revert "Move formatting and linting checks out of tests.yml"

This reverts commit b88b54b7e6404ce10401f308770348465e44b4fc.

* Move pylint and mypy out of tests.yml

* Fix black version

* Handle skipped but required checks
2023-02-06 16:47:48 +01:00
Zoltan Fedor
f4a30a552a
fix: use correct count of outgoing edges in RayPipeline (#4066) 2023-02-06 10:52:32 +01:00
Julian Risch
d819d6badf
proposal: Add Agents for extended LLM support (#3925)
* draft proposal

* add link to colab notebook (api keys required)

* Add alternative name ideas for MRKLAgent

* Breakdown of agent steps

* Added more sections

* Add even more sections

* simplify tool/action mentions, shorten

* agents as new abstraction instead of BaseComponent

* agent tools can be pipelines or nodes

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-02-06 09:47:10 +01:00
Massimiliano Pippi
5e65905659
fix workflow (#4055) 2023-02-06 08:40:13 +01:00
Stefano Fiorucci
b9ab7b3ca2
fix: make the crawler more robust on Windows (#4049)
* first try

* simplify the code a bit

* fix; better docstrings

* add URL
2023-02-03 16:43:18 +01:00
ZanSara
76db26f228
logging-format-interpolation (#3907) 2023-02-03 13:30:56 +01:00
Massimiliano Pippi
8824f3a10a
re-organize pydoc config files (#4042) 2023-02-03 12:51:10 +01:00
Jack Butler
f006eded7d
fix: allow Biadaptive & Triadaptive to work with EarlyStopping (#4033)
* fix: allow str when saving tri/bi-adaptive models

* fix: make trainer model loading class-agnostic

* test: add test for DPR with EarlyStopping

* refactor: simplify model reloading via classmethod

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-02-03 11:13:18 +01:00
Silvano Cerza
a092eac2c7
Add missing env var in PyPi release slack notification (#4052) 2023-02-03 11:03:01 +01:00
Silvano Cerza
6a9cb8651b
Fix pylint version to prevent crash (#4043) 2023-02-02 17:57:39 +01:00
Massimiliano Pippi
76bb105388
chore: remove unneeded files (#4036)
* remove unneeded files

* readme file should stay
2023-02-02 15:38:56 +01:00
tstadel
9611b64ec5
fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885)
* fix document retrieval metrics for all document_relevance_criteria

* fix tests

* fix eval_batch metrics

* small refactorings

* evaluate metrics on label level

* document retrieval tests added

* fix pylint

* fix test

* support file retrieval

* add comment about threshold

* rename test
2023-02-02 15:00:07 +01:00
Silvano Cerza
e62d24d0eb
ci: Add linting of workflow and related pre-commit hook (#4032)
* Add actionlint pre-commit hook

* Add workflow to lint workflows

* Remove unused input in Python Cache action

* Move from deprecated set-output syntax to new one

* Add actionlint config to specify self-hosted runners labels
2023-02-02 14:33:23 +01:00
Massimiliano Pippi
2878c57645
Update pyproject.toml (#4035) 2023-02-02 11:59:17 +01:00
Silvano Cerza
d79d39b28a
Bump act10ns/slack from v1 to v2 (#4031) 2023-02-02 09:39:36 +01:00
Silvano Cerza
938cb62144
Fix PyPi release workflow (#4029) 2023-02-02 09:36:23 +01:00