1204 Commits

Author SHA1 Message Date
Silvano Cerza
4a93517eb4
test: Fix deprecation fixture (#4219)
* Fix deprecation fixture

* Update docstring

* Update docstring

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-02-27 09:55:03 +01:00
ZanSara
13c4ff1b52
refactor: remove direct logging without a logger (#4253)
* remove direct logging without a logger

* add custom pylint checker

* add test

* pylint

* improve checker message

* mypy

* remove test

* add checker for basicConfig

* more logging missed

* ignore basicConfig

* move out logger

* move out statement

* remove logging configuration
2023-02-23 20:42:42 +01:00
Stefano Fiorucci
5e85f33bd3
refactor: Remove deprecated nodes EvalDocuments and EvalAnswers (#4194)
* remove deprecated classes and update test

* remove deprecated classes and update test

* remove unused code

* remove unused import

* remove empty evaluator node

* unused import :-)

* move sas to metrics
2023-02-23 15:26:17 +01:00
Massimiliano Pippi
722dead1b2
fix agents tests (#4237)
2023-02-23 13:03:45 +01:00
Massimiliano Pippi
764eaa035f
skip summarizer tests to reduce pressure (#4241)
2023-02-23 09:50:24 +01:00
ZanSara
f816efa50c
feat: reduce and focus telemetry (#4087)
* simplified telemetry and docker containers detection

* pylint

* mypy

* mypy

* Add new credentials and metadata

* remove prints

* mypy

* remove comment

* simplify inout len measurement

* black

* removed old telemetry, to revert

* reintroduce env function

* reintroduce old telemetry

* fix telemetry selection

* telemetry for promptnode

* telemetry for some training methods

* telemetry for eval and distillation

* mypy & pylint

* review

* Update lg

* mypy

* improve docstrings

* pylint

* mypy

* fix test

* linting

* remove old tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-22 19:02:47 +01:00
Daniel Bichuetti
e0b0fe1bc3
feat!: Increase Crawler standardization regarding Pipelines (#4122)
* feat!(Crawler): Integrate Crawler in the Pipeline.

+Output Documents
+Optional file saving
+Optional Document meta about file path

* refactor: add Optional decl.

* chore: dummy commit

* chore: dummy commit

* refactor: improve overwrite flow

* refactor: change custom file path meta logic + add test

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-22 17:34:19 +01:00
tstadel
32b2abf9d5
fix: add option to not override results by Shaper (#4231)
* add option to shaper and support answers

* remove publish restrictions on outputs

* support list
2023-02-22 14:36:58 +01:00
Massimiliano Pippi
262c9771f4
relax test assertion (#4229)
2023-02-22 12:37:09 +01:00
Massimiliano Pippi
40f772a9b0
refact: move the first batch of unit tests into the proper job (#4216)
* move the first batch of unit tests into the proper job

* leftover
2023-02-21 17:00:02 +01:00
Julian Risch
5ce7a404ac
feat: Add Agent (#4148)
* initial Agent implementation

* mypy and pylint fixes

* add missing ABC import

* improved prompt template

* refactor and shorten run method

* refactor and shorten run method

* add tests for extracting

* fix mixed up tool_input/observation & make tests more robust

* fix bug with max_iterations and update prompt template

* allow setting prompt_template in Agent init

* remove example yml for agent

* add final prediction to transcript

* add transcript to errors and accept PromptTemplate in init

* simplify if else to elif

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* add checks for max_iter<2 and empty list returned by prompt node

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-02-21 14:27:40 +01:00
Sebastian
bde01cbf1f
Checking if output keys and output_values are same length and fix bug in storing output keys (#4223)
2023-02-21 13:36:15 +01:00
Sebastian
2bedb80ba5
Fix for custom template in OpenAIAnswerGenerator (#4220)
2023-02-21 13:35:17 +01:00
Bijay Gurung
d4b822646e
feat: Add JsonConverter node (#4130)
* Add JsonConverter node

* Update language

* JsonConverter: Remove id_hash_keys overwrite when it's None

Also, changes in docstring based on review

* Update docstring for JsonConverter

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-02-21 09:23:42 +01:00
bogdankostic
18e7b8399b
refactor: Remove id_hash_keys parameter in from_dict method (#4207)
* Remove id_hash_keys parameter in from_dict method

* Remove unused import

* Adapt `from_dict` of `SpeechDocument`

* Revert "Adapt `from_dict` of `SpeechDocument`"

This reverts commit 309cbeb7fbb3094c43be76d9e431db9391913144.

* Adapt `from_dict` of `SpeechDocument`
2023-02-20 17:37:35 +01:00
tstadel
14578aa54f
feat: add top_k to PromptNode (#4159)
* add top_k to PromptNode

* fix OpenAI

* fix openai test
2023-02-20 14:51:45 +01:00
Sebastian
d129598203
Prompt node/run batch (#4072)
* Starting to implement first pass at run_batch

* Started to add _flatten_input function

* First pass at run_batch method.

* Fixed bug

* Adding tests for run_batch

* Update doc strings

* Pylint and mypy

* Pylint

* Fixing mypy

* Restructuring of run_batch tests

* Add minor lg updates

* Adding more tests

* Update dev comments and call static method differently

* Fixed the setting of output variable

* Set output_variable in __init__ of PromptNode

* Make a one-liner

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-20 11:58:13 +01:00
Massimiliano Pippi
83d615a32b
feat: include testing facilities into haystack package (#4182)
2023-02-17 19:38:03 +01:00
bogdankostic
7eeb3e07bf
feat: Add IVF and Product Quantization support for OpenSearchDocumentStore (#3850)
* Add IVF and Product Quantization support for OpenSearchDocumentStore

* Remove unused import statement

* Fix mypy

* Adapt doc strings and error messages to account for PQ

* Adapt validation of indices

* Adapt existing tests

* Fix pylint

* Add tests

* Update lg

* Adapt based on PR review comments

* Fix Pylint

* Adapt based on PR review

* Add request_timeout

* Adapt based on PR review

* Adapt based on PR review

* Adapt tests

* Pin tenacity

* Unpin tenacity

* Adapt based on PR comments

* Add match to tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-17 10:28:36 +01:00
Daniel Bichuetti
5187cc1801
refactor: Remove the pin from the espnet module and fix the audio node tests. (#4128)
* fix: fix audio tests + unbound some dependencies

* fix: update for Python 3.8

* refactor: change numpy assertion

* feat: add voice recog. support on audio tests

* fix: fix var assignment

* chore: dummy commit

* fix: fix sndfile error

* refactor: change skip reason

* refactor: hardcode variable

* refactor: unpin numpy

* fix: pin numpy only for audio
2023-02-16 22:12:17 +05:30
Massimiliano Pippi
ec72dd73fc
refactor: complete the document stores test refactoring (#4125)
* add e2e tests

* move tests to their own module

* add e2e workflow

* pylint

* remove from job

* fix index field name

* skip test on sql

* removed unused code

* fix embedding tests

* adjust test for pinecone

* adjust assertions to the new documents

* bad copypasta

* test

* fix tests

* fix tests

* fix test

* fix tests

* pylint

* update milvus version

* remove debug

* move graphdb tests under e2e
2023-02-16 09:43:25 +01:00
Sebastian
9a26942952
feat: Add model_kwargs option to PromptNode (#4151)
* Add input option to PromptNode to allow the passing of default kwargs

* Add yaml test for model_kwargs parameter
2023-02-15 18:46:26 +01:00
Stefano Fiorucci
24405f851c
refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors (#4113)
* refactoring and test

* try to replace error with warning

* more expressive and robust get_scores methods

* make get_scores methods internal
2023-02-14 17:43:11 +01:00
Sebastian
75ef959678
feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode (#4038)
* added instruction_prompt and update defaults

* Change back max_tokens

* Code formatting

* Starting to update instruction_prompt to be a PromptTemplate

* Using PromptTemplate in OpenAIAnswerGenerator

* Removed hardcoded value

* pylint and make examples and examples_context optional prompt parameters

* Added new test for when prompt length goes past max token limit

* Improve doc strings.

* Make "text-davinci-003" the new default model

* Renaming variable to prompt_template and name to question-answering-with-examples

* Reduced repetitive code.

* Added some comments to explain key logic for future debuggers

* Update docs for max_tokens and increase default

* Updating variable name to prompt_template and docs.

* Updated test and handled Answer case where no documents are used.

* Slight update to docs.

* Adding more doc strings

* lg updates

* Blackify

---------

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-12 00:08:07 +01:00
Vladimir Blagojevic
d839b9314f
Update PromptTemplate tests (#4131)
2023-02-10 15:24:01 +01:00
bogdankostic
05950719ba
fix: Deduplicate same Documents in isolated evaluation of Reader (#4114)
* Deduplicate same Documents in one MultiLabel

* Add tests

* Update label

* Update label

* Update test

* Update test

* Revert change to check CI

* Revert reversion

* Use deepcopy

* Update tests
2023-02-10 13:55:14 +01:00
Jack Butler
e6b6f70ae2
fix: Fix TableTextRetriever for input consisting of tables only (#4048)
* fix: update kwargs for TriAdaptiveModel

* fix: squeeze batch for TTR inference

* test: add test for ttr + dataframe case

* test: update and reorganise ttr tests

* refactor: make triadaptive model handle shapes

* refactor: remove duplicate reshaping

* refactor: rename test with duplicate name

* fix: add device assignment back to TTR

* fix: remove duplicated vars in test

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-02-09 11:38:16 +01:00
bogdankostic
986472c26f
feat: Add BM25 support for tables in InMemoryDocumentStore (#4090)
* Add BM25 support for tables in InMemoryDocumentStore

* Add table type to query method

* Fix import order

* Adapt tests
2023-02-09 10:47:35 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
Sebastian
1bbf10a376
Remove double batching in retrieve_batch (#4014)
* Removed double batching around embed_queries

* Add back tests for retrieve_batch for dpr and embedding retrievers

* Updated table-text-retriever to not double batch

* Fixing pylint

* Update to test

* Remove code breaking test

* Updating dev comment to be clearer
2023-02-08 14:39:20 +01:00
Sebastian
01d39df863
feat: Update allowed models to be used with Prompt Node (#4018)
* Update allowed models to be used with Prompt Node

* Added try except block around the config to skip over OpenAI models.

* Fixing tests

* Adding warning message

* Adding test for different HF models that could be used in prompt node
2023-02-08 12:47:52 +01:00
Stefano Fiorucci
5c009c2a1a
feat: OpenAI - warn users if max_tokens is too short (#4094)
* warn users if max_tokens is too short

* skip test if not API KEY

* add counters

* correctly run precommit
2023-02-08 10:39:40 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
Vladimir Blagojevic
3273a2714d
fix: Add PromptTemplate __repr__ method (#4058)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-02-07 08:14:32 +01:00
Jack Butler
f006eded7d
fix: allow Biadaptive & Triadaptive to work with EarlyStopping (#4033)
* fix: allow str when saving tri/bi-adaptive models

* fix: make trainer model loading class-agnostic

* test: add test for DPR with EarlyStopping

* refactor: simplify model reloading via classmethod

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-02-03 11:13:18 +01:00
tstadel
9611b64ec5
fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885)
* fix document retrieval metrics for all document_relevance_criteria

* fix tests

* fix eval_batch metrics

* small refactorings

* evaluate metrics on label level

* document retrieval tests added

* fix pylint

* fix test

* support file retrieval

* add comment about threshold

* rename test
2023-02-02 15:00:07 +01:00
ZanSara
9009a9ae58
feat: add Shaper (#3880)
* Shaper initial version

* Initial pydoc

* Add more unit tests

* Fix pydoc, expand Shaper pydoc with YAML example

* Minor fix

* Improve pydoc

* More unit tests with prompt node

* Describe Shaper functions in pydoc

* More pydoc

* Use pytest.raises instead of catching errors

* Improve test_function_invocation_order unit test

* pylint fixes

* Improve run_batch handling

* simpler version, initial stub

* stubbing tests

* promptnode compatibility

* add tests

* simplify

* fix promptnode tests

* pylint

* mypy

* fix corner case & mypy

* mypy

* review feedback

* tests

* Add lg updates

* add rename

* pylint

* Add complex unit test with two PNs and ICMs in between (#3921)

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* docstring

* fix tests

* add join_lists

* add documents_to_strings

* fix tests

* allow lists of input values

* doc review feedback

* do not use locals()

* Update with minor lg changes

* fix corner case in ICM

* fix merge

* review feedback

* answers conversions

* mypy

* add tests

* generative answers

* forgot to commit

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-01 18:36:13 +01:00
Zoltan Fedor
2b1849f525
fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 (#3898)
* fix: Add a verbose option to PromptNode to let users understand the prompts being used #2

* Add comments and refactoring todo note

* Fix logging-fstring-interpolation pylint

* Update haystack/nodes/prompt/prompt_node.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-31 09:33:47 +01:00
bogdankostic
1a8fe0031d
feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore (#3969)
* Add `use_prefiltering` parameter

* Adapt doc string

* Pass use_prefiltering via API to dC

* Adapt doc string

* Adapt test
2023-01-30 15:12:34 +01:00
Daniel Bichuetti
3009ac2988
feat: Add page range support to PDF converters. (#3965)
* feat: add start and end page to PDF converters

* docs: add missing docstrings

* refactor: change list set up, add docstrings and comment

* fix: add missing parameter

* tests: add page range basic test

* tests: test correct page numbers

* tests: remove OCR page range test
*Poppler and Tesseract not installed on CI

* fix: remove mobile change error
2023-01-30 14:09:22 +01:00
Sebastian
71de0524de
fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number (#3980)
* Fix the embedding count function of InMemoryDocumentStore

* Adding some doc strings explaining how many docs with embeddings to expect.
2023-01-30 12:38:30 +01:00
hsm207
08ec059b14
refactor: use weaviate client to build BM25 query (#3939)
* refactor: use weaviate client to build BM25 query

* refactor: remove manual BM25 query building

* refactor: apply BM25 to the content_field only

* test: update weaviate BM25 retrieval test case

update to account for lack of stemming

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-30 10:07:07 +01:00
Tuana Celik
93312138de
fix: removing code block in MarkdownConverter (#3960)
* first attempt to add frontmatter of markdown to the metadata

* remove bug fix

* running black and pre-commit

* moving the import line

* adding a test

* adding pydoc

* fix to removing code blocks in markdown converter

* adding a test

* fixing a test

* improving tests

* adding language to code block
2023-01-27 15:25:54 +01:00
Tuana Celik
790e9acd3e
feat: add frontmatter to meta in MarkdownConverter (#3953)
* first attempt to add frontmatter of markdown to the metadata

* remove bug fix

* running black and pre-commit

* moving the import line

* adding a test

* adding pydoc
2023-01-26 17:15:02 +01:00
Massimiliano Pippi
52b195faf6
increase the timeout for testing (#3957)
2023-01-26 16:04:43 +01:00
Vladimir Blagojevic
ec85207cf7
Remove __eq__ and __hash__ from PromptNode (#3923)
2023-01-26 13:38:35 +01:00
Vladimir Blagojevic
b945eaeabd
PromptNode: expose output_variable, adjust unit tests (#3892)
2023-01-26 11:01:11 +01:00
ZanSara
0e471d5e5a
fix: change model in distillation test (#3944)
* change model

* change layer count

* move promptnode tests in integration

* fix marker
2023-01-25 23:32:11 +05:30
Mayank Jobanputra
5c53b2bd4a
feat: adding secure loading of models by default for haystack (#3901)
* adding secure loading of models by default

* simplified set function

* testing import effect correctly

* added appropriate log line, adapted the test

* change log string formatting

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* remove extra closing bracket )

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-24 23:01:20 +05:30
Vladimir Blagojevic
4d8b1d0b22
refactor: Improve stop_words handling, add unit test cases (#3918)
* Improve stop_words handling, add unit test cases

* Update test/nodes/test_prompt_node.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-01-24 12:52:41 +01:00