1524 Commits

Author SHA1 Message Date
Vladimir Blagojevic
4c9843017c
feat: Add agent memory (#4829) 2023-05-15 18:08:44 +02:00
Ben Heckmann
099d0deb86
fix: Dynamic max_answers for SquadProcessor (fixes IndexError when max_answers is less than the number of answers in the dataset) (#4817)
* #4320 implemented dynamic max_answers for SquadProcessor, fixed IndexError when max_answers is less than the number of answers in the dataset

* #4320 added two unit tests for dataset_from_dicts testing default and manual max_answers

* apply suggestions from code review

Co-authored-by: bogdankostic <bogdankostic@web.de>

* simplify comment, fix mypy & pylint errors, fix old test

* adjust max_answers to each dataset individually

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-05-15 14:34:23 +02:00
ZanSara
8fbfca9ebb
fix: Document v2 JSON serialization (#4863)
* fix json serialization

* add missing markers

* pylint

* fix decoder bug

* pylint

* add some more tests

* linting & windows

* windows

* windows

* windows paths again
2023-05-15 11:39:04 +02:00
ZanSara
bffe2d8c19
add base test class (#4908) 2023-05-15 10:36:55 +02:00
Farzad E
6eb251d1f0
fix: Support for gpt-4-32k (#4825)
* Add step to loook up tokenizers by prefix in openai_utils

* Updated tiktoken min version + openai_utils test

* Added test case for GPT-4 and Azure model naming

* Broken down tests

* Added default case

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-05-12 19:02:12 +02:00
Vladimir Blagojevic
73380b194a
feat: Add Cohere PromptNode invocation layer (#4827)
* Add CohereInvocationLayer
---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-05-12 17:50:09 +02:00
ZanSara
618699eb52
fix: improve Document comparison (v2) (#4860)
* don't compare on content directly, use id as proxy

* stray change

* add more tests

* fix tests

* pylint

* black

* review feedback

* fix tests
2023-05-11 18:28:56 +02:00
Silvano Cerza
98947e4c3c
feat: Add Anthropic invocation layer (#4818)
* feat: Add Anthropic Claude Invocation Layer

* feat: Add AnthropicClaude Invocation Layer

* fix: Permission changes

* fix: Permission changes

* Move anthropic utils in anthropic invocation layer file

* Rework method to post data

* Simplify invoke

* Simplify supports classmethod

* Remove unnecessary functions

* Use always same tokenizer

* Add module import

* Rename some members and kwargs

* Add tests

* Fix _post not handling HTTPError

* Fix handling of streamed response

* Fix kwargs handling

* Update tests

* Update supports to be generic

* Fix failing test

* Use correct tokenizer and fix tests

* Update lg

* Fix mypy issue

* Move requests-cache from dev to base dependencies

* Fix failing test

* Handle all stop words use cases

---------

Co-authored-by: recrudesce <recrudesce@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-05-11 10:14:33 +02:00
ZanSara
3a6db68408
feat: allow filtering documents on all fields (v2) (#4773)
* extend tests

* remove stray test

* pylint

* mypy

* review feedback

* fix tests

* fix last tests

* remove comment

* remove print statement

* pylint

* add flatten test

* remove direct acces/ direct write in docstore tests

* fix tests
2023-05-10 16:33:47 +02:00
Sebastian
eff420cce0
test: Update unit tests for schema (#4835)
* Updated text_label tests to match tabel_label tests. Also added answer text as part of the Answer.__eq__ comparison.

* Updated text document unit tests to match ones from table docs

* Converting text answer unit tests to match table answer

* Update some document tests

* Minor update

* Separating unit tests
2023-05-10 16:16:45 +02:00
ZanSara
9cb153d0f4
fix: add unit markers to several v2 tests (#4851)
* add markers

* remove stray marker
2023-05-10 13:46:13 +02:00
Silvano Cerza
f12e5a0127
fix: Fix missing error in openai_request retry strategy (#4802)
* Fix missing error in openai_request retry strategy

* Correctly handle OpenAIUnauthorizedError

Co-authored-by: bogdankostic <bogdankostic@web.de>

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-05-10 10:31:07 +02:00
ZanSara
c734c58b4b
skip flaky test (#4846) 2023-05-09 20:26:59 +02:00
Sebastian
707f1c3546
Add modeling to unit tests so it we can get coverage for that (#4809)
* Add modeling to unit tests so it we can get coverage for that

* fix unit tests

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-08 19:05:21 +02:00
bogdankostic
5b2ef2afd6
Revert "refactor!: Deprecate name param in PromptTemplate and introduce template_name instead (#4810)" (#4834)
This reverts commit f660f41c0615e6b3064ef3e321f1e5a295fafc1b.
2023-05-08 11:31:04 +02:00
ZanSara
6e982e9283
fix: preserve root_node in JoinNode's output (#4820)
* preserve root_node and add tests

* Added if statement to fix failing tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2023-05-08 10:17:36 +02:00
bogdankostic
f660f41c06
refactor!: Deprecate name param in PromptTemplate and introduce template_name instead (#4810)
* Deprecate name parameter

* Adapt existing tests and uses of PromptTemplate

* Move parameter `name` to end

* Adapt existing tests

* lg update

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-08 10:12:29 +02:00
Silvano Cerza
705a2c025f
Update preview Pipelines following Canals changes (#4821) 2023-05-05 19:47:32 +02:00
bogdankostic
43509c88bf
fix: Add support for _split_overlap meta to Pinecone and dict metadata in general to Weaviate (#4805)
* Add support for dicts to Weaviate

* Add support for _split_overlap to Pinecone

* Add tests

* Fix Pylint

* Fix Pylint

* Fix test

* Implement PR feedback
2023-05-05 11:20:21 +02:00
Vladimir Blagojevic
8091ced8d5
refactor: Extract ToolsManager, add it to Agent by composition (#4794)
* Extract ToolsManager, add it to Agent by the composition
* PR feedback Massi
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-03 16:45:40 +02:00
Sebastian
a67ca289db
refactor: Update schema objects to handle Dataframes in to_{dict,json} and from_{dict,json} (#4747)
* Adding support for table Documents when serializing Labels in Haystack

* Fix table label equality test

* Add serialization support and __eq__ support for table answers

* Made convenience functions for converting dataframes. Added some TODOs. Epxanded schema tests for table labels. Updated Multilabel to not convert Dataframes into strings.

* get Answer and Label to_json working with DataFrame

* Fix from_dict method of Label

* Use Dict and remove unneccessary if check

* Using pydantic instead of builtins for type detection

* Update haystack/schema.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/schema.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/schema.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Separated table label equivalency tests and added pytest.mark.unit


* Added unit test for _dict_factory

* Using more descriptive variable names

* Adding json files to test to_json and from_json functions

* Added sample files for tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-03 09:42:07 +02:00
ZanSara
a9ec954c45
bug: fix filtering in MemoryDocumentStore (v2) (#4768)
* fix filtering bug

* pylint

* improve asserts
2023-05-03 09:33:12 +02:00
Pouyan
75ff768c21
Pouyanpi/feat/search engine/providers/google api (#4722)
* feat: implement google api search engine provider

Signed-off-by: Pouyan <prezakhanipr@gmail.com>

---------

Signed-off-by: Pouyan <prezakhanipr@gmail.com>
2023-05-02 17:09:17 +02:00
duffn
479092e3c1
bug: (rest_api) remove full logging of overwritten env variables (#4791)
* bug: (rest_api) remove logging of overwritten env variables

* Update haystack/pipelines/config.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-02 16:48:19 +02:00
Vladimir Blagojevic
1e9f4c1d50
feat: Add HF local runtime token streaming support (#4652)
* Add HF local runtime token streaming support

* Add stream and stream_handler as model kwargs

* Improve HF streaming unit tests
2023-05-02 12:50:20 +02:00
Mayank Jobanputra
dcf3ddddff
Added deprecation tests for seq2seq generator and RAG Generator (#4782) 2023-05-02 13:30:22 +05:30
Mayank Jobanputra
896eb6a2ea
chore: fixed reader loading test for hf-hub starting 0.14.0 (#4607)
* fixed test base for hub 0.13.3

* check if test succeed from branch

* 2nd check if test succeed from branch

* removed dependency changes

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-02 08:22:44 +02:00
ZanSara
b60d9a2cbf
test: move several modeling tests in e2e/ (#4308)
* no dpr test seems worth mocking

* move distillation tests

* pylint

* mypy

* pylint

* move feature_extraction tests as well

* move feature_extraction tests as well

* merge feature extractor suites

* get_language_model tests and adaptive model tests

* duplicate test

* moving fixtures

* mypy

* mypy-again

* trigger

* un-mock integration test

* review feedback

* feedback

* pylint
2023-04-28 17:08:41 +02:00
Vladimir Blagojevic
dcaf3002f1
fix: SentenceTransformersRanker's predict_batch returns wrong number of documents (#4756)
* Fix SentenceTransformersRanker spredict_batch returning wrong number of documents

* Julian's feedback
2023-04-27 15:24:39 +02:00
Vladimir Blagojevic
c9a415ec8d
refactor: Make agent test more robust (#4767)
* Add more examplars to lower test failure rate

* Easier agent run test, more robust, consistently passing
2023-04-27 14:53:15 +02:00
Vladimir Blagojevic
aebc22d27e
Upgrade transformers to 4.28.1 (#4665)
* Upgrade to transformers 4.28.1

* Commenting out failing piece of test

* trailing-whitespace

* Adjust regex for error match - it changed between releases

* Remove RAG tests failing with transformers update
2023-04-27 12:55:21 +02:00
bogdankostic
c7a20d68d2
fix: Add separate query method for OpenSearchDocumentStore (#4764)
* Add separate query method for OpenSearchDocumentStore

* Convert integration test to unit test + add separate tests for OpenSearch
2023-04-26 21:58:33 +02:00
Vladimir Blagojevic
41b6e33f64
Enhance the error logging in PromptTemplate variable resolution (#4730)
* Enhance the error logging in PromptTemplate variable resolution

* Revert change Daria made

* Silvano PR feedback
2023-04-26 18:09:20 +02:00
tstadel
9cbe9e0949
fix: recursion of death while loading PromptTemplate from yaml (#4691)
* fix recursion of death when deserializing prompttemplate

* add test

* set api_key

* fix test

* add generic test

* work in feedback on tests

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-04-26 13:56:51 +02:00
s_teja
d033a086d0
fix: loads local HF Models in PromptNode pipeline (#4670)
* bug: fix load local HF Models in PromptNode pipeline

* Update hugging_face.py

remove duplicate validator

* update: black formatted

* update: update doc string, replace pop with get

* test HFLocalInvocationLayer with local model
2023-04-26 13:10:02 +02:00
ZanSara
1b57b96210
refactor!: extract elasticsearch (#4668)
* extract elasticsearch

* update pyproject.toml

* make more import optional

* move MockBaseRetriever in conftest

* install es in the es integration tests
2023-04-26 10:14:20 +02:00
Sebastian
8d9136bad4
feat: Implementation of Table Cell Proposal (#4616)
* Starting adding support for TableCell

* Update tests to use row and col

* Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables.

* Update eval test to use TableCell

* Added more schema tests for table docs, labels and answers.

* Add boolean to toggle between Span and TableCell

* Add deprecation message

* Test that table answers work as responses in the rest API

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-04-19 13:14:49 +02:00
Silvano Cerza
f13cc751c3
Block requests_cache in unit tests (#4696) 2023-04-18 16:15:26 +02:00
Massimiliano Pippi
0c081f19e2
fix: remove warnings from the more recent Elasticsearch client (#4602)
* clean up the ES instance in a more robust way

* do not sleep, refresh the index instead

* remove client warnings

* fix unit tests

* fix opensearch compatibility

* fix unit tests

* update ES version

* bump elasticsearch-py

* adjust docs

* use recreate_index param

* use same fixture strategy for Opensearch

* Update lg

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-04-18 15:40:17 +02:00
Sebastian
8c4176bdb2
feat: More flexible routing for RouteDocuments node (#4690)
* Added warning messages for documents that are skipped by RouteDocuments. Begun adding support for new option return_remaining and List of List support for metadata value splitting.

* Simplify _split_by_content_type

* Added new unit test and updated _calculate_outgoing_edges

* Added some TODOs and turned assert into raising an error.

* Update logging messages and make new fixture in tests

* Update _split_by_metadata_values to work with return_remaining

* Remove unneeded code

* Documentation

* Add proper support for list of lists

* Fix mypy errors

* Added assert to make mypy happy

* Update haystack/nodes/other/route_documents.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* PR comments

* Remove check for logging level

* make mypy happy

* Update docstring of metadata_values

* Removed duplicate check. Make explicit check for metadata_values

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-04-18 15:18:13 +02:00
ZanSara
b06821b311
refactor: node->component (#4687)
* node->component

* fix tests
2023-04-17 12:20:42 +02:00
Massimiliano Pippi
a03e8335aa
Ignore cross-reference properties when loading documents (#4664)
* drop cross-reference properties

* be more defensive

* fix regression
2023-04-17 10:40:30 +02:00
Silvano Cerza
79727ed31f
Add requests blocker fixture (#4671) 2023-04-14 18:01:30 +02:00
Vladimir Blagojevic
1dcac11133
feat: Add Hugging Face inferencing PromptNode layer (#4641) 2023-04-14 17:59:17 +02:00
Vladimir Blagojevic
1dd6158244
fix: Add model_max_length model_kwargs parameter to HF PromptNode (#4651) 2023-04-14 15:40:42 +02:00
ZanSara
174d80ab41
skip tests (#4654) 2023-04-13 17:56:51 +02:00
Vladimir Blagojevic
e30bc8fe5a
feat: Add GenerationConfig option to PromptNode's HuggingFace invocation layer (#4649) 2023-04-13 12:15:00 +02:00
ZanSara
f2106ab37b
feat: initial implementation of MemoryDocumentStore for new Pipelines (#4447)
* add stub implementation

* reimplementation

* test files

* docstore tests

* tests for document

* better testing

* remove mmh3

* readme

* only store, no retrieval yet

* linting

* review feedback

* initial filters implementation

* working on filters

* linters

* filtering works and is isolated by document store

* simplify filters

* comments

* improve filters matching code

* review feedback

* pylint

* move logic into_create_id

* mypy
2023-04-13 09:36:23 +02:00
ZanSara
ba11d1c2a8
refactor!: extract evaluation and statistical dependencies (#4457)
* try-catch sklearn and scipy

* haystack imports

* linting

* mypy

* try to import baseretriever

* remove typing

* unused import

* remove more typing

* pylint

* isolate sql imports for postgres, which we don't use anyway

* remove stats

* replace expit

* als inmemory

* mypy

* feedback

* docker

* expit

* re-add njit
2023-04-12 15:38:56 +02:00
Fernando Pereira
5d41e60d89
fix: ParsrConverter list element added (#4562)
* fix: list element and mapping logic around it added to ParsrConverter convert step + unit test covering the specific mapping of list content from Parsr's to Haystack's

* Code review changes

* changed the samples path after conftest changes

* added samples_path to function arg

---------

Co-authored-by: Namoush <fmpereira22@gmail.com>
Co-authored-by: Fernando Pereira <fernando.pereira@criticalsoftware.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-04-12 18:38:21 +05:30