55 Commits

Author SHA1 Message Date
Stefano Fiorucci
7cc6080dfa
chore: replace metadata w meta in tests/examples (#6612)
* replace metadata w meta in tests/examples

* do not touch already broken e2e tests

* Revert "do not touch already broken e2e tests"

This reverts commit 1f911920d98954b57daacfe8d8ed02fd77d136db.
2023-12-21 14:09:31 +01:00
Ashwin Mathur
46b395eec3
feat: Add Eval and EvaluationResult (#6505)
* Add initial implementation for Eval and EvaluationResult

* Add release notes

* Update files with suggestions from review

* Remove serialization

* Add eval e2e tests

* Update eval e2e tests
2023-12-18 11:29:09 +01:00
Silvano Cerza
18dbce25fc
refactor: Refactor answer dataclasses (#6523)
* Refactor answer dataclasses

* Add release notes

* Fix tests

* Fix end to end tests

* Enhance ExtractiveReader
2023-12-11 18:50:49 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
Silvano Cerza
fd16ec63cb
refactor: Add support for new filters declaration (#6397)
* Rework filter logic for InMemoryDocumentStore to support new filters
declaration

* Fix legacy filters tests

* Simplify logic and handle dates comparison

* Rework MetadataRouter to support new filters

* Update docstrings

* Add release notes

* Fix linting

* Avoid duplicating filters specifications

* Handle corner case

* Simplify docstring

* Fix filters logic and tests

* Fix Document Store testing legacy filters tests
2023-11-24 11:22:46 +01:00
Julian Risch
67780a62d5
test: Add end-to-end test for dense doc search 2.0 (#6102)
* draft e2e test for dense doc search

* fix import path

* add DocumentJoiner

* update converter import; fix getting filled doc store

* add text embedder

* add sample txt and pdf for preview e2e tests

* run the query pipeline before serializing

* define samples path

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-23 16:59:02 +01:00
Vladimir Blagojevic
cfff0d5212
Rename file_converters to converters (#6390)
2023-11-23 10:28:40 +01:00
Julian Risch
4ef2a680bb
feat: Add DocumentJoiner component 2.0 (#6105)
* draft DocumentJoiner

* implement merge and rrf

* draft end-to-end test with DocumentJoiner in hybrid doc search pipeline

* adjust for variadics Canals PR #122

* fix text_embedder input

* adapt to the new Document class

* adapt to new doc id

* specify documents input as Variadic in run method

* compare doc ids instead of full docs

* rename text_file_converter input to sources

* update docstring

* Update haystack/preview/components/routers/document_joiner.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from docstring review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* capitalize Documents and Retrievers in docstrings

* fix log message in test

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2023-11-20 10:56:56 +01:00
ZanSara
dfc1d452bb
feat: upgrade canals to 0.10.1 (#6309)
* upgrade canals

* reno

* trigger preview e2e

* bump canals

* fix decorator

* fix test

* test factory

* tests inmemory

* tests writer

* test audio

* tests builders

* tests caching

* tests embedders

* tests converters

* tests generators

* tests rankers

* tests retrievers

* fix pipeline and telemetry tests

* remove trigger
2023-11-17 14:46:23 +01:00
Julian Risch
1c85e44156
test: Add langdetect installation to e2e tests (#6327)
* Add langdetect installation to e2e tests

* compare doc content and id only
2023-11-17 10:12:05 +01:00
Julian Risch
8b092a90c0
test: Add MetadataRouter to preprocessing pipeline in e2e test (#6321)
* add MetadataRouter to preprocessing pipeline

* replace mimetype check with language check
2023-11-16 11:22:37 +01:00
Vladimir Blagojevic
5497ca2a45
feat: Adapt GPTGenerator to use str input/output format in Haystack 2.x (#6214)
* Adapt GPTGenerator to string input/output

* Finishing touches

* punctuation upd

* PR feedback

* Small naming fixes

* Update haystack/preview/components/generators/openai.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* Update class pydoc with a printed response

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-07 18:00:43 +01:00
Stefano Fiorucci
982ac3df01
fix: fix failing e2e test (after moving classifiers) (#6243)
* mv classifiers

* release note

* fix e2e test
2023-11-06 17:08:20 +01:00
Stefano Fiorucci
063d27c522
refactor!: rename TextDocumentSplitter to DocumentSplitter (#6223)
* rename TextDocumentSplitter to DocumentSplitter

* reno

* fix init
2023-11-03 11:33:20 +01:00
Julian Risch
29b1fefaa4
feat: Add DocumentLanguageClassifier 2.0 (#6037)
* add DocumentLanguageClassifier and tests

* reno

* fix import, rename DocumentCleaner

* mark example usage as python code

* add assertions to e2e test

* use deserialized document_store

* Apply suggestions from code review

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* remove from/to_dict

* use renamed InMemoryDocumentStore

* adapt to Document refactoring

* improve docstring

* fix test for new Document

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-10-31 15:35:05 +01:00
Silvano Cerza
7287657f0e
refactor: Rename Document's text field to content (#6181)
* Rework Document serialisation

Make Document backward compatible

Fix InMemoryDocumentStore filters

Fix InMemoryDocumentStore.bm25_retrieval

Add release notes

Fix pylint failures

Enhance Document kwargs handling and docstrings

Rename Document's text field to content

Fix e2e tests

Fix SimilarityRanker tests

Fix typo in release notes

Rename Document's metadata field to meta (#6183)

* fix bugs

* make linters happy

* fix

* more fix

* match regex

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-10-31 12:44:04 +01:00
Nripesh Niketan
708d33a657
feat: add apple silicon GPU acceleration (#6151)
* feat: add apple silicon GPU acceleration

* add release notes

* small fix

* Update utils.py

* Update utils.py

* ci fix mps

* Revert "ci fix mps"

This reverts commit 783ae503940d9ff8270a970a321549fb9e69dce7.

* mps fix

* Update experiment_tracking.py

* try removing upper watermark limit

* disable mps CI

* Use xl runner

* initialise env

* small fix

* black linting

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-10-30 11:26:46 +01:00
Stefano Fiorucci
4e4af99a5e
refactor!: rename MemoryDocumentStore and related Retrievers (#6076)
* rename doc store and retrievers

* release note

* fix patch
2023-10-17 16:15:16 +02:00
ZanSara
71f2430fd1
test: enhance e2e tests to also draw and serialize/deserialize the test pipelines (#5910)
* add draw and serialization/deserialization to e2e pipeline examples

* add comment about json serialization

* fix a small gptgenerator bug and move indexing in tests

* to json

* review feedback
2023-10-09 13:54:17 +02:00
Stefano Fiorucci
c8398eeb6d
test: e2e test for Extractive QA Pipeline (#5879)
* e2e test for e. qa pipeline
2023-09-26 15:44:34 +02:00
Silvano Cerza
cf7f0ebc22
Add Pipelines async run (#5864)
* Add Pipeline.arun()

* Sleeper node

* Fix async running

* Add e2e tests

To run a Pipeline that doesn't have any async node in async mode:

    pytest e2e/pipelines/test_standard_pipelines.py::test_query_and_indexing_pipeline

To run a Pipeline that has a single async node in concurrent mode:

    pytest e2e/pipelines/test_standard_pipelines.py::test_async_concurrent_complex_pipeline

To run a Pipeline that has a single async node in sequential mode:

    pytest e2e/pipelines/test_standard_pipelines.py::test_async_sequential_complex_pipeline

* Remove unused _adispatch_run method

* Make Pipeline.run work with async nodes

* Revert "Make Pipeline.run work with async nodes"

This reverts commit 22d7a94e4d41aca1b59dad18c0b366fbb6e8f431.

* Rename Pipeline.arun to Pipeline._arun

* Enhance docstring

* Add Sleeper docstring

* Add release notes

* ignore typing across the node

* make pylint happy

* skip pylint on needed unused import

* fix

* if a node has an arun method, use it

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-26 15:37:27 +02:00
Stefano Fiorucci
e9d34fc0e3
test: e2e tests for RAG Pipelines (#5876)
* relax extractive reader integration tests

* force reader to CPU

* ensure integration tests reproducibility

* e2e rag tests

* move set_all_seeds to testing package

* refine rag tests

* Update e2e/preview/pipelines/test_rag_pipelines.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-26 11:49:50 +02:00
ZanSara
23fdef929e
chore: move GPT35Generator tests in the main test suite (#5844)
* move tests

* fix no-test-found error from pytest

* missing self

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-21 11:42:32 +02:00
ZanSara
c933bcaa69
chore: move Whisper e2e tests in the main tests suite (#5845)
* move whisper local tests

* remove e2e file

* move remote tests

* remove e2e file
2023-09-20 14:48:09 +02:00
ZanSara
44f0c468ac
move websearch tests back to main tests suite (#5842)
2023-09-20 11:55:18 +02:00
Christian Clauss
bf6d306d68
ci: Simplify Python code with ruff rules SIM (#5833)
* ci: Simplify Python code with ruff rules SIM

* Revert #5828

* ruff --select=I --fix haystack/modeling/infer.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-20 08:32:44 +02:00
Christian Clauss
91ab90a256
perf: Python performance improvements with ruff C4 and PERF fixes (#5803)
* Python performance improvements with ruff C4 and PERF

* pre-commit fixes

* Revert changes to examples/basic_qa_pipeline.py

* Revert changes to haystack/preview/testing/document_store.py

* revert releasenotes

* Upgrade to ruff v0.0.290
2023-09-16 16:26:07 +02:00
Christian Clauss
6dd52d91b2
ci: Fix typos discovered by codespell (#5778)
* Fix typos discovered by codespell

* pylint: max-args = 38
2023-09-13 16:14:45 +02:00
Julian Risch
4ae0924ea0
feat!: Remove SklearnQueryClassifier (#5779)
* remove SklearnQueryClassifier

* reno
2023-09-13 12:55:33 +02:00
ZanSara
2c4d839b64
feat: GPT4Generator (#5744)
* add gpt4generator

* add e2e

* add tests

* reno

* fix e2e

* Update test/preview/components/generators/openai/test_gpt4_generator.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-09-13 10:07:09 +02:00
ZanSara
94c5d6d216
feat: make GPT35Generator non batch (#5764)
* make gpt35generator not batch

* fix tests

* review feedback

* mypy
2023-09-12 18:19:28 +02:00
ZanSara
24c42b1e03
fix tests (#5773)
2023-09-12 17:41:08 +02:00
ZanSara
7194343458
remove test (#5753)
2023-09-12 16:04:36 +02:00
Stefano Fiorucci
d860a5c604
make tests more robust (#5747)
2023-09-08 15:50:56 +02:00
ZanSara
7abd73419f
fix remote whisper tests (#5732)
2023-09-07 10:53:29 +02:00
ZanSara
63cbde7287
feat: GPT35Generator (#5714)
* chatgpt backend

* fix tests

* reno

* remove print

* helpers tests

* add chatgpt generator

* use openai sdk

* remove backend

* tests are broken

* fix tests

* stray param

* move _check_troncated_answers into the class

* wrong import

* rename function

* typo in test

* add openai deps

* mypy

* improve system prompt docstring

* typos update

* Update haystack/preview/components/generators/openai/chatgpt.py

* pylint

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/preview/components/generators/openai/chatgpt.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* review feedback

* fix tests

* review feedback

* reno

* remove tenacity mock

* gpt35generator

* fix naming

* remove stray references to chatgpt

* fix e2e

* Update releasenotes/notes/chatgpt-llm-generator-d043532654efe684.yaml

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* add another test

* test wrong model name

* review feedback

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-07 10:06:57 +02:00
Vladimir Blagojevic
c5edb45c10
feat: Add SerperDevWebSearch Haystack 2.0 component (#5712)
* Add SerperDev

* Add release note

* PR Feedback

* Simplify, remove one-liner

* Update haystack/preview/components/websearch/serper_dev.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Update haystack/preview/components/websearch/serper_dev.py

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>

* Fix formatting

* PR feedback

* Fix tests

* Function rename

* Remove scoring, update tests

* PR feedback

* Fix return

* small adjustments

* fix tests

* add e2e test

* fix release notes

* fix tests

* fix e2e

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-06 17:31:42 +02:00
ZanSara
0bbc219a59
chore: enable e2e preview tests (#5730)
* enable e2e preview tests

* fix transcriber test

* quotes

* add missing dep

* missing comma

* ffmpeg
2023-09-06 16:48:45 +02:00
ZanSara
ce06268990
test: fix e2e test failures (#5685)
* fix test errors

* fix pipeline yaml

* disable cache

* fix errors

* remove stray fixture
2023-08-30 12:24:03 +02:00
ZanSara
5985b6d358
chore: refactor pipeline tests for e2e testing (#5576)
* enable pipeline folder in e2e

* merge standard pipeline tests with standard pipeline batch tests

* merge summarization tests into standard pipelines tests

* Update test_standard_pipelines.py

* black
2023-08-29 11:22:39 +02:00
Vladimir Blagojevic
37cf1fe49c
Tests in e2e/nodes/test_summarizer.py could be removed as pipeline e2e tests cover SearchSummarizationPipeline already (#5454)
Tests in e2e/nodes/test_translator.py can be removed as unit tests exist for the translator, and the e2e test mostly checks that the model is good, which is not something we should test for
2023-08-08 13:21:11 +02:00
Julian Risch
eeb29b5686
test: Re-activate end-to-end tests workflow (#5343)
* Install haystack with required extras

* remove whitespaces

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Add sleep

* Add s for seconds

* Move container initialization in workflow

* Update e2e.yml

add nightly run

* use new folder for initial e2e test

* use file hash for caching and trigger on push to branch

* remove \n from model names read from file

* remove trigger on push to branch

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-07-20 11:48:51 +02:00
Stefano Fiorucci
637433841e
chore: remove deprecated Seq2SeqGenerator and RAGenerator (#5180)
* first draft of removal

* more removals

* don't download unused models
2023-06-21 16:38:45 +02:00
ZanSara
f80ae01174
LocalWhisperTranscriber (v2) (#4909)
* original component

* remove remote parts

* unit tests

* polish docstrings

* fix unit tests

* fix e2e tests

* pylint

* remove check

* review feedback

* add type: ignore

* improve tests

* test stream handling

* upgrade canals and improve tests

* pylint
2023-05-22 18:30:35 +02:00
ZanSara
516db4cb52
RemoteWhisperTranscriber (v2) (#4910)
* original-component

* stub

* fix implementation

* fix tests

* review feedback

* review feedback

* upgrade canals

* upgrade canals

* upgrade canals to fix pipeline test

* remove requests_with_retry

* feedback
2023-05-22 16:02:58 +02:00
Massimiliano Pippi
c6ea542b57
chore: remove BaseKnowledgeGraph (#4953)
* remove BaseKnowledgeGraph

* fix pylint
2023-05-21 10:42:02 +02:00
Massimiliano Pippi
4974bf7ab3
chore: remove deprecated MilvusDocumentStore (#4951)
* remove deprecated MilvusDocumentStore

* remove leftovers

* fix pylint
2023-05-19 16:37:38 +02:00
ZanSara
b60d9a2cbf
test: move several modeling tests in e2e/ (#4308)
* no dpr test seems worth mocking

* move distillation tests

* pylint

* mypy

* pylint

* move feature_extraction tests as well

* move feature_extraction tests as well

* merge feature extractor suites

* get_language_model tests and adaptive model tests

* duplicate test

* moving fixtures

* mypy

* mypy-again

* trigger

* un-mock integration test

* review feedback

* feedback

* pylint
2023-04-28 17:08:41 +02:00
Silvano Cerza
5ac3dffbef
test: Rework conftest (#4614)
* Split root conftest into multiple ones and remove unused fixtures

* Remove some constants and make them fixtures

* Remove unnecessary fixture scoping

* Fix failing whisper tests

* Fix image_file_paths fixture
2023-04-11 10:33:43 +02:00