91 Commits

Author SHA1 Message Date
Massimiliano Pippi
e1ec4e5e4d
refact!: Remove symbols under the haystack.document_stores namespace (#6714)
* remove symbols under the haystack.document_stores namespace

* Update haystack/document_stores/types/protocol.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* fix

* same for retrievers

* leftovers

* more leftovers

* add relnote

* leftovers

* one more

* fix examples

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-01-10 21:20:42 +01:00
Silvano Cerza
9445b2d466
Fix skipif with empty env var (#6704) 2024-01-08 19:19:14 +01:00
Silvano Cerza
607e7d1488
Skip integration tests if env var is missing (#6703) 2024-01-08 17:15:10 +01:00
Vladimir Blagojevic
4d08be0c2a
feat: Update OpenAI Python Client in Haystack 2.x (#6584)
* Update openai python client

* Add release note

* Consolidate multiple mock_chat_completion into one

* Ensure all components have api_base_url, organization params

* Update tests

* Enable function calling

* Oversight

* Minor fixes, add streaming test mocks

* Apply suggestions from code review

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* metadata -> meta

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-12-21 16:21:24 +01:00
Silvano Cerza
8a513f3b8c
test: Add fixture to block requests in tests (#6585)
* Add fixture to block requests in tests

* Mark tests making requests as integration
2023-12-21 08:51:54 +01:00
Vladimir Blagojevic
628e8aa3d4
feat: Improve getting started examples (#6510)
* Improve rag and indexing pipelines

* Update examples

* Simplify user interface and code, improve embedder model

* Improve default vals for embedder

* resolve typing

* resolve typing 2

* Fix unit test

---------

Co-authored-by: Timo Möller <timo.moeller@deepset.ai>
2023-12-09 19:01:13 +01:00
Vladimir Blagojevic
008a322023
feat: Add Indexing Pipeline (#6424)
* Add build_indexing_pipeline utils function

* Pylint fixes

* Move into another package to avoid circular deps

* Revert change

* Revert haystack/utils/__init__.py change

* Add example

* Use DocumentStore type, remove typing checks
2023-12-04 16:08:53 +01:00
ZanSara
a38f871dbd
feat: Add RAG pipeline (#6461)
* add rag pipeline

* Update examples/getting_started/rag.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-12-04 15:25:29 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
ZanSara
adf7e49af3
chore: review all extra (#6029) 2023-10-12 21:50:53 +02:00
Christian Clauss
91ab90a256
perf: Python performance improvements with ruff C4 and PERF fixes (#5803)
* Python performance improvements with ruff C4 and PERF

* pre-commit fixes

* Revert changes to examples/basic_qa_pipeline.py

* Revert changes to haystack/preview/testing/document_store.py

* revert releasenotes

* Upgrade to ruff v0.0.290
2023-09-16 16:26:07 +02:00
Christian Clauss
1bc03ddc73
ci: Fix all ruff pyflakes errors except unused imports (#5820)
* ci: Fix all ruff pyflakes errors except unused imports

* Delete releasenotes/notes/fix-some-pyflakes-errors-69a1106efa5d0203.yaml
2023-09-15 18:30:33 +02:00
Christian Clauss
9405eb90ee
ci: Fix invalid escape sequences in Python code (#5802)
* ci: Use ruff in pre-commit to further limit complexity

* Fix invalid escape sequences in Python code

* Delete releasenotes/notes/ruff-4d2504d362035166.yaml
2023-09-14 16:42:48 +02:00
Stefano Fiorucci
d860a5c604
make tests more robust (#5747) 2023-09-08 15:50:56 +02:00
Sebastian Husch Lee
2bc7fe1a08
test: reactivate unit tests in test_eval.py (#5255)
* Activate tests that follow unit test and integration test rules

* Adding more integration labels

* Change name to better reflect complexity of test

* Remove mark integration tags, move test to doc store test for add_eval_data

* Removing incorrect integration label

* Deactivated document store test b/c it fails for Weaviate and pinecone

* Remove unit label since test needs to be refactored to be considered a unit test

* Undo changes

* Undo change

* Check every field in the load evaluation result

* Add back label and add skip reason

* Use pytest skip instead of TODO
2023-07-24 17:07:45 +02:00
bogdankostic
0697f5c63e
fix: Support isolated node eval in run_batch in Generators (#5291)
* Add isolated node eval to BaseGenerator's run_batch

* Add unit tests
2023-07-07 10:32:43 +02:00
bogdankostic
ed1bad1155
fix: Use add_isolated_node_eval of eval_batch in run_batch (#5223)
* Fix isolated node eval in eval_batch

* Add unit test
2023-06-28 16:51:23 +02:00
ZanSara
3c71f0ae3d
chore: mark some unit tests under test/pipeline (#5124)
* mark some unit tests as such

* remove marker
2023-06-12 17:58:31 +02:00
Massimiliano Pippi
4974bf7ab3
chore: remove deprecated MilvusDocumentStore (#4951)
* remove deprecated MilvusDocumentStore

* remove leftovers

* fix pylint
2023-05-19 16:37:38 +02:00
tstadel
7625829684
fix: EvaluationResult serialization changes dataframes (#4906)
* fix nan and index values

* add test

* make test for None values after evalresult read explicit
2023-05-16 16:03:09 +02:00
duffn
479092e3c1
bug: (rest_api) remove full logging of overwritten env variables (#4791)
* bug: (rest_api) remove logging of overwritten env variables

* Update haystack/pipelines/config.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update test

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-02 16:48:19 +02:00
Vladimir Blagojevic
aebc22d27e
Upgrade transformers to 4.28.1 (#4665)
* Upgrade to transformers 4.28.1

* Commenting out failing piece of test

* trailing-whitespace

* Adjust regex for error match - it changed between releases

* Remove RAG tests failing with transformers update
2023-04-27 12:55:21 +02:00
tstadel
9cbe9e0949
fix: recursion of death while loading PromptTemplate from yaml (#4691)
* fix recursion of death when deserializing prompttemplate

* add test

* set api_key

* fix test

* add generic test

* work in feedback on tests

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-04-26 13:56:51 +02:00
Sebastian
8d9136bad4
feat: Implementation of Table Cell Proposal (#4616)
* Starting adding support for TableCell

* Update tests to use row and col

* Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables.

* Update eval test to use TableCell

* Added more schema tests for table docs, labels and answers.

* Add boolean to toggle between Span and TableCell

* Add deprecation message

* Test that table answers work as responses in the rest API

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-04-19 13:14:49 +02:00
Silvano Cerza
5ac3dffbef
test: Rework conftest (#4614)
* Split root conftest into multiple ones and remove unused fixtures

* Remove some constants and make them fixtures

* Remove unnecessary fixture scoping

* Fix failing whisper tests

* Fix image_file_paths fixture
2023-04-11 10:33:43 +02:00
Silvano Cerza
cfb8dfd470
Fix pipeline config and agent tools hashing for telemetry (#4508) 2023-03-28 09:41:50 +02:00
Vladimir Blagojevic
be25655663
feat: Add agent tools (#4437)
* Initial commit, add search_engine

* Add TopPSampler

* Add more TopPSampler unit tests

* Remove SearchEngineSampler (converted to TopPSampler)

* Add some basic WebSearch unit tests

* Rename unit tests

* Add WebRetriever into agent_tools

* Adjust to WebRetriever

* Add WebRetriever mode [snippet|document]

* Minor changes

* SerperDev: add peopleAlsoAsk search results

* First agent for hotpotqa

* Making WebRetriever work on hotpotqa

* refactor: minor WebRetriever improvements (#4377)

* refactor: remove doc ids rebuild + antecipate cache

* refactor: improve caching, fix Document ids

* Minor WebRetriever improvements

* Overlooked minor fixes

* feat: add Bing API as search engine

* refactor: let kwargs pass-through

* feat: increase search context

* check sampler result, improve batch typing

* refactor: increase mypy compliance

* Initial commit, add search_engine

* Add TopPSampler

* Add more TopPSampler unit tests

* Remove SearchEngineSampler (converted to TopPSampler)

* Add some basic WebSearch unit tests

* Rename unit tests

* Add WebRetriever into agent_tools

* Adjust to WebRetriever

* Add WebRetriever mode [snippet|document]

* Minor changes

* SerperDev: add peopleAlsoAsk search results

* First agent for hotpotqa

* Making WebRetriever work on hotpotqa

* refactor: minor WebRetriever improvements (#4377)

* refactor: remove doc ids rebuild + antecipate cache

* refactor: improve caching, fix Document ids

* Minor WebRetriever improvements

* Overlooked minor fixes

* feat: add Bing API as search engine

* refactor: let kwargs pass-through

* feat: increase search context

* check sampler result, improve batch typing

* refactor: increase mypy compliance

* Fix mypy

* Minor example fixes

* Fix the descriptions

* PR feedback updates

* More fixes

* TopPSampler: handle top p None value, add unit test

* Add top_k to WebSearch

* Use boilerpy3 instead trafilatura

* Remove date finding

* Add more WebRetriever docs

* Refactor long methods

* making the preprocessor optional

* hide WebSearch and make NeuralWebSearch a pipeline

* remove unused imports

* add WebQAPipeline and split example into two

* change example search engine to SerperDev

* Turn off progress bars in WebRetriever's PreProcesssor

* Agent tool examples - final updates

* Add webqa test, search results ranking scores

* Better answer box handling for SerperDev and SerpAPI

* Minor fixes

* pylint

* pylint fixes

* extract TopPSampler from WebRetriever

* use sampler only for WebRetriever modes other than snippet

* add web retriever tests

* add web retriever tests

* exclude rdflib@6.3.2 due to license issues

* add test for preprocessed docs and kwargs examples in docstrings

* Move test_webqa_pipeline to test/pipelines

* change docstring for join_documents_and_scores

* Use WebQAPipeline in examples/web_lfqa.py

* Use WebQAPipeline in examples/web_lfqa.py

* Move test_webqa_pipeline to e2e

* Updated lg

* Sampler added automatically in WebQAPipeline, no need to add it

* Updated lg

* Updated lg

* :ignore Update agent tools examples to new templates (#4503)

* Update examples to new templates

* Add print back

* fix linting and black format issues

---------

Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-27 18:14:58 +02:00
tstadel
4f90e59796
feat: expose prompts to Answer and EvaluationResult (#4341)
* store prompt in Answer

* store prompt in eval csv

* fix tests

* chore: fix context offset loadingQ

* add tests

* add test from PR #4476

* fix tests after merge
2023-03-27 17:54:20 +02:00
ZanSara
6d578ebf3d
refactor: remove telemetry v1 (#4496)
* remove telemetry v1

* more pipeline methods to take out

* send_event_2

* mypy

* pylint

* mypy

* mypy again

* remove test
2023-03-27 17:38:43 +02:00
ju-gu
a3409c7da6
fix: issue evaluation check for content type (#4181)
* fix: issue evaluation check for content type

Evaluation currently breaks, when the content type is not a str.

* add black

* add test table eval

* add black formatting

* Expand integration test

---------

Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-03-16 17:36:53 +01:00
Zoltan Fedor
4dea9db01e
feat: Report execution time for pipeline components in _debug (#4197)
* Adding execution time to the debug output of pipeline components

* Linting issue fix

* [EMPTY] Re-trigger CI

* fixed test

---------

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-07 04:45:31 +05:30
tstadel
19311119db
fix: EvalResult load migration (#4289)
* fix evalresult load migration

* handle none values correctly

* better None check

* improve logic and add test
2023-03-06 20:05:02 +01:00
ZanSara
c802305ccf
test: move tests on standard pipelines in e2e/ (#4309)
* move out standard pipelines e2e

* fixing unit tests

* add test data

* feedback

* pylint

* black
2023-03-06 17:26:19 +01:00
ZanSara
165a0a5faa
test: mock all Translator tests and move one to e2e (#4290)
* mock all translator tests and move one to e2e

* typo

* extract pipeline tests using translator

* remove duplicate test

* move generator test in e2e

* Update e2e/pipelines/test_extractive_qa.py

* pytest.mark.unit

* black

* remove model name as well

* remove unused fixture

* rename original and improve pipeline tests

* fixes

* pylint
2023-03-01 14:52:05 +01:00
Stefano Fiorucci
e8f9b1b65d
test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS (#4283)
* replace elasticds with imds - first draft

* fix

* fix tests and implement scale_score in imds bm25

* add docstrings for scale_score
2023-03-01 11:35:10 +01:00
ZanSara
13c4ff1b52
refactor: remove direct logging without a logger (#4253)
* remove direct logging without a logger

* add custom pylint checker

* add test

* pylint

* improve checker message

* mypy

* remove test

* add checker for basicConfig

* more logging missed

* ignore basicConfig

* move out logger

* move out statement

* remove logging configuration
2023-02-23 20:42:42 +01:00
Stefano Fiorucci
5e85f33bd3
refactor: Remove deprecated nodes EvalDocuments and EvalAnswers (#4194)
* remove deprecated classed and update test

* remove deprecated classed and update test

* remove unused code

* remove unused import

* remove empty evaluator node

* unused import :-)

* move sas to metrics
2023-02-23 15:26:17 +01:00
ZanSara
f816efa50c
feat: reduce and focus telemetry (#4087)
* simplified telemetry and docker containers detection

* pylint

* mypy

* mypy

* Add new credentials and metadata

* remove prints

* mypy

* remove comment

* simplify inout len measurement

* black

* removed old telemetry, to revert

* reintroduce env function

* reintroduce old telemetry

* fix telemetry selection

* telemetry for promptnode

* telemetry for some training methods

* telemetry for eval and distillation

* mypy & pylint

* review

* Update lg

* mypy

* improve docstrings

* pylint

* mypy

* fix test

* linting

* remove old tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-22 19:02:47 +01:00
bogdankostic
05950719ba
fix: Deduplicate same Documents in isolated evaluation of Reader (#4114)
* Deduplicate same Documents in one MultiLabel

* Add tests

* Update label

* Update label

* Update test

* Update test

* Revert change to check CI

* Revert reversion

* Use deepcopy

* Update tests
2023-02-10 13:55:14 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
tstadel
9611b64ec5
fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885)
* fix document retrieval metrics for all document_relevance_criteria

* fix tests

* fix eval_batch metrics

* small refactorings

* evaluate metrics on label level

* document retrieval tests added

* fix pylint

* fix test

* support file retrieval

* add comment about threshold

* rename test
2023-02-02 15:00:07 +01:00
Zoltan Fedor
e447bd728a
feat: adding the ability to use Ray Serve async functionality (#3769)
* Adding the ability to call the Ray pipeline from concurrent apps with async

This is to fix #2968

* Fixes: mype + pylint (`invalid-overridden-method`)

* Simplifying - no real need for an `AsyncRayPipeline` anymore

* Moving the new `run_async` method to the `RayPipeline`

* Cleanup

* [EMPTY] Re-trigger CI
2023-01-23 16:23:09 +01:00
ZanSara
6f5a2fb1da
fix: remove string validation in YAML (#3854)
* remove string validation in YAML

* unused import

* fix import

* remove tests

* fix tests
2023-01-19 10:06:53 +01:00
Massimiliano Pippi
fa4404baa0
fix: ignore non-serializable params when hashing pipeline objects (#3842)
* ignore non-serializable params when hashing pipeline objects

* make tests more clear
2023-01-11 17:11:41 +01:00
Julian Risch
a2c160e7d8
bug: skip empty documents in reader (#3773)
* skip empty documents

* test eval_batch and account for tables
2023-01-03 15:50:14 +01:00
Julian Risch
b155297a06
feat: change PipelineConfigError to DocumentStoreError with more details (#3783) 2023-01-02 19:40:45 +01:00
bogdankostic
594d2a10f8
fix: Fix predict_batch in TransformersReader for single nested Document list (#3748)
* Fix restoring of list structure

* Add tests
2022-12-29 11:48:18 +01:00
Vladimir Blagojevic
e4c3817d01
Adjust get_type() method for pipelines (#3657) 2022-12-02 14:48:47 +01:00
Julian Risch
adb580b6b7
feat: add offsets_in_context to evaluation result (#3640)
* add offsets_in_context to eval result

* extend test case
2022-11-30 11:43:42 +01:00