185 Commits

Author SHA1 Message Date
Varun Krishnan
badb05b3ab
feat: allow DocumentJoiner to accept top_k parameter in run method (#7709)
* feat: allow DocumentJoiner to accept top_k parameter in run method

* Added release note for DocumentJoiner top_k fix
2024-05-23 16:03:26 +02:00
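The `top_k` change above lets a caller override, for a single `run` call, the limit chosen when the component was constructed. A minimal sketch of that precedence rule, assuming a hypothetical `ScoreJoiner` class that concatenates document lists and keeps the highest-scoring entries (documents are plain dicts here; this is not Haystack's `DocumentJoiner` code):

```python
from typing import Dict, List, Optional


class ScoreJoiner:
    """Hypothetical joiner: merges document lists and keeps the highest-scoring ones."""

    def __init__(self, top_k: Optional[int] = None):
        self.top_k = top_k  # default limit fixed at construction time

    def run(self, documents: List[List[Dict]], top_k: Optional[int] = None) -> Dict[str, List[Dict]]:
        # A value passed to run() overrides the constructor value for this call only.
        limit = top_k if top_k is not None else self.top_k
        merged = [doc for doc_list in documents for doc in doc_list]
        merged.sort(key=lambda doc: doc.get("score", 0.0), reverse=True)
        if limit is not None:
            merged = merged[:limit]
        return {"documents": merged}


joiner = ScoreJoiner(top_k=2)
inputs = [[{"id": "a", "score": 0.9}], [{"id": "b", "score": 0.5}, {"id": "c", "score": 0.7}]]
print([d["id"] for d in joiner.run(inputs)["documents"]])           # ['a', 'c']
print([d["id"] for d in joiner.run(inputs, top_k=1)["documents"]])  # ['a']
```

Checking `top_k is not None` rather than truthiness keeps an explicit `top_k=0` meaningful.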
Massimiliano Pippi
482f60ec99
fix: exit early if the component receives no documents (#7732)
* exit early if the component receives no documents

* relnote
2024-05-23 09:35:10 +02:00
David S. Batista
a4fc2b66e6
style: adding progress bar to llm-based evaluators (#7726)
* adding progress bar

* fixing typo

* fixing tests

* Update test_llm_evaluator.py

* fixing missing colon

* passing directly to parent

* adding docstrings
2024-05-23 09:22:14 +02:00
Massimiliano Pippi
76224fc781
make SerperDevWebSearch more robust (#7725)
2024-05-22 13:14:39 +02:00
Stefano Fiorucci
7181f6b7e9
feat: change HTML conversion backend from boilerpy3 to Trafilatura (#7705)
* change HTML conversion backend to Trafilatura

* rm unused var
2024-05-17 10:38:47 +02:00
Carlos Fernández
57af95d7ea
add keep-id to DocumentCleaner (#7703)
2024-05-16 19:18:48 +02:00
Carlos Fernández
686a4999cf
feat: widen support of env vars in OpenAI components (#7653)
* add environment variables to the _enviroment.py file

* add support for two of the three variables

* Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' in OpenAIDocumentEmbedder.

* Replicate support for env vars in OpenAITextEmbedder.

* Add support for env vars in OpenAIGenerator.

* Add support for env vars in OpenAIChatGenerator.

* add docstrings and reno

* add params to __init__ in OpenAIDocumentEmbedder

* add params to __init__ in OpenAITextEmbedder

* make fully functional implementation of env vars and unit tests

* update reno

* Update haystack/components/embedders/openai_text_embedder.py

* reverse changes to telemetry/_enviroment.py

* Update haystack/components/embedders/openai_text_embedder.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-15 21:58:41 +00:00
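The env-var support above lets `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` stand in for constructor arguments. A minimal sketch of one plausible resolution order (explicit argument, then environment variable, then built-in default); the helper name `resolve_client_settings` and the defaults of 30.0 seconds and 5 retries are illustrative assumptions, not values taken from the commit:

```python
import os
from typing import Optional, Tuple

# Illustrative defaults only; the real components may use different values.
DEFAULT_TIMEOUT = 30.0
DEFAULT_MAX_RETRIES = 5


def resolve_client_settings(
    timeout: Optional[float] = None, max_retries: Optional[int] = None
) -> Tuple[float, int]:
    """Hypothetical helper: explicit argument > environment variable > default."""
    # `is None` (not truthiness) so that an explicit max_retries=0 is respected.
    if timeout is None:
        timeout = float(os.environ.get("OPENAI_TIMEOUT", DEFAULT_TIMEOUT))
    if max_retries is None:
        max_retries = int(os.environ.get("OPENAI_MAX_RETRIES", DEFAULT_MAX_RETRIES))
    return timeout, max_retries
```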
David S. Batista
96b9d3e32a
fix: Adding missing component decorator to AzureOpenAIGenerator (#7698)
* initial import

* adding release notes

* tests avoiding I/O operations

* Update fix-azure-generators-serialization-18fcdc9cbcb3732e.yaml
2024-05-15 10:00:38 +02:00
David S. Batista
798dc4a4a5
fix: avoid FaithfulnessEvaluator and ContextRelevanceEvaluator returning NaN (#7685)
* initial import

* fixing tests

* relaxing condition

* adding safeguard for ContextRelevanceEvaluator as well

* adding release notes
2024-05-14 17:08:51 +02:00
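A NaN score of this kind typically comes from averaging an empty list: `numpy.mean([])` returns `nan` (with a RuntimeWarning), while a plain `sum(x) / len(x)` raises `ZeroDivisionError`. A minimal sketch of a guard, assuming 0.0 is an acceptable score when an answer yields no statements (the commit does not say which fallback value it uses); `safe_mean` is a hypothetical name:

```python
from typing import List


def safe_mean(statement_scores: List[float], empty_value: float = 0.0) -> float:
    """Average per-statement scores, returning `empty_value` when there are no statements."""
    if not statement_scores:
        return empty_value
    return sum(statement_scores) / len(statement_scores)


print(safe_mean([1, 0, 1, 1]))  # 0.75
print(safe_mean([]))            # 0.0
```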
Vladimir Blagojevic
4352b1688e
fix: Fix NamedEntityExtractor serde (#7684)
* Fix NamedEntityExtractor serde

* Add release note

* Linting, remove unit markers
2024-05-14 12:24:55 +02:00
Sebastian Husch Lee
a2be90b95a
fix: Update device deserialization for components that use local models (#7686)
* fix: Update device deserialization for SentenceTransformersTextEmbedder

* Add unit test

* Fix unit test

* Make same change to doc embedder

* Add release notes

* Add same change to Diversity Ranker and Named Entity Extractor

* Add unit test

* Add the same for whisper local

* Update release notes
2024-05-14 08:36:14 +02:00
Vladimir Blagojevic
811b93db91
feat: Set ByteStream's mime_type attribute for web based resources (#7681)
2024-05-13 19:44:02 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Stefano Fiorucci
7c9532b200
fix broken serialization of HFAPI components (#7661)
2024-05-08 17:14:37 +02:00
Stefano Fiorucci
94467149c1
fix: fix serialization of DocumentRecallEvaluator (#7662)
* fix serialization of DocumentRecallEvaluator

* add requested tests
2024-05-08 16:00:49 +02:00
Vladimir Blagojevic
5f813373eb
chore: Update huggingface_hub classes used after library upgrade (#7631)
* Update huggingface_hub classes used after library upgrade

* Fix chat tests

* Update lazy import guard and other references to huggingface_hub>=0.23.0

* In huggingface_hub 0.23.0 TextGenerationOutput property details is now optional

* More fixes

* Add reno note
2024-05-03 10:14:54 +02:00
Julian Risch
b0284977db
feat: Add document page number of ExtractedAnswer to meta (#7572)
* calculate page number of answer and add to meta

* fix mypy, add reno

* add test

* simplify unit test

* update release note

* undo @patch updates

* extend tests, check page_number type
2024-05-02 14:48:27 +02:00
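Deriving an answer's page number from its position in the document can be sketched by counting page separators before the answer's start offset. This assumes pages in the document text are delimited by form feeds (`"\f"`) and that the answer carries a character offset; `page_number_at` is a hypothetical helper, not the code from the PR:

```python
def page_number_at(text: str, char_offset: int, page_separator: str = "\f") -> int:
    """Return the 1-based page containing `char_offset`, assuming pages are split by `page_separator`."""
    if not 0 <= char_offset <= len(text):
        raise ValueError("char_offset is outside the text")
    # Each separator before the offset means the answer starts one page later.
    return text.count(page_separator, 0, char_offset) + 1


document_text = "page one\fpage two\fpage three"
print(page_number_at(document_text, document_text.index("three")))  # 3
```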
Mo
2e35f13085
feat: add converter based on pdfminer (#7607)
* Initial commit pdfminer converter

* Revert back naming of argument all_text per pdfminer documentation

* Add the component decorator

* Add release notes

* Reformat code with black

* Remove LTPage and comments

* Update dependencies in pyproject.toml

* Added some tests and incorporated reference doc in docstring

* Added some tests and incorporated reference doc in docstring
2024-05-02 10:36:54 +02:00
Julian Risch
2509eeea7e
refactor: Rename FaithfulnessEvaluator input responses to predicted_answers (#7621)
2024-04-30 16:30:57 +02:00
Bohan Qu
40360e44ff
feat: add required flag for prompt builder inputs (#7553)
2024-04-29 14:21:53 +02:00
Carlos Fernández
d2c87b2fd9
feat: add page_number to metadata in DocumentSplitter (#7599)
* Add the implementation for page counting used in the v1.25.x branch. It should work as expected in issue #6705.

* Add tests that reflect the desired behaviour. This behaviour is inferred from the one it had in Haystack 1.x.
Solve some minor bugs spotted by tests.

* Update docstrings.

* Add reno.

* Update haystack/components/preprocessors/document_splitter.py

Update docstring from suggestion

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* solve suggestion to improve readability

* fragment tests

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* Update .gitignore

* Update .gitignore

* Update add-page-number-to-document-splitter-162e9dc7443575f0.yaml

* blackening

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-04-29 12:51:18 +02:00
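Page counting during splitting can be sketched as a running counter: each chunk is tagged with the page on which it starts, and page breaks inside the chunk advance the counter for the chunks that follow. This assumes form-feed page separators and word splitting on single spaces; `split_with_page_numbers` is a hypothetical helper, not `DocumentSplitter` itself:

```python
from typing import Dict, List


def split_with_page_numbers(text: str, split_length: int, page_separator: str = "\f") -> List[Dict]:
    """Split `text` into chunks of `split_length` space-separated words and record
    the page on which each chunk starts (pages delimited by `page_separator`)."""
    if split_length < 1:
        raise ValueError("split_length must be positive")
    units = text.split(" ")
    chunks = []
    page = 1  # page on which the not-yet-processed text begins
    for start in range(0, len(units), split_length):
        content = " ".join(units[start:start + split_length])
        # A chunk that opens with page breaks actually starts on a later page.
        leading_breaks = len(content) - len(content.lstrip(page_separator))
        chunks.append({"content": content, "page_number": page + leading_breaks})
        # Every break inside the chunk moves the following chunks forward.
        page += content.count(page_separator)
    return chunks


chunks = split_with_page_numbers("one two three\ffour five six\fseven eight", split_length=2)
print([c["page_number"] for c in chunks])  # [1, 1, 2]
```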
Madeesh Kannan
a881451d3a
refactor: Refactor EvaluationResult into BaseEvaluationRunResult and EvaluationRunResult (#7594)
The new `EvaluationRunResult` has slightly different semantics: it separates the previous `data` parameter into `inputs` and `results` and expects aggregate scores to be provided in the latter.
2024-04-25 12:16:48 +02:00
Julian Risch
9c56dbe288
test: Make ContextRelevanceEvaluator integration test more robust (#7584)
2024-04-23 16:01:25 +00:00
Julian Risch
07307709ee
test: Make FaithfulnessEvaluator integration test more robust (#7582)
2024-04-23 15:44:00 +00:00
Stefano Fiorucci
081757c6b9
test: replace mistral-7b with zephyr-7b-beta in tests (#7576)
* replace mistral-7b with gemma-2b-it in tests

* rm wrong comment

* change model
2024-04-23 13:56:07 +02:00
Julian Risch
d7638cfd4b
refactor: FaithfulnessEvaluator specifies inputs explicitly (#7548)
* specify inputs explicitly. move out examples

* Update haystack/components/evaluators/faithfulness.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-22 12:52:10 +00:00
Julian Risch
b12e0db134
feat: Add ContextRelevanceEvaluator component (#7519)
* feat: Add ContextRelevanceEvaluator component

* reno

* fix expected inputs and example docstring

* remove responses parameter from tests

* specify inputs explicitly

* add new evaluator to api reference docs
2024-04-22 14:10:00 +02:00
Massimiliano Pippi
3a80c866c9
fix: do not use reserved attributes in the logger (#7545)
* avoid using reserved keywords in the logger

* make the tests independent from the log level

* relnotes
2024-04-12 14:07:18 +00:00
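In Python's standard `logging` module, keys passed through `extra` may not collide with attributes that a `LogRecord` already has (or with `message` and `asctime`); `Logger.makeRecord` raises `KeyError` when they do. Because the record is only built when the level is enabled, such a bug can stay hidden at a quieter log level. A minimal sketch of one way to avoid the clash by prefixing reserved keys; `safe_extra` and the `_` prefix are hypothetical choices, not necessarily what the fix does:

```python
import logging

# Attributes every LogRecord already has, plus "message" and "asctime",
# which logging explicitly refuses to accept through `extra`.
_probe = logging.LogRecord("probe", logging.INFO, "", 0, "", None, None)
RESERVED_KEYS = set(vars(_probe)) | {"message", "asctime"}


def safe_extra(extra: dict, prefix: str = "_") -> dict:
    """Hypothetical helper: prefix keys that would collide with LogRecord attributes."""
    return {(prefix + key if key in RESERVED_KEYS else key): value for key, value in extra.items()}


logger = logging.getLogger("reserved-demo")
logger.setLevel(logging.INFO)  # records are only built (and checked) when the level is enabled

try:
    logger.info("indexing done", extra={"name": "my_index"})
except KeyError as err:
    print("rejected:", err)

logger.info("indexing done", extra=safe_extra({"name": "my_index"}))  # accepted as "_name"
```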
Massimiliano Pippi
2bad5bcb96
refactor: AnswerExactMatchEvaluator component inputs (#7536)
* refactor component inputs

* release notes

* Update class docstring

* pylint

* update existing note instead of creating a new one

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-04-12 06:59:16 +00:00
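An exact-match evaluator of this kind reduces to a per-pair equality check plus an average. A minimal sketch, assuming the inputs are parallel lists named `ground_truth_answers` and `predicted_answers` and an output with `individual_scores` and `score` keys (these names are illustrative assumptions, and `exact_match` is a hypothetical function):

```python
from typing import Dict, List


def exact_match(ground_truth_answers: List[str], predicted_answers: List[str]) -> Dict:
    """Score 1 for each prediction identical to its ground truth, 0 otherwise."""
    if len(ground_truth_answers) != len(predicted_answers):
        raise ValueError("ground_truth_answers and predicted_answers must have the same length")
    individual_scores = [int(truth == pred) for truth, pred in zip(ground_truth_answers, predicted_answers)]
    score = sum(individual_scores) / len(individual_scores) if individual_scores else 0.0
    return {"individual_scores": individual_scores, "score": score}


print(exact_match(["Berlin", "Paris"], ["Berlin", "Rome"]))
# {'individual_scores': [1, 0], 'score': 0.5}
```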
David S. Batista
9a9c8aa1c8
feat: implementing evaluation results API (#7520)
* initial import

* adding tests

* attending PR comments

* fixing tests

* updating tests

* updating tests and code

* renaming

* fixing linting issues

* adding release notes

* adding docstrings

* latest fixes
2024-04-10 13:34:03 +00:00
Julian Risch
e974a23fa3
docs: Fix eval metric examples in docstrings (#7505)
* fix eval metric docstrings, change type of individual scores

* change import order

* change exactmatch docstring to single ground truth answer

* change exactmatch comment to single ground truth answer

* reverted changing docs to single ground truth

* add warm up in SASEvaluator example

* fix FaithfulnessEvaluator docstring example

* extend FaithfulnessEvaluator docstring example

* Update FaithfulnessEvaluator init docstring

* Remove outdated default from LLMEvaluator docstring

* Add examples param to LLMEvaluator docstring example

* Add import and print to LLMEvaluator docstring example
2024-04-10 11:00:20 +02:00
Stefano Fiorucci
39be515ba6
skip HF integrations tests if running from fork (#7517)
2024-04-09 17:47:13 +02:00
Vladimir Blagojevic
988c360b6d
feat: Azure converter updates (#7409)
* Initial commit

* Remove old mock tests

* Fix current_last_page_number calculation

* Carry over unit tests from the other side

* Update pydocs, skip failing tests

* Fix pylint and mypy

* Minor adjustments

* Add release note

* Minor touch ups

* Resolve Document unique id issue by using custom id calculation

* Better hashing, add unit tests

* Small fixes
2024-04-09 09:45:06 +02:00
Stefano Fiorucci
eff53a9131
feat: HuggingFaceAPIDocumentEmbedder (#7485)
* add HuggingFaceAPITextEmbedder

* add HuggingFaceAPITextEmbedder

* rm unneeded else

* wip

* small fixes

* deprecation; reno

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* make params mandatory

* changes requested

* fix test

* fix test

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-08 15:06:26 +02:00
Stefano Fiorucci
c91bd49cae
feat: HuggingFaceAPITextEmbedder (#7484)
* add HuggingFaceAPITextEmbedder

* add HuggingFaceAPITextEmbedder

* rm unneeded else

* small fixes

* changes requested

* fix test
2024-04-08 14:22:54 +02:00
David S. Batista
aae2b31359
fix: typo in sas_evaluator arg (#7486)
* fixing typo on SAS arg

* fixing tests

* fixing tests
2024-04-08 10:21:37 +02:00
Stefano Fiorucci
0dbb98c0a0
feat: HuggingFaceAPIChatGenerator (#7480)
* draft

* docstrings and more tests

* deprecation; reno

* pydoc config

* better error messages

* wip

* add test

* better docstrings

* deprecation; reno

* pylint

* typo

* rm unneeded else

* rm unneeded else

* fixes from feedback

* docstring showing the enum

* improve docstring

* make params mandatory

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* document enum

* Update haystack/utils/hf.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* mandatory params

* fix test

* fix test

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-05 18:48:34 +02:00
Stefano Fiorucci
1d083861ff
feat: HuggingFaceAPIGenerator (#7464)
* draft

* docstrings and more tests

* deprecation; reno

* pydoc config

* better error messages

* rm unneeded else

* make params mandatory

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* document enum

* Update haystack/utils/hf.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* fix test

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-05 18:48:13 +02:00
Silvano Cerza
ff269db12d
Fix unit tests failing if HF_API_TOKEN is set (#7491)
2024-04-05 18:05:43 +02:00
Vladimir Blagojevic
c3b96392fd
feat: Use all HTMLToDocument extractors until content is extracted (#7452)
* Use all HTMLToDocument extractors until content is extracted

* Add release note

* Minor doc update

* Improvements, unit test fixes

* Add try_others init param, update tests

* Update haystack/components/converters/html.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* PR feedback - Stefano

* Improve reno release note, add  reference

* little fixes

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-04-05 16:02:34 +02:00
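"Use all extractors until content is extracted" is a fallback chain: try each extractor in order and stop at the first non-empty result. A minimal sketch, assuming extractors are plain callables from HTML to text and that `try_others=False` means only the first extractor is consulted (the semantics of the real `try_others` parameter are an assumption here); `extract_text` and both extractor functions are hypothetical:

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical extractor signature: HTML in, plain text (or None/"" on failure) out.
Extractor = Callable[[str], Optional[str]]


def extract_text(html: str, extractors: Sequence[Extractor], try_others: bool = True) -> str:
    """Return the first non-empty extraction; with try_others=False only the first extractor runs."""
    for extractor in extractors:
        text = extractor(html)
        if text and text.strip():
            return text
        if not try_others:
            break
    return ""


def empty_extractor(html: str) -> Optional[str]:
    return None  # stands in for an extractor that finds no main content


def naive_extractor(html: str) -> Optional[str]:
    return re.sub(r"<[^>]+>", "", html).strip()  # crude tag stripper, for illustration only


page = "<p>Hello</p>"
print(extract_text(page, [empty_extractor, naive_extractor]))                    # Hello
print(repr(extract_text(page, [empty_extractor, naive_extractor], try_others=False)))  # ''
```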
Julian Risch
9d02dc607a
feat: Add FaithfulnessEvaluator component (#7424)
* draft FaithfulnessEvaluator

* reno

* calculate score per statement and aggregate

* Update release note

* update default values in tests and fix import path

* remove instructions, inputs, outputs params

* remove unused imports

* add expected format example to docstring

* remove name 'llm' from tests and docstring
2024-04-04 16:33:59 +00:00
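"Calculate score per statement and aggregate" describes a two-level average: each answer's score is the fraction of its statements judged supported by the context, and the overall score averages those per-answer scores. A minimal sketch, assuming the LLM's verdicts are already available as 0/1 lists (producing them is out of scope here) and treating an answer with no statements as 0.0; `faithfulness_scores` is a hypothetical name:

```python
from typing import Dict, List


def faithfulness_scores(statement_verdicts: List[List[int]]) -> Dict:
    """statement_verdicts[i][j] is 1 if statement j of answer i is supported by its context, else 0."""
    individual_scores = [sum(v) / len(v) if v else 0.0 for v in statement_verdicts]
    score = sum(individual_scores) / len(individual_scores) if individual_scores else 0.0
    return {"individual_scores": individual_scores, "score": score}


print(faithfulness_scores([[1, 1, 0, 1], [0, 1]]))
# {'individual_scores': [0.75, 0.5], 'score': 0.625}
```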
Julian Risch
8ef6062748
refactor: Remove name 'llm' from LLMEvaluator output (#7479)
2024-04-04 15:19:30 +00:00
Silvano Cerza
8b8a93bc0d
refactor: Rename DocumentMeanAveragePrecision and DocumentMeanReciprocalRank (#7470)
* Rename DocumentMeanAveragePrecision and DocumentMeanReciprocalRank

* Update releasenotes

* Simplify names
2024-04-04 17:04:59 +02:00
Silvano Cerza
bdc25ca2a0
feat: Add DocumentMeanReciprocalRank (#7468)
* Add DocumentMeanReciprocalRank

* Fix float precision error
2024-04-04 14:55:37 +02:00
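Mean reciprocal rank scores each query by 1/rank of the first relevant retrieved document (0 if none is retrieved) and averages over queries. A minimal sketch using document ids instead of Haystack `Document` objects (an assumption for brevity); `mean_reciprocal_rank` is a hypothetical name. Because results such as 1/3 are not exactly representable in floating point, comparisons are best done with a tolerance:

```python
import math
from typing import Dict, List


def reciprocal_rank(ground_truth_ids: List[str], retrieved_ids: List[str]) -> float:
    relevant = set(ground_truth_ids)
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank  # only the first relevant hit counts
    return 0.0


def mean_reciprocal_rank(ground_truth: List[List[str]], retrieved: List[List[str]]) -> Dict:
    scores = [reciprocal_rank(gt, ret) for gt, ret in zip(ground_truth, retrieved)]
    return {"individual_scores": scores, "score": sum(scores) / len(scores) if scores else 0.0}


result = mean_reciprocal_rank([["a"], ["b"]], [["x", "a"], ["b", "y"]])
print(result)  # {'individual_scores': [0.5, 1.0], 'score': 0.75}
print(math.isclose(reciprocal_rank(["c"], ["x", "y", "c"]), 1 / 3))  # True
```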
Silvano Cerza
7799909069
feat: Add DocumentMeanAveragePrecision (#7461)
* Add DocumentMeanAveragePrecision

* Remove questions input

* Update docstrings

* Update haystack/components/evaluators/document_map.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-04 14:15:45 +02:00
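Mean average precision averages, per query, the precision at each rank where a relevant document appears, then averages across queries. A minimal sketch with document ids; it divides by the total number of ground-truth documents (the textbook definition), whereas some implementations divide by the number of relevant documents actually retrieved, so treat the normalisation as an assumption. `mean_average_precision` is a hypothetical name:

```python
from typing import Dict, List


def average_precision(ground_truth_ids: List[str], retrieved_ids: List[str]) -> float:
    relevant = set(ground_truth_ids)
    if not relevant:
        return 0.0
    found = set()
    precision_sum = 0.0
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant and doc_id not in found:  # ignore duplicate hits
            found.add(doc_id)
            precision_sum += len(found) / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(ground_truth: List[List[str]], retrieved: List[List[str]]) -> Dict:
    scores = [average_precision(gt, ret) for gt, ret in zip(ground_truth, retrieved)]
    return {"individual_scores": scores, "score": sum(scores) / len(scores) if scores else 0.0}


result = mean_average_precision([["a", "b"], ["c"]], [["a", "x", "b"], ["x", "c"]])
print(result["individual_scores"])  # AP values 5/6 and 1/2
```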
Silvano Cerza
dc87f51759
refactor: Remove questions inputs from evaluators (#7466)
* Remove questions input from AnswerExactMatchEvaluator

* Remove questions input from DocumentRecallEvaluator
2024-04-04 14:14:18 +02:00
Silvano Cerza
12acb3f12e
feat: Add SASEvaluator (#7428)
* Add SASEvaluator

* Add release notes

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Simplify similarity calculation with bi-encoders models

* Fix linting

* Update docstrings

* Move tensor to CPU after calculating cosine similarity

* Fix CI failing

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-04 10:10:41 +02:00
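With a bi-encoder, the predicted and ground-truth answers are embedded independently and compared by cosine similarity (a cross-encoder instead scores each pair jointly). A minimal sketch of the bi-encoder comparison, assuming the embeddings have already been computed by some model and are passed in as plain lists of floats; `sas_scores` is a hypothetical name and the vectors are toy values:

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def sas_scores(predicted_embeddings: List[List[float]], ground_truth_embeddings: List[List[float]]) -> Dict:
    individual = [cosine_similarity(p, g) for p, g in zip(predicted_embeddings, ground_truth_embeddings)]
    return {"individual_scores": individual, "score": sum(individual) / len(individual) if individual else 0.0}


print(sas_scores([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]))
# {'individual_scores': [1.0, 0.0], 'score': 0.5}
```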
Ashwin Mathur
1c7d1618d8
Add truncate and normalize parameters to TEI Embedders (#7460)
2024-04-03 16:41:30 +02:00
Vladimir Blagojevic
d83af92270
feat: Update searchapi format, default to Google, allow search engine selection (#7453)
* Update searchapi payload

* Add release note

* PR feedback - Stefano

* Adjust unit test for mandatory engine search_param field
2024-04-03 10:48:50 +02:00
Nicola Procopio
42c5b7af32
feat: added dimensions parameters to Azure OpenAI Embedders (#7449)
* added dimensions parameter to AzureOpenAIEmbedders

* created releasenote

* update release note

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-04-02 14:04:16 +02:00