Julian Risch
b0284977db
feat: Add document page number of ExtractedAnswer to meta ( #7572 )
...
* calculate page number of answer and add to meta
* fix mypy, add reno
* add test
* simplify unit test
* update release note
* undo @patch updates
* extend tests, check page_number type
2024-05-02 14:48:27 +02:00
Mo
2e35f13085
feat: add converter based on pdfminer ( #7607 )
...
* Initial commit pdfminer converter
* Revert back naming of argument all_text per pdfminer documentation
* Add the component decorator
* Add release notes
* Reformat code with black
* Remove LTPage and comments
* Update dependencies in pyproject.toml
* Added some tests and incorporated reference doc in docstring
* Added some tests and incorporated reference doc in docstring
2024-05-02 10:36:54 +02:00
Julian Risch
2509eeea7e
refactor: Rename FaithfulnessEvaluator input responses to predicted_answers ( #7621 )
2024-04-30 16:30:57 +02:00
Vladimir Blagojevic
8cb3cecf34
feat: Trace pipeline run input/output data ( #7590 )
...
* Trace pipeline run
* Add reno note
* Update tracing tests to check input_data and output_data
* empty
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-04-29 17:29:27 +02:00
Bohan Qu
40360e44ff
feat: add required flag for prompt builder inputs ( #7553 )
2024-04-29 14:21:53 +02:00
Carlos Fernández
d2c87b2fd9
feat: add page_number to metadata in DocumentSplitter ( #7599 )
...
* Add the implementation for page counting used in the v1.25.x branch. It should work as expected in issue #6705 .
* Add tests that reflect the desired behaviour. This behaviour is inferred from the one it had in Haystack 1.x
Solve some minor bugs spotted by tests.
* Update docstrings.
* Add reno.
* Update haystack/components/preprocessors/document_splitter.py
Update docstring from suggestion
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* solve suggestion to improve readability
* fragment tests
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* Update .gitignore
* Update .gitignore
* Update add-page-number-to-document-splitter-162e9dc7443575f0.yaml
* blackening
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-04-29 12:51:18 +02:00
Madeesh Kannan
a881451d3a
refactor: Refactor EvaluationResult into BaseEvaluationRunResult and EvaluationRunResult ( #7594 )
...
The new `EvaluationRunResult` has slightly different semantics: it splits the previous `data` parameter into `inputs` and `results`, and expects aggregate scores to be provided in the latter.
2024-04-25 12:16:48 +02:00
Madeesh Kannan
ec0e22265a
feat: Expand Pipeline.inputs and Pipeline.outputs to include connected sockets ( #7586 )
2024-04-24 12:27:18 +02:00
Stefano Fiorucci
19a46af9da
add __eq__ method to SparseEmbedding ( #7574 )
...
* add __eq__ method to SparseEmbedding
* reno
* improve reno
2024-04-23 19:03:41 +02:00
Julian Risch
9c56dbe288
test: Make ContextRelevanceEvaluator integration test more robust ( #7584 )
2024-04-23 16:01:25 +00:00
Julian Risch
07307709ee
test: Make FaithfulnessEvaluator integration test more robust ( #7582 )
2024-04-23 15:44:00 +00:00
Stefano Fiorucci
081757c6b9
test: replace mistral-7b with zephyr-7b-beta in tests ( #7576 )
...
* replace mistral-7b with gemma-2b-it in tests
* rm wrong comment
* change model
2024-04-23 13:56:07 +02:00
Julian Risch
d7638cfd4b
refactor: FaithfulnessEvaluator specifies inputs explicitly ( #7548 )
...
* specify inputs explicitly. move out examples
* Update haystack/components/evaluators/faithfulness.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-22 12:52:10 +00:00
Julian Risch
b12e0db134
feat: Add ContextRelevanceEvaluator component ( #7519 )
...
* feat: Add ContextRelevanceEvaluator component
* reno
* fix expected inputs and example docstring
* remove responses parameter from tests
* specify inputs explicitly
* add new evaluator to api reference docs
2024-04-22 14:10:00 +02:00
Massimiliano Pippi
3a80c866c9
fix: do not use reserved attributes in the logger ( #7545 )
...
* avoid using reserved keywords in the logger
* make the tests independent from the log level
* relnotes
2024-04-12 14:07:18 +00:00
Massimiliano Pippi
2bad5bcb96
refactor: AnswerExactMatchEvaluator component inputs ( #7536 )
...
* refactor component inputs
* release notes
* Update class docstring
* pylint
* update existing note instead of creating a new one
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-04-12 06:59:16 +00:00
Silvano Cerza
6a8834e43e
fix: Fix corner case when running Pipeline that causes it to get stuck in a loop ( #7531 )
...
* Fix corner case when running Pipeline that causes it to get stuck in a loop
* Update haystack/core/pipeline/pipeline.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-04-11 16:39:38 +02:00
Madeesh Kannan
b1760add56
feat: Add support for pipeline deserialization callbacks ( #7518 )
...
* feat: Add support for deserialization callbacks
* Lint
* Fix type hint for older Python versions
* Apply suggestions from code review
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Lint
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-04-10 17:47:14 +02:00
Madeesh Kannan
fd84cd5f9a
feat: Add support for returning intermediate outputs of pipeline components ( #7504 )
...
* feat: Add support for returning intermediate outputs of pipeline components
The `pipeline.run` method has been extended to accept a set of component
names whose outputs are returned in addition to the outputs of leaf components.
* Add reno
* Lint
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-04-10 17:16:00 +02:00
David S. Batista
9a9c8aa1c8
feat: implementing evaluation results API ( #7520 )
...
* initial import
* adding tests
* attending PR comments
* fixing tests
* updating tests
* updating tests and code
* renaming
* fixing linting issues
* adding release notes
* adding docstrings
* latest fixes
2024-04-10 13:34:03 +00:00
Julian Risch
e974a23fa3
docs: Fix eval metric examples in docstrings ( #7505 )
...
* fix eval metric docstrings, change type of individual scores
* change import order
* change exactmatch docstring to single ground truth answer
* change exactmatch comment to single ground truth answer
* reverted changing docs to single ground truth
* add warm up in SASEvaluator example
* fix FaithfulnessEvaluator docstring example
* extend FaithfulnessEvaluator docstring example
* Update FaithfulnessEvaluator init docstring
* Remove outdated default from LLMEvaluator docstring
* Add examples param to LLMEvaluator docstring example
* Add import and print to LLMEvaluator docstring example
2024-04-10 11:00:20 +02:00
Stefano Fiorucci
39be515ba6
skip HF integrations tests if running from fork ( #7517 )
2024-04-09 17:47:13 +02:00
Vladimir Blagojevic
988c360b6d
feat: Azure converter updates ( #7409 )
...
* Initial commit
* Remove old mock tests
* Fix current_last_page_number calculation
* Carry over unit tests from the other side
* Update pydocs, skip failing tests
* Fix pylint and mypy
* Minor adjustments
* Add release note
* Minor touch ups
* Resolve Document unique id issue by using custom id calculation
* Better hashing, add unit tests
* Small fixes
2024-04-09 09:45:06 +02:00
Stefano Fiorucci
eff53a9131
feat: HuggingFaceAPIDocumentEmbedder ( #7485 )
...
* add HuggingFaceAPITextEmbedder
* add HuggingFaceAPITextEmbedder
* rm unneeded else
* wip
* small fixes
* deprecation; reno
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* make params mandatory
* changes requested
* fix test
* fix test
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-08 15:06:26 +02:00
Stefano Fiorucci
c91bd49cae
feat: HuggingFaceAPITextEmbedder ( #7484 )
...
* add HuggingFaceAPITextEmbedder
* add HuggingFaceAPITextEmbedder
* rm unneeded else
* small fixes
* changes requested
* fix test
2024-04-08 14:22:54 +02:00
David S. Batista
aae2b31359
fix: typo in sas_evaluator arg ( #7486 )
...
* fixing typo on SAS arg
* fixing tests
* fixing tests
2024-04-08 10:21:37 +02:00
Stefano Fiorucci
0dbb98c0a0
feat: HuggingFaceAPIChatGenerator ( #7480 )
...
* draft
* docstrings and more tests
* deprecation; reno
* pydoc config
* better error messages
* wip
* add test
* better docstrings
* deprecation; reno
* pylint
* typo
* rm unneeded else
* rm unneeded else
* fixes from feedback
* docstring showing the enum
* improve docstring
* make params mandatory
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* document enum
* Update haystack/utils/hf.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* mandatory params
* fix test
* fix test
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-05 18:48:34 +02:00
Stefano Fiorucci
1d083861ff
feat: HuggingFaceAPIGenerator ( #7464 )
...
* draft
* docstrings and more tests
* deprecation; reno
* pydoc config
* better error messages
* rm unneeded else
* make params mandatory
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* document enum
* Update haystack/utils/hf.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* fix test
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-05 18:48:13 +02:00
Silvano Cerza
ff269db12d
Fix unit tests failing if HF_API_TOKEN is set ( #7491 )
2024-04-05 18:05:43 +02:00
Vladimir Blagojevic
c3b96392fd
feat: Use all HTMLToDocument extractors until content is extracted ( #7452 )
...
* Use all HTMLToDocument extractors until content is extracted
* Add release note
* Minor doc update
* Improvements, unit test fixes
* Add try_others init param, update tests
* Update haystack/components/converters/html.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* PR feedback - Stefano
* Improve reno release note, add reference
* little fixes
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-04-05 16:02:34 +02:00
Julian Risch
9d02dc607a
feat: Add FaithfulnessEvaluator component ( #7424 )
...
* draft FaithfulnessEvaluator
* reno
* calculate score per statement and aggregate
* Update release note
* update default values in tests and fix import path
* remove instructions, inputs, outputs params
* remove unused imports
* add expected format example to docstring
* remove name 'llm' from tests and docstring
2024-04-04 16:33:59 +00:00
Julian Risch
8ef6062748
refactor: Remove name 'llm' from LLMEvaluator output ( #7479 )
2024-04-04 15:19:30 +00:00
Silvano Cerza
8b8a93bc0d
refactor: Rename DocumentMeanAveragePrecision and DocumentMeanReciprocalRank ( #7470 )
...
* Rename DocumentMeanAveragePrecision and DocumentMeanReciprocalRank
* Update releasenotes
* Simplify names
2024-04-04 17:04:59 +02:00
Silvano Cerza
bdc25ca2a0
feat: Add DocumentMeanReciprocalRank ( #7468 )
...
* Add DocumentMeanReciprocalRank
* Fix float precision error
2024-04-04 14:55:37 +02:00
Silvano Cerza
7799909069
feat: Add DocumentMeanAveragePrecision ( #7461 )
...
* Add DocumentMeanAveragePrecision
* Remove questions input
* Update docstrings
* Update haystack/components/evaluators/document_map.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-04 14:15:45 +02:00
Silvano Cerza
dc87f51759
refactor: Remove questions inputs from evaluators ( #7466 )
...
* Remove questions input from AnswerExactMatchEvaluator
* Remove questions input from DocumentRecallEvaluator
2024-04-04 14:14:18 +02:00
Silvano Cerza
12acb3f12e
feat: Add SASEvaluator ( #7428 )
...
* Add SASEvaluator
* Add release notes
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Simplify similarity calculation with bi-encoders models
* Fix linting
* Update docstrings
* Move tensor to CPU after calculating cosine similarity
* Fix CI failing
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-04 10:10:41 +02:00
Ashwin Mathur
1c7d1618d8
Add truncate and normalize parameters to TEI Embedders ( #7460 )
2024-04-03 16:41:30 +02:00
Vladimir Blagojevic
d83af92270
feat: Update searchapi format, default to Google, allow search engine selection ( #7453 )
...
* Update searchapi payload
* Add release note
* PR feedback - Stefano
* Adjust unit test for mandatory engine search_param field
2024-04-03 10:48:50 +02:00
Nicola Procopio
42c5b7af32
feat: added dimensions parameters to Azure OpenAI Embedders ( #7449 )
...
* added dimensions parameter to AzureOpenAIEmbedders
* created releasenote
* update release note
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-04-02 14:04:16 +02:00
Silvano Cerza
6e289698e9
fix: Fix Pipeline.run() getting stuck in a loop even though there are components that can run ( #7434 )
2024-03-28 12:31:36 +01:00
Vladimir Blagojevic
ce8e114769
feat: DynamicChatPromptBuilder add templating to all user/system messages ( #7423 )
2024-03-27 15:34:50 +01:00
Silvano Cerza
58d91b64dc
fix: Fix Pipeline.run() running components with only defaults in the wrong order ( #7426 )
...
* Fix Pipeline.run() running components with only defaults in the wrong order
* Add release notes
2024-03-26 16:55:31 +01:00
Silvano Cerza
685343d13f
feat: Add DocumentRecallEvaluator ( #7399 )
...
* Add DocumentRecallEvaluator
* Fix mypy error
* Simplify recall logic and change output for single hit mode
* Remove unused import
* Add comment for RecallMode fields
* Reword RecallMode comments
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-03-26 16:15:03 +01:00
Stefano Fiorucci
e26ee0f1db
refactor!: make TGI generators compatible with huggingface_hub>=0.22.0 ( #7425 )
...
* progress
* progress
* better lazy imports
* fixes
* reno
2024-03-26 16:10:06 +01:00
David S. Batista
fcd48d662c
test: HuggingFaceLocalGenerator test stopwords ( #7416 )
...
* initial import
* Update test/components/generators/test_hugging_face_local_generator.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* attending PR comments
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-03-26 12:39:02 +01:00
Silvano Cerza
f398b29e7f
feat: Change outputs of AnswerExactMatchEvaluator ( #7390 )
...
* Change outputs of AnswerExactMatchEvaluator
* Changes scores to return the number of matches per question
* Revert "Changes scores to return the number of matches per question"
This reverts commit e4358720793d4584b0b961402d4557c50c4c2381.
* Change output names
2024-03-26 10:57:59 +01:00
Stefano Fiorucci
6925e3a2e1
refactor!: Improve PyPDFToDocument ( #7362 )
...
* first draft
* rm kwargs from protocol
* Simplify
* no breaking changes
* reno
* one more test of the deprecated registry
2024-03-26 10:09:29 +01:00
Julian Risch
bfd0d3eacd
feat: Add new LLMEvaluator component ( #7401 )
...
* draft llm evaluator
* docstrings
* flexible inputs; validate inputs and outputs
* add tests
* add release note
* remove example
* docstrings
* make outputs parameter optional. default:
* validate init parameters
* linting
* remove mention of binary scores from template
* make examples and outputs params non-optional
* removed leftover from optional outputs param
* simplify building examples section for template
* validate inputs and outputs in examples are dict with str as key
* fix pylint too-many-boolean-expressions
* increase test coverage
2024-03-25 07:05:27 +01:00
Stefano Fiorucci
c789f905bc
refactor: pass a role string to OpenAI API ( #7404 )
...
* draft
* rm unused imports
2024-03-22 09:36:56 +01:00