3597 Commits

Author SHA1 Message Date
tstadel
83d3970405
feat: extend PromptBuilder and deprecate DynamicPromptBuilder (#7655)
* feat: add default template to DynamicPromptBuilder

* fix mypy

* fix mypy

* extend PromptBuilder and deprecate DynamicPromptBuilder

* make backward-compatible: optional -> required

* make backward-compatible: _template_string

* make backward-compatible: missing_required_vars error

* add test for no template case

* better docstrings

* some chores

* some chores

* add reno

* revert test_dynamic_prompt_builder.py

* better docstring

* make backward-compatible: reorder init args

* fix tests

* add raises docstring

* make default template required and rework docstrings

* docs chores

* keep to_dict in place for easier review

* remove unnecessary logger

* update docstring
2024-05-23 16:03:39 +02:00
Varun Krishnan
badb05b3ab
feat: allow DocumentJoiner to accept top_k parameter in run method (#7709)
* feat: allow DocumentJoiner to accept top_k parameter in run method

* Added release note for DocumentJoiner top_k fix
2024-05-23 16:03:26 +02:00
Massimiliano Pippi
482f60ec99
fix: exit early if the component receives no documents (#7732)
* exit early if the component receives no documents

* relnote
2024-05-23 09:35:10 +02:00
David S. Batista
a4fc2b66e6
style: adding progress bar to llm-based evaluators (#7726)
* adding progress bar

* fixing typo

* fixing tests

* Update test_llm_evaluator.py

* fixing missing colon

* passing directly to parent

* adding docstrings
2024-05-23 09:22:14 +02:00
Massimiliano Pippi
76224fc781
make SerperDevWebSearch more robust (#7725) 2024-05-22 13:14:39 +02:00
Silvano Cerza
da088140ab
Group up Pipeline unit tests in a single class (#7706) 2024-05-21 16:12:28 +02:00
David S. Batista
e6db1502e6
initial import (#7720) 2024-05-21 15:08:03 +02:00
Stefano Fiorucci
6d27de0b40
fix release note (#7711) 2024-05-17 16:06:03 +02:00
Stefano Fiorucci
7181f6b7e9
feat: change HTML conversion backend from boilerpy3 to Trafilatura (#7705)
* change HTML conversion backend to Trafilatura

* rm unused var
2024-05-17 10:38:47 +02:00
Carlos Fernández
57af95d7ea
add keep-id to DocumentCleaner (#7703) 2024-05-16 19:18:48 +02:00
Carlos Fernández
686a4999cf
feat: widen support of env vars in OpenAI components (#7653)
* add environment variables to the _enviroment.py file

* add support for two of the three variables

* Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' in OpenAIDocumentEmbedder.

* Replicate support for env vars in OpenAITextEmbedder.

* Add support for env vars in OpenAIGenerator.

* Add support for env vars in OpenAIChatGenerator.

* add docstrings and reno

* add params to __init__ in OpenAIDocumentEmbedder

* add params to __init__ in OpenAITextEmbedder

* make fully functional implementation of env vars and unit tests

* update reno

* Update haystack/components/embedders/openai_text_embedder.py

* reverse changes to telemetry/_enviroment.py

* Update haystack/components/embedders/openai_text_embedder.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-15 21:58:41 +00:00
Sebastian Husch Lee
af53e8430d
feat: Add inference mode to ExtractiveReader (#7699)
* Add inference mode to ExtractiveReader

* Add release notes
2024-05-15 19:33:57 +00:00
Vladimir Blagojevic
c8d53b3ebf
fix: Adjust serialization to handle PEP-585 generic types (#7690)
* Adjust serialization to handle PEP-585 generic types

* Add reno note

* Simplify

* PEP 585 serialization handling in sys.version_info < (3, 9)
2024-05-15 14:25:19 +02:00
David S. Batista
96b9d3e32a
fix: Adding missing component decorator to AzureOpenAIGenerator (#7698)
* initial import

* adding release notes

* tests avoiding I/O operations

* Update fix-azure-generators-serialization-18fcdc9cbcb3732e.yaml
2024-05-15 10:00:38 +02:00
Massimiliano Pippi
cc1d4b1c80
chore: Simplify Pipeline.run method by moving code to the base class (#7680)
* move graph initialization to the base class

* simplify data normalization

* deepcopy data in base class

* initialize inputs state

* move to_run preparation to the base class

* Test Pipeline._init_to_run()

* Test Pipeline._init_inputs_state()

* Test Pipeline._prepare_component_input_data()

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-05-14 23:25:46 +02:00
David S. Batista
798dc4a4a5
fix: avoid FaithfulnessEvaluator and ContextRelevanceEvaluator returning NaN (#7685)
* initial import

* fixing tests

* relaxing condition

* adding safeguard for ContextRelevanceEvaluator as well

* adding release notes
2024-05-14 17:08:51 +02:00
Daria Fokina
cc869b10ad
add pdfminer (#7688) 2024-05-14 13:42:29 +02:00
Madeesh Kannan
2428bc2a92
fix: Pipeline.run correctly returns all outputs when the include_outputs_from parameter is used (#7697)
* fix: `Pipeline.run` correctly returns all outputs when the `include_outputs_from` parameter is used

* Add release note
2024-05-14 12:29:41 +02:00
Vladimir Blagojevic
4352b1688e
fix: Fix NamedEntityExtractor serde (#7684)
* Fix NamedEntityExtractor serde

* Add release note

* Linting, remove unit markers
2024-05-14 12:24:55 +02:00
David S. Batista
75cf35c743
fix: forcing response format to be JSON valid (#7692)
* forcing response format to be JSON valid

* adding release notes

* cleaning up

* Update haystack/components/evaluators/llm_evaluator.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-05-14 10:22:38 +00:00
Sebastian Husch Lee
a2be90b95a
fix: Update device deserialization for components that use local models (#7686)
* fix: Update device deserialization for SentenceTransformersTextEmbedder

* Add unit test

* Fix unit test

* Make same change to doc embedder

* Add release notes

* Add same change to Diversity Ranker and Named Entity Extractor

* Add unit test

* Add the same for whisper local

* Update release notes
2024-05-14 08:36:14 +02:00
Vladimir Blagojevic
811b93db91
feat: Set ByteStream's mime_type attribute for web based resources (#7681) 2024-05-13 19:44:02 +02:00
Massimiliano Pippi
1d20ac3c5e
chore: extract BasePipeline (#7673)
* extract BasePipeline

* release note

* add missing headers

* move __eq__ to the base class

* proper check type equality, bless the tests
2024-05-10 11:35:15 +02:00
DL
27acb3c970
Update pipeline.py (#7679) 2024-05-09 18:51:48 +00:00
Silvano Cerza
0e1a5a65e8
Make SparseEmbedding a dataclass (#7678) 2024-05-09 15:11:43 +00:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Massimiliano Pippi
02b8a07e31
re-enable linting for the core package (#7677)
* re-enable linting for the core package

* fix docstring
2024-05-09 13:00:16 +00:00
Stefano Fiorucci
dd95def0d1
introduce any-of-labels (#7676) 2024-05-09 11:36:45 +02:00
Massimiliano Pippi
78e11bf764
Remove leftover from Haystack 1.x (#7664) 2024-05-08 17:34:21 +02:00
Massimiliano Pippi
c07cedf168
chore: Stop labelling PRs with 2.x, assuming it's default now (#7665) 2024-05-08 17:34:05 +02:00
Stefano Fiorucci
7c9532b200
fix broken serialization of HFAPI components (#7661) 2024-05-08 17:14:37 +02:00
Stefano Fiorucci
94467149c1
fix: fix serialization of DocumentRecallEvaluator (#7662)
* fix serialization of DocumentRecallEvaluator

* add requested tests
2024-05-08 16:00:49 +02:00
Bilge Yücel
f14bc5330f
Add "SentenceTransformersDiversityRanker" api reference (#7659) 2024-05-07 19:16:05 +02:00
Guest400123064
cd66a80ba2
perf: enhanced InMemoryDocumentStore BM25 query efficiency with incremental indexing (#7549)
* incorporating better bm25 impl without breaking interface

* all three bm25 algos

* 1. setting algo post-init not allowed; 2. remove extra underscore for naming consistency; 3. remove unused import

* 1. rename attribute name for IDF computation 2. organize document statistics as a dataclass instead of tuple to improve readability

* fix score type initialization (int -> float) to pass mypy check

* release note included

* fixing linting issues and mypy

* fixing tests

* removing heapq import and cleaning up logging

* changing indexing order

* adding more tests

* increasing tests

* removing rank_bm25 from pyproject.toml

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-05-03 12:10:15 +00:00
Julian Risch
48c7c6ad26
test: Rename responses and use preds instead of ground truth answers in e2e eval test (#7640)
* rename responses, use preds instead of ground truth answers

* fix typo in component name
2024-05-03 12:48:42 +02:00
Silvano Cerza
34a79e368e
Enhance version bump PR body description (#7644) 2024-05-03 12:45:18 +02:00
Haystack Bot
489349bcae
Update unstable version to 2.2.0-rc0 (#7643)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
v2.2.0-rc0
2024-05-03 11:55:43 +02:00
Silvano Cerza
3b64664eb7
Fix call to git in minor_version_release.yml v2.1.0-rc0 2024-05-03 11:49:00 +02:00
Silvano Cerza
c0fe5e660b
Rework minor_version_release.yml to create PR that bumps version (#7642) 2024-05-03 11:45:25 +02:00
Vladimir Blagojevic
5f813373eb
chore: Update huggingface_hub classes used after library upgrade (#7631)
* Update huggingface_hub classes used after library upgrade

* Fix chat tests

* Update lazy import guard and other references to huggingface_hub>=0.23.0

* In huggingface_hub 0.23.0 TextGenerationOutput property details is now optional

* More fixes

* Add reno note
2024-05-03 10:14:54 +02:00
Silvano Cerza
db87074e68
Fix minor_version_release.yml workflow to work with both 1.x and 2.x (#7630) 2024-05-02 15:23:07 +02:00
Julian Risch
b0284977db
feat: Add document page number of ExtractedAnswer to meta (#7572)
* calculate page number of answer and add to meta

* fix mypy, add reno

* add test

* simplify unit test

* update release note

* undo @patch updates

* extend tests, check page_number type
2024-05-02 14:48:27 +02:00
Mo
2e35f13085
feat: add converter based on pdfminer (#7607)
* Initial commit pdfminer converter

* Revert back naming of argument all_text per pdfminer documentation

* Add the component decorator

* Add release notes

* Reformat code with black

* Remove LTPage and comments

* Update dependencies in pyproject.toml

* Added some tests and incorporated reference doc in docstring

* Added some tests and incorporated reference doc in docstring
2024-05-02 10:36:54 +02:00
Julian Risch
2509eeea7e
refactor: Rename FaithfulnessEvaluator input responses to predicted_answers (#7621) 2024-04-30 16:30:57 +02:00
evanderiel
5de5619abd
Add instance argument to code samples in docstrings for component.py (#7622) 2024-04-30 16:04:06 +02:00
Vladimir Blagojevic
8cb3cecf34
feat: Trace pipeline run input/output data (#7590)
* Trace pipeline run

* Add reno note

* Update tracing tests to check input_data and output_data

* empty

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-04-29 17:29:27 +02:00
Tobias Wochinger
451fae880e
ci: fix catch-all (#7215)
* ci: trigger separate workflow

* ci: temporary use current branch

* ci: fix workflow name

* ci: try with same job name

* ci: try with dispatch

* Revert "ci: try with dispatch"

This reverts commit bd66e56c0697ae97fc2599eebaceff417d9be65c.

* Revert "ci: try with same job name"

This reverts commit 9e2ae5b402758c14a9f812c2e06f820bd3ece767.

* ci: try with workflow call in both cases

* ci: introduce change to trigger CI

* Revert "ci: introduce change to trigger CI"

This reverts commit e3ec07c5e26f114364babea69535183253c801b7.

* ci: add name

* Revert "Revert "ci: introduce change to trigger CI""

This reverts commit 6718585fd24069112e0f773e010056e1d96e3eee.

* ci: improve naming

* ci: further improve naming

* Unset reusable workflow version and use relative path

* Remove CI trigger

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-04-29 14:54:12 +02:00
Bohan Qu
40360e44ff
feat: add required flag for prompt builder inputs (#7553) 2024-04-29 14:21:53 +02:00
Carlos Fernández
d2c87b2fd9
feat: add page_number to metadata in DocumentSplitter (#7599)
* Add the implementation for page counting used in the v1.25.x branch. It should work as expected in issue #6705.

* Add tests that reflect the desired behaviour. This behaviour is inferred from the one it had in Haystack 1.x.
Solve some minor bugs spotted by tests.

* Update docstrings.

* Add reno.

* Update haystack/components/preprocessors/document_splitter.py

Update docstring from suggestion

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* solve suggestion to improve readability

* fragment tests

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* Update .gitignore

* Update .gitignore

* Update add-page-number-to-document-splitter-162e9dc7443575f0.yaml

* blackening

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-04-29 12:51:18 +02:00
David S. Batista
8d04e530da
test: end2end evaluation tests (#7601)
* initial import

* wip

* cleaning up tests

* fixing tests

* adding context relevance

* reverting some wrong changes due to a PyCharm error in refactoring

* building eval pipeline only once

* handling mypy issues
2024-04-26 14:07:05 +00:00