Silvano Cerza
e0afe274d8
feat: Add method to set a Component input type with default value ( #6728 )
...
* Add method to set a Component input type with default value
* Add release notes
* Fix linting
* Stick to old set_input_types for now
2024-01-12 16:56:52 +01:00
ZanSara
288ed150c9
feat!: Rename `model_name` or `model_name_or_path` to `model` in all Embedder classes ( #6733 )
...
* rename model parameter in the openai doc embedder
* fix tests for openai doc embedder
* rename model parameter in the openai text embedder
* fix tests for openai text embedder
* rename model parameter in the st doc embedder
* fix tests for st doc embedder
* rename model parameter in the st backend
* fix tests for st backend
* rename model parameter in the st text embedder
* fix tests for st text embedder
* fix docstring
* fix pipeline utils
* fix e2e
* reno
* fix the indexing pipeline _create_embedder function
* fix e2e eval rag pipeline
* pytest
2024-01-12 15:30:17 +01:00
ZanSara
ce7abc9bde
feat!: Rename `model_name` or `model_name_or_path` to `model` in all Transcriber classes ( #6731 )
...
* rename model parameter in local transcriber
* fix tests for local transcriber
* rename model parameter in remote transcriber
* fix tests for remote transcriber
* reno
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-01-12 14:40:30 +01:00
sahusiddharth
dbdeb8259e
feat: rename `model_name` or `model_name_or_path` to `model` in generators ( #6715 )
...
* renamed model_name or model_name_or_path to model
* added release notes
* Update releasenotes/notes/renamed-model_name-or-model_name_or_path-to-model-184490cbb66c4d7c.yaml
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2024-01-12 12:58:01 +01:00
Stefano Fiorucci
80c3e6825a
fix: serialize/deserialize torch dtype in the components that need it ( #6713 )
...
* first draft for ranker
* same for the reader
* consider also bnb_4bit_compute_dtype
* dtype serialization in hugging_face_local_generator
* add release note
* address dtype defined in huggingface_pipeline_kwargs
* test quantization options in reader
* fix
* serialize quantization_config
* test quantization_config serialization
* address feedback
* fix typo
2024-01-12 12:22:45 +01:00
Massimiliano Pippi
9e63492440
fix: Fix error when calling `dir()` on a component instance ( #6730 )
...
* do not copy over __dict__ when creating the component class
* relnote
* let test run on core/*
2024-01-12 11:56:03 +01:00
ZanSara
60780ce897
feat: Tweak `CacheChecker` output type ( #6719 )
...
* specify cache checker output type
* (de)serialization
* tests
* add default value for type
* reno
* mypy
* feedback
* reduce diff
* reduce diff
* reno
2024-01-11 12:33:26 +01:00
Massimiliano Pippi
e1ec4e5e4d
refact!: Remove symbols under the `haystack.document_stores` namespace ( #6714 )
...
* remove symbols under the haystack.document_stores namespace
* Update haystack/document_stores/types/protocol.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* fix
* same for retrievers
* leftovers
* more leftovers
* add relnote
* leftovers
* one more
* fix examples
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-01-10 21:20:42 +01:00
Ashwin Mathur
374a937663
feat: Add `calculate_metrics` and `MetricsResult` ( #6680 )
...
* Add calculate_metrics, MetricsResult, Exact Match
* Add additional tests for metric calculation
* Add release notes
* Add docstring for Exact Match metric
* Remove Exact Match Implementation
* Update release notes
* Remove unnecessary metrics implementation
* Simplify logic to run supported metrics
* Add some evaluation tests
* Fix linting
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-01-10 10:26:44 +01:00
Madeesh Kannan
e6d6ce1c73
feat: Add `NamedEntityExtractor` component ( #6689 )
...
* feat: Add `NamedEntityExtractor` component
This component accepts a list of `Document`s, which it annotates with named entities. The annotations are stored in the `meta` dictionary of each `Document` under a specific key.
The component currently supports two backends for the annotation models: Hugging Face `transformers` and spaCy.
* Address comments
* Expand release note
* Add the `[torch]` extra package specifier to the lazy import
* Remove dead code
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-01-09 17:56:20 +01:00
ZanSara
9fe80fd225
feat: Add example script about routing metadata to converters in indexing pipelines ( #6702 )
...
* support single metadata dict in markdown2document
* reno
* unwrap list
* direct key access
* typing
* add example of indexing pipeline using Multiplexer
* reno
2024-01-09 14:59:22 +01:00
ZanSara
abd16ab796
feat: support single metadata dictionary in `MarkdownToDocument` ( #6629 )
...
* support single metadata dict in markdown2document
* reno
* unwrap list
* direct key access
* typing
* add explicit test
2024-01-09 14:44:39 +01:00
Massimiliano Pippi
9ace6bf63d
feat: store input's default value in `InputSocket` ( #6651 )
...
* track default value in sockets
* remove dead code
* include default value in socket description
* add unit test
* add relnote
* unused import
* clarify
2024-01-09 12:17:46 +01:00
ZanSara
175b5baf45
feat: support single metadata dictionary in `AzureOCRDocumentConverter` ( #6635 )
...
* support single metadata dict in azureconverter
* reno
* tests
* Update releasenotes/notes/single-meta-in-azureconverter-ce1cc196a9b161f3.yaml
2024-01-09 10:49:37 +01:00
ZanSara
974d65f30a
feat: support single metadata dictionary in `TikaDocumentConverter` ( #6698 )
...
* reno
* converter
* test
* comment
2024-01-09 09:49:47 +01:00
Massimiliano Pippi
93b2aaee09
chore: move `DocumentJoiner` to new `joiners` package ( #6692 )
...
* move DocumentJoiner to new joiners package
* relnote
* leftovers
* fix docstrings generation
* fix unrelated pydoc misconfiguration
* more unrelated work, yay!
* fix assertions
2024-01-08 22:06:27 +01:00
Vladimir Blagojevic
9e0b58784f
feat: Improve UrlCacheChecker, make it more generic ( #6699 )
...
* Rename UrlCacheChecker to CacheChecker, make it field generic
* Add release note
2024-01-08 16:15:27 +01:00
Sebastian Husch Lee
beade1cef9
feat: Add scaling and thresholding of the similarity ranker scores ( #6683 )
...
* Add scale_score functionality to the TransformersSimilarityRanker
* Updated test to check scores
* Use pytest approx when comparing floats
* Updated how scale score works and added calibration factor. Started to add score threshold.
* Add support for score_threshold
* Add some parameters to the run method
* Add release notes
* Fix mypy
* Be more tolerant on the score values
* Adding unit test for scale_score=False
* Add unit test for score threshold
* Update tests
* Rename test
* Fix typo
* PR comments
2024-01-08 09:05:24 +01:00
Vladimir Blagojevic
552f0e394b
feat: Add Azure embedders support ( #6676 )
...
* Add Azure embedders
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-01-05 15:49:25 +01:00
Vladimir Blagojevic
b7159ad7c2
feat: Add AzureOpenAIGenerator and AzureOpenAIChatGenerator ( #6648 )
...
* Add AzureOpenAIGenerator and AzureOpenAIChatGenerator
2024-01-05 15:48:28 +01:00
Vladimir Blagojevic
090d66b531
feat: Update OpenAIChatGenerator to handle both tools and functions calling ( #6639 )
...
* Handle tools parameter in OpenAIChatGenerator
* Handle tools/functions parameter in OpenAIChatGenerator streaming mode
* Adjust OpenAPIServiceConnector to handle tools parameter
* We never deal with functions/tools in non-chat generator
* Add release note
2023-12-28 17:29:47 +01:00
Stefano Fiorucci
c773c30c66
refactor!: rename all remaining `metadata` to `meta` ( #6650 )
...
* change metadata to meta
* release note
2023-12-28 12:18:15 +01:00
Vladimir Blagojevic
ef2f6bd681
feat: Split `DynamicPromptBuilder` and `DynamicChatPromptBuilder` ( #6557 )
...
* Split DynamicPromptBuilder
* Add release note
* Julian PR feedback
* dynamicchatbuilder lg upd
* dynamicpromptbuilder lg upd
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-12-26 15:27:43 +01:00
Vladimir Blagojevic
506ab81d26
chore: Rename GPT generators, deprecate old names ( #6626 )
2023-12-22 19:37:29 +01:00
ZanSara
c0f1dab454
feat: support single metadata dictionary in `PyPDFToDocument` ( #6615 )
...
* support single metadata dict in pypdf2document
* improve tests
* tests
* remove line
2023-12-22 14:13:11 +01:00
ZanSara
ff55985e2d
feat: support single metadata dictionary in `HTMLToDocument` ( #6613 )
...
* support single metadata in HTMLToDocument
* reno
* docstring
2023-12-21 16:45:31 +01:00
Vladimir Blagojevic
4d08be0c2a
feat: Update OpenAI Python Client in Haystack 2.x ( #6584 )
...
* Update openai python client
* Add release note
* Consolidate multiple mock_chat_completion into one
* Ensure all components have api_base_url, organization params
* Update tests
* Enable function calling
* Oversight
* Minor fixes, add streaming test mocks
* Apply suggestions from code review
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* metadata -> meta
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-12-21 16:21:24 +01:00
ZanSara
cf79aa1485
feat: add support for single meta dict in `TextFileToDocument` ( #6606 )
...
* add support for single meta dict
* reno
* reno
* mypy
* extract to function
* docstring
* mypy
2023-12-21 14:21:17 +01:00
sahusiddharth
3d17e6ff76
changed metadata to meta ( #6605 )
2023-12-21 12:39:58 +01:00
Ashwin Mathur
fc88ef7076
feat: Add HuggingFace TEI Embedders - `HuggingFaceTEITextEmbedder` and `HuggingFaceTEIDocumentEmbedder` ( #6602 )
...
* Add TEI Embedders
* Add release notes
* Update release notes with usage examples
2023-12-21 12:16:36 +01:00
ZanSara
ae5297bfd7
example: self-correcting loop for RAG ( #6420 )
...
* add example
* docstrings
* reno
* use condrouter
* move functions
* tests
* reno
* add component
* reno
* add tests
* mypy
* pylint
* logger
* module name
* multiplexer
* draw
* query_multiplexer
* reno
* typo
2023-12-20 11:35:05 +01:00
ZanSara
5a0f0ce22f
feat: `Multiplexer` ( #6592 )
...
* move functions
* tests
* reno
* add component
* reno
* add tests
* mypy
* pylint
* logger
* module name
2023-12-20 11:03:22 +01:00
Silvano Cerza
e836fd6875
fix: Fix `Pipeline.connect()` when multiple compatible sockets are found ( #6594 )
...
* Fix connect not picking the correct socket
* Add release notes
2023-12-20 11:01:18 +01:00
Silvano Cerza
f224f991be
Change DocumentWriter default policy from DuplicatePolicy.FAIL to DuplicatePolicy.NONE ( #6596 )
2023-12-19 17:46:16 +01:00
ZanSara
f877704839
chore: extract type serialization ( #6586 )
...
* move functions
* tests
* reno
2023-12-19 14:16:20 +01:00
Vladimir Blagojevic
2dd5a94b04
feat: Add RAG based OpenAPI service integration ( #6555 )
...
* Add OpenAPIServiceConnector and OpenAPIServiceToFunctions
* Add release note
* Add test deps
* Better docs on OpenAPI spec reqs, improve tests
* Silvano PR feedback
2023-12-19 13:27:41 +01:00
Stefano Fiorucci
94cfe5d9ae
feat!: `HTMLToDocument` - allow choosing the boilerpy3 extractor ( #6582 )
...
* allow extractor customizability
* release note
* typo
2023-12-19 10:52:12 +01:00
Sebastian Husch Lee
dcf37c5173
feat: Extractive QA answer deduplication ( #6459 )
...
* Add answer deduplication
* Fix test
* Handle None case
* Release notes
* Handle cases where documents or answer spans could be None
* Adding checks for Nones and satisfying mypy
* Add option to turn off deduplication
* Adding unit tests
* Refactored tests to use fixtures
* Added overlap_threshold to run
* Update test
* Fixes related to the merge
* Remove casting, use direct variable names
* Move out if statement and add new test for it
* Update if statement to match comment
* Update how if statements work
2023-12-18 19:27:04 +01:00
Sebastian Husch Lee
c294b8ac8c
feat: Add auto device checks and `model_kwargs` to `TransformersSimilarityRanker` ( #6561 )
...
* Add device checking and model_kwargs like we do in ExtractiveReader
* Add release notes
* Make a utility function for the device checking
* Better warning message and updated ExtractiveReader to use the util function
* Add unit tests for get_device
* Fix pylint
2023-12-18 15:13:42 +01:00
Ashwin Mathur
46b395eec3
feat: Add Eval and EvaluationResult ( #6505 )
...
* Add initial implementation for Eval and EvaluationResult
* Add release notes
* Update files with suggestions from review
* Remove serialization
* Add eval e2e tests
* Update eval e2e tests
2023-12-18 11:29:09 +01:00
Sebastian Husch Lee
3e0e81b1e0
feat: Add `meta_fields_to_embed` to `TransformersSimilarityRanker` ( #6564 )
...
* Add initial implementation following SentenceTransformersDocumentEmbedder
* Add test for embedding metadata
* Add release notes
* Update name
* Fix tests and to dict
* Fix release notes
2023-12-18 11:28:16 +01:00
Massimiliano Pippi
0ac1bdc6a0
refactor!: uniform run api for LocalWhisperTranscriber ( #6542 )
...
* uniform run api for LocalWhisperTranscriber
* add relnote
* fix linter
2023-12-18 10:47:46 +01:00
Massimiliano Pippi
00fed32024
build: depend on `haystack_bm25` instead of `rank_bm25` ( #6578 )
...
* use the forked package
* switch package dependency
* relnote
* fix package name
2023-12-18 10:47:15 +01:00
Stefano Fiorucci
2f034d3c97
refactor!: Converters - standardize inputs ( #6540 )
...
* standardize converters inputs: first draft
* fix precommit
* fix precommit 2
* fix precommit 3
* add default for optional param
* rm leftover
* install boilerpy in linting workflow
* add boilerpy3 to the core dependencies
* add reno
* remove boilerpy3 installation from test workflow
* fix pylint: import order and unused import
* fix import order
* add release note
* better Tika docstring
* rm boilerpy from linting
* leftover
* md link brackets
* feat: Converters - allow passing `meta` in the `run` method (#6554 )
* first impl for html
* progressing on other components
* fix test
* add tests - run with meta
* release note
* reintroduce patches wrongly deleted
* add patch in test
* fix tika test
* Update haystack/components/converters/azure.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Update releasenotes/notes/converters-standardize-inputs-ed2ba9c97b762974.yaml
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* simplify test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-12-15 16:41:35 +01:00
Vladimir Blagojevic
c642695ec0
feat: Add FileTypeRouter markdown support ( #6551 )
...
* Add FileTypeRouter markdown support
* Add release note
2023-12-14 16:30:57 +01:00
Massimiliano Pippi
bc45170f4e
chore: add boilerpy3 to the core dependencies ( #6544 )
...
* add boilerpy3 to the core dependencies
* remove boilerpy3 installation from test workflow
* fix pylint: import order and unused import
* fix import order
* add release note
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-12-14 11:53:38 +01:00
Massimiliano Pippi
a55024bee7
fix: do not dump pipeline graph into the debug payload ( #6528 )
2023-12-12 18:24:23 +01:00
Massimiliano Pippi
09abcc1d4c
allow connecting the same components multiple times ( #6530 )
2023-12-12 16:01:09 +01:00
Julian Risch
25a6eaae05
feat!: Rename ExtractiveReader's `confidence_threshold` to `score_threshold` ( #6532 )
...
* rename to score_threshold
* Update haystack/components/readers/extractive.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-12-12 15:12:28 +01:00
Silvano Cerza
18dbce25fc
refactor: Refactor answer dataclasses ( #6523 )
...
* Refactor answer dataclasses
* Add release notes
* Fix tests
* Fix end to end tests
* Enhance ExtractiveReader
2023-12-11 18:50:49 +01:00