sahusiddharth
3d17e6ff76
changed metadata to meta ( #6605 )
2023-12-21 12:39:58 +01:00
Ashwin Mathur
fc88ef7076
feat: Add HuggingFace TEI Embedders - HuggingFaceTEITextEmbedder and HuggingFaceTEIDocumentEmbedder ( #6602 )
...
* Add TEI Embedders
* Add release notes
* Update release notes with usage examples
2023-12-21 12:16:36 +01:00
Silvano Cerza
8a513f3b8c
test: Add fixture to block requests in tests ( #6585 )
...
* Add fixture to block requests in tests
* Mark tests making requests as integration
2023-12-21 08:51:54 +01:00
ZanSara
5a0f0ce22f
feat: Multiplexer ( #6592 )
...
* move functions
* tests
* reno
* add component
* reno
* add tests
* mypy
* pylint
* logger
* module name
2023-12-20 11:03:22 +01:00
Silvano Cerza
e836fd6875
fix: Fix Pipeline.connect() when multiple compatible sockets are found ( #6594 )
...
* Fix connect not picking the correct socket
* Add release notes
2023-12-20 11:01:18 +01:00
Silvano Cerza
f224f991be
Change DocumentWriter default policy from DuplicatePolicy.FAIL to DuplicatePolicy.NONE ( #6596 )
2023-12-19 17:46:16 +01:00
ZanSara
f877704839
chore: extract type serialization ( #6586 )
...
* move functions
* tests
* reno
2023-12-19 14:16:20 +01:00
Vladimir Blagojevic
2dd5a94b04
feat: Add RAG based OpenAPI service integration ( #6555 )
...
* Add OpenAPIServiceConnector and OpenAPIServiceToFunctions
* Add release note
* Add test deps
* Better docs on OpenAPI spec reqs, improve tests
* Silvano PR feedback
2023-12-19 13:27:41 +01:00
Stefano Fiorucci
94cfe5d9ae
feat!: HTMLToDocument - allow choosing the boilerpy3 extractor ( #6582 )
...
* allow extractor customizability
* release note
* typo
2023-12-19 10:52:12 +01:00
Sebastian Husch Lee
dcf37c5173
feat: Extractive QA answer deduplication ( #6459 )
...
* Add answer deduplication
* Fix test
* Handle None case
* Release notes
* Handle cases where documents or answer spans could be None
* Adding checks for Nones and satisfying mypy
* Add option to turn off deduplication
* Adding unit tests
* Refactored tests to use fixtures
* Added overlap_threshold to run
* Update test
* Fixes related to the merge
* Remove casting, use direct variable names
* Move out if statement and add new test for it
* Update if statement to match comment
* Update how if statements work
2023-12-18 19:27:04 +01:00
Sebastian Husch Lee
c294b8ac8c
feat: Add auto device checks and model_kwargs to TransformersSimilarityRanker ( #6561 )
...
* Add device checking and model_kwargs like we do in ExtractiveReader
* Add release notes
* Make a utility function for the device checking
* Better warning message and updated ExtractiveReader to use the util function
* Add unit tests for get_device
* Fix pylint
2023-12-18 15:13:42 +01:00
Sebastian Husch Lee
3e0e81b1e0
feat: Add meta_fields_to_embed to TransformersSimilarityRanker ( #6564 )
...
* Add initial implementation following SentenceTransformersDocumentEmbedder
* Add test for embedding metadata
* Add release notes
* Update name
* Fix tests and to dict
* Fix release notes
2023-12-18 11:28:16 +01:00
Massimiliano Pippi
0ac1bdc6a0
refactor!: uniform run api for LocalWhisperTranscriber ( #6542 )
...
* uniform run api for LocalWhisperTranscriber
* add relnote
* fix linter
2023-12-18 10:47:46 +01:00
Stefano Fiorucci
2f034d3c97
refactor!: Converters - standardize inputs ( #6540 )
...
* standardize converters inputs: first draft
* fix precommit
* fix precommit 2
* fix precommit 3
* add default for optional param
* rm leftover
* install boilerpy in linting workflow
* add boilerpy3 to the core dependencies
* add reno
* remove boilerpy3 installation from test workflow
* fix pylint: import order and unused import
* fix import order
* add release note
* better Tika docstring
* rm boilerpy from linting
* leftover
* md link brackets
* feat: Converters - allow passing `meta` in the `run` method (#6554 )
* first impl for html
* progressing on other components
* fix test
* add tests - run with meta
* release note
* reintroduce patches wrongly deleted
* add patch in test
* fix tika test
* Update haystack/components/converters/azure.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Update releasenotes/notes/converters-standardize-inputs-ed2ba9c97b762974.yaml
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* simplify test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-12-15 16:41:35 +01:00
Stefano Fiorucci
cf47abdff5
chore: simplify the management of test dependencies ( #6559 )
...
* remove audio dep group
* extract dependencies
* beautify
* rm one step
2023-12-15 16:40:41 +01:00
Vladimir Blagojevic
c642695ec0
feat: Add FileTypeRouter markdown support ( #6551 )
...
* Add FileTypeRouter markdown support
* Add release note
2023-12-14 16:30:57 +01:00
Massimiliano Pippi
09abcc1d4c
allow connecting the same components multiple times ( #6530 )
2023-12-12 16:01:09 +01:00
Julian Risch
25a6eaae05
feat!: Rename ExtractiveReader's confidence_threshold to score_threshold ( #6532 )
...
* rename to score_threshold
* Update haystack/components/readers/extractive.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-12-12 15:12:28 +01:00
Silvano Cerza
18dbce25fc
refactor: Refactor answer dataclasses ( #6523 )
...
* Refactor answer dataclasses
* Add release notes
* Fix tests
* Fix end to end tests
* Enhance ExtractiveReader
2023-12-11 18:50:49 +01:00
Vladimir Blagojevic
628e8aa3d4
feat: Improve getting started examples ( #6510 )
...
* Improve rag and indexing pipelines
* Update examples
* Simplify user interface and code, improve embedder model
* Improve default vals for embedder
* resolve typing
* resolve typing 2
* Fix unit test
---------
Co-authored-by: Timo Möller <timo.moeller@deepset.ai>
2023-12-09 19:01:13 +01:00
bogdankostic
728383a149
fix: Make TransformersSimilarityRanker run with single document list ( #6503 )
...
* Make `TransformersSimilarityRanker` run with single document list
* Add release note
* Remove unused import in test
2023-12-08 16:18:46 +01:00
Bijay Gurung
c5342d1110
fix: Prevent invalid answer from being selected in ExtractiveReader ( #6460 )
...
* Fix invalid answer being selected issue on ExtractiveReader
* Rename variables to not shadow arguments
2023-12-06 09:49:02 +01:00
Massimiliano Pippi
1b247ca395
clean up tests ( #6488 )
2023-12-05 15:49:10 +01:00
Vladimir Blagojevic
008a322023
feat: Add Indexing Pipeline ( #6424 )
...
* Add build_indexing_pipeline utils function
* Pylint fixes
* Move into another package to avoid circular deps
* Revert change
* Revert haystack/utils/__init__.py change
* Add example
* Use DocumentStore type, remove typing checks
2023-12-04 16:08:53 +01:00
ZanSara
a38f871dbd
feat: Add RAG pipeline ( #6461 )
...
* add rag pipeline
* Update examples/getting_started/rag.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-12-04 15:25:29 +01:00
Stefano Fiorucci
4912f7cb58
refactor!: improve the deserialization logic for components that use a Document Store ( #6466 )
...
* improve deserialization
* rm ds decorator
* improve tests
* fix pylint
* rm decorator from module init
* rm decorator
* rm decorator from factory
* fix tests
* release note
* rm print
2023-12-04 15:17:28 +01:00
Massimiliano Pippi
a86807b834
move Cohere generator into dedicated integration ( #6475 )
2023-12-04 11:16:12 +01:00
Vladimir Blagojevic
b9bf83bbef
feat: Allow flat dictionary Pipeline.run() inputs ( #6413 )
...
* Initial implementation, release note, update API and unit test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-30 14:37:55 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker ( #6450 )
2023-11-29 19:24:25 +01:00
Massimiliano Pippi
84da80c1f3
chore: make core tests layout consistent ( #6449 )
...
* move unit tests up
* move tests up one dir, make them unit
2023-11-29 18:58:44 +01:00
Silvano Cerza
831d0611d9
feat: Change default DuplicatePolicy in DocumentStore.write_documents() ( #6438 )
...
* Change default DuplicatePolicy in DocumentStore.write_documents()
* Add release notes
2023-11-28 12:30:17 +01:00
Massimiliano Pippi
00e1dd6eb8
chore: rearrange the core package, move tests and clean up ( #6427 )
...
* rearrange code
* fix tests
* relnote
* merge test modules
* remove extra
* rearrange draw tests
* forgot
* remove unused import
2023-11-28 09:58:56 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
Silvano Cerza
fd16ec63cb
refactor: Add support for new filters declaration ( #6397 )
...
* Rework filter logic for InMemoryDocumentStore to support new filters declaration
* Fix legacy filters tests
* Simplify logic and handle dates comparison
* Rework MetadataRouter to support new filters
* Update docstrings
* Add release notes
* Fix linting
* Avoid duplicating filters specifications
* Handle corner case
* Simplify docstring
* Fix filters logic and tests
* Fix Document Store testing legacy filters tests
2023-11-24 11:22:46 +01:00
SebastjanPrachovskij
28c2b09d90
Add SearchApi integration for websearch ( #6400 )
2023-11-24 11:18:43 +01:00
ZanSara
c45d8c39c7
fix: make ExtractiveReader handle situations where token_to_chars returns None instead of a (start, end) tuple ( #6382 )
...
* fix reader bug
* add test
* log
* fix logging
* improve error message
2023-11-24 09:08:56 +01:00
ZanSara
f3b73030a1
Fix wrong import in cohere.py and change model to model_name for consistency ( #6405 )
...
* Fix wrong import in `cohere.py`
* model -> model_name
* fix tests too
* black
* typo
* typo
2023-11-23 19:54:50 +01:00
ZanSara
4ec6a60a76
feat: CohereGenerator ( #6395 )
...
* added CohereGenerator with unit tests
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. added releasenote
2. removed commented files in test-cohere_generators
3. removed unused imports
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. move client creation to __init__
2. remove dict casting of metadata in run
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* few fixes
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* add cohere to git workflows
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. CohereGenerator as top level import in generators
2. small change in doc string
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. corrected git workflow files for cohere import
2. changed api key env var from CO_API_KEY to COHERE_API_KEY
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* added cohere in missed out workflow installs
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. Removed default_streaming_callback from cohere.py and added in test.
2. Added kwargs doc strings for CohereGenerator
3. removed type hints for metadata and replies
4. use COHERE_API_URL instead of hard coded URL.
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* move out of folder
* black
* fix tests
* feedback
* black
* remove api key from tests
* read api key from env var if missing
* typo
* black
* missing import
---------
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
Co-authored-by: sunilkumardash9 <sunilkumardash9@gmail.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-11-23 17:21:07 +01:00
jlonge4
c44e2cf49b
feat: add microsoft pptx file converter ( #6399 )
...
* Create pptx.py
* feat: pptx converter import __init__.py
* feat: add pptx import __init__.py
* feat: add python-pptx dependency
* feat: add sample pptx for testing
* feat: add pptx file-converter test
* feat: release note pptx-file-converter-3e494d2747637eb2.yaml
* feat: Update releasenotes/notes/pptx-file-converter-3e494d2747637eb2.yaml
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* feat: refactor haystack/nodes/file_converter/pptx.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* fix imports
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-23 16:46:41 +01:00
Stefano Fiorucci
b0b514778d
fix!: make PyPDFToDocument JSON-serializable ( #6396 )
...
* add registry
* release note
* add checks
* rm superfluous check
* fix typo
* rm print :-)
2023-11-23 15:37:20 +01:00
Ben Heckmann
a492771b4d
feat: PreProcessor split by token (tiktoken & Hugging Face) ( #5276 )
...
* #4983 implemented split by token for tiktoken tokenizer
* #4983 added unit test for tiktoken splitting
* #4983 implemented and added a test for splitting documents with HuggingFace tokenizer
* #4983 added support for passing HF model names (instead of objects) and added an example to the HF token splitting test
* mocked HTTP model loading in unit tests, fixed pylint error
* fix lossy tokenizers splitting, use LazyImport, ignore UnicodeEncodeError for tiktoken
* reno
* rename reno file
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-23 12:26:37 +01:00
Vladimir Blagojevic
e04a1f16bb
feat: Add DynamicPromptBuilder to Haystack 2.x ( #6328 )
...
* Add DynamicPromptBuilder
* Improve pydocs, add unit tests
* Add release note
* Make expected_runtime_variables optional
* Add pydocs usage example
* Add more pydocs
* Remove test markers
* Update type in unit test
* Update after canals upgrade
* add to api ref
* docstrings updates
* Update test/preview/components/builders/test_dynamic_prompt_builder.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/components/builders/dynamic_prompt_builder.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Deparametrize init test
* Rename expected_runtime_variables to runtime_variables
* Rephrase docstring so meaning is clearer
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-23 11:41:57 +01:00
Vladimir Blagojevic
e57a593d2e
fix: Revert back to straightforward PromptBuilder ( #6335 )
...
* Revert back to simple PromptBuilder
* Updating to full typing
2023-11-23 11:34:06 +01:00
Vladimir Blagojevic
cfff0d5212
Rename file_converters to converters ( #6390 )
2023-11-23 10:28:40 +01:00
Vladimir Blagojevic
b557f3035e
feat: Add ConditionalRouter Haystack 2.x component ( #6147 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-23 10:28:08 +01:00
Stefano Fiorucci
e91f7a8a4d
refactor!: improve the public interface of Generators ( #6374 )
...
* merge lazy import blocks
* refactor generators
* release note
* revert unrelated changes
2023-11-22 10:40:48 +01:00
ZanSara
b751978d65
Extends input types of RemoteWhisperTranscriber ( #6218 )
...
* fix tests
* reno
* tests
* retain file name
* paths are strings for openai sdk
* streams->sources
* feedback
* always add name to file
* mypy
* test placeholder with extension
* fallback
* paths
* path test
* path must be a string
* fix test
2023-11-22 09:57:45 +01:00
Ashwin Mathur
e6c8374562
feat: Add ByteStream metadata and other metadata to Documents created by HTMLToDocument ( #6304 )
...
* Refactor HTMLToDocument
* Add release notes
* Add additional tests
* remove progress bar
* Add additional test for metadata
* remove progress bar from release notes
* Update tests
* Use truthiness checks instead of is not None
2023-11-21 21:44:02 +01:00