David S. Batista
3f77d3ab6c
!feat: unify NLTKDocumentSplitter and DocumentSplitter ( #8617 )
...
* wip: initial import
* wip: refactoring
* wip: refactoring tests
* wip: refactoring tests
* making all NLTKSplitter related tests work
* refactoring
* docstrings
* refactoring and removing NLTKDocumentSplitter
* fixing tests for custom sentence tokenizer
* fixing tests for custom sentence tokenizer
* cleaning up
* adding release notes
* reverting some changes
* cleaning up tests
* fixing serialisation and adding tests
* cleaning up
* wip
* renaming and cleaning
* adding NLTK files
* updating docstring
* adding import to init
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* updating tests
* wip
* adding sentence/period change warning
* fixing LICENSE header
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-12 14:22:27 +00:00
David S. Batista
6cceaac15f
docs: add deprecation warning nltk document splitter ( #8628 )
...
* adding deprecation warning
* adding release notes
* adding release notes
* updating message
* Update haystack/components/preprocessors/nltk_document_splitter.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-12 15:16:54 +01:00
Michele Pangrazzi
21d53d0ec6
update default value of 'store_full_path' to False in converters ( #8619 )
2024-12-10 16:03:38 +01:00
Anton Pelykh
6f983a22ca
fix: add missing stream mime type assignment to the LinkContentFetcher ( #8596 )
...
* add missing stream mime type assignment to the `LinkContentFetcher`
* fix release note fmt
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-09 14:51:14 +00:00
ArzelaAscoIi
ed2f37da60
fix: docstring for normalization ( #8604 )
...
* fix: docstring for normalization
* chore: add reno
* fixing docstrings and adding pylint disable too many args
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-12-06 17:13:30 +01:00
Michele Pangrazzi
b32f85cca2
remove deprecated 'converter' init parameter from PyPDFToDocument component ( #8609 )
2024-12-06 15:43:43 +01:00
David S. Batista
3da5bac8c4
refactor: converting some DocumentJoiner methods to staticmethod ( #8606 )
...
* converting some methods to static, since they don't change/depend on state of the object
* adding release notes
* removing tab
2024-12-06 10:28:41 +01:00
David S. Batista
2282c26f17
feat!: SentenceWindowRetriever returns List[Document] with docs ordered by split_idx_start ( #8590 )
...
* initial import
* adding a few pylint disable
* adding tests
* fixing integration tests
* adding release notes
* fixing types and docstrings
2024-12-04 16:55:56 +01:00
David S. Batista
c5ef0b2956
chore: adding a deprecation warning on the SentenceWindowRetriever ( #8597 )
...
* linting
* improving message
* fixing header
* adding deprecation in the release notes
2024-12-03 17:41:19 +01:00
Amna Mubashar
4c8eb54049
feat: Add store_full_path to converters (3/3) ( #8585 )
...
* Add store_full_path params
2024-12-03 13:48:56 +05:00
Stefano Fiorucci
de7099e560
ci: add job to check imports ( #8594 )
...
* try checking imports
* clarify error message
* better fmt
* do not show complete list of successfully imported packages
* refinements
* relnote
* add missing forward references
* better function name
* linting
* fix linting
* Update .github/utils/check_imports.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-11-29 14:00:59 +00:00
Stefano Fiorucci
c8685aa141
refactor: update components to access ChatMessage.text instead of content ( #8589 )
...
* introduce text property and deprecate content
* release note
* use chatmessage.text
* release note
* linting
2024-11-28 10:16:07 +00:00
Stefano Fiorucci
fb1baf4921
refactor: ChatMessage - introduce text property and deprecate content ( #8588 )
...
* introduce text property and deprecate content
* release note
* minor test refactoring
---------
Co-authored-by: Michele Pangrazzi <xmikex83@gmail.com>
2024-11-28 09:53:02 +00:00
Stefano Fiorucci
51c1390426
chore: use class methods to create ChatMessage ( #8581 )
...
* use class methods to build messages
* fix failing format
2024-11-28 09:35:24 +00:00
Stefano Fiorucci
fb42c035c5
feat: PyPDFToDocument - add new customization parameters ( #8574 )
...
* deprecate converter in pypdf
* fix linting of MetaFieldGroupingRanker
* linting
* pypdftodocument: add customization params
* fix mypy
* incorporate feedback
2024-11-26 16:37:59 +01:00
Stefano Fiorucci
2440a5ee17
chore: PyPDFToDocument - deprecate converter init parameter ( #8569 )
...
* deprecate converter in pypdf
* fix linting of MetaFieldGroupingRanker
* linting
2024-11-26 14:47:04 +01:00
Michele Pangrazzi
f0c3692cf2
Remove is_greedy deprecated argument from @component decorator ( #8580 )
...
* Remove 'is_greedy' deprecated argument from @component decorator
* Remove unused import
2024-11-26 10:44:50 +00:00
Vladimir Blagojevic
59f1e182db
feat: Add variable to specify inputs as optional to ConditionalRouter ( #8568 )
...
* Add optional_variables in ConditionalRouter
* Add reno note
* Add more unit test with various complex scenarios
* Add more unit tests
* Add pylint disable=too-many-positional-arguments
* PR feedback from @sjrl
2024-11-26 10:48:55 +01:00
Matt G
e3b73e048b
fix: bug on tracing where components are in a loop in a pipeline ( #8576 )
...
* Fix to tracing parent spans on loops
* Fix linting
* Add release notes
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-25 14:21:08 +01:00
Silvano Cerza
ab840351f8
Fix DocumentCleaner not preserving Document fields ( #8578 )
2024-11-25 13:08:59 +01:00
Amna Mubashar
9302d3d9f0
feat: Add store_full_path to converters (2/3) ( #8573 )
2024-11-25 15:22:19 +05:00
Sebastian Husch Lee
eace2a99e5
feat: Add Literal["*"] option to required_variables in ChatPromptBuilder and PromptBuilder ( #8572 )
...
* Add new option for required_variables in PromptBuilder and ChatPromptBuilder
* Add reno note
* Add tests
2024-11-22 16:27:50 +01:00
David S. Batista
b5a2fad642
feat: adding Maximal Marginal Relevance Ranker ( #8554 )
...
* initial import
* linting
* adding MMR tests
* adding release notes
* fixing tests
* adding linting ignore to cross-encoder ranker
* update docstring
* refactoring
* making strategy Optional instead of Literal
* wip: adding unit tests
* refactoring MMR algorithm
* refactoring tests
* cleaning up and updating tests
* adding empty line between license + code
* bug in tests
* using Enum for strategy and similarity metric
* adding more tests
* adding empty line between license + code
* removing run time params
* PR comments
* PR comments
* fixing
* fixing serialisation
* fixing serialisation tests
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* fixing tests
* PR comments
* PR comments
* PR comments
* PR comments
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-22 14:58:45 +00:00
Richard Hudson
a8eeb2024f
feat: Allow unverified OpenAPI calls ( #8562 )
...
* Feed through ssl_verify value to OpenAPI
* Add release note
* Update serialization methods
* Applied black formatting
2024-11-22 15:45:00 +01:00
Amna Mubashar
21906d0558
feat: Add store_full_path to converters (1/3) ( #8566 )
...
* Add store_full_path param to 3 converters
2024-11-22 13:55:08 +01:00
Ulises M
b1353f4f0f
fix: append runtime meta to ChatMessage's extracted meta in AnswerBuilder ( #8544 )
...
* append runtime meta to extracted meta
* add pylint ignore flag to .run()
* explicitly convert reply to string
2024-11-20 20:07:04 +01:00
Silvano Cerza
3ef8c081be
fix: OpenAIChatGenerator and OpenAIGenerator crashing when streaming with usage tracking ( #8558 )
...
* Fix OpenAIGenerator crashing with tracking usage with streaming enabled
* Fix OpenAIChatGenerator crashing with tracking usage with streaming enabled
* Add release notes
* Fix linting
2024-11-20 10:27:22 +01:00
Silvano Cerza
bd77120cf3
Fix DocumentSplitter not splitting by function ( #8549 )
...
* Fix DocumentSplitter not splitting by function
* Make the split_by mapping a constant
2024-11-18 11:54:30 +01:00
Ivo Bellin Salarin
c78545dfc0
feat(openai): be tolerant to exceptions ( #8526 )
...
* feat: be tolerant to exceptions
if ever an error is raised by the OpenAI API, don't fail the entire processing
* fix: missing import, string separator
* Enhance error handling
* Use batched from more_itertools for compatibility with older Python versions
* Fix batching and add test
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-15 10:52:44 +01:00
Sebastian Husch Lee
e45d3329a1
feat: Adding DALLE image generator ( #8448 )
...
* First pass at adding DALLE image generator
* Add missing header
* Fix tests
* Add tests
* Fix mypy
* Make mypy happy
* More unit tests
* Adding release notes
* Add a test for run
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Fix pylint
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-14 16:19:49 +01:00
Sriniketh J
a045c0eabb
feat: added split by line to DocumentSplitter ( #8525 )
...
* feat: added split by line to DocumentSplitter
* fix: pr review comments
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-11-14 16:09:01 +01:00
Sebastian Husch Lee
0c11c7b98e
fix: Bring in fix from custom nodes ( #8539 )
...
* Bring in fix from custom nodes
* Add to_dict function and test
* reno
* Fix pylint
2024-11-14 13:00:28 +01:00
Anes Benmerzoug
f5683bc8fa
fix: document joiner division by zero with distribution based rank fusion ( #8520 )
...
* Parametrize document joiner tests with empty lists
* Skip loop in _distribution_based_rank_fusion if document list is empty
* Parametrize test_empty_list with join_mode
* Prevent division by zero in _merge and _reciprocal_rank_fusion
* Add release notes
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-14 11:41:28 +00:00
David S. Batista
e5a80722c2
feat: adding metadata grouper component ( #8512 )
...
* initial import
* making tests more readable; adding docstring
* adding release notes
* adding LICENSE header
* Update test/components/rankers/test_metadata_grouper.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* refactoring
* fixing docstring
* fixing types
* test docstrings
* renaming test
* handling too-many-arguments
* linting
* Update haystack/components/rankers/metadata_grouper.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* changing name
* Update haystack/components/rankers/metadata_grouper.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/metadata_grouper.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* assigning value inside function for re-use
* improving docstring
* updating name to MetaFieldGroupingRanker
* adding to pydocs
* fixing imports
* adding output docstring
* Update haystack/components/rankers/meta_field_grouper_ranker.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update haystack/components/rankers/__init__.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update releasenotes/notes/add-metadata-grouper-21ec05fd4a307425.yaml
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update test/components/rankers/test_metadata_grouper.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* update docstring tests
* fixing imports
* rename modules for consistency
* fix pydocs
* simplification + more tests
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-12 16:01:53 +01:00
Sebastian Husch Lee
911f3523ab
feat: Increase logging transparency for empty Documents during conversion ( #8509 )
...
* Add log lines for PDF conversion and make skipping more explicit in DocumentSplitter
* Add logging statement for PDFMinerToDocument as well
* Add tests
* Remove unused line
* Remove unused line
* add reno
* Add in PDF file
* Update checks in PDF converters and add tests for document splitter
* Revert
* Remove line
* Fix comment
* Make mypy happy
* Make mypy happy
2024-11-04 09:26:57 +01:00
Bohan Qu
2595e68050
feat: Add TTFT support in OpenAI chat generator ( #8444 )
...
* feat: Add TTFT support in OpenAI generators
* pylint fixes
* correct disable
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-10-31 16:56:17 +01:00
Sebastian Husch Lee
294a67e426
feat: Adding StringJoiner ( #8357 )
...
* Adding StringJoiner
* Release notes
* Remove typing
* Remove unused import
* Try to fix header
* Fix one test
* Add to docs, move test to behavioral pipeline test
* Undo changes
* Fix test
* Update haystack/components/joiners/string_joiner.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update haystack/components/joiners/string_joiner.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Provide usage example
* Apply suggestions from code review
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-10-30 15:03:41 +00:00
Stefano Fiorucci
700684a31c
fix: HuggingFaceAPIGenerator - use forward references ( #8502 )
...
* hf API generator: forward references + refactor
* release note
2024-10-30 11:51:07 +01:00
Silvano Cerza
8a35e792b9
feat: Add route output type validation in ConditionalRouter ( #8500 )
2024-10-29 18:06:54 +01:00
Madeesh Kannan
33675b4caf
chore: Remove deprecated DefaultConverter for PyPDFToDocument ( #8501 )
...
* chore: Remove deprecated `DefaultConverter` for `PyPDFToDocument`
* Remove unused imports
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-10-29 16:42:48 +00:00
Bohan Qu
081b143aae
feat!: tracing with concurrency ( #8489 )
2024-10-29 17:39:41 +01:00
Vladimir Blagojevic
28161f7bb9
feat: DOCXToDocument: add table extraction ( #8457 )
...
* DOCXToDocument: add table extraction
* Add reno note
* mypy fixes
* add unit tests
* Add csv table support
* Update release note
* Add TableFormat enum
* Add table_format as str init param
* Update docx.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* PR feedback
* PR feedback
---------
Co-authored-by: medsriha <medsriha@gmail.com>
Co-authored-by: Mo Sriha <22803208+medsriha@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-10-29 16:20:27 +01:00
Silvano Cerza
8205724395
feat: Rework Pipeline.run() to better handle cycles ( #8431 )
...
* draft
* Enhance
* Almost works
* Simplify some parts and handle intermediate outputs
* Handle connections with default
* Handle cycles with multiple connections from two components
* Update distributed outputs at the correct time
* Remove Component inputs after it runs
* Add agent pipeline test case
* Fix infinite loop test
* Handle some corner cases with loops checking and inputs deletion
* Fix tests
* Add new behavioral test
* Remove unused code in behavioural test
* Fix behavioural test
* Fix max run check
* Simplify outputs distribution
* Simplify subgraph run check
* Remove unused _init_run_queue function
* Remove commented code
* Add some missing type hints
* Simplify cycles breaking
* Fix _distribute_output test
* Fix _find_components_that_will_receive_no_input test
* Fix validation test
* Fix tracer losing Component inputs
* Fix some linting issues
* Remove ignore pylint rule
* Rename method that break cycles and make it raise
* Add docstring to _run_subgraph
* Update Pipeline.run() docstring
* Update comment to clarify cycles execution
* Remove SelfLoop sample Component
* Add behavioural test for unsupported cycles
* Rename behavioural test to be more specific
* Add new behavioural test
* Add release notes
* Remove commented out code and random pass
* Use more efficient function to find cycles
* Simplify _break_supported_cycles_in_graph by using defaultdict
* Stop breaking edges as soon as we make the graph acyclic
* Fix docstring and add some more comments
* Fix _distribute_output docstring
* Fix _find_receivers_from docstring
* More detailed release notes
* Minimize calls to networkx.is_directed_acyclic_graph
* Add some more info on edges keys
* Adjust components_in_cycles comment
* Add new Pipeline behavioural test
* Enhance _find_components_that_will_receive_no_input to cover more cases
* Explain why run_queue is reset after running a subgraph cycle
* Rename _init_inputs_state to _normalize_input_data
* Better explain the subgraph output distribution
* Remove for else
* Fix some comments and docstrings
* Fix linting
* Add missing return type
* Fix typo
* Rename _normalize_input_data to _normalize_varidiac_input_data and add more documentation
* Remove unused import
---------
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2024-10-29 15:43:16 +01:00
tstadel
d430833f8f
feat: streaming_callback as run param from HF generators ( #8406 )
...
* feat: streaming_callback as run param from HF generators
* apply feedback
* add reno
* fix test
* fix test
* fix mypy
* fix excessive linting rule
2024-10-29 15:32:06 +01:00
Stefano Fiorucci
c7b898994e
build: unpin numpy + use Python 3.9 in CI ( #8492 )
...
* try unpinning numpy
* try python 3.9
* release note
2024-10-28 12:15:17 +01:00
Stefano Fiorucci
78292422f0
feat: allow passing meta in the run method of FileTypeRouter ( #8486 )
...
* initial refactoring
* progress
* refinements
* serde methods + tests
* release note
* comment
* make additional_mimetypes internal attribute
2024-10-24 16:21:15 +02:00
Madeesh Kannan
906177329b
fix: Enforce basic Python types restriction on serialized component data ( #8473 )
2024-10-22 17:08:36 +02:00
Alper
a556e11bf1
fix: window_size set during run instead of construction ( #8463 )
...
* window_size set during runtime
* revert init and update run with window_size
* improved doc, removed print
* adding release notes
* updating tests
* reverting docstring example
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-10-22 14:01:26 +00:00
Stefano Fiorucci
f6935d1456
ci: add pip to test dependencies ( #8475 )
...
* add pip to test dependencies
* trigger
* release note
* rm trigger
2024-10-22 08:35:30 +00:00
Stefano Fiorucci
322f63de6d
feat: Logging Tracer ( #8447 )
...
* logging tracer: first draft
* progress
* more tests
* license header
* avoid interference with other tests
* release note
* incorporate feedback from review
* Update haystack/tracing/logging_tracer.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-10-21 09:47:46 +02:00