Silvano Cerza
4d67b552e1
Fix Pipeline skipping a Component with Variadic input ( #8347 )
...
* Fix Pipeline skipping a Component with Variadic input
* Simplify _find_components_that_will_receive_no_input
2024-09-10 14:59:53 +02:00
Ulises M
145ca89a3f
feat: Expose default_headers and add kwargs for Azure Client ( #8244 )
...
* default_headers and azure_kwargs added
* update docstrings
* don't forget about chat generator
* Remove azure_kwargs argument
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-09-10 10:29:56 +00:00
jpatra72
b126c14e51
feat: Adds support for zero-shot document classification ( #7669 ) ( #8193 )
...
* feat: adds support for zero-shot document classification (#7669 )
Also, supports multi-label classification
* pytests for zero shot document classification
* release note
* added licence info to py scripts
* updated the format of licence info
* Added doc string and example code
* added review points highlighted in the PR
* Applied suggestions from doc string review
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* fixed pytest for init
* added output type
* added test for pipeline (de-) serialization
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2024-09-10 11:00:05 +02:00
ArzelaAscoIi
720e54970f
fix: make from dict conditional router more resilient ( #8343 )
...
* fix: make from dict conditional router more resilient
* refactor: remove
* docs: add release notes
* fix: format
2024-09-09 15:11:52 +02:00
Mo Sriha
75955922b9
feat: Add current date in UTC to PromptBuilder ( #8233 )
...
* initial commit
* add unit tests
* add release notes
* update function name
2024-09-09 09:47:03 +02:00
Sebastian Husch Lee
06dd5c2f37
feat (v2): Update so `model_max_length` updates `max_seq_length` for Sentence Transformers ( #8334 )
...
* Update so model_max_length does what is expected
* Add release notes
* Some fixes
* Another test
2024-09-06 11:37:56 +02:00
Sriniketh J
e98a6fea04
Convertor: CSVToDocument ( #8328 )
...
* carry forwarded initial commit
* fix: doc strings
* fix: update docstrings
* fix: docstring update
* fix: csv encoding in actions
* fix: line endings through hooks
* fix: converter docs addition
2024-09-06 10:59:12 +02:00
David S. Batista
1f3cb68d9f
fix: `meta` prefix missing in the sentence window retriever filters ( #8309 )
...
* initial import
* listing supported doc stores in docstring
* adding release notes
2024-09-03 10:57:11 +02:00
Vladimir Blagojevic
b2c19a8c7a
feat: `ChatPromptBuilder` copies entire `ChatMessage` rather than copying content field only ( #8317 )
...
* Initial implementation of ChatMessage copy and deepcopy
* Add reno release note
* Satisfy hawkeye
* Remove copy and deepcopy, no need to complicate things
* Add new reno note
* Add unit test
2024-09-02 18:06:38 +02:00
Silvano Cerza
3e3f79b928
feat: Add `unsafe` init arg in `ConditionalRouter` and `OutputAdapter` to enable previous behaviour ( #8176 )
...
* Add unsafe behaviour to OutputAdapter
* Add unsafe behaviour to ConditionalRouter
* Add release notes
* Fix mypy
* Add documentation links
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-09-02 14:14:54 +00:00
Alper
e614fa0c62
refactor: Rename deserialize_document_store_in_init_parameters ( #8302 )
...
* 8259
* update function name
* rename and update docstring
* fix linting
* add a release note
2024-09-02 11:42:23 +02:00
Alper
7dbc51a3e7
doc: warning added for deprecation of `gpt-3.5` as default model for OpenAI generators ( #8300 )
...
* warning added for gpt3.5 usage
* Revert "warning added for gpt3.5 usage"
This reverts commit 035a0ab9eaa9306171439fe128a78b7898ffe486.
* update openaigenerator and openaichatgenerator with warnings
* if cond removed
* update description
* adding release notes
* linting
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-08-29 09:31:59 +02:00
Julian Risch
51180e060e
chore: Remove emojis from release notes config ( #8305 )
2024-08-28 16:14:06 +02:00
David S. Batista
2f3257b77a
chore: removing deprecated `SentenceWindowRetrieval` ( #8294 )
...
* removing deprecated SentenceWindowRetrieval
* adding release notes
* Rename TestSentenceWindowRetrieval to TestSentenceWindowRetriever
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-08-28 10:04:52 +02:00
Stefano Fiorucci
6b0ee4c193
chore: update test dependency and `LazyImport` block to make compatibility with `sentence-transformers>=3.0.0` explicit ( #8295 )
...
* sentence-transformers-3 update test dep and lazyimport block
* clearer release note
2024-08-27 15:51:03 +00:00
Madeesh Kannan
f0b45c873f
feat: Extend core component machinery to support an async run method (experimental) ( #8279 )
...
* feat: Extend core component machinery to support an async run method
* Add reno
* Fix incorrect docstring
* Make `async_run` a coroutine
* Make `supports_async` a dunder field
2024-08-27 14:20:13 +02:00
Madeesh Kannan
1fa30d4aaa
chore: Remove deprecated `debug` param from `Pipeline.run` ( #8288 )
...
* chore: Remove deprecated `debug` param from `Pipeline.run`
* Fix tests
2024-08-27 11:27:38 +02:00
David S. Batista
b411c14414
feat: The SentenceWindowRetriever has now an extra output key containing all the documents belonging to the context window ( #8283 )
...
* initial import
* adding release notes
* linting
* improving docs and release notes
* updating example
2024-08-27 10:30:12 +02:00
Stefano Fiorucci
2e619f06c8
fix: make meta produced by `DOCXToDocument` JSON serializable ( #8263 )
...
* make meta from DOCXToDocument JSON serializable
* unused import
* update docstrings
2024-08-22 12:24:32 +00:00
Stefano Fiorucci
aca8f09f7d
fix: `DOCXToDocument` converter - use forward reference to `Paragraph` ( #8260 )
...
* docx paragraph forward ref
* fix
2024-08-21 12:37:43 +02:00
Jon Strutz
471f07c8fe
fix: extract page breaks from .docx files ( #8232 )
...
* fix: extract page breaks from .docx files
Context: Currently, DOCXToDocument does not extract page breaks from
word documents. This makes it impossible to do things like split by page
or get correct page number metadata after using something like
DocumentSplitter. For example, if you split by word, the 'page_number'
metadata field will be 1 for all documents.
Solution: Added a method to DOCXToDocument that extracts page breaks
from word documents as '\f' characters so that they are recognized by
DocumentSplitter.
Caveat: Due to the way the python-docx library is set up, you can only
accurately determine the location of the first page break for a given
paragraph. In the rare case that a paragraph contains more than one page
break (which means it is an extremely long paragraph spanning multiple
pages), the 2nd, 3rd, etc. page break locations are not known. To sort
of fix this, I just appended the page break characters to the end of
the paragraph text to keep the overall page number values for the
document consistent.
* Apply suggestions from code review
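
The page-break handling described in this commit message can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the PR: the `Paragraph` tuple, `insert_page_breaks`, and `page_number_at` are hypothetical names, and the sketch assumes the caller already knows the offset of the first page break and the total number of breaks in each paragraph.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for a parsed paragraph:
# (text, offset of the first page break or None, total page breaks in the paragraph)
Paragraph = Tuple[str, Optional[int], int]


def insert_page_breaks(paragraph: Paragraph) -> str:
    """Return the paragraph text with page breaks encoded as '\\f' characters."""
    text, first_break, total_breaks = paragraph
    if total_breaks == 0 or first_break is None:
        return text
    # The first break can be placed exactly where it occurs.
    text = text[:first_break] + "\f" + text[first_break:]
    # Positions of later breaks are unknown, so append them at the end
    # to keep the overall page count of the document consistent.
    return text + "\f" * (total_breaks - 1)


def page_number_at(text: str, index: int) -> int:
    """1-based page number of the character position `index`."""
    return text[:index].count("\f") + 1


paragraphs = [
    ("First page.", None, 0),
    ("Ends page one. Starts page two.", 15, 1),
    ("A very long paragraph.", 7, 3),
]
document = "\n".join(insert_page_breaks(p) for p in paragraphs)
```

Counting `'\f'` characters before a position is how a downstream splitter could recover a page number; appending the extra breaks keeps that count correct for everything after the long paragraph, at the cost of slightly misplaced boundaries inside it.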
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-08-21 09:48:02 +00:00
Sebastian Husch Lee
7fd0b6a013
feat: Add `min_top_k` to TopPSampler ( #8228 )
...
* Add feature to Top P Sampler
* Add release notes
* Fix zip call
* Fix mypy
* Restore doc string and make mypy happy hopefully
* Make mypy happy
* PR comment
* Revert change to make mypy happy
* Add back type ignore
* try to fix typing
* Update haystack/components/samplers/top_p.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/samplers/top_p.py
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-08-21 11:29:23 +02:00
Madeesh Kannan
cf5fd2a821
chore: Remove deprecated `ChatMessage.to_openai_format` ( #8242 )
...
* chore: Remove deprecated `ChatMessage.to_openai_format`
* lint
2024-08-16 10:34:44 +02:00
Stefano Fiorucci
bcc4104729
refactor: utility function for docstore deserialization ( #8226 )
...
* refactor docstore deserialization
* more tests
* reno; headers
* expose key
2024-08-14 13:29:27 +02:00
Stefano Fiorucci
109e98aa44
fix: deserialize Document Stores using specific `from_dict` class methods ( #8207 )
...
* use from_dict
* unused import
* improve logic
* improve reno
2024-08-14 10:56:32 +02:00
Amna Mubashar
373de97426
Deprecate SentenceWindowRetrieval ( #8206 )
2024-08-13 13:49:41 +02:00
Vladimir Blagojevic
21c507331c
feat: Implement apply_filter_policy and FilterPolicy.MERGE for the new filters ( #8042 )
2024-08-09 12:04:24 +02:00
Nicola Procopio
4c798470b2
added `precision` parameter to sentence transformers embeddings ( #8179 )
...
* added `precision` parameter to sentence transformers embeddings
* fixed test
* Update haystack/components/embedders/sentence_transformers_document_embedder.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update test/components/embedders/test_sentence_transformers_text_embedder.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update test/components/embedders/test_sentence_transformers_text_embedder.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* fix format
* Update sentence_transformers_text_embedder.py
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-09 11:38:47 +02:00
Marie-Luise Klaus
ec02817f14
fix: OutputAdapter from_dict with custom_filters None ( #8173 )
...
Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>
2024-08-08 14:02:40 +02:00
Stefano Fiorucci
a4eb88e7ea
rm serialize callback handler ( #8172 )
2024-08-08 11:54:31 +02:00
Corentin Meyer
58517014ec
fix: DocumentCleaner: keep the \f in text ( #8078 )
...
* Keep the \f in Document Cleaner
* Add Reno
* Add Test
* Simplified _remove_empty_lines() code
2024-08-07 14:50:14 +02:00
Marie-Luise Klaus
031b0bfbd8
fix: ChatPromptBuilder from_dict if template is None ( #8165 )
...
* fix ChatPromptBuilder from dict if template=None
* fix ChatPromptBuilder from dict if template=None
* leave template None
---------
Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>
2024-08-06 14:48:04 +02:00
Tim Wellbrock
2e2f5f17bb
feat: add unicode normalization & ascii_only mode for DocumentCleaner ( #8103 )
...
* feat: add unicode normalization & ascii_only mode for DocumentCleaner.
* feat: add unicode_normalization parameter validation to DocumentCleaner.
* test: fix the unit test to work after code linting.
2024-08-05 13:00:39 +02:00
Stefano Fiorucci
e17d0c4192
chore: deprecate `to_openai_format` and create similar utility functions ( #8146 )
...
* deprecate and add new specific functions
* reno
2024-08-02 16:47:17 +02:00
Tobias Wochinger
5a3ea75196
docs: document Python 3.11 and 3.12 support ( #8159 )
...
* docs: add Python 3.11 and 3.12 to supported versions
* docs: add release notes
2024-08-02 14:46:20 +02:00
Sebastian Husch Lee
c90495c2e8
feat: Add model and tokenizer kwargs to `TransformersSimilarityRanker`, `SentenceTransformersDocumentEmbedder`, `SentenceTransformersTextEmbedder` ( #8145 )
...
* Start adding model and tokenizer kwargs support
* Add model and tokenizer kwargs to doc embedder
* Some updates and fixes in tests
* Fix more tests
* Fix tests
* Add release note
* Fix test
* Add from_dict tests
2024-08-02 10:37:10 +02:00
Vladimir Blagojevic
25d3520f5a
feat: Add `AnswerJoiner` new component ( #8122 )
...
* Initial AnswerJoiner
* Initial tests
* Add release note
* Resolve mypy warning
* Add custom join function
* Serialize custom join function
* Handle all Answer types, add integration test, improve pydoc
* Make fixes
* Add to API docs
* Add more tests
* Update haystack/components/joiners/answer_joiner.py
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
* Update docstrings and release notes
* update docstrings
---------
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Darja Fokina <daria.fokina@deepset.ai>
2024-08-01 12:51:17 +02:00
Silvano Cerza
c7e29a83c1
fix: Fix infinite loop when running Pipeline ( #8123 )
...
* Fix infinite loop when running Pipeline
* Simplify if
2024-07-30 15:00:12 +02:00
Corentin Meyer
1c53aae8f0
fix: Tika converter not yielding page break tags (`\f`) ( #8082 )
...
* Fix TikaConverter not having \f page tag by using HTML mode of parsing and then parsing the HTML to text using the old Haystack 1.X integration as template.
* Add Reno
* Fix test by making Mock Tika return XML (before parsing)
* refinements and test
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-07-26 20:13:47 +02:00
Amna Mubashar
e0de423ee0
Rename SentenceWindowRetrieval to SentenceWindowRetriever
2024-07-26 17:46:44 +02:00
Silvano Cerza
3fed1366c4
fix: Fix issue that could lead to RCE if using unsecure Jinja templates ( #8095 )
...
* Fix issue that could lead to RCE if using unsecure Jinja templates
* Add comment explaining exception suppression
* Update release note
* Update release note
2024-07-26 14:02:09 +00:00
Nicola Procopio
47f4db8698
added truncate_dim to sentence transformers embedder ( #8077 )
...
* added truncate_dim to sentence transformers embedder
* Update haystack/components/embedders/sentence_transformers_document_embedder.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update releasenotes/notes/release-note-2b603a123cd36214.yaml
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* fixed parameter description
* added test for truncation to text embedder
* fix format
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-26 10:39:48 +02:00
Madeesh Kannan
b2aef217da
chore: Remove deprecated `DynamicPromptBuilder` and `DynamicChatPromptBuilder` components ( #8085 )
2024-07-26 10:00:59 +02:00
Madeesh Kannan
f9e4d5dc58
chore: Deprecate the `debug` parameter in `Pipeline.run` ( #8075 )
2024-07-25 09:58:57 +00:00
Amna Mubashar
b374c528b2
Assign streaming_callback to OpenAIGenerator and OpenAIChatGenerator in run() method ( #8054 )
...
* Add optional parameter for streaming_callback in run() method
2024-07-24 15:49:19 +02:00
Sebastian Husch Lee
baed478f23
fix: Fix `split_start_idx` and `_split_overlap` information in `DocumentSplitter` ( #8046 )
...
* Fix bug in DocumentSplitter and expand tests to catch said bug
* Fix split overlap information calc and actually test it
* Add release notes
* Remove comments
* Same fix in SentenceWindowRetrieval
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-24 15:15:36 +02:00
Stefano Fiorucci
b36ec0a38c
fix release note ( #8070 )
2024-07-24 15:03:01 +02:00
Tobias Wochinger
38d38678c7
fix: fix PPTX import ( #8069 )
...
* fix: fix PPTX import
* docs: add release notes
2024-07-24 14:50:47 +02:00
Madeesh Kannan
4650263bc3
chore: Remove deprecated init parameters from `HTMLToDocument` ( #8056 )
...
* chore: Remove deprecated init parameters from `HTMLToDocument`
* Fix reno
2024-07-24 13:16:47 +02:00
David S. Batista
0c9dc008f0
fix: improve context relevancy metric ( #7964 )
...
* fixing tests
* fixing tests
* updating tests
* updating tests
* updating docstring
* adding release notes
* making the insufficient information more robust
* updating docstring and release notes
* empty list instead of informative string
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* fixing tests
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* reverting commit
* reverting again commit
* fixing docstrings
* removing deprecation warning
* removing warning import
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-22 15:13:46 +02:00