Vladimir Blagojevic
b2c19a8c7a
feat: ChatPromptBuilder copies entire ChatMessage rather than copying content field only (#8317)
* Initial implementation of ChatMessage copy and deepcopy
* Add reno release note
* Satisfy hawkeye
* Remove copy and deepcopy, no need to complicate things
* Add new reno note
* Add unit test
2024-09-02 18:06:38 +02:00
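Here is a minimal sketch of the difference this commit describes. It uses a hypothetical `Message` dataclass as a stand-in for Haystack's `ChatMessage`, and `str.format` as a stand-in for the Jinja2 rendering the builder performs. If you rebuild a message from its rendered content alone, fields such as `name` and `meta` are dropped. If you copy the whole message and swap only the content, they are kept.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional


# Hypothetical stand-in for a chat message; NOT Haystack's ChatMessage class.
@dataclass
class Message:
    content: str
    role: str
    name: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def render_content_only(template: Message, **variables: Any) -> Message:
    # Content-only approach: a fresh message from rendered text and role,
    # so the template's name and meta are lost.
    return Message(content=template.content.format(**variables), role=template.role)


def render_whole_message(template: Message, **variables: Any) -> Message:
    # Whole-message approach: copy every field and swap in the rendered content;
    # meta is copied as well so the template itself is never mutated.
    return replace(
        template,
        content=template.content.format(**variables),
        meta=dict(template.meta),
    )


template = Message(content="Hello {who}!", role="user", name="alice", meta={"lang": "en"})
old = render_content_only(template, who="Ada")
new = render_whole_message(template, who="Ada")
print(old.meta, new.meta)  # {} {'lang': 'en'}
```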
Haystack Bot
9c1ad8e8ea
Update unstable version to 2.6.0-rc0 (#8318)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-09-02 16:44:50 +02:00
Silvano Cerza
3e3f79b928
feat: Add unsafe init arg in ConditionalRouter and OutputAdapter to enable previous behaviour (#8176)
* Add unsafe behaviour to OutputAdapter
* Add unsafe behaviour to ConditionalRouter
* Add release notes
* Fix mypy
* Add documentation links
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
v2.6.0-rc0
2024-09-02 14:14:54 +00:00
Alper
e614fa0c62
refactor: Rename deserialize_document_store_in_init_parameters (#8302)
* 8259
* update function name
* rename and update docstring
* fix linting
* add a release note
2024-09-02 11:42:23 +02:00
Alper
7dbc51a3e7
doc: warning added for deprecation of gpt-3.5 as default model for OpenAI generators (#8300)
* warning added for gpt3.5 usage
* Revert "warning added for gpt3.5 usage"
This reverts commit 035a0ab9eaa9306171439fe128a78b7898ffe486.
* update openaigenerator and openaichatgenerator with warnings
* if cond removed
* update description
* adding release notes
* linting
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-08-29 09:31:59 +02:00
Julian Risch
51180e060e
chore: Remove emojis from release notes config (#8305)
2024-08-28 16:14:06 +02:00
Stefano Fiorucci
842a7b80a8
rm sentence_window_retrieval (#8303)
2024-08-28 10:51:07 +02:00
David S. Batista
2f3257b77a
chore: removing deprecated SentenceWindowRetrieval (#8294)
* removing deprecated SentenceWindowRetrieval
* adding release notes
* Rename TestSentenceWindowRetrieval to TestSentenceWindowRetriever
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-08-28 10:04:52 +02:00
Stefano Fiorucci
25d333bed3
update transformers (#8296)
2024-08-27 16:04:11 +00:00
Stefano Fiorucci
6b0ee4c193
chore: update test dependency and LazyImport block to make compatibility with sentence-transformers>=3.0.0 explicit (#8295)
* sentence-transformers-3 update test dep and lazyimport block
* clearer release note
2024-08-27 15:51:03 +00:00
Madeesh Kannan
f0b45c873f
feat: Extend core component machinery to support an async run method (experimental) (#8279)
* feat: Extend core component machinery to support an async run method
* Add reno
* Fix incorrect docstring
* Make `async_run` a coroutine
* Make `supports_async` a dunder field
2024-08-27 14:20:13 +02:00
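This commit adds an experimental coroutine entry point (`async_run`) alongside `run`. The sketch below is a hypothetical illustration of how a caller might dispatch to such a method when it exists and fall back to the blocking `run` otherwise. The `Doubler` and `SyncOnly` components and the `supports_async` and `run_component` helpers are invented for illustration. They are not Haystack's pipeline code, which, per the commit, records async support in a dunder field instead. The sketch assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import inspect
from typing import Any, Dict


class Doubler:
    # Hypothetical component exposing both a sync and a coroutine entry point.
    def run(self, value: int) -> Dict[str, Any]:
        return {"value": value * 2}

    async def async_run(self, value: int) -> Dict[str, Any]:
        await asyncio.sleep(0)  # stand-in for real async I/O
        return {"value": value * 2}


class SyncOnly:
    # Hypothetical component with only the blocking entry point.
    def run(self, value: int) -> Dict[str, Any]:
        return {"value": value + 1}


def supports_async(component: Any) -> bool:
    # Here "supports async" means: defines async_run as a coroutine function.
    return inspect.iscoroutinefunction(getattr(component, "async_run", None))


async def run_component(component: Any, **inputs: Any) -> Dict[str, Any]:
    if supports_async(component):
        return await component.async_run(**inputs)
    # Run the blocking run() in a worker thread so the event loop stays free.
    return await asyncio.to_thread(component.run, **inputs)


async def main() -> None:
    print(await run_component(Doubler(), value=3))   # {'value': 6}
    print(await run_component(SyncOnly(), value=3))  # {'value': 4}


asyncio.run(main())
```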
Madeesh Kannan
1fa30d4aaa
chore: Remove deprecated debug param from Pipeline.run (#8288)
* chore: Remove deprecated `debug` param from `Pipeline.run`
* Fix tests
2024-08-27 11:27:38 +02:00
David S. Batista
b411c14414
feat: The SentenceWindowRetriever now has an extra output key containing all the documents belonging to the context window (#8283)
* initial import
* adding release notes
* linting
* improving docs and release notes
* updating example
2024-08-27 10:30:12 +02:00
dependabot[bot]
83e9542a62
chore(deps): bump fossas/fossa-action from 1.3.3 to 1.4.0 (#8167)
Bumps [fossas/fossa-action](https://github.com/fossas/fossa-action) from 1.3.3 to 1.4.0.
- [Release notes](https://github.com/fossas/fossa-action/releases)
- [Commits](https://github.com/fossas/fossa-action/compare/v1.3.3...v1.4.0)
---
updated-dependencies:
- dependency-name: fossas/fossa-action
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-23 17:55:33 +02:00
David S. Batista
acfe28b5ed
docs: updating DocumentSplitter docstring, adding supported DocumentStores (#8270)
* initial import
* adding Chroma with limited support
* updating
* Update document_splitter.py
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update document_splitter.py
* linting
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-23 14:24:08 +00:00
Souf G
3163fbb835
fix discord link in README.md (#8274)
* fix discord link in README.md
* Update README.md
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-23 10:29:56 +02:00
Stefano Fiorucci
2e619f06c8
fix: make meta produced by DOCXToDocument JSON serializable (#8263)
* make meta from DOCXToDocument JSON serializable
* unused import
* update docstrings
2024-08-22 12:24:32 +00:00
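A common reason document metadata fails JSON serialization is `datetime` values, such as the created and modified timestamps in a .docx file's core properties. This sketch assumes that was the culprit here. It shows one minimal fix: convert those values to ISO 8601 strings. `make_json_safe` is a hypothetical helper, not the converter's actual code.

```python
import json
from datetime import datetime
from typing import Any, Dict


def make_json_safe(meta: Dict[str, Any]) -> Dict[str, Any]:
    # Replace datetime values (e.g. created/modified timestamps) with
    # ISO 8601 strings; every other value is passed through unchanged.
    return {k: v.isoformat() if isinstance(v, datetime) else v for k, v in meta.items()}


raw_meta = {"author": "Jane", "created": datetime(2024, 8, 22, 12, 0), "revision": 3}
safe = make_json_safe(raw_meta)
print(json.dumps(safe))  # {"author": "Jane", "created": "2024-08-22T12:00:00", "revision": 3}
```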
dependabot[bot]
0a1a64cb0c
build(deps): bump tj-actions/changed-files from 44 to 45 (#8269)
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44 to 45.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/changed-files/compare/v44...v45)
---
updated-dependencies:
- dependency-name: tj-actions/changed-files
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-21 17:00:17 +02:00
Stefano Fiorucci
aca8f09f7d
fix: DOCXToDocument converter - use forward reference to Paragraph (#8260)
* docx paragraph forward ref
* fix
2024-08-21 12:37:43 +02:00
Jon Strutz
471f07c8fe
fix: extract page breaks from .docx files (#8232)
* fix: extract page breaks from .docx files
Context: Currently, DOCXToDocument does not extract page breaks from
Word documents. This makes it impossible to do things like split by page
or get correct page number metadata after using something like
DocumentSplitter. For example, if you split by word, the 'page_number'
metadata field will be 1 for all documents.
Solution: Added a method to DOCXToDocument that extracts page breaks
from Word documents as '\f' characters so that they are recognized by
DocumentSplitter.
Caveat: Due to the way the python-docx library is set up, you can only
accurately determine the location of the first page break for a given
paragraph. In the rare case that a paragraph contains more than one page
break (which means it is an extremely long paragraph spanning multiple
pages), the 2nd, 3rd, etc. page break locations are not known. As a
partial workaround, I appended the extra page break characters to the end
of the paragraph text to keep the overall page number values for the
document consistent.
* Apply suggestions from code review
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-08-21 09:48:02 +00:00
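Since page breaks now arrive as '\f' characters, a downstream splitter can derive page numbers by counting form feeds. The sketch below assumes pages are numbered from 1, and that each chunk's page is 1 plus the number of form feeds in all preceding chunks. `page_numbers_for_chunks` is a hypothetical helper, not DocumentSplitter's implementation. The sketch also shows why appending surplus '\f' characters to the end of a paragraph keeps the page numbers of later paragraphs correct: for later chunks, only the number of preceding form feeds matters.

```python
from typing import List


def page_numbers_for_chunks(text: str, chunk_size: int) -> List[int]:
    # Hypothetical helper: split on spaces into chunks of `chunk_size` words and
    # give each chunk the page number 1 + (form feeds in all preceding chunks).
    words = text.split(" ")
    pages: List[int] = []
    page = 1
    for start in range(0, len(words), chunk_size):
        pages.append(page)
        chunk = " ".join(words[start:start + chunk_size])
        # Form feeds inside this chunk push every later chunk onto a later page.
        page += chunk.count("\f")
    return pages


# Two page breaks: one inside "b\fp2", one inside "d\fp3".
print(page_numbers_for_chunks("p1 a b\fp2 c d\fp3 e", chunk_size=2))  # [1, 1, 2]

# A paragraph whose second break location is unknown gets the extra '\f'
# appended at its end, so the following text still lands on page 3.
print(page_numbers_for_chunks("x\fy z\f next", chunk_size=2))  # [1, 3]
```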
Sebastian Husch Lee
7fd0b6a013
feat: Add min_top_k to TopPSampler (#8228)
* Add feature to Top P Sampler
* Add release notes
* Fix zip call
* Fix mypy
* Restore doc string and make mypy happy hopefully
* Make mypy happy
* PR comment
* Revert change to make mypy happy
* Add back type ignore
* try to fix typing
* Update haystack/components/samplers/top_p.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/samplers/top_p.py
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-08-21 11:29:23 +02:00
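Top-p (nucleus) sampling keeps the highest-scoring documents until their cumulative probability reaches `top_p`. A `min_top_k` floor guarantees that at least that many documents survive, even when one document dominates. The sketch below works on plain `(text, score)` tuples and applies a softmax over the raw scores. It is an illustrative assumption: TopPSampler's exact score handling and cut-off boundary may differ.

```python
import math
from typing import List, Tuple


def top_p_select(
    docs: List[Tuple[str, float]], top_p: float, min_top_k: int = 0
) -> List[Tuple[str, float]]:
    if not docs:
        return []
    # Softmax over raw scores (shifted by the max for numerical stability).
    max_score = max(score for _, score in docs)
    exps = [math.exp(score - max_score) for _, score in docs]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank documents from most to least probable.
    ranked = sorted(zip(docs, probs), key=lambda pair: pair[1], reverse=True)

    selected: List[Tuple[str, float]] = []
    cumulative = 0.0
    for doc, prob in ranked:
        # Stop only once the nucleus reaches top_p AND min_top_k docs are kept.
        if cumulative >= top_p and len(selected) >= min_top_k:
            break
        selected.append(doc)
        cumulative += prob
    return selected


docs = [("a", 3.0), ("b", 1.0), ("c", 0.5)]
print([text for text, _ in top_p_select(docs, top_p=0.5)])               # ['a']
print([text for text, _ in top_p_select(docs, top_p=0.5, min_top_k=2)])  # ['a', 'b']
```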
Daria Fokina
35b1215b00
clean up docstrings: WhisperTranscribers (#8235)
* clarify docstrings
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-16 11:28:42 +00:00
Daria Fokina
bbe18cfdaf
clean up docstrings: DocumentLanguageClassifier (#8215)
* doclangclass-strings
* simplify sentence
* simplify sentence 2
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-16 12:45:54 +02:00
Daria Fokina
4a058032e7
clean up docstrings: TransformersTextRouter (#8229)
* Update transformers_text_router.py
* article
* article 2
2024-08-16 12:44:39 +02:00
Daria Fokina
b51bb6e5a9
Update zero_shot_text_router.py (#8231)
2024-08-16 12:43:13 +02:00
Daria Fokina
b5d0bfa9df
Update cache_checker.py (#8237)
2024-08-16 12:22:09 +02:00
Madeesh Kannan
cf5fd2a821
chore: Remove deprecated ChatMessage.to_openai_format (#8242)
* chore: Remove deprecated `ChatMessage.to_openai_format`
* lint
2024-08-16 10:34:44 +02:00
Agnieszka Marzec
9427d7aee6
update docstrings (#8225)
2024-08-14 15:33:21 +02:00
Stefano Fiorucci
bcc4104729
refactor: utility function for docstore deserialization (#8226)
* refactor docstore deserialization
* more tests
* reno; headers
* expose key
2024-08-14 13:29:27 +02:00
Stefano Fiorucci
109e98aa44
fix: deserialize Document Stores using specific from_dict class methods (#8207)
* use from_dict
* unused import
* improve logic
* improve reno
2024-08-14 10:56:32 +02:00
Daria Fokina
5ac56ebdaf
clean up docstrings: ChatPromptBuilder (#8210)
* chatbuilder-docstrings
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-14 08:44:42 +02:00
Daria Fokina
f0816f67ba
Update metadata_router.py (#8217)
2024-08-14 08:40:18 +02:00
Daria Fokina
b5125c73d3
clean up docstrings: TextLanguageRouter (#8219)
* Update text_language_router.py
* small upd
2024-08-14 08:04:07 +02:00
David S. Batista
c4be8370c6
doc: adding some more hatch info to CONTRIBUTING.md (#8201)
* adding some more hatch info to CONTRIBUTING.md
* nit
2024-08-13 15:36:41 +02:00
Silvano Cerza
ab7eb25856
Add utility then step in feature testing to draw pipeline to file (#8209)
2024-08-13 14:49:13 +02:00
Vladimir Blagojevic
3318d894c0
Add sede_with_list_output_type_in_pipeline unit test (#8196)
2024-08-13 14:37:24 +02:00
Daria Fokina
1284ca285b
clean up docstrings: AzureOpenAIDocumentEmbedder & AzureOpenAITextEmbedder (#8182)
* clarify docstrings
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-13 12:17:47 +00:00
Daria Fokina
e343f8fbd5
clean up docstrings: HuggingFaceAPIDocumentEmbedder & HuggingFaceAPITextEmbedder (#8184)
* hf-clarify docstrings
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-13 12:06:54 +00:00
Daria Fokina
861c470ef6
clean up docstrings: OpenAIDocumentEmbedder (#8186)
* docstrings upd
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-13 12:05:46 +00:00
Daria Fokina
741dd07227
clean up docstrings: TextCleaner (#8202)
* update textcleaner strings
* Update haystack/components/preprocessors/text_cleaner.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
---------
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-13 12:02:58 +00:00
Amna Mubashar
373de97426
Deprecate SentenceWindowRetrieval (#8206)
2024-08-13 13:49:41 +02:00
Haystack Bot
565d802db9
Update unstable version to 2.5.0-rc0 (#8195)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-08-12 13:45:59 +02:00
Vladimir Blagojevic
21c507331c
feat: Implement apply_filter_policy and FilterPolicy.MERGE for the new filters (#8042)
v2.5.0-rc0
2024-08-09 12:04:24 +02:00
Nicola Procopio
4c798470b2
added precision parameter to sentence transformers embeddings (#8179)
* added `precision` parameter to sentence transformers embeddings
* fixed test
* Update haystack/components/embedders/sentence_transformers_document_embedder.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update test/components/embedders/test_sentence_transformers_text_embedder.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update test/components/embedders/test_sentence_transformers_text_embedder.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* fix format
* Update sentence_transformers_text_embedder.py
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-09 11:38:47 +02:00
Marie-Luise Klaus
ec02817f14
fix: OutputAdapter from_dict with custom_filters None (#8173)
Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>
2024-08-08 14:02:40 +02:00
Stefano Fiorucci
a4eb88e7ea
rm serialize callback handler (#8172)
2024-08-08 11:54:31 +02:00
Corentin Meyer
58517014ec
fix: DocumentCleaner: keep the \f in text (#8078)
* Keep the \f in Document Cleaner
* Add Reno
* Add Test
* Simplified _remove_empty_lines() code
2024-08-07 14:50:14 +02:00
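Here is one plausible way the form feed gets lost. Python's `str.strip()` treats '\f' as whitespace, so a line holding only a page break looks empty and gets removed. Whether the original bug had exactly this cause is an assumption. The sketch below uses hypothetical helper names. It contrasts that naive filter with one that splits on '\f' first, removes empty lines page by page, and rejoins the pages with '\f'.

```python
def remove_empty_lines_naive(text: str) -> str:
    # "\f".strip() == "", so a line holding only a page break is dropped too.
    return "\n".join(line for line in text.split("\n") if line.strip())


def remove_empty_lines_keep_page_breaks(text: str) -> str:
    # Clean each page separately, then rejoin pages with the form feed.
    pages = text.split("\f")
    cleaned = ["\n".join(line for line in page.split("\n") if line.strip()) for page in pages]
    return "\f".join(cleaned)


sample = "page one\n\f\npage two"
print(repr(remove_empty_lines_naive(sample)))             # 'page one\npage two'
print(repr(remove_empty_lines_keep_page_breaks(sample)))  # 'page one\x0cpage two'
```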
Marie-Luise Klaus
031b0bfbd8
fix: ChatPromptBuilder from_dict if template is None (#8165)
* fix ChatPromptBuilder from dict if template=None
* fix ChatPromptBuilder from dict if template=None
* leave template None
---------
Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>
2024-08-06 14:48:04 +02:00
Tim Wellbrock
2e2f5f17bb
feat: add unicode normalization & ascii_only mode for DocumentCleaner (#8103)
* feat: add unicode normalization & ascii_only mode for DocumentCleaner.
* feat: add unicode_normalization parameter validation to DocumentCleaner.
* test: fix the unit test to work after code linting.
2024-08-05 13:00:39 +02:00
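Here is a minimal sketch of what these two options can look like, using the standard library's `unicodedata` module. The parameter names mirror the commit (`unicode_normalization`, `ascii_only`), but the function itself is hypothetical and is not DocumentCleaner's actual code. In particular, it assumes one common approach: decompose with NFKD, then discard non-ASCII characters.

```python
import unicodedata
from typing import Optional

VALID_FORMS = ("NFC", "NFKC", "NFD", "NFKD")


def normalize_text(text: str, unicode_normalization: Optional[str] = None, ascii_only: bool = False) -> str:
    # Hypothetical helper; parameter names borrowed from the commit message.
    if unicode_normalization is not None:
        # Validate up front to give a clear error for unsupported forms.
        if unicode_normalization not in VALID_FORMS:
            raise ValueError(f"unicode_normalization must be one of {VALID_FORMS}")
        text = unicodedata.normalize(unicode_normalization, text)
    if ascii_only:
        # Decompose accented letters (e.g. "é" -> "e" + combining accent),
        # then drop every code point outside ASCII.
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return text


print(normalize_text("café", ascii_only=True))  # cafe
print(normalize_text("\ufb01le", "NFKC"))       # file (the "fi" ligature is expanded)
```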
Stefano Fiorucci
e17d0c4192
chore: deprecate to_openai_format and create similar utility functions (#8146)
* deprecate and add new specific functions
* reno
2024-08-02 16:47:17 +02:00