David S. Batista
3f77d3ab6c
feat!: unify NLTKDocumentSplitter and DocumentSplitter (#8617)
* wip: initial import
* wip: refactoring
* wip: refactoring tests
* wip: refactoring tests
* making all NLTKSplitter related tests work
* refactoring
* docstrings
* refactoring and removing NLTKDocumentSplitter
* fixing tests for custom sentence tokenizer
* fixing tests for custom sentence tokenizer
* cleaning up
* adding release notes
* reverting some changes
* cleaning up tests
* fixing serialisation and adding tests
* cleaning up
* wip
* renaming and cleaning
* adding NLTK files
* updating docstring
* adding import to init
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* updating tests
* wip
* adding sentence/period change warning
* fixing LICENSE header
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-12 14:22:27 +00:00
David S. Batista
2282c26f17
feat!: SentenceWindowRetriever returns List[Document] with docs ordered by split_idx_start (#8590)
* initial import
* adding a few pylint disable
* adding tests
* fixing integration tests
* adding release notes
* fixing types and docstrings
2024-12-04 16:55:56 +01:00
Alper
a556e11bf1
fix: window_size set during run instead of construction (#8463)
* window_size set during runtime
* revert init and update run with window_size
* improved doc, removed print
* adding release notes
* updating tests
* reverting docstring example
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-10-22 14:01:26 +00:00
David S. Batista
b411c14414
feat: SentenceWindowRetriever now has an extra output key containing all the documents belonging to the context window (#8283)
* initial import
* adding release notes
* linting
* improving docs and release notes
* updating example
2024-08-27 10:30:12 +02:00
Stefano Fiorucci
bcc4104729
refactor: utility function for docstore deserialization (#8226)
* refactor docstore deserialization
* more tests
* reno; headers
* expose key
2024-08-14 13:29:27 +02:00
Amna Mubashar
e0de423ee0
Rename SentenceWindowRetrieval to SentenceWindowRetriever
2024-07-26 17:46:44 +02:00
Sebastian Husch Lee
baed478f23
fix: Fix split_start_idx and _split_overlap information in DocumentSplitter (#8046)
* Fix bug in DocumentSplitter and expand tests to catch said bug
* Fix split overlap information calc and actually test it
* Add release notes
* Remove comments
* Same fix in SentenceWindowRetrieval
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-24 15:15:36 +02:00
David S. Batista
431aa4a406
updating sentence window retriever tests (#8034)
* updating sentence window retriever tests
* fix
2024-07-16 22:10:55 +02:00
David S. Batista
ebfeb571d7
feat: add sentence window retrieval (#7997)
* initial import
* adding tests
* adding license and release notes
* adding missing release notes
* working with any type of doc store
* nit
* adding get_class_object to serialization package
* nit
* refactoring get_class_object()
* refactoring get_class_object()
* changing type and var names
* more refactoring
* Update haystack/core/serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Update haystack/core/serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Update test/core/test_serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* more refactoring
* more refactoring
* Pydoc syntax
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-10 13:13:46 +00:00