David S. Batista
3f77d3ab6c
!feat: unify NLTKDocumentSplitter and DocumentSplitter ( #8617 )
...
* wip: initial import
* wip: refactoring
* wip: refactoring tests
* wip: refactoring tests
* making all NLTKSplitter related tests work
* refactoring
* docstrings
* refactoring and removing NLTKDocumentSplitter
* fixing tests for custom sentence tokenizer
* fixing tests for custom sentence tokenizer
* cleaning up
* adding release notes
* reverting some changes
* cleaning up tests
* fixing serialisation and adding tests
* cleaning up
* wip
* renaming and cleaning
* adding NLTK files
* updating docstring
* adding import to init
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* updating tests
* wip
* adding sentence/period change warning
* fixing LICENSE header
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-12 14:22:27 +00:00
David S. Batista
2282c26f17
feat!: SentenceWindowRetriever returns List[Document] with docs ordered by split_idx_start ( #8590 )
...
* initial import
* adding a few pylint disable
* adding tests
* fixing integration tests
* adding release notes
* fixing types and docstrings
2024-12-04 16:55:56 +01:00
Alper
a556e11bf1
fix: window_size set during run instead of construction ( #8463 )
...
* window_size set during runtime
* revert init and update run with window_size
* improved doc, removed print
* adding release notes
* updating tests
* reverting docstring example
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/components/retrievers/sentence_window_retriever.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-10-22 14:01:26 +00:00
Ajit Singh
6cf13e8b98
enhancement: reduced usage of numpy and substituted built-in libraries ( #8418 )
...
* reduced usage of numpy and substituted built-in libraries
* added release note
* edited expit function to support both float and list (this case was causing a CI error)
* revert code, numpy can't be removed here
* more cleaning
* fix relnote
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-10-18 15:42:19 +02:00
Stefano Fiorucci
842a7b80a8
rm sentence_window_retrieval ( #8303 )
2024-08-28 10:51:07 +02:00
David S. Batista
2f3257b77a
chore: removing deprecated SentenceWindowRetrieval ( #8294 )
...
* removing deprecated SentenceWindowRetrieval
* adding release notes
* Rename TestSentenceWindowRetrieval to TestSentenceWindowRetriever
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-08-28 10:04:52 +02:00
David S. Batista
b411c14414
feat: The SentenceWindowRetriever has now an extra output key containing all the documents belonging to the context window ( #8283 )
...
* initial import
* adding release notes
* linting
* improving docs and release notes
* updating example
2024-08-27 10:30:12 +02:00
Stefano Fiorucci
bcc4104729
refactor: utility function for docstore deserialization ( #8226 )
...
* refactor docstore deserialization
* more tests
* reno; headers
* expose key
2024-08-14 13:29:27 +02:00
Amna Mubashar
373de97426
Deprecate SentenceWindowRetrieval ( #8206 )
2024-08-13 13:49:41 +02:00
Amna Mubashar
e0de423ee0
Rename SentenceWindowRetrieval to SentenceWindowRetriever
2024-07-26 17:46:44 +02:00
Sebastian Husch Lee
baed478f23
fix: Fix split_start_idx and _split_overlap information in DocumentSplitter ( #8046 )
...
* Fix bug in DocumentSplitter and expand tests to catch said bug
* Fix split overlap information calc and actually test it
* Add release notes
* Remove comments
* Same fix in SentenceWindowRetrieval
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-24 15:15:36 +02:00
David S. Batista
431aa4a406
updating sentence window retriever tests ( #8034 )
...
* updating sentence window retriever tests
* fix
2024-07-16 22:10:55 +02:00
David S. Batista
ebfeb571d7
feat: add sentence window retrieval ( #7997 )
...
* initial import
* adding tests
* adding license and release notes
* adding missing release notes
* working with any type of doc store
* nit
* adding get_class_object to serialization package
* nit
* refactoring get_class_object()
* refactoring get_class_object()
* changing type and var names
* more refactoring
* Update haystack/core/serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Update haystack/core/serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Update test/core/test_serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* more refactoring
* more refactoring
* Pydoc syntax
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-10 13:13:46 +00:00
Vladimir Blagojevic
678f193f10
feat: Add filter_policy init parameter to in memory retrievers ( #7795 )
...
* Add filter_policy init parameter to in-memory retrievers
2024-06-04 17:51:16 +02:00
Silvano Cerza
854c4173f2
feat: Add memory sharing between different instances of InMemoryDocumentStore ( #7781 )
...
* Add memory sharing between different instances of InMemoryDocumentStore
* Fix FilterRetriever tests
* Fix InMemoryBM25Retriever tests
2024-05-31 16:44:14 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules ( #7675 )
...
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Bijay Gurung
74683fe74d
Feat: Add FilterRetriever ( #6836 )
...
* Add FilterRetriever draft
* Implement FilterRetriever and add tests
* Update comparison to compare whole docs instead of just contents
* Expose FilterRetriever at the retrievers level
* Update docstring (add example usage)
* Add filter_retriever in the API reference docs config
Update retriever search path to start one dir level higher
* simplify _documents_equal
* improve usage example
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-02-08 08:48:46 +01:00
ZanSara
1182c08daf
fix: Dont filter negative scores when using BM25Okapi and scale_score=False ( #6889 )
...
* dont filter negatives for unscaled Okapi
* change BM25 algorithm default to BM25L
* Update haystack/document_stores/in_memory/document_store.py
* improve comment
2024-02-06 11:07:27 +01:00
Madeesh Kannan
a5189dd035
fix!: InMemoryBM25Retriever no longer returns documents that have a score of 0.0 ( #6717 )
...
* fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0
Also update tests to accommodate the new behavior.
* Remove superfluous code
2024-01-12 17:50:55 +01:00
Massimiliano Pippi
e1ec4e5e4d
refact!: Remove symbols under the haystack.document_stores namespace ( #6714 )
...
* remove symbols under the haystack.document_stores namespace
* Update haystack/document_stores/types/protocol.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* fix
* same for retrievers
* leftovers
* more leftovers
* add relnote
* leftovers
* one more
* fix examples
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-01-10 21:20:42 +01:00
Stefano Fiorucci
4912f7cb58
refactor!: improve the deserialization logic for components that use a Document Store ( #6466 )
...
* improve deserialization
* rm ds decorator
* improve tests
* fix pylint
* rm decorator from module init
* rm decorator
* rm decorator from factory
* fix tests
* release note
* rm print
2023-12-04 15:17:28 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker ( #6450 )
2023-11-29 19:24:25 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00