Silvano Cerza
|
7287657f0e
|
refactor: Rename Document's text field to content (#6181)
* Rework Document serialisation
Make Document backward compatible
Fix InMemoryDocumentStore filters
Fix InMemoryDocumentStore.bm25_retrieval
Add release notes
Fix pylint failures
Enhance Document kwargs handling and docstrings
Rename Document's text field to content
Fix e2e tests
Fix SimilarityRanker tests
Fix typo in release notes
Rename Document's metadata field to meta (#6183)
* fix bugs
* make linters happy
* fix
* more fix
* match regex
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
|
2023-10-31 12:44:04 +01:00 |
|
Silvano Cerza
|
ec376c7dbd
|
Remove id_hash_keys from DocumentCleaner (#6123)
|
2023-10-20 15:16:06 +02:00 |
|
Julian Risch
|
9f3b6512be
|
refactor: Remove reimplementations of default from_dict/to_dict and corresponding tests in 2.0 (#6108)
* whisper transcriber
* remove from/to_dict from builders
* remove from/to_dict from embedders
* remove from/to_dict from fetcher, file_converters
* remove from/to_dict from generators, preprocessors
* remove from/to_dict from ranker, reader
* remove from/to_dict from router, sampler, websearch
* pylint
* reno
* refactor import
* remove unused import
|
2023-10-19 11:17:02 +02:00 |
|
Julian Risch
|
90ddeba579
|
fix: DocumentSplitter and DocumentCleaner copy id_hash_keys to newly created Documents (#6083)
* copy id_hash_keys in splitter and cleaner
* reno
|
2023-10-17 11:03:48 +02:00 |
|
Julian Risch
|
aaee03aee8
|
feat: Add DocumentCleaner 2.0 (#5976)
* remove whitespaces, substrings, regex, empty lines
* remove repeated substrings
* reno
* return empty string as shortest common ngram
* address first half of review feedback
* address second half of review feedback
* mention \f page separator for header/footer removal
* mention \f page separator for header/footer removal
* mark example usage as python code
|
2023-10-13 12:39:55 +02:00 |
|