haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-23 08:52:16 +00:00

Author	SHA1	Message	Date
Silvano Cerza	7287657f0e	refactor: Rename `Document`'s `text` field to `content` (#6181) * Rework Document serialisation Make Document backward compatible Fix InMemoryDocumentStore filters Fix InMemoryDocumentStore.bm25_retrieval Add release notes Fix pylint failures Enhance Document kwargs handling and docstrings Rename Document's text field to content Fix e2e tests Fix SimilarityRanker tests Fix typo in release notes Rename Document's metadata field to meta (#6183) * fix bugs * make linters happy * fix * more fix * match regex --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-10-31 12:44:04 +01:00
Silvano Cerza	ec376c7dbd	Remove id_hash_keys from DocumentCleaner (#6123)	2023-10-20 15:16:06 +02:00
Julian Risch	9f3b6512be	refactor: Remove reimplementations of default `from_dict`/`to_dict` and corresponding tests in 2.0 (#6108) * whisper transcriber * remove from/to_dict from builders * remove from/to_dict from embedders * remove from/to_dict from fetcher, file_converters * remove from/to_dict from generators, preprocessors * remove from/to_dict from ranker, reader * remove from/to_dict from router, sampler, websearch * pylint * reno * refactor import * remove unused import	2023-10-19 11:17:02 +02:00
Julian Risch	90ddeba579	fix: DocumentSplitter and DocumentCleaner copy `id_hash_keys` to newly created Documents (#6083) * copy id_hash_keys in splitter and cleaner * reno	2023-10-17 11:03:48 +02:00
Julian Risch	aaee03aee8	feat: Add DocumentCleaner 2.0 (#5976) * remove whitespaces, substrings, regex, empty lines * remove repeated substrings * reno * return empty string as shortest common ngram * address first half of review feedback * address second half of review feedback * mention \f page separator for header/footer removal * mention \f page separator for header/footer removal * mark example usage as python code	2023-10-13 12:39:55 +02:00

5 Commits