9 Commits

Author SHA1 Message Date
ZanSara
e888852aec
Standardize TextFileToDocument (#6232)
* simplify textfiletodocument

* fix error handling and tests

* stray print

* reno

* streams->sources

* reno

* feedback

* test

* fix tests
2023-11-17 15:39:39 +01:00
Silvano Cerza
7287657f0e
refactor: Rename Document's text field to content (#6181)
* Rework Document serialisation

Make Document backward compatible

Fix InMemoryDocumentStore filters

Fix InMemoryDocumentStore.bm25_retrieval

Add release notes

Fix pylint failures

Enhance Document kwargs handling and docstrings

Rename Document's text field to content

Fix e2e tests

Fix SimilarityRanker tests

Fix typo in release notes

Rename Document's metadata field to meta (#6183)

* fix bugs

* make linters happy

* fix

* more fix

* match regex

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-10-31 12:44:04 +01:00
Julian Risch
9f3b6512be
refactor: Remove reimplementations of default from_dict/to_dict and corresponding tests in 2.0 (#6108)
* whisper transcriber

* remove from/to_dict from builders

* remove from/to_dict from embedders

* remove from/to_dict from fetcher, file_converters

* remove from/to_dict from generators, preprocessors

* remove from/to_dict from ranker, reader

* remove from/to_dict from router, sampler, websearch

* pylint

* reno

* refactor import

* remove unused import
2023-10-19 11:17:02 +02:00
Christian Clauss
bf6d306d68
ci: Simplify Python code with ruff rules SIM (#5833)
* ci: Simplify Python code with ruff rules SIM

* Revert #5828

* ruff --select=I --fix haystack/modeling/infer.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-20 08:32:44 +02:00
ZanSara
6e70d403f8
feat: Improve Document for Haystack 2.0 (#5738)
* initial draft

* tests

* add proposal

* proposal number

* reno

* fix tests and usage of content and content_type

* update branch & fix more tests

* mypy

* add docstring

* fix more tests

* review feedback

* improve __str__

* Apply suggestions from code review

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/preview/dataclasses/document.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* improve __str__

* fix tests

* fix more tests

* Update haystack/preview/document_stores/memory/document_store.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-11 17:40:00 +02:00
ZanSara
b1daa7c647
chore: migrate to canals==0.7.0 (#5647)
* add default_to_dict and default_from_dict placeholders to ease migration to canals 0.7.0

* canals==0.7.0

* whisper components

* add to_dict/from_dict stubs

* import serialization methods in init to hide canals imports

* reno

* export deserializationerror too

* Update haystack/preview/__init__.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* serialization methods for LocalWhisperTranscriber (#5648)

* chore: serialization methods for `FileExtensionClassifier` (#5651)

* serialization methods for FileExtensionClassifier

* Update test_file_classifier.py

* chore: serialization methods for `SentenceTransformersDocumentEmbedder` (#5652)

* serialization methods for SentenceTransformersDocumentEmbedder

* fix device management

* serialization methods for SentenceTransformersTextEmbedder (#5653)

* serialization methods for TextFileToDocument (#5654)

* chore: serialization methods for `RemoteWhisperTranscriber` (#5650)

* serialization methods for RemoteWhisperTranscriber

* remove patches

* Add default to_dict and from_dict in document stores built with factory (#5674)

* fix tests (#5671)

* chore: simplify serialization methods for `MemoryDocumentStore` (#5667)

* simplify serialization for MemoryDocumentStore

* remove redundant tests

* pylint

* chore: serialization methods for `MemoryRetriever` (#5663)

* serialization method for MemoryRetriever

* more tests

* remove hash from default_document_store_to_dict

* remove diff in factory.py

* chore: serialization methods for `DocumentWriter` (#5661)

* serialization methods for DocumentWriter

* more tests

* use factory

* black

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-29 18:15:07 +02:00
Silvano Cerza
66f615a3a4
Remove BaseTestComponent (#5613)
* Remove BaseTestComponent

* Add release notes
2023-08-23 17:03:37 +02:00
ZanSara
5ca4874df9
Migrate existing v2 components to Canals 0.4.0 (#5532)
* pin canals==0.4.0

* update audio components

* allow audio components to receive whisper_params in init too

* migrating memoryretriever

* migrate memoryretriever

* migrate TextFileToDocument

* fix TextFileToDocument tests

* fix pipeline tests

* fix defaults management

* reno

* inverted assignments

* Simplify release notes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-09 15:51:32 +02:00
bogdankostic
a51ca19fe4
feat: Add TextFileToDocument component (v2) (#5467)
* Add TextfileToDocument component

* Add docstrings

* Add unit tests

* Add release note file

* Make use of progress bar

* Add TextfileToDocument to __init__.py

* Use lazy % formatting in logging functions

* Remove f from non-f-string

* Add TextfileToDocument to __init__.py

* Use correct dependency extra

* Compare file path against path object

* PR feedback

* PR feedback

* Update haystack/preview/components/file_converters/txt.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update docstrings

* Add error handling

* Add unit test

* Reintroduce falsely removed caplog

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-01 11:34:52 +02:00