19 Commits

Author SHA1 Message Date
tstadel
7625829684
fix: EvaluationResult serialization changes dataframes (#4906)
* fix nan and index values

* add test

* make test for None values after evalresult read explicit
2023-05-16 16:03:09 +02:00
Sebastian
eff420cce0
test: Update unit tests for schema (#4835)
* Updated text_label tests to match tabel_label tests. Also added answer text as part of the Answer.__eq__ comparison.

* Updated text document unit tests to match ones from table docs

* Converting text answer unit tests to match table answer

* Update some document tests

* Minor update

* Separating unit tests
2023-05-10 16:16:45 +02:00
Sebastian
a67ca289db
refactor: Update schema objects to handle Dataframes in to_{dict,json} and from_{dict,json} (#4747)
* Adding support for table Documents when serializing Labels in Haystack

* Fix table label equality test

* Add serialization support and __eq__ support for table answers

* Made convenience functions for converting dataframes. Added some TODOs. Epxanded schema tests for table labels. Updated Multilabel to not convert Dataframes into strings.

* get Answer and Label to_json working with DataFrame

* Fix from_dict method of Label

* Use Dict and remove unneccessary if check

* Using pydantic instead of builtins for type detection

* Update haystack/schema.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/schema.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/schema.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Separated table label equivalency tests and added pytest.mark.unit


* Added unit test for _dict_factory

* Using more descriptive variable names

* Adding json files to test to_json and from_json functions

* Added sample files for tests

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-03 09:42:07 +02:00
Sebastian
8d9136bad4
feat: Implementation of Table Cell Proposal (#4616)
* Starting adding support for TableCell

* Update tests to use row and col

* Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables.

* Update eval test to use TableCell

* Added more schema tests for table docs, labels and answers.

* Add boolean to toggle between Span and TableCell

* Add deprecation message

* Test that table answers work as responses in the rest API

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-04-19 13:14:49 +02:00
Silvano Cerza
5ac3dffbef
test: Rework conftest (#4614)
* Split root conftest into multiple ones and remove unused fixtures

* Remove some constants and make them fixtures

* Remove unnecessary fixture scoping

* Fix failing whisper tests

* Fix image_file_paths fixture
2023-04-11 10:33:43 +02:00
Silvano Cerza
b59cf76093
refactor: Remove AnswerToSpeech and DocumentToSpeech nodes (#4391)
* Remove AnswerToSpeech and DocumentToSpeech nodes

* Remove unused dataclasses

* Remove unnecessary dependencies

* Remove unused error class and imports
2023-03-15 19:31:13 +01:00
Daniel Bichuetti
af6efbdcb0
refactor: Allow flexible document id generation (#4326) 2023-03-07 07:25:27 +01:00
bogdankostic
18e7b8399b
refactor: Remove id_hash_keys parameter in from_dict method (#4207)
* Remove id_hash_keys parameter in from_dict method

* Remove unused import

* Adapt `from_dict` of `SpeechDocument`

* Revert "Adapt `from_dict` of `SpeechDocument`"

This reverts commit 309cbeb7fbb3094c43be76d9e431db9391913144.

* Adapt `from_dict` of `SpeechDocument`
2023-02-20 17:37:35 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
ZanSara
94f660c56f
feat: store id_hash_keys in Document objects to make documents clonable (#3697)
* store id_hash_keys in Document objects

* fix id_hash_keys calls throughout codebase

* generate schema

* fix es

* fix weaviate

* backward compatible

* openapi schema

* remove unused deprecation warning

* remove unused imports

* openapi

* unused var

* Apply suggestions from code review

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/schema.py

* Apply suggestions from code review

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/schema.py

* review feedback

* trailing spaces

* pylint

* add deprecation test

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-01-23 15:00:52 +01:00
Stefano Fiorucci
e1401f79b6
refactor: improve Multilabel design (#3658)
* first try and new test

* fix test

* fix unused import

* remove comments

* no more dataclass

* add __eq__ and extend test

* better design from review

* Update schema.py

* fix black

* fix openapi

* fix openapi 2

* new try to fix openapi

* remove newline from openapi json
2022-12-13 10:45:56 +01:00
Julian Risch
8052632b64
test: add test to check id_hash_keys is not ignored (#3577) 2022-11-17 09:25:02 +01:00
Stefano Fiorucci
54ec13eaf7
refactor: Change no_answer attribute (#3411)
* always run validation

* update schemas

* no_answer as a property. break things!

* forgotten schema

* fix

* update openapi

* removed my unnecessary test

* fix sql document store

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-10-25 13:07:00 +02:00
Sara Zan
cbf44413d8
feat: add __cointains__ to Span (#3446)
* add __contains__

* add tests
2022-10-21 13:58:17 +02:00
Massimiliano Pippi
8fbccbda82
fix: handle Documents containing dataframes in Multilabel constructor (#3237)
* format

* fix docs
2022-09-19 14:59:20 +02:00
camille
f363b152ff
bug: make MultiLabel ids consistent across python interpreters (#2998)
* use hashlib.md5() instead of (interpreter dependent) hash() funtion to generate MultiLabel id

* add tests to assess constancy of MultiLabel.id

* make test_multilabel_id test ensure that MultiLabel ids are always the same
2022-08-10 09:43:21 +02:00
Stefano Fiorucci
09707b576a
Make MultiLabel preserve order (#2956)
* try simple approach

* added test

* add requested test
2022-08-09 15:53:24 +02:00
Sara Zan
584e046642
AnswerToSpeech (#2584)
* Add new audio answer primitives

* Add AnswerToSpeech

* Add dependency group

* Update Documentation & Code Style

* Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives

* Add tests

* Update Documentation & Code Style

* Add ability to compress audio and more tests

* Add audio group to test, all and all-gpu

* fix pylint

* Update Documentation & Code Style

* Accidental git tag

* Try pleasing mypy

* Update Documentation & Code Style

* fix pylint

* Add warning for missing OS library and support in CI

* Try fixing mypy

* Update Documentation & Code Style

* Add docs, simplify args for audio nodes and add tutorials

* Fix mypy

* Fix run_batch

* Feedback on tutorials

* fix mypy and pylint

* Fix mypy again

* Fix mypy yet again

* Fix the ci

* Fix dicts merge and install ffmpeg on CI

* Make the audio nodes import safe

* Trying to increase tolerance in audio test

* Fix import paths

* fix linter

* Update Documentation & Code Style

* Add audio libs in unit tests

* Update _text_to_speech.py

* Update answer_to_speech.py

* Use dedicated dataset & update telemetry

* Remove  and use distilled roberta

* Revert special primitives so that the nodes run in indexing

* Improve tutorials and fix smaller bugs

* Update Documentation & Code Style

* Fix serialization issue

* Update Documentation & Code Style

* Improve tutorial

* Update Documentation & Code Style

* Update _text_to_speech.py

* Minor lg updates

* Minor lg updates to tutorial

* Making indexing work in tutorials

* Update Documentation & Code Style

* Improve docstrings

* Try to use GPU when available

* Update Documentation & Code Style

* Fixi mypy and pylint

* Try to pass the device correctly

* Update Documentation & Code Style

* Use type of device

* use .cpu()

* Improve .ipynb

* update apt index to be able to download libsndfile1

* Fix SpeechDocument.from_dict()

* Change pip URL

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-06-15 10:13:18 +02:00
Sara Zan
ff4303c51b
[CI refactoring] Categorize tests into folders (#2554)
* Categorize tests into folders

* Fix linux_ci.yml and an import

* Wrong path
2022-05-17 09:55:53 +01:00