3803 Commits

Author SHA1 Message Date
James Briggs
520b23ec1b
fix: pinecone metadata format (#3660)
* fix for multilevel metadata dictionaries

* add metadata dict formatting to update function

* typing

* added check for labels meta

* added more info to input parameters

* added test for multilayer metadata

* removed todo
2022-12-13 10:11:24 +01:00
github-actions[bot]
5405d9d7f8
Update unstable version and openapi schema (#3700)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-12-13 09:59:52 +01:00
Sara Zan
eba518a589
add trailing newlines to make end-of-file-fixer happy (#3699)
2022-12-12 14:42:25 +01:00
tstadel
600dc2d611
refactor: filters type (#3682)
* consolidate filters type

* remove unnecessary optionals

* fix mypy

* fix pylint

* fix pylint

* move FilterType to schema

* remove Optional from FilterType

* move to Dict[str, Any]

* Revert "move to Dict[str, Any]"

This reverts commit e8c561bb7885949e19825697fa4c469945f90ce5.

* fix mypy

* fix pylint

* revert isort changes in elasticsearch

* remove todos in milvus.py

* remove todos in sql.py

* add aggregate_labels tests

* consolidate aggregate_labels tests

* remove superfluous type todos

* remove ALL superfluous #todos
2022-12-12 14:04:29 +01:00
Sara Zan
8e3c7bc6be
fix: pin espnet in the audio extra (#3693)
* downgrade pytorch in the audio extra

* pin torch

* remove torch pin and pin espnet

* add comment
2022-12-12 13:09:26 +01:00
Sara Zan
b1fc912859
refactor: remove test extra (#3679)
* remove test extra, make dev install all

* remove all from dev

* reduce diff
2022-12-12 11:22:03 +01:00
Sara Zan
642fa3a6b7
fix typing (#3680)
2022-12-12 11:20:48 +01:00
Vladimir Blagojevic
c28f6688f5
proposal: New EmbeddingRetriever for Haystack 2.0 (#3558)
* Add EmbeddingRetriever proposal

* Update with Sara's feedback

* Consistent naming
2022-12-12 10:06:35 +01:00
Unai Garay Maestre
77cea8b140
feat: Adds all_terms_must_match parameter to BM25Retriever at runtime (#3627)
* Adds all_terms_must_match implementation and tests

* Adds all_terms_must_match as Optional

Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com>

* Avoid mypy error and follow pattern checking var is None

* Mypy works ok on this file now

* added mypy ignores to BaseRetriever

* ignoring all overrides for this file

* Updates sparse retriever `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retriever `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retriever `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retrieve_batch `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retrieve_batch `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retrieve_batch `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* marked elasticsearch

Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-08 17:18:43 +05:30
tstadel
c1c1c97bb2
feat: add query_by_embedding_batch (#3546)
* add query_by_embedding_batch

* fix mypy

* fix pylint

* add test

* move query_by_embedding_batch to search_engine

* fix and add tests

* fix pylint

* remove Retriever query logs

* add test for multimodal batch retrieval

* allow for np.ndarray
2022-12-08 08:28:43 +01:00
Sebastian
25bf95d47f
Update table reader tests to include checking the score of answers. (#3641)
2022-12-07 07:30:49 -08:00
Stefano Fiorucci
399d8f1668
monkey patch sklearn (#3678)
2022-12-07 10:31:32 +01:00
Vladimir Blagojevic
18444427da
Use from tqdm.auto import tqdm instead of from tqdm import tqdm (#3672)
2022-12-06 22:53:41 +01:00
Sara Zan
0c71849e4a
remove beir from all-gpu (#3669)
2022-12-06 14:56:27 +01:00
Sara Zan
fc89f6ea74
fix: revert Weaviate query with filters and improve tests (#3646)
* revert weaviate query with filters and improve tests

* pylint

* upgrade weaviate container

* use latest docker tag

* fix text

* fix text
2022-12-06 14:48:58 +01:00
Vladimir Blagojevic
e4c3817d01
Adjust get_type() method for pipelines (#3657)
2022-12-02 14:48:47 +01:00
Julian Risch
adb580b6b7
feat: add offsets_in_context to evaluation result (#3640)
* add offsets_in_context to eval result

* extend test case
2022-11-30 11:43:42 +01:00
Massimiliano Pippi
af06519fc4
re-enable hooks (#3629)
2022-11-29 09:00:45 +01:00
Sebastian
c7c2235874
Move all of the forward pass to under torch.no_grad() (#3636)
2022-11-29 08:59:49 +01:00
Massimiliano Pippi
b20f808119
refactor: move more tests to the base class (#3637)
* move more tests to the base class

* skip tests where unsupported

* do not pass index label explicitly

* skip test for Pinecone
2022-11-29 08:43:27 +01:00
Ivan Lopez
839eef6695
fix rest_api paths in docker-compose-gpu.yml (#3532)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-29 07:47:14 +01:00
Mayank Jobanputra
95cf666a20
refactor: change MultiModal retriever to be of type DenseRetriever (#3598)
* changed Multimodal retriever to be of type DenseRetriever

* format fix

* Pylint fix

* Added embed_queries and tests
2022-11-28 19:24:22 +01:00
Massimiliano Pippi
6f9a0f2215
use 9200 as the default port in launch_opensearch (#3630)
2022-11-28 19:06:45 +05:30
Sara Zan
eb7b9452d0
refactor: Weaviate query with filters (#3628)
2022-11-28 12:26:33 +01:00
Branden Chan
4a83b2049d
docs: Reformat code blocks in docstrings (#3580)
* Fix docstrings for DocumentStores

* Fix docstrings for AnswerGenerator

* Fix docstrings for Connector

* Fix docstrings for DocumentClassifier

* Fix docstrings for LabelGenerator

* Fix docstrings for QueryClassifier

* Fix docstrings for Ranker

* Fix docstrings for Retriever and Summarizer

* Fix docstrings for Translator

* Fix docstrings for Pipelines

* Fix docstrings for Primitives

* Fix Python code block spacing

* Add line break before code block

* Fix code blocks

* fix: discard metadata fields if not set in Weaviate (#3578)

* fix weaviate bug in returning embeddings and setting empty meta fields

* review comment

* Update unstable version and openapi schema (#3584)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix: Flatten `DocumentClassifier` output in `SQLDocumentStore`; remove `_sql_session_rollback` hack in tests (#3273)

* first draft

* fix

* fix

* move test to test_sql

* test: add test to check id_hash_keys is not ignored (#3577)

* refactor: Generate JSON schema when missing (#3533)

* removed unused script

* print info logs when generating openapi schema

* create json schema only when needed

* fix tests

* Remove leftover

Co-authored-by: ZanSara <sarazanzo94@gmail.com>

* move milvus tests to their own module (#3596)

* feat: store metadata using JSON in SQLDocumentStore (#3547)

* add warnings

* make the field cachable

* review comment

* Pin faiss-cpu as 1.7.3 seems to have problems (#3603)

* Update Haystack imports (#3599)

* Update Python version (#3602)

* fix: `ParsrConverter` fails on pages without text (#3605)

* try to fix bug

* remove print

* leftover

* refactor: update Squad data (#3513)

* refactor the to_squad data class

* fix the validation label

* refactor the to_squad data class

* fix the validation label

* add the test for the to_label object function

* fix the tests for to_label_objects

* move all the test related to squad data to one file

* remove unused imports

* revert tiny_augmented.json

Co-authored-by: ZanSara <sarazanzo94@gmail.com>

* Url fixes (#3592)

* add 2 example scripts

* fixing faq script

* fixing some urls

* removing example scripts

* black reformatting

* add labeler to the repo (#3609)

* convert eval metrics to python float (#3612)

* feat: add support for `BM25Retriever` in `InMemoryDocumentStore` (#3561)

* very first draft

* implement query and query_batch

* add more bm25 parameters

* add rank_bm25 dependency

* fix mypy

* remove tokenizer callable parameter

* remove unused import

* only json serializable attributes

* try to fix: pylint too-many-public-methods / R0904

* bm25 attribute always present

* convert errors into warnings to make the tutorial 1 work

* add docstrings; tests

* try to make tests run

* better docstrings; revert not running tests

* some suggestions from review

* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25

* exclude tests with filters

* change elasticsearch to bm25 retriever in test_summarizer

* add tests

* try to improve tests

* better type hint

* adapt test_table_text_retriever_embedding

* handle non-textual docs

* query only textual documents

* Incorporate Reviewer feedback

* refactor: replace `torch.no_grad` with `torch.inference_mode` (where possible) (#3601)

* try to replace torch.no_grad

* revert erroneous change

* revert other module breaking

* revert training/base

* Fix docstrings for DocumentStores

* Fix docstrings for AnswerGenerator

* Fix docstrings for Connector

* Fix docstrings for DocumentClassifier

* Fix docstrings for LabelGenerator

* Fix docstrings for QueryClassifier

* Fix docstrings for Ranker

* Fix docstrings for Retriever and Summarizer

* Fix docstrings for Translator

* Fix docstrings for Pipelines

* Fix docstrings for Primitives

* Fix Python code block spacing

* Add line break before code block

* Fix code blocks

* Incorporate Reviewer feedback

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Espoir Murhabazi <espoir.mur@gmail.com>
Co-authored-by: Tuana Celik <tuana.celik@deepset.ai>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-11-28 09:21:07 +01:00
Massimiliano Pippi
c6890c3e86
chore: remove redundant tests (#3620)
* remove redundant tests

* skip test on win

* fix missing import

* revert mistake

* revert
2022-11-25 20:55:21 +05:30
Tuana Celik
ed7d03665d
fixing the url for document merger (#3615)
2022-11-25 14:40:55 +01:00
Massimiliano Pippi
ddeaf2c98c
clean up colab dependencies (#3626)
2022-11-24 18:37:57 +01:00
Tuana Celik
0771cf1cce
Update CONTRIBUTING.md (#3624)
2022-11-24 13:59:49 +00:00
Massimiliano Pippi
a15af7f8c3
refactor: Move InMemoryDocumentStore tests to their own class (#3614)
* move tests to their own class

* move more tests

* add specific job

* fix test

* Update test/document_stores/test_memory.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
tstadel
0e05f71f33
fix return type typing (#3617)
2022-11-23 10:23:20 +01:00
Stefano Fiorucci
f43bc562d3
refactor: replace torch.no_grad with torch.inference_mode (where possible) (#3601)
* try to replace torch.no_grad

* revert erroneous change

* revert other module breaking

* revert training/base
2022-11-23 09:26:11 +01:00
Stefano Fiorucci
3040e59c63
feat: add support for BM25Retriever in InMemoryDocumentStore (#3561)
* very first draft

* implement query and query_batch

* add more bm25 parameters

* add rank_bm25 dependency

* fix mypy

* remove tokenizer callable parameter

* remove unused import

* only json serializable attributes

* try to fix: pylint too-many-public-methods / R0904

* bm25 attribute always present

* convert errors into warnings to make the tutorial 1 work

* add docstrings; tests

* try to make tests run

* better docstrings; revert not running tests

* some suggestions from review

* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25

* exclude tests with filters

* change elasticsearch to bm25 retriever in test_summarizer

* add tests

* try to improve tests

* better type hint

* adapt test_table_text_retriever_embedding

* handle non-textual docs

* query only textual documents
2022-11-22 09:24:52 +01:00
tstadel
0d45cbce56
convert eval metrics to python float (#3612)
2022-11-22 09:05:10 +01:00
Massimiliano Pippi
2fadcf2859
add labeler to the repo (#3609)
2022-11-21 20:49:25 +05:30
Tuana Celik
78ec528e26
Url fixes (#3592)
* add 2 example scripts

* fixing faq script

* fixing some urls

* removing example scripts

* black reformatting
2022-11-21 11:26:16 +01:00
Espoir Murhabazi
d114a994f1
refactor: update Squad data (#3513)
* refactor the to_squad data class

* fix the validation label

* refactor the to_squad data class

* fix the validation label

* add the test for the to_label object function

* fix the tests for to_label_objects

* move all the test related to squad data to one file

* remove unused imports

* revert tiny_augmented.json

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-21 11:06:14 +01:00
Stefano Fiorucci
5f62494105
fix: ParsrConverter fails on pages without text (#3605)
* try to fix bug

* remove print

* leftover
2022-11-21 10:54:40 +01:00
Massimiliano Pippi
7e0aa82eb8
Update Python version (#3602)
2022-11-21 10:16:47 +01:00
Branden Chan
f85ead431a
Update Haystack imports (#3599)
2022-11-21 10:15:57 +01:00
Massimiliano Pippi
c7e3483f62
Pin faiss-cpu as 1.7.3 seems to have problems (#3603)
2022-11-18 09:18:24 +01:00
Massimiliano Pippi
ea75e2aab5
feat: store metadata using JSON in SQLDocumentStore (#3547)
* add warnings

* make the field cachable

* review comment
2022-11-18 08:26:19 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module (#3596)
2022-11-17 16:22:02 +01:00
Massimiliano Pippi
6cd0e337d0
refactor: Generate JSON schema when missing (#3533)
* removed unused script

* print info logs when generating openapi schema

* create json schema only when needed

* fix tests

* Remove leftover

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-17 11:09:27 +01:00
Julian Risch
8052632b64
test: add test to check id_hash_keys is not ignored (#3577)
2022-11-17 09:25:02 +01:00
Stefano Fiorucci
dc26e6d43e
fix: Flatten DocumentClassifier output in SQLDocumentStore; remove _sql_session_rollback hack in tests (#3273)
* first draft

* fix

* fix

* move test to test_sql
2022-11-16 12:20:57 +01:00
github-actions[bot]
af78f8b431
Update unstable version and openapi schema (#3584)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-11-16 10:09:40 +01:00
Massimiliano Pippi
ba75d39029
fix: discard metadata fields if not set in Weaviate (#3578)
* fix weaviate bug in returning embeddings and setting empty meta fields

* review comment
2022-11-15 22:02:53 +01:00
tstadel
6ce2d296f4
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta (#3572)
* fix meta bug

* adjust brownfield test
2022-11-15 12:13:21 +01:00
Mayank Jobanputra
3098440a27
bug: fix release number (#3559)
* Added haystack version in docker base build

* test version -- name's bond
2022-11-15 16:31:10 +05:30