1724 Commits

Author SHA1 Message Date
Sebastian
756e0114e6
refactor: Remove duplicate code in TableReader (#3708)
* Refactor table reader to use util functions to reduce code duplication.

* Expanding the tests for the table reader

* Adding types

* Updating tests to work for RCIReader

* Fix bug in RCIReader. Saving the wrong queries list.

* Update _flatten_inputs to not change input variable

* Remove duplicate code
2022-12-21 14:33:19 +01:00
bogdankostic
12c264603e
fix: Fix number of concurrent requests in RequestLimiter (#3705) 2022-12-21 11:40:33 +01:00
Stefano Fiorucci
82ad408a74
refactor: remove unused code in TfidfRetriever (#3733) 2022-12-20 17:51:46 +01:00
Vladimir Blagojevic
9ebf164cfd
feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate (#3667)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-12-20 11:21:26 +01:00
Stefano Fiorucci
559f6e0569
better compatibility with different versions of sklearn (#3732) 2022-12-20 09:59:36 +01:00
Zoltan Fedor
e143f7cc36
Fixing broken BM25 support with Weaviate - fixes #3720 (#3723)
* Fixing broken BM25 support with Weaviate - fixes #3720

Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit.

Please see more under issue #3720.

* Fixing mypy issue - method signature wasn't matching the base class

* Mypy related test fix

Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` parame is `Optional`, but has NO default value, so it must be provided (as None) at runtime.
I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests.

* Adding a note regarding an upcomming fix in Weaviate v1.17.0

* Apply suggestions from code review

* revert

* [EMPTY] Re-trigger CI
2022-12-19 17:24:46 +01:00
Vladimir Blagojevic
56803e5465
feat: Enable text-embedding-ada-002 for EmbeddingRetriever (#3721)
* Enable text-embedding-ada-002 for EmbeddingRetriever

* Easier to understand code, more unit tests
2022-12-19 17:06:48 +01:00
Massimiliano Pippi
8edfd8978e
Update the proposals process (#3718)
* update the proposals process

* add stalebot to manage proposals lifecycle

* typo

* Update 0000-template.md

* clarify PR labelling staying away from implementation details
2022-12-19 14:35:07 +01:00
Sebastian
d7fabb569b
feat: Use torch.inference_mode() for TableQA (#3731)
* Update to make inference_mode work in TableQA

* Update variable names

* Added torch.inference_mode() for the RCIReader model forward passes
2022-12-19 13:07:07 +01:00
Stefano Fiorucci
5b9c661155
feat: add index parameter to TfidfRetriever (#3666)
* first draft to add index param to tfidf

* better mypy handling

* Revert "better mypy handling"

This reverts commit 91a22516320f9dcbeae53827ec69f9dc51e1785c.

* new check in auto_fit

* new check also in retrieve

* better dict typings

* new test and improvements to other test

* remove unnecessary lambda

* improve test

* remove newline from openapi json

* fix test

* language fix

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 2

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 3

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 4

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 5

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* language fix 6

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* explicit index value handling

* fix test

* better error messages

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-19 12:07:49 +01:00
Agnieszka Marzec
a1d8557c80
Update the readme action version (#3726)
* Update the readme action version

Updated the rdme action version to the latest one.

* Update the version
2022-12-19 10:23:05 +01:00
Zoltan Fedor
3990697869
Fixing the query_batch method of the deepsetcloud document store - … (#3724)
* Fixing the `query_batch` method of the deepsetcloud document store - fixes #3722

* Trigger Build

* Trigger Build

* Trigger CI

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-12-19 09:57:26 +01:00
Julian Risch
adde194b04
build: upgrade torch and let transformers pick the version (#3727)
* test torch 1.13.1 release

* let transformers handle torch version
2022-12-16 21:33:01 +05:30
Vladimir Blagojevic
42926596e4
Update cohere embedding models (#3704) 2022-12-16 16:49:59 +01:00
Sebastian
4afdbc33b2
fix: Removed overlooked torch scatter references (#3719)
* Removed torch scatter references

* Add back /
2022-12-16 10:36:19 +01:00
Vladimir Blagojevic
c69222faf4
Add PromptNode proposal (#3665) 2022-12-16 10:27:58 +01:00
Agnieszka Marzec
a23f425877
Fix lg (#3725) 2022-12-16 09:43:22 +01:00
Sebastian
54bf7ad343
Remove && \ from end of line (#3710) 2022-12-13 21:29:18 +05:30
Sebastian
d0f786af9f
feat: Bump transformers version to remove torch scatter dependency (#3703)
* Bump transformers version so we can remove torch scatter dependency

* manual re-merge

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2022-12-13 18:33:07 +05:30
Sara Zan
f24cbdbb5d
remove beir from the base GPU image (#3692) 2022-12-13 11:11:58 +01:00
Stefano Fiorucci
e1401f79b6
refactor: improve Multilabel design (#3658)
* first try and new test

* fix test

* fix unused import

* remove comments

* no more dataclass

* add __eq__ and extend test

* better design from review

* Update schema.py

* fix black

* fix openapi

* fix openapi 2

* new try to fix openapi

* remove newline from openapi json
2022-12-13 10:45:56 +01:00
James Briggs
520b23ec1b
fix: pinecone metadata format (#3660)
* fix for multilevel metadata dictionaries

* add metadata dict formating to update function

* typing

* added check for labels meta

* added more info to input parameters

* added test for multilayer metadata

* removed todo
2022-12-13 10:11:24 +01:00
github-actions[bot]
5405d9d7f8
Update unstable version and openapi schema (#3700)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-12-13 09:59:52 +01:00
Sara Zan
eba518a589
add trailing newlines to make end-of-file-fixer happy (#3699) 2022-12-12 14:42:25 +01:00
tstadel
600dc2d611
refactor: filters type (#3682)
* consolidate filters type

* remove unnecessary optionals

* fix mypy

* fix pylint

* fix pylint

* move FilterType to schema

* remove Optional from FilterType

* move to Dict[str, Any]

* Revert "move to Dict[str, Any]"

This reverts commit e8c561bb7885949e19825697fa4c469945f90ce5.

* fix mypy

* fix pylint

* revert isort changes in elasticsearch

* remove todos in milvus.py

* remove todos in sql.py

* add aggregate_labels tests

* consolidate aggregate_labels tests

* remove superfluous type todos

* remove ALL superfluous #todos
2022-12-12 14:04:29 +01:00
Sara Zan
8e3c7bc6be
fix: pin espnet in the audio extra (#3693)
* downgrade pytorch in the audio extra

* pin torch

* remove torch pin and pin espnet

* add comment
2022-12-12 13:09:26 +01:00
Sara Zan
b1fc912859
refactor: remove test extra (#3679)
* remove test extra, make dev install all

* remove all from dev

* reduce diff
2022-12-12 11:22:03 +01:00
Sara Zan
642fa3a6b7
fix typing (#3680) 2022-12-12 11:20:48 +01:00
Vladimir Blagojevic
c28f6688f5
proposal: New EmbeddingRetriever for Haystack 2.0 (#3558)
* Add EmbeddingRetriever proposal

* Update with Sara's feedback

* Consistent naming
2022-12-12 10:06:35 +01:00
Unai Garay Maestre
77cea8b140
feat: Adds all_terms_must_match parameter to BM25Retriever at runtime (#3627)
* Adds all_terms_must_match implementation and tests

* Adds all_terms_must_match as Optional

Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com>

* Avoid mypy error and follow pattern checking var is None

* Mypy works ok on this file now

* added mypy ignores to BaseRetriever

* ignoring all overrides for this file

* Updates sparse retriever `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retriever `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retriever `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retrieve_batch `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retrieve_batch `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Updates sparse retrieve_batch `all_terms_must_match` docstring

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* marked elasticsearch

Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-08 17:18:43 +05:30
tstadel
c1c1c97bb2
feat: add query_by_embedding_batch (#3546)
* add query_by_embedding_batch

* fix mypy

* fix pylint

* add test

* move query_by_embedding_batch to search_engine

* fix and add tests

* fix pylint

* remove Retriever query logs

* add test for multimodal batch retrieval

* allow for np.ndarray
2022-12-08 08:28:43 +01:00
Sebastian
25bf95d47f
Update table reader tests to include checking the score of answers. (#3641) 2022-12-07 07:30:49 -08:00
Stefano Fiorucci
399d8f1668
monkey patch sklearn (#3678) 2022-12-07 10:31:32 +01:00
Vladimir Blagojevic
18444427da
Use from tqdm.auto import tqdm instead of from tqdm import tqdm (#3672) 2022-12-06 22:53:41 +01:00
Sara Zan
0c71849e4a
remove beir from all-gpu (#3669) 2022-12-06 14:56:27 +01:00
Sara Zan
fc89f6ea74
fix: revert Weaviate query with filters and improve tests (#3646)
* revert weaviate query with filters and improve tests

* pylint

* upgrade weaviate container

* use latest docker tag

* fix text

* fix text
2022-12-06 14:48:58 +01:00
Vladimir Blagojevic
e4c3817d01
Adjust get_type() method for pipelines (#3657) 2022-12-02 14:48:47 +01:00
Julian Risch
adb580b6b7
feat: add offsets_in_context to evaluation result (#3640)
* add offsets_in_context to eval result

* extend test case
2022-11-30 11:43:42 +01:00
Massimiliano Pippi
af06519fc4
re-enable hooks (#3629) 2022-11-29 09:00:45 +01:00
Sebastian
c7c2235874
Move all of the forward pass to under torch.no_grad() (#3636) 2022-11-29 08:59:49 +01:00
Massimiliano Pippi
b20f808119
refactor: move more tests to the base class (#3637)
* move more tests to the base class

* skip tests where unsupported

* do not pass index label explicitly

* skip test for Pinecone
2022-11-29 08:43:27 +01:00
Ivan Lopez
839eef6695
fix rest_api paths in docker-compose-gpu.yml (#3532)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-29 07:47:14 +01:00
Mayank Jobanputra
95cf666a20
refactor: change MultiModal retriever to be of type DenseRetriever (#3598)
* changed Multimodal retriever to be of type DenseRetriever

* format fix

* Pylint fix

* Added embed_queries and tests
2022-11-28 19:24:22 +01:00
Massimiliano Pippi
6f9a0f2215
use 9200 as the default port in launch_opensearch (#3630) 2022-11-28 19:06:45 +05:30
Sara Zan
eb7b9452d0
refactor: Weaviate query with filters (#3628) 2022-11-28 12:26:33 +01:00
Branden Chan
4a83b2049d
docs: Reformat code blocks in docstrings (#3580)
* Fix docstrings for DocumentStores

* Fix docstrings for AnswerGenerator

* Fix docstrings for Connector

* Fix docstrings for DocumentClassifier

* Fix docstrings for LabelGenerator

* Fix docstrings for QueryClassifier

* Fix docstrings for Ranker

* Fix docstrings for Retriever and Summarizer

* Fix docstrings for Translator

* Fix docstrings for Pipelines

* Fix docstrings for Primitives

* Fix Python code block spacing

* Add line break before code block

* Fix code blocks

* fix: discard metadata fields if not set in Weaviate (#3578)

* fix weaviate bug in returning embeddings and setting empty meta fields

* review comment

* Update unstable version and openapi schema (#3584)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix: Flatten `DocumentClassifier` output in `SQLDocumentStore`; remove `_sql_session_rollback` hack in tests (#3273)

* first draft

* fix

* fix

* move test to test_sql

* test: add test to check id_hash_keys is not ignored (#3577)

* refactor: Generate JSON schema when missing (#3533)

* removed unused script

* print info logs when generating openapi schema

* create json schema only when needed

* fix tests

* Remove leftover

Co-authored-by: ZanSara <sarazanzo94@gmail.com>

* move milvus tests to their own module (#3596)

* feat: store metadata using JSON in SQLDocumentStore (#3547)

* add warnings

* make the field cachable

* review comment

* Pin faiss-cpu as 1.7.3 seems to have problems (#3603)

* Update Haystack imports (#3599)

* Update Python version (#3602)

* fix: `ParsrConverter` fails on pages without text (#3605)

* try to fix bug

* remove print

* leftover

* refactor: update Squad data  (#3513)

* refractor the to_squad data class

* fix the validation label

* refractor the to_squad data class

* fix the validation label

* add the test for the to_label object function

* fix the tests for to_label_objects

* move all the test related to squad data to one file

* remove unused imports

* revert tiny_augmented.json

Co-authored-by: ZanSara <sarazanzo94@gmail.com>

* Url fixes (#3592)

* add 2 example scripts

* fixing faq script

* fixing some urls

* removing example scripts

* black reformatting

* add labeler to the repo (#3609)

* convert eval metrics to python float (#3612)

* feat: add support for `BM25Retriever` in `InMemoryDocumentStore` (#3561)

* very first draft

* implement query and query_batch

* add more bm25 parameters

* add rank_bm25 dependency

* fix mypy

* remove tokenizer callable parameter

* remove unused import

* only json serializable attributes

* try to fix: pylint too-many-public-methods / R0904

* bm25 attribute always present

* convert errors into warnings to make the tutorial 1 work

* add docstrings; tests

* try to make tests run

* better docstrings; revert not running tests

* some suggestions from review

* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25

* exclude tests with filters

* change elasticsearch to bm25 retriever in test_summarizer

* add tests

* try to improve tests

* better type hint

* adapt test_table_text_retriever_embedding

* handle non-textual docs

* query only textual documents

* Incorporate Reviewer feedback

* refactor: replace `torch.no_grad` with `torch.inference_mode` (where possible) (#3601)

* try to replace torch.no_grad

* revert erroneous change

* revert other module breaking

* revert training/base

* Fix docstrings for DocumentStores

* Fix docstrings for AnswerGenerator

* Fix docstrings for Connector

* Fix docstrings for DocumentClassifier

* Fix docstrings for LabelGenerator

* Fix docstrings for QueryClassifier

* Fix docstrings for Ranker

* Fix docstrings for Retriever and Summarizer

* Fix docstrings for Translator

* Fix docstrings for Pipelines

* Fix docstrings for Primitives

* Fix Python code block spacing

* Add line break before code block

* Fix code blocks

* Incorporate Reviewer feedback

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Espoir Murhabazi <espoir.mur@gmail.com>
Co-authored-by: Tuana Celik <tuana.celik@deepset.ai>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-11-28 09:21:07 +01:00
Massimiliano Pippi
c6890c3e86
chore: remove redundant tests (#3620)
* remove redundant tests

* skip test on win

* fix missing import

* revert mistake

* revert
2022-11-25 20:55:21 +05:30
Tuana Celik
ed7d03665d
fixing the url for document merger (#3615) 2022-11-25 14:40:55 +01:00
Massimiliano Pippi
ddeaf2c98c
clean up colab dependencies (#3626) 2022-11-24 18:37:57 +01:00
Tuana Celik
0771cf1cce
Update CONTRIBUTING.md (#3624) 2022-11-24 13:59:49 +00:00