2539 Commits

Author SHA1 Message Date
Sara Zan
fc89f6ea74
fix: revert Weaviate query with filters and improve tests (#3646)
* revert weaviate query with filters and improve tests

* pylint

* upgrade weaviate container

* use latest docker tag

* fix text

* fix text
2022-12-06 14:48:58 +01:00
Vladimir Blagojevic
e4c3817d01
Adjust get_type() method for pipelines (#3657) 2022-12-02 14:48:47 +01:00
Julian Risch
adb580b6b7
feat: add offsets_in_context to evaluation result (#3640)
* add offsets_in_context to eval result

* extend test case
2022-11-30 11:43:42 +01:00
Massimiliano Pippi
af06519fc4
re-enable hooks (#3629) 2022-11-29 09:00:45 +01:00
Sebastian
c7c2235874
Move the entire forward pass under torch.no_grad() (#3636) 2022-11-29 08:59:49 +01:00
Massimiliano Pippi
b20f808119
refactor: move more tests to the base class (#3637)
* move more tests to the base class

* skip tests where unsupported

* do not pass index label explicitly

* skip test for Pinecone
2022-11-29 08:43:27 +01:00
Ivan Lopez
839eef6695
fix rest_api paths in docker-compose-gpu.yml (#3532)
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-29 07:47:14 +01:00
Mayank Jobanputra
95cf666a20
refactor: change MultiModal retriever to be of type DenseRetriever (#3598)
* changed Multimodal retriever to be of type DenseRetriever

* format fix

* Pylint fix

* Added embed_queries and tests
2022-11-28 19:24:22 +01:00
Massimiliano Pippi
6f9a0f2215
use 9200 as the default port in launch_opensearch (#3630) 2022-11-28 19:06:45 +05:30
Sara Zan
eb7b9452d0
refactor: Weaviate query with filters (#3628) 2022-11-28 12:26:33 +01:00
Branden Chan
4a83b2049d
docs: Reformat code blocks in docstrings (#3580)
* Fix docstrings for DocumentStores

* Fix docstrings for AnswerGenerator

* Fix docstrings for Connector

* Fix docstrings for DocumentClassifier

* Fix docstrings for LabelGenerator

* Fix docstrings for QueryClassifier

* Fix docstrings for Ranker

* Fix docstrings for Retriever and Summarizer

* Fix docstrings for Translator

* Fix docstrings for Pipelines

* Fix docstrings for Primitives

* Fix Python code block spacing

* Add line break before code block

* Fix code blocks

* Incorporate Reviewer feedback

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Espoir Murhabazi <espoir.mur@gmail.com>
Co-authored-by: Tuana Celik <tuana.celik@deepset.ai>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-11-28 09:21:07 +01:00
Massimiliano Pippi
c6890c3e86
chore: remove redundant tests (#3620)
* remove redundant tests

* skip test on win

* fix missing import

* revert mistake

* revert
2022-11-25 20:55:21 +05:30
Tuana Celik
ed7d03665d
fixing the url for document merger (#3615) 2022-11-25 14:40:55 +01:00
Massimiliano Pippi
ddeaf2c98c
clean up colab dependencies (#3626) 2022-11-24 18:37:57 +01:00
Tuana Celik
0771cf1cce
Update CONTRIBUTING.md (#3624) 2022-11-24 13:59:49 +00:00
Massimiliano Pippi
a15af7f8c3
refactor: Move InMemoryDocumentStore tests to their own class (#3614)
* move tests to their own class

* move more tests

* add specific job

* fix test

* Update test/document_stores/test_memory.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
tstadel
0e05f71f33
fix return type typing (#3617) 2022-11-23 10:23:20 +01:00
Stefano Fiorucci
f43bc562d3
refactor: replace torch.no_grad with torch.inference_mode (where possible) (#3601)
* try to replace torch.no_grad

* revert erroneous change

* revert other module breaking

* revert training/base
2022-11-23 09:26:11 +01:00
Stefano Fiorucci
3040e59c63
feat: add support for BM25Retriever in InMemoryDocumentStore (#3561)
* very first draft

* implement query and query_batch

* add more bm25 parameters

* add rank_bm25 dependency

* fix mypy

* remove tokenizer callable parameter

* remove unused import

* only json serializable attributes

* try to fix: pylint too-many-public-methods / R0904

* bm25 attribute always present

* convert errors into warnings to make the tutorial 1 work

* add docstrings; tests

* try to make tests run

* better docstrings; revert not running tests

* some suggestions from review

* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25

* exclude tests with filters

* change elasticsearch to bm25 retriever in test_summarizer

* add tests

* try to improve tests

* better type hint

* adapt test_table_text_retriever_embedding

* handle non-textual docs

* query only textual documents
2022-11-22 09:24:52 +01:00
tstadel
0d45cbce56
convert eval metrics to python float (#3612) 2022-11-22 09:05:10 +01:00
Massimiliano Pippi
2fadcf2859
add labeler to the repo (#3609) 2022-11-21 20:49:25 +05:30
Tuana Celik
78ec528e26
Url fixes (#3592)
* add 2 example scripts

* fixing faq script

* fixing some urls

* removing example scripts

* black reformatting
2022-11-21 11:26:16 +01:00
Espoir Murhabazi
d114a994f1
refactor: update Squad data (#3513)
* refactor the to_squad data class

* fix the validation label

* refactor the to_squad data class

* fix the validation label

* add the test for the to_label object function

* fix the tests for to_label_objects

* move all the tests related to squad data to one file

* remove unused imports

* revert tiny_augmented.json

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-21 11:06:14 +01:00
Stefano Fiorucci
5f62494105
fix: ParsrConverter fails on pages without text (#3605)
* try to fix bug

* remove print

* leftover
2022-11-21 10:54:40 +01:00
Massimiliano Pippi
7e0aa82eb8
Update Python version (#3602) 2022-11-21 10:16:47 +01:00
Branden Chan
f85ead431a
Update Haystack imports (#3599) 2022-11-21 10:15:57 +01:00
Massimiliano Pippi
c7e3483f62
Pin faiss-cpu as 1.7.3 seems to have problems (#3603) 2022-11-18 09:18:24 +01:00
Massimiliano Pippi
ea75e2aab5
feat: store metadata using JSON in SQLDocumentStore (#3547)
* add warnings

* make the field cacheable

* review comment
2022-11-18 08:26:19 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module (#3596) 2022-11-17 16:22:02 +01:00
Massimiliano Pippi
6cd0e337d0
refactor: Generate JSON schema when missing (#3533)
* removed unused script

* print info logs when generating openapi schema

* create json schema only when needed

* fix tests

* Remove leftover

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-17 11:09:27 +01:00
Julian Risch
8052632b64
test: add test to check id_hash_keys is not ignored (#3577) 2022-11-17 09:25:02 +01:00
Stefano Fiorucci
dc26e6d43e
fix: Flatten DocumentClassifier output in SQLDocumentStore; remove _sql_session_rollback hack in tests (#3273)
* first draft

* fix

* fix

* move test to test_sql
2022-11-16 12:20:57 +01:00
github-actions[bot]
af78f8b431
Update unstable version and openapi schema (#3584)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-11-16 10:09:40 +01:00
Massimiliano Pippi
ba75d39029
fix: discard metadata fields if not set in Weaviate (#3578)
* fix weaviate bug in returning embeddings and setting empty meta fields

* review comment
2022-11-15 22:02:53 +01:00
tstadel
6ce2d296f4
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta (#3572)
* fix meta bug

* adjust brownfield test
2022-11-15 12:13:21 +01:00
Mayank Jobanputra
3098440a27
bug: fix release number (#3559)
* Added haystack version in docker base build

* test version -- name's bond
2022-11-15 16:31:10 +05:30
Massimiliano Pippi
0c1de3745d
fix milvus imports (#3576) 2022-11-15 10:58:51 +01:00
Stefano Fiorucci
9de56b0283
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" (#3548)
* add_all fixes the bug

* improved test
2022-11-15 10:04:04 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x (#3552)
* remove milvus1

* leftover

* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests (#3555)
* add pytest option to unmock pinecone

* first try

* handle missing answer

* fix labels metadata

* more tests

* adapt workflow

* typo

* address review comments
2022-11-14 15:19:15 +01:00
ju-gu
559730649b
fix: [rest_api] support TableQA in the endpoint /documents/get_by_filters (#3551)
* Update schema.py

* Update document.py
2022-11-14 15:09:14 +01:00
Massimiliano Pippi
ef558fa5e3
ignore proposals in release notes (#3564) 2022-11-14 14:42:27 +01:00
Massimiliano Pippi
7af22cd98c
CI: install httpx to run tests (#3565)
* install httpx to run tests

* try
2022-11-14 12:52:04 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests (#3541)
* refactor tests

* fix job

* revert

* revert

* revert

* use latest weaviate

* fix abstract methods signatures

* pass class_name to all the CRUD methods

* finish moving all the tests

* bump weaviate version

* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
da6b0dc66f
feat: introduce proposal design process (#3333)
* add RFC process

* migrate old ADR to the new process

* typo

* review comments

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* [skip ci] review feedback

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* [skip ci] leftover

* rename to proposals

* Adjust naming

* Update 2170-pydantic-dataclasses.md

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2022-11-11 12:49:23 +01:00
Vladimir Blagojevic
005025bd14
feat: include error message in HaystackError telemetry events (#3543)
* Telemetry: add message to all HaystackError(s)

* do not send message for DocumentStoreError and API node errors

* change default payload from None to {}

* add default send_message_in_event to NodeError and type annotations

* black

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-11-10 12:00:37 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests (#3537)
* fix write docs behaviour

* refactor FAISS tests

* do not remove the sqlite db

* try

* remove extra slash

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_faiss.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Sara Zan
9539a209ae
refactor: apply pep-484 (#3542)
* apply pep-484

* another implicit optional

* apply pep-484 on rest_api and ui too
2022-11-08 14:30:33 +01:00
Sara Zan
43b24fd1a7
fix: strip whitespaces safely from FARMReader's answers (#3526)
* remove .strip()

* check for right-side offset

* return the whitespace-cleaned answer

* lstrip, not rstrip :D

* remove int

* left_offset

* slightly refactor reader fixture

* extend test_output
2022-11-08 09:26:47 +01:00
Branden Chan
e6b7109164
docs: Update docker readme (#3531)
* Update docker readme

* Make language changes
2022-11-08 09:06:18 +01:00