3803 Commits

Author SHA1 Message Date
Massimiliano Pippi
0c1de3745d
fix milvus imports (#3576) 2022-11-15 10:58:51 +01:00
Stefano Fiorucci
9de56b0283
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" (#3548)
* add_all fixes the bug

* improved test
2022-11-15 10:04:04 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x (#3552)
* remove milvus1

* leftover

* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests (#3555)
* add pytest option to unmock pinecone

* first try

* handle missing answer

* fix labels metadata

* more tests

* adapt workflow

* typo

* address review comments
2022-11-14 15:19:15 +01:00
ju-gu
559730649b
fix: [rest_api] support TableQA in the endpoint /documents/get_by_filters (#3551)
* Update schema.py

* Update document.py
2022-11-14 15:09:14 +01:00
Massimiliano Pippi
ef558fa5e3
ignore proposals in release notes (#3564) 2022-11-14 14:42:27 +01:00
Massimiliano Pippi
7af22cd98c
CI: install httpx to run tests (#3565)
* install httpx to run tests

* try
2022-11-14 12:52:04 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests (#3541)
* refactor tests

* fix job

* revert

* revert

* revert

* use latest weaviate

* fix abstract methods signatures

* pass class_name to all the CRUD methods

* finish moving all the tests

* bump weaviate version

* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
da6b0dc66f
feat: introduce proposal design process (#3333)
* add RFC process

* migrate old ADR to the new process

* typo

* review comments

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* [skip ci] review feedback

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* [skip ci] leftover

* rename to proposals

* Adjust naming

* Update 2170-pydantic-dataclasses.md

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2022-11-11 12:49:23 +01:00
Vladimir Blagojevic
005025bd14
feat: include error message in HaystackError telemetry events (#3543)
* Telemetry: add message to all HaystackError(s)

* do not send message for DocumentStoreError and API node errors

* change default payload from None to {}

* add default send_message_in_event to NodeError and type annotations

* black

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-11-10 12:00:37 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests (#3537)
* fix write docs behaviour

* refactor FAISS tests

* do not remove the sqlite db

* try

* remove extra slash

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_faiss.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Sara Zan
9539a209ae
refactor: apply pep-484 (#3542)
* apply pep-484

* another implicit optional

* apply pep-484 on rest_api and ui too
2022-11-08 14:30:33 +01:00
Sara Zan
43b24fd1a7
fix: strip whitespace safely from FARMReader's answers (#3526)
* remove .strip()

* check for right-side offset

* return the whitespace-cleaned answer

* lstrip, not rstrip :D

* remove int

* left_offset

* slightly refactor reader fixture

* extend test_output
2022-11-08 09:26:47 +01:00
Branden Chan
e6b7109164
docs: Update docker readme (#3531)
* Update docker readme

* Make language changes
2022-11-08 09:06:18 +01:00
Massimiliano Pippi
af96e002a4
merge black job into testing workflow (#3539) 2022-11-07 20:01:02 +05:30
Mayank Jobanputra
794fe5ffa4
bug: didn't clean up model files after running pytest for test_table_text_retriever_training (#3534)
* Added tmp path to avoid clean up of model files later
2022-11-07 15:07:04 +05:30
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job (#3529)
* move dC tests to their own module and job

* restore global var

* revert
2022-11-04 17:05:10 +01:00
Sara Zan
815017ad5b
Deploy the demo only manually (#3525) 2022-11-04 12:15:58 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests (#3517)
* port SQL tests

* cleanup document_store_tests.py from sql tests

* leftover

* Update .github/workflows/tests.yml

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_base.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Stefano Fiorucci
1a60e21137
refactor: simplify Summarizer, add Document Merger (#3452)
* remove generate_single_summary

* update schemas

* remove unused import

* fix mypy

* fix mypy

* test: summarizer doesn't change content

* other test correction

* move test_summarizer_translation to test_extractor_translation

* fix test

* first try for doc merger

* reintroduce and deprecate generate_single_summary

* progress in document merger

* document merger!

* mypy, pylint fixes

* use generator

* added test that will fail in 1.12

* adapt to review

* extended deprecation docstring

* Update test/nodes/test_extractor_translation.py

* Update test/nodes/test_summarizer.py

* Update test/nodes/test_summarizer.py

* black

* documents fixture

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-11-03 16:04:53 +01:00
Massimiliano Pippi
0a04dec808
Use a dedicated PAT (#3511)
* Use a dedicated PAT

* Update project.yml
2022-11-03 12:58:01 +01:00
bogdankostic
fc551b90ac
docs: Fix link directing to tutorials in README.md (#3516) 2022-11-02 18:11:03 +01:00
Sara Zan
f0be78c6a6
bug: remove useless import in conftest.py (#3362)
* Remove useless milvus import in conftest

* schemas

* schemas
2022-11-02 19:22:24 +05:30
Stefano Fiorucci
4b0894f4c2
fix: support long texts for labels in ElasticsearchDocumentStore (#3346) 2022-11-02 11:16:36 +01:00
Sara Zan
b93bbb1cab
refactor: upgrade actions version (#3506)
* upgrade actions version

* upgrade cache action too
2022-11-02 10:35:10 +01:00
Sara Zan
bb1d9983b0
refactor: remove YAML save/load methods for subclasses of BaseStandardPipeline (#3443)
* remove methods & update docstring

* remove irrelevant test
2022-11-02 10:14:33 +01:00
Branden Chan
0b2e71daf6
feat: Create the TextIndexingPipeline (#3473)
* Add TextIndexingPipeline

* Run Black formatting

* Incorporate reviewer feedback

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-01 10:52:37 +01:00
bogdankostic
60224412bc
feat: Add headline extraction to ParsrConverter (#3488)
* Add headline extraction to ParsrConverter

* Add sample PDF file

* Add test

* Use extract_headlines if set in convert method

* Integrate PR feedback
2022-10-31 19:00:02 +01:00
Sara Zan
8ddeda811a
generate docs for search.engine.py (#3507) 2022-10-31 16:57:39 +01:00
Massimiliano Pippi
9fe2f69d56
add workflow to triage new issues with GH projects (#3508) 2022-10-31 16:01:59 +01:00
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring (#3449)
* add new marker

* start using test hierarchies

* move ES tests into their own class

* refactor test workflow

* job steps

* add more tests

* move more tests

* more tests

* test labels

* add more tests

* Update tests.yml

* Update tests.yml

* fix

* typo

* fix es image tag

* map es ports

* try

* fix

* default port

* remove opensearch from the markers sorcery

* revert

* skip new tests in old jobs

* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Mayank Jobanputra
85cdc1040a
Added telemetry changes (#3503) 2022-10-31 12:49:52 +01:00
Sara Zan
adc982a624
fix: do not reference package directory in PDFToTextOCRConverter.convert() (#3478)
* remove weird temp path from PDFToTextOCRConverter.convert()

* remove debug lines

* remove os import
2022-10-31 12:48:43 +01:00
Massimiliano Pippi
17cd79e2c8
[release process] Create new schema when bumping unstable (#3416)
* also create new schema when bumping unstable version

* openapi schema

* no need to update the json schema anymore
2022-10-31 12:26:48 +01:00
Sara Zan
54cc9cd4cf
refactor: remove json-schemas (#3485)
* remove json-schemas

* main schema can be removed too

* add .gitignore to schemas folder

* try to explicitly get the new haystack in the rest api tests

* fix workflow again

* fix version string in rest api tests

* add pip freeze

* debug statements in workflow

* -U prevents schema generation
2022-10-31 11:24:43 +01:00
Massimiliano Pippi
b52ed52c4e
fix docker minimal deprecated image (#3497) 2022-10-28 16:46:48 +02:00
Sebastian
384663981d
Fixed bug in onnx converter for XLMRoberta architecture (#3470) 2022-10-28 15:35:53 +02:00
Massimiliano Pippi
9f4a9a76a3
fix: pattern to match tags push (#3469) 2022-10-28 14:52:30 +02:00
Sara Zan
823d0d3006
Add Schemas badge on README.md (#3493) 2022-10-28 13:57:42 +02:00
Sara Zan
a66e7caa34
feat: hatch-autorun generates schemas (#3484)
* hatch-run generates the schemas

* fix path

* keep schemas for now

* fix path

* schemas

* Do not generate rc schemas

* make the autorun hook self-destroy

* typo

* schemas

* schemas were ok

* improve logs to make generate_schema.py usable standalone too

* fix warning

* Update warning

* Update generate_schema.py

* black
2022-10-28 13:55:11 +02:00
Massimiliano Pippi
1f9f4ab03a
fix: fix docs badge (#3491)
* fix: fix docs badge

* format

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-10-28 11:59:49 +02:00
Sara Zan
f377b78263
refactor: replace YAML schema check with a dispatch call (#3482)
* Replace yaml check with a dispatch call

* split workflow

* add branch for testing

* access secrets properly

* remove testing branch trigger
2022-10-28 10:48:59 +02:00
Sebastian
8db7dfb884
refactor: TableReader (#3456)
* Refactoring table reader
2022-10-26 20:57:28 +02:00
Sebastian
59857cb492
feat: Speed up reader tests (#3476)
* Use a smaller reader where possible

* Change scope to module of reader to get faster load times
2022-10-26 19:04:18 +02:00
Tuana Celik
a4002ae87c
Updating readme to point to new docs site (#3336)
* Updating readme to point to new docs site

* updating some links

* updating docs link
2022-10-26 17:28:46 +02:00
Sara Zan
dd774b867d
add missing schemas (#3480) 2022-10-26 17:27:42 +02:00
Sara Zan
05c68b6624
feat: add document_store to all BaseRetriever.retrieve() and BaseRetriever.retrieve_batch() implementations (#3379)
* add document_store to retrieve()

* mypy & pylint

* pass docstore to embedding encoders

* schemas

* mypy and pylint

* fix tfidfretriever

* pylint

* mypy

* pylint

* fix tfidf

* mypy

* pylint

* schemas

* another fix for tfidf

* fix question generation tests

* remove docstore from embedding encoder signature

* pylint

* revert accidental test changes

* Apply suggestions from code review

* check for docstore similarity function only if the docstore is present

* check for docstore similarity function only if the docstore is present
2022-10-26 15:47:06 +02:00
Julian Risch
d0691a4bd5
bug: replace decorator with counter attribute for pipeline event (#3462) 2022-10-26 12:09:04 +02:00
bogdankostic
4fbe80c098
feat: Extraction of headlines in markdown files (#3445)
* Extract headings from markdown files + adapt PreProcessor

* Add tests

* Fix mypy

* Generate JSON schema

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/nodes/file_converter/markdown.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply black

* Add PR feedback

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-10-26 11:57:55 +02:00
Vladimir Blagojevic
5ca96357ff
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever (#3453) 2022-10-25 17:52:29 +02:00