Massimiliano Pippi
0c1de3745d
fix milvus imports ( #3576 )
2022-11-15 10:58:51 +01:00
Stefano Fiorucci
9de56b0283
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" ( #3548 )
...
* add_all fixes the bug
* improved test
2022-11-15 10:04:04 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x ( #3552 )
...
* remove milvus1
* leftover
* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests ( #3555 )
...
* add pytest option to unmock pinecone
* first try
* handle missing answer
* fix labels metadata
* more tests
* adapt workflow
* typo
* address review comments
2022-11-14 15:19:15 +01:00
ju-gu
559730649b
fix: [rest_api] support TableQA in the endpoint /documents/get_by_filters ( #3551 )
...
* Update schema.py
* Update document.py
2022-11-14 15:09:14 +01:00
Massimiliano Pippi
ef558fa5e3
ignore proposals in release notes ( #3564 )
2022-11-14 14:42:27 +01:00
Massimiliano Pippi
7af22cd98c
CI: install httpx to run tests ( #3565 )
...
* install httpx to run tests
* try
2022-11-14 12:52:04 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests ( #3541 )
...
* refactor tests
* fix job
* revert
* revert
* revert
* use latest weaviate
* fix abstract methods signatures
* pass class_name to all the CRUD methods
* finish moving all the tests
* bump weaviate version
* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
da6b0dc66f
feat: introduce proposal design process ( #3333 )
...
* add RFC process
* migrate old ADR to the new process
* typo
* review comments
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* [skip ci] review feedback
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* [skip ci] leftover
* rename to proposals
* Adjust naming
* Update 2170-pydantic-dataclasses.md
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2022-11-11 12:49:23 +01:00
Vladimir Blagojevic
005025bd14
feat: include error message in HaystackError telemetry events ( #3543 )
...
* Telemetry: add message to all HaystackError(s)
* do not send message for DocumentStoreError and API node errors
* change default payload from None to {}
* add default send_message_in_event to NodeError and type annotations
* black
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-11-10 12:00:37 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests ( #3537 )
...
* fix write docs behaviour
* refactor FAISS tests
* do not remove the sqlite db
* try
* remove extra slash
* Apply suggestions from code review
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_faiss.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Sara Zan
9539a209ae
refactor: apply pep-484 ( #3542 )
...
* apply pep-484
* another implicit optional
* apply pep-484 on rest_api and ui too
2022-11-08 14:30:33 +01:00
Sara Zan
43b24fd1a7
fix: strip whitespaces safely from FARMReader's answers ( #3526 )
...
* remove .strip()
* check for right-side offset
* return the whitespace-cleaned answer
* lstrip, not rstrip :D
* remove int
* left_offset
* slightly refactor reader fixture
* extend test_output
2022-11-08 09:26:47 +01:00
Branden Chan
e6b7109164
docs: Update docker readme ( #3531 )
...
* Update docker readme
* Make language changes
2022-11-08 09:06:18 +01:00
Massimiliano Pippi
af96e002a4
merge black job into testing workflow ( #3539 )
2022-11-07 20:01:02 +05:30
Mayank Jobanputra
794fe5ffa4
bug: didn't clean up model files after running pytest for test_table_text_retriever_training ( #3534 )
...
* Added tmp path to avoid clean up of model files later
2022-11-07 15:07:04 +05:30
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job ( #3529 )
...
* move dC tests to their own module and job
* restore global var
* revert
2022-11-04 17:05:10 +01:00
Sara Zan
815017ad5b
Deploy the demo only manually ( #3525 )
2022-11-04 12:15:58 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests ( #3517 )
...
* port SQL tests
* cleanup document_store_tests.py from sql tests
* leftover
* Update .github/workflows/tests.yml
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_base.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Stefano Fiorucci
1a60e21137
refactor: simplify Summarizer, add Document Merger ( #3452 )
...
* remove generate_single_summary
* update schemas
* remove unused import
* fix mypy
* fix mypy
* test: summarizer doesnt change content
* other test correction
* move test_summarizer_translation to test_extractor_translation
* fix test
* first try for doc merger
* reintroduce and deprecate generate_single_summary
* progress in document merger
* document merger!
* mypy, pylint fixes
* use generator
* added test that will fail in 1.12
* adapt to review
* extended deprecation docstring
* Update test/nodes/test_extractor_translation.py
* Update test/nodes/test_summarizer.py
* Update test/nodes/test_summarizer.py
* black
* documents fixture
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-11-03 16:04:53 +01:00
Massimiliano Pippi
0a04dec808
Use a dedicated PAT ( #3511 )
...
* Use a dedicated PAT
* Update project.yml
2022-11-03 12:58:01 +01:00
bogdankostic
fc551b90ac
docs: Fix link directing to tutorials in README.md ( #3516 )
2022-11-02 18:11:03 +01:00
Sara Zan
f0be78c6a6
bug: remove useless import in conftest.py ( #3362 )
...
* Remove useless milvus import in conftest
* schemas
* schemas
2022-11-02 19:22:24 +05:30
Stefano Fiorucci
4b0894f4c2
fix: support long texts for labels in ElasticsearchDocumentStore ( #3346 )
2022-11-02 11:16:36 +01:00
Sara Zan
b93bbb1cab
refactor: upgrade actions version ( #3506 )
...
* upgrade actions version
* upgrade cache action too
2022-11-02 10:35:10 +01:00
Sara Zan
bb1d9983b0
refactor: remove YAML save/load methods for subclasses of BaseStandardPipeline ( #3443 )
...
* remove methods & update docstring
* remove irrelevant test
2022-11-02 10:14:33 +01:00
Branden Chan
0b2e71daf6
feat: Create the TextIndexingPipeline ( #3473 )
...
* Add TextIndexingPipeline
* Run Black formatting
* Incorporate reviewer feedback
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-01 10:52:37 +01:00
bogdankostic
60224412bc
feat: Add headline extraction to ParsrConverter ( #3488 )
...
* Add headline extraction to ParsrConverter
* Add sample PDF file
* Add test
* Use extract_headlines if set in convert method
* Integrate PR feedback
2022-10-31 19:00:02 +01:00
Sara Zan
8ddeda811a
generate docs for search.engine.py ( #3507 )
2022-10-31 16:57:39 +01:00
Massimiliano Pippi
9fe2f69d56
add workflow to triage new issues with GH projects ( #3508 )
2022-10-31 16:01:59 +01:00
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring ( #3449 )
...
* add new marker
* start using test hierarchies
* move ES tests into their own class
* refactor test workflow
* job steps
* add more tests
* move more tests
* more tests
* test labels
* add more tests
* Update tests.yml
* Update tests.yml
* fix
* typo
* fix es image tag
* map es ports
* try
* fix
* default port
* remove opensearch from the markers sorcery
* revert
* skip new tests in old jobs
* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Mayank Jobanputra
85cdc1040a
Added telemetry changes ( #3503 )
2022-10-31 12:49:52 +01:00
Sara Zan
adc982a624
fix: do not reference package directory in PDFToTextOCRConverter.convert() ( #3478 )
...
* remove weird temp path from PDFToTextOCRConverter.convert()
* remove debug lines
* remove os import
2022-10-31 12:48:43 +01:00
Massimiliano Pippi
17cd79e2c8
[release process] Create new schema when bumping unstable ( #3416 )
...
* also create new schema when bumping unstable version
* openapi schema
* no need to update the json schema anymore
2022-10-31 12:26:48 +01:00
Sara Zan
54cc9cd4cf
refactor: remove json-schemas ( #3485 )
...
* remove json-schemas
* main schema can be removed too
* add .gitignore to schemas folder
* try to explicitly get the new haystack in the rest api tests
* fix workflow again
* fix version string in rest api tests
* add pip freeze
* debug statements in workflow
* -U prevents schema generation
2022-10-31 11:24:43 +01:00
Massimiliano Pippi
b52ed52c4e
fix docker minimal deprecated image ( #3497 )
2022-10-28 16:46:48 +02:00
Sebastian
384663981d
Fixed bug in onnx converter for XLMRoberta architecture ( #3470 )
2022-10-28 15:35:53 +02:00
Massimiliano Pippi
9f4a9a76a3
fix: pattern to match tags push ( #3469 )
2022-10-28 14:52:30 +02:00
Sara Zan
823d0d3006
Add Schemas badge on README.md ( #3493 )
2022-10-28 13:57:42 +02:00
Sara Zan
a66e7caa34
feat: hatch-autorun generates schemas ( #3484 )
...
* hatch-run generates the schemas
* fix path
* keep schemas for now
* fix path
* schemas
* Do not generate rc schemas
* make the autorun hook self-destroy
* typo
* schemas
* schemas were ok
* improve logs to make generate_schema.py usable standalone too
* fix warning
* Update warning
* Update generate_schema.py
* black
2022-10-28 13:55:11 +02:00
Massimiliano Pippi
1f9f4ab03a
fix: fix docs badge ( #3491 )
...
* fix: fix docs badge
* format
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-10-28 11:59:49 +02:00
Sara Zan
f377b78263
refactor: replace YAML schema check with a dispatch call ( #3482 )
...
* Replace yaml check with a dispatch call
* split workflow
* add branch for testing
* access secrets properly
* remove testing branch trigger
2022-10-28 10:48:59 +02:00
Sebastian
8db7dfb884
refactor: TableReader ( #3456 )
...
* Refactoring table reader
2022-10-26 20:57:28 +02:00
Sebastian
59857cb492
feat: Speed up reader tests ( #3476 )
...
* Use a smaller reader where possible
* Change scope to module of reader to get faster load times
2022-10-26 19:04:18 +02:00
Tuana Celik
a4002ae87c
Updating readme to point to new docs site ( #3336 )
...
* Updating readme to point to new docs site
* updating some links
* updating docs link
2022-10-26 17:28:46 +02:00
Sara Zan
dd774b867d
add missing schemas ( #3480 )
2022-10-26 17:27:42 +02:00
Sara Zan
05c68b6624
feat: add document_store to all BaseRetriever.retrieve() and BaseRetriever.retrieve_batch() implementations ( #3379 )
...
* add document_store to retrieve()]
* mypy & pylint
* pass docstore to embedding encoders
* schemas
* mypy and pylint
* fix tfidfretriever
* pylint
* mypy
* pylint
* fix tfidf
* mypy
* pylint
* schemas
* another fix for tfidf
* fix question generation tests
* remove docstore from embedding encoder signature
* pylint
* revert accidental test changes
* Apply suggestions from code review
* check for docstore similarity function only if the docstore is present
* check for docstore similarity function only if the docstore is present
2022-10-26 15:47:06 +02:00
Julian Risch
d0691a4bd5
bug: replace decorator with counter attribute for pipeline event ( #3462 )
2022-10-26 12:09:04 +02:00
bogdankostic
4fbe80c098
feat: Extraction of headlines in markdown files ( #3445 )
...
* Extract headings from markdown files + adapt PreProcessor
* Add tests
* Fix mypy
* Generate JSON schema
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/file_converter/markdown.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply black
* Add PR feedback
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-10-26 11:57:55 +02:00
Vladimir Blagojevic
5ca96357ff
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever ( #3453 )
2022-10-25 17:52:29 +02:00