Tuana Celik
0771cf1cce
Update CONTRIBUTING.md ( #3624 )
2022-11-24 13:59:49 +00:00
Massimiliano Pippi
a15af7f8c3
refactor: Move InMemoryDocumentStore tests to their own class ( #3614 )
* move tests to their own class
* move more tests
* add specific job
* fix test
* Update test/document_stores/test_memory.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
tstadel
0e05f71f33
fix return type typing ( #3617 )
2022-11-23 10:23:20 +01:00
Stefano Fiorucci
f43bc562d3
refactor: replace torch.no_grad with torch.inference_mode (where possible) ( #3601 )
* try to replace torch.no_grad
* revert erroneous change
* revert other module breaking
* revert training/base
2022-11-23 09:26:11 +01:00
Stefano Fiorucci
3040e59c63
feat: add support for BM25Retriever in InMemoryDocumentStore ( #3561 )
* very first draft
* implement query and query_batch
* add more bm25 parameters
* add rank_bm25 dependency
* fix mypy
* remove tokenizer callable parameter
* remove unused import
* only json serializable attributes
* try to fix: pylint too-many-public-methods / R0904
* bm25 attribute always present
* convert errors into warnings to make the tutorial 1 work
* add docstrings; tests
* try to make tests run
* better docstrings; revert not running tests
* some suggestions from review
* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25
* exclude tests with filters
* change elasticsearch to bm25 retriever in test_summarizer
* add tests
* try to improve tests
* better type hint
* adapt test_table_text_retriever_embedding
* handle non-textual docs
* query only textual documents
2022-11-22 09:24:52 +01:00
tstadel
0d45cbce56
convert eval metrics to python float ( #3612 )
2022-11-22 09:05:10 +01:00
Massimiliano Pippi
2fadcf2859
add labeler to the repo ( #3609 )
2022-11-21 20:49:25 +05:30
Tuana Celik
78ec528e26
Url fixes ( #3592 )
* add 2 example scripts
* fixing faq script
* fixing some urls
* removing example scripts
* black reformatting
2022-11-21 11:26:16 +01:00
Espoir Murhabazi
d114a994f1
refactor: update Squad data ( #3513 )
* refactor the to_squad data class
* fix the validation label
* refactor the to_squad data class
* fix the validation label
* add the test for the to_label object function
* fix the tests for to_label_objects
* move all the test related to squad data to one file
* remove unused imports
* revert tiny_augmented.json
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-21 11:06:14 +01:00
Stefano Fiorucci
5f62494105
fix: ParsrConverter fails on pages without text ( #3605 )
* try to fix bug
* remove print
* leftover
2022-11-21 10:54:40 +01:00
Massimiliano Pippi
7e0aa82eb8
Update Python version ( #3602 )
2022-11-21 10:16:47 +01:00
Branden Chan
f85ead431a
Update Haystack imports ( #3599 )
2022-11-21 10:15:57 +01:00
Massimiliano Pippi
c7e3483f62
Pin faiss-cpu as 1.7.3 seems to have problems ( #3603 )
2022-11-18 09:18:24 +01:00
Massimiliano Pippi
ea75e2aab5
feat: store metadata using JSON in SQLDocumentStore ( #3547 )
* add warnings
* make the field cachable
* review comment
2022-11-18 08:26:19 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module ( #3596 )
2022-11-17 16:22:02 +01:00
Massimiliano Pippi
6cd0e337d0
refactor: Generate JSON schema when missing ( #3533 )
* removed unused script
* print info logs when generating openapi schema
* create json schema only when needed
* fix tests
* Remove leftover
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-17 11:09:27 +01:00
Julian Risch
8052632b64
test: add test to check id_hash_keys is not ignored ( #3577 )
2022-11-17 09:25:02 +01:00
Stefano Fiorucci
dc26e6d43e
fix: Flatten DocumentClassifier output in SQLDocumentStore; remove _sql_session_rollback hack in tests ( #3273 )
* first draft
* fix
* fix
* move test to test_sql
2022-11-16 12:20:57 +01:00
github-actions[bot]
af78f8b431
Update unstable version and openapi schema ( #3584 )
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-11-16 10:09:40 +01:00
Massimiliano Pippi
ba75d39029
fix: discard metadata fields if not set in Weaviate ( #3578 )
* fix weaviate bug in returning embeddings and setting empty meta fields
* review comment
2022-11-15 22:02:53 +01:00
tstadel
6ce2d296f4
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta ( #3572 )
* fix meta bug
* adjust brownfield test
2022-11-15 12:13:21 +01:00
Mayank Jobanputra
3098440a27
bug: fix release number ( #3559 )
* Added haystack version in docker base build
* test version -- name's bond
2022-11-15 16:31:10 +05:30
Massimiliano Pippi
0c1de3745d
fix milvus imports ( #3576 )
2022-11-15 10:58:51 +01:00
Stefano Fiorucci
9de56b0283
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" ( #3548 )
* add_all fixes the bug
* improved test
2022-11-15 10:04:04 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x ( #3552 )
* remove milvus1
* leftover
* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests ( #3555 )
* add pytest option to unmock pinecone
* first try
* handle missing answer
* fix labels metadata
* more tests
* adapt workflow
* typo
* address review comments
2022-11-14 15:19:15 +01:00
ju-gu
559730649b
fix: [rest_api] support TableQA in the endpoint /documents/get_by_filters ( #3551 )
* Update schema.py
* Update document.py
2022-11-14 15:09:14 +01:00
Massimiliano Pippi
ef558fa5e3
ignore proposals in release notes ( #3564 )
2022-11-14 14:42:27 +01:00
Massimiliano Pippi
7af22cd98c
CI: install httpx to run tests ( #3565 )
* install httpx to run tests
* try
2022-11-14 12:52:04 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests ( #3541 )
* refactor tests
* fix job
* revert
* revert
* revert
* use latest weaviate
* fix abstract methods signatures
* pass class_name to all the CRUD methods
* finish moving all the tests
* bump weaviate version
* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
da6b0dc66f
feat: introduce proposal design process ( #3333 )
* add RFC process
* migrate old ADR to the new process
* typo
* review comments
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* [skip ci] review feedback
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* [skip ci] leftover
* rename to proposals
* Adjust naming
* Update 2170-pydantic-dataclasses.md
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2022-11-11 12:49:23 +01:00
Vladimir Blagojevic
005025bd14
feat: include error message in HaystackError telemetry events ( #3543 )
* Telemetry: add message to all HaystackError(s)
* do not send message for DocumentStoreError and API node errors
* change default payload from None to {}
* add default send_message_in_event to NodeError and type annotations
* black
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-11-10 12:00:37 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests ( #3537 )
* fix write docs behaviour
* refactor FAISS tests
* do not remove the sqlite db
* try
* remove extra slash
* Apply suggestions from code review
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_faiss.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Sara Zan
9539a209ae
refactor: apply pep-484 ( #3542 )
* apply pep-484
* another implicit optional
* apply pep-484 on rest_api and ui too
2022-11-08 14:30:33 +01:00
Sara Zan
43b24fd1a7
fix: strip whitespaces safely from FARMReader's answers ( #3526 )
* remove .strip()
* check for right-side offset
* return the whitespace-cleaned answer
* lstrip, not rstrip :D
* remove int
* left_offset
* slightly refactor reader fixture
* extend test_output
2022-11-08 09:26:47 +01:00
Branden Chan
e6b7109164
docs: Update docker readme ( #3531 )
* Update docker readme
* Make language changes
2022-11-08 09:06:18 +01:00
Massimiliano Pippi
af96e002a4
merge black job into testing workflow ( #3539 )
2022-11-07 20:01:02 +05:30
Mayank Jobanputra
794fe5ffa4
bug: didn't clean up model files after running pytest for test_table_text_retriever_training ( #3534 )
* Added tmp path to avoid clean up of model files later
2022-11-07 15:07:04 +05:30
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job ( #3529 )
* move dC tests to their own module and job
* restore global var
* revert
2022-11-04 17:05:10 +01:00
Sara Zan
815017ad5b
Deploy the demo only manually ( #3525 )
2022-11-04 12:15:58 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests ( #3517 )
* port SQL tests
* cleanup document_store_tests.py from sql tests
* leftover
* Update .github/workflows/tests.yml
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_base.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Stefano Fiorucci
1a60e21137
refactor: simplify Summarizer, add Document Merger ( #3452 )
* remove generate_single_summary
* update schemas
* remove unused import
* fix mypy
* fix mypy
* test: summarizer doesn't change content
* other test correction
* move test_summarizer_translation to test_extractor_translation
* fix test
* first try for doc merger
* reintroduce and deprecate generate_single_summary
* progress in document merger
* document merger!
* mypy, pylint fixes
* use generator
* added test that will fail in 1.12
* adapt to review
* extended deprecation docstring
* Update test/nodes/test_extractor_translation.py
* Update test/nodes/test_summarizer.py
* Update test/nodes/test_summarizer.py
* black
* documents fixture
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-11-03 16:04:53 +01:00
Massimiliano Pippi
0a04dec808
Use a dedicated PAT ( #3511 )
* Use a dedicated PAT
* Update project.yml
2022-11-03 12:58:01 +01:00
bogdankostic
fc551b90ac
docs: Fix link directing to tutorials in README.md ( #3516 )
2022-11-02 18:11:03 +01:00
Sara Zan
f0be78c6a6
bug: remove useless import in conftest.py ( #3362 )
* Remove useless milvus import in conftest
* schemas
* schemas
2022-11-02 19:22:24 +05:30
Stefano Fiorucci
4b0894f4c2
fix: support long texts for labels in ElasticsearchDocumentStore ( #3346 )
2022-11-02 11:16:36 +01:00
Sara Zan
b93bbb1cab
refactor: upgrade actions version ( #3506 )
* upgrade actions version
* upgrade cache action too
2022-11-02 10:35:10 +01:00
Sara Zan
bb1d9983b0
refactor: remove YAML save/load methods for subclasses of BaseStandardPipeline ( #3443 )
* remove methods & update docstring
* remove irrelevant test
2022-11-02 10:14:33 +01:00
Branden Chan
0b2e71daf6
feat: Create the TextIndexingPipeline ( #3473 )
* Add TextIndexingPipeline
* Run Black formatting
* Incorporate reviewer feedback
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-01 10:52:37 +01:00
bogdankostic
60224412bc
feat: Add headline extraction to ParsrConverter ( #3488 )
* Add headline extraction to ParsrConverter
* Add sample PDF file
* Add test
* Use extract_headlines if set in convert method
* Integrate PR feedback
2022-10-31 19:00:02 +01:00