Sebastian
25bf95d47f
Update table reader tests to include checking the score of answers. ( #3641 )
2022-12-07 07:30:49 -08:00
Sara Zan
fc89f6ea74
fix: revert Weaviate query with filters and improve tests ( #3646 )
...
* revert weaviate query with filters and improve tests
* pylint
* upgrade weaviate container
* use latest docker tag
* fix text
* fix text
2022-12-06 14:48:58 +01:00
Vladimir Blagojevic
e4c3817d01
Adjust get_type() method for pipelines ( #3657 )
2022-12-02 14:48:47 +01:00
Julian Risch
adb580b6b7
feat: add offsets_in_context to evaluation result ( #3640 )
...
* add offsets_in_context to eval result
* extend test case
2022-11-30 11:43:42 +01:00
Massimiliano Pippi
b20f808119
refactor: move more tests to the base class ( #3637 )
...
* move more tests to the base class
* skip tests where unsupported
* do not pass index label explicitly
* skip test for Pinecone
2022-11-29 08:43:27 +01:00
Mayank Jobanputra
95cf666a20
refactor: change MultiModal retriever to be of type DenseRetriever ( #3598 )
...
* changed Multimodal retriever to be of type DenseRetriever
* format fix
* Pylint fix
* Added embed_queries and tests
2022-11-28 19:24:22 +01:00
Massimiliano Pippi
6f9a0f2215
use 9200 as the default port in launch_opensearch ( #3630 )
2022-11-28 19:06:45 +05:30
Sara Zan
eb7b9452d0
refactor: Weaviate query with filters ( #3628 )
2022-11-28 12:26:33 +01:00
Massimiliano Pippi
c6890c3e86
chore: remove redundant tests ( #3620 )
...
* remove redundant tests
* skip test on win
* fix missing import
* revert mistake
* revert
2022-11-25 20:55:21 +05:30
Massimiliano Pippi
a15af7f8c3
refactor: Move InMemoryDocumentStore tests to their own class ( #3614 )
...
* move tests to their own class
* move more tests
* add specific job
* fix test
* Update test/document_stores/test_memory.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
Stefano Fiorucci
f43bc562d3
refactor: replace torch.no_grad with torch.inference_mode (where possible) ( #3601 )
...
* try to replace torch.no_grad
* revert erroneous change
* revert other module breaking
* revert training/base
2022-11-23 09:26:11 +01:00
Stefano Fiorucci
3040e59c63
feat: add support for BM25Retriever in InMemoryDocumentStore ( #3561 )
...
* very first draft
* implement query and query_batch
* add more bm25 parameters
* add rank_bm25 dependency
* fix mypy
* remove tokenizer callable parameter
* remove unused import
* only json serializable attributes
* try to fix: pylint too-many-public-methods / R0904
* bm25 attribute always present
* convert errors into warnings to make the tutorial 1 work
* add docstrings; tests
* try to make tests run
* better docstrings; revert not running tests
* some suggestions from review
* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25
* exclude tests with filters
* change elasticsearch to bm25 retriever in test_summarizer
* add tests
* try to improve tests
* better type hint
* adapt test_table_text_retriever_embedding
* handle non-textual docs
* query only textual documents
2022-11-22 09:24:52 +01:00
tstadel
0d45cbce56
convert eval metrics to python float ( #3612 )
2022-11-22 09:05:10 +01:00
Espoir Murhabazi
d114a994f1
refactor: update Squad data ( #3513 )
...
* refactor the to_squad data class
* fix the validation label
* refactor the to_squad data class
* fix the validation label
* add the test for the to_label object function
* fix the tests for to_label_objects
* move all the test related to squad data to one file
* remove unused imports
* revert tiny_augmented.json
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-21 11:06:14 +01:00
Massimiliano Pippi
ea75e2aab5
feat: store metadata using JSON in SQLDocumentStore ( #3547 )
...
* add warnings
* make the field cachable
* review comment
2022-11-18 08:26:19 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module ( #3596 )
2022-11-17 16:22:02 +01:00
Massimiliano Pippi
6cd0e337d0
refactor: Generate JSON schema when missing ( #3533 )
...
* removed unused script
* print info logs when generating openapi schema
* create json schema only when needed
* fix tests
* Remove leftover
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-11-17 11:09:27 +01:00
Julian Risch
8052632b64
test: add test to check id_hash_keys is not ignored ( #3577 )
2022-11-17 09:25:02 +01:00
Stefano Fiorucci
dc26e6d43e
fix: Flatten DocumentClassifier output in SQLDocumentStore; remove _sql_session_rollback hack in tests ( #3273 )
...
* first draft
* fix
* fix
* move test to test_sql
2022-11-16 12:20:57 +01:00
Massimiliano Pippi
ba75d39029
fix: discard metadata fields if not set in Weaviate ( #3578 )
...
* fix weaviate bug in returning embeddings and setting empty meta fields
* review comment
2022-11-15 22:02:53 +01:00
tstadel
6ce2d296f4
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta ( #3572 )
...
* fix meta bug
* adjust brownfield test
2022-11-15 12:13:21 +01:00
Stefano Fiorucci
9de56b0283
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" ( #3548 )
...
* add_all fixes the bug
* improved test
2022-11-15 10:04:04 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x ( #3552 )
...
* remove milvus1
* leftover
* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests ( #3555 )
...
* add pytest option to unmock pinecone
* first try
* handle missing answer
* fix labels metadata
* more tests
* adapt workflow
* typo
* address review comments
2022-11-14 15:19:15 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests ( #3541 )
...
* refactor tests
* fix job
* revert
* revert
* revert
* use latest weaviate
* fix abstract methods signatures
* pass class_name to all the CRUD methods
* finish moving all the tests
* bump weaviate version
* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests ( #3537 )
...
* fix write docs behaviour
* refactor FAISS tests
* do not remove the sqlite db
* try
* remove extra slash
* Apply suggestions from code review
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_faiss.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Sara Zan
43b24fd1a7
fix: strip whitespaces safely from FARMReader's answers ( #3526 )
...
* remove .strip()
* check for right-side offset
* return the whitespace-cleaned answer
* lstrip, not rstrip :D
* remove int
* left_offset
* slightly refactor reader fixture
* extend test_output
2022-11-08 09:26:47 +01:00
Mayank Jobanputra
794fe5ffa4
bug: didn't clean up model files after running pytest for test_table_text_retriever_training ( #3534 )
...
* Added tmp path to avoid clean up of model files later
2022-11-07 15:07:04 +05:30
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job ( #3529 )
...
* move dC tests to their own module and job
* restore global var
* revert
2022-11-04 17:05:10 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests ( #3517 )
...
* port SQL tests
* cleanup document_store_tests.py from sql tests
* leftover
* Update .github/workflows/tests.yml
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_base.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Stefano Fiorucci
1a60e21137
refactor: simplify Summarizer, add Document Merger ( #3452 )
...
* remove generate_single_summary
* update schemas
* remove unused import
* fix mypy
* fix mypy
* test: summarizer doesn't change content
* other test correction
* move test_summarizer_translation to test_extractor_translation
* fix test
* first try for doc merger
* reintroduce and deprecate generate_single_summary
* progress in document merger
* document merger!
* mypy, pylint fixes
* use generator
* added test that will fail in 1.12
* adapt to review
* extended deprecation docstring
* Update test/nodes/test_extractor_translation.py
* Update test/nodes/test_summarizer.py
* Update test/nodes/test_summarizer.py
* black
* documents fixture
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-11-03 16:04:53 +01:00
Sara Zan
f0be78c6a6
bug: remove useless import in conftest.py ( #3362 )
...
* Remove useless milvus import in conftest
* schemas
* schemas
2022-11-02 19:22:24 +05:30
Stefano Fiorucci
4b0894f4c2
fix: support long texts for labels in ElasticsearchDocumentStore ( #3346 )
2022-11-02 11:16:36 +01:00
Sara Zan
bb1d9983b0
refactor: remove YAML save/load methods for subclasses of BaseStandardPipeline ( #3443 )
...
* remove methods & update docstring
* remove irrelevant test
2022-11-02 10:14:33 +01:00
bogdankostic
60224412bc
feat: Add headline extraction to ParsrConverter ( #3488 )
...
* Add headline extraction to ParsrConverter
* Add sample PDF file
* Add test
* Use extract_headlines if set in convert method
* Integrate PR feedback
2022-10-31 19:00:02 +01:00
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring ( #3449 )
...
* add new marker
* start using test hierarchies
* move ES tests into their own class
* refactor test workflow
* job steps
* add more tests
* move more tests
* more tests
* test labels
* add more tests
* Update tests.yml
* Update tests.yml
* fix
* typo
* fix es image tag
* map es ports
* try
* fix
* default port
* remove opensearch from the markers sorcery
* revert
* skip new tests in old jobs
* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Sebastian
384663981d
Fixed bug in onnx converter for XLMRoberta architecture ( #3470 )
2022-10-28 15:35:53 +02:00
Sebastian
8db7dfb884
refactor: TableReader ( #3456 )
...
* Refactoring table reader
2022-10-26 20:57:28 +02:00
Sebastian
59857cb492
feat: Speed up reader tests ( #3476 )
...
* Use a smaller reader where possible
* Change scope to module of reader to get faster load times
2022-10-26 19:04:18 +02:00
Sara Zan
05c68b6624
feat: add document_store to all BaseRetriever.retrieve() and BaseRetriever.retrieve_batch() implementations ( #3379 )
...
* add document_store to retrieve()
* mypy & pylint
* pass docstore to embedding encoders
* schemas
* mypy and pylint
* fix tfidfretriever
* pylint
* mypy
* pylint
* fix tfidf
* mypy
* pylint
* schemas
* another fix for tfidf
* fix question generation tests
* remove docstore from embedding encoder signature
* pylint
* revert accidental test changes
* Apply suggestions from code review
* check for docstore similarity function only if the docstore is present
* check for docstore similarity function only if the docstore is present
2022-10-26 15:47:06 +02:00
Julian Risch
d0691a4bd5
bug: replace decorator with counter attribute for pipeline event ( #3462 )
2022-10-26 12:09:04 +02:00
bogdankostic
4fbe80c098
feat: Extraction of headlines in markdown files ( #3445 )
...
* Extract headings from markdown files + adapt PreProcessor
* Add tests
* Fix mypy
* Generate JSON schema
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/nodes/file_converter/markdown.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply black
* Add PR feedback
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-10-26 11:57:55 +02:00
Vladimir Blagojevic
5ca96357ff
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever ( #3453 )
2022-10-25 17:52:29 +02:00
Stefano Fiorucci
54ec13eaf7
refactor: Change no_answer attribute ( #3411 )
...
* always run validation
* update schemas
* no_answer as a property. break things!
* forgotten schema
* fix
* update openapi
* removed my unnecessary test
* fix sql document store
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-10-25 13:07:00 +02:00
Mayank Jobanputra
d48577b4e7
bug: removed duplicated meta "name" field addition to content before embedding in update_embeddings workflow ( #3368 )
...
* Removed explicit passage formatting by name field
* passing correct input type for embedding the docs
* Updated test, updated similarity scores and added results
* changed expected input to embed method
2022-10-25 14:52:05 +05:30
Sara Zan
cbf44413d8
feat: add __contains__ to Span ( #3446 )
...
* add __contains__
* add tests
2022-10-21 13:58:17 +02:00
Vladimir Blagojevic
8f31228211
feat: Add exponential backoff decorator; apply it to OpenAI requests ( #3398 )
2022-10-19 17:47:38 +02:00
Ursin Brunner
5fedfb03b0
fix: Fix the error of wrong page numbers when documents contain empty pages. ( #3330 )
...
* Fix the error of wrong page numbers when documents contain empty pages.
* Reformat using git hooks.
* Use a more descriptive placeholder
2022-10-18 17:51:02 +02:00
Sebastian
93817f63b4
feat: Speed up integration tests (nodes) ( #3408 )
...
* Changed summarizer model to a smaller one (2GB to 500MB) to save on space and speed up the tests.
* Removed google pegasus from cache
2022-10-18 16:23:57 +02:00
Sebastian
15a59fd040
feat: Updated EntityExtractor to handle long texts and added better postprocessing ( #3154 )
...
* Remove dependence on HuggingFace TokenClassificationPipeline and group all postprocessing functions under one class
* Added copyright notice for HF and deepset to entity file to acknowledge that a lot of the postprocessing parts came from the transformers library.
* Fixed text squishing problem. Added additional unit test for it.
Co-authored-by: ju-gu <julian.gutsch@deepset.ai>
2022-10-17 21:26:44 +02:00