Julian Risch
efdcd24d70
fixed typo ( #1680 )
...
* fixed typo
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-01 10:38:39 +01:00
bogdankostic
9025615be7
Add TableQA tutorial ( #1670 )
...
* Add TableQA tutorial
* Add tutorial header
* Add latest docstring and tutorial changes
* Add more details
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-29 11:07:13 +02:00
Julian Risch
33b2663fdc
ensure tf-idf matrix calculation before retrieval ( #1665 )
...
* ensure tf-idf matrix calculation before retrieval
* Run fit() automatically if new documents have been added
* Add latest docstring and tutorial changes
* Fix type error
* Add test case for tfidf retriever yaml pipeline
* Use InMemoryDocStore and add 2nd test case
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-28 16:48:06 +02:00
Sara Zan
eab475bb5d
Rename every occurrence of 'embed_passages' with 'embed_documents' ( #1667 )
...
* Rename every occurrence of 'embed_passages' with 'embed_documents'
* Remove aliased method embed_documents
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-28 12:17:56 +02:00
bogdankostic
0c80ac9e62
Truncate too large tables for TableReader ( #1662 )
...
* Truncate too large tables for TableReader
* Add documentation
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-27 15:46:59 +02:00
Timo Moeller
1d3f63ac2e
Allow setting of scroll param in ElasticsearchDocumentStore ( #1645 )
...
* remove scroll param in ES call
* Add scroll param to ES init
* Add latest docstring and tutorial changes
* Add scroll to set_config
* remove trailing comma
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-27 11:07:13 +02:00
Branden Chan
9b2f40100d
Replace Haystack banner for readme ( #1654 )
...
* Replace haystack banner for readme
* Replace haystack banner for readme
* Update README.md
* Crop image
* Update README.md
revert to image from master branch
2021-10-26 17:59:45 +02:00
Sara Zan
13510aa753
Refactoring of the haystack package ( #1624 )
...
* Files moved, imports all broken
* Fix most imports and docstrings into
* Fix the paths to the modules in the API docs
* Add latest docstring and tutorial changes
* Add a few pipelines that were lost in the imports
* Fix a bunch of mypy warnings
* Add latest docstring and tutorial changes
* Create a file_classifier module
* Add docs for file_classifier
* Fixed most circular imports, now the REST API can start
* Add latest docstring and tutorial changes
* Tackling more mypy issues
* Reintroduce from FARM and fix last mypy issues hopefully
* Re-enable old-style imports
* Fix some more import from the top-level package in an attempt to sort out circular imports
* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests
* Change document_store into document_stores
* Update imports in tutorials
* Add latest docstring and tutorial changes
* Probably fixes summarizer tests
* Improve the old-style import allowing module imports (should work)
* Try to fix the docs
* Remove dedicated KnowledgeGraph page from autodocs
* Remove dedicated GraphRetriever page from autodocs
* Fix generate_docstrings.sh with an updated list of yaml files to look for
* Fix some more modules in the docs
* Fix the document stores docs too
* Fix a small issue on Tutorial14
* Add latest docstring and tutorial changes
* Add deprecation warning to old-style imports
* Remove stray folder and import Dict into dense.py
* Change import path for MLFlowLogger
* Add old loggers path to the import path aliases
* Fix debug output of convert_ipynb.py
* Fix circular import on BaseRetriever
* Missed one merge block
* re-run tutorial 5
* Fix imports in tutorial 5
* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base
* Add latest docstring and tutorial changes
* Fix typo in utils __init__
* Fix a few more imports
* Fix benchmarks too
* New-style imports in test_knowledge_graph
* Rollback setup.py
* Rollback squad_to_dpr too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
bogdankostic
51acf779f2
Add TableTextRetriever ( #1529 )
...
* first draft / notes on new primitives
* wip label / feedback refactor
* rename doc.text -> doc.content. add doc.content_type
* add datatype for content
* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field
* update converters for . Add warning for empty
* rename label.question -> label.query. Allow sorting of Answers.
* WIP primitives
* update ui/reader for new Answer format
* Improve Label. First refactoring of MultiLabel. Adjust eval code
* fixed workflow conflict with introducing new one (#1472 )
* Add latest docstring and tutorial changes
* make add_eval_data() work again
* fix reader formats. WIP fix _extract_docs_and_labels_from_dict
* fix test reader
* Add latest docstring and tutorial changes
* fix another test case for reader
* fix mypy in farm reader.eval()
* fix mypy in farm reader.eval()
* WIP ORM refactor
* Add latest docstring and tutorial changes
* fix mypy weaviate
* make label and multilabel dataclasses
* bump mypy env in CI to python 3.8
* WIP refactor Label ORM
* WIP refactor Label ORM
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* WIP refactor ORMs
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* add dataclasses-json dependency. adjust ORM to just store JSON repr
* ignore type as dataclasses_json seems to miss functionality here
* update readme and contributing.md
* update contributing
* adjust example
* fix duplicate doc handling for custom index
* Add latest docstring and tutorial changes
* fix some ORM issues. fix get_all_labels_aggregated.
* update drop flags where get_all_labels_aggregated() was used before
* Add latest docstring and tutorial changes
* add to_json(). add + fix tests
* fix no_answer handling in label / multilabel
* fix duplicate docs in memory doc store. change primary key for sql doc table
* fix mypy issues
* fix mypy issues
* haystack/retriever/base.py
* fix test_write_document_meta[elastic]
* fix test_elasticsearch_custom_fields
* fix test_labels[elastic]
* fix crawler
* fix converter
* fix docx converter
* fix preprocessor
* fix test_utils
* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations
* Add latest docstring and tutorial changes
* fix crawler test. fix ocrconverter attribute
* fix test_elasticsearch_custom_query
* fix generator pipeline
* fix ocr converter
* fix ragenerator
* Add latest docstring and tutorial changes
* fix test_load_and_save_yaml for elasticsearch
* fixes for pipeline tests
* fix faq pipeline
* fix pipeline tests
* Add latest docstring and tutorial changes
* Add MultimodalRetriever
* Add latest docstring and tutorial changes
* fix weaviate
* Add latest docstring and tutorial changes
* trigger CI
* satisfy mypy
* Add latest docstring and tutorial changes
* satisfy mypy
* Add latest docstring and tutorial changes
* trigger CI
* fix question generation test
* fix ray. fix Q-generation
* fix translator test
* satisfy mypy
* wip refactor feedback rest api
* fix rest api feedback endpoint
* fix doc classifier
* remove relation of Labels -> Docs in SQL ORM
* fix faiss/milvus tests
* fix doc classifier test
* fix eval test
* fixing eval issues
* Add latest docstring and tutorial changes
* fix mypy
* WIP replace dataclasses-json with manual serialization
* Add methods to MultimodalRetriever
* Add latest docstring and tutorial changes
* revert to dataclass-json serialization for now. remove debug prints.
* update docstrings
* fix extractor. fix Answer Span init
* fix api test
* keep meta data of answers in reader.run()
* fix meta handling
* address review feedback
* Add latest docstring and tutorial changes
* make document=None for open domain labels
* add import
* fix print utils
* fix rest api
* Add methods and tests
* Add latest docstring and tutorial changes
* Fix mypy
* Add latest docstring and tutorial changes
* Add type hints and doc strings
* Make use of initialize_device_settings
* Move serialization of pd.DataFrame to schema.py
* Fix mypy
* Adapt Document's from_dict method
* Update docstrings
* Add latest docstring and tutorial changes
* Fix mypy
* Fix mypy
* Fix Document's from_dict method
* Fix Document's to_dict method
* Change handling of table metadata
* Add latest docstring and tutorial changes
* Change naming from Multimodal to TableText
* Turn off tokenizers_parallelism in retriever tests
* Add latest docstring and tutorial changes
* Remove turning off tokenizers_parallelism in retriever tests
* Adapt convert_es_hit_to_document
* Change embed_surrounding_context to embed_meta_fields
* Add latest docstring and tutorial changes
* Add check if torch.distributed is available
* Set n_gpu to 0 in training test
* Set HIP_LAUNCH_BLOCKING to 1
* Set HIP_LAUNCH_BLOCKING to "1"
* Set use_gpu to False
* Use DataParallel only if more than one device
* Remove --find-links=https://download.pytorch.org/whl/torch_stable.html
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 12:27:02 +02:00
Julian Risch
6033319cfe
Fix parameter names in tutorial 5 and 12 ( #1639 )
...
* Fix parameter names in tutorial 5
* Update parameters in tutorial notebook
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-22 17:22:51 +02:00
Timo Moeller
9dc125df9d
Bugfix Tutorial 5 parameters, adjust default split length ( #1635 )
...
Bugfix parameters, adjust default split length, add sentencetransformers
2021-10-22 16:03:12 +02:00
Julian Risch
52e1fc991e
Update jobs link to personio ( #1611 )
...
* Update jobs link to personio
* Add latest docstring and tutorial changes
* Change jobs link to main website
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-21 11:42:32 +02:00
Julian Risch
f2a3f95ab6
add note on gpu runtime to tutorial 13 ( #1614 )
...
* add note on gpu runtime to tutorial 13
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-20 09:55:56 +02:00
Julian Risch
4ed2b90bca
Add delete_labels() except for weaviate doc store ( #1604 )
...
* Add delete_labels() except for weaviate doc store
* Add latest docstring and tutorial changes
* Add test for delete_labels()
* Adapt filter for label deletion to different doc stores in test
* Allow delete labels by _id in elasticsearch
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* re-add bugfix after merge
* Add ids as optional parameter
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-19 17:20:28 +02:00
Sara Zan
9722bbf1e1
DPR training: Rename TransformersAdamW to AdamW ( #1613 )
...
* Rename TransformersAdamW into simply AdamW (probably changed in transformers at some point)
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-19 16:18:30 +02:00
Sara Zan
575e64333c
Delete documents by ID in all document stores ( #1606 )
...
* Modify BaseDocumentStore.delete_documents() signature, implement ElasticSearch, and add tests
* Add implementation for InMemory
* Implement for SQL, FAISS and Milvus too
* Add tests for faiss and milvus
* Fix delete_all_documents
* Implement deletion by ID for weaviate
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>
Co-authored-by: prafgup <prafulgupta6@gmail.com>
Co-authored-by: ankh6 <andynzemokalumu@live.be>
2021-10-19 12:30:15 +02:00
Malte Pietsch
eb95f0e8aa
Add more flexible options for model downloads (Proxies, resume_download, local_files_only...) ( #1256 )
...
* allow passing more options for model/tokenizer download from remote
* temporarily change dependency to current farm master
* Add latest docstring and tutorial changes
* add kwargs
* add docstrings
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-18 15:47:36 +02:00
bogdankostic
655d721371
Add Table Reader ( #1446 )
...
* first draft / notes on new primitives
* wip label / feedback refactor
* rename doc.text -> doc.content. add doc.content_type
* add datatype for content
* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field
* update converters for . Add warning for empty
* Add first draft of TableReader
* rename label.question -> label.query. Allow sorting of Answers.
* Add calculation of answer scores
* WIP primitives
* Adapt input and output to new primitives
* Add doc strings
* Add tests
* update ui/reader for new Answer format
* Improve Label. First refactoring of MultiLabel. Adjust eval code
* fixed workflow conflict with introducing new one (#1472 )
* Add latest docstring and tutorial changes
* make add_eval_data() work again
* fix reader formats. WIP fix _extract_docs_and_labels_from_dict
* fix test reader
* Add latest docstring and tutorial changes
* fix another test case for reader
* fix mypy in farm reader.eval()
* fix mypy in farm reader.eval()
* WIP ORM refactor
* Add latest docstring and tutorial changes
* fix mypy weaviate
* make label and multilabel dataclasses
* bump mypy env in CI to python 3.8
* WIP refactor Label ORM
* WIP refactor Label ORM
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* WIP refactor ORMs
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* add dataclasses-json dependency. adjust ORM to just store JSON repr
* ignore type as dataclasses_json seems to miss functionality here
* update readme and contributing.md
* update contributing
* adjust example
* fix duplicate doc handling for custom index
* Add latest docstring and tutorial changes
* fix some ORM issues. fix get_all_labels_aggregated.
* update drop flags where get_all_labels_aggregated() was used before
* Add latest docstring and tutorial changes
* add to_json(). add + fix tests
* fix no_answer handling in label / multilabel
* fix duplicate docs in memory doc store. change primary key for sql doc table
* fix mypy issues
* fix mypy issues
* haystack/retriever/base.py
* fix test_write_document_meta[elastic]
* fix test_elasticsearch_custom_fields
* fix test_labels[elastic]
* fix crawler
* fix converter
* fix docx converter
* fix preprocessor
* fix test_utils
* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations
* Add latest docstring and tutorial changes
* fix crawler test. fix ocrconverter attribute
* fix test_elasticsearch_custom_query
* fix generator pipeline
* fix ocr converter
* fix ragenerator
* Add latest docstring and tutorial changes
* fix test_load_and_save_yaml for elasticsearch
* fixes for pipeline tests
* fix faq pipeline
* fix pipeline tests
* Add latest docstring and tutorial changes
* fix weaviate
* Add latest docstring and tutorial changes
* trigger CI
* satisfy mypy
* Add latest docstring and tutorial changes
* satisfy mypy
* Add latest docstring and tutorial changes
* trigger CI
* fix question generation test
* fix ray. fix Q-generation
* fix translator test
* satisfy mypy
* wip refactor feedback rest api
* fix rest api feedback endpoint
* fix doc classifier
* remove relation of Labels -> Docs in SQL ORM
* fix faiss/milvus tests
* fix doc classifier test
* fix eval test
* fixing eval issues
* Add latest docstring and tutorial changes
* fix mypy
* WIP replace dataclasses-json with manual serialization
* Add latest docstring and tutorial changes
* revert to dataclass-json serialization for now. remove debug prints.
* update docstrings
* fix extractor. fix Answer Span init
* fix api test
* Adapt answer format
* Add latest docstring and tutorial changes
* keep meta data of answers in reader.run()
* Fix mypy
* fix meta handling
* address review feedback
* Add latest docstring and tutorial changes
* Allow inference on GPU
* Remove automatic aggregation
* Add automatic aggregation
* Add latest docstring and tutorial changes
* Add torch-scatter dependency
* Add wheel to torch-scatter dependency
* Fix requirements
* Fix requirements
* Fix requirements
* Adapt setup.py to allow for wheels
* Fix requirements
* Fix requirements
* Add type hints and code snippet
* Add latest docstring and tutorial changes
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-15 16:34:48 +02:00
Julian Risch
5ec29a5283
Limit generator tests to memory doc store; split pipeline tests ( #1602 )
...
* Limit generator tests to memory doc store; split pipeline tests
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-15 15:37:46 +02:00
Malte Pietsch
99c8046367
Fix Tutorials ( #1594 )
...
* fix response format of DocumentSearchPipeline
* Add latest docstring and tutorial changes
* fix typos
* change prints in tutorial 4
* Add latest docstring and tutorial changes
* fix tutorial 13
* Add latest docstring and tutorial changes
* remove unused import
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-14 11:49:35 +02:00
Malte Pietsch
caba590576
Fix answer format in ui ( #1591 )
...
* fix answer format in ui
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 16:48:33 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label ( #1398 )
...
* first draft / notes on new primitives
* wip label / feedback refactor
* rename doc.text -> doc.content. add doc.content_type
* add datatype for content
* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field
* update converters for . Add warning for empty
* rename label.question -> label.query. Allow sorting of Answers.
* WIP primitives
* update ui/reader for new Answer format
* Improve Label. First refactoring of MultiLabel. Adjust eval code
* fixed workflow conflict with introducing new one (#1472 )
* Add latest docstring and tutorial changes
* make add_eval_data() work again
* fix reader formats. WIP fix _extract_docs_and_labels_from_dict
* fix test reader
* Add latest docstring and tutorial changes
* fix another test case for reader
* fix mypy in farm reader.eval()
* fix mypy in farm reader.eval()
* WIP ORM refactor
* Add latest docstring and tutorial changes
* fix mypy weaviate
* make label and multilabel dataclasses
* bump mypy env in CI to python 3.8
* WIP refactor Label ORM
* WIP refactor Label ORM
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* WIP refactor ORMs
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* add dataclasses-json dependency. adjust ORM to just store JSON repr
* ignore type as dataclasses_json seems to miss functionality here
* update readme and contributing.md
* update contributing
* adjust example
* fix duplicate doc handling for custom index
* Add latest docstring and tutorial changes
* fix some ORM issues. fix get_all_labels_aggregated.
* update drop flags where get_all_labels_aggregated() was used before
* Add latest docstring and tutorial changes
* add to_json(). add + fix tests
* fix no_answer handling in label / multilabel
* fix duplicate docs in memory doc store. change primary key for sql doc table
* fix mypy issues
* fix mypy issues
* haystack/retriever/base.py
* fix test_write_document_meta[elastic]
* fix test_elasticsearch_custom_fields
* fix test_labels[elastic]
* fix crawler
* fix converter
* fix docx converter
* fix preprocessor
* fix test_utils
* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations
* Add latest docstring and tutorial changes
* fix crawler test. fix ocrconverter attribute
* fix test_elasticsearch_custom_query
* fix generator pipeline
* fix ocr converter
* fix ragenerator
* Add latest docstring and tutorial changes
* fix test_load_and_save_yaml for elasticsearch
* fixes for pipeline tests
* fix faq pipeline
* fix pipeline tests
* Add latest docstring and tutorial changes
* fix weaviate
* Add latest docstring and tutorial changes
* trigger CI
* satisfy mypy
* Add latest docstring and tutorial changes
* satisfy mypy
* Add latest docstring and tutorial changes
* trigger CI
* fix question generation test
* fix ray. fix Q-generation
* fix translator test
* satisfy mypy
* wip refactor feedback rest api
* fix rest api feedback endpoint
* fix doc classifier
* remove relation of Labels -> Docs in SQL ORM
* fix faiss/milvus tests
* fix doc classifier test
* fix eval test
* fixing eval issues
* Add latest docstring and tutorial changes
* fix mypy
* WIP replace dataclasses-json with manual serialization
* Add latest docstring and tutorial changes
* revert to dataclass-json serialization for now. remove debug prints.
* update docstrings
* fix extractor. fix Answer Span init
* fix api test
* keep meta data of answers in reader.run()
* fix meta handling
* address review feedback
* Add latest docstring and tutorial changes
* make document=None for open domain labels
* add import
* fix print utils
* fix rest api
* address review feedback
* Add latest docstring and tutorial changes
* fix mypy
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Malte Pietsch
9650f7aed1
Add debug and debug_logs params to standard pipelines ( #1586 )
...
* add debug and debug_logs to standard pipelines
* Add latest docstring and tutorial changes
* fix params
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 16:00:48 +02:00
Sara Zan
6354528336
Add /documents/get_by_filters endpoint ( #1580 )
...
* Add endpoint to get documents by filter
* Add test for /documents/get_by_filter and extend the delete documents test
* Add rest_api/file-upload to .gitignore
* Make sure the document store is empty for each test
* Improve docstrings of delete_documents_by_filters and get_documents_by_filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 10:53:54 +02:00
Malte Pietsch
38652dd4dd
Enable GPU usage for QuestionGenerator ( #1571 )
...
* enable GPU usage for question generator
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-08 12:17:48 +02:00
Sara Zan
54947cb840
Return intermediate nodes output in pipelines ( #1558 )
...
* First rough implementation
* Add a flag to dump the debug logs to the console as well
* Typing run() and _dispatch_run()
* Allow debug and debug_logs to be passed as arguments of run()
* Avoid overwriting _debug, later we might want to store other objects in it
* Put logs under a separate key of the _debug dictionary and add input and output of the node alongside it
* Introduce global arguments for pipeline.run() that get applied to every node when defined
* Change default values of debug variables to None, otherwise their default would override the params values
* Remove a potential infinite recursion on the overridden __getattr__
* Do not append the output of the last node in the _debug key, it causes infinite recursion
* Add tests
* Move the input/output collection into _dispatch_run to gather only relevant info
* Add partial Pipeline.run() docstring
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-07 22:13:25 +02:00
Julian Risch
7e063b77d2
Format doc classifier usage example ( #1550 )
...
* Format doc classifier usage example
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-01 15:01:19 +02:00
Julian Risch
24483d7bad
TransformersDocumentClassifier replacing FARMClassifier ( #1540 )
...
* Initial draft of TransformersClassifier
* Add transformers classifier implementation
* Add test for SentenceTransformersClassifier
* Add truncation and corresponding test case to Classifier
* Add zero-shot classification and test
* Add document classifier documentation
* Add latest docstring and tutorial changes
* print meta data with print_documents()
* Add latest docstring and tutorial changes
* Remove top_k param from Classifier usage example
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-01 11:22:56 +02:00
Julian Risch
0e7338f0c6
Remove mentions of FARM from Ranker comments ( #1535 )
...
* Remove mentions of FARM from Ranker comments
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-29 11:57:30 +02:00
Sara Zan
a30a826c6c
Standardize delete_documents(filter=...) across all document stores ( #1509 )
...
* Make InMemoryDocumentStore accept and apply filters in delete_documents()
* Modify test_document_store.py to test the filtered deletion in memory, sql and milvus too
* Make FAISSDocumentStore accept and properly apply filters in delete_documents()
* Add latest docstring and tutorial changes
* Remove accidentally duplicated test
* Remove unnecessary decorators from test/test_document_store.py::test_delete_documents_with_filters
* Add embeddings count test for FAISS and Milvus; Milvus fails it.
* Fixed a bug that prevented Milvus from deleting embeddings
* Remove batch size parametrization in tests & update all documentstore's docstrings with a filter example
* Add latest docstring and tutorial changes
Co-authored-by: prafgup <prafulgupta6@gmail.com>
2021-09-29 09:27:06 +02:00
Julian Risch
f9d2f786ca
Replace FARM import statements; add dependencies ( #1492 )
...
* Replace FARM import statements; add dependencies
* Add InferenceProc., TextCl.Proc., TextPairCl.Proc.
* Remove FARMRanker, add type annotations, rename max_sample
* Add sample_to_features_text for InferenceProc.
* Fix type annotations: model_name_or_path is str not Path
* Fix mypy errors: implement _create_dataset in TextCl.Proc.
* Add task_type "embeddings" in Inferencer
* Allow loading AdaptiveModel for embedding task
* Add SQuAD eval metrics; enable InferenceProc for embedding task
* Add baskets as param to log_samples and handle empty basket list in log_samples
* Remove unused dependencies
* Remove FARMClassifier (doc classifier) due to ref to TextClassificationHead
* Remove FARMRanker and Classifier from doc generation scripts
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-28 16:34:24 +02:00
Malte Pietsch
183fd5ae5a
Simplify tests & allow running on individual doc stores ( #1487 )
...
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* update readme and contributing.md
* update contributing
* adjust example
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-27 10:52:07 +02:00
bogdankostic
c644e2b4d0
Add comment to tutorial notebooks about restarting runtime in colab ( #1486 )
...
* Add comment to tutorial notebooks about restarting runtime in colab
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-23 14:36:20 +02:00
Julian Risch
d569e66bc7
Update Tutorial1_Basic_QA_Pipeline.ipynb ( #1489 )
...
* Update Tutorial1_Basic_QA_Pipeline.ipynb
passing params to pipeline as dict
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-22 16:35:20 +02:00
Branden Chan
bddee2def4
Define SAS model in notebook ( #1485 )
...
* Define SAS model in notebook
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 17:05:16 +02:00
Branden Chan
2c4baa7f4e
Regenerate API and Tutorial md files ( #1480 )
...
* Change punctuation
* Add latest docstring and tutorial changes
* Change punctuation
* Add documentation for Docs2Answer
* Add latest docstring and tutorial changes
* Generate new API docs
* Replace Finder with Pipeline
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 14:42:18 +02:00
Markus Paff
39845c0624
Automate updates docstrings tutorials ( #1461 )
...
* remove unneeded github actions and reactivate docstrings and tutorial generation
* test workflow
* update pydoc version
* update python version
* update watchdog
* move to latest version pydoc-markdown
* remove version check
* Add latest docstring and tutorial changes
* remove test workflow
* test for param docstrings
* pin pydoc-markdown version
* add test workflow
* pin watchdog version
* Add latest docstring and tutorial changes
* update original workflow and delete test
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-17 13:44:31 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components ( #1321 )
2021-09-10 11:41:16 +02:00
Bob van Luijt
c0cc8bc80f
Bump Weaviate version to 1.7.0 ( #1412 )
...
* Bump Weaviate
* Bump Weaviate
* Bump Weaviate client
* Bump Weaviate
* Revert client version
There is a change in the client API that needs to be addressed before bumping its version
2021-09-05 09:28:55 +02:00
Ikram Ali
3fc7f3f695
[docs] crawler api docs updated. ( #1388 )
2021-09-01 12:07:32 +02:00
Branden Chan
1938fb001b
Add support for no Docker envs in Tutorial 13 ( #1365 )
...
* Add support for no docker envs e.g. colab
* Generate md
2021-08-31 15:22:51 +02:00
Shahrukh Khan
c3d8aa0643
Add query classifier usage docs ( #1348 )
...
* Create query_classifier.md
* Update query_classifier.md
* Update query_classifier.md
* Update query_classifier.md
* Update query_classifier.md
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-24 15:56:11 +02:00
Markus Paff
cac15310bd
adding tutorial 13 and 14 ( #1364 )
2021-08-23 11:37:06 +02:00
Markus Paff
ff2049cd45
updated tutorials ( #1359 )
2021-08-19 21:16:56 +02:00
Bob van Luijt
ba071cc052
Bump Weaviate version ( #1336 )
2021-08-12 09:54:09 +02:00
Markus Paff
7569ab97dd
Add faq annotation ( #1333 )
...
* add annotation faq to README
* design fix
* add faq to docs page
* changed format
2021-08-10 14:55:31 +02:00
Malte Pietsch
a0921f0c35
Remove Finder ( #1326 )
...
* deprecate finder
* remove import
* add doc section for moving from finder to pipelines
2021-08-09 13:41:40 +02:00
Branden Chan
937247d628
Add QuestionGenerator ( #1267 )
...
* Create basic Question Generation
* Split texts into 50 word chunks
* Allow prompt to be changed
* Implement iteration functionality in DS
* Add docstrings, create pipelines
* Make pipelines work
* Add comments
* Add tests
* Add tutorials and docs
* Add doc string
2021-07-26 17:20:43 +02:00
Branden Chan
363be65a78
Implement OpenSearch ANN ( #1225 )
...
* Simplify ODES init
* Add arguments to ES init and create script
* Rename similarity_fn_name and add util fn
* Create OpenSearchDocumentStore
* Specify params of Open Search HNSW
* Add better argument handling
* Update opensearch index mapping
* Edit opensearch default port
* Fix HNSW mapping
* Force small HNSW params
* Implement auto start and stopping of document store services
* Fix starting and stopping of ds service
* Restore HNSW params
* Add opensearch query benchmarks
* Add write wait time
* Revert wait time
* Add timeout
* Update benchmarks
* Update benchmarks
* Update benchmarks json
* Update documentation
* Update documentation
* Fix similarity name
* Improve argument passing
* Improve stopping and starting of service
2021-07-26 10:52:52 +02:00
Bob van Luijt
8dae844447
Bump Weaviate version to 1.5 ( #1287 )
...
* bump Weaviate version to 1.5
* bump Weaviate version to 1.5
2021-07-15 08:26:22 +02:00