Julian Risch
4ed2b90bca
Add delete_labels() except for weaviate doc store ( #1604 )
...
* Add delete_labels() except for weaviate doc store
* Add latest docstring and tutorial changes
* Add test for delete_labels()
* Adapt filter for label deletion to different doc stores in test
* Allow delete labels by _id in elasticsearch
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* re-add bugfix after merge
* Add ids as optional parameter
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-19 17:20:28 +02:00
Sara Zan
9722bbf1e1
DPR training: Rename TransformersAdamW to AdamW ( #1613 )
...
* Rename TransformersAdamW into simply AdamW (probably changed in transformers at some point)
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-19 16:18:30 +02:00
Sara Zan
575e64333c
Delete documents by ID in all document stores ( #1606 )
...
* Modify BaseDocumentStore.delete_documents() signature, implement ElasticSearch, and add tests
* Add implementation for InMemory
* Implement for SQL, FAISS and Milvus too
* Add tests for faiss and milvus
* Fix delete_all_documents
* Implement deletion by ID for weaviate
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>
Co-authored-by: prafgup <prafulgupta6@gmail.com>
Co-authored-by: ankh6 <andynzemokalumu@live.be>
2021-10-19 12:30:15 +02:00
Malte Pietsch
eb95f0e8aa
Add more flexible options for model downloads (Proxies, resume_download, local_files_only...) ( #1256 )
...
* allow passing more options for model/tokenizer download from remote
* temporarily change dependency to current farm master
* Add latest docstring and tutorial changes
* add kwargs
* add docstrings
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-18 15:47:36 +02:00
bogdankostic
655d721371
Add Table Reader ( #1446 )
...
* first draft / notes on new primitives
* wip label / feedback refactor
* rename doc.text -> doc.content. add doc.content_type
* add datatype for content
* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field
* update converters for . Add warning for empty
* Add first draft of TableReader
* rename label.question -> label.query. Allow sorting of Answers.
* Add calculation of answer scores
* WIP primitives
* Adapt input and output to new primitives
* Add doc strings
* Add tests
* update ui/reader for new Answer format
* Improve Label. First refactoring of MultiLabel. Adjust eval code
* fixed workflow conflict with introducing new one (#1472 )
* Add latest docstring and tutorial changes
* make add_eval_data() work again
* fix reader formats. WIP fix _extract_docs_and_labels_from_dict
* fix test reader
* Add latest docstring and tutorial changes
* fix another test case for reader
* fix mypy in farm reader.eval()
* fix mypy in farm reader.eval()
* WIP ORM refactor
* Add latest docstring and tutorial changes
* fix mypy weaviate
* make label and multilabel dataclasses
* bump mypy env in CI to python 3.8
* WIP refactor Label ORM
* WIP refactor Label ORM
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* WIP refactor ORMs
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* add dataclasses-json dependency. adjust ORM to just store JSON repr
* ignore type as dataclasses_json seems to miss functionality here
* update readme and contributing.md
* update contributing
* adjust example
* fix duplicate doc handling for custom index
* Add latest docstring and tutorial changes
* fix some ORM issues. fix get_all_labels_aggregated.
* update drop flags where get_all_labels_aggregated() was used before
* Add latest docstring and tutorial changes
* add to_json(). add + fix tests
* fix no_answer handling in label / multilabel
* fix duplicate docs in memory doc store. change primary key for sql doc table
* fix mypy issues
* fix mypy issues
* haystack/retriever/base.py
* fix test_write_document_meta[elastic]
* fix test_elasticsearch_custom_fields
* fix test_labels[elastic]
* fix crawler
* fix converter
* fix docx converter
* fix preprocessor
* fix test_utils
* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations
* Add latest docstring and tutorial changes
* fix crawler test. fix ocrconverter attribute
* fix test_elasticsearch_custom_query
* fix generator pipeline
* fix ocr converter
* fix ragenerator
* Add latest docstring and tutorial changes
* fix test_load_and_save_yaml for elasticsearch
* fixes for pipeline tests
* fix faq pipeline
* fix pipeline tests
* Add latest docstring and tutorial changes
* fix weaviate
* Add latest docstring and tutorial changes
* trigger CI
* satisfy mypy
* Add latest docstring and tutorial changes
* satisfy mypy
* Add latest docstring and tutorial changes
* trigger CI
* fix question generation test
* fix ray. fix Q-generation
* fix translator test
* satisfy mypy
* wip refactor feedback rest api
* fix rest api feedback endpoint
* fix doc classifier
* remove relation of Labels -> Docs in SQL ORM
* fix faiss/milvus tests
* fix doc classifier test
* fix eval test
* fixing eval issues
* Add latest docstring and tutorial changes
* fix mypy
* WIP replace dataclasses-json with manual serialization
* Add latest docstring and tutorial changes
* revert to dataclass-json serialization for now. remove debug prints.
* update docstrings
* fix extractor. fix Answer Span init
* fix api test
* Adapt answer format
* Add latest docstring and tutorial changes
* keep meta data of answers in reader.run()
* Fix mypy
* fix meta handling
* address review feedback
* Add latest docstring and tutorial changes
* Allow inference on GPU
* Remove automatic aggregation
* Add automatic aggregation
* Add latest docstring and tutorial changes
* Add torch-scatter dependency
* Add wheel to torch-scatter dependency
* Fix requirements
* Fix requirements
* Fix requirements
* Adapt setup.py to allow for wheels
* Fix requirements
* Fix requirements
* Add type hints and code snippet
* Add latest docstring and tutorial changes
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-15 16:34:48 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label ( #1398 )
...
* first draft / notes on new primitives
* wip label / feedback refactor
* rename doc.text -> doc.content. add doc.content_type
* add datatype for content
* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field
* update converters for . Add warning for empty
* rename label.question -> label.query. Allow sorting of Answers.
* WIP primitives
* update ui/reader for new Answer format
* Improve Label. First refactoring of MultiLabel. Adjust eval code
* fixed workflow conflict with introducing new one (#1472 )
* Add latest docstring and tutorial changes
* make add_eval_data() work again
* fix reader formats. WIP fix _extract_docs_and_labels_from_dict
* fix test reader
* Add latest docstring and tutorial changes
* fix another test case for reader
* fix mypy in farm reader.eval()
* fix mypy in farm reader.eval()
* WIP ORM refactor
* Add latest docstring and tutorial changes
* fix mypy weaviate
* make label and multilabel dataclasses
* bump mypy env in CI to python 3.8
* WIP refactor Label ORM
* WIP refactor Label ORM
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* WIP refactor ORMs
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* add dataclasses-json dependency. adjust ORM to just store JSON repr
* ignore type as dataclasses_json seems to miss functionality here
* update readme and contributing.md
* update contributing
* adjust example
* fix duplicate doc handling for custom index
* Add latest docstring and tutorial changes
* fix some ORM issues. fix get_all_labels_aggregated.
* update drop flags where get_all_labels_aggregated() was used before
* Add latest docstring and tutorial changes
* add to_json(). add + fix tests
* fix no_answer handling in label / multilabel
* fix duplicate docs in memory doc store. change primary key for sql doc table
* fix mypy issues
* fix mypy issues
* haystack/retriever/base.py
* fix test_write_document_meta[elastic]
* fix test_elasticsearch_custom_fields
* fix test_labels[elastic]
* fix crawler
* fix converter
* fix docx converter
* fix preprocessor
* fix test_utils
* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations
* Add latest docstring and tutorial changes
* fix crawler test. fix ocrconverter attribute
* fix test_elasticsearch_custom_query
* fix generator pipeline
* fix ocr converter
* fix ragenerator
* Add latest docstring and tutorial changes
* fix test_load_and_save_yaml for elasticsearch
* fixes for pipeline tests
* fix faq pipeline
* fix pipeline tests
* Add latest docstring and tutorial changes
* fix weaviate
* Add latest docstring and tutorial changes
* trigger CI
* satisfy mypy
* Add latest docstring and tutorial changes
* satisfy mypy
* Add latest docstring and tutorial changes
* trigger CI
* fix question generation test
* fix ray. fix Q-generation
* fix translator test
* satisfy mypy
* wip refactor feedback rest api
* fix rest api feedback endpoint
* fix doc classifier
* remove relation of Labels -> Docs in SQL ORM
* fix faiss/milvus tests
* fix doc classifier test
* fix eval test
* fixing eval issues
* Add latest docstring and tutorial changes
* fix mypy
* WIP replace dataclasses-json with manual serialization
* Add latest docstring and tutorial changes
* revert to dataclass-json serialization for now. remove debug prints.
* update docstrings
* fix extractor. fix Answer Span init
* fix api test
* keep meta data of answers in reader.run()
* fix meta handling
* address review feedback
* Add latest docstring and tutorial changes
* make document=None for open domain labels
* add import
* fix print utils
* fix rest api
* address review feedback
* Add latest docstring and tutorial changes
* fix mypy
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Malte Pietsch
9650f7aed1
Add debug and debug_logs params to standard pipelines ( #1586 )
...
* add debug and debug_logs to standard pipelines
* Add latest docstring and tutorial changes
* fix params
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 16:00:48 +02:00
Sara Zan
6354528336
Add /documents/get_by_filters endpoint ( #1580 )
...
* Add endpoint to get documents by filter
* Add test for /documents/get_by_filter and extend the delete documents test
* Add rest_api/file-upload to .gitignore
* Make sure the document store is empty for each test
* Improve docstrings of delete_documents_by_filters and get_documents_by_filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 10:53:54 +02:00
Malte Pietsch
38652dd4dd
Enable GPU usage for QuestionGenerator ( #1571 )
...
* enable GPU usage for question generator
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-08 12:17:48 +02:00
Sara Zan
54947cb840
Return intermediate nodes output in pipelines ( #1558 )
...
* First rough implementation
* Add a flag to dump the debug logs to the console as well
* Typing run() and _dispatch_run()
* Allow debug and debug_logs to be passed as arguments of run()
* Avoid overwriting _debug, later we might want to store other objects in it
* Put logs under a separate key of the _debug dictionary and add input and output of the node alongside it
* Introduce global arguments for pipeline.run() that get applied to every node when defined
* Change default values of debug variables to None, otherwise their default would override the params values
* Remove a potential infinite recursion on the overridden __getattr__
* Do not append the output of the last node in the _debug key, it causes infinite recursion
* Add tests
* Move the input/output collection into _dispatch_run to gather only relevant info
* Add partial Pipeline.run() docstring
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-07 22:13:25 +02:00
Julian Risch
7e063b77d2
Format doc classifier usage example ( #1550 )
...
* Format doc classifier usage example
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-01 15:01:19 +02:00
Julian Risch
24483d7bad
TransformersDocumentClassifier replacing FARMClassifier ( #1540 )
...
* Initial draft of TransformersClassifier
* Add transformers classifier implementation
* Add test for SentenceTransformersClassifier
* Add truncation and corresponding test case to Classifier
* Add zero-shot classification and test
* Add document classifier documentation
* Add latest docstring and tutorial changes
* print meta data with print_documents()
* Add latest docstring and tutorial changes
* Remove top_k param from Classifier usage example
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-01 11:22:56 +02:00
Julian Risch
0e7338f0c6
Remove mentions of FARM from Ranker comments ( #1535 )
...
* Remove mentions of FARM from Ranker comments
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-29 11:57:30 +02:00
Sara Zan
a30a826c6c
Standardize delete_documents(filter=...) across all document stores ( #1509 )
...
* Make InMemoryDocumentStore accept and apply filters in delete_documents()
* Modify test_document_store.py to test the filtered deletion in memory, sql and milvus too
* Make FAISSDocumentStore accept and properly apply filters in delete_documents()
* Add latest docstring and tutorial changes
* Remove accidentally duplicated test
* Remove unnecessary decorators from test/test_document_store.py::test_delete_documents_with_filters
* Add embeddings count test for FAISS and Milvus; Milvus fails it.
* Fixed a bug that prevented Milvus from deleting embeddings
* Remove batch size parametrization in tests & update all document stores' docstrings with a filter example
* Add latest docstring and tutorial changes
Co-authored-by: prafgup <prafulgupta6@gmail.com>
2021-09-29 09:27:06 +02:00
Julian Risch
f9d2f786ca
Replace FARM import statements; add dependencies ( #1492 )
...
* Replace FARM import statements; add dependencies
* Add InferenceProc., TextCl.Proc., TextPairCl.Proc.
* Remove FARMRanker, add type annotations, rename max_sample
* Add sample_to_features_text for InferenceProc.
* Fix type annotations: model_name_or_path is str not Path
* Fix mypy errors: implement _create_dataset in TextCl.Proc.
* Add task_type "embeddings" in Inferencer
* Allow loading AdaptiveModel for embedding task
* Add SQuAD eval metrics; enable InferenceProc for embedding task
* Add baskets as param to log_samples and handle empty basket list in log_samples
* Remove unused dependencies
* Remove FARMClassifier (doc classifier) due to ref to TextClassificationHead
* Remove FARMRanker and Classifier from doc generation scripts
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-28 16:34:24 +02:00
Malte Pietsch
183fd5ae5a
Simplify tests & allow running on individual doc stores ( #1487 )
...
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* update readme and contributing.md
* update contributing
* adjust example
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-27 10:52:07 +02:00
Branden Chan
2c4baa7f4e
Regenerate API and Tutorial md files ( #1480 )
...
* Change punctuation
* Add latest docstring and tutorial changes
* Change punctuation
* Add documentation for Docs2Answer
* Add latest docstring and tutorial changes
* Generate new API docs
* Replace Finder with Pipeline
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 14:42:18 +02:00
Markus Paff
39845c0624
Automate updates of docstrings and tutorials ( #1461 )
...
* remove unneeded GitHub Actions and reactivate docstrings and tutorial generation
* test workflow
* update pydoc version
* update python version
* update watchdog
* move to latest version pydoc-markdown
* remove version check
* Add latest docstring and tutorial changes
* remove test workflow
* test for param docstrings
* pin pydoc-markdown version
* add test workflow
* pin watchdog version
* Add latest docstring and tutorial changes
* update original workflow and delete test
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-17 13:44:31 +02:00
Ikram Ali
3fc7f3f695
[docs] crawler api docs updated. ( #1388 )
2021-09-01 12:07:32 +02:00
Branden Chan
10e332dabb
Fix Links ( #1199 )
...
* Fix link highlight
* Regen md files
* Remove duplicate
* Fix whitespace
* fixing strings for website
* Fix link
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-06-23 19:07:54 +02:00
Markus Paff
6cd49105e7
update api markdown files and add markdown file for ranker ( #1198 )
...
* update api markdown files and add markdown file for ranker
* added docstrings for weaviate
* new version of pydoc-markdown does not render arguments correctly. We used pydoc-markdown==3.11.0
2021-06-15 17:50:08 +02:00
vblagoje
2a5882578a
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever ( #1086 )
...
* Integrate LFQA with Haystack
* Integrate LFQA with Haystack - unit tests
* Properly initialize conftest default value for vector_dim
* Update PR after initial feedback
* Fix conftest.py import
* Seq2SeqGenerator uses Callables instead of subclasses for custom model input
* Update docstring
* Fix Callable use
* Add LFQA tutorials
* Improve type error reporting for invalid input converter Callable
* Generate docstrings
* Format comments in tutorial script
* Generate tutorial md
* Add usage page
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-06-14 17:53:43 +02:00
Branden Chan
5f0f85989a
Refresh API docs ( #1152 )
2021-06-09 16:13:58 +02:00
Lalit Pagaria
f46b09c756
Using text hash as id to prevent document duplication ( #1000 )
...
* using text hash as id to prevent document duplication. Also providing a way to customize it.
* Add latest docstring and tutorial changes
* Fixing duplicate value test when text is same
* Adding test for duplicate ids in document store
* Changing exception to generic Exception type
* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-17 17:51:52 +02:00
Branden Chan
869b493b61
Regen api docs ( #1015 )
2021-04-30 12:35:13 +02:00
Branden Chan
9626c0d65e
Update Documentation ( #976 )
...
* Add api pages
* Add latest docstring and tutorial changes
* First sweep of usage docs
* Add link to conversion script
* Add import statements
* Add summarization page
* Add web crawler documentation
* Add confidence scores usage
* Add crawler api docs
* Regenerate api docs
* Update summarizer and translator api
* Add indentation (pydoc-markdown 3.10.1)
* Comment out metadata
* Remove Finder deprecation message
* Remove Finder in FAQ
* Update tutorial link
* Incorporate reviewer feedback
* Regen api docs
* Add type annotations
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 16:45:29 +02:00
Branden Chan
77d4c2ca1c
Benchmark milvus ( #850 )
...
* Add milvus benchmarking support
* Add latest docstring and tutorial changes
* Edit config
* Disable docker interactive mode
* Add milvus index type support
* Adjust FAISS and Milvus node branching
* Remove duplicate in config
* Revert method for speedup
* Add latest docstring and tutorial changes
* Add latest benchmark run
* Add latest docstring and tutorial changes
* Add json files
* Revert "Add latest docstring and tutorial changes"
This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923.
* Add latest docstring and tutorial changes
* Revert "Add latest docstring and tutorial changes"
This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b.
* Fix typo
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-13 14:54:15 +02:00
Markus Paff
dfb0282b74
Update milvus links and docstrings ( #959 )
...
* update milvus links and docstrings
* Add latest docstring and tutorial changes
* new milvus version
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-12 14:38:57 +02:00
Timo Moeller
837dea4e6d
Integrate sentence transformers into benchmarks ( #843 )
...
* Integrate sentence transformers into benchmarks
* Add doc store asserts
* switch data downloads from s3 client to https. add license info
* Fix mypy, revert config
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-09 17:24:16 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines ( #922 )
2021-04-07 17:53:32 +02:00
Julian Risch
64ad953c6a
Adding indentation to markup files ( #947 )
2021-04-07 11:36:11 +02:00
Timo Moeller
5d2b16f3cc
Update farm version ( #936 )
...
* Update farm version
* Add new DPR loading, fix dpr param name
* Add QA model confidence as answer probability, fix params in test
2021-04-01 18:23:05 +02:00
Branden Chan
d77152c469
WIP: Add evaluation nodes for Pipelines ( #904 )
...
* Add main eval fns
* WIP: make pipeline_eval.py run
* Fix typo
* Add support for no_answers
* Add latest docstring and tutorial changes
* Working pipeline eval
* Add timing of nodes
* Add latest docstring and tutorial changes
* Refactor and clean
* Update tutorial script
* Set default params
* Update tutorials
* Fix indent
* Add latest docstring and tutorial changes
* Address mypy issues
* Add test
* Fix mypy error
* Clear outputs
* Add doc strings
* Incorporate reviewer feedback
* Add latest docstring and tutorial changes
* Revert query counting
* Fix typo
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-01 17:35:18 +02:00
Timo Moeller
1244d16010
Better default value for mp chunksize ( #923 )
...
* Better default value for mp chunksize
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-25 19:00:45 +01:00
Lalit Pagaria
e904deefa7
Add Markdown file converter ( #875 )
2021-03-23 16:31:26 +01:00
Timo Moeller
7b559fa4e8
Improve dpr conversion ( #826 )
...
* Bugfix dpr conversion
* Add latest docstring and tutorial changes
* Fix preprocessor changes
2021-03-18 14:51:01 +01:00
oryx1729
e9f0076dbd
Fix execution of Pipelines with parallel nodes ( #901 )
2021-03-18 12:41:30 +01:00
oryx1729
4b188b8102
Add runtime parameters to component initialization ( #873 )
2021-03-04 12:18:12 +01:00
Branden Chan
325a4e4d14
Add Milvus Documentation ( #838 )
...
* First commit
* Add latest docstring and tutorial changes
* Add DocStore external setup info
* fixed tabs
* Add Milvus recommendation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
2021-02-24 11:43:40 +01:00
Malte Pietsch
e641bff7a6
Allow more options for elasticsearch client (auth, multiple hosts) ( #845 )
...
* allow more options for elasticsearch client (auth, multiple hosts)
* Add latest docstring and tutorial changes
* fix mypy
* Add latest docstring and tutorial changes
* test client connection via ping()
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-19 14:29:59 +01:00
Tanay Soni
07907f9eac
Add support for indexing pipelines ( #816 )
2021-02-16 16:24:28 +01:00
Lalit Pagaria
5bd94ac5f7
Adding Translator (standalone component & wrapper for pipelines) ( #782 )
...
* Adding translator with many generic input parameter support
* Making dict_key as generic
* Fixing mypy issue
* Adding pipeline and using opus models
* Add latest docstring and tutorial changes
* Adding test cases for end-to-end translation for generator, summarizer etc
* raise error join and merge nodes
* Fix test failure
* add docstrings. add usage documentation. rm skip_special_tokens param
* Add latest docstring and tutorial changes
* fix code snippets in md
* Adding few extra configuration parameters and fixing tests
* Fixing mypy issue and updating usage document
* fix for mypy issue in pipeline.py
* reverting renaming of pytest_collection_modifyitems method
* Addressing review comments
* setting skip_special_tokens to True
* removing model_max_length argument as None type is not supported by many models
* Removing padding parameter. Better to leave it as default, otherwise it causes a tensor size mismatch error. If this option is required by users, it can be added later.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 15:58:26 +01:00
Pavel Soriano
8adf5b4737
Allow non-standard Tokenizers (e.g. CamemBERT) for DPR via new arg ( #811 )
...
* added parameter to infer DPR tokenizers class
* Add latest docstring and tutorial changes
* Update docstring. fix mypy
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 14:17:55 +01:00
Tanay Soni
fd5c5dd23c
Introduce incremental updates for embeddings in document stores ( #812 )
2021-02-09 21:25:01 +01:00
Malte Pietsch
ac9f92466f
Allow custom encoding for pdftotext (Russian characters, German umlauts etc). Fix version in download instructions ( #813 )
...
* fix encoding of pdftotext. fix version in download instructions
* fix test
* Add latest docstring and tutorial changes
* make latin-1 default encoding again
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 13:42:43 +01:00
Tanay Soni
7b18e324f2
Fix building Pipeline with YAML ( #800 )
2021-02-04 11:53:51 +01:00
Tanay Soni
8a5dc8f826
Load Pipeline with YAML config file ( #785 )
2021-02-02 17:32:17 +01:00
Malte Pietsch
1318b55eec
Make tqdm progress bars optional (less verbose prod logs) ( #796 )
...
* make dpr queries less verbose
* add progress bar flag to more components
* Add latest docstring and tutorial changes
* add type
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 20:51:55 +01:00
Tanay Soni
b87dd244c1
Get metadata values for a key from Elasticsearch ( #776 )
2021-02-01 16:13:26 +01:00
Tanay Soni
d62355ca88
Fix mypy typing ( #792 )
2021-02-01 12:15:36 +01:00