Julian Risch
17dcb8c23e
Use Reader's device by default ( #1208 )
...
* Use Reader's device by default
* Replace get_device with initialize_device_settings
* Add import statements for init_device_settings
* Remove unused get_device method
2021-06-24 09:22:34 +02:00
Branden Chan
10e332dabb
Fix Links ( #1199 )
...
* Fix link highlight
* Regen md files
* Remove duplicate
* Fix whitespace
* fixing strings for website
* Fix link
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-06-23 19:07:54 +02:00
Branden Chan
efc03f72db
Make PreProcessor.process() work on lists of documents ( #1163 )
...
* Add process_batch method
* Rename methods
* Fix doc string, satisfy mypy
* Fix mypy CI
* Fix typo
* Update tutorial
* Fix argument name
* Change arg name
* Incorporate reviewer feedback
2021-06-23 18:13:51 +02:00
oryx1729
afee4f36ce
Add scaffold for defining custom components for Pipelines ( #1205 )
2021-06-23 12:01:54 +02:00
vblagoje
02fc4c7783
Improve document stores unit test parametrization ( #1202 )
2021-06-22 16:08:23 +02:00
Markus Paff
a8f3601e6a
Pin docs for 0.9.0
2021-06-22 10:38:08 +02:00
Ikram Ali
d835a9cdc5
[setup] version tag added to Haystack fix #1175 ( #1216 )
2021-06-22 09:43:26 +02:00
Stefano
66049abff0
Add arg to support different languages in PreProcessor's sentence segmentation ( #1160 )
...
* Add PreProcessor optional language parameter.
* Add iso639 to nltk languages.
* Update docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-21 18:53:19 +02:00
Julian Risch
9e4d7bf9be
Increase Haystack version to 0.9.0 ( #1215 )
v0.9.0
2021-06-21 18:39:00 +02:00
oryx1729
0168f04385
Remove unused function _get_pseudo_prob ( #1201 )
2021-06-17 10:28:48 +02:00
C V Goudar
f9c4083006
Bugfix setting of device by defaulting to "cpu" ( #1182 )
...
* Defaulting the device to cpu in case gpu is not available and use_gpu is set to True
Co-authored-by: C V Goudar <cv.goudar@emplay.et>
2021-06-16 10:26:29 +02:00
Markus Paff
6cd49105e7
update api markdown files and add markdown file for ranker ( #1198 )
...
* update api markdown files and add markdown file for ranker
* added docstrings for weaviate
* new version of pydoc-markdown does not render arguments correctly. We used pydoc-markdown==3.11.0
2021-06-15 17:50:08 +02:00
Julian Risch
215c45eb8a
Remove quickfix from reader and ranker ( #1196 )
...
* Remove quickfix from ranker
* remove quickfix from reader
* Use inferencer's model instead of reloaded model
2021-06-15 09:46:11 +02:00
Branden Chan
7dbd58f6be
Add about sections ( #1195 )
2021-06-14 18:37:00 +02:00
vblagoje
2a5882578a
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever ( #1086 )
...
* Integrate LFQA with Haystack
* Integrate LFQA with Haystack - unit tests
* Properly initialize conftest default value for vector_dim
* Update PR after initial feedback
* Fix conftest.py import
* Seq2SeqGenerator uses Callables instead of subclasses for custom model input
* Update docstring
* Fix Callable use
* Add LFQA tutorials
* Improve type error reporting for invalid input converter Callable
* Generate docstrings
* Format comments in tutorial script
* Generate tutorial md
* Add usage page
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-06-14 17:53:43 +02:00
venuraja
ae55927f58
Weaviate: Update Embeddings - Use update instead of replace ( #1181 )
...
* Update Embeddings logic improved
2021-06-14 17:50:55 +02:00
Shahrukh Khan
1a3b4b9c74
Fix typo in Query Classifier Exception Message ( #1190 )
2021-06-14 17:40:35 +02:00
Julian Risch
f6e70f0f3d
Removed single_model_path; added infer_tokenizer to dpr load() ( #1060 )
2021-06-14 14:14:46 +02:00
Julian Risch
1c31589b43
Bump to FARM 0.8.0, torch 1.8.1 and transformers 4.6.1 ( #1192 )
...
* bump to FARM 0.8.0, which in turn bumps torch 1.8.1 and transformers 4.6.1 (#1192 )
* Replace deprecated force_bos_token_to_be_generated parameter
2021-06-14 13:00:41 +02:00
Bob van Luijt
f583d0bfaf
Minor change with a link to the Weaviate docs ( #1180 )
...
Super minor change, but in line with other DocumentStores
2021-06-11 21:20:23 +02:00
Branden Chan
e7937ac5d7
Reformat FAQ page ( #1177 )
...
* Add faq page
* Update faq.md
* Fix mypy CI
* Add question
* Reformat faq
2021-06-11 11:59:52 +02:00
Branden Chan
783893c3d2
Tutorial update ( #1166 )
...
* Add header / footer
* Add Milvus example
* Generate md files
* Fix mypy CI
2021-06-11 11:09:15 +02:00
Branden Chan
13edff109d
Documentation update ( #1162 )
...
* Add content
* Add German BERT references
* Mention preprocessor language
* Fix mypy CI
* Add document length recommendation
* Add more languages
2021-06-11 11:06:57 +02:00
Branden Chan
41b537affe
Add FAQ page ( #1151 )
...
* Add faq page
* Update faq.md
* Fix mypy CI
* Add question
2021-06-10 17:29:14 +02:00
venuraja79
49886f88f0
Integrate Weaviate as another DocumentStore ( #1064 )
...
* Annotation Tool: data is not persisted when using local version #853
* First version of weaviate
* Updated comments
* ran query, get and write tests
* update embeddings, dynamic schema and filters implemented
* Initial set of tests and fixes
* Tests added for update_embeddings and delete documents
* introduced duplicate documents fix
* fixed mypy errors
* Added Weaviate to requirements
* Fix the weaviate docker env variables
* Fixing test dependencies for now
* Created weaviate test marker and fixed query
* Update docstring
* Add documentation
* Bump up weaviate version
* Bump up weaviate version in documentation
* Upgrade weaviate version
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-10 09:43:53 +02:00
Lalit Pagaria
db17d73a82
Fixing issues caused due to mypy upgrade ( #1165 )
2021-06-09 16:24:39 +02:00
Branden Chan
5f0f85989a
Refresh API docs ( #1152 )
2021-06-09 16:13:58 +02:00
Shahrukh Khan
545c625a37
Add QueryClassifier incl. baseline models ( #1099 )
...
* restructure query classifier code and add s3 based pickles
* make model and vectorizer optional in query classifier
* update query classifier as per init style
* add query classifiers sklearn/hf
* update docstrings for query classifiers
* add unit test for query classifier
* add type patch for sklearn classifier
* fix mypy type issue
* revert to pure formatting
* add query classifiers
* resolve conflict
* add output names for query classifier
* revert output and update docstring queryclassifier
* Update docstring for SklearnQueryClassifier
* update transformer query classifier docstring
* fix typo
* change arg names in query classifier classes
* add set_config(). rename attributes
* fix set_config()
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-08 15:20:13 +02:00
Malte Pietsch
600636e77b
Update README.md
2021-06-08 09:23:56 +02:00
Branden Chan
59e3c55c47
Add More top_k handling to EvalDocuments ( #1133 )
...
* Improve top_k support
* Adjust warning
* Satisfy mypy
* Reinit eval counts if top_k has changed
* Incorporate reviewer feedback
2021-06-07 12:11:00 +02:00
Branden Chan
c513865566
Add L2 support for FAISS HNSW ( #1138 )
2021-06-04 11:05:18 +02:00
Julian Risch
580e28344d
Add docu of confidence scores and calibration method ( #1131 )
...
* Add docu of confidence scores and calibration method
2021-06-03 15:49:07 +02:00
Malte Pietsch
a1472b040c
Add badges ( #1136 )
2021-06-03 14:47:08 +02:00
Malte Pietsch
b41719b7c8
Add config to JoinDocuments node to allow yaml export in pipelines ( #1134 )
...
* add config to JoinNode to allow yaml export
* remove test print
2021-06-03 11:03:25 +02:00
Julian Risch
8e3d0d1287
Distinguish labels for calculating similarity scores ( #1124 )
...
* Distinguish labels for calculating similarity scores
* Explain label "0" and "1" of TextPairClassifier in Ranker
2021-06-02 17:33:36 +02:00
Branden Chan
b555bc525c
Remove duplicate run ( #1132 )
2021-06-02 13:58:55 +02:00
Branden Chan
09ba75073c
Improve Milvus HNSW Performance ( #1127 )
...
* Add simplified script
* Optimize HNSW index creation
* Adjust benchmark order
* Rename script
2021-06-02 13:17:35 +02:00
Branden Chan
9356f637d4
Update Milvus benchmarks ( #1128 )
...
* Update Milvus benchmarks
* Add sentence transformers
* Update sentence transformers index results
* Remove duplicate row
2021-06-02 13:09:45 +02:00
Branden Chan
aa6f768efa
Prevent merge of same questions on different documents during evaluation ( #1119 )
...
* Fix duplicate question in Reader.eval()
* Add duplicate question support in document store
* Support duplicate questions in retriever eval
* Update tutorial
* Rename key_tuple
* Change error message
* Add warning when more than 6 labels
* Allow for label grouping options
* Add support for aggregating by label meta
* Satisfy mypy
* Make label field flexible, add docstrings
* Satisfy mypy
* Fix failing tests
* Adjust docstring
* Fix tutorial
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-02 12:09:03 +02:00
Branden Chan
d8c47ed525
Preserve whitespace ( #1121 )
2021-06-02 12:08:22 +02:00
Malte Pietsch
022f8586f6
Remove Python 3.6 support ( #1059 )
...
* Remove Python 3.6 support
* change cache key for CI
2021-06-01 15:24:44 +02:00
Julian Risch
a7ba146246
Removed comma from last item in json list ( #1114 )
2021-06-01 12:32:21 +02:00
Julian Risch
40ceaf418a
Fixing grpcio-tools to version of colab's pre-installed grpcio ( #1113 )
2021-05-31 19:09:10 +02:00
Alvise Sembenico
6326cf5710
🐳 add PDF converter dependencies to Docker ( #1107 )
2021-05-31 19:01:02 +02:00
Branden Chan
6ca6ac0632
Add OpenDistro init ( #1101 )
2021-05-31 18:59:20 +02:00
Julian Risch
84c34295a1
Re-ranking component for document search without QA ( #1025 )
...
* Adding ranker similar to retriever and reader
* Sort documents according to query-document similarity scores
* Reranking and model training runs for small example
* Added EvalRanker node
* Calculate recall@k in EvalRetriever and EvalRanker nodes
* Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers
* Added mean reciprocal rank as metric for EvalDocuments
* Fix bug that appeared when ranking documents with same score
* Remove commented code for unimplemented eval() of Ranker node
* Add documentation of k parameter in EvalDocuments
* Add Ranker docu and renaming top_k param
2021-05-31 15:31:36 +02:00
Michaël Bitard
b5cae20ddb
Fix typo in streamlit UI ( #1106 )
2021-05-28 11:18:09 +02:00
Ikram Ali
94f1a2b5c9
Improve speed of FAISSDocumentStore.delete_documents() ( #1095 )
2021-05-26 07:56:09 +02:00
Ikram Ali
b76ed4c5a4
Add options for handling duplicate documents (skip, fail, overwrite) ( #1088 )
...
* [document_stores] Duplicate document implementation added for memorystore.
* [document_stores] Duplicate documents implementation done for faiss store.
* [document_store] Duplicate document feature added for elasticsearch document store fixed #1069
* [document_store] Duplicate documents feature added for milvus document store and bug fixed in faiss document store fixed #1069
* [document_store] Code refactored fixed #1069
* [document_store] Test cases refactored.
* [document_store] mypy issue fixed.
* [test_case] faiss and milvus test case refactored to support duplicate documents implementation. fixed #1069
* [document_store] duplicate_documents_options code refactored.
* [document_store] Code refactored.
2021-05-25 13:30:06 +02:00
Avishekh Shrestha
c4ee32d47d
Fix typo in preprocessing.md ( #1087 )
...
Correct variable name from 'd' to 'doc' in line 134.
2021-05-23 19:16:58 +02:00