Julian Risch
40ceaf418a
Fixing grpcio-tools to version of colab's pre-installed grpcio ( #1113 )
2021-05-31 19:09:10 +02:00
Alvise Sembenico
6326cf5710
🐳 add PDF converter dependencies to Docker ( #1107 )
2021-05-31 19:01:02 +02:00
Branden Chan
6ca6ac0632
Add OpenDistro init ( #1101 )
2021-05-31 18:59:20 +02:00
Julian Risch
84c34295a1
Re-ranking component for document search without QA ( #1025 )
...
* Adding ranker similar to retriever and reader
* Sort documents according to query-document similarity scores
* Reranking and model training runs for small example
* Added EvalRanker node
* Calculate recall@k in EvalRetriever and EvalRanker nodes
* Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers
* Added mean reciprocal rank as metric for EvalDocuments
* Fix bug that appeared when ranking documents with same score
* Remove commented code for unimplemented eval() of Ranker node
* Add documentation of k parameter in EvalDocuments
* Add Ranker documentation and rename top_k param
2021-05-31 15:31:36 +02:00
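The two metrics named in this commit, recall@k and mean reciprocal rank, can be sketched in plain Python. This is an illustration only and not the code of EvalDocuments; the function names and the use of document ids as inputs are assumptions made for the example.

```python
from typing import List, Sequence


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Sequence[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(relevant & top_k) / len(relevant)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Sequence[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    all_ranked: List[Sequence[str]], all_relevant: List[Sequence[str]]
) -> float:
    """Average of the reciprocal ranks over all queries."""
    if not all_ranked:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(all_ranked, all_relevant))
    return total / len(all_ranked)
```

For two queries whose first relevant hits sit at ranks 2 and 1, the reciprocal ranks are 0.5 and 1.0, so the mean reciprocal rank is 0.75.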
Michaël Bitard
b5cae20ddb
Fix typo in streamlit UI ( #1106 )
2021-05-28 11:18:09 +02:00
Ikram Ali
94f1a2b5c9
Improve speed of FAISSDocumentStore.delete_documents() ( #1095 )
2021-05-26 07:56:09 +02:00
Ikram Ali
b76ed4c5a4
Add options for handling duplicate documents (skip, fail, overwrite) ( #1088 )
...
* [document_stores] Duplicate document implementation added for memorystore.
* [document_stores] duplicate documents implementation done for faiss store.
* [document_store] Duplicate document feature added for elasticsearch document store fixed #1069
* [document_store] Duplicate documents feature added for milvus document store and bug fixed in faiss document store fixed #1069
* [document_store] Code refactored fixed #1069
* [document_store] Test cases refactored.
* [document_store] mypy issue fixed.
* [test_case] faiss and milvus test case refactored to support duplicate documents implementation. fixed #1069
* [document_store] duplicate_documents_options code refactored.
* [document_store] Code refactored.
2021-05-25 13:30:06 +02:00
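The three duplicate-handling options named in this commit (skip, fail, overwrite) can be illustrated with a toy in-memory store. This is a minimal sketch under stated assumptions: the dict-based store, the write_documents() helper, the duplicate_documents parameter name, and the DuplicateDocumentError class are hypothetical and are not taken from the document store code.

```python
from typing import Dict, List


class DuplicateDocumentError(ValueError):
    """Hypothetical error raised when a duplicate id is written in 'fail' mode."""


def write_documents(
    store: Dict[str, str],
    docs: List[dict],
    duplicate_documents: str = "overwrite",
) -> None:
    """Write docs into a dict keyed by document id, handling id collisions."""
    if duplicate_documents not in ("skip", "fail", "overwrite"):
        raise ValueError(f"Unknown option: {duplicate_documents!r}")
    for doc in docs:
        doc_id = doc["id"]
        if doc_id in store:
            if duplicate_documents == "skip":
                continue  # keep the existing document untouched
            if duplicate_documents == "fail":
                raise DuplicateDocumentError(f"Document {doc_id!r} already exists")
        # new id, or 'overwrite' mode: store the incoming text
        store[doc_id] = doc["text"]
```

In this sketch, "fail" stops at the first collision, so documents earlier in the same batch have already been written at that point.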
Avishekh Shrestha
c4ee32d47d
Fix typo in preprocessing.md ( #1087 )
...
Correct variable name from 'd' to 'doc' in line 134.
2021-05-23 19:16:58 +02:00
Ikram Ali
4ab1bc3c3e
Improve the progress bar in update_embeddings() + Fix filters in update_embeddings() ( #1063 )
...
* [document_stores] Add the progress bar in update_embeddings() to track the overall documents progress closed #1037
* change 2nd level loop to docs. switch to tqdm.auto.
* [document_stores] Elasticsearch new method get_document_without_embedding_count() added.
* [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added.
* [document_stores] Add new bool arg in get_document_count() method and fixed #1082
* [document_stores] typo fixed #1082
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-21 14:18:07 +02:00
Lalit Pagaria
f46b09c756
Using text hash as id to prevent document duplication ( #1000 )
...
* using text hash as id to prevent document duplication. Also providing a way to customize it.
* Add latest docstring and tutorial changes
* Fixing duplicate value test when text is same
* Adding test for duplicate ids in document store
* Changing exception to generic Exception type
* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-17 17:51:52 +02:00
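The idea behind this commit, deriving a document id from a hash of its text so that identical texts map to the same id, can be sketched as follows. The text_hash_id() helper and its extra_values parameter are hypothetical, and SHA-256 is chosen purely for illustration; the commit message does not state which hash function is used.

```python
import hashlib
from typing import Sequence


def text_hash_id(text: str, extra_values: Sequence[str] = ()) -> str:
    """Return a deterministic id derived from the text.

    extra_values is an assumed hook for customization: mixing in further
    values (for example a file name) lets identical texts get distinct ids.
    """
    hasher = hashlib.sha256()
    hasher.update(text.encode("utf-8"))
    for value in extra_values:
        hasher.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
        hasher.update(value.encode("utf-8"))
    return hasher.hexdigest()
```

Because the id depends only on the content, writing the same text twice yields the same id, which a store can then detect as a duplicate.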
Malte Pietsch
25d1122773
Upgrade milvus to 1.1.0 ( #1066 )
...
* upgrade milvus in CI to 1.1
* fix pymilvus
* loosen pymilvus requirement again
* add date to cache keys
* fix date var in action
2021-05-17 17:27:34 +02:00
Moshe Berchansky
880edd139d
Add use_amp to DPR's train method to enable mixed precision training. ( #1048 )
2021-05-17 15:10:02 +02:00
Ikram Ali
a06e4450d1
Rename delete_all_documents() method to delete_documents() ( #1047 )
2021-05-10 13:37:08 +02:00
Branden Chan
5d31e633ce
Squad tools ( #1029 )
...
* Add first commit
* Add support for conversion to and from pandas df
* Add logging
* Add functionality
* Satisfy mypy
* Incorporate reviewer feedback
2021-05-06 19:02:15 +02:00
Branden Chan
373fef8d1e
Add white space normalization warning ( #1022 )
...
* Add white space normalization warning
* Implement safer document id fetching
2021-05-05 17:54:32 +02:00
Branden Chan
aadd8b049a
Add Tutorial 11 to Readme
2021-05-05 15:35:21 +02:00
oryx1729
9bec8859f2
Test ES connection only for the default user ( #1028 )
2021-05-04 15:03:19 +02:00
oryx1729
c41101ff74
Upgrade streamlit version ( #1024 )
2021-05-03 17:44:57 +02:00
Julian Risch
bf4563e5d2
Filtering duplicate answers ( #1021 )
...
* Allow filtering of duplicate answers as implemented in FARM
* Changed default behavior to filtering exact duplicates
* Change expected test result due to filtering of duplicate answers by default
* Rounding expected test results for comparison with predictions
2021-05-03 17:18:10 +02:00
Bhadresh Savani
ca63f9fee2
Fix debug message for file-upload in UI ( #1018 )
2021-05-03 09:18:55 +02:00
brandenchan
5b0b3e4616
Merge branch 'master' of https://github.com/deepset-ai/haystack
2021-04-30 16:41:05 +02:00
brandenchan
4cc853d1c3
Update link
2021-04-30 15:06:45 +02:00
Branden Chan
869b493b61
Regen api docs ( #1015 )
2021-04-30 12:35:13 +02:00
oryx1729
99990e7249
Add export of Pipeline YAML config ( #1003 )
2021-04-30 12:23:29 +02:00
Mario Jäckle
a00703256f
docs(document_store): add usage information for AWS Elasticsearch ( #1008 )
...
Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
2021-04-30 11:38:25 +02:00
Bhadresh Savani
37a72d2f45
Add File Upload Functionality in UI ( #995 )
2021-04-30 10:46:30 +02:00
Branden Chan
056be3354b
Add pipelines tutorial ( #1013 )
2021-04-29 18:19:20 +02:00
Branden Chan
9827b3652e
Pipelines tutorial ( #991 )
...
* Start Pipelines tutorial
* Make Tutorial 11 run locally
* Add colab compatibility
* Fix pip install
* Add ES install from source
* Add pygraphviz installation
* Incorporate reviewer feedback
* Ensure print_answers() works for Generator output
* Fix typo
2021-04-29 17:31:28 +02:00
Julian Risch
65f1da00cc
knowledge graph documentation ( #979 )
...
* Create knowledge_graph.md
* add doc strings to Text2SparqlRetriever
* Add doc strings to GraphDBKnowledgeGraph
* Make method calls unambiguous so it's clear which class is meant
2021-04-27 16:44:40 +02:00
oryx1729
8a57f6b16a
Update tests for FAISSDocumentStore ( #999 )
2021-04-27 09:55:31 +02:00
Markus Paff
cf8a622e35
Streamlit UI Evaluation mode ( #920 )
...
* first running version of eval mode
* restructuring, new naming of elements and testing
* add new files to Docker, how to start with Haystack reference, remove unneeded dependencies
* Add latest docstring and tutorial changes
* merged changes
* fixing bugs after breaking changes from last release
* newer version of states in streamlit, more docs for eval mode, eval file as env variable
* eval file as env variable
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 17:30:17 +02:00
Branden Chan
9626c0d65e
Update Documentation ( #976 )
...
* Add api pages
* Add latest docstring and tutorial changes
* First sweep of usage docs
* Add link to conversion script
* Add import statements
* Add summarization page
* Add web crawler documentation
* Add confidence scores usage
* Add crawler api docs
* Regenerate api docs
* Update summarizer and translator api
* Add indentation (pydoc-markdown 3.10.1)
* Comment out metadata
* Remove Finder deprecation message
* Remove Finder in FAQ
* Update tutorial link
* Incorporate reviewer feedback
* Regen api docs
* Add type annotations
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 16:45:29 +02:00
Malte Pietsch
b1e8ebf81a
Create pull_request_template.md
2021-04-22 15:48:39 +02:00
Andrey A
58ea0a62e0
Add links to GitHub Discussion and SO ( #984 )
...
* Add link to Stack Overflow
* Add link to GitHub discussions and re-arrange links
2021-04-22 09:51:21 +02:00
Timo Moeller
2e39361f8a
Add maxsamples and convert data dir to path ( #989 )
2021-04-22 09:35:11 +02:00
oryx1729
7269530e45
Add validation for root node in Pipeline ( #987 )
2021-04-21 12:18:33 +02:00
oryx1729
8c1e411380
Fix update_embeddings() for FAISSDocumentStore ( #978 )
2021-04-21 09:56:35 +02:00
Guillim
0051a34ff9
Add root_path option to REST API for reverse proxy deployments ( #982 )
2021-04-20 11:19:28 +02:00
oryx1729
4dd5a7a744
Make FAISS import optional ( #971 )
2021-04-15 12:26:34 +02:00
oryx1729
237172f459
Make FAISS import conditional ( #970 )
2021-04-14 17:34:01 +02:00
Mario Jäckle
84f90e82c5
feature(aws): add aws iam auth method ( #965 )
...
Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
2021-04-14 16:34:24 +02:00
oryx1729
5bb66940a9
Fix equality check in preprocessor ( #969 )
2021-04-14 16:03:48 +02:00
Markus Paff
0633dae4d0
new docs version ( #964 )
2021-04-14 13:40:05 +02:00
oryx1729
bba1d80aef
Update Haystack version
v0.8.0
2021-04-13 16:31:19 +02:00
Branden Chan
77d4c2ca1c
Benchmark milvus ( #850 )
...
* Add milvus benchmarking support
* Add latest docstring and tutorial changes
* Edit config
* Disable docker interactive mode
* Add milvus index type support
* Adjust FAISS and Milvus node branching
* Remove duplicate in config
* Revert method for speedup
* Add latest docstring and tutorial changes
* Add latest benchmark run
* Add latest docstring and tutorial changes
* Add json files
* Revert "Add latest docstring and tutorial changes"
This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923.
* Add latest docstring and tutorial changes
* Revert "Add latest docstring and tutorial changes"
This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b.
* Fix typo
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-13 14:54:15 +02:00
Markus Paff
b87daed62b
fixed link to dpr ( #962 )
2021-04-13 09:45:04 +02:00
Julian Risch
8333a13d6f
Adding tutorial on knowledge graphs to README
2021-04-12 15:26:02 +02:00
Markus Paff
dfb0282b74
Update milvus links and docstrings ( #959 )
...
* update milvus links and docstrings
* Add latest docstring and tutorial changes
* new milvus version
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-12 14:38:57 +02:00
oryx1729
406f7fa679
Disable Gunicorn preload option ( #960 )
2021-04-12 12:46:52 +02:00
Timo Moeller
837dea4e6d
Integrate sentence transformers into benchmarks ( #843 )
...
* Integrate sentence transformers into benchmarks
* Add doc store asserts
* switch data downloads from s3 client to https. add license info
* Fix mypy, revert config
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-09 17:24:16 +02:00