Lalit Pagaria
3a9a756810
Using column names instead of ORM objects to get all documents ( #620 )
...
* Using column names instead of ORM objects for the get-all-documents call
* Separating meta search from documents. This saves memory by not duplicating document.text
* Fixing mypy issue
* SQLite has a limit on the number of host variables, hence using batching to fetch meta information
* Query meta only if meta field is not Null in DocOrm
* Add batch_size to other functions except label
* meta can be None, so fix that issue
* Dummy commit to trigger CI
* Using chunked dictionary
* Upgrading faiss
* reverting change related to faiss upgrade
* Changing DB name in test_faiss_retrieving test as it might interfere with existing files by corrupting the DB file
* Updating doc string related to batch_size
* Update docstring for batch_size
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-06 15:56:19 +01:00
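
The batching mentioned in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the `meta` table layout, the `fetch_meta` and `chunked` helpers, and the default `batch_size` are all assumptions made for the example. SQLite rejects statements with more bound parameters than its compile-time limit (999 by default in releases before 3.32.0), so an `IN (...)` lookup over many document IDs is split into several smaller queries.

```python
import sqlite3
from typing import Dict, Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_meta(
    conn: sqlite3.Connection, doc_ids: List[str], batch_size: int = 500
) -> Dict[str, Dict[str, str]]:
    """Collect meta key/value pairs per document, one query per batch of IDs."""
    # Hypothetical schema: meta(document_id, name, value)
    meta: Dict[str, Dict[str, str]] = {doc_id: {} for doc_id in doc_ids}
    for batch in chunked(doc_ids, batch_size):
        # One "?" placeholder per ID keeps each statement under the variable limit.
        placeholders = ",".join("?" for _ in batch)
        rows = conn.execute(
            "SELECT document_id, name, value FROM meta "
            f"WHERE document_id IN ({placeholders})",
            batch,
        )
        for document_id, name, value in rows:
            meta[document_id][name] = value
    return meta


# Small in-memory demo with more IDs than a single batch can hold.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (document_id TEXT, name TEXT, value TEXT)")
doc_ids = [f"doc-{i}" for i in range(2000)]
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [(doc_id, "source", "wiki") for doc_id in doc_ids],
)
meta_by_id = fetch_meta(conn, doc_ids, batch_size=500)
```

Keeping the meta lookup separate from the document query also matches the stated goal of not duplicating document.text for every meta row.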
Branden Chan
bb8aba18e0
Create Preprocessing Tutorial ( #706 )
...
* WIP: First version of preprocessing tutorial
* stride renamed to overlap; ipynb and py files created
* rename split_stride in test
* Update preprocessor api documentation
* define order for markdown files
* define order of modules in api docs
* Add colab links
* Incorporate review feedback
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-01-06 15:54:05 +01:00
Malte Pietsch
5db73d4107
Update stale bot
2021-01-05 08:29:24 +01:00
Malte Pietsch
74b0868d28
Fix GPU docker build ( #703 )
2020-12-31 15:04:13 +01:00
Malte Pietsch
a284af3ae5
Remove sourcerer.io widget ( #702 )
...
Fix #699
2020-12-30 09:57:02 +01:00
Tanmay Laud
7cd9e09491
Add basic demo UI via streamlit ( #671 )
...
* Added starter code for frontend demo
* worked on comments
* Added Docker config for frontend
* update docker file. restructure folder structure. minimal renamings and defaults
* add screenshot to readme
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-12-27 13:36:09 +01:00
Lalit Pagaria
fc521fe293
Haystack logo is not visible on GitHub mobile. Adding download badge. ( #697 )
2020-12-27 12:58:48 +01:00
Malte Pietsch
737a47e2fe
Update readme
2020-12-26 12:14:52 +01:00
Malte Pietsch
a2e5e6b09e
Update pipeline documentation and readme ( #693 )
...
* Update README.md
* Update pipelines.md
* Update pipelines.md
* Update README.md
2020-12-22 13:34:28 +01:00
Malte Pietsch
94b7345505
Make use_gpu=True the default in tutorials ( #692 )
...
* enable gpu args in tutorials
* add info box for gpu runtime on colab
2020-12-22 07:58:12 +01:00
Markus Paff
b752da1cd5
Add docs v0.6.0 ( #689 )
...
* new docs version
* updated directory structure
* Add pipelines page
* Add Finder deprecation suggestion
* header for pipelines file
* Document MySQL support
* Mention DPR train tutorial coming soon
* Mention open distro ES
* Update doc strings regarding similarity fn
* Add link to API docs
* Wrap pipelines docs in box
* add api reference for pipelines
* copied latest version to v0.6.0
* Remove space
* Remove space
* Copy to v0.6.0
Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-18 12:47:27 +01:00
Tanay Soni
0e4eec9499
Add tests for custom embedding field ( #640 )
2020-12-17 09:18:57 +01:00
Malte Pietsch
5b817387c2
Bump version to 0.6.0
v0.6.0
2020-12-17 06:31:22 +01:00
bogdankostic
a9bcabc42d
Fix saving tokenizers in DPR training + unify save and load dirs ( #682 )
2020-12-16 17:09:47 +01:00
Tanay Soni
4c2804e38e
Add support for aggregating scores in JoinDocuments node ( #683 )
2020-12-16 15:54:58 +01:00
demSd
143da4cb3f
Fix a typo in DPR args, num_negatives -> num_positives ( #681 )
...
* fix a typo, num_negatives -> num_positives
* default value for num_positives
* Update dense.py
2020-12-15 10:10:41 +01:00
Tanay Soni
369e237fd4
Add DocumentStore for Open Distro Elasticsearch ( #676 )
2020-12-15 09:28:40 +01:00
Tanay Soni
33fe597949
Cleanup Pytest Fixtures ( #639 )
2020-12-14 18:15:44 +01:00
Branden Chan
d8154939fc
Scale dot product into probabilities ( #667 )
...
* scale dot product
* Add tip in documentation
* Add recommendation boxes
* WIP: Use similarity attribute in all doc stores
* Implement similarity for InMemoryDS
* Add FAISS support
* Clean printout
* Update documentation
* Implement document field map
2020-12-11 12:10:24 +01:00
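
For readers unfamiliar with the idea behind the commit above: a raw dot product is unbounded, so it cannot be read as a probability directly. The sketch below shows one common way to squash such a score into the range 0 to 1 with a logistic function. The commit message does not state which formula was used, so the function name `dot_product_to_probability` and the `scale` divisor of 100 are assumptions chosen only for illustration.

```python
import math


def dot_product_to_probability(raw_score: float, scale: float = 100.0) -> float:
    """Map an unbounded similarity score to a value between 0 and 1.

    `scale` is a hypothetical divisor that flattens the curve so that
    typical dot-product magnitudes do not all saturate near 0 or 1.
    """
    x = raw_score / scale
    # Numerically stable logistic function: avoid exp() overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Example scores: negative, zero, and positive dot products.
scores = [-250.0, 0.0, 80.0, 250.0]
probabilities = [dot_product_to_probability(s) for s in scores]
```

The mapping is monotonic, so the ranking of documents is unchanged; only the scale on which scores are reported differs.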
demSd
a0e146dde6
add gpu support for rag ( #669 )
...
* add gpu support for rag
* Update transformers.py
2020-12-11 12:08:01 +01:00
Malte Pietsch
149d98a0fd
Add latest benchmark run ( #652 )
...
* add latest benchmark run
* update templates and fix small json errors
* Change scale
Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-10 16:25:51 +01:00
Timo Moeller
efc754b166
Redone: Fix concatenation of sentences in PreProcessor. Add stride for word-based splits with sentence boundaries ( #641 )
...
* Update preprocessor.py
Concatenation of sentences done correctly. Stride functionality enabled for splitting by words while respecting sentence boundaries.
* Simplify code, add test
Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2020-12-09 16:12:36 +01:00
Branden Chan
8c904d79d6
Fix links ( #663 )
2020-12-08 10:28:31 +01:00
Tanay Soni
c4a5de59aa
Add set_node() for Pipeline ( #659 )
2020-12-07 19:16:35 +01:00
Tanay Soni
4152ad8426
Enable dynamic parameter updates for the FARMReader ( #650 )
2020-12-07 14:07:20 +01:00
Malte Pietsch
e6ada08d0e
Update query arg in Tutorial 7 ( #656 )
2020-12-04 08:42:09 +01:00
Tanay Soni
8e52b48e1d
Add pipelines for GenerativeQA & FAQs ( #645 )
2020-12-03 10:27:06 +01:00
Malte Pietsch
216787ed34
Fix benchmarks ( #648 )
...
* disable fasttokenizer, increase ES timeout for delete requests
* add session.close()
* fix deletion of docs
2020-12-02 16:59:42 +01:00
Branden Chan
79555148ac
Add link to FAISS Info in documentation ( #643 )
...
* Add link to FAISS info
* Clean link
2020-12-02 15:24:22 +01:00
brandenchan
cdd009d1ef
Better payload example spacing
2020-12-01 13:07:29 +01:00
Branden Chan
e573c9e27d
Improve User Feedback Documentation ( #539 )
...
* Extend docs
* Add User Feedback API calls
* Incorporate reviewer feedback
2020-12-01 12:55:31 +01:00
Malte Pietsch
a9107d29eb
Refactor DensePassageRetriever._get_predictions ( #642 )
2020-12-01 09:22:15 +01:00
Tanay Soni
5e62e54875
Rename question parameter to query ( #614 )
2020-11-30 17:50:04 +01:00
Branden Chan
5e5dba9587
Add api md ( #631 )
2020-11-27 17:26:53 +01:00
Branden Chan
9fbd845ef3
Clean API docs and increase coverage ( #621 )
...
* Fix docstrings
* Fix docstrings
* docstrings for retrievers and docstores
* Clean and add more docstrings
2020-11-27 17:17:58 +01:00
Tanay Soni
fa55de2fab
Add refresh_type param for Elasticsearch update_embeddings() ( #630 )
2020-11-27 16:10:04 +01:00
brandenchan
ce6cba227f
Fix website typo
2020-11-27 16:07:29 +01:00
Markus Paff
88d0ee2c98
Add boxes for recommendations ( #629 )
...
* add boxes for recommendations
* add more recommendation boxes
Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-11-27 16:00:20 +01:00
Malte Pietsch
58bc9aa7f0
Add contributor hall of fame ( #628 )
2020-11-26 14:52:20 +01:00
Ky-Anh Huynh
0edd127f35
Add formatting checks for shell scripts ( #627 )
2020-11-26 14:36:35 +01:00
Ky-Anh Huynh
4bd4a61e65
README: Fix link to roadmap ( #626 )
...
Co-authored-by: Ky-Anh Huynh <kyanh.huynh@viettug.org>
2020-11-26 14:01:05 +01:00
Tanay Soni
ea976ba5b5
Add return_embedding parameter for get_all_documents() ( #615 )
2020-11-26 10:32:30 +01:00
Branden Chan
09690b84b4
Move DPR embeddings from GPU to CPU straight away ( #618 )
...
* Start
* Move embeddings from gpu to cpu
2020-11-25 14:22:43 +01:00
Branden Chan
ae530c3a41
Fix docstring examples ( #604 )
...
* Fix docstring examples
* Unify code example format
* Add md files
2020-11-25 14:19:49 +01:00
Markus Paff
3dee284f20
cleaning the api docs ( #616 )
2020-11-24 18:49:14 +01:00
Branden Chan
e192387e65
Fix link ( #613 )
2020-11-24 11:11:20 +01:00
Tanay Soni
e3a68aedaf
Add support for building custom Search Pipelines ( #596 )
2020-11-20 17:41:08 +01:00
Guillim
65cf9547d2
Allow setting return_no_answers for TransformersReader in REST API (SQuAD 1.0 format) ( #609 )
...
* Update config.py
* new option
Allow a new option in the settings: specify whether a reader model can return a "no answer" like SQuAD 2.0 models, or whether it is only a SQuAD 1.0-like model that always gives an answer.
2020-11-20 14:09:39 +01:00
Branden Chan
1e8af84ecc
Make more changes to documentation ( #578 )
...
* First batch of changes
* Add RAG tutorial links
* Prettify RAG tutorial
* draft of generator doc
* Add text
* Complete generator page
* Create optimization section
* Split intro
* Fix formatting tutorial 7
2020-11-19 14:58:27 +01:00
Branden Chan
2aa3c071fd
Remove column in benchmark website ( #608 )
...
* Make benchmarks clearer
* remove column
2020-11-19 12:18:47 +01:00