Pavel Soriano
8adf5b4737
Allow non-standard Tokenizers (e.g. CamemBERT) for DPR via new arg ( #811 )
* added parameter to infer DPR tokenizers class
* Add latest docstring and tutorial changes
* Update docstring. fix mypy
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 14:17:55 +01:00
Branden Chan
c807f0d050
Add key concepts diagram
2021-02-12 12:49:22 +01:00
Branden Chan
a1983ad84e
Add new images
2021-02-11 15:10:00 +01:00
Tanay Soni
fd5c5dd23c
Introduce incremental updates for embeddings in document stores ( #812 )
2021-02-09 21:25:01 +01:00
Malte Pietsch
e91518ee00
Update tutorials (torch versions, ES version, replace Finder with Pipeline) ( #814 )
* remove manual torch install on colab
* update elasticsearch version everywhere to 7.9.2
* fix FAQPipeline
* update tutorials with new pipelines
* Add latest docstring and tutorial changes
* revert faqpipeline change. fix field names in tutorial 4
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 14:56:54 +01:00
Malte Pietsch
ac9f92466f
Allow custom encoding for pdftotext (Russian characters, German umlauts, etc.). Fix version in download instructions ( #813 )
* fix encoding of pdftotext. fix version in download instructions
* fix test
* Add latest docstring and tutorial changes
* make latin-1 default encoding again
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 13:42:43 +01:00
Tanay Soni
7b18e324f2
Fix building Pipeline with YAML ( #800 )
2021-02-04 11:53:51 +01:00
Branden Chan
f3a3b73d9b
Choose correct similarity fns during benchmark runs & re-run benchmarks ( #773 )
* Adapt to new dataset_from_dicts return signature
* rename fn
* Align similarity fn in benchmark doc store
* Better choice of similarity fn
* Increase postgres wait time
* Add more expected returned variables
* update benchmark results
* Fix typo
* update all benchmark runs
* multiply stats by 100
* Specify similarity fns for website
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-03 11:45:18 +01:00
Tanay Soni
8a5dc8f826
Load Pipeline with YAML config file ( #785 )
2021-02-02 17:32:17 +01:00
Malte Pietsch
1318b55eec
Make tqdm progress bars optional (less verbose prod logs) ( #796 )
* make dpr queries less verbose
* add progress bar flag to more components
* Add latest docstring and tutorial changes
* add type
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 20:51:55 +01:00
Tanay Soni
b87dd244c1
Get metadata values for a key from Elasticsearch ( #776 )
2021-02-01 16:13:26 +01:00
Tanay Soni
d62355ca88
Fix mypy typing ( #792 )
2021-02-01 12:15:36 +01:00
Branden Chan
1dc74c7067
Add model versioning support ( #784 )
* Add model versioning support
* Add latest docstring and tutorial changes
* Support DPR versioning
* Add RAG versioning support
* Add latest docstring and tutorial changes
* Add summarizer support
* Add Embedding Retriever support
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 11:42:36 +01:00
Lalit Pagaria
9f7f95221f
Milvus integration ( #771 )
* Initial commit for Milvus integration
* Add latest docstring and tutorial changes
* Updating implementation of Milvus document store
* Add latest docstring and tutorial changes
* Adding tests and updating doc string
* Add latest docstring and tutorial changes
* Fixing issue caught by tests
* Addressing review comments
* Fixing mypy detected issue
* Fixing issue caught in test about sorting of vector ids
* fixing test
* Fixing generator test failure
* update docstrings
* Addressing review comments about multiple network calls while fetching embeddings from the Milvus server
* Add latest docstring and tutorial changes
* Ignoring mypy issue while converting vector_id to int
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-29 13:29:12 +01:00
brandenchan
6efa4f06c1
Add Streamlit UI Image
2021-01-27 17:01:29 +01:00
Tanay Soni
46307d1571
Remove quotes around placeholders in Elasticsearch custom query ( #762 )
2021-01-25 12:46:43 +01:00
Tanay Soni
f0aa879a1c
Fix delete_all_documents for the SQLDocumentStore ( #761 )
2021-01-22 14:39:24 +01:00
Markus Paff
aee90c5df9
Docs v0.7.0 ( #757 )
* new docs version
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-22 10:28:33 +01:00
Markus Paff
0b583b8972
Generate docstrings and deploy to branches to Staging (Website) ( #731 )
* test pre commit hook
* test status
* test on this branch
* push generated docstrings and tutorials to branch
* fixed syntax error
* Add latest docstring and tutorial changes
* add files before commit
* catch commit error
* separate generation from deployment
* add deployment process for staging
* add current branch to payload
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-21 11:01:09 +01:00
Branden Chan
a3a12bc95b
Remove broken link
2021-01-13 17:32:10 +01:00
brandenchan
01fd9940d8
Fix tutorial link
2021-01-13 15:29:25 +01:00
Branden Chan
7376185b65
Create DPR training tutorial ( #708 )
* WIP: Start DPR training tutorial
* Create basics of DPR Train tutorial
* Update documentation
* Allow DPR to be initialized without document store
* WIP: Add param descriptions to DPR notebook
* Clean tutorial
* Improve loading
* Make doc store optional when loading DPR
* Satisfy mypy type check
* Add links
* Add tutorial header
* Add colab badge
* Clear outputs
* Incorporate reviewer feedback
* Add readme links
* Regenerate tutorials
* Add excitement
* Fix typo
* Fix hard negatives comment
* Wrap tutorial for windows users
* Fix mypy issue
2021-01-13 10:33:55 +01:00
Markus Paff
3af3ee1a12
Automate docstring and tutorial generation with every push to master ( #718 )
* automate docstring and tutorial generation with every push to master
* test CI for current branch
* fixed yaml syntax
* add setuptools to install process
* checkout repo
* fixed command for shell script
* install wheel as it is needed for CI
* install mkdocs
* test without shell script
* use package from github actions
* test other configuration
* back to right config
* cleaning script
2021-01-11 16:25:43 +01:00
Branden Chan
bb8aba18e0
Create Preprocessing Tutorial ( #706 )
* WIP: First version of preprocessing tutorial
* stride renamed overlap, ipynb and py files created
* rename split_stride in test
* Update preprocessor api documentation
* define order for markdown files
* define order of modules in api docs
* Add colab links
* Incorporate review feedback
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-01-06 15:54:05 +01:00
Malte Pietsch
a2e5e6b09e
Update pipeline documentation and readme ( #693 )
* Update README.md
* Update pipelines.md
* Update pipelines.md
* Update README.md
2020-12-22 13:34:28 +01:00
Markus Paff
b752da1cd5
Add docs v0.6.0 ( #689 )
* new docs version
* updated directory structure
* Add pipelines page
* Add Finder deprecation suggestion
* header for pipelines file
* Document MySQL support
* Mention DPR train tutorial coming soon
* Mention open distro ES
* Update doc strings regarding similarity fn
* Add link to API docs
* Wrap pipelines docs in box
* add api reference for pipelines
* copied latest version to v0.6.0
* Remove space
* Remove space
* Copy to v0.6.0
Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-18 12:47:27 +01:00
Branden Chan
d8154939fc
Scale dot product into probabilities ( #667 )
* scale dot product
* Add tip in documentation
* Add recommendation boxes
* WIP: Use similarity attribute in all doc stores
* Implement similarity for InMemoryDS
* Add FAISS support
* Clean printout
* Update documentation
* Implement document field map
2020-12-11 12:10:24 +01:00
Malte Pietsch
149d98a0fd
Add latest benchmark run ( #652 )
* add latest benchmark run
* update templates and fix small json errors
* Change scale
Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-10 16:25:51 +01:00
Branden Chan
8c904d79d6
Fix links ( #663 )
2020-12-08 10:28:31 +01:00
Malte Pietsch
e6ada08d0e
Update query arg in Tutorial 7 ( #656 )
2020-12-04 08:42:09 +01:00
Branden Chan
79555148ac
Add link to FAISS Info in documentation ( #643 )
* Add link to FAISS info
* Clean link
2020-12-02 15:24:22 +01:00
brandenchan
cdd009d1ef
Better payload example spacing
2020-12-01 13:07:29 +01:00
Branden Chan
e573c9e27d
Improve User Feedback Documentation ( #539 )
* Extend docs
* Add User Feedback API calls
* Incorporate reviewer feedback
2020-12-01 12:55:31 +01:00
Branden Chan
5e5dba9587
Add api md ( #631 )
2020-11-27 17:26:53 +01:00
brandenchan
ce6cba227f
Fix website typo
2020-11-27 16:07:29 +01:00
Markus Paff
88d0ee2c98
Add boxes for recommendations ( #629 )
* add boxes for recommendations
* add more recommendation boxes
Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-11-27 16:00:20 +01:00
Branden Chan
ae530c3a41
Fix docstring examples ( #604 )
* Fix docstring examples
* Unify code example format
* Add md files
2020-11-25 14:19:49 +01:00
Markus Paff
3dee284f20
cleaning the api docs ( #616 )
2020-11-24 18:49:14 +01:00
Branden Chan
1e8af84ecc
Make more changes to documentation ( #578 )
* First batch of changes
* Add RAG tutorial links
* Prettify RAG tutorial
* draft of generator doc
* Add text
* Complete generator page
* Create optimization section
* Split intro
* Fix formatting tutorial 7
2020-11-19 14:58:27 +01:00
Branden Chan
2aa3c071fd
Remove column in benchmark website ( #608 )
* Make benchmarks clearer
* remove column
2020-11-19 12:18:47 +01:00
Branden Chan
827a40b12a
Make benchmarks clearer ( #606 )
2020-11-19 10:31:43 +01:00
brandenchan
090a8cf3e9
Revert "First batch of changes"
This reverts commit c07182aa0ab77106cdb142f4ca43ff02476e6fbf.
2020-11-12 12:27:16 +01:00
brandenchan
c07182aa0a
First batch of changes
2020-11-12 12:07:02 +01:00
Malte Pietsch
ea0fd405d8
add concept sketch
2020-11-07 08:42:01 +01:00
Markus Paff
4cca3b5290
New docs version v0.5.0 ( #560 )
2020-11-06 13:17:04 +01:00
Branden Chan
99e924aede
Update Documentation for Haystack 0.5.0 ( #557 )
* Add languages and preprocessing pages
* add content
* address review comments
* make link relative
* update api ref with latest docstrings
* move doc readme and update
* add generator API docs
* fix example code
* design and link fix
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2020-11-06 10:53:22 +01:00
Markus Paff
40c5c8edb4
Added new formatting for examples in docstrings ( #555 )
2020-11-05 15:50:08 +01:00
Malte Pietsch
df13a6830d
Update annotation docs for website ( #505 )
* update annotation docs for website
* add md file for docs
* add user manual
2020-11-03 11:24:06 +01:00
Malte Pietsch
50709a3f9d
Fix retriever mAP benchmarks
2020-11-02 19:55:58 +01:00
Branden Chan
3793205aa3
Merge branch 'master' into fix_website
2020-10-29 10:29:25 +01:00