35 Commits

Author SHA1 Message Date
Lalit Pagaria
e904deefa7
Add Markdown file convertor (#875) 2021-03-23 16:31:26 +01:00
Timo Moeller
7b559fa4e8
Improve dpr conversion (#826)
* Bugfix dpr conversion

* Add latest docstring and tutorial changes

* Fix preprocessor changes
2021-03-18 14:51:01 +01:00
oryx1729
e9f0076dbd
Fix execution of Pipelines with parallel nodes (#901) 2021-03-18 12:41:30 +01:00
oryx1729
4b188b8102
Add runtime parameters to component initialization (#873) 2021-03-04 12:18:12 +01:00
Branden Chan
325a4e4d14
Add Milvus Documentation (#838)
* First commit

* Add latest docstring and tutorial changes

* Add DocStore external setup info

* fixed tabs

* Add Milvus recommendation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
2021-02-24 11:43:40 +01:00
Malte Pietsch
e641bff7a6
Allow more options for elasticsearch client (auth, multiple hosts) (#845)
* allow more options for elasticsearch client (auth, multiple hosts)

* Add latest docstring and tutorial changes

* fix mypy

* Add latest docstring and tutorial changes

* test client connection via ping()

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-19 14:29:59 +01:00
Tanay Soni
07907f9eac
Add support for indexing pipelines (#816) 2021-02-16 16:24:28 +01:00
Lalit Pagaria
5bd94ac5f7
Adding Translator (standalone component & wrapper for pipelines) (#782)
* Adding translator with many generic input parameter support

* Making dict_key as generic

* Fixing mypy issue

* Adding pipeline and using opus models

* Add latest docstring and tutorial changes

* Adding test cases for end-to-end translation for generator, summarizer etc

* raise error join and merge nodes

* Fix test failure

* add docstrings. add usage documentation. rm skip_special_tokens param

* Add latest docstring and tutorial changes

* fix code snippets in md

* Adding few extra configuration parameters and fixing tests

* Fixing mypy issue and updating usage document

* fix for mypy issue in pipeline.py

* reverting renaming of pytest_collection_modifyitems method

* Addressing review comments

* setting skip_special_tokens to True

* removing model_max_length argument as None type is not supported by many models

* Removing padding parameter. Better to leave it as default, otherwise it causes a tensor size mismatch error. If this option is required by users, it can be added later.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 15:58:26 +01:00
Pavel Soriano
8adf5b4737
Allow non-standard Tokenizers (e.g. CamemBERT) for DPR via new arg (#811)
* added parameter to infer DPR tokenizers class

* Add latest docstring and tutorial changes

* Update docstring. fix mypy

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 14:17:55 +01:00
Tanay Soni
fd5c5dd23c
Introduce incremental updates for embeddings in document stores (#812) 2021-02-09 21:25:01 +01:00
Malte Pietsch
ac9f92466f
Allow custom encoding for pdftotext (Russian characters, German umlauts etc). Fix version in download instructions (#813)
* fix encoding of pdftotext. fix version in download instructions

* fix test

* Add latest docstring and tutorial changes

* make latin-1 default encoding again

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 13:42:43 +01:00
Tanay Soni
7b18e324f2
Fix building Pipeline with YAML (#800) 2021-02-04 11:53:51 +01:00
Tanay Soni
8a5dc8f826
Load Pipeline with YAML config file (#785) 2021-02-02 17:32:17 +01:00
Malte Pietsch
1318b55eec
Make tqdm progress bars optional (less verbose prod logs) (#796)
* make dpr queries less verbose

* add progress bar flag to more components

* Add latest docstring and tutorial changes

* add type

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 20:51:55 +01:00
Tanay Soni
b87dd244c1
Get metadata values for a key from Elasticsearch (#776) 2021-02-01 16:13:26 +01:00
Tanay Soni
d62355ca88
Fix mypy typing (#792) 2021-02-01 12:15:36 +01:00
Branden Chan
1dc74c7067
Add model versioning support (#784)
* Add model versioning support

* Add latest docstring and tutorial changes

* Support DPR versioning

* Add RAG versioning support

* Add latest docstring and tutorial changes

* Add summarizer support

* Add Embedding Retriever support

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 11:42:36 +01:00
Lalit Pagaria
9f7f95221f
Milvus integration (#771)
* Initial commit for Milvus integration

* Add latest docstring and tutorial changes

* Updating implementation of Milvus document store

* Add latest docstring and tutorial changes

* Adding tests and updating doc string

* Add latest docstring and tutorial changes

* Fixing issue caught by tests

* Addressing review comments

* Fixing mypy detected issue

* Fixing issue caught in test about sorting of vector ids

* fixing test

* Fixing generator test failure

* update docstrings

* Addressing review comments about multiple network calls while fetching embedding from milvus server

* Add latest docstring and tutorial changes

* Ignoring mypy issue while converting vector_id to int

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-29 13:29:12 +01:00
Tanay Soni
46307d1571
Remove quotes around placeholders in Elasticsearch custom query (#762) 2021-01-25 12:46:43 +01:00
Tanay Soni
f0aa879a1c
Fix delete_all_documents for the SQLDocumentStore (#761) 2021-01-22 14:39:24 +01:00
Markus Paff
aee90c5df9
Docs v0.7.0 (#757)
* new docs version

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-22 10:28:33 +01:00
Markus Paff
0b583b8972
Generate docstrings and deploy to branches to Staging (Website) (#731)
* test pre commit hook

* test status

* test on this branch

* push generated docstrings and tutorials to branch

* fixed syntax error

* Add latest docstring and tutorial changes

* add files before commit

* catch commit error

* separate generation from deployment

* add deployment process for staging

* add current branch to payload

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-21 11:01:09 +01:00
Markus Paff
3af3ee1a12
Automate docstring and tutorial generation with every push to master (#718)
* automate docstring and tutorial generation with every push to master

* test CI for current branch

* fixed yaml syntax

* add setuptools to install process

* checkout repo

* fixed command for shell script

* install wheel as it is needed for CI

* install mkdocs

* test without shell script

* use package from github actions

* test other configuration

* back to right config

* cleaning script
2021-01-11 16:25:43 +01:00
Branden Chan
bb8aba18e0
Create Preprocessing Tutorial (#706)
* WIP: First version of preprocessing tutorial

* stride renamed overlap, ipynb and py files created

* rename split_stride in test

* Update preprocessor api documentation

* define order for markdown files

* define order of modules in api docs

* Add colab links

* Incorporate review feedback

Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-01-06 15:54:05 +01:00
Markus Paff
b752da1cd5
Add docs v0.6.0 (#689)
* new docs version

* updated directory structure

* Add pipelines page

* Add Finder deprecation suggestion

* header for pipelines file

* Document MySQL support

* Mention DPR train tutorial coming soon

* Mention open distro ES

* Update doc strings regarding similarity fn

* Add link to API docs

* Wrap pipelines docs in box

* add api reference for pipelines

* copied latest version to v0.6.0

* Remove space

* Remove space

* Copy to v0.6.0

Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-18 12:47:27 +01:00
Branden Chan
5e5dba9587
Add api md (#631) 2020-11-27 17:26:53 +01:00
Branden Chan
ae530c3a41
Fix docstring examples (#604)
* Fix docstring examples

* Unify code example format

* Add md files
2020-11-25 14:19:49 +01:00
Markus Paff
3dee284f20
cleaning the api docs (#616) 2020-11-24 18:49:14 +01:00
Branden Chan
99e924aede
Update Documentation for Haystack 0.5.0 (#557)
* Add languages and preprocessing pages

* add content

* address review comments

* make link relative

* update api ref with latest docstrings

* move doc readme and update

* add generator API docs

* fix example code

* design and link fix

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2020-11-06 10:53:22 +01:00
Markus Paff
40c5c8edb4
Added new formatting for examples in docstrings (#555) 2020-11-05 15:50:08 +01:00
Lalit Pagaria
63c12371b9
Change arg "model" to "model_name_or_path" in TransformersReader (#510)
* Consistent parameter naming for TransformersReader along with removing unused imports as well.

* Addressing review comments
2020-10-21 17:15:35 +02:00
Malte Pietsch
3434d5205d
Update doc string for ElasticsearchDocumentStore.write_documents() & sync markdown files (#501)
* update doc string for ElasticsearchDocumentStore.write_documents()

* update all markdowns with latest docstrings
2020-10-19 13:56:38 +02:00
Markus Paff
56852f820b
README for Docstring Generation and remove unneeded files (#468) 2020-10-06 15:16:56 +02:00
Markus Paff
66a1893f79
Moved files to api directory (#418) 2020-09-22 11:48:26 +02:00
Branden Chan
7fdb85d63a
Create documentation website (#272)
* Skeleton of doc website

* Flesh out documentation pages

* Split concepts into their own rst files

* add tutorial rsts

* Consistent level 1 markdown headers in tutorials

* Change theme to readthedocs

* Turn bullet points into prose

* Populate sections

* Add more text

* Add more sphinx files

* Add more retriever documentation

* combined all documentation in one structure

* rename of src to _src as it was ignored by git

* Incorporate MP2's changes

* add benchmark bar charts

* Adapt docstrings in Readers

* Improvements to intro, creation of glossary

* Adapt docstrings in Retrievers

* Adapt docstrings in Finder

* Adapt Docstrings of Finder

* Updates to text

* Edit text

* update doc strings

* proof read tutorials

* Edit text

* Edit text

* Add stacked chart

* populate graph with data

* Switch Documentation to markdown (#386)

* add way to generate markdown files to sphinx

* changed from rst to markdown and extended sphinx for it

* fix spelling

* Clean titles

* delete file

* change spelling

* add sections to document store usage

* add basic rest api docs

* fix readme in setup.py

* Update Tutorials

* Change section names

* add windows note to pip install

* update intro

* new renderer for markdown files

* Fix typos

* delete dpr_utils.py

* fix windows note in get started

* Fix docstrings

* deleted rest api docs in api

* fixed typo

* Fix docstring

* revert readme to rst

* Fix readme

* Update setup.py

Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-09-18 12:57:32 +02:00