53 Commits

Author SHA1 Message Date
Branden Chan
2c4baa7f4e
Regenerate API and Tutorial md files (#1480)
* Change punctuation

* Add latest docstring and tutorial changes

* Change punctuation

* Add documentation for Docs2Answer

* Add latest docstring and tutorial changes

* Generate new API docs

* Replace Finder with Pipeline

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 14:42:18 +02:00
Markus Paff
39845c0624
Automate updates docstrings tutorials (#1461)
* remove not needed githab actions and reactivate docstrings and tutorial generation

* test workflow

* update pydoc version

* update python version

* update watchdog

* move to latest version pydoc-markdown

* remove version check

* Add latest docstring and tutorial changes

* remove test workflow

* test for param docstrings

* pin pydoc-markdown version

* add test workflow

* pin watchdog version

* Add latest docstring and tutorial changes

* update original workflow and delete test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-17 13:44:31 +02:00
Ikram Ali
3fc7f3f695
[docs] crawler api docs updated. (#1388) 2021-09-01 12:07:32 +02:00
Branden Chan
10e332dabb
Fix Links (#1199)
* Fix link highlight

* Regen md files

* Remove duplicate

* Fix whitespace

* fixing strings for website

* Fix link

Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-06-23 19:07:54 +02:00
Markus Paff
6cd49105e7
update api markdown files and add markdown file for ranker (#1198)
* update api markdown files and add markdown file for ranker

* added docstrings for weaviate

* new version of pydoc-markdown does not render arguments correctly. We used pydoc-markdown==3.11.0
2021-06-15 17:50:08 +02:00
vblagoje
2a5882578a
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086)
* Integrate LFQA with Haystack

* Integrate LFQA with Haystack - unit tests

* Properly initialize conftest default value for vector_dim

* Update PR after inital feedback

* Fix conftest.py import

* Seq2SeqGenerator uses Callables instead of subclasses for custom model input

* Update docstring

* Fix Callable use

* Add LFQA tutorials

* Improve type error reporting for invalid input converter Callable

* Generate docstrings

* Format comments in tutorial script

* Generate tutorial md

* Add usage page

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-06-14 17:53:43 +02:00
Branden Chan
5f0f85989a
Refresh API docs (#1152) 2021-06-09 16:13:58 +02:00
Lalit Pagaria
f46b09c756
Using text hash as id to prevent document duplication (#1000)
* using text hash as id to prevent document duplication. Also providing a way customize it.

* Add latest docstring and tutorial changes

* Fixing duplicate value test when text is same

* Adding test for duplicate ids in document store

* Changing exception to generic Exception type

* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-17 17:51:52 +02:00
Branden Chan
869b493b61
Regen api docs (#1015) 2021-04-30 12:35:13 +02:00
Branden Chan
9626c0d65e
Update Documentation (#976)
* Add api pages

* Add latest docstring and tutorial changes

* First sweep of usage docs

* Add link to conversion script

* Add import statements

* Add summarization page

* Add web crawler documentation

* Add confidence scores usage

* Add crawler api docs

* Regenerate api docs

* Update summarizer and translator api

* Add api pages

* Add latest docstring and tutorial changes

* First sweep of usage docs

* Add link to conversion script

* Add import statements

* Add summarization page

* Add web crawler documentation

* Add confidence scores usage

* Add crawler api docs

* Regenerate api docs

* Update summarizer and translator api

* Add indentation (pydoc-markdown 3.10.1)

* Comment out metadata

* Remove Finder deprecation message

* Remove Finder in FAQ

* Update tutorial link

* Incorporate reviewer feedback

* Regen api docs

* Add type annotations

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 16:45:29 +02:00
Branden Chan
77d4c2ca1c
Benchmark milvus (#850)
* Add milvus benchmarking support

* Add latest docstring and tutorial changes

* Edit config

* Disable docker interactive mode

* Add milvus index type support

* Adjust FAISS and Milvus node branching

* Remove duplicate in config

* Revert method for speedup

* Add latest docstring and tutorial changes

* Add latest benchmark run

* Add latest docstring and tutorial changes

* Add json files

* Revert "Add latest docstring and tutorial changes"

This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923.

* Add latest docstring and tutorial changes

* Revert "Add latest docstring and tutorial changes"

This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b.

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-13 14:54:15 +02:00
Markus Paff
dfb0282b74
Update milvus links and docstrings (#959)
* update milvus links and docstrings

* Add latest docstring and tutorial changes

* new milvus version

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-12 14:38:57 +02:00
Timo Moeller
837dea4e6d
Integrate sentence transformers into benchmarks (#843)
* Integrate sentence transformers into benchmarks

* Add doc store asserts

* switch data downloads from s3 client to https. add license info

* Fix mypy, revert config

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-09 17:24:16 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines (#922) 2021-04-07 17:53:32 +02:00
Julian Risch
64ad953c6a
Adding indentation to markup files (#947) 2021-04-07 11:36:11 +02:00
Timo Moeller
5d2b16f3cc
Update farm version (#936)
* Update farm version

* Add new DPR loading, fix dpr param name

* Add QA model confidence as answer probability, fix prams in test
2021-04-01 18:23:05 +02:00
Branden Chan
d77152c469
WIP: Add evaluation nodes for Pipelines (#904)
* Add main eval fns

* WIP: make pipeline_eval.py run

* Fix typo

* Add support for no_answers

* Add latest docstring and tutorial changes

* Working pipeline eval

* Add timing of nodes

* Add latest docstring and tutorial changes

* Refactor and clean

* Update tutorial script

* Set default params

* Update tutorials

* Fix indent

* Add latest docstring and tutorial changes

* Address mypy issues

* Add test

* Fix mypy error

* Clear outputs

* Add doc strings

* Incorporate reviewer feedback

* Add latest docstring and tutorial changes

* Revert query counting

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-01 17:35:18 +02:00
Timo Moeller
1244d16010
Better default value for mp chunksize (#923)
* Better default value for mp chunksize

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-25 19:00:45 +01:00
Lalit Pagaria
e904deefa7
Add Markdown file convertor (#875) 2021-03-23 16:31:26 +01:00
Timo Moeller
7b559fa4e8
Improve dpr conversion (#826)
* Bugfix dpr conversion

* Add latest docstring and tutorial changes

* Fix preprocessor changes
2021-03-18 14:51:01 +01:00
oryx1729
e9f0076dbd
Fix execution of Pipelines with parallel nodes (#901) 2021-03-18 12:41:30 +01:00
oryx1729
4b188b8102
Add runtime parameters to component initialization (#873) 2021-03-04 12:18:12 +01:00
Branden Chan
325a4e4d14
Add Milvus Documentation (#838)
* First commit

* Add latest docstring and tutorial changes

* Add DocStore external setup info

* fixed tabs

* Add Milvus recommendation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
2021-02-24 11:43:40 +01:00
Malte Pietsch
e641bff7a6
Allow more options for elasticsearch client (auth, multiple hosts) (#845)
* allow more options for elasticsearch client (auth, multiple hosts)

* Add latest docstring and tutorial changes

* fix mypy

* Add latest docstring and tutorial changes

* test client connection via ping()

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-19 14:29:59 +01:00
Tanay Soni
07907f9eac
Add support for indexing pipelines (#816) 2021-02-16 16:24:28 +01:00
Lalit Pagaria
5bd94ac5f7
Adding Translator (standalone component & wrapper for pipelines) (#782)
* Adding translator with many generic input parameter support

* Making dict_key as generic

* Fixing mypy issue

* Adding pipeline and using opus models

* Add latest docstring and tutorial changes

* Adding test cases for end-to-end translation for generator, summerizer etc

* raise error join and merge nodes

* Fix test failure

* add docstrings. add usage documentation. rm skip_special_tokens param

* Add latest docstring and tutorial changes

* fix code snippets in md

* Adding few extra configuration parameters and fixing tests

* Fixingmypy issue and updating usage document

* fix for mypy issue in pipeline.py

* reverting renaming of pytest_collection_modifyitems method

* Addressing review comments

* setting skip_special_tokens to True

* removing model_max_length argument as None type is not supported to many models

* Removing padding parameter. Better to leave it as default otherwise it cause tensor size miss match error. If this option required by used then it can be added later.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 15:58:26 +01:00
Pavel Soriano
8adf5b4737
Allow non-standard Tokenizers (e.g. CamemBERT) for DPR via new arg (#811)
* added parameter to infer DPR tokenizers class

* Add latest docstring and tutorial changes

* Update docstring. fix mypy

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 14:17:55 +01:00
Tanay Soni
fd5c5dd23c
Introduce incremental updates for embeddings in document stores (#812) 2021-02-09 21:25:01 +01:00
Malte Pietsch
ac9f92466f
Allow custom encoding for pdftotext (Russian characters, German umlauts etc). Fix version in download instructions (#813)
* fix encoding of pdftotext. fix version in download instructions

* fix test

* Add latest docstring and tutorial changes

* make latin-1 default encoding again

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 13:42:43 +01:00
Tanay Soni
7b18e324f2
Fix building Pipeline with YAML (#800) 2021-02-04 11:53:51 +01:00
Tanay Soni
8a5dc8f826
Load Pipeline with YAML config file (#785) 2021-02-02 17:32:17 +01:00
Malte Pietsch
1318b55eec
Make tqdm progress bars optional (less verbose prod logs) (#796)
* make dpr queries less verbose

* add progress bar flag to more components

* Add latest docstring and tutorial changes

* add type

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 20:51:55 +01:00
Tanay Soni
b87dd244c1
Get metadata values for a key from Elasticsearch (#776) 2021-02-01 16:13:26 +01:00
Tanay Soni
d62355ca88
Fix mypy typing (#792) 2021-02-01 12:15:36 +01:00
Branden Chan
1dc74c7067
Add model versioning support (#784)
* Add model versioning support

* Add latest docstring and tutorial changes

* Support DPR versioning

* Add RAG versioning support

* Add latest docstring and tutorial changes

* Add summarizer support

* Add Embedding Retriever support

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-01 11:42:36 +01:00
Lalit Pagaria
9f7f95221f
Milvus integration (#771)
* Initial commit for Milvus integration

* Add latest docstring and tutorial changes

* Updating implementation of Milvus document store

* Add latest docstring and tutorial changes

* Adding tests and updating doc string

* Add latest docstring and tutorial changes

* Fixing issue caught by tests

* Addressing review comments

* Fixing mypy detected issue

* Fixing issue caught in test about sorting of vector ids

* fixing test

* Fixing generator test failure

* update docstrings

* Addressing review comments about multiple network call while fetching embedding from milvus server

* Add latest docstring and tutorial changes

* Ignoring mypy issue while converting vector_id to int

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-29 13:29:12 +01:00
Tanay Soni
46307d1571
Remove quotes around placeholders in Elasticsearch custom query (#762) 2021-01-25 12:46:43 +01:00
Tanay Soni
f0aa879a1c
Fix delete_all_documents for the SQLDocumentStore (#761) 2021-01-22 14:39:24 +01:00
Markus Paff
aee90c5df9
Docs v0.7.0 (#757)
* new docs version

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-22 10:28:33 +01:00
Markus Paff
0b583b8972
Generate docstrings and deploy to branches to Staging (Website) (#731)
* test pre commit hook

* test status

* test on this branch

* push generated docstrings and tutorials to branch

* fixed syntax error

* Add latest docstring and tutorial changes

* add files before commit

* catch commit error

* separate generation from deployment

* add deployment process for staging

* add current branch to payload

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-21 11:01:09 +01:00
Markus Paff
3af3ee1a12
Automate docstring and tutorial generation with every push to master (#718)
* automate docstring and tutorial generation with every push to master

* test CI for current branch

* fixed yaml syntax

* add setupttools to install process

* checkout repo

* fixed command for shell script

* install wheel as it is needed for CI

* install mkdocs

* test without shell script

* use package from github actions

* test other configuration

* back to right config

* cleaning script
2021-01-11 16:25:43 +01:00
Branden Chan
bb8aba18e0
Create Preprocessing Tutorial (#706)
* WIP: First version of preprocessing tutorial

* stride renamed overlap, ipynb and py files created

* rename split_stride in test

* Update preprocessor api documentation

* define order for markdown files

* define order of modules in api docs

* Add colab links

* Incorporate review feedback

Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-01-06 15:54:05 +01:00
Markus Paff
b752da1cd5
Add docs v0.6.0 (#689)
* new docs version

* updated directory structure

* Add pipelines page

* Add Finder deprecation suggestion

* header for pipelines file

* Document MySQL support

* Mention DPR train tutorial coming soon

* Mention open distro ES

* Update doc strings regarding similarity fn

* Add link to API docs

* Wrap pipelines docs in box

* add api reference for pipelines

* copied latest version to v0.6.0

* Remove space

* Remove space

* Copy to v0.6.0

Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-18 12:47:27 +01:00
Branden Chan
5e5dba9587
Add api md (#631) 2020-11-27 17:26:53 +01:00
Branden Chan
ae530c3a41
Fix docstring examples (#604)
* Fix docstring examples

* Unify code example format

* Add md files
2020-11-25 14:19:49 +01:00
Markus Paff
3dee284f20
cleaning the api docs (#616) 2020-11-24 18:49:14 +01:00
Branden Chan
99e924aede
Update Documentation for Haystack 0.5.0 (#557)
* Add languages and preprocessing pages

* add content

* address review comments

* make link relative

* update api ref with latest docstrings

* move doc readme and update

* add generator API docs

* fix example code

* design and link fix

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2020-11-06 10:53:22 +01:00
Markus Paff
40c5c8edb4
Added new formatting for examples in docstrings (#555) 2020-11-05 15:50:08 +01:00
Lalit Pagaria
63c12371b9
Change arg "model" to "model_name_or_path" in TransformersReader (#510)
* Consistent parameter naming for TransformersReader along with removing unused imports as well.

* Addressing review comments
2020-10-21 17:15:35 +02:00
Malte Pietsch
3434d5205d
Update doc string for ElasticsearchDocumentStore.write_documents() & sync markdown files (#501)
* update doc string for ElasticsearchDocumentStore.write_documents()

* update all markdowns with latest docstrings
2020-10-19 13:56:38 +02:00