267 Commits

Author SHA1 Message Date
Branden Chan
10e332dabb
Fix Links (#1199)
* Fix link highlight

* Regen md files

* Remove duplicate

* Fix whitespace

* fixing strings for website

* Fix link

Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-06-23 19:07:54 +02:00
Markus Paff
6cd49105e7
update api markdown files and add markdown file for ranker (#1198)
* update api markdown files and add markdown file for ranker

* added docstrings for weaviate

* new version of pydoc-markdown does not render arguments correctly. We used pydoc-markdown==3.11.0
2021-06-15 17:50:08 +02:00
Branden Chan
7dbd58f6be
Add about sections (#1195) 2021-06-14 18:37:00 +02:00
vblagoje
2a5882578a
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086)
* Integrate LFQA with Haystack

* Integrate LFQA with Haystack - unit tests

* Properly initialize conftest default value for vector_dim

* Update PR after initial feedback

* Fix conftest.py import

* Seq2SeqGenerator uses Callables instead of subclasses for custom model input

* Update docstring

* Fix Callable use

* Add LFQA tutorials

* Improve type error reporting for invalid input converter Callable

* Generate docstrings

* Format comments in tutorial script

* Generate tutorial md

* Add usage page

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-06-14 17:53:43 +02:00
Bob van Luijt
f583d0bfaf
Minor change with a link to the Weaviate docs (#1180)
Super minor change, but in line with other DocumentStores
2021-06-11 21:20:23 +02:00
Branden Chan
e7937ac5d7
Reformat FAQ page (#1177)
* Add faq page

* Update faq.md

* Fix mypy CI

* Add question

* Reformat faq
2021-06-11 11:59:52 +02:00
Branden Chan
783893c3d2
Tutorial update (#1166)
* Add header / footer

* Add Milvus example

* Generate md files

* Fix mypy CI
2021-06-11 11:09:15 +02:00
Branden Chan
13edff109d
Documentation update (#1162)
* Add content

* Add German BERT references

* Mention preprocessor language

* Fix mypy CI

* Add document length recommendation

* Add more languages
2021-06-11 11:06:57 +02:00
Branden Chan
41b537affe
Add FAQ page (#1151)
* Add faq page

* Update faq.md

* Fix mypy CI

* Add question
2021-06-10 17:29:14 +02:00
venuraja79
49886f88f0
Integrate Weaviate as another DocumentStore (#1064)
* Annotation Tool: data is not persisted when using local version #853

* First version of weaviate

* Updated comments

* ran query, get and write tests

* update embeddings, dynamic schema and filters implemented

* Initial set of tests and fixes

* Tests added for update_embeddings and delete documents

* introduced duplicate documents fix

* fixed mypy errors

* Added Weaviate to requirements

* Fix the weaviate docker env variables

* Fixing test dependencies for now

* Created weaviate test marker and fixed query

* Update docstring

* Add documentation

* Bump up weaviate version

* Bump up weaviate version in documentation

* Upgrade weaviate version

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-10 09:43:53 +02:00
Branden Chan
5f0f85989a
Refresh API docs (#1152) 2021-06-09 16:13:58 +02:00
Julian Risch
580e28344d
Add docu of confidence scores and calibration method (#1131)
* Add docu of confidence scores and calibration method
2021-06-03 15:49:07 +02:00
Julian Risch
8e3d0d1287
Distinguish labels for calculating similarity scores (#1124)
* Distinguish labels for calculating similarity scores

* Explain label "0" and "1" of TextPairClassifier in Ranker
2021-06-02 17:33:36 +02:00
Branden Chan
b555bc525c
Remove duplicate run (#1132) 2021-06-02 13:58:55 +02:00
Branden Chan
9356f637d4
Update Milvus benchmarks (#1128)
* Update Milvus benchmarks

* Add sentence transformers

* Update sentence transformers index results

* Remove duplicate row
2021-06-02 13:09:45 +02:00
Julian Risch
84c34295a1
Re-ranking component for document search without QA (#1025)
* Adding ranker similar to retriever and reader

* Sort documents according to query-document similarity scores

* Reranking and model training runs for small example

* Added EvalRanker node

* Calculate recall@k in EvalRetriever and EvalRanker nodes

* Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers

* Added mean reciprocal rank as metric for EvalDocuments

* Fix bug that appeared when ranking documents with same score

* Remove commented code for unimplemented eval() of Ranker node

* Add documentation of k parameter in EvalDocuments

* Add Ranker docu and renaming top_k param
2021-05-31 15:31:36 +02:00
Avishekh Shrestha
c4ee32d47d
Fix typo in preprocessing.md (#1087)
Correct variable name from 'd' to 'doc' in line 134.
2021-05-23 19:16:58 +02:00
Lalit Pagaria
f46b09c756
Using text hash as id to prevent document duplication (#1000)
* using text hash as id to prevent document duplication. Also providing a way to customize it.

* Add latest docstring and tutorial changes

* Fixing duplicate value test when text is same

* Adding test for duplicate ids in document store

* Changing exception to generic Exception type

* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-17 17:51:52 +02:00
brandenchan
5b0b3e4616 Merge branch 'master' of https://github.com/deepset-ai/haystack 2021-04-30 16:41:05 +02:00
brandenchan
4cc853d1c3 Update link 2021-04-30 15:06:45 +02:00
Branden Chan
869b493b61
Regen api docs (#1015) 2021-04-30 12:35:13 +02:00
Mario Jäckle
a00703256f
docs(document_store): add usage information for aws elastic search (#1008)
Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
2021-04-30 11:38:25 +02:00
Branden Chan
056be3354b
Add pipelines tutorial (#1013) 2021-04-29 18:19:20 +02:00
Julian Risch
65f1da00cc
knowledge graph documentation (#979)
* Create knowledge_graph.md

* add doc strings to Text2SparqlRetriever

* Add doc strings to GraphDBKnowledgeGraph

* Make method calls unambiguous so it's clear which class is meant
2021-04-27 16:44:40 +02:00
Markus Paff
cf8a622e35
Streamlit UI Evaluation mode (#920)
* first running version of eval mode

* restructuring, new naming of elements and testing

* add new files to Docker, how to start with Haystack reference, remove unneeded dependencies

* Add latest docstring and tutorial changes

* merged changes

* fixing bugs after breaking changes from last release

* newer version of states in streamlit, more docs for eval mode, eval file as env variable

* eval file as env variable

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 17:30:17 +02:00
Branden Chan
9626c0d65e
Update Documentation (#976)
* Add api pages

* Add latest docstring and tutorial changes

* First sweep of usage docs

* Add link to conversion script

* Add import statements

* Add summarization page

* Add web crawler documentation

* Add confidence scores usage

* Add crawler api docs

* Regenerate api docs

* Update summarizer and translator api

* Add indentation (pydoc-markdown 3.10.1)

* Comment out metadata

* Remove Finder deprecation message

* Remove Finder in FAQ

* Update tutorial link

* Incorporate reviewer feedback

* Regen api docs

* Add type annotations

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 16:45:29 +02:00
Branden Chan
77d4c2ca1c
Benchmark milvus (#850)
* Add milvus benchmarking support

* Add latest docstring and tutorial changes

* Edit config

* Disable docker interactive mode

* Add milvus index type support

* Adjust FAISS and Milvus node branching

* Remove duplicate in config

* Revert method for speedup

* Add latest docstring and tutorial changes

* Add latest benchmark run

* Add latest docstring and tutorial changes

* Add json files

* Revert "Add latest docstring and tutorial changes"

This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923.

* Add latest docstring and tutorial changes

* Revert "Add latest docstring and tutorial changes"

This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b.

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-13 14:54:15 +02:00
Markus Paff
b87daed62b
fixed link to dpr (#962) 2021-04-13 09:45:04 +02:00
Markus Paff
dfb0282b74
Update milvus links and docstrings (#959)
* update milvus links and docstrings

* Add latest docstring and tutorial changes

* new milvus version

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-12 14:38:57 +02:00
Timo Moeller
837dea4e6d
Integrate sentence transformers into benchmarks (#843)
* Integrate sentence transformers into benchmarks

* Add doc store asserts

* switch data downloads from s3 client to https. add license info

* Fix mypy, revert config

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-09 17:24:16 +02:00
Julian Risch
d38c07e0ee
knowledge graph example (#934)
* Add knowledge graph module

* Fix type hint

* Add graph retriever module

* Change type annotations, change return format

* Add graph retriever that executes questions as sparql queries

* Linking only those entities that are in the knowledge graph

* Added logging and using relations extracted from Knowledge graph for linking

* Preventing entity linking from linking the same token to multiple entities

* Pruning triples that have no variables for select and count queries

* Support knowledge graphs with Pipelines

* Add text2sparql

* Entity linking and relation linking consider more special cases now based on evaluation on labelled data

* Separating example code from KGQA implementation

* Add eval on combined extractive and kg questions

* Remove references to hp-test

* Add fields sparql_query and long_answer_list to metadata

* Removing modular Question2SPARQL approach

* Removing additional classes used for modular kgqa approach

* preparing lcquad data

* change graph db

* Translating namespaces in knowledge graph queries

* Creating graphdb index and loading triples from .ttl file

* Fetching graph config files, triples and model from S3

* Fix incompatibility issues with BaseGraphRetriever and BaseComponent

* Removing unused utility functions

* Adding doc strings and tutorial header

* Adding sparqlwrapper dependency

* Moving tutorial header

* Sorting tutorials by number within name of notebook

* Add latest docstring and tutorial changes

* Creating test cases for knowledge graph

* Changing knowledge graph example to Harry Potter

* Add latest docstring and tutorial changes

* Adapting the tutorial notebook to Harry Potter example

* Add GraphDB fixture for tests

* Add latest docstring and tutorial changes

* Added GraphDB docker launch to CI

* Use correct GraphDB fixture

* Check if GraphDB instance is already running

* Renaming question/query and incorporating other feedback from Timo and Tanay

* Removed type annotation

* Add latest docstring and tutorial changes

Co-authored-by: oryx1729 <oryx1729@protonmail.com>
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-08 14:05:33 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines (#922) 2021-04-07 17:53:32 +02:00
Julian Risch
64ad953c6a
Adding indentation to markup files (#947) 2021-04-07 11:36:11 +02:00
Timo Moeller
5d2b16f3cc
Update farm version (#936)
* Update farm version

* Add new DPR loading, fix dpr param name

* Add QA model confidence as answer probability, fix params in test
2021-04-01 18:23:05 +02:00
Branden Chan
d77152c469
WIP: Add evaluation nodes for Pipelines (#904)
* Add main eval fns

* WIP: make pipeline_eval.py run

* Fix typo

* Add support for no_answers

* Add latest docstring and tutorial changes

* Working pipeline eval

* Add timing of nodes

* Add latest docstring and tutorial changes

* Refactor and clean

* Update tutorial script

* Set default params

* Update tutorials

* Fix indent

* Add latest docstring and tutorial changes

* Address mypy issues

* Add test

* Fix mypy error

* Clear outputs

* Add doc strings

* Incorporate reviewer feedback

* Add latest docstring and tutorial changes

* Revert query counting

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-01 17:35:18 +02:00
lewtun
32050fdce3
Add Milvus to the retriever / document store table (#931) 2021-03-29 09:53:26 +02:00
Timo Moeller
1244d16010
Better default value for mp chunksize (#923)
* Better default value for mp chunksize

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-25 19:00:45 +01:00
Lalit Pagaria
e904deefa7
Add Markdown file convertor (#875) 2021-03-23 16:31:26 +01:00
Timo Moeller
f954f0db38
Fix top_k param in RAG tutorials (#906)
* Fix top_k param

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-18 18:00:21 +01:00
Timo Moeller
7b559fa4e8
Improve dpr conversion (#826)
* Bugfix dpr conversion

* Add latest docstring and tutorial changes

* Fix preprocessor changes
2021-03-18 14:51:01 +01:00
oryx1729
e9f0076dbd
Fix execution of Pipelines with parallel nodes (#901) 2021-03-18 12:41:30 +01:00
Branden Chan
24d0c4d42d
Fix DPR training batch size (#898)
* Adjust batch size

* Add latest docstring and tutorial changes

* Update training results

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-17 18:33:59 +01:00
Mohamed Sayed
9ec2406a05
Remove broken tf-idf youtube link (#888)
The YouTube link points to a deleted video.
2021-03-11 14:23:05 +01:00
oryx1729
4b188b8102
Add runtime parameters to component initialization (#873) 2021-03-04 12:18:12 +01:00
Branden Chan
325a4e4d14
Add Milvus Documentation (#838)
* First commit

* Add latest docstring and tutorial changes

* Add DocStore external setup info

* fixed tabs

* Add Milvus recommendation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
2021-02-24 11:43:40 +01:00
Malte Pietsch
e641bff7a6
Allow more options for elasticsearch client (auth, multiple hosts) (#845)
* allow more options for elasticsearch client (auth, multiple hosts)

* Add latest docstring and tutorial changes

* fix mypy

* Add latest docstring and tutorial changes

* test client connection via ping()

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-19 14:29:59 +01:00
Tanay Soni
07907f9eac
Add support for indexing pipelines (#816) 2021-02-16 16:24:28 +01:00
Branden Chan
7030c94325
Revamp Readme (#820)
* Text changes

* Add new images

* First improvements

* Next iteration

* Resize gif

* Add bold

* Update key concepts diagram

* Center image

* Initial import of a more detailed README.md

* Slight changes to ToC, requirements and across the text.

* Grammar and Streamlit UI png.

* Unfix size of gif for mobile

* Remove requirements, add formatting to numbered lists.

* Formatting, remove img size options.

* Another iteration of phrasing the note about open ports.

* Rephrase the note about the docker ports.

Co-authored-by: Andrey A <56412611+aantti@users.noreply.github.com>
2021-02-16 15:32:43 +01:00
brandenchan
fe47e3a45e Fix link in documentation 2021-02-15 11:15:54 +01:00
Lalit Pagaria
5bd94ac5f7
Adding Translator (standalone component & wrapper for pipelines) (#782)
* Adding translator with many generic input parameter support

* Making dict_key as generic

* Fixing mypy issue

* Adding pipeline and using opus models

* Add latest docstring and tutorial changes

* Adding test cases for end-to-end translation for generator, summarizer, etc.

* raise error join and merge nodes

* Fix test failure

* add docstrings. add usage documentation. rm skip_special_tokens param

* Add latest docstring and tutorial changes

* fix code snippets in md

* Adding few extra configuration parameters and fixing tests

* Fixing mypy issue and updating usage document

* fix for mypy issue in pipeline.py

* reverting renaming of pytest_collection_modifyitems method

* Addressing review comments

* setting skip_special_tokens to True

* removing model_max_length argument as None type is not supported by many models

* Removing padding parameter. It is better to leave it at the default; otherwise it causes a tensor size mismatch error. If users need this option, it can be added later.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-12 15:58:26 +01:00