2539 Commits

Author SHA1 Message Date
Markus Paff
b87daed62b
fixed link to dpr (#962) 2021-04-13 09:45:04 +02:00
Julian Risch
8333a13d6f
Adding tutorial on knowledge graphs to README 2021-04-12 15:26:02 +02:00
Markus Paff
dfb0282b74
Update milvus links and docstrings (#959)
* update milvus links and docstrings

* Add latest docstring and tutorial changes

* new milvus version

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-12 14:38:57 +02:00
oryx1729
406f7fa679
Disable Gunicorn preload option (#960) 2021-04-12 12:46:52 +02:00
Timo Moeller
837dea4e6d
Integrate sentence transformers into benchmarks (#843)
* Integrate sentence transformers into benchmarks

* Add doc store asserts

* switch data downloads from s3 client to https. add license info

* Fix mypy, revert config

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-09 17:24:16 +02:00
Julian Risch
d38c07e0ee
knowledge graph example (#934)
* Add knowledge graph module

* Fix type hint

* Add graph retriever module

* Change type annotations, change return format

* Add graph retriever that executes questions as sparql queries

* Linking only those entities that are in the knowledge graph

* Added logging and using relations extracted from Knowledge graph for linking

* Preventing entity linking from linking the same token to multiple entities

* Pruning triples that have no variables for select and count queries

* Support knowledge graphs with Pipelines

* Add text2sparql

* Entity linking and relation linking consider more special cases now based on evaluation on labelled data

* Separating example code from KGQA implementation

* Add eval on combined extractive and KG questions

* Remove references to hp-test

* Add fields sparql_query and long_answer_list to metadata

* Removing modular Question2SPARQL approach

* Removing additional classes used for modular kgqa approach

* preparing lcquad data

* change graph db

* Translating namespaces in knowledge graph queries

* Creating graphdb index and loading triples from .ttl file

* Fetching graph config files, triples and model from S3

* Fix incompatibility issues with BaseGraphRetriever and BaseComponent

* Removing unused utility functions

* Adding doc strings and tutorial header

* Adding sparqlwrapper dependency

* Moving tutorial header

* Sorting tutorials by number within name of notebook

* Add latest docstring and tutorial changes

* Creating test cases for knowledge graph

* Changing knowledge graph example to Harry Potter

* Add latest docstring and tutorial changes

* Adapting the tutorial notebook to the Harry Potter example

* Add GraphDB fixture for tests

* Add latest docstring and tutorial changes

* Added GraphDB docker launch to CI

* Use correct GraphDB fixture

* Check if GraphDB instance is already running

* Renaming question/query and incorporating other feedback from Timo and Tanay

* Removed type annotation

* Add latest docstring and tutorial changes

Co-authored-by: oryx1729 <oryx1729@protonmail.com>
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-08 14:05:33 +02:00
oryx1729
fc6368c191
Fix passing a list of values as param (#952) 2021-04-07 19:50:50 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines (#922) 2021-04-07 17:53:32 +02:00
Julian Risch
64ad953c6a
Adding indentation to markup files (#947) 2021-04-07 11:36:11 +02:00
lewtun
8894c4fae9
Reduce precision in pipeline eval print functions (#943)
A proposal to reduce the precision shown in the `EvalRetriever.print` and `EvalReader.print` to 4 significant figures. If the user wants the full precision, they can access the class attributes directly.

Before
```
Retriever
-----------------
has_answer recall: 0.8739495798319328 (208/238)
no_answer recall:  1.00 (120/120) (no_answer samples are always treated as correctly retrieved)
recall: 0.9162011173184358 (328 / 358)
```

After
```
Retriever
-----------------
has_answer recall: 0.8739 (208/238)
no_answer recall:  1.00 (120/120) (no_answer samples are always treated as correctly retrieved)
recall: 0.9162 (328 / 358)
```
2021-04-06 05:11:29 +02:00
lewtun
41a1c8329d
Fix division by zero error in EvalRetriever (#938)
If the first query in the evaluation returns a document with `no_answer=True`, we get a division-by-zero error because neither `self.has_answer_correct` nor `self.has_answer_count` is incremented. This fix moves the `self.has_answer_recall` calculation into the if-else block.
2021-04-03 18:13:36 +02:00
Timo Moeller
5d2b16f3cc
Update farm version (#936)
* Update farm version

* Add new DPR loading, fix dpr param name

* Add QA model confidence as answer probability, fix params in test
2021-04-01 18:23:05 +02:00
Branden Chan
d77152c469
WIP: Add evaluation nodes for Pipelines (#904)
* Add main eval fns

* WIP: make pipeline_eval.py run

* Fix typo

* Add support for no_answers

* Add latest docstring and tutorial changes

* Working pipeline eval

* Add timing of nodes

* Add latest docstring and tutorial changes

* Refactor and clean

* Update tutorial script

* Set default params

* Update tutorials

* Fix indent

* Add latest docstring and tutorial changes

* Address mypy issues

* Add test

* Fix mypy error

* Clear outputs

* Add doc strings

* Incorporate reviewer feedback

* Add latest docstring and tutorial changes

* Revert query counting

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-01 17:35:18 +02:00
lewtun
32050fdce3
Add Milvus to the retriever / document store table (#931) 2021-03-29 09:53:26 +02:00
Guillim
55b7a820d4
Fixing inconsistency (#926)
Fixing inconsistency between pipe and p in the doc
2021-03-26 18:55:02 +01:00
Timo Moeller
1244d16010
Better default value for mp chunksize (#923)
* Better default value for mp chunksize

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-25 19:00:45 +01:00
Peter Adorjan
cafa1230da
Warning instead of Exception in FAISS and Milvus filtering (#913) 2021-03-23 17:49:47 +01:00
Lalit Pagaria
e904deefa7
Add Markdown file convertor (#875) 2021-03-23 16:31:26 +01:00
Moshe Berchansky
47dc069afe
Fix memory allocation exception by specifying max_processes (#910) 2021-03-19 18:11:25 +01:00
Timo Moeller
f954f0db38
Fix top_k param in RAG tutorials (#906)
* Fix top_k param

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-18 18:00:21 +01:00
Branden Chan
26093452a4
Add code of conduct 2021-03-18 16:39:16 +01:00
Timo Moeller
7b559fa4e8
Improve dpr conversion (#826)
* Bugfix dpr conversion

* Add latest docstring and tutorial changes

* Fix preprocessor changes
2021-03-18 14:51:01 +01:00
oryx1729
e9f0076dbd
Fix execution of Pipelines with parallel nodes (#901) 2021-03-18 12:41:30 +01:00
Branden Chan
24d0c4d42d
Fix DPR training batch size (#898)
* Adjust batch size

* Add latest docstring and tutorial changes

* Update training results

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-17 18:33:59 +01:00
Peter Demin
992277e812
Run Grammarly over README.md (#890)
* Run Grammarly over README.md

* Update README.md

Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com>

* Update README.md

Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com>

* Update README.md

Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com>

* Update README.md

Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com>

* Update README.md

Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com>

* Update README.md

Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com>

* Update README.md

Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com>
2021-03-16 18:00:57 +03:00
Mohamed Sayed
9ec2406a05
Remove broken tf-idf youtube link (#888)
The YouTube link points to a deleted video.
2021-03-11 14:23:05 +01:00
Malte Pietsch
91007c15dc
Add abstract run() method to BaseComponent (#887) 2021-03-11 12:47:10 +01:00
oryx1729
e0a118fd9a
Add support for parallel paths in Pipeline (#884) 2021-03-10 18:17:23 +01:00
oryx1729
6d00eff796
Add PDF converter in Dockerfiles (#877) 2021-03-08 09:55:11 +01:00
Malte Pietsch
81b83293c0
Update docker-compose.yml 2021-03-05 10:55:36 +01:00
Eric Lam
5484b8883b
Fix error when is_impossible does not exist (#870) 2021-03-04 18:42:42 +01:00
oryx1729
f3fb9aacce
Fix validation for split_respect_sentence_boundary in Preprocessor (#869) 2021-03-04 15:09:08 +01:00
oryx1729
4b188b8102
Add runtime parameters to component initialization (#873) 2021-03-04 12:18:12 +01:00
Paul Klyvis
1b609114b8
Fix elasticsearch auth modes (#871)
Co-authored-by: Paulius Klyvis <paul@convious.com>
2021-03-02 16:24:31 +01:00
Eric Lam
db75498278
Fix error when is_impossible not is_impossible and json dump encoding error (#868)
* Fix error when is_impossible not is_impossible and json dump encoding in multilingual data

Fixing #867

* Fix file encoding: open all files with utf-8
2021-03-02 13:54:58 +01:00
Malte Pietsch
762f194b27
Fix boolean progress_bar for disabling tqdm progressbar (#863) 2021-02-26 10:49:31 +01:00
Branden Chan
325a4e4d14
Add Milvus Documentation (#838)
* First commit

* Add latest docstring and tutorial changes

* Add DocStore external setup info

* fixed tabs

* Add Milvus recommendation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
2021-02-24 11:43:40 +01:00
venuraja79
e930d8a717
Annotation Tool: data is not persisted when using local version #853 (#855) 2021-02-21 15:35:45 +01:00
Tu NGUYEN
ba91a90dd6
Fix NLTK download in preprocessor (#852) 2021-02-21 10:17:50 +01:00
Malte Pietsch
e641bff7a6
Allow more options for elasticsearch client (auth, multiple hosts) (#845)
* allow more options for elasticsearch client (auth, multiple hosts)

* Add latest docstring and tutorial changes

* fix mypy

* Add latest docstring and tutorial changes

* test client connection via ping()

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-19 14:29:59 +01:00
Divya Yeruva
6c3ec540a4
Add crawler to get texts from websites (#775)
* add fetch_data_from_url to extract data and store as files

* corrected a typo

* corrected variable name error

* correction of urlparse error

* type error

* added selenium, urllib to requirements

* removed urllib

* minor changes and added function to find in-page navigation links

* quick duplicate links fix

* quick type annotation fix

* created separate module for crawler

* type error fix

* type error fix

* import fix

* quick type error fix

* added return description

* updated include type to list

* refactor modules. Add Crawler class. rename params.

* add basic pipeline compatibility

* update docstrings

* fix mypy issues

* update args, docstrings, return filepaths

* fix mypy

* make urls optional in init

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-18 12:00:49 +01:00
Malte Pietsch
d700592c9a
Update GPU Docker image (CUDA 11, fix FAISS) (#836) 2021-02-17 12:40:00 +01:00
Malte Pietsch
abf2d63c92
Upgrade FAISS to 1.7.0 (#834) 2021-02-17 10:00:33 +01:00
Branden Chan
a6a3b74199
Fix image in README 2021-02-16 17:05:15 +01:00
Andrey A
e0be5639ef
Update README.md 2021-02-16 18:47:14 +03:00
Andrey A
ab89fac76a
Update README.md 2021-02-16 18:45:20 +03:00
Andrey A
5c9f7d493c
Fix link to Quick Demo in ToC. (#831) 2021-02-16 16:38:04 +01:00
Tanay Soni
07907f9eac
Add support for indexing pipelines (#816) 2021-02-16 16:24:28 +01:00
Branden Chan
7030c94325
Revamp Readme (#820)
* Text changes

* Add new images

* First improvements

* Next iteration

* Resize gif

* Add bold

* Update key concepts diagram

* Center image

* Initial import of a more detailed README.md

* Slight changes to ToC, requirements and across the text.

* Grammar and Streamlit UI png.

* Unfix size of gif for mobile

* Remove requirements, add formatting to numbered lists.

* Formatting, remove img size options.

* Another iteration of phrasing the note about open ports.

* Rephrase the note about the docker ports.

Co-authored-by: Andrey A <56412611+aantti@users.noreply.github.com>
2021-02-16 15:32:43 +01:00
Malte Pietsch
47aae14efa
relax assert precision of arrays 2021-02-15 14:52:13 +01:00