1077 Commits

Author SHA1 Message Date
bogdankostic
c85ac2baec
Update Ray to version 1.9.1 (#1934) 2022-01-03 16:59:58 +01:00
bogdankostic
45df18c416
Add RCIReader for TableQA (#1909)
* Add RCIReader

* Add latest docstring and tutorial changes

* Add Doc Strings

* Add latest docstring and tutorial changes

* Add Tests

* Add Doc Strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 16:59:24 +01:00
Kristof Herrmann
6e8e3c68d9
Custom id hashing on documentstore level (#1910)
* adding dynamic id hashing

* Add latest docstring and tutorial changes

* added pr review

* Add latest docstring and tutorial changes

* fixed tests

* fix mypy error

* fix mypy issue

* ignore typing

* fixed correct check

* fixed tests

* try fixing the tests

* set id hash keys only if not none

* dont store id_hash_keys

* fix tests

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 16:58:19 +01:00
Julian Risch
a846be99d1
Extend TranslationWrapper to work with QA Generation (#1905)
* draft translationwrapper example

* draft translation of generated qa pairs

* Add latest docstring and tutorial changes

* fixed pass by reference by deepcopy

* delete adapted tutorial 13 (test purposes only)

* adapt method signature and doc string

* Add latest docstring and tutorial changes

* add type ignore

* extend tutorial 13 with TranslationWrapper example

* Add latest docstring and tutorial changes

* removed duplicate code

* indent if statement

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
2022-01-03 13:30:24 +01:00
tstadel
a94c274134
Support custom headers per request in pipeline (#1861)
* chain headers param down to document_stores

* Add latest docstring and tutorial changes

* fix InMemoryDocumentStore params

* Add latest docstring and tutorial changes

* fix TfidfRetriever params

* Add latest docstring and tutorial changes

* fix missing headers

* Add latest docstring and tutorial changes

* fix sparql client and update docs

* Add latest docstring and tutorial changes

* test for documentstores

* pipeline tests added

* update header param in docstrings

* Add latest docstring and tutorial changes

* refactoring: headers as implicit param

* Add latest docstring and tutorial changes

* remove unnecessary imports

* propagate batch_size correctly

* Add latest docstring and tutorial changes

* revert InMemoryDocumentStore.write_documents signature

* Add latest docstring and tutorial changes

* remove #type: ignore

* Add latest docstring and tutorial changes

* replace MutableMapping by Dict

* Add latest docstring and tutorial changes

* improve docstrings

* Add latest docstring and tutorial changes

* get rid of **kwargs

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 11:38:02 +01:00
el2e10
377c20b8b1
Fix grammatical issue in optimization guides (#1941) 2022-01-03 11:06:13 +01:00
Alon Eirew
a1fb70bbbd
Make ctx_segment_ids a list instead of np.zeros_like
* fix #1687

* fix - UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow..

* fix RuntimeError: received 0 items of ancdata

* Remove set_sharing_strategy from this branch and replace numpy.zeros_like with a Python list
2022-01-03 08:33:55 +01:00
bogdankostic
39573cf0a9
Add ParsrConverter (#1931)
* Add ParsrConverter

* Fix typing error + add Parsr to Linux CI

* Fix valid_language for all converters + fix context generation for ParsrConverter

* Remove ParsrConverter test from WindowsCI

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-30 10:15:11 +01:00
Markus Paff
04f3b39ad5
Text for contributor license agreement (#1766)
* text for contributor license agreement

* formatting

* Add details about process

* test
2021-12-28 14:01:20 +01:00
MichelBartels
f33c2b987a
Adding distillation loss functions from TinyBERT (#1879)
* initial tinybertdistill commit

* add tinybert distill loss

* remove teacher caching for tinybert

* add tinybert to distil_from method

* Add latest docstring and tutorial changes

* add dim mapping and fix type hints

* fix type hints

* fix dummy input

* fix dim mapping for tinybert loss and add comments/doc strings

* add test for tinybert loss

* Add latest docstring and tutorial changes

* add comment

* fix BERT forward parameters

* add doc string to AdaptiveModel forward method

* remove unnecessary data silo

* fix farm import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-23 14:54:02 +01:00
tstadel
fc8df2163d
Fix Windows CI OOM (#1878)
* set fixture scope to "function"

* run FARMReader without multiprocessing

* dispose of ray after tests

* run most expensive tasks first in test files

* run expensive tests first

* run garbage collector between tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-22 17:20:23 +01:00
tstadel
7bdb782871
Raise exception if Elasticsearch search_fields have wrong datatype (#1913) 2021-12-20 16:10:55 +01:00
Dmitry Goryunov
42a0fc3860
Include ray version compatible with M1 processor (#1906) 2021-12-20 10:16:59 +01:00
Johnny-KP
51e84b805b
Changed export to csv method to new answer format (#1907) 2021-12-17 16:10:29 +01:00
bogdankostic
74c80e0c71
Set mypy version to 0.910 (#1899) 2021-12-16 14:02:04 +01:00
javier ramírez
5c7f3c234e
Fix minor typo in readme (#1900)
I just added a missing "r" to the word "contributions" at the "Overview and Usage" section
2021-12-16 13:31:27 +01:00
bogdankostic
4edec04c2c
Add improvements to AzureConverter (#1896)
* Add some improvements to AzureConverter

* Adapt docstring + use Path instead of str

* Fix mypy version to 0.910
2021-12-16 12:45:24 +01:00
Alberto Villa
e4aec4661d
Improved version of print_answers (#1891)
* Improved version of print_answers

* Changed the way max_text_len is checked
2021-12-15 17:16:33 +01:00
Alberto Villa
1bb6244a63
Exchanged minimal with minimum in print_answers function call (#1890) 2021-12-14 15:27:37 +01:00
Alberto Villa
2396f0cd3a
Correct bug with encoding when generating Markdown documentation; linked with issue #1880 (#1881) 2021-12-14 10:50:25 +01:00
tstadel
57a04631df
introduce node_input param (#1854)
* introduce node_input param

* Add latest docstring and tutorial changes

* prediction and label as node_input values

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-14 10:34:35 +01:00
Ivan Lopez
86f5688f47
fix wrong branch and repo, add cloudwatch agent (#1877) 2021-12-13 20:32:25 +01:00
Sara Zan
de71b944d7
Fix typo in the Windows CI UI deps (#1876)
* Fix typo in the WindowsCI UI deps

* Force a deps cache miss
2021-12-13 15:49:44 +01:00
Malte Pietsch
7084a24794
Bump version to 1.0 in REST api (#1875) 2021-12-13 12:39:59 +01:00
Julian Risch
2c184e467f
Upgrade transformers to 4.13.0 (#1659)
* upgrade to pytorch 1.10 and transformers 4.11.3

* pin torch to 1.9.1

* Upgrade transformers and torch to 4.12.2 and 1.10.0

* Test transformers 4.10.2

* Pin transformers to 4.10.2

* transformers 4.10.3

* transformers 4.11.0

* transformers 4.11.1

* transformers 4.11.2

* check fix on current transformer's master branch

* Install transformers from commit id

* update transformers to 4.12.5

* Upgrade torch version for torch-scatter

* Upgrade torch version for torch-scatter in Windows CI

* Build new cache

* Undo last commit

* Use transformers v4.11.2

* bump transformers to 4.12.5

* bump transformers to 4.13.0

* re-allow range of torch versions

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2021-12-11 12:08:16 +01:00
Fabrice Depaulis
77d52ad215
Base API healthcheck on status code rather than JSON decoding (#1871)
* Base API healthcheck on status code rather than JSON decoding

* Install UI dependencies on the Linux and Windows CI

Co-authored-by: Fabrice Depaulis <fabrice.depaulis@orange.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2021-12-10 18:05:23 +01:00
Andreas Motl
4eb4503f25
Fix typo (#1869) 2021-12-10 09:39:45 +01:00
Branden Chan
ea5aab23ec
Update pydoc-markdown-file-classifier.yml (#1856)
* Update pydoc-markdown-file-classifier.yml

* Add latest docstring and tutorial changes

* Prevent wrapping DataParallel in second DataParallel (#1855)

* Prevent wrapping DataParallel in second DataParallel

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Create v1.0 docs (#1862)

* Update pydoc-markdown-file-classifier.yml

* Add latest docstring and tutorial changes

* Rebase and apply change to v1.0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2021-12-08 18:19:03 +01:00
Branden Chan
ef1e531895
Create v1.0 docs (#1862) 2021-12-08 17:53:00 +01:00
bogdankostic
cbfe2b4626
Prevent wrapping DataParallel in second DataParallel (#1855)
* Prevent wrapping DataParallel in second DataParallel

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-08 09:56:45 +01:00
Malte Pietsch
8cb513c2c6
Bump version to 1.0.0 v1.0.0 2021-12-07 15:13:24 +01:00
Sara Zan
983b20f28d
Demo UI fix debug info (#1846)
* Fix debug info

* Make enter to run work better

* Reintroduce default question in the eval dataset

* Outputting valid json instead of a Python dict
2021-12-06 18:55:39 +01:00
KUNPENG GUO
160f81aaa3
Fix bug ranker: wrong lambda function (#1824)
* Fix bug ranker: wrong lambda function

The zip function used in line 110 makes the whole logits array the key for the lambda function, while the key should be the first/second logit of the logits array, which corresponds to the classification label (has_answer)

* Use label 1 as has_answer label

* generic ranker (add if-cond for logits vector shape)

* remove test code

* remove test code...

* add two_logits test case for ranker module.

* complete the documentation of ranker, support rankers with 1 or 2 logits as output
2021-12-06 17:13:57 +01:00
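
The ranker fix described in the entry above can be illustrated with a minimal sketch. It is not the code from the commit: the function name `rank_by_logits`, the plain-list inputs, and the choice of index 1 as the has_answer label for two-logit models are assumptions made for illustration only.

```python
from typing import List, Sequence


def rank_by_logits(docs: Sequence[str], logits: Sequence[Sequence[float]]) -> List[str]:
    """Return docs sorted by relevance score, highest first (hypothetical helper)."""
    if not logits:
        return []
    # Assumption: with two logits per document, index 1 is the has_answer label;
    # with a single logit, that logit is the score.
    score_idx = 1 if len(logits[0]) > 1 else 0
    # Sort on one scalar logit per document rather than on the whole logits array.
    ranked = sorted(zip(logits, docs), key=lambda pair: pair[0][score_idx], reverse=True)
    return [doc for _, doc in ranked]


two_logit_order = rank_by_logits(["a", "b", "c"], [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
one_logit_order = rank_by_logits(["x", "y"], [[0.1], [0.7]])
```

Sorting on the whole logits array would compare the arrays element by element, so the first logit would dominate the ordering. Selecting a single index makes the sort key explicit for both one-logit and two-logit models.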
Sara Zan
8b7b51f0f5
Typo spotted in one question. Removed question that returned wrong answer. Added a couple more that work. (#1843) 2021-12-06 15:44:08 +01:00
Julian Risch
aa1520212f
workaround torch bug with non-contiguous tensors (#1845) 2021-12-06 15:10:51 +01:00
Ivan Lopez
4f6dc36869
Deploy demo (#1837)
* Add GH Actions workflow for demo deployment

* update demo ec2 instance type

* remove redundant docker-compose build

* add custom demo command and env vars

* deploy demo on updates to workflow resources
2021-12-03 15:58:47 +01:00
Branden Chan
bec14b63c3
Add live demo link to readme (#1839) 2021-12-03 14:34:19 +01:00
Malte Pietsch
90ced1b246
Update release.yml 2021-12-03 13:23:55 +01:00
Malte Pietsch
e5599bd337
Extend categories for release notes (#1841) 2021-12-03 13:19:45 +01:00
Malte Pietsch
4e76129004
Add config for github release notes (#1840) 2021-12-03 12:27:58 +01:00
Julian Risch
54f776350c
Update evaluation tutorial to cover the new pipeline.eval() (#1765)
* Replace old tutorial 5 with new code based on test cases

* Add latest docstring and tutorial changes

* Use pipeline.eval() in tutorial

* Add latest docstring and tutorial changes

* Restructure notebook

* Add latest docstring and tutorial changes

* Add dataframe example

* Add latest docstring and tutorial changes

* Get eval data from doc store

* Add latest docstring and tutorial changes

* Load data from doc store

* Add latest docstring and tutorial changes

* Clear outputs

* Add latest docstring and tutorial changes

* Change example and add python script

* Add latest docstring and tutorial changes

* Fetch aggregated multilabels from doc store

* Add latest docstring and tutorial changes

* Incorporate review feedback on text comments

* Add latest docstring and tutorial changes

* Add Notebook output

* Remove queries param from pipeline.eval()

* Add latest docstring and tutorial changes

* Add output with all metrics

* Add printing of multiple metrics to script

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-03 11:19:41 +01:00
tstadel
9293a902d7
Fix OOM in test_eval.py Windows CI (#1830)
* disable problematic eval tests for windows ci

* move standard pipeline eval tests to separate test file

* switch to elasticsearch documentstore to reduce inproc mem

* Revert "switch to elasticsearch documentstore to reduce inproc mem"

This reverts commit 7a75871909c3317a252dff3a4df17e99eff69d05.

* get retriever from conftest

* use smaller embedding model for summarizer

* use smaller summarizer model

* remove queries param from pipeline.eval()

* isolate problematic tests

* rename separate test file

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-02 19:23:58 +01:00
tstadel
180c05365a
Deprecate old pipeline eval nodes: EvalDocuments and EvalAnswers (#1778)
* log deprecated warning on init

* deprecation warning included into docstrings

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
v1.0.0rc1
2021-12-02 18:09:26 +01:00
tstadel
dc4cd49049
remove queries param from pipeline.eval() (#1836)
* remove queries param from pipeline.eval()

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-02 16:04:01 +01:00
Sara Zan
99365e1d8e
Add backlink below the context, if available in the doc's meta (#1834) 2021-12-02 13:37:23 +01:00
tstadel
bab05c7677
Fix loading and saving of EvaluationResult (#1831)
* fix spans in csvs

* fix tests
2021-12-02 10:30:11 +01:00
Sara Zan
c21521dc9c
More demo bugfixes (#1832)
* Trying to fix a bug occurring when dataset is None (happens with many parallel requests for some reason)

* Change favicon and title and fix bug with version number

* Improve the text description and partially fix the enter-to-run function
2021-12-01 22:25:59 +01:00
Sara Zan
e39d015a59
Allow SQLDocumentStore to filter by many filters (#1776)
* Aliasing the join is not sufficient yet

* Update the filter query in some other functions of SQLDocumentStore - this functionality should be centralized

* Adding tests for get_all_documents, now failing

* Fix tests

* Fix typo spotted by mypy
2021-12-01 16:16:17 +01:00
tstadel
c5540d05ed
Calculation of metrics and presentation of eval results (#1760)
* retriever metrics added

* Add latest docstring and tutorial changes

* answer and document level matching metrics implemented

* Add latest docstring and tutorial changes

* answer related metrics for retriever

* basic reader metrics implemented

* handle no_answers

* fix typing

* fix tests

* fix tests without sas

* first draft for simulated top k

* rename sas and f1 columns in dataframe

* refactoring of EvaluationResult

* Add latest docstring and tutorial changes

* more eval tests added

* fix sas expected value precision

* distinction between ir and qa recall

* EvaluationResult.worst_queries() implemented

* print_evaluation_report() added

* eval report for QA Pipeline improved

* dynamic metrics for worst queries calc

* Add latest docstring and tutorial changes

* method names adjusted

* simple test for print_eval_report() added

* improved documentation

* Add latest docstring and tutorial changes

* minor formatting

* Add latest docstring and tutorial changes

* fix no_answer cases

* adjust one docstring

* Add latest docstring and tutorial changes

* fix no_answer cases for sas

* batchmode for sas implemented

* fix for retriever metrics if there are only no_answers

* fix multilabel tests

* improve documentation for pipeline.eval()

* streamline multilabel aggregates and docs

* Add latest docstring and tutorial changes

* fix multilabel tests

* unify document_id

* add dataframe schema description to EvaluationResult

* Add latest docstring and tutorial changes

* rename worst_queries to wrong_examples

* Add latest docstring and tutorial changes

* make query digesting standard pipelines work with pipeline.eval()

* Add latest docstring and tutorial changes

* tests for multi retriever pipelines added

* remove unnecessary import

* print_eval_report(): support all pipelines without junctions

* Add latest docstring and tutorial changes

* fix typos

* Add latest docstring and tutorial changes

* fix minor simulated_top_k bug and use memory documentstore throughout tests

* sas model param description improved

* Add latest docstring and tutorial changes

* rename recall metrics

* Add latest docstring and tutorial changes

* fix mean average precision link

* Add latest docstring and tutorial changes

* adjust sas description docstring

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-30 19:26:34 +01:00
ju-gu
4cce7ffe85
bugfix metadata extraction in form recognizer & split of surrounding content length (#1829)
* bugfix metadata extraction in the formrecognizer and separation of surrounding content length into preceding and following content length

* Fix docstring

* fix metadata extraction for content_type text

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-30 19:10:21 +01:00