968 Commits

Author SHA1 Message Date
MichelBartels
f33c2b987a
Adding distillation loss functions from TinyBERT (#1879)
* initial tinybertdistill commit

* add tinybert distill loss

* remove teacher caching for tinybert

* add tinybert to distil_from method

* Add latest docstring and tutorial changes

* add dim mapping and fix type hints

* fix type hints

* fix dummy input

* fix dim mapping for tinybert loss and add comments/doc strings

* add test for tinybert loss

* Add latest docstring and tutorial changes

* add comment

* fix BERT forward parameters

* add doc string to AdaptiveModel forward method

* remove unnecessary data silo

* fix farm import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-23 14:54:02 +01:00
tstadel
fc8df2163d
Fix Windows CI OOM (#1878)
* set fixture scope to "function"

* run FARMReader without multiprocessing

* dispose of ray after tests

* run most expensive tasks first in test files

* run expensive tests first

* run garbage collector between tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-22 17:20:23 +01:00
tstadel
7bdb782871
Raise exception if Elasticsearch search_fields have wrong datatype (#1913) 2021-12-20 16:10:55 +01:00
Dmitry Goryunov
42a0fc3860
Include ray version compatible with M1 processor (#1906) 2021-12-20 10:16:59 +01:00
Johnny-KP
51e84b805b
Changed export to csv method to new answer format (#1907) 2021-12-17 16:10:29 +01:00
bogdankostic
74c80e0c71
Set mypy version to 0.910 (#1899) 2021-12-16 14:02:04 +01:00
javier ramírez
5c7f3c234e
Fix minor typo in readme (#1900)
I just added a missing "r" to the word "contributions" at the "Overview and Usage" section
2021-12-16 13:31:27 +01:00
bogdankostic
4edec04c2c
Add improvements to AzureConverter (#1896)
* Add some improvements to AzureConverter

* Adapt docstring + use Path instead of str

* Fix mypy version to 0.910
2021-12-16 12:45:24 +01:00
Alberto Villa
e4aec4661d
Improved version of print_answers (#1891)
* Improved version of print_answers

* Changed the way max_text_len is checked
2021-12-15 17:16:33 +01:00
Alberto Villa
1bb6244a63
Exchanged minimal with minimum in print_answers function call (#1890) 2021-12-14 15:27:37 +01:00
Alberto Villa
2396f0cd3a
Correct bug with encoding when generating Markdown documentation; linked with issue #1880 (#1881) 2021-12-14 10:50:25 +01:00
tstadel
57a04631df
introduce node_input param (#1854)
* introduce node_input param

* Add latest docstring and tutorial changes

* prediction and label as node_input values

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-14 10:34:35 +01:00
Ivan Lopez
86f5688f47
fix wrong branch and repo, add cloudwatch agent (#1877) 2021-12-13 20:32:25 +01:00
Sara Zan
de71b944d7
Fix typo in the Windows CI UI deps (#1876)
* Fix typo in the WindowsCI UI deps

* Force a deps cache miss
2021-12-13 15:49:44 +01:00
Malte Pietsch
7084a24794
Bump version to 1.0 in REST api (#1875) 2021-12-13 12:39:59 +01:00
Julian Risch
2c184e467f
Upgrade transformers to 4.13.0 (#1659)
* upgrade to pytorch 1.10 and transformers 4.11.3

* pin torch to 1.9.1

* Upgrade transformers and torch to 4.12.2 and 1.10.0

* Test transformers 4.10.2

* Pin transformers to 4.10.2

* transformers 4.10.3

* transformers 4.11.0

* transformers 4.11.1

* transformers 4.11.2

* check fix on current transformer's master branch

* Install transformers from commit id

* update transformers to 4.12.5

* Upgrade torch version for torch-scatter

* Upgrade torch version for torch-scatter in Windows CI

* Build new cache

* Undo last commit

* Use transformers v4.11.2

* bump transformers to 4.12.5

* bump transformers to 4.13.0

* re-allow range of torch versions

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2021-12-11 12:08:16 +01:00
Fabrice Depaulis
77d52ad215
Rely api healthcheck on status code rather than json decoding (#1871)
* Rely api healthcheck on status code rather than json decoding

* Install UI dependencies on the Linux and Windows CI

Co-authored-by: Fabrice Depaulis <fabrice.depaulis@orange.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2021-12-10 18:05:23 +01:00
Andreas Motl
4eb4503f25
Fix typo (#1869) 2021-12-10 09:39:45 +01:00
Branden Chan
ea5aab23ec
Update pydoc-markdown-file-classifier.yml (#1856)
* Update pydoc-markdown-file-classifier.yml

* Add latest docstring and tutorial changes

* Prevent wrapping DataParallel in second DataParallel (#1855)

* Prevent wrapping DataParallel in second DataParallel

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Create v1.0 docs (#1862)

* Update pydoc-markdown-file-classifier.yml

* Add latest docstring and tutorial changes

* Rebase and apply change to v1.0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2021-12-08 18:19:03 +01:00
Branden Chan
ef1e531895
Create v1.0 docs (#1862) 2021-12-08 17:53:00 +01:00
bogdankostic
cbfe2b4626
Prevent wrapping DataParallel in second DataParallel (#1855)
* Prevent wrapping DataParallel in second DataParallel

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-08 09:56:45 +01:00
Malte Pietsch
8cb513c2c6
Bump version to 1.0.0 v1.0.0 2021-12-07 15:13:24 +01:00
Sara Zan
983b20f28d
Demo UI fix debug info (#1846)
* Fix debug info

* Make enter to run work better

* Reintroduce default question in the eval dataset

* Outputting valid json instead of a Python dict
2021-12-06 18:55:39 +01:00
KUNPENG GUO
160f81aaa3
Fix bug ranker: wrong lambda function (#1824)
* Fix bug ranker: wrong lambda function

The zip call in line 110 uses the whole logits array as the key for the lambda function, whereas the key should be the first/second logit of that array, i.e. the one that corresponds to the classification label (has_answer)

* Use label 1 as has_answer label

* generic ranker (add if-cond for logits vector shape)

* remove test code

* remove test code...

* add two_logits test case for ranker module.

* complete the documentation of ranker, support rankers with 1 or 2 logits as output
2021-12-06 17:13:57 +01:00
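The fix described in #1824 can be illustrated with a minimal sketch. This is not the ranker's actual code: the function name `rank_by_has_answer`, its parameters, and the plain-list logits are hypothetical and chosen only for illustration. It assumes, as the commit bullets state, that label 1 is the has_answer label and that a model may emit either 1 or 2 logits per document.

```python
from typing import List, Sequence


def rank_by_has_answer(
    docs: Sequence[str],
    logits: Sequence[Sequence[float]],
    has_answer_index: int = 1,  # assumption: label 1 is the has_answer label
) -> List[str]:
    """Sort docs by the logit that scores has_answer, highest first."""

    def score(row: Sequence[float]) -> float:
        # One logit per document: that single value is the score.
        # Two logits per document: pick the has_answer logit only.
        return row[0] if len(row) == 1 else row[has_answer_index]

    # The sort key is a single float per document, not the whole logits row.
    # Using the row itself as the key compares element by element, so the
    # first logit would dominate the ordering.
    paired = sorted(zip(docs, logits), key=lambda pair: score(pair[1]), reverse=True)
    return [doc for doc, _ in paired]


two_logit_order = rank_by_has_answer(
    ["a", "b", "c"], [[2.0, -1.0], [0.1, 3.0], [5.0, 0.5]]
)
one_logit_order = rank_by_has_answer(["a", "b", "c"], [[0.2], [0.9], [0.5]])
```

With the two-logit input above, keying on the whole row would put "c" first because its first logit is largest, while keying on the has_answer logit puts "b" first.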
Sara Zan
8b7b51f0f5
Typo spotted in one question. Removed question that returned wrong answer. Added a couple more that work. (#1843) 2021-12-06 15:44:08 +01:00
Julian Risch
aa1520212f
workaround torch bug with non-contiguous tensors (#1845) 2021-12-06 15:10:51 +01:00
Ivan Lopez
4f6dc36869
Deploy demo (#1837)
* Add GH Actions workflow for demo deployment

* update demo ec2 instance type

* remove redundant docker-compose build

* add custom demo command and env vars

* deploy demo on updates to workflow resources
2021-12-03 15:58:47 +01:00
Branden Chan
bec14b63c3
Add live demo link to readme (#1839) 2021-12-03 14:34:19 +01:00
Malte Pietsch
90ced1b246
Update release.yml 2021-12-03 13:23:55 +01:00
Malte Pietsch
e5599bd337
Extend categories for release notes (#1841) 2021-12-03 13:19:45 +01:00
Malte Pietsch
4e76129004
Add config for github release notes (#1840) 2021-12-03 12:27:58 +01:00
Julian Risch
54f776350c
Update evaluation tutorial to cover the new pipeline.eval() (#1765)
* Replace old tutorial 5 with new code based on test cases

* Add latest docstring and tutorial changes

* Use pipeline.eval() in tutorial

* Add latest docstring and tutorial changes

* Restructure notebook

* Add latest docstring and tutorial changes

* Add dataframe example

* Add latest docstring and tutorial changes

* Get eval data from doc store

* Add latest docstring and tutorial changes

* Load data from doc store

* Add latest docstring and tutorial changes

* Clear outputs

* Add latest docstring and tutorial changes

* Change example and add python script

* Add latest docstring and tutorial changes

* Fetch aggregated multilabels from doc store

* Add latest docstring and tutorial changes

* Incorporate review feedback on text comments

* Add latest docstring and tutorial changes

* Add Notebook output

* Remove queries param from pipeline.eval()

* Add latest docstring and tutorial changes

* Add output with all metrics

* Add printing of multiple metrics to script

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-03 11:19:41 +01:00
tstadel
9293a902d7
Fix OOM in test_eval.py Windows CI (#1830)
* disable problematic eval tests for windows ci

* move standard pipeline eval tests to separate test file

* switch to elasticsearch documentstore to reduce inproc mem

* Revert "switch to elasticsearch documentstore to reduce inproc mem"

This reverts commit 7a75871909c3317a252dff3a4df17e99eff69d05.

* get retriever from conftest

* use smaller embedding model for summarizer

* use smaller summarizer model

* remove queries param from pipeline.eval()

* isolate problematic tests

* rename separate test file

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-02 19:23:58 +01:00
tstadel
180c05365a
Deprecate old pipeline eval nodes: EvalDocuments and EvalAnswers (#1778)
* log deprecated warning on init

* deprecation warning included into docstrings

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
v1.0.0rc1
2021-12-02 18:09:26 +01:00
tstadel
dc4cd49049
remove queries param from pipeline.eval() (#1836)
* remove queries param from pipeline.eval()

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-02 16:04:01 +01:00
Sara Zan
99365e1d8e
Add backlink below the context, if available in the doc's meta (#1834) 2021-12-02 13:37:23 +01:00
tstadel
bab05c7677
Fix loading and saving of EvaluationResult (#1831)
* fix spans in csvs

* fix tests
2021-12-02 10:30:11 +01:00
Sara Zan
c21521dc9c
More demo bugfixes (#1832)
* Trying to fix a bug occurring when dataset is None (happens with many parallel requests for some reason)

* Change favicon and title and fix bug with version number

* Improve the text description and partially fix the enter-to-run function
2021-12-01 22:25:59 +01:00
Sara Zan
e39d015a59
Allow SQLDocumentStore to filter by many filters (#1776)
* Aliasing the join is not sufficient yet

* Update the filter query in some other functions of SQLDocumentStore - this functionality should be centralized

* Adding tests for get_all_documents, now failing

* Fix tests

* Fix typo spotted by mypy
2021-12-01 16:16:17 +01:00
tstadel
c5540d05ed
Calculation of metrics and presentation of eval results (#1760)
* retriever metrics added

* Add latest docstring and tutorial changes

* answer and document level matching metrics implemented

* Add latest docstring and tutorial changes

* answer related metrics for retriever

* basic reader metrics implemented

* handle no_answers

* fix typing

* fix tests

* fix tests without sas

* first draft for simulated top k

* rename sas and f1 columns in dataframe

* refactoring of EvaluationResult

* Add latest docstring and tutorial changes

* more eval tests added

* fix sas expected value precision

* distinction between ir and qa recall

* EvaluationResult.worst_queries() implemented

* print_evaluation_report() added

* eval report for QA Pipeline improved

* dynamic metrics for worst queries calc

* Add latest docstring and tutorial changes

* method names adjusted

* simple test for print_eval_report() added

* improved documentation

* Add latest docstring and tutorial changes

* minor formatting

* Add latest docstring and tutorial changes

* fix no_answer cases

* adjust one docstring

* Add latest docstring and tutorial changes

* fix no_answer cases for sas

* batchmode for sas implemented

* fix for retriever metrics if there are only no_answers

* fix multilabel tests

* improve documentation for pipeline.eval()

* streamline multilabel aggregates and docs

* Add latest docstring and tutorial changes

* fix multilabel tests

* unify document_id

* add dataframe schema description to EvaluationResult

* Add latest docstring and tutorial changes

* rename worst_queries to wrong_examples

* Add latest docstring and tutorial changes

* make query digesting standard pipelines work with pipeline.eval()

* Add latest docstring and tutorial changes

* tests for multi retriever pipelines added

* remove unnecessary import

* print_eval_report(): support all pipelines without junctions

* Add latest docstring and tutorial changes

* fix typos

* Add latest docstring and tutorial changes

* fix minor simulated_top_k bug and use memory documentstore throughout tests

* sas model param description improved

* Add latest docstring and tutorial changes

* rename recall metrics

* Add latest docstring and tutorial changes

* fix mean average precision link

* Add latest docstring and tutorial changes

* adjust sas description docstring

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-30 19:26:34 +01:00
ju-gu
4cce7ffe85
bugfix metadata extraction in form recognizer & split of surrounding content length (#1829)
* bugfix metadata extraction in the formrecognizer and separation of surrounding in preceding and following content length

* Fix docstring

* fix metadata extraction for content_type text

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-30 19:10:21 +01:00
Sara Zan
935689e630
Demo UI add env vars & other small fixes (#1828)
* Add more env vars to the streamlit ui

* Add some more questions to the random ones

* Relax a statuscode check and rename env vars

* Make query error message more descriptive

* Add log message

* Align docker-compose with and without GPU

* Typo in pipeline filename

* Remove prefix from var in docker_compose

* Align docker-compose.yml and add small sleep to the initialized poller to prevent spamming

* Fix the name of the dockerfile used to build the GPU image
2021-11-30 18:11:54 +01:00
AhmedIdr
56e4e8486f
Added max_seq_length and batch_size params to embeddingretriever (#1817)
* Added max_seq_length and batch_size params, added progress_bar to faiss writing_documents

* Add latest docstring and tutorial changes

* fixed typos

* Update dense.py

Changed default batch_size and max_seq_len in EmbeddingRetriever

* Add latest docstring and tutorial changes

* Update faiss.py

Change import tqdm.auto to tqdm

* Update faiss.py

Changing tqdm back to tqdm.auto

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-29 19:49:51 +01:00
Sara Zan
fb511dc4a3
Remove feedback from no-answers (#1827)
* Fix some miscopied code

* Remove feedback from the no-answer, seems the backend can't take it

* Try to raise concurrent requests per worker

* Remove the actual number of workers
2021-11-29 19:42:10 +01:00
bogdankostic
eb5f7bb4c0
Add AzureConverter to support table parsing from documents (#1813)
* Add FormRecognizerConverter

* Change signature of convert method + change return type of all converters

* Adapt preprocessing util to new return type of converters

* Parametrize number of lines used for surrounding context of table

* Change name from FormRecognizerConverter to AzureConverter

* Set version of azure-ai-formrecognizer package

* Change tutorial 8 based on new return type of converters

* Add tests

* Add latest docstring and tutorial changes

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-29 18:44:20 +01:00
Sara Zan
c29f960c47
Fix UI demo feedback (#1816)
* Fix the feedback function of the demo with a workaround

* Some docstring

* Update tests and rename methods in feedback.py

* Fix tests

* Remove operation_ids

* Add a couple of status code checks
2021-11-29 17:03:54 +01:00
MichelBartels
84147edcca
Model Distillation (#1758)
* initial commit

* Add latest docstring and tutorial changes

* added comments and fixed bug

* fixed bugs, added benchmark and added documentation

* Add latest docstring and tutorial changes

* fix type: ignore comment

* fix logging in benchmark

* fixed distillation config

* Add latest docstring and tutorial changes

* added type annotations

* fixed distillation loss calculation

* added type annotations

* fixed distillation mse loss

* improved model distillation benchmark config loading

* added temperature for model distillation

* removed unnecessary imports, added comments, added named parameter calls

* Add latest docstring and tutorial changes

* added some more comments

* added distillation test

* fixed distillation test

* removed unnecessary import

* fix softmax dimension

* add grid search

* improved model distillation benchmark config

* fixed model distillation hyperparameter search

* added doc strings and type hints for model distillation

* Add latest docstring and tutorial changes

* fixed type hints

* fixed type hints

* fixed type hints

* wrote out params instead of kwargs in DistillationDataSilo initializer

* fixed type hints

* fixed typo

* fixed typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-26 18:49:30 +01:00
Sara Zan
1a4ee21b92
Adapt docker-compose-gpu.yml to use DPR by default (#1810)
* Adapt docker-compose-gpu.yml to use DPR by default

* Update the comments

* Change the ES image

* Increase the context window and allow no-answers in the DPR pipeline too

* Re-enable file upload in GPU version

* Add env var without value and a comment to explain it
2021-11-25 16:23:18 +01:00
Sara Zan
9ee0ea0c17
Add description to the demo (#1809)
* Improve the Random Question functionality and add three example questions

* Fix the example questions

* Change default docs for the retriever

* Add example short description and make the no-answer boxes blue

* Modify some text and add a fix for the slider's bug

* New no-answer message
2021-11-25 15:27:09 +01:00
Sara Zan
742d4b9db9
Improve the Random Question functionality (#1808)
* Improve the Random Question functionality and add three example questions

* Fix the example questions

* Change default docs for the retriever
2021-11-24 15:55:44 +01:00