661 Commits

Author SHA1 Message Date
bogdankostic
bbb65a19bd
Add Tapas reader with scores (#1997)
* Add Tapas reader with scores

* Adapt possible answer spans

* Add latest docstring and tutorial changes

* Remove unused imports

* Adapt scoring

* Add latest docstring and tutorial changes

* Fix mypy

* Infer model architecture from config

* Adapt answer score calculation

* Add latest docstring and tutorial changes

* Fix mypy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-31 10:23:12 +01:00
Malte Pietsch
ee6b8d0688
Add ADR template for transparent architecture decisions (#2072)
* add adr template for decisions

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-01-28 17:33:53 +01:00
Kristof Herrmann
7764b6992c
DC SDK - load pipeline from deepset cloud (#2013)
* initial load_from_dc

* typo

* adjusted api endpoint

* removed kwargs

* added _load_from_dict

* refactor pipeline loading mechanism

* renaming load_from_dc api

* renaming

* fixed errors

* fix comments and environment variable overrides

* Add latest docstring and tutorial changes

* fix outdated YAML examples

* Add latest docstring and tutorial changes

* Introduce readonly DCDocumentStore (without labels support) (#1991)

* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle not existing documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and quey_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and quey_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>

* introduce DeepsetCloudAdapter

* Add latest docstring and tutorial changes

* introduce DeepsetCloudClient

* Add latest docstring and tutorial changes

* use json api for pipeline_config

* indexing pipeline test added

* pseudo change to force cache eviction

* revert pseudo change to force cache eviction

* remove conftest duplicates

* minor formatting and docstring fixes

* fix tests when MOCK_DC=False

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-01-28 17:32:56 +01:00
Sara Zan
713771095b
Autogenerate OpenAPI specs file (#2047)
* Add docstrings to the REST API endpoint to have them included in the OpenAPI specs

* Attempt at make GitHub CI generate the OpenAPI specs

* Missing __init__.py was breaking rest_api import

* Add comment on dummy pipeline

* Create separate workflow file for the OpenAPI specs generation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
2022-01-27 13:06:01 +01:00
Sara Zan
9af1292cda
Remove stray requirements.txt files and update README.md (#2075)
* Remove stray requirements.txt files and update README.md

* Remove requirement files

* Add details about pip bug and link to setup.cfg

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-27 11:22:14 +01:00
Sara Zan
d470b9d0bd
Improve dependency management (#1994)
* Fist attempt at using setup.cfg for dependency management

* Trying the new package on the CI and in Docker too

* Add composite extras_require

* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports

* Fix bug on class import and rephrase error message

* Introduce typing for optional modules and add type: ignore in sparse.py

* Include importlib_metadata backport for py3.7

* Add colab group to extra_requires

* Fix pillow version

* Fix grpcio

* Separate out the crawler as another extra

* Make paths relative in rest_api and ui

* Update the test matrix in the CI

* Add try catch statements around the optional imports too to account for direct imports

* Never mix direct deps with self-references and add ES deps to the base install

* Refactor several paths in tests to make them insensitive to the execution path

* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix

* Wrap pdf conversion utils into safe_import

* Update some tutorials and rever Milvus1 as default for now, see #2067

* Fix mypy config


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 18:12:55 +01:00
MichelBartels
4cc37548e3
Fix finetuning notebook augmentation (#2071)
* fix data augmentation path in finetuning notebook

* Add latest docstring and tutorial changes

* make distillation possible with other models than BERT

* use smaller dataset for distillation in finetuning tutorial

* Add latest docstring and tutorial changes

* make data augmentation in finetuning faster

* update language models forward doc strings

* fix return type of language models

* remove debug output

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 17:49:14 +01:00
Sowmiya Jaganathan
c4fff19018
Supported Highlighting in Elasticsearch (#1930)
* Supported Highlighting

* Review changes

* add example to docstrings

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-01-26 17:35:33 +01:00
mathislucka
5b7e906e85
fix: get_documents_by_id should return docs for all passed ids (#2064)
* doc store should return all documents matching ids passed to get_documents_by_id

* test for get_document_by_id should be named correctly

* add test for get_documents_by_id

* Add latest docstring and tutorial changes

* document es query limit

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 12:39:04 +01:00
tstadel
8a32d8da92
Introduce readonly DCDocumentStore (without labels support) (#1991)
* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle not existing documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and quey_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and quey_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
2022-01-25 20:36:28 +01:00
MichelBartels
5b6b0cef77
Add UnlabeledTextProcessor (#2054)
* add UnlabeledTextProcessor

* allow choosing processor when finetuning or distilling

* fix type hint

* Add latest docstring and tutorial changes

* improve segment id computation for UnlabeledTextProcessor

* add text and documentation

* change batch size parameter for intermediate layer distillation

* Add latest docstring and tutorial changes

* fix distillation dim mapping

* remove unnecessary changes

* removed confusing parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-25 14:54:34 +01:00
Julian Risch
c6f23dce88
upgrade haystack version number to 1.1.0 (#2039)
* upgrade haystack version number to 1.1.0

* copy docs to new version folder
2022-01-20 13:45:38 +01:00
tstadel
50317d74bd
Add ndcg and eval_mode to docs (#2038)
* add ndcg and eval_mode to docstrings and reorder dataframe columns in docs

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-20 13:02:46 +01:00
MichelBartels
e8cd5ea943
Add distillation to finetuning tutorial (#2025)
* Add finetuning tutorial

* Add latest docstring and tutorial changes

* fix typo

* Add latest docstring and tutorial changes

* improve distillation explanation in finetuning tutorial

* Add latest docstring and tutorial changes

* allow augment_squad.py to be easier to call from within python

* Update Tutorial2_Finetune_a_model_on_your_data.py

* fix squad augmentation test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-20 12:18:32 +01:00
MichelBartels
0cca2b97cd
distinguish intermediate layer & prediction layer distillation phases with different parameters (#2001)
* add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation

* Add latest docstring and tutorial changes

* improve default parameters

* Add latest docstring and tutorial changes

* split up distillation method

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 20:40:38 +01:00
tstadel
f42d2e8ba0
Add nDCG to pipeline.eval()'s document metrics (#2008)
* add ndcg metric

* fix merge

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 18:36:41 +01:00
Julian Risch
2c063e960e
Extend Tutorial 5 with Upper Bound Reader Eval Metrics (#1995)
* print report for closed-domain eval

* Add latest docstring and tutorial changes

* rename parameter and rewrite docs

* Add latest docstring and tutorial changes

* print eval report in separate cell

* Add latest docstring and tutorial changes

* explain when to eval individual components

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 16:29:18 +01:00
Julian Risch
5695d721aa
update link to annotation tool docu (#2005)
* update link to annotation tool docu

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 16:10:59 +01:00
Julian Risch
a3147cae47
Add isolated node eval mode in pipeline eval (#1962)
* run predictions on ground-truth docs in reader

* build dataframe for closed/open domain eval

* fix looping through multilabel

* fix looping through multilabel's list of labels

* simplify collecting relevant docs

* switch closed-domain eval off by default

* Add latest docstring and tutorial changes

* handle edge case params not given

* renaming & generate pipeline eval report

* add test case for closed-domain eval metrics

* Add latest docstring and tutorial changes

* test  report of closed-domain eval

* report closed-domain metrics only for answer metrics not doc metrics

* refactoring

* fix mypy & remove comment

* add second for-loop & use answer as method input

* renaming & add separate loop building docs eval df

* Add latest docstring and tutorial changes

* source /home/tstad/miniconda3/bin/activatechange column order for evaluatation dataframe (#1957)
conda activate haystack-dev2

* change column order for evaluatation dataframe

* added missing eval column node_input

* generic order for both document and answer returning nodes; ensure no columns get lost

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>

* fix column reordering after renaming of node_input

* simplify tests &  add docu

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ju-gu <87523290+ju-gu@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-01-14 14:37:16 +01:00
Sara Zan
e28bf618d7
Implement proper FK in MetaDocumentORM and MetaLabelORM to work on PostgreSQL (#1990)
* Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints

* update_document_meta() was not using index properly

* Exclude ES and Memory from the cosine_sanity_check test

* move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-14 13:48:58 +01:00
MichelBartels
3e4dbbb32c
Align similarity scores across document stores (#1967)
* align document store similarity functions

* remove unnecessary imports

* undone accidental change

* stopped weaviate from pretending to support dot product similarity

* stopped weaviate from pretending to support dot product similarity

* Add latest docstring and tutorial changes

* fix fixture params for document stores

* use cosine similarity for most tests

* fix cosine similarity test

* fix faiss test

* fix weaviate test

* fix accidental deletion

* fix document_store fixture

* test fix; shouldn't be merged

* fix test_normalize_embeddings_diff_shapes

* probably a better fix

* fix for parameter combinations

* revert new pytest_generate_tests functionality

* simplify pytest_generate_tests

* normalize embeddings for test_dpr_embedding

* add to faiss doc that embeddings are normalized

* Add latest docstring and tutorial changes

* remove unnecessary parameters and add comments

* simplify two lines of memory.py into one

* test similarity scores with smaller language model

* fix test_similarity_score


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-12 19:28:20 +01:00
Dmitry Goryunov
79fdda8a7c
Remove hard-coded variables from the Tutorial 15 (#1984)
* Remove hard-coded variables from the Tutorial 15

* Fix missing comma

* Add latest docstring and tutorial changes

* Fix formatting in Tutorial15_TableQA.ipynb

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-11 17:55:20 +01:00
Mathew Kuriakose
a44b6c18c0
Unify vector_dim and embedding_dim parameter in Document Store (#1922)
* Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores

* Unit test cases updated to use `embedding_dim` instead of `vector_dim`

* Unit test case update to use embedding_dim instead of vector_dim

* Add latest docstring and tutorial changes

* Put usage of `vector_dim` param in same if-block as corresponding warning

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-01-10 18:10:32 +01:00
Julian Risch
30ea1d475d
check multiprocessing sharing strategy is available (#1965)
* check multiprocessing sharing strategy is available

* Change default of multiprocessing strategy to None

* Change default sharing strategy to None in retriever

* Add latest docstring and tutorial changes

* Make logging message easier to understand

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-05 18:22:09 +01:00
oryx1729
2910f67718
Use long Commit ID for Docker tags (#1946) 2022-01-04 17:39:49 +01:00
Alon Eirew
7a4fa42fda
Fix #1927 - RuntimeError when loading data using data_silo due to many open file descriptors from multiprocessing (#1928)
* fix #1687

* fix RuntimeError: received 0 items of ancdata

* Add an arg multiprocessing_strategy to DataSilo and DPR.train()

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-04 13:29:26 +01:00
bogdankostic
3e0ef1cc8a
Fix Numba TypingError in normalize_embedding for cosine similarity (#1933)
* Fix Numba TypingError

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 17:14:51 +01:00
bogdankostic
45df18c416
Add RCIReader for TableQA (#1909)
* Add RCIReader

* Add latest docstring and tutorial changes

* Add Doc Strings

* Add latest docstring and tutorial changes

* Add Tests

* Add Doc Strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 16:59:24 +01:00
Kristof Herrmann
6e8e3c68d9
Custom id hashing on documentstore level (#1910)
* adding dynamic id hashing

* Add latest docstring and tutorial changes

* added pr review

* Add latest docstring and tutorial changes

* fixed tests

* fix mypy error

* fix mypy issue

* ignore typing

* fixed correct check

* fixed tests

* try fixing the tests

* set id hash keys only if not none

* dont store id_hash_keys

* fix tests

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 16:58:19 +01:00
Julian Risch
a846be99d1
Extend TranslationWrapper to work with QA Generation (#1905)
* draft translationwrapper example

* draft translation of generated qa pairs

* Add latest docstring and tutorial changes

* fixed pass by reference by deepcopy

* delete adapted tutorial 13 (test purposes only)

* adapt method signature and doc string

* Add latest docstring and tutorial changes

* add type ignore

* extend tutorial 13 with TranslationWrapper example

* Add latest docstring and tutorial changes

* removed duplicate code

* indent if statement

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
2022-01-03 13:30:24 +01:00
tstadel
a94c274134
Support custom headers per request in pipeline (#1861)
* chain headers param down to document_stores

* Add latest docstring and tutorial changes

* fix InMemoryDocumentStore params

* Add latest docstring and tutorial changes

* fix TfidfRetriever params

* Add latest docstring and tutorial changes

* fix missing headers

* Add latest docstring and tutorial changes

* fix sparql client and update docs

* Add latest docstring and tutorial changes

* test for documentstores

* pipeline tests added

* update header param in docstrings

* Add latest docstring and tutorial changes

* refactoring: headers as implicit param

* Add latest docstring and tutorial changes

* remove unnecessary imports

* propagade batch_size correctly

* Add latest docstring and tutorial changes

* revert InMemoryDocumentStore.write_documents signature

* Add latest docstring and tutorial changes

* remove #type: ignore

* Add latest docstring and tutorial changes

* replace MutableMapping by Dict

* Add latest docstring and tutorial changes

* improve docstrings

* Add latest docstring and tutorial changes

* get rid of **kwargs

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-03 11:38:02 +01:00
el2e10
377c20b8b1
Fix grammatical issue in optimization guides (#1941) 2022-01-03 11:06:13 +01:00
bogdankostic
39573cf0a9
Add ParsrConverter (#1931)
* Add ParsrConverter

* Fix typing error + add Parsr to Linux CI

* Fix valid_language for all converters + fix context generation for ParsrConverter

* Remove ParsrConverter test from WindowsCI

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-30 10:15:11 +01:00
MichelBartels
f33c2b987a
Adding distillation loss functions from TinyBERT (#1879)
* initial tinybertdistill commit

* add tinybert distill loss

* remove teacher caching for tinybert

* add tinybert to distil_from method

* Add latest docstring and tutorial changes

* add dim mapping and fix type hints

* fix type hints

* fix dummy input

* fix dim mapping for tinybert loss and add comments/doc strings

* add test for tinybert loss

* Add latest docstring and tutorial changes

* add comment

* fix BERT forward parameters

* add doc string to AdaptiveModel forward method

* remove unnecessary data silo

* fix farm import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-23 14:54:02 +01:00
javier ramírez
5c7f3c234e
Fix minor typo in readme (#1900)
I just added a missing "r" to the word "contributions" at the "Overview and Usage" section
2021-12-16 13:31:27 +01:00
Alberto Villa
1bb6244a63
Exchanged minimal with minimum in print_answers function call (#1890) 2021-12-14 15:27:37 +01:00
Alberto Villa
2396f0cd3a
Correct bug with encoding when generating Markdown documentation; linked with issue #1880 (#1881) 2021-12-14 10:50:25 +01:00
tstadel
57a04631df
introduce node_input param (#1854)
* introduce node_input param

* Add latest docstring and tutorial changes

* prediction and label as node_input values

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-14 10:34:35 +01:00
Branden Chan
ea5aab23ec
Update pydoc-markdown-file-classifier.yml (#1856)
* Update pydoc-markdown-file-classifier.yml

* Add latest docstring and tutorial changes

* Prevent wrapping DataParallel in second DataParallel (#1855)

* Prevent wrapping DataParallel in second DataParallel

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Create v1.0 docs (#1862)

* Update pydoc-markdown-file-classifier.yml

* Add latest docstring and tutorial changes

* Rebase and apply change to v1.0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2021-12-08 18:19:03 +01:00
Branden Chan
ef1e531895
Create v1.0 docs (#1862) 2021-12-08 17:53:00 +01:00
bogdankostic
cbfe2b4626
Prevent wrapping DataParallel in second DataParallel (#1855)
* Prevent wrapping DataParallel in second DataParallel

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-08 09:56:45 +01:00
Julian Risch
54f776350c
Update evaluation tutorial to cover the new pipeline.eval() (#1765)
* Replace old tutorial 5 with new code based on test cases

* Add latest docstring and tutorial changes

* Use pipeline.eval() in tutorial

* Add latest docstring and tutorial changes

* Restructure notebook

* Add latest docstring and tutorial changes

* Add dataframe example

* Add latest docstring and tutorial changes

* Get eval data from doc store

* Add latest docstring and tutorial changes

* Load data from doc store

* Add latest docstring and tutorial changes

* Clear outputs

* Add latest docstring and tutorial changes

* Change example and add python script

* Add latest docstring and tutorial changes

* Fetch aggregated multilabels from doc store

* Add latest docstring and tutorial changes

* Incorporate review feedback on text comments

* Add latest docstring and tutorial changes

* Add Notebook output

* Remove queries param from pipeline.eval()

* Add latest docstring and tutorial changes

* Add output with all metrics

* Add printing of multiple metrics to script

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-03 11:19:41 +01:00
tstadel
180c05365a
Deprecate old pipeline eval nodes: EvalDocuments and EvalAnswers (#1778)
* log deprecated warning on init

* deprecation warning included into docstrings

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-02 18:09:26 +01:00
tstadel
dc4cd49049
remove queries param from pipeline.eval() (#1836)
* remove queries param from pipeline.eval()

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-02 16:04:01 +01:00
tstadel
c5540d05ed
Calculation of metrics and presentation of eval results (#1760)
* retriever metrics added

* Add latest docstring and tutorial changes

* answer and document level matching metrics implemented

* Add latest docstring and tutorial changes

* answer related metrics for retriever

* basic reader metrics implemented

* handle no_answers

* fix typing

* fix tests

* fix tests without sas

* first draft for simulated top k

* rename sas and f1 columns in dataframe

* refactoring of EvaluationResult

* Add latest docstring and tutorial changes

* more eval tests added

* fix sas expected value precision

* distinction between ir and qa recall

* EvaluationResult.worst_queries() implemented

* print_evaluation_report() added

* eval report for QA Pipeline improved

* dynamic metrics for worst queries calc

* Add latest docstring and tutorial changes

* method names adjusted

* simple test for print_eval_report() added

* improved documentation

* Add latest docstring and tutorial changes

* minor formatting

* Add latest docstring and tutorial changes

* fix no_answer cases

* adjust one docstring

* Add latest docstring and tutorial changes

* fix no_answer cases for sas

* batchmode for sas implemented

* fix for retriever metrics if there are only no_answers

* fix multilabel tests

* improve documentation for pipeline.eval()

* streamline multilabel aggregates and docs

* Add latest docstring and tutorial changes

* fix multilabel tests

* unify document_id

* add dataframe schema description to EvaluationResult

* Add latest docstring and tutorial changes

* rename worst_queries to wrong_examples

* Add latest docstring and tutorial changes

* make query digesting standard pipelines work with pipeline.eval()

* Add latest docstring and tutorial changes

* tests for multi retriever pipelines added

* remove unnecessary import

* print_eval_report(): support all pipelines without junctions

* Add latest docstring and tutorial changes

* fix typos

* Add latest docstring and tutorial changes

* fix minor simulated_top_k bug and use memory documentstore throughout tests

* sas model param description improved

* Add latest docstring and tutorial changes

* rename recall metrics

* Add latest docstring and tutorial changes

* fix mean average precision link

* Add latest docstring and tutorial changes

* adjust sas description docstring

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-30 19:26:34 +01:00
AhmedIdr
56e4e8486f
Added max_seq_length and batch_size params to embeddingretriever (#1817)
* Added max_seq_length and batch_size params, added progress_bar to faiss writing_documents

* Add latest docstring and tutorial changes

* fixed typos

* Update dense.py

Changed default batch_size and max_seq_len in EmbeddingRetriever

* Add latest docstring and tutorial changes

* Update faiss.py

Change import tqdm.auto to tqdm

* Update faiss.py

Changing tqdm back to tqdm.auto

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-29 19:49:51 +01:00
bogdankostic
eb5f7bb4c0
Add AzureConverter to support table parsing from documents (#1813)
* Add FormRecognizerConverter

* Change signature of convert method + change return type of all converters

* Adapt preprocessing util to new return type of converters

* Parametrize number of lines used for surrounding context of table

* Change name from FormRecognizerConverter to AzureConverter

* Set version of azure-ai-formrecognizer package

* Change tutorial 8 based on new return type of converters

* Add tests

* Add latest docstring and tutorial changes

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-29 18:44:20 +01:00
MichelBartels
84147edcca
Model Distillation (#1758)
* initial commit

* Add latest docstring and tutorial changes

* added comments and fixed bug

* fixed bugs, added benchmark and added documentation

* Add latest docstring and tutorial changes

* fix type: ignore comment

* fix logging in benchmark

* fixed distillation config

* Add latest docstring and tutorial changes

* added type annotations

* fixed distillation loss calculation

* added type annotations

* fixed distillation mse loss

* improved model distillation benchmark config loading

* added temperature for model distillation

* removed uncessary imports, added comments, added named parameter calls

* Add latest docstring and tutorial changes

* added some more comments

* added distillation test

* fixed distillation test

* removed unnecessary import

* fix softmax dimension

* add grid search

* improved model distillation benchmark config

* fixed model distillation hyperparameter search

* added doc strings and type hints for model distillation

* Add latest docstring and tutorial changes

* fixed type hints

* fixed type hints

* fixed type hints

* wrote out params instead of kwargs in DistillationDataSilo initializer

* fixed type hints

* fixed typo

* fixed typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-26 18:49:30 +01:00
Julian Risch
3b8e2e7b6c
Fix link to colab notebook in tutorial 16 (#1802)
* Fix link to colab notebook in tutorial 16

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-24 13:19:20 +01:00
Sowmiya Jaganathan
04d93ec247
Introduced an arg to add synonyms - Elasticsearch (#1625)
* Introduced an arg add synonyms to Elasticsearch

* Added the test code, removed the whitespace formatting changes, and overwrote the relevant parts from the already existing mapping instead of creating new mapping.

* Added the test code

* Remove whitespace change

* Added the doc_string with examples and link

* Removed unneccessary spaces

* Add latest docstring and tutorial changes

* fix text_field -> content_field

Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-23 19:10:34 +01:00