haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-04 07:26:15 +00:00

Author	SHA1	Message	Date
bogdankostic	bbb65a19bd	Add Tapas reader with scores (#1997 ) * Add Tapas reader with scores * Adapt possible answer spans * Add latest docstring and tutorial changes * Remove unused imports * Adapt scoring * Add latest docstring and tutorial changes * Fix mypy * Infer model architecture from config * Adapt answer score calculation * Add latest docstring and tutorial changes * Fix mypy Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-31 10:23:12 +01:00
Malte Pietsch	ee6b8d0688	Add ADR template for transparent architecture decisions (#2072 ) * add adr template for decisions * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-01-28 17:33:53 +01:00
Kristof Herrmann	7764b6992c	DC SDK - load pipeline from deepset cloud (#2013 ) * initial load_from_dc * typo * adjusted api endpoint * removed kwargs * added _load_from_dict * refactor pipeline loading mechanism * renaming load_from_dc api * renaming * fixed errors * fix comments and environment variable overrides * Add latest docstring and tutorial changes * fix outdated YAML examples * Add latest docstring and tutorial changes * Introduce readonly DCDocumentStore (without labels support) (#1991) * minimal DCDocumentStore * support filters * implement get_documents_by_id * handle not existing documents * add docstrings * auth added * add tests * generate docs * Add latest docstring and tutorial changes * add responses to dev dependencies * fix tests * support query() and quey_by_embedding() * Add latest docstring and tutorial changes * query tests added * read api_key and api_endpoint from env * Add latest docstring and tutorial changes * support query() and quey_by_embedding() * query tests added * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes * support dynamic similarity and return_embedding values * Add latest docstring and tutorial changes * adjust KeywordDocumentStore description * refactoring * Add latest docstring and tutorial changes * implement get_document_count and raise on all not implemented methods * Add latest docstring and tutorial changes * don't use abbreviation DC in comments and errors * Add latest docstring and tutorial changes * docstring added to KeywordDocumentStore * Add latest docstring and tutorial changes * enhanced api key set * split tests into two parts * change setup.py in order to work around build cache * added link * Add latest docstring and tutorial changes * rename DCDocumentStore to DeepsetCloudDocumentStore * Add latest docstring and tutorial changes * remove dc.py * reinsert link to docs * fix imports * Add latest docstring and tutorial changes * better test structure Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de> * introduce DeepsetCloudAdapter * Add latest docstring and tutorial changes * introduce DeepsetCloudClient * Add latest docstring and tutorial changes * use json api for pipeline_config * indexing pipeline test added * pseudo change to force cache eviction * revert pseudo change to force cache eviction * remove conftest duplicates * minor formatting and docstring fixes * fix tests when MOCK_DC=False Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-28 17:32:56 +01:00
Sara Zan	713771095b	Autogenerate OpenAPI specs file (#2047 ) * Add docstrings to the REST API endpoint to have them included in the OpenAPI specs * Attempt at make GitHub CI generate the OpenAPI specs * Missing __init__.py was breaking rest_api import * Add comment on dummy pipeline * Create separate workflow file for the OpenAPI specs generation Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>	2022-01-27 13:06:01 +01:00
Sara Zan	9af1292cda	Remove stray requirements.txt files and update README.md (#2075 ) * Remove stray requirements.txt files and update README.md * Remove requirement files * Add details about pip bug and link to setup.cfg Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-27 11:22:14 +01:00
Sara Zan	d470b9d0bd	Improve dependency management (#1994 ) * Fist attempt at using setup.cfg for dependency management * Trying the new package on the CI and in Docker too * Add composite extras_require * Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports * Fix bug on class import and rephrase error message * Introduce typing for optional modules and add type: ignore in sparse.py * Include importlib_metadata backport for py3.7 * Add colab group to extra_requires * Fix pillow version * Fix grpcio * Separate out the crawler as another extra * Make paths relative in rest_api and ui * Update the test matrix in the CI * Add try catch statements around the optional imports too to account for direct imports * Never mix direct deps with self-references and add ES deps to the base install * Refactor several paths in tests to make them insensitive to the execution path * Include tstadel review and re-introduce Milvus1 in the tests suite, to fix * Wrap pdf conversion utils into safe_import * Update some tutorials and rever Milvus1 as default for now, see #2067 * Fix mypy config Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 18:12:55 +01:00
MichelBartels	4cc37548e3	Fix finetuning notebook augmentation (#2071 ) * fix data augmentation path in finetuning notebook * Add latest docstring and tutorial changes * make distillation possible with other models than BERT * use smaller dataset for distillation in finetuning tutorial * Add latest docstring and tutorial changes * make data augmentation in finetuning faster * update language models forward doc strings * fix return type of language models * remove debug output Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 17:49:14 +01:00
Sowmiya Jaganathan	c4fff19018	Supported Highlighting in Elasticsearch (#1930 ) * Supported Highlighting * Review changes * add example to docstrings * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net> Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-26 17:35:33 +01:00
mathislucka	5b7e906e85	fix: get_documents_by_id should return docs for all passed ids (#2064 ) * doc store should return all documents matching ids passed to get_documents_by_id * test for get_document_by_id should be named correctly * add test for get_documents_by_id * Add latest docstring and tutorial changes * document es query limit * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 12:39:04 +01:00
tstadel	8a32d8da92	Introduce readonly DCDocumentStore (without labels support) (#1991 ) * minimal DCDocumentStore * support filters * implement get_documents_by_id * handle not existing documents * add docstrings * auth added * add tests * generate docs * Add latest docstring and tutorial changes * add responses to dev dependencies * fix tests * support query() and quey_by_embedding() * Add latest docstring and tutorial changes * query tests added * read api_key and api_endpoint from env * Add latest docstring and tutorial changes * support query() and quey_by_embedding() * query tests added * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes * support dynamic similarity and return_embedding values * Add latest docstring and tutorial changes * adjust KeywordDocumentStore description * refactoring * Add latest docstring and tutorial changes * implement get_document_count and raise on all not implemented methods * Add latest docstring and tutorial changes * don't use abbreviation DC in comments and errors * Add latest docstring and tutorial changes * docstring added to KeywordDocumentStore * Add latest docstring and tutorial changes * enhanced api key set * split tests into two parts * change setup.py in order to work around build cache * added link * Add latest docstring and tutorial changes * rename DCDocumentStore to DeepsetCloudDocumentStore * Add latest docstring and tutorial changes * remove dc.py * reinsert link to docs * fix imports * Add latest docstring and tutorial changes * better test structure Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>	2022-01-25 20:36:28 +01:00
MichelBartels	5b6b0cef77	Add UnlabeledTextProcessor (#2054 ) * add UnlabeledTextProcessor * allow choosing processor when finetuning or distilling * fix type hint * Add latest docstring and tutorial changes * improve segment id computation for UnlabeledTextProcessor * add text and documentation * change batch size parameter for intermediate layer distillation * Add latest docstring and tutorial changes * fix distillation dim mapping * remove unnecessary changes * removed confusing parameter * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-25 14:54:34 +01:00
Julian Risch	c6f23dce88	upgrade haystack version number to 1.1.0 (#2039 ) * upgrade haystack version number to 1.1.0 * copy docs to new version folder	2022-01-20 13:45:38 +01:00
tstadel	50317d74bd	Add ndcg and eval_mode to docs (#2038 ) * add ndcg and eval_mode to docstrings and reorder dataframe columns in docs * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-20 13:02:46 +01:00
MichelBartels	e8cd5ea943	Add distillation to finetuning tutorial (#2025 ) * Add finetuning tutorial * Add latest docstring and tutorial changes * fix typo * Add latest docstring and tutorial changes * improve distillation explanation in finetuning tutorial * Add latest docstring and tutorial changes * allow augment_squad.py to be easier to call from within python * Update Tutorial2_Finetune_a_model_on_your_data.py * fix squad augmentation test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-20 12:18:32 +01:00
MichelBartels	0cca2b97cd	distinguish intermediate layer & prediction layer distillation phases with different parameters (#2001 ) * add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation * Add latest docstring and tutorial changes * improve default parameters * Add latest docstring and tutorial changes * split up distillation method * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 20:40:38 +01:00
tstadel	f42d2e8ba0	Add nDCG to `pipeline.eval()`'s document metrics (#2008 ) * add ndcg metric * fix merge * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 18:36:41 +01:00
Julian Risch	2c063e960e	Extend Tutorial 5 with Upper Bound Reader Eval Metrics (#1995 ) * print report for closed-domain eval * Add latest docstring and tutorial changes * rename parameter and rewrite docs * Add latest docstring and tutorial changes * print eval report in separate cell * Add latest docstring and tutorial changes * explain when to eval individual components Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 16:29:18 +01:00
Julian Risch	5695d721aa	update link to annotation tool docu (#2005 ) * update link to annotation tool docu * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 16:10:59 +01:00
Julian Risch	a3147cae47	Add isolated node eval mode in pipeline eval (#1962 ) * run predictions on ground-truth docs in reader * build dataframe for closed/open domain eval * fix looping through multilabel * fix looping through multilabel's list of labels * simplify collecting relevant docs * switch closed-domain eval off by default * Add latest docstring and tutorial changes * handle edge case params not given * renaming & generate pipeline eval report * add test case for closed-domain eval metrics * Add latest docstring and tutorial changes * test report of closed-domain eval * report closed-domain metrics only for answer metrics not doc metrics * refactoring * fix mypy & remove comment * add second for-loop & use answer as method input * renaming & add separate loop building docs eval df * Add latest docstring and tutorial changes * change column order for evaluation dataframe (#1957) * change column order for evaluation dataframe * added missing eval column node_input * generic order for both document and answer returning nodes; ensure no columns get lost Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> * fix column reordering after renaming of node_input * simplify tests & add docu * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ju-gu <87523290+ju-gu@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-01-14 14:37:16 +01:00
Sara Zan	e28bf618d7	Implement proper FK in `MetaDocumentORM` and `MetaLabelORM` to work on PostgreSQL (#1990 ) * Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints * update_document_meta() was not using index properly * Exclude ES and Memory from the cosine_sanity_check test * move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 13:48:58 +01:00
MichelBartels	3e4dbbb32c	Align similarity scores across document stores (#1967 ) * align document store similarity functions * remove unnecessary imports * undone accidental change * stopped weaviate from pretending to support dot product similarity * stopped weaviate from pretending to support dot product similarity * Add latest docstring and tutorial changes * fix fixture params for document stores * use cosine similarity for most tests * fix cosine similarity test * fix faiss test * fix weaviate test * fix accidental deletion * fix document_store fixture * test fix; shouldn't be merged * fix test_normalize_embeddings_diff_shapes * probably a better fix * fix for parameter combinations * revert new pytest_generate_tests functionality * simplify pytest_generate_tests * normalize embeddings for test_dpr_embedding * add to faiss doc that embeddings are normalized * Add latest docstring and tutorial changes * remove unnecessary parameters and add comments * simplify two lines of memory.py into one * test similarity scores with smaller language model * fix test_similarity_score Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-12 19:28:20 +01:00
Dmitry Goryunov	79fdda8a7c	Remove hard-coded variables from the Tutorial 15 (#1984 ) * Remove hard-coded variables from the Tutorial 15 * Fix missing comma * Add latest docstring and tutorial changes * Fix formatting in Tutorial15_TableQA.ipynb * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-11 17:55:20 +01:00
Mathew Kuriakose	a44b6c18c0	Unify vector_dim and embedding_dim parameter in Document Store (#1922 ) * Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores * Unit test cases updated to use `embedding_dim` instead of `vector_dim` * Unit test case update to use embedding_dim instead of vector_dim * Add latest docstring and tutorial changes * Put usage of `vector_dim` param in same if-block as corresponding warning Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-01-10 18:10:32 +01:00
Julian Risch	30ea1d475d	check multiprocessing sharing strategy is available (#1965 ) * check multiprocessing sharing strategy is available * Change default of multiprocessing strategy to None * Change default sharing strategy to None in retriever * Add latest docstring and tutorial changes * Make logging message easier to understand Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-05 18:22:09 +01:00
oryx1729	2910f67718	Use long Commit ID for Docker tags (#1946 )	2022-01-04 17:39:49 +01:00
Alon Eirew	7a4fa42fda	Fix #1927 - RuntimeError when loading data using data_silo due to many open file descriptors from multiprocessing (#1928 ) * fix #1687 * fix RuntimeError: received 0 items of ancdata * Add an arg multiprocessing_strategy to DataSilo and DPR.train() * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-04 13:29:26 +01:00
bogdankostic	3e0ef1cc8a	Fix Numba TypingError in `normalize_embedding` for cosine similarity (#1933 ) * Fix Numba TypingError * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 17:14:51 +01:00
bogdankostic	45df18c416	Add RCIReader for TableQA (#1909 ) * Add RCIReader * Add latest docstring and tutorial changes * Add Doc Strings * Add latest docstring and tutorial changes * Add Tests * Add Doc Strings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 16:59:24 +01:00
Kristof Herrmann	6e8e3c68d9	Custom id hashing on documentstore level (#1910 ) * adding dynamic id hashing * Add latest docstring and tutorial changes * added pr review * Add latest docstring and tutorial changes * fixed tests * fix mypy error * fix mypy issue * ignore typing * fixed correct check * fixed tests * try fixing the tests * set id hash keys only if not none * dont store id_hash_keys * fix tests * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 16:58:19 +01:00
Julian Risch	a846be99d1	Extend TranslationWrapper to work with QA Generation (#1905 ) * draft translationwrapper example * draft translation of generated qa pairs * Add latest docstring and tutorial changes * fixed pass by reference by deepcopy * delete adapted tutorial 13 (test purposes only) * adapt method signature and doc string * Add latest docstring and tutorial changes * add type ignore * extend tutorial 13 with TranslationWrapper example * Add latest docstring and tutorial changes * removed duplicate code * indent if statement Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>	2022-01-03 13:30:24 +01:00
tstadel	a94c274134	Support custom headers per request in pipeline (#1861 ) * chain headers param down to document_stores * Add latest docstring and tutorial changes * fix InMemoryDocumentStore params * Add latest docstring and tutorial changes * fix TfidfRetriever params * Add latest docstring and tutorial changes * fix missing headers * Add latest docstring and tutorial changes * fix sparql client and update docs * Add latest docstring and tutorial changes * test for documentstores * pipeline tests added * update header param in docstrings * Add latest docstring and tutorial changes * refactoring: headers as implicit param * Add latest docstring and tutorial changes * remove unnecessary imports * propagade batch_size correctly * Add latest docstring and tutorial changes * revert InMemoryDocumentStore.write_documents signature * Add latest docstring and tutorial changes * remove #type: ignore * Add latest docstring and tutorial changes * replace MutableMapping by Dict * Add latest docstring and tutorial changes * improve docstrings * Add latest docstring and tutorial changes * get rid of *kwargs Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 11:38:02 +01:00
el2e10	377c20b8b1	Fix grammatical issue in optimization guides (#1941 )	2022-01-03 11:06:13 +01:00
bogdankostic	39573cf0a9	Add ParsrConverter (#1931 ) * Add ParsrConverter * Fix typing error + add Parsr to Linux CI * Fix valid_language for all converters + fix context generation for ParsrConverter * Remove ParsrConverter test from WindowsCI * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-30 10:15:11 +01:00
MichelBartels	f33c2b987a	Adding distillation loss functions from TinyBERT (#1879 ) * initial tinybertdistill commit * add tinybert distill loss * remove teacher caching for tinybert * add tinybert to distil_from method * Add latest docstring and tutorial changes * add dim mapping and fix type hints * fix type hints * fix dummy input * fix dim mapping for tinybert loss and add comments/doc strings * add test for tinybert loss * Add latest docstring and tutorial changes * add comment * fix BERT forward parameters * add doc string to AdaptiveModel forward method * remove unnecessary data silo * fix farm import Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-23 14:54:02 +01:00
javier ramírez	5c7f3c234e	Fix minor typo in readme (#1900 ) I just added a missing "r" to the word "contributions" at the "Overview and Usage" section	2021-12-16 13:31:27 +01:00
Alberto Villa	1bb6244a63	Exchanged minimal with minimum in print_answers function call (#1890 )	2021-12-14 15:27:37 +01:00
Alberto Villa	2396f0cd3a	Correct bug with encoding when generating Markdown documentation; linked with issue #1880 (#1881 )	2021-12-14 10:50:25 +01:00
tstadel	57a04631df	introduce node_input param (#1854 ) * introduce node_input param * Add latest docstring and tutorial changes * prediction and label as node_input values * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-14 10:34:35 +01:00
Branden Chan	ea5aab23ec	Update pydoc-markdown-file-classifier.yml (#1856 ) * Update pydoc-markdown-file-classifier.yml * Add latest docstring and tutorial changes * Prevent wrapping DataParallel in second DataParallel (#1855) * Prevent wrapping DataParallel in second DataParallel * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Create v1.0 docs (#1862) * Update pydoc-markdown-file-classifier.yml * Add latest docstring and tutorial changes * Rebase and apply change to v1.0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2021-12-08 18:19:03 +01:00
Branden Chan	ef1e531895	Create v1.0 docs (#1862 )	2021-12-08 17:53:00 +01:00
bogdankostic	cbfe2b4626	Prevent wrapping DataParallel in second DataParallel (#1855 ) * Prevent wrapping DataParallel in second DataParallel * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-08 09:56:45 +01:00
Julian Risch	54f776350c	Update evaluation tutorial to cover the new `pipeline.eval()` (#1765 ) * Replace old tutorial 5 with new code based on test cases * Add latest docstring and tutorial changes * Use pipeline.eval() in tutorial * Add latest docstring and tutorial changes * Restructure notebook * Add latest docstring and tutorial changes * Add dataframe example * Add latest docstring and tutorial changes * Get eval data from doc store * Add latest docstring and tutorial changes * Load data from doc store * Add latest docstring and tutorial changes * Clear outputs * Add latest docstring and tutorial changes * Change example and add python script * Add latest docstring and tutorial changes * Fetch aggregated multilabels from doc store * Add latest docstring and tutorial changes * Incorporate review feedback on text comments * Add latest docstring and tutorial changes * Add Notebook output * Remove queries param from pipeline.eval() * Add latest docstring and tutorial changes * Add output with all metrics * Add printing of multiple metrics to script * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-03 11:19:41 +01:00
tstadel	180c05365a	Deprecate old pipeline eval nodes: EvalDocuments and EvalAnswers (#1778 ) * log deprecated warning on init * deprecation warning included into docstrings * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-02 18:09:26 +01:00
tstadel	dc4cd49049	remove queries param from pipeline.eval() (#1836 ) * remove queries param from pipeline.eval() * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-02 16:04:01 +01:00
tstadel	c5540d05ed	Calculation of metrics and presentation of eval results (#1760 ) * retriever metrics added * Add latest docstring and tutorial changes * answer and document level matching metrics implemented * Add latest docstring and tutorial changes * answer related metrics for retriever * basic reader metrics implemented * handle no_answers * fix typing * fix tests * fix tests without sas * first draft for simulated top k * rename sas and f1 columns in dataframe * refactoring of EvaluationResult * Add latest docstring and tutorial changes * more eval tests added * fix sas expected value precision * distinction between ir and qa recall * EvaluationResult.worst_queries() implemented * print_evaluation_report() added * eval report for QA Pipeline improved * dynamic metrics for worst queries calc * Add latest docstring and tutorial changes * method names adjusted * simple test for print_eval_report() added * improved documentation * Add latest docstring and tutorial changes * minor formatting * Add latest docstring and tutorial changes * fix no_answer cases * adjust one docstring * Add latest docstring and tutorial changes * fix no_answer cases for sas * batchmode for sas implemented * fix for retriever metrics if there are only no_answers * fix multilabel tests * improve documentation for pipeline.eval() * streamline multilabel aggregates and docs * Add latest docstring and tutorial changes * fix multilabel tests * unify document_id * add dataframe schema description to EvaluationResult * Add latest docstring and tutorial changes * rename worst_queries to wrong_examples * Add latest docstring and tutorial changes * make query digesting standard pipelines work with pipeline.eval() * Add latest docstring and tutorial changes * tests for multi retriever pipelines added * remove unnecessary import * print_eval_report(): support all pipelines without junctions * Add latest docstring and tutorial changes * fix typos * Add latest docstring and tutorial changes * fix minor simulated_top_k bug and use memory documentstore throughout tests * sas model param description improved * Add latest docstring and tutorial changes * rename recall metrics * Add latest docstring and tutorial changes * fix mean average precision link * Add latest docstring and tutorial changes * adjust sas description docstring * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-11-30 19:26:34 +01:00
AhmedIdr	56e4e8486f	Added max_seq_length and batch_size params to embeddingretriever (#1817 ) * Added max_seq_length and batch_size params, added progress_bar to faiss writing_documents * Add latest docstring and tutorial changes * fixed typos * Update dense.py Changed default batch_size and max_seq_len in EmbeddingRetriever * Add latest docstring and tutorial changes * Update faiss.py Change import tqdm.auto to tqdm * Update faiss.py Changing tqdm back to tqdm.auto Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-11-29 19:49:51 +01:00
bogdankostic	eb5f7bb4c0	Add AzureConverter to support table parsing from documents (#1813 ) * Add FormRecognizerConverter * Change signature of convert method + change return type of all converters * Adapt preprocessing util to new return type of converters * Parametrize number of lines used for surrounding context of table * Change name from FormRecognizerConverter to AzureConverter * Set version of azure-ai-formrecognizer package * Change tutorial 8 based on new return type of converters * Add tests * Add latest docstring and tutorial changes * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-11-29 18:44:20 +01:00
MichelBartels	84147edcca	Model Distillation (#1758 ) * initial commit * Add latest docstring and tutorial changes * added comments and fixed bug * fixed bugs, added benchmark and added documentation * Add latest docstring and tutorial changes * fix type: ignore comment * fix logging in benchmark * fixed distillation config * Add latest docstring and tutorial changes * added type annotations * fixed distillation loss calculation * added type annotations * fixed distillation mse loss * improved model distillation benchmark config loading * added temperature for model distillation * removed uncessary imports, added comments, added named parameter calls * Add latest docstring and tutorial changes * added some more comments * added distillation test * fixed distillation test * removed unnecessary import * fix softmax dimension * add grid search * improved model distillation benchmark config * fixed model distillation hyperparameter search * added doc strings and type hints for model distillation * Add latest docstring and tutorial changes * fixed type hints * fixed type hints * fixed type hints * wrote out params instead of kwargs in DistillationDataSilo initializer * fixed type hints * fixed typo * fixed typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-11-26 18:49:30 +01:00
Julian Risch	3b8e2e7b6c	Fix link to colab notebook in tutorial 16 (#1802 ) * Fix link to colab notebook in tutorial 16 * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-11-24 13:19:20 +01:00
Sowmiya Jaganathan	04d93ec247	Introduced an arg to add synonyms - Elasticsearch (#1625 ) * Introduced an arg add synonyms to Elasticsearch * Added the test code, removed the whitespace formatting changes, and overwrote the relevant parts from the already existing mapping instead of creating new mapping. * Added the test code * Remove whitespace change * Added the doc_string with examples and link * Removed unneccessary spaces * Add latest docstring and tutorial changes * fix text_field -> content_field Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-11-23 19:10:34 +01:00

661 Commits