haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-01 10:19:23 +00:00

Author	SHA1	Message	Date
MichelBartels	f33c2b987a	Adding distillation loss functions from TinyBERT (#1879) * initial tinybertdistill commit * add tinybert distill loss * remove teacher caching for tinybert * add tinybert to distil_from method * Add latest docstring and tutorial changes * add dim mapping and fix type hints * fix type hints * fix dummy input * fix dim mapping for tinybert loss and add comments/doc strings * add test for tinybert loss * Add latest docstring and tutorial changes * add comment * fix BERT forward parameters * add doc string to AdaptiveModel forward method * remove unnecessary data silo * fix farm import Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-23 14:54:02 +01:00
tstadel	fc8df2163d	Fix Windows CI OOM (#1878) * set fixture scope to "function" * run FARMReader without multiprocessing * dispose off ray after tests * run most expensive tasks first in test files * run expensive tests first * run garbage collector between tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-22 17:20:23 +01:00
tstadel	7bdb782871	Raise exception if Elasticsearch search_fields have wrong datatype (#1913)	2021-12-20 16:10:55 +01:00
Dmitry Goryunov	42a0fc3860	Include ray version compatible with M1 processor (#1906)	2021-12-20 10:16:59 +01:00
Johnny-KP	51e84b805b	Changed export to csv method to new answer format (#1907)	2021-12-17 16:10:29 +01:00
bogdankostic	74c80e0c71	Set mypy version to 0.910 (#1899)	2021-12-16 14:02:04 +01:00
javier ramírez	5c7f3c234e	Fix minor typo in readme (#1900) I just added a missing "r" to the word "contributions" at the "Overview and Usage" section	2021-12-16 13:31:27 +01:00
bogdankostic	4edec04c2c	Add improvements to AzureConverter (#1896) * Add some improvements to AzureConverter * Adapt docstring + use Path instead of str * Fix mypy version to 0.910	2021-12-16 12:45:24 +01:00
Alberto Villa	e4aec4661d	Improved version of print_answers (#1891) * Improved version of print_answers * Changed the way max_text_len is checked	2021-12-15 17:16:33 +01:00
Alberto Villa	1bb6244a63	Exchanged minimal with minimum in print_answers function call (#1890)	2021-12-14 15:27:37 +01:00
Alberto Villa	2396f0cd3a	Correct bug with encoding when generating Markdown documentation; linked with issue #1880 (#1881)	2021-12-14 10:50:25 +01:00
tstadel	57a04631df	introduce node_input param (#1854) * introduce node_input param * Add latest docstring and tutorial changes * prediction and label as node_input values * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-14 10:34:35 +01:00
Ivan Lopez	86f5688f47	fix wrong branch and repo, add cloudwatch agent (#1877)	2021-12-13 20:32:25 +01:00
Sara Zan	de71b944d7	Fix typo in the Windows CI UI deps (#1876) * Fix typo in the WindowsCI UI deps * Force a deps cache miss	2021-12-13 15:49:44 +01:00
Malte Pietsch	7084a24794	Bump version to 1.0 in REST api (#1875)	2021-12-13 12:39:59 +01:00
Julian Risch	2c184e467f	Upgrade transformers to 4.13.0 (#1659) * upgrade to pytorch 1.10 and transformers 4.11.3 * pin torch to 1.9.1 * Upgrade transformers and torch to 4.12.2 and 1.10.0 * Test transformers 4.10.2 * Pin transformers to 4.10.2 * transformers 4.10.3 * transformers 4.11.0 * transformers 4.11.1 * transformers 4.11.2 * check fix on current transformer's master branch * Install transformers from commit id * update transformers to 4.12.5 * Upgrade torch version for torch-scatter * Upgrade torch version for torch-scatter in Windows CI * Build new cache * Undo last commit * Use transformers v4.11.2 * bump transformers to 4.12.5 * bump transformers to 4.13.0 * re-allow range of torch versions Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: bogdankostic <bogdankostic@web.de>	2021-12-11 12:08:16 +01:00
Fabrice Depaulis	77d52ad215	Rely api healthcheck on status code rather than json decoding (#1871) * Rely api healthcheck on status code rather than json decoding * Install UI dependencies on the Linux and Windows CI Co-authored-by: Fabrice Depaulis <fabrice.depaulis@orange.com> Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2021-12-10 18:05:23 +01:00
Andreas Motl	4eb4503f25	Fix typo (#1869)	2021-12-10 09:39:45 +01:00
Branden Chan	ea5aab23ec	Update pydoc-markdown-file-classifier.yml (#1856) * Update pydoc-markdown-file-classifier.yml * Add latest docstring and tutorial changes * Prevent wrapping DataParallel in second DataParallel (#1855) * Prevent wrapping DataParallel in second DataParallel * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Create v1.0 docs (#1862) * Update pydoc-markdown-file-classifier.yml * Add latest docstring and tutorial changes * Rebase and apply change to v1.0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2021-12-08 18:19:03 +01:00
Branden Chan	ef1e531895	Create v1.0 docs (#1862)	2021-12-08 17:53:00 +01:00
bogdankostic	cbfe2b4626	Prevent wrapping DataParallel in second DataParallel (#1855) * Prevent wrapping DataParallel in second DataParallel * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-08 09:56:45 +01:00
Malte Pietsch	8cb513c2c6	Bump version to 1.0.0 (tag: v1.0.0)	2021-12-07 15:13:24 +01:00
Sara Zan	983b20f28d	Demo UI fix debug info (#1846) * Fix debug info * Make enter to run work better * Reintroduce default question in the eval dataset * Outputting valid json instead of a Python dict	2021-12-06 18:55:39 +01:00
KUNPENG GUO	160f81aaa3	Fix bug ranker: wrong lambda function (#1824) * Fix bug ranker: wrong lambda function The zip function used in line 110 intends to choose the logits array to be the key for the lambda function while it should be the first/second logit of the logit array which corresponds to the classification label (has_answer) * Use label 1 as has_answer label * generic ranker (add if-cond for logits vector shape) * remove test code * remove test code... * add two_logits test case for ranker module. * complete the documentation of ranker, support rankers with 1 or 2 logits as output	2021-12-06 17:13:57 +01:00
Sara Zan	8b7b51f0f5	Typo spotted in one question. Removed question that returned wrong answer. Added a couple more that work. (#1843)	2021-12-06 15:44:08 +01:00
Julian Risch	aa1520212f	workaround torch bug with non-continguous tensors (#1845)	2021-12-06 15:10:51 +01:00
Ivan Lopez	4f6dc36869	Deploy demo (#1837) * Add GH Actions workflow for demo deployment * update demo ec2 instance type * remove redundant docker-compose build * add custom demo command and env vars * deploy demo on updates to workflow resources	2021-12-03 15:58:47 +01:00
Branden Chan	bec14b63c3	Add live demo link to readme (#1839)	2021-12-03 14:34:19 +01:00
Malte Pietsch	90ced1b246	Update release.yml	2021-12-03 13:23:55 +01:00
Malte Pietsch	e5599bd337	Extend categories for release notes (#1841)	2021-12-03 13:19:45 +01:00
Malte Pietsch	4e76129004	Add config for github release notes (#1840)	2021-12-03 12:27:58 +01:00
Julian Risch	54f776350c	Update evaluation tutorial to cover the new `pipeline.eval()` (#1765) * Replace old tutorial 5 with new code based on test cases * Add latest docstring and tutorial changes * Use pipeline.eval() in tutorial * Add latest docstring and tutorial changes * Restructure notebook * Add latest docstring and tutorial changes * Add dataframe example * Add latest docstring and tutorial changes * Get eval data from doc store * Add latest docstring and tutorial changes * Load data from doc store * Add latest docstring and tutorial changes * Clear outputs * Add latest docstring and tutorial changes * Change example and add python script * Add latest docstring and tutorial changes * Fetch aggregated multilabels from doc store * Add latest docstring and tutorial changes * Incorporate review feedback on text comments * Add latest docstring and tutorial changes * Add Notebook output * Remove queries param from pipeline.eval() * Add latest docstring and tutorial changes * Add output with all metrics * Add printing of multiple metrics to script * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-03 11:19:41 +01:00
tstadel	9293a902d7	Fix OOM in test_eval.py Windows CI (#1830) * diable problematic eval tests for windows ci * move standard pipeline eval tests to separate test file * switch to elasticsearch documentstore to reduce inproc mem * Revert "switch to elasticsearch documentstore to reduce inproc mem" This reverts commit 7a75871909c3317a252dff3a4df17e99eff69d05. * get retiever from conftest * use smaller embedding model for summarizer * use smaller summarizer model * remove queries param from pipeline.eval() * isolate problematic tests * rename separate test file * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-02 19:23:58 +01:00
tstadel	180c05365a	Deprecate old pipeline eval nodes: EvalDocuments and EvalAnswers (#1778) * log deprecated warning on init * deprecation warning included into docstrings * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> (tag: v1.0.0rc1)	2021-12-02 18:09:26 +01:00
tstadel	dc4cd49049	remove queries param from pipeline.eval() (#1836) * remove queries param from pipeline.eval() * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-02 16:04:01 +01:00
Sara Zan	99365e1d8e	Add backlink below the context, if available in the doc's meta (#1834)	2021-12-02 13:37:23 +01:00
tstadel	bab05c7677	Fix loading and saving of EvaluationReszult (#1831) * fix spans in csvs * fix tests	2021-12-02 10:30:11 +01:00
Sara Zan	c21521dc9c	More demo bugfixes (#1832) * Trying to fix a bug occurring when dataset is None (happens with many parallel request for some reason) * Change favicon and title and fix bug with version number * Improve the text description and partially fix the enter-to-run function	2021-12-01 22:25:59 +01:00
Sara Zan	e39d015a59	Allow SQLDocumentStore to filter by many filters (#1776) * Aliasing the join is not sufficient yet * Update the filter query in some other functions of SQLDocumentStore - this functionality should be centralized * Adding tests for get_all_documents, now failing * Fix tests * Fix typo spotted by mypy	2021-12-01 16:16:17 +01:00
tstadel	c5540d05ed	Calculation of metrics and presentation of eval results (#1760) * retriever metrics added * Add latest docstring and tutorial changes * answer and document level matching metrics implemented * Add latest docstring and tutorial changes * answer related metrics for retriever * basic reader metrics implemented * handle no_answers * fix typing * fix tests * fix tests without sas * first draft for simulated top k * rename sas and f1 columns in dataframe * refactoring of EvaluationResult * Add latest docstring and tutorial changes * more eval tests added * fix sas expected value precision * distinction between ir and qa recall * EvaluationResult.worst_queries() implemented * print_evaluation_report() added * eval report for QA Pipeline improved * dynamic metrics for worst queries calc * Add latest docstring and tutorial changes * method names adjusted * simple test for print_eval_report() added * improved documentation * Add latest docstring and tutorial changes * minor formatting * Add latest docstring and tutorial changes * fix no_answer cases * adjust one docstring * Add latest docstring and tutorial changes * fix no_answer cases for sas * batchmode for sas implemented * fix for retriever metrics if there are only no_answers * fix multilabel tests * improve documentation for pipeline.eval() * streamline multilabel aggregates and docs * Add latest docstring and tutorial changes * fix multilabel tests * unify document_id * add dataframe schema description to EvaluationResult * Add latest docstring and tutorial changes * rename worst_queries to wrong_examples * Add latest docstring and tutorial changes * make query digesting standard pipelines work with pipeline.eval() * Add latest docstring and tutorial changes * tests for multi retriever pipelines added * remove unnecessary import * print_eval_report(): support all pipelines without junctions * Add latest docstring and tutorial changes * fix typos * Add latest docstring and tutorial changes * fix minor simulated_top_k bug and use memory documentstore throughout tests * sas model param description improved * Add latest docstring and tutorial changes * rename recall metrics * Add latest docstring and tutorial changes * fix mean average precision link * Add latest docstring and tutorial changes * adjust sas description docstring * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-11-30 19:26:34 +01:00
ju-gu	4cce7ffe85	bugfix metadata extraction in form recognizer & split of surrounding content length (#1829) * bugfix metadata extraxtion in the formrecognizer and seperation of surrounding in preceding and following content length * Fix docstring * fix metadata extraction for content_type text Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-11-30 19:10:21 +01:00
Sara Zan	935689e630	Demo UI add env vars & other small fixes (#1828) * Add more env vars to the streamlit ui * Add some more questions to the random ones * Relax a statuscode check and rename env vars * Make query error message more descriptive * Add log message * Align docker-compose with and without GPU * Typo in pipeline filename * Remove prefix from var in docker_compose * Align docker-compose.yml and add small sleep to the initialized poller to prevent spamming * Fix the name of the dockerfile used to build the GPU image	2021-11-30 18:11:54 +01:00
AhmedIdr	56e4e8486f	Added max_seq_length and batch_size params to embeddingretriever (#1817) * Added max_seq_length and batch_size params, added progress_bar to faiss writing_documents * Add latest docstring and tutorial changes * fixed typos * Update dense.py Changed default batch_size and max_seq_len in EmbeddingRetriever * Add latest docstring and tutorial changes * Update faiss.py Change import tqdm.auto to tqdm * Update faiss.py Changing tqdm back to tqdm.auto Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-11-29 19:49:51 +01:00
Sara Zan	fb511dc4a3	Remove feedback from no-answers (#1827) * Fix some miscopied code * Remove feedback from the no-answer, seems the backend can't take it * Try to raise concurrent requests per worker * Remove the actual number of workers	2021-11-29 19:42:10 +01:00
bogdankostic	eb5f7bb4c0	Add AzureConverter to support table parsing from documents (#1813) * Add FormRecognizerConverter * Change signature of convert method + change return type of all converters * Adapt preprocessing util to new return type of converters * Parametrize number of lines used for surrounding context of table * Change name from FormRecognizerConverter to AzureConverter * Set version of azure-ai-formrecognizer package * Change tutorial 8 based on new return type of converters * Add tests * Add latest docstring and tutorial changes * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-11-29 18:44:20 +01:00
Sara Zan	c29f960c47	Fix UI demo feedback (#1816) * Fix the feedback function of the demo with a workaround * Some docstring * Update tests and rename methods in feedback.py * Fix tests * Remove operation_ids * Add a couple of status code checks	2021-11-29 17:03:54 +01:00
MichelBartels	84147edcca	Model Distillation (#1758) * initial commit * Add latest docstring and tutorial changes * added comments and fixed bug * fixed bugs, added benchmark and added documentation * Add latest docstring and tutorial changes * fix type: ignore comment * fix logging in benchmark * fixed distillation config * Add latest docstring and tutorial changes * added type annotations * fixed distillation loss calculation * added type annotations * fixed distillation mse loss * improved model distillation benchmark config loading * added temperature for model distillation * removed uncessary imports, added comments, added named parameter calls * Add latest docstring and tutorial changes * added some more comments * added distillation test * fixed distillation test * removed unnecessary import * fix softmax dimension * add grid search * improved model distillation benchmark config * fixed model distillation hyperparameter search * added doc strings and type hints for model distillation * Add latest docstring and tutorial changes * fixed type hints * fixed type hints * fixed type hints * wrote out params instead of kwargs in DistillationDataSilo initializer * fixed type hints * fixed typo * fixed typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-11-26 18:49:30 +01:00
Sara Zan	1a4ee21b92	Adapt docker-compose-gpu.yml to use DPR by default (#1810) * Adapt docker-compose-gpu.yml to use DPR by default * Update the comments * Change the ES image * Increase the context window and allow no-answers in the DPR pipeline too * Re-enable file upload in GPU version * Add env var without value and a commet to explain it	2021-11-25 16:23:18 +01:00
Sara Zan	9ee0ea0c17	Add description to the demo (#1809) * Improve the Random Question functionality and add three example questions * Fix the example questions * Change default docs for the retriever * Add example short description and make the no-answer boxes blue * Modify some text and add a fix for the slider's bug * New no-answer message	2021-11-25 15:27:09 +01:00
Sara Zan	742d4b9db9	Improve the Random Question functionality (#1808) * Improve the Random Question functionality and add three example questions * Fix the example questions * Change default docs for the retriever	2021-11-24 15:55:44 +01:00


968 Commits