haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-26 15:06:58 +00:00

Author	SHA1	Message	Date
tstadel	c861fdb2ce	Enable batch mode for SAS cross encoders (#1987 ) * enable batch mode for sas cross encoders * fix mypy * comment on top_1 values added	2022-01-11 17:54:43 +01:00
Sara Zan	9c3d9b4885	Add models to demo docker image (#1978 ) * Add utility to cache models and nltk data & modify Dockerfiles to use it * Fix punkt data not being cached	2022-01-11 16:37:45 +01:00
tstadel	192e03be33	Fix elasticsearch scores if they are 0.0 (#1980 ) * fix elasticsearch zero scores * remove unnecessary None check	2022-01-11 09:35:02 +01:00
Mathew Kuriakose	a44b6c18c0	Unify vector_dim and embedding_dim parameter in Document Store (#1922 ) * Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores * Unit test cases updated to use `embedding_dim` instead of `vector_dim` * Unit test case update to use embedding_dim instead of vector_dim * Add latest docstring and tutorial changes * Put usage of `vector_dim` param in same if-block as corresponding warning Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-01-10 18:10:32 +01:00
Benjamin Bossan	00dc30ae54	Use scikit-learn, not sklearn, in requirements.txt (#1974 )	2022-01-10 09:56:34 +01:00
ju-gu	b7041941df	change column order for evaluation dataframe (#1957 ) * change column order for evaluation dataframe * added missing eval column node_input * generic order for both document and answer returning nodes; ensure no columns get lost Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-07 14:13:28 +01:00
oryx1729	5b3f693562	Fix Dockerfile-GPU (#1969 )	2022-01-06 11:13:04 +01:00
mathislucka	db76a5c5c6	fix UserWarning from slow tensor conversion (#1948 ) * fix UserWarning from slow tensor conversion * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-05 22:42:54 +01:00
Julian Risch	30ea1d475d	check multiprocessing sharing strategy is available (#1965 ) * check multiprocessing sharing strategy is available * Change default of multiprocessing strategy to None * Change default sharing strategy to None in retriever * Add latest docstring and tutorial changes * Make logging message easier to understand Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-05 18:22:09 +01:00
oryx1729	e075663feb	Upgrade torch version (#1960 )	2022-01-05 18:14:14 +01:00
Yorick van Zweeden	65cd39b533	Fix vector_id collision in FAISS (#1961 ) * Fix FAISS vector_id count * Fix mypy errors Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>	2022-01-05 18:10:47 +01:00
MichelBartels	0b0b9689a4	Add TinyBERT data augmentation (#1923 ) * add tinybert data augmentation * don't reload glove in tinybert data augmentation * fix unnecessary load_glove call * fix type hints * add comments and type hints * add batch_size argument * don't predict subwords as alternative for words * fix subword predictions * limit sequence length * actually limit sequence length * improve performance by calculating nearest glove vector on gpu * add model and tokenizer parameter * fix type hints * improve data augmentation performance * explained limits of script * corrected comment * added data augmentation test * don't label every question in augmented dataset as impossible * add sample glove * better handling of downloading of glove * fix typo of last commit	2022-01-04 18:34:16 +01:00
oryx1729	854af92dc5	Update docker_build.yml	2022-01-04 17:46:34 +01:00
oryx1729	2910f67718	Use long Commit ID for Docker tags (#1946 )	2022-01-04 17:39:49 +01:00
Yorick van Zweeden	180befd07a	Propagate duplicate_documents to base class initialization (#1936 ) * Add duplicate_documents to base class initialization * Remove redundant assignment in subclasses Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>	2022-01-04 15:04:15 +01:00
oryx1729	00c823cdff	Add GitHub Action for Docker Build for GPU (#1916 )	2022-01-04 14:33:13 +01:00
Alon Eirew	7a4fa42fda	Fix #1927 - RuntimeError when loading data using data_silo due to many open file descriptors from multiprocessing (#1928 ) * fix #1687 * fix RuntimeError: received 0 items of ancdata * Add an arg multiprocessing_strategy to DataSilo and DPR.train() * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-04 13:29:26 +01:00
bogdankostic	381fc302cb	Fix loading a saved `FAISSDocumentStore` (#1937 ) * Remove faiss_index param from config * Add Tests * Add assertions to tests	2022-01-04 12:22:31 +01:00
bogdankostic	3e0ef1cc8a	Fix Numba TypingError in `normalize_embedding` for cosine similarity (#1933 ) * Fix Numba TypingError * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 17:14:51 +01:00
bogdankostic	202ef276ee	Make sure content_type exists (#1938 )	2022-01-03 17:00:31 +01:00
bogdankostic	c85ac2baec	Update Ray to version 1.9.1 (#1934 )	2022-01-03 16:59:58 +01:00
bogdankostic	45df18c416	Add RCIReader for TableQA (#1909 ) * Add RCIReader * Add latest docstring and tutorial changes * Add Doc Strings * Add latest docstring and tutorial changes * Add Tests * Add Doc Strings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 16:59:24 +01:00
Kristof Herrmann	6e8e3c68d9	Custom id hashing on documentstore level (#1910 ) * adding dynamic id hashing * Add latest docstring and tutorial changes * added pr review * Add latest docstring and tutorial changes * fixed tests * fix mypy error * fix mypy issue * ignore typing * fixed correct check * fixed tests * try fixing the tests * set id hash keys only if not none * dont store id_hash_keys * fix tests * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 16:58:19 +01:00
Julian Risch	a846be99d1	Extend TranslationWrapper to work with QA Generation (#1905 ) * draft translationwrapper example * draft translation of generated qa pairs * Add latest docstring and tutorial changes * fixed pass by reference by deepcopy * delete adapted tutorial 13 (test purposes only) * adapt method signature and doc string * Add latest docstring and tutorial changes * add type ignore * extend tutorial 13 with TranslationWrapper example * Add latest docstring and tutorial changes * removed duplicate code * indent if statement Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>	2022-01-03 13:30:24 +01:00
tstadel	a94c274134	Support custom headers per request in pipeline (#1861 ) * chain headers param down to document_stores * Add latest docstring and tutorial changes * fix InMemoryDocumentStore params * Add latest docstring and tutorial changes * fix TfidfRetriever params * Add latest docstring and tutorial changes * fix missing headers * Add latest docstring and tutorial changes * fix sparql client and update docs * Add latest docstring and tutorial changes * test for documentstores * pipeline tests added * update header param in docstrings * Add latest docstring and tutorial changes * refactoring: headers as implicit param * Add latest docstring and tutorial changes * remove unnecessary imports * propagate batch_size correctly * Add latest docstring and tutorial changes * revert InMemoryDocumentStore.write_documents signature * Add latest docstring and tutorial changes * remove #type: ignore * Add latest docstring and tutorial changes * replace MutableMapping by Dict * Add latest docstring and tutorial changes * improve docstrings * Add latest docstring and tutorial changes * get rid of *kwargs Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 11:38:02 +01:00
el2e10	377c20b8b1	Fix grammatical issue in optimization guides (#1941 )	2022-01-03 11:06:13 +01:00
Alon Eirew	a1fb70bbbd	Make ctx_segment_ids a list instead of np.zeros_like * fix #1687 * fix - UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow.. * fix RuntimeError: received 0 items of ancdata * Remove set_sharing_strategy from this branch and replace numpy.zeros_like with python numpy	2022-01-03 08:33:55 +01:00
bogdankostic	39573cf0a9	Add ParsrConverter (#1931 ) * Add ParsrConverter * Fix typing error + add Parsr to Linux CI * Fix valid_language for all converters + fix context generation for ParsrConverter * Remove ParsrConverter test from WindowsCI * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-30 10:15:11 +01:00
Markus Paff	04f3b39ad5	Text for contributor license agreement (#1766 ) * text for contributor license agreement * formatting * Add details about process * test	2021-12-28 14:01:20 +01:00
MichelBartels	f33c2b987a	Adding distillation loss functions from TinyBERT (#1879 ) * initial tinybertdistill commit * add tinybert distill loss * remove teacher caching for tinybert * add tinybert to distil_from method * Add latest docstring and tutorial changes * add dim mapping and fix type hints * fix type hints * fix dummy input * fix dim mapping for tinybert loss and add comments/doc strings * add test for tinybert loss * Add latest docstring and tutorial changes * add comment * fix BERT forward parameters * add doc string to AdaptiveModel forward method * remove unnecessary data silo * fix farm import Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-23 14:54:02 +01:00
tstadel	fc8df2163d	Fix Windows CI OOM (#1878 ) * set fixture scope to "function" * run FARMReader without multiprocessing * dispose of ray after tests * run most expensive tasks first in test files * run expensive tests first * run garbage collector between tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-22 17:20:23 +01:00
tstadel	7bdb782871	Raise exception if Elasticsearch search_fields have wrong datatype (#1913 )	2021-12-20 16:10:55 +01:00
Dmitry Goryunov	42a0fc3860	Include ray version compatible with M1 processor (#1906 )	2021-12-20 10:16:59 +01:00
Johnny-KP	51e84b805b	Changed export to csv method to new answer format (#1907 )	2021-12-17 16:10:29 +01:00
bogdankostic	74c80e0c71	Set mypy version to 0.910 (#1899 )	2021-12-16 14:02:04 +01:00
javier ramírez	5c7f3c234e	Fix minor typo in readme (#1900 ) I just added a missing "r" to the word "contributions" at the "Overview and Usage" section	2021-12-16 13:31:27 +01:00
bogdankostic	4edec04c2c	Add improvements to AzureConverter (#1896 ) * Add some improvements to AzureConverter * Adapt docstring + use Path instead of str * Fix mypy version to 0.910	2021-12-16 12:45:24 +01:00
Alberto Villa	e4aec4661d	Improved version of print_answers (#1891 ) * Improved version of print_answers * Changed the way max_text_len is checked	2021-12-15 17:16:33 +01:00
Alberto Villa	1bb6244a63	Exchanged minimal with minimum in print_answers function call (#1890 )	2021-12-14 15:27:37 +01:00
Alberto Villa	2396f0cd3a	Correct bug with encoding when generating Markdown documentation; linked with issue #1880 (#1881 )	2021-12-14 10:50:25 +01:00
tstadel	57a04631df	introduce node_input param (#1854 ) * introduce node_input param * Add latest docstring and tutorial changes * prediction and label as node_input values * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-14 10:34:35 +01:00
Ivan Lopez	86f5688f47	fix wrong branch and repo, add cloudwatch agent (#1877 )	2021-12-13 20:32:25 +01:00
Sara Zan	de71b944d7	Fix typo in the Windows CI UI deps (#1876 ) * Fix typo in the WindowsCI UI deps * Force a deps cache miss	2021-12-13 15:49:44 +01:00
Malte Pietsch	7084a24794	Bump version to 1.0 in REST api (#1875 )	2021-12-13 12:39:59 +01:00
Julian Risch	2c184e467f	Upgrade transformers to 4.13.0 (#1659 ) * upgrade to pytorch 1.10 and transformers 4.11.3 * pin torch to 1.9.1 * Upgrade transformers and torch to 4.12.2 and 1.10.0 * Test transformers 4.10.2 * Pin transformers to 4.10.2 * transformers 4.10.3 * transformers 4.11.0 * transformers 4.11.1 * transformers 4.11.2 * check fix on current transformer's master branch * Install transformers from commit id * update transformers to 4.12.5 * Upgrade torch version for torch-scatter * Upgrade torch version for torch-scatter in Windows CI * Build new cache * Undo last commit * Use transformers v4.11.2 * bump transformers to 4.12.5 * bump transformers to 4.13.0 * re-allow range of torch versions Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: bogdankostic <bogdankostic@web.de>	2021-12-11 12:08:16 +01:00
Fabrice Depaulis	77d52ad215	Rely api healthcheck on status code rather than json decoding (#1871 ) * Rely api healthcheck on status code rather than json decoding * Install UI dependencies on the Linux and Windows CI Co-authored-by: Fabrice Depaulis <fabrice.depaulis@orange.com> Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2021-12-10 18:05:23 +01:00
Andreas Motl	4eb4503f25	Fix typo (#1869 )	2021-12-10 09:39:45 +01:00
Branden Chan	ea5aab23ec	Update pydoc-markdown-file-classifier.yml (#1856 ) * Update pydoc-markdown-file-classifier.yml * Add latest docstring and tutorial changes * Prevent wrapping DataParallel in second DataParallel (#1855) * Prevent wrapping DataParallel in second DataParallel * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Create v1.0 docs (#1862) * Update pydoc-markdown-file-classifier.yml * Add latest docstring and tutorial changes * Rebase and apply change to v1.0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2021-12-08 18:19:03 +01:00
Branden Chan	ef1e531895	Create v1.0 docs (#1862 )	2021-12-08 17:53:00 +01:00
bogdankostic	cbfe2b4626	Prevent wrapping DataParallel in second DataParallel (#1855 ) * Prevent wrapping DataParallel in second DataParallel * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-08 09:56:45 +01:00

3597 Commits