haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-19 06:52:56 +00:00

Author	SHA1	Message	Date
Julian Risch	5695d721aa	update link to annotation tool docu (#2005 ) * update link to annotation tool docu * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 16:10:59 +01:00
Julian Risch	a3147cae47	Add isolated node eval mode in pipeline eval (#1962 ) * run predictions on ground-truth docs in reader * build dataframe for closed/open domain eval * fix looping through multilabel * fix looping through multilabel's list of labels * simplify collecting relevant docs * switch closed-domain eval off by default * Add latest docstring and tutorial changes * handle edge case params not given * renaming & generate pipeline eval report * add test case for closed-domain eval metrics * Add latest docstring and tutorial changes * test report of closed-domain eval * report closed-domain metrics only for answer metrics not doc metrics * refactoring * fix mypy & remove comment * add second for-loop & use answer as method input * renaming & add separate loop building docs eval df * Add latest docstring and tutorial changes * change column order for evaluation dataframe (#1957) * change column order for evaluation dataframe * added missing eval column node_input * generic order for both document and answer returning nodes; ensure no columns get lost Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> * fix column reordering after renaming of node_input * simplify tests & add docu * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ju-gu <87523290+ju-gu@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-01-14 14:37:16 +01:00
Sara Zan	e28bf618d7	Implement proper FK in `MetaDocumentORM` and `MetaLabelORM` to work on PostgreSQL (#1990 ) * Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints * update_document_meta() was not using index properly * Exclude ES and Memory from the cosine_sanity_check test * move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 13:48:58 +01:00
MichelBartels	3e4dbbb32c	Align similarity scores across document stores (#1967 ) * align document store similarity functions * remove unnecessary imports * undone accidental change * stopped weaviate from pretending to support dot product similarity * stopped weaviate from pretending to support dot product similarity * Add latest docstring and tutorial changes * fix fixture params for document stores * use cosine similarity for most tests * fix cosine similarity test * fix faiss test * fix weaviate test * fix accidental deletion * fix document_store fixture * test fix; shouldn't be merged * fix test_normalize_embeddings_diff_shapes * probably a better fix * fix for parameter combinations * revert new pytest_generate_tests functionality * simplify pytest_generate_tests * normalize embeddings for test_dpr_embedding * add to faiss doc that embeddings are normalized * Add latest docstring and tutorial changes * remove unnecessary parameters and add comments * simplify two lines of memory.py into one * test similarity scores with smaller language model * fix test_similarity_score Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-12 19:28:20 +01:00
Manos Papathanasiou	965b9614db	Upgrade pillow version to 9.0.0 (#1992 )	2022-01-12 09:59:51 +01:00
Dmitry Goryunov	79fdda8a7c	Remove hard-coded variables from the Tutorial 15 (#1984 ) * Remove hard-coded variables from the Tutorial 15 * Fix missing comma * Add latest docstring and tutorial changes * Fix formatting in Tutorial15_TableQA.ipynb * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-11 17:55:20 +01:00
tstadel	c861fdb2ce	Enable batch mode for SAS cross encoders (#1987 ) * enable batch mode for sas cross encoders * fix mypy * comment on top_1 values added	2022-01-11 17:54:43 +01:00
Sara Zan	9c3d9b4885	Add models to demo docker image (#1978 ) * Add utility to cache models and nltk data & modify Dockerfiles to use it * Fix punkt data not being cached	2022-01-11 16:37:45 +01:00
tstadel	192e03be33	Fix elasticsearch scores if they are 0.0 (#1980 ) * fix elasticsearch zero scores * remove unnecessary None check	2022-01-11 09:35:02 +01:00
Mathew Kuriakose	a44b6c18c0	Unify vector_dim and embedding_dim parameter in Document Store (#1922 ) * Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores * Unit test cases updated to use `embedding_dim` instead of `vector_dim` * Unit test case update to use embedding_dim instead of vector_dim * Add latest docstring and tutorial changes * Put usage of `vector_dim` param in same if-block as corresponding warning Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-01-10 18:10:32 +01:00
Benjamin Bossan	00dc30ae54	Use scikit-learn, not sklearn, in requirements.txt (#1974 )	2022-01-10 09:56:34 +01:00
ju-gu	b7041941df	change column order for evaluation dataframe (#1957 ) * change column order for evaluation dataframe * added missing eval column node_input * generic order for both document and answer returning nodes; ensure no columns get lost Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-07 14:13:28 +01:00
oryx1729	5b3f693562	Fix Dockerfile-GPU (#1969 )	2022-01-06 11:13:04 +01:00
mathislucka	db76a5c5c6	fix UserWarning from slow tensor conversion (#1948 ) * fix UserWarning from slow tensor conversion * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-05 22:42:54 +01:00
Julian Risch	30ea1d475d	check multiprocessing sharing strategy is available (#1965 ) * check multiprocessing sharing strategy is available * Change default of multiprocessing strategy to None * Change default sharing strategy to None in retriever * Add latest docstring and tutorial changes * Make logging message easier to understand Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-05 18:22:09 +01:00
oryx1729	e075663feb	Upgrade torch version (#1960 )	2022-01-05 18:14:14 +01:00
Yorick van Zweeden	65cd39b533	Fix vector_id collision in FAISS (#1961 ) * Fix FAISS vector_id count * Fix mypy errors Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>	2022-01-05 18:10:47 +01:00
MichelBartels	0b0b9689a4	Add TinyBERT data augmentation (#1923 ) * add tinybert data augmentation * don't reload glove in tinybert data augmentation * fix unnecessary load_glove call * fix type hints * add comments and type hints * add batch_size argument * don't predict subwords as alternative for words * fix subword predictions * limit sequence length * actually limit sequence length * improve performance by calculating nearest glove vector on gpu * add model and tokenizer parameter * fix type hints * improve data augmentation performance * explained limits of script * corrected comment * added data augmentation test * don't label every question in augmented dataset as impossible * add sample glove * better handling of downloading of glove * fix typo of last commit	2022-01-04 18:34:16 +01:00
oryx1729	854af92dc5	Update docker_build.yml	2022-01-04 17:46:34 +01:00
oryx1729	2910f67718	Use long Commit ID for Docker tags (#1946 )	2022-01-04 17:39:49 +01:00
Yorick van Zweeden	180befd07a	Propagate duplicate_documents to base class initialization (#1936 ) * Add duplicate_documents to base class initialization * Remove redundant assignment in subclasses Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>	2022-01-04 15:04:15 +01:00
oryx1729	00c823cdff	Add GitHub Action for Docker Build for GPU (#1916 )	2022-01-04 14:33:13 +01:00
Alon Eirew	7a4fa42fda	Fix #1927 - RuntimeError when loading data using data_silo due to many open file descriptors from multiprocessing (#1928 ) * fix #1687 * fix RuntimeError: received 0 items of ancdata * Add an arg multiprocessing_strategy to DataSilo and DPR.train() * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-04 13:29:26 +01:00
bogdankostic	381fc302cb	Fix loading a saved `FAISSDocumentStore` (#1937 ) * Remove faiss_index param from config * Add Tests * Add assertions to tests	2022-01-04 12:22:31 +01:00
bogdankostic	3e0ef1cc8a	Fix Numba TypingError in `normalize_embedding` for cosine similarity (#1933 ) * Fix Numba TypingError * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 17:14:51 +01:00
bogdankostic	202ef276ee	Make sure content_type exists (#1938 )	2022-01-03 17:00:31 +01:00
bogdankostic	c85ac2baec	Update Ray to version 1.9.1 (#1934 )	2022-01-03 16:59:58 +01:00
bogdankostic	45df18c416	Add RCIReader for TableQA (#1909 ) * Add RCIReader * Add latest docstring and tutorial changes * Add Doc Strings * Add latest docstring and tutorial changes * Add Tests * Add Doc Strings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 16:59:24 +01:00
Kristof Herrmann	6e8e3c68d9	Custom id hashing on documentstore level (#1910 ) * adding dynamic id hashing * Add latest docstring and tutorial changes * added pr review * Add latest docstring and tutorial changes * fixed tests * fix mypy error * fix mypy issue * ignore typing * fixed correct check * fixed tests * try fixing the tests * set id hash keys only if not none * dont store id_hash_keys * fix tests * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 16:58:19 +01:00
Julian Risch	a846be99d1	Extend TranslationWrapper to work with QA Generation (#1905 ) * draft translationwrapper example * draft translation of generated qa pairs * Add latest docstring and tutorial changes * fixed pass by reference by deepcopy * delete adapted tutorial 13 (test purposes only) * adapt method signature and doc string * Add latest docstring and tutorial changes * add type ignore * extend tutorial 13 with TranslationWrapper example * Add latest docstring and tutorial changes * removed duplicate code * indent if statement Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>	2022-01-03 13:30:24 +01:00
tstadel	a94c274134	Support custom headers per request in pipeline (#1861 ) * chain headers param down to document_stores * Add latest docstring and tutorial changes * fix InMemoryDocumentStore params * Add latest docstring and tutorial changes * fix TfidfRetriever params * Add latest docstring and tutorial changes * fix missing headers * Add latest docstring and tutorial changes * fix sparql client and update docs * Add latest docstring and tutorial changes * test for documentstores * pipeline tests added * update header param in docstrings * Add latest docstring and tutorial changes * refactoring: headers as implicit param * Add latest docstring and tutorial changes * remove unnecessary imports * propagate batch_size correctly * Add latest docstring and tutorial changes * revert InMemoryDocumentStore.write_documents signature * Add latest docstring and tutorial changes * remove #type: ignore * Add latest docstring and tutorial changes * replace MutableMapping by Dict * Add latest docstring and tutorial changes * improve docstrings * Add latest docstring and tutorial changes * get rid of *kwargs Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-03 11:38:02 +01:00
el2e10	377c20b8b1	Fix grammatical issue in optimization guides (#1941 )	2022-01-03 11:06:13 +01:00
Alon Eirew	a1fb70bbbd	Make ctx_segment_ids a list instead of np.zeros_like * fix #1687 * fix - UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow.. * fix RuntimeError: received 0 items of ancdata * Remove set_sharing_strategy from this branch and replace numpy.zeros_like with python numpy	2022-01-03 08:33:55 +01:00
bogdankostic	39573cf0a9	Add ParsrConverter (#1931 ) * Add ParsrConverter * Fix typing error + add Parsr to Linux CI * Fix valid_language for all converters + fix context generation for ParsrConverter * Remove ParsrConverter test from WindowsCI * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-30 10:15:11 +01:00
Markus Paff	04f3b39ad5	Text for contributor license agreement (#1766 ) * text for contributor license agreement * formatting * Add details about process * test	2021-12-28 14:01:20 +01:00
MichelBartels	f33c2b987a	Adding distillation loss functions from TinyBERT (#1879 ) * initial tinybertdistill commit * add tinybert distill loss * remove teacher caching for tinybert * add tinybert to distil_from method * Add latest docstring and tutorial changes * add dim mapping and fix type hints * fix type hints * fix dummy input * fix dim mapping for tinybert loss and add comments/doc strings * add test for tinybert loss * Add latest docstring and tutorial changes * add comment * fix BERT forward parameters * add doc string to AdaptiveModel forward method * remove unnecessary data silo * fix farm import Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-23 14:54:02 +01:00
tstadel	fc8df2163d	Fix Windows CI OOM (#1878 ) * set fixture scope to "function" * run FARMReader without multiprocessing * dispose of ray after tests * run most expensive tasks first in test files * run expensive tests first * run garbage collector between tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-22 17:20:23 +01:00
tstadel	7bdb782871	Raise exception if Elasticsearch search_fields have wrong datatype (#1913 )	2021-12-20 16:10:55 +01:00
Dmitry Goryunov	42a0fc3860	Include ray version compatible with M1 processor (#1906 )	2021-12-20 10:16:59 +01:00
Johnny-KP	51e84b805b	Changed export to csv method to new answer format (#1907 )	2021-12-17 16:10:29 +01:00
bogdankostic	74c80e0c71	Set mypy version to 0.910 (#1899 )	2021-12-16 14:02:04 +01:00
javier ramírez	5c7f3c234e	Fix minor typo in readme (#1900 ) I just added a missing "r" to the word "contributions" at the "Overview and Usage" section	2021-12-16 13:31:27 +01:00
bogdankostic	4edec04c2c	Add improvements to AzureConverter (#1896 ) * Add some improvements to AzureConverter * Adapt docstring + use Path instead of str * Fix mypy version to 0.910	2021-12-16 12:45:24 +01:00
Alberto Villa	e4aec4661d	Improved version of print_answers (#1891 ) * Improved version of print_answers * Changed the way max_text_len is checked	2021-12-15 17:16:33 +01:00
Alberto Villa	1bb6244a63	Exchanged minimal with minimum in print_answers function call (#1890 )	2021-12-14 15:27:37 +01:00
Alberto Villa	2396f0cd3a	Correct bug with encoding when generating Markdown documentation; linked with issue #1880 (#1881 )	2021-12-14 10:50:25 +01:00
tstadel	57a04631df	introduce node_input param (#1854 ) * introduce node_input param * Add latest docstring and tutorial changes * prediction and label as node_input values * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-12-14 10:34:35 +01:00
Ivan Lopez	86f5688f47	fix wrong branch and repo, add cloudwatch agent (#1877 )	2021-12-13 20:32:25 +01:00
Sara Zan	de71b944d7	Fix typo in the Windows CI UI deps (#1876 ) * Fix typo in the WindowsCI UI deps * Force a deps cache miss	2021-12-13 15:49:44 +01:00
Malte Pietsch	7084a24794	Bump version to 1.0 in REST api (#1875 )	2021-12-13 12:39:59 +01:00


3803 Commits