haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-10-31 17:59:27 +00:00

Author	SHA1	Message	Date
Ikram Ali	4ab1bc3c3e	Improve the progress bar in update_embeddings() + Fix filters in update_embeddings() (#1063 ) * [document_stores]Add the progressbar in update_embeddings() to track the overall documents progress closed #1037 * change 2nd level loop to docs. switch to tqdm.auto. * [document_stores] Elasticsearch new method get_document_without_embedding_count() added. * [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added. * [document_stores] Add new bool arg in get_document_count() method and fixed #1082 * [document_stores] typo fixed #1082 Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-05-21 14:18:07 +02:00
Lalit Pagaria	f46b09c756	Using text hash as id to prevent document duplication (#1000 ) * using text hash as id to prevent document duplication. Also providing a way customize it. * Add latest docstring and tutorial changes * Fixing duplicate value test when text is same * Adding test for duplicate ids in document store * Changing exception to generic Exception type * add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-05-17 17:51:52 +02:00
Malte Pietsch	25d1122773	Upgrade milvus to 1.1.0 (#1066 ) * upgrade milvus in CI to 1.1 * fix pymilvus * loose pymilvus requirement again * add date to cache keys * fix date var in action	2021-05-17 17:27:34 +02:00
Moshe Berchansky	880edd139d	Add `use_amp` to DPR's train method to enable mixed precision training. (#1048 )	2021-05-17 15:10:02 +02:00
Ikram Ali	a06e4450d1	Rename delete_all_documents() method to delete_documents() (#1047 )	2021-05-10 13:37:08 +02:00
Branden Chan	5d31e633ce	Squad tools (#1029 ) * Add first commit * Add support for conversion to and from pandas df * Add logging * Add functionality * Satisfy mypy * Incorporate reviewer feedback	2021-05-06 19:02:15 +02:00
Branden Chan	373fef8d1e	Add white space normalization warning (#1022 ) * Add white space normalization warning * Implement safer document id fetching	2021-05-05 17:54:32 +02:00
Branden Chan	aadd8b049a	Add Tutorial 11 to Readme	2021-05-05 15:35:21 +02:00
oryx1729	9bec8859f2	Test ES connection only for the default user (#1028 )	2021-05-04 15:03:19 +02:00
oryx1729	c41101ff74	Upgrade streamlit version (#1024 )	2021-05-03 17:44:57 +02:00
Julian Risch	bf4563e5d2	Filtering duplicate answers (#1021 ) * Allow filtering of duplicate answers as implemented in FARM * Changed default behavior to filtering exact duplicates * Change expected test result due to filtering of duplicate answers by default * Rounding expected test results for comparison with predictions	2021-05-03 17:18:10 +02:00
Bhadresh Savani	ca63f9fee2	Fix debug message for file-upload in UI (#1018 )	2021-05-03 09:18:55 +02:00
brandenchan	5b0b3e4616	Merge branch 'master' of https://github.com/deepset-ai/haystack	2021-04-30 16:41:05 +02:00
brandenchan	4cc853d1c3	Update link	2021-04-30 15:06:45 +02:00
Branden Chan	869b493b61	Regen api docs (#1015 )	2021-04-30 12:35:13 +02:00
oryx1729	99990e7249	Add export of Pipeline YAML config (#1003 )	2021-04-30 12:23:29 +02:00
Mario Jäckle	a00703256f	docs(document_store): add usage information for aws elastic search (#1008 ) Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>	2021-04-30 11:38:25 +02:00
Bhadresh Savani	37a72d2f45	Add File Upload Functionality in UI (#995 )	2021-04-30 10:46:30 +02:00
Branden Chan	056be3354b	Add pipelines tutorial (#1013 )	2021-04-29 18:19:20 +02:00
Branden Chan	9827b3652e	Pipelines tutorial (#991 ) * Start Pipelines tutorial * Make Tutorial 11 run locally * Add colab compatibility * Fix pip install * Add ES install from source * Add ES install from source * Add pygraphviz installation * Incorporate reviewer feedback * Ensure print_answers() works for Generator output * Fix typo	2021-04-29 17:31:28 +02:00
Julian Risch	65f1da00cc	knowledge graph documentation (#979 ) * Create knowledge_graph.md * add doc strings to Text2SparqlRetriever * Add doc strings to GraphDBKnowledgeGraph * Make method calls unambiguous so its clear which class is meant	2021-04-27 16:44:40 +02:00
oryx1729	8a57f6b16a	Update tests for FAISSDocumentStore (#999 )	2021-04-27 09:55:31 +02:00
Markus Paff	cf8a622e35	Streamlit UI Evaluation mode (#920 ) * first running version of eval mode * restructuring, new naming of elements and testing * add new files to Docker, how to start with Haystack reference, remove not needed dependencies * Add latest docstring and tutorial changes * merged changes * fixing bugs after breaking changes from last release * newser version of states in streamlit, more docs for eval mode, eval file as env virable * eval file as env variable Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-22 17:30:17 +02:00
Branden Chan	9626c0d65e	Update Documentation (#976 ) * Add api pages * Add latest docstring and tutorial changes * First sweep of usage docs * Add link to conversion script * Add import statements * Add summarization page * Add web crawler documentation * Add confidence scores usage * Add crawler api docs * Regenerate api docs * Update summarizer and translator api * Add api pages * Add latest docstring and tutorial changes * First sweep of usage docs * Add link to conversion script * Add import statements * Add summarization page * Add web crawler documentation * Add confidence scores usage * Add crawler api docs * Regenerate api docs * Update summarizer and translator api * Add indentation (pydoc-markdown 3.10.1) * Comment out metadata * Remove Finder deprecation message * Remove Finder in FAQ * Update tutorial link * Incorporate reviewer feedback * Regen api docs * Add type annotations Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-22 16:45:29 +02:00
Malte Pietsch	b1e8ebf81a	Create pull_request_template.md	2021-04-22 15:48:39 +02:00
Andrey A	58ea0a62e0	Add links to GitHub Discussion and SO (#984 ) * Add link to Stack Overflow * Add link to GitHub discussions and re-arrange links	2021-04-22 09:51:21 +02:00
Timo Moeller	2e39361f8a	Add maxsamples and convert data dir to path (#989 )	2021-04-22 09:35:11 +02:00
oryx1729	7269530e45	Add validation for root node in Pipeline (#987 )	2021-04-21 12:18:33 +02:00
oryx1729	8c1e411380	Fix update_embeddings() for FAISSDocumentStore (#978 )	2021-04-21 09:56:35 +02:00
Guillim	0051a34ff9	Add root_path option to REST API for reverse proxy deployments (#982 )	2021-04-20 11:19:28 +02:00
oryx1729	4dd5a7a744	Make FAISS import optional (#971 )	2021-04-15 12:26:34 +02:00
oryx1729	237172f459	Make FAISS import conditional (#970 )	2021-04-14 17:34:01 +02:00
Mario Jäckle	84f90e82c5	feature(aws): add aws iam auth method (#965 ) Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>	2021-04-14 16:34:24 +02:00
oryx1729	5bb66940a9	Fix equality check in preprocessor (#969 )	2021-04-14 16:03:48 +02:00
Markus Paff	0633dae4d0	new docs version (#964 )	2021-04-14 13:40:05 +02:00
oryx1729	bba1d80aef	Update Haystack version v0.8.0	2021-04-13 16:31:19 +02:00
Branden Chan	77d4c2ca1c	Benchmark milvus (#850 ) * Add milvus benchmarking support * Add latest docstring and tutorial changes * Edit config * Disable docker interactive mode * Add milvus index type support * Adjust FAISS and Milvus node branching * Remove duplicate in config * Revert method for speedup * Add latest docstring and tutorial changes * Add latest benchmark run * Add latest docstring and tutorial changes * Add json files * Revert "Add latest docstring and tutorial changes" This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923. * Add latest docstring and tutorial changes * Revert "Add latest docstring and tutorial changes" This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b. * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-13 14:54:15 +02:00
Markus Paff	b87daed62b	fixed link to dpr (#962 )	2021-04-13 09:45:04 +02:00
Julian Risch	8333a13d6f	Adding tutorial on knowledge graphs to README	2021-04-12 15:26:02 +02:00
Markus Paff	dfb0282b74	Update milvus links and docstrings (#959 ) * update milvus links and docstrings * Add latest docstring and tutorial changes * new milvus version * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-12 14:38:57 +02:00
oryx1729	406f7fa679	Disable Gunicorn preload option (#960 )	2021-04-12 12:46:52 +02:00
Timo Moeller	837dea4e6d	Integrate sentence transformers into benchmarks (#843 ) * Integrate sentence transformers into benchmarks * Add doc store asserts * switch data downloads from s3 client to https. add license info * Fix mypy, revert config Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-09 17:24:16 +02:00
Julian Risch	d38c07e0ee	knowledge graph example (#934 ) * Add knowledge graph module * Fix type hint * Add graph retriver module * Change type annotations, change return format * Add graph retriever that executes questions as sparql queries * Linking only those entities that are in the knowledge graph * Added logging and using relations extracted from Knowledge graph for linking * Preventing entity linking from linking the same token to multiple entities * Pruning triples that have no variables for select and count queries * Support knowledge graphs with Pipelines * Add text2sparql * Entity linking and relation linking consider more special cases now based on evaluation on labelled data * Separating example code from KGQA implementation * Add eval on combined extarctive and kg questions * Remove references to hp-test * Add fields sparql_query and long_answer_list to metadata * Removing modular Question2SPARQL approach * Removing additional classes used for modular kgqa approach * preparing lcquad data * change graph db * Translating namespaces in knowledge graph queries * Creating graphdb index and loading triples from .ttl file * Fetching graph config files, triples and model from S3 * Fix incompatibility issues with BaseGraphRetriever and BaseComponent * Removing unused utility functions * Adding doc strings and tutorial header * Adding sparqlwrapper dependency * Moving tutorial header * Sorting tutorials by number within name of notebook * Add latest docstring and tutorial changes * Creating test cases for knowledge graph * Changing knowledge graph example to harry potter * Add latest docstring and tutorial changes * Adapting the tutorial notebook to harry potter example * Add GraphDB fixture for tests * Add latest docstring and tutorial changes * Added GraphDB docker launch to CI * Use correct GraphDB fixture * Check if GraphDB instance is already running * Renaming question/query and incorporating other feedback from Timo and Tanay * Removed type annotation * Add latest docstring and tutorial changes Co-authored-by: oryx1729 <oryx1729@protonmail.com> Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-08 14:05:33 +02:00
oryx1729	fc6368c191	Fix passing a list of values as param (#952 )	2021-04-07 19:50:50 +02:00
oryx1729	8c68699e1c	Refactor REST APIs to use Pipelines (#922 )	2021-04-07 17:53:32 +02:00
Julian Risch	64ad953c6a	Adding indentation to markup files (#947 )	2021-04-07 11:36:11 +02:00
lewtun	8894c4fae9	Reduce precision in pipeline eval print functions (#943 ) A proposal to reduce the precision shown in the `EvalRetriever.print` and `EvalReader.print` to 4 significant figures. If the user wants the full precision, they can access the class attributes directly. Before ``` Retriever ----------------- has_answer recall: 0.8739495798319328 (208/238) no_answer recall: 1.00 (120/120) (no_answer samples are always treated as correctly retrieved) recall: 0.9162011173184358 (328 / 358) ``` After ``` Retriever ----------------- has_answer recall: 0.8739 (208/238) no_answer recall: 1.00 (120/120) (no_answer samples are always treated as correctly retrieved) recall: 0.9162 (328 / 358) ```	2021-04-06 05:11:29 +02:00
lewtun	41a1c8329d	Fix division by zero error in EvalRetriever (#938 ) If the first query in the evaluation returns a document with `no_answer=True` we got a division by zero error because neither `self.has_answer_correct` or `self.has_answer_count` get incremented. This fix moves the `self.has_answer_recall` calculation within the if-else block.	2021-04-03 18:13:36 +02:00
Timo Moeller	5d2b16f3cc	Update farm version (#936 ) * Update farm version * Add new DPR loading, fix dpr param name * Add QA model confidence as answer probability, fix prams in test	2021-04-01 18:23:05 +02:00
Branden Chan	d77152c469	WIP: Add evaluation nodes for Pipelines (#904 ) * Add main eval fns * WIP: make pipeline_eval.py run * Fix typo * Add support for no_answers * Add latest docstring and tutorial changes * Working pipeline eval * Add timing of nodes * Add latest docstring and tutorial changes * Refactor and clean * Update tutorial script * Set default params * Update tutorials * Fix indent * Add latest docstring and tutorial changes * Address mypy issues * Add test * Fix mypy error * Clear outputs * Add doc strings * Incorporate reviewer feedback * Add latest docstring and tutorial changes * Revert query counting * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-01 17:35:18 +02:00

976 Commits