haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-10 23:04:02 +00:00

Author	SHA1	Message	Date
Branden Chan	41b537affe	Add FAQ page (#1151 ) * Add faq page * Update faq.md * Fix mypy CI * Add question	2021-06-10 17:29:14 +02:00
venuraja79	49886f88f0	Integrate Weaviate as another DocumentStore (#1064 ) * Annotation Tool: data is not persisted when using local version #853 * First version of weaviate * First version of weaviate * First version of weaviate * Updated comments * Updated comments * ran query, get and write tests * update embeddings, dynamic schema and filters implemented * Initial set of tests and fixes * Tests added for update_embeddings and delete documents * introduced duplicate documents fix * fixed mypy errors * Added Weaviate to requirements * Fix the weaviate docker env variables * Fixing test dependencies for now * Created weaviate test marker and fixed query * Update docstring * Add documentation * Bump up weaviate version * Bump up weaviate version in documentation * Bump up weaviate version in documentation * Updgrade weaviate version Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-06-10 09:43:53 +02:00
Lalit Pagaria	db17d73a82	Fixing issues caused due to mypy upgrade (#1165 )	2021-06-09 16:24:39 +02:00
Branden Chan	5f0f85989a	Refresh API docs (#1152 )	2021-06-09 16:13:58 +02:00
Shahrukh Khan	545c625a37	Add QueryClassifier incl. baseline models (#1099 ) * restructure query classifier code and add s3 based pickles * make model and vectorizer optional in query classifier * update query classifier as per init style * add query classifiers sklearn/hf * update docstrings for query classifiers * add unit test for query classifier * add type patch for sklearn classifier * fix mypy type issue * revert to pure formatting * add query classifiers * resolve conflict * add output names for query classifier * revert output and update docstring queryclassifier * Update docstring for SklearnQueryClassifier * update transformer query classifier docstring * fix typo * change arg names in query classifier classes * add set_config(). rename attributes * fix set_config() Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-06-08 15:20:13 +02:00
Malte Pietsch	600636e77b	Update README.md	2021-06-08 09:23:56 +02:00
Branden Chan	59e3c55c47	Add More top_k handling to EvalDocuments (#1133 ) * Improve top_k support * Adjust warning * Satisfy mypy * Reinit eval counts if top_k has changed * Incorporate reviewer feedback	2021-06-07 12:11:00 +02:00
Branden Chan	c513865566	Add L2 support for FAISS HNSW (#1138 )	2021-06-04 11:05:18 +02:00
Julian Risch	580e28344d	Add docu of confidence scores and calibration method (#1131 ) * Add docu of confidence scores and calibration method	2021-06-03 15:49:07 +02:00
Malte Pietsch	a1472b040c	Add badges (#1136 )	2021-06-03 14:47:08 +02:00
Malte Pietsch	b41719b7c8	Add config to JoinDocuments node to allow yaml export in pipelines (#1134 ) * add config to JoinNode to allow yaml export * remove test print	2021-06-03 11:03:25 +02:00
Julian Risch	8e3d0d1287	Distinguish labels for calculating similarity scores (#1124 ) * Distinguish labels for calculating similarity scores * Explain label "0" and "1" of TextPairClassifier in Ranker	2021-06-02 17:33:36 +02:00
Branden Chan	b555bc525c	Remove duplicate run (#1132 )	2021-06-02 13:58:55 +02:00
Branden Chan	09ba75073c	Improve Milvus HNSW Performance (#1127 ) * Add simplified script * Optimize HNSW index creation * Adjust benchmark order * Rename script	2021-06-02 13:17:35 +02:00
Branden Chan	9356f637d4	Update Milvus benchmarks (#1128 ) * Update Milvus benchmarks * Add sentence transformers * Update sentence transformers index results * Remove duplicate row	2021-06-02 13:09:45 +02:00
Branden Chan	aa6f768efa	Prevent merge of same questions on different documents during evaluation (#1119 ) * Fix duplicate question in Reader.eval() * Add duplicate question support in document store * Support duplicate questions in retriever eval * Update tutorial * Rename key_tuple * Change error message * Add warning when more than 6 labels * Allow for label grouping options * Add support for aggregating by label meta * Satisfy mypy * Fix duplicate question in Reader.eval() * Add duplicate question support in document store * Support duplicate questions in retriever eval * Update tutorial * Rename key_tuple * Change error message * Add warning when more than 6 labels * Allow for label grouping options * Add support for aggregating by label meta * Satisfy mypy * Make label field flexible, add docstrings * Satisfy mypy * Fix failing tests * Adjust docstring * Fix tutorial Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-06-02 12:09:03 +02:00
Branden Chan	d8c47ed525	Preserve whitespace (#1121 )	2021-06-02 12:08:22 +02:00
Malte Pietsch	022f8586f6	Remove Python 3.6 support (#1059 ) * Remove Python 3.6 support * change cache key for CI	2021-06-01 15:24:44 +02:00
Julian Risch	a7ba146246	Removed comma from last item in json list (#1114 )	2021-06-01 12:32:21 +02:00
Julian Risch	40ceaf418a	Fixing grpcio-tools to version of colab's pre-installed grpcio (#1113 )	2021-05-31 19:09:10 +02:00
Alvise Sembenico	6326cf5710	🐳 add PDF converter dependencies to Docker (#1107 )	2021-05-31 19:01:02 +02:00
Branden Chan	6ca6ac0632	Add OpenDistro init (#1101 )	2021-05-31 18:59:20 +02:00
Julian Risch	84c34295a1	Re-ranking component for document search without QA (#1025 ) * Adding ranker similar to retriever and reader * Sort documents according to query-document similarity scores * Reranking and model training runs for small example * Added EvalRanker node * Calculate recall@k in EvalRetriever and EvalRanker nodes * Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers * Added mean reciprocal rank as metric for EvalDocuments * Fix bug that appeared when ranking documents with same score * Remove commented code for unimplmented eval() of Ranker node * Add documentation of k parameter in EvalDocuments * Add Ranker docu and renaming top_k param	2021-05-31 15:31:36 +02:00
Michaël Bitard	b5cae20ddb	Fix typo in streamlit UI (#1106 )	2021-05-28 11:18:09 +02:00
Ikram Ali	94f1a2b5c9	Improve speed of FAISSDocumentStore.delete_documents() (#1095 )	2021-05-26 07:56:09 +02:00
Ikram Ali	b76ed4c5a4	Add options for handling duplicate documents (skip, fail, overwrite) (#1088 ) * [document_stores] Duplicate document implmentation added for memorystore. * [document_stores]duplicate documents implementation done for faiss store. * [document_store] Duplicate document feature added for elasticsearch document store fixed #1069 * [document_store] Duplicate documents feature added for milvus document store and bug fixed in faiss document store fixed #1069 * [document_store] Code refactored fixed #1069 * [document_store]Test cases refactored. * [document_store] mypy issue fixed. * [test_case] faiss and milvus test case refactored to support duplicate documents implementation. fixed #1069 * [document_store] duplicate_documents_options code refactored. * [document_store] Code refactored.	2021-05-25 13:30:06 +02:00
Avishekh Shrestha	c4ee32d47d	Fix typo in preprocessing.md(#1087 ) Correct variable name from 'd' to 'doc' in line 134.	2021-05-23 19:16:58 +02:00
Ikram Ali	4ab1bc3c3e	Improve the progress bar in update_embeddings() + Fix filters in update_embeddings() (#1063 ) * [document_stores]Add the progressbar in update_embeddings() to track the overall documents progress closed #1037 * change 2nd level loop to docs. switch to tqdm.auto. * [document_stores] Elasticsearch new method get_document_without_embedding_count() added. * [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added. * [document_stores] Add new bool arg in get_document_count() method and fixed #1082 * [document_stores] typo fixed #1082 Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-05-21 14:18:07 +02:00
Lalit Pagaria	f46b09c756	Using text hash as id to prevent document duplication (#1000 ) * using text hash as id to prevent document duplication. Also providing a way customize it. * Add latest docstring and tutorial changes * Fixing duplicate value test when text is same * Adding test for duplicate ids in document store * Changing exception to generic Exception type * add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-05-17 17:51:52 +02:00
Malte Pietsch	25d1122773	Upgrade milvus to 1.1.0 (#1066 ) * upgrade milvus in CI to 1.1 * fix pymilvus * loose pymilvus requirement again * add date to cache keys * fix date var in action	2021-05-17 17:27:34 +02:00
Moshe Berchansky	880edd139d	Add `use_amp` to DPR's train method to enable mixed precision training. (#1048 )	2021-05-17 15:10:02 +02:00
Ikram Ali	a06e4450d1	Rename delete_all_documents() method to delete_documents() (#1047 )	2021-05-10 13:37:08 +02:00
Branden Chan	5d31e633ce	Squad tools (#1029 ) * Add first commit * Add support for conversion to and from pandas df * Add logging * Add functionality * Satisfy mypy * Incorporate reviewer feedback	2021-05-06 19:02:15 +02:00
Branden Chan	373fef8d1e	Add white space normalization warning (#1022 ) * Add white space normalization warning * Implement safer document id fetching	2021-05-05 17:54:32 +02:00
Branden Chan	aadd8b049a	Add Tutorial 11 to Readme	2021-05-05 15:35:21 +02:00
oryx1729	9bec8859f2	Test ES connection only for the default user (#1028 )	2021-05-04 15:03:19 +02:00
oryx1729	c41101ff74	Upgrade streamlit version (#1024 )	2021-05-03 17:44:57 +02:00
Julian Risch	bf4563e5d2	Filtering duplicate answers (#1021 ) * Allow filtering of duplicate answers as implemented in FARM * Changed default behavior to filtering exact duplicates * Change expected test result due to filtering of duplicate answers by default * Rounding expected test results for comparison with predictions	2021-05-03 17:18:10 +02:00
Bhadresh Savani	ca63f9fee2	Fix debug message for file-upload in UI (#1018 )	2021-05-03 09:18:55 +02:00
brandenchan	5b0b3e4616	Merge branch 'master' of https://github.com/deepset-ai/haystack	2021-04-30 16:41:05 +02:00
brandenchan	4cc853d1c3	Update link	2021-04-30 15:06:45 +02:00
Branden Chan	869b493b61	Regen api docs (#1015 )	2021-04-30 12:35:13 +02:00
oryx1729	99990e7249	Add export of Pipeline YAML config (#1003 )	2021-04-30 12:23:29 +02:00
Mario Jäckle	a00703256f	docs(document_store): add usage information for aws elastic search (#1008 ) Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>	2021-04-30 11:38:25 +02:00
Bhadresh Savani	37a72d2f45	Add File Upload Functionality in UI (#995 )	2021-04-30 10:46:30 +02:00
Branden Chan	056be3354b	Add pipelines tutorial (#1013 )	2021-04-29 18:19:20 +02:00
Branden Chan	9827b3652e	Pipelines tutorial (#991 ) * Start Pipelines tutorial * Make Tutorial 11 run locally * Add colab compatibility * Fix pip install * Add ES install from source * Add ES install from source * Add pygraphviz installation * Incorporate reviewer feedback * Ensure print_answers() works for Generator output * Fix typo	2021-04-29 17:31:28 +02:00
Julian Risch	65f1da00cc	knowledge graph documentation (#979 ) * Create knowledge_graph.md * add doc strings to Text2SparqlRetriever * Add doc strings to GraphDBKnowledgeGraph * Make method calls unambiguous so its clear which class is meant	2021-04-27 16:44:40 +02:00
oryx1729	8a57f6b16a	Update tests for FAISSDocumentStore (#999 )	2021-04-27 09:55:31 +02:00
Markus Paff	cf8a622e35	Streamlit UI Evaluation mode (#920 ) * first running version of eval mode * restructuring, new naming of elements and testing * add new files to Docker, how to start with Haystack reference, remove not needed dependencies * Add latest docstring and tutorial changes * merged changes * fixing bugs after breaking changes from last release * newser version of states in streamlit, more docs for eval mode, eval file as env virable * eval file as env variable Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-22 17:30:17 +02:00

3803 Commits