haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-04 15:41:00 +00:00

Author	SHA1	Message	Date
Julian Risch	9de140110f	Use smaller model for one generator test case (#1622 ) * Use smaller model for one generator test case * Reduce max_length of generated sequences in tests	2021-10-20 17:57:15 +02:00
bogdankostic	655d721371	Add Table Reader (#1446 ) * first draft / notes on new primitives * wip label / feedback refactor * rename doc.text -> doc.content. add doc.content_type * add datatype for content * remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field * update converters for . Add warning for empty * Add first draft of TableReader * renam label.question -> label.query. Allow sorting of Answers. * Add calculation of answer scores * WIP primitives * Adapt input and output to new primitives * Add doc strings * Add tests * update ui/reader for new Answer format * Improve Label. First refactoring of MultiLabel. Adjust eval code * fixed workflow conflict with introducing new one (#1472) * Add latest docstring and tutorial changes * make add_eval_data() work again * fix reader formats. WIP fix _extract_docs_and_labels_from_dict * fix test reader * Add latest docstring and tutorial changes * fix another test case for reader * fix mypy in farm reader.eval() * fix mypy in farm reader.eval() * WIP ORM refactor * Add latest docstring and tutorial changes * fix mypy weaviate * make label and multilabel dataclasses * bump mypy env in CI to python 3.8 * WIP refactor Label ORM * WIP refactor Label ORM * simplify tests for individual doc stores * WIP refactoring markers of tests * test alternative approach for tests with existing parametrization * WIP refactor ORMs * fix skip logic of already parametrized tests * fix weaviate behaviour in tests - not parametrizing it in our general test cases. * Add latest docstring and tutorial changes * fix some tests * remove sql from document_store_types * fix markers for generator and pipeline test * remove inmemory marker * remove unneeded elasticsearch markers * add dataclasses-json dependency. 
adjust ORM to just store JSON repr * ignore type as dataclasses_json seems to miss functionality here * update readme and contributing.md * update contributing * adjust example * fix duplicate doc handling for custom index * Add latest docstring and tutorial changes * fix some ORM issues. fix get_all_labels_aggregated. * update drop flags where get_all_labels_aggregated() was used before * Add latest docstring and tutorial changes * add to_json(). add + fix tests * fix no_answer handling in label / multilabel * fix duplicate docs in memory doc store. change primary key for sql doc table * fix mypy issues * fix mypy issues * haystack/retriever/base.py * fix test_write_document_meta[elastic] * fix test_elasticsearch_custom_fields * fix test_labels[elastic] * fix crawler * fix converter * fix docx converter * fix preprocessor * fix test_utils * fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations * Add latest docstring and tutorial changes * fix crawler test. fix ocrconverter attribute * fix test_elasticsearch_custom_query * fix generator pipeline * fix ocr converter * fix ragenerator * Add latest docstring and tutorial changes * fix test_load_and_save_yaml for elasticsearch * fixes for pipeline tests * fix faq pipeline * fix pipeline tests * Add latest docstring and tutorial changes * fix weaviate * Add latest docstring and tutorial changes * trigger CI * satisfy mypy * Add latest docstring and tutorial changes * satisfy mypy * Add latest docstring and tutorial changes * trigger CI * fix question generation test * fix ray. 
fix Q-generation * fix translator test * satisfy mypy * wip refactor feedback rest api * fix rest api feedback endpoint * fix doc classifier * remove relation of Labels -> Docs in SQL ORM * fix faiss/milvus tests * fix doc classifier test * fix eval test * fixing eval issues * Add latest docstring and tutorial changes * fix mypy * WIP replace dataclasses-json with manual serialization * Add latest docstring and tutorial changes * revert to dataclass-json serialization for now. remove debug prints. * update docstrings * fix extractor. fix Answer Span init * fix api test * Adapt answer format * Add latest docstring and tutorial changes * keep meta data of answers in reader.run() * Fix mypy * fix meta handling * adress review feedback * Add latest docstring and tutorial changes * Allow inference on GPU * Remove automatic aggregation * Add automatic aggregation * Add latest docstring and tutorial changes * Add torch-scatter dependency * Add wheel to torch-scatter dependency * Fix requirements * Fix requirements * Fix requirements * Adapt setup.py to allow for wheels * Fix requirements * Fix requirements * Add type hints and code snippet * Add latest docstring and tutorial changes Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: Markus Paff <markuspaff.mp@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-15 16:34:48 +02:00
Malte Pietsch	4a6c9302b3	Redesign primitives - `Document`, `Answer`, `Label` (#1398 ) * first draft / notes on new primitives * wip label / feedback refactor * rename doc.text -> doc.content. add doc.content_type * add datatype for content * remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field * update converters for . Add warning for empty * renam label.question -> label.query. Allow sorting of Answers. * WIP primitives * update ui/reader for new Answer format * Improve Label. First refactoring of MultiLabel. Adjust eval code * fixed workflow conflict with introducing new one (#1472) * Add latest docstring and tutorial changes * make add_eval_data() work again * fix reader formats. WIP fix _extract_docs_and_labels_from_dict * fix test reader * Add latest docstring and tutorial changes * fix another test case for reader * fix mypy in farm reader.eval() * fix mypy in farm reader.eval() * WIP ORM refactor * Add latest docstring and tutorial changes * fix mypy weaviate * make label and multilabel dataclasses * bump mypy env in CI to python 3.8 * WIP refactor Label ORM * WIP refactor Label ORM * simplify tests for individual doc stores * WIP refactoring markers of tests * test alternative approach for tests with existing parametrization * WIP refactor ORMs * fix skip logic of already parametrized tests * fix weaviate behaviour in tests - not parametrizing it in our general test cases. * Add latest docstring and tutorial changes * fix some tests * remove sql from document_store_types * fix markers for generator and pipeline test * remove inmemory marker * remove unneeded elasticsearch markers * add dataclasses-json dependency. 
adjust ORM to just store JSON repr * ignore type as dataclasses_json seems to miss functionality here * update readme and contributing.md * update contributing * adjust example * fix duplicate doc handling for custom index * Add latest docstring and tutorial changes * fix some ORM issues. fix get_all_labels_aggregated. * update drop flags where get_all_labels_aggregated() was used before * Add latest docstring and tutorial changes * add to_json(). add + fix tests * fix no_answer handling in label / multilabel * fix duplicate docs in memory doc store. change primary key for sql doc table * fix mypy issues * fix mypy issues * haystack/retriever/base.py * fix test_write_document_meta[elastic] * fix test_elasticsearch_custom_fields * fix test_labels[elastic] * fix crawler * fix converter * fix docx converter * fix preprocessor * fix test_utils * fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations * Add latest docstring and tutorial changes * fix crawler test. fix ocrconverter attribute * fix test_elasticsearch_custom_query * fix generator pipeline * fix ocr converter * fix ragenerator * Add latest docstring and tutorial changes * fix test_load_and_save_yaml for elasticsearch * fixes for pipeline tests * fix faq pipeline * fix pipeline tests * Add latest docstring and tutorial changes * fix weaviate * Add latest docstring and tutorial changes * trigger CI * satisfy mypy * Add latest docstring and tutorial changes * satisfy mypy * Add latest docstring and tutorial changes * trigger CI * fix question generation test * fix ray. 
fix Q-generation * fix translator test * satisfy mypy * wip refactor feedback rest api * fix rest api feedback endpoint * fix doc classifier * remove relation of Labels -> Docs in SQL ORM * fix faiss/milvus tests * fix doc classifier test * fix eval test * fixing eval issues * Add latest docstring and tutorial changes * fix mypy * WIP replace dataclasses-json with manual serialization * Add latest docstring and tutorial changes * revert to dataclass-json serialization for now. remove debug prints. * update docstrings * fix extractor. fix Answer Span init * fix api test * keep meta data of answers in reader.run() * fix meta handling * adress review feedback * Add latest docstring and tutorial changes * make document=None for open domain labels * add import * fix print utils * fix rest api * adress review feedback * Add latest docstring and tutorial changes * fix mypy Co-authored-by: Markus Paff <markuspaff.mp@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-13 14:23:23 +02:00
Julian Risch	24483d7bad	TransformersDocumentClassifier replacing FARMClassifier (#1540 ) * Initial draft of TransformersClassifier * Add transformers classifier implementation * Add test for SentenceTransformersClassifier * Add truncation and corresponding test case to Classifier * Add zero-shot classification and test * Add document classifier documentation * Add latest docstring and tutorial changes * print meta data with print_documents() * Add latest docstring and tutorial changes * Remove top_k param from Classifier usage example * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-01 11:22:56 +02:00
Malte Pietsch	2df1aa8713	Fix document_store_type flag for tests with multiple fixtures that get parametrized. (#1526 )	2021-09-28 16:38:21 +02:00
Julian Risch	f9d2f786ca	Replace FARM import statements; add dependencies (#1492 ) * Replace FARM import statements; add dependencies * Add InferenceProc., TextCl.Proc., TextPairCl.Proc. * Remove FARMRanker, add type annotations, rename max_sample * Add sample_to_features_text for InferenceProc. * Fix type annotations: model_name_or_path is str not Path * Fix mypy errors: implement _create_dataset in TextCl.Proc. * Add task_type "embeddings" in Inferencer * Allow loading AdaptiveModel for embedding task * Add SQuAD eval metrics; enable InferenceProc for embedding task * Add baskets as param to log_samples and handle empty basket list in log_samples * Remove unused dependencies * Remove FARMClassifier (doc classificer) due to ref to TextClassificationHead * Remove FARMRanker and Classifier from doc generation scripts Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-09-28 16:34:24 +02:00
Malte Pietsch	183fd5ae5a	Simplify tests & allow running on individual doc stores (#1487 ) * simplify tests for individual doc stores * WIP refactoring markers of tests * test alternative approach for tests with existing parametrization * fix skip logic of already parametrized tests * fix weaviate behaviour in tests - not parametrizing it in our general test cases. * Add latest docstring and tutorial changes * fix some tests * remove sql from document_store_types * fix markers for generator and pipeline test * remove inmemory marker * remove unneeded elasticsearch markers * update readme and contributing.md * update contributing * adjust example Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-09-27 10:52:07 +02:00
Julian Risch	60471cecdf	Add inferencer for QA only (#1484 ) * Add inferencer for QA only * Add latest docstring and tutorial changes * Add QA inferencer tests * Add type annotations for inferencer * Fix type annotations, move util functions * Fix type annotations * Move fixtures to the top of the file Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-09-22 16:56:51 +02:00
ramgarg102	51f0a56e5d	delete_all_documents() replaced by delete_documents() (#1377 ) * [UPDT] delete_all_documents() replaced by delete_documents() * [UPDT] warning logs to be fixed * [UPDT] delete_all_documents() renamed and the same method added Co-authored-by: Ram Garg <ramgarg102@gmai.com>	2021-08-30 15:18:28 +02:00
Julian Risch	eb990c9688	Removing probability field from answers in favor of score field (#1340 ) * Removing probability field from reader and from test cases * Add switch to FARMReader to choose score/probability * Remove probability field from doc returned by doc store * Relax assertion testing joined es and dpr predictions * Use switch for confidence scores also for no_answer * Add test that checks switching to old answer scores > 10 * Normalize score in elastic doc store and reset reader.md * Scale weights of JoinDocuments to sum to 1 and adapt test case	2021-08-17 10:27:11 +02:00
Branden Chan	937247d628	Add QuestionGenerator (#1267 ) * Create basic Question Generation * Split texts into 50 word chunks * Allow prompt to be changed * Implement iteration functionality in DS * Add docstrings, create pipelines * Make pipelines work * Add comments * Add tests * Add tutorials and docs * Add doc string	2021-07-26 17:20:43 +02:00
Julian Risch	4e6f7f349d	Add FARMClassifier node for Document Classification (#1265 ) * Add FARM classification node * Add classification output to meta field of document * Update usage example * Add test case for FARMClassifier * Replace FARMRanker with FARMClassifier in documentation strings * Remove base method not implemented by any child class, etc.	2021-07-13 21:44:26 +02:00
Julian Risch	dbb9efbd39	Add SentenceTransformersRanker with pre-trained Cross-Encoder (#1209 ) * Add SentenceTransformersRanker with pre-trained Cross-Encoder * Add test cases for Ranker nodes and update documentation * update docstring * Update docstring * Update __init__.py * update import for test Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-07-07 17:31:45 +02:00
vblagoje	02fc4c7783	Improve document stores unit test parametrization (#1202 )	2021-06-22 16:08:23 +02:00
vblagoje	2a5882578a	Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086 ) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com>	2021-06-14 17:53:43 +02:00
venuraja79	49886f88f0	Integrate Weaviate as another DocumentStore (#1064 ) * Annotation Tool: data is not persisted when using local version #853 * First version of weaviate * First version of weaviate * First version of weaviate * Updated comments * Updated comments * ran query, get and write tests * update embeddings, dynamic schema and filters implemented * Initial set of tests and fixes * Tests added for update_embeddings and delete documents * introduced duplicate documents fix * fixed mypy errors * Added Weaviate to requirements * Fix the weaviate docker env variables * Fixing test dependencies for now * Created weaviate test marker and fixed query * Update docstring * Add documentation * Bump up weaviate version * Bump up weaviate version in documentation * Bump up weaviate version in documentation * Updgrade weaviate version Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-06-10 09:43:53 +02:00
oryx1729	8c1e411380	Fix update_embeddings() for FAISSDocumentStore (#978 )	2021-04-21 09:56:35 +02:00
Julian Risch	d38c07e0ee	knowledge graph example (#934 ) * Add knowledge graph module * Fix type hint * Add graph retriver module * Change type annotations, change return format * Add graph retriever that executes questions as sparql queries * Linking only those entities that are in the knowledge graph * Added logging and using relations extracted from Knowledge graph for linking * Preventing entity linking from linking the same token to multiple entities * Pruning triples that have no variables for select and count queries * Support knowledge graphs with Pipelines * Add text2sparql * Entity linking and relation linking consider more special cases now based on evaluation on labelled data * Separating example code from KGQA implementation * Add eval on combined extarctive and kg questions * Remove references to hp-test * Add fields sparql_query and long_answer_list to metadata * Removing modular Question2SPARQL approach * Removing additional classes used for modular kgqa approach * preparing lcquad data * change graph db * Translating namespaces in knowledge graph queries * Creating graphdb index and loading triples from .ttl file * Fetching graph config files, triples and model from S3 * Fix incompatibility issues with BaseGraphRetriever and BaseComponent * Removing unused utility functions * Adding doc strings and tutorial header * Adding sparqlwrapper dependency * Moving tutorial header * Sorting tutorials by number within name of notebook * Add latest docstring and tutorial changes * Creating test cases for knowledge graph * Changing knowledge graph example to harry potter * Add latest docstring and tutorial changes * Adapting the tutorial notebook to harry potter example * Add GraphDB fixture for tests * Add latest docstring and tutorial changes * Added GraphDB docker launch to CI * Use correct GraphDB fixture * Check if GraphDB instance is already running * Renaming question/query and incorporating other feedback from Timo and Tanay * Removed type annotation * 
Add latest docstring and tutorial changes Co-authored-by: oryx1729 <oryx1729@protonmail.com> Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-08 14:05:33 +02:00
Lalit Pagaria	5bd94ac5f7	Adding Translator (standalone component & wrapper for pipelines) (#782 ) * Adding translator with many generic input parameter support * Making dict_key as generic * Fixing mypy issue * Adding pipeline and using opus models * Add latest docstring and tutorial changes * Adding test cases for end-to-end translation for generator, summerizer etc * raise error join and merge nodes * Fix test failure * add docstrings. add usage documentation. rm skip_special_tokens param * Add latest docstring and tutorial changes * fix code snippets in md * Adding few extra configuration parameters and fixing tests * Fixingmypy issue and updating usage document * fix for mypy issue in pipeline.py * reverting renaming of pytest_collection_modifyitems method * Addressing review comments * setting skip_special_tokens to True * removing model_max_length argument as None type is not supported to many models * Removing padding parameter. Better to leave it as default otherwise it cause tensor size miss match error. If this option required by used then it can be added later. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-12 15:58:26 +01:00
oryx1729	4059805d89	Fix ElasticsearchDocumentStore.query_by_embedding() (#823 )	2021-02-12 14:57:06 +01:00
oryx1729	c4607cbd98	Revamp CI (#825 )	2021-02-12 13:38:54 +01:00
Lalit Pagaria	9f7f95221f	Milvus integration (#771 ) * Initial commit for Milvus integration * Add latest docstring and tutorial changes * Updating implementation of Milvus document store * Add latest docstring and tutorial changes * Adding tests and updating doc string * Add latest docstring and tutorial changes * Fixing issue caught by tests * Addressing review comments * Fixing mypy detected issue * Fixing issue caught in test about sorting of vector ids * fixing test * Fixing generator test failure * update docstrings * Addressing review comments about multiple network call while fetching embedding from milvus server * Add latest docstring and tutorial changes * Ignoring mypy issue while converting vector_id to int Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-29 13:29:12 +01:00
Tanay Soni	f0aa879a1c	Fix delete_all_documents for the SQLDocumentStore (#761 )	2021-01-22 14:39:24 +01:00
Tanay Soni	337376c81d	Add `batch_size` and generators to document stores. (#733 ) * Add batch update of embeddings in document stores * Resolve merge conflict * Remove document ordering dependency in tests * Adjust index buffer size for tests * Adjust ES Scroll Slice * Use generator for document store pagination * Add pagination for InMemoryDocumentStore * Fix missing index parameter in FAISS update_embeddings() * Fix FAISS update_embeddings() * Update FAISS tests * Update eval tests * Revert code formatting change * Fix document count in FAISS update embeddings * Fix vector_ids reset in SQLDocumentStore * Update doctrings * Update docstring	2021-01-21 16:00:08 +01:00
Tanay Soni	281f9ff970	Fix SQLite errors in tests (#723 )	2021-01-11 13:24:38 +01:00
Lalit Pagaria	75d0ebd076	Add Summarizer (standalone + node in custom pipelines + SearchSummarizationPipeline) (#698 ) * Integration of SummarizationQAPipeline with Haystack. * Moving summarizer tests because of OOM issue * Fixing typo * Splitting summarizer test in separate ci step * Removing sysctl configuration as we already running elastic search in docker container * fixing mypy issue * update parameter names and docstrings * update parameter names in BaseSummarizer * rename pipeline * change return type of summarizer from answer to document * change scope of doc store fixture * revert scope * temp. disable test_faiss_index_save_and_load() * fix mypy. change order for mypy in CI Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-08 14:29:46 +01:00
Tanay Soni	0e4eec9499	Add tests for custom embedding field (#640 )	2020-12-17 09:18:57 +01:00
Tanay Soni	33fe597949	Cleanup Pytest Fixtures (#639 )	2020-12-14 18:15:44 +01:00
Tanay Soni	4152ad8426	Enable dynamic parameter updates for the FARMReader (#650 )	2020-12-07 14:07:20 +01:00
Tanay Soni	8e52b48e1d	Add pipelines for GenerativeQA & FAQs (#645 )	2020-12-03 10:27:06 +01:00
Tanay Soni	5e62e54875	Rename question parameter to query (#614 )	2020-11-30 17:50:04 +01:00
kolk	72b637ae6d	DensePassageRetriever: Add Training, Refactor Inference to FARM modules (#527 ) * dpr training and inference code refactored with FARM modules * dpr test cases modified * docstring and default arguments updated * dpr training docstring updated * bugfix in dense retriever inference, DPR tutorials modified * Bump FARM to 0.5.0 * update README for DPR * dpr training and inference code refactored with FARM modules * dpr test cases modified * docstring and default arguments updated * dpr training docstring updated * bugfix in dense retriever inference, DPR tutorials modified * Bump FARM to 0.5.0 * update README for DPR * mypy errors fix * DPR instantiation bugfix * Fix DPR init in RAG Tutorial Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-30 19:22:06 +01:00
Lalit Pagaria	f13443054a	[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484 ) * Adding dummy generator implementation * Adding tutorial to try the model * Committing current non working code * Committing current update where we need to call generate function directly and need to convert embedding to tensor way * Addressing review comments. * Refactoring finder, and implementing rag_generator class. * Refined the implementation of RAGGenerator and now it is in clean shape * Renaming RAGGenerator to RAGenerator * Reverting change from finder.py and addressing review comments * Remove support for RagSequenceForGeneration * Utilizing embed_passage function from DensePassageRetriever * Adding sample test data to verify generator output * Updating testing script * Updating testing script * Fixing bug related to top_k * Updating latest farm dependency * Comment out farm dependency * Reverting changes from TransformersReader * Adding transformers dataset to compare transformers and haystack generator implementation * Using generator_encoder instead of question_encoder to generate context_input_ids * Adding workaround to install FARM dependency from master branch * Removing unnecessary changes * Fixing generator test * Removing transformers datasets * Fixing generator test * Some cleanup and updating TODO comments * Adding tutorial notebook * Updating tutorials with comments * Explicitly passing token model in RAG test * Addressing review comments * Fixing notebook * Refactoring tests to reduce memory footprint * Split generator tests in separate ci step and before running it reclaim memory by terminating containers * Moving tika dependent test to separate dir * Remove unwanted code * Brining reader under session scope * Farm is now session object hence restoring changes from default value * Updating assert for pdf converter * Dummy commit to trigger CI flow * REducing memory footprint required for generator tests * Fixing mypy issues * Marking test with 
tika and elasticsearch markers. Reverting changes in CI and pytest splits * reducing changes * Fixing CI * changing elastic search ci * Fixing test error * Disabling return of embedding * Marking generator test as well * Refactoring tutorials * Increasing ES memory to 750M * Trying another fix for ES CI * Reverting CI changes * Splitting tests in CI * Generator and non-generator markers split * Adding pytest.ini to add markers and enable strict-markers option * Reducing elastic search container memory * Simplifying generator test by using documents with embedding directly * Bump up farm to 0.5.0	2020-10-30 18:06:02 +01:00
Lalit Pagaria	9521e180b3	Standardize behavior of DocumentStores to return embeddings (#514 ) * Adding support to return embedding along with other result via query_by_embedding function * Adding test case to check return embedding * By default for all tests but DPR tests: disable return_embedding flag * Reducing None test case and fixing query_by_embedding of ElasticsearchDocumentStore when it updating self.excluded_meta_data directly * Fixing mypy reported issue	2020-10-27 08:33:39 +01:00
Lalit Pagaria	abda994116	Pytest fix memory leak and put pytest marker on slow tests (#520 ) * Clear faiss_index during teardown * Marking slow test with pytest markers. So In future these test can be optimized. Also command line option can be added to skip them refer https://pytest.org/en/stable/example/simple.html#control-skipping-of-tests-according-to-command-line-option * Fixing test	2020-10-26 19:19:10 +01:00
Tanay Soni	db4151bbc0	Fix scoring in Elasticsearch for dot product (#517 )	2020-10-23 17:50:49 +02:00
Lalit Pagaria	63c12371b9	Change arg "model" to "model_name_or_path" in TransformersReader (#510 ) * Consistent parameter naming for TransformersReader along with removing unused imports as well. * Addressing review comments	2020-10-21 17:15:35 +02:00
Lalit Pagaria	2e9f3c1512	Fix update_embeddings function in FAISSDocumentStore and add retriever fixture in tests (#481 ) * 1. Prevent update_embeddings function in FAISSDocumentStore to set faiss_index as None when document store does not have any docs. 2. cleaning up tests by adding fixture for retriever. * TfidfRetriever need document store with documents during initialization as it call fit() function in constructor so fixing it by checking self.paragraphs of None * Fix naming of retriever's fixture (embedded to embedding and tfid to tfidf)	2020-10-14 16:15:04 +02:00
Malte Pietsch	8edeb844f7	Remove phi normalization from FAISS, support more index types, 3x speedup (#467 ) * remove phi normalization * add special case for hnsw * rename vector_size to vector_dim * fix loading. fix extra dim in tests * switch to new ES syntax for vector similarity * 3x sql speed up. cascade deletes. add train_index() * add docstrings. remove vector_dim from load() * delete docs from faiss and sql * fix delete of docs in test * relax type hint for faiss index * rename metric to metric_type Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>	2020-10-06 16:09:56 +02:00
Malte Pietsch	9727829cc6	Rename and restructure modules (database, indexing, schemas) (#379 ) * rename database to documentstore * move document, label, multilabel to haystack/schema.py * rename documentstore -> document_store * split indexing modules -> file_converter + preprocessor * fix order of imports * Update tutorial notebooks * fix torch version in tutorial 4	2020-09-16 18:33:23 +02:00
Lalit P	de5ad42e46	Adjust tests for MacOS (#374 )	2020-09-15 15:04:46 +02:00
Tanay Soni	01ff66dfd6	Remove redundant test fixture	2020-08-17 14:19:38 +02:00
Dany	403318b1f5	Add Tika Converter (#314 )	2020-08-17 11:21:09 +02:00
Tanay Soni	1637ce1184	Revert "Add Tika Converter (#314 )" This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.	2020-08-17 11:13:52 +02:00
Tanay Soni	5ef59b1901	Add Tika Converter (#314 )	2020-08-14 14:13:59 +02:00
Tanay Soni	9d0df60aad	Add FAISS Document Store (#253 )	2020-08-07 14:25:08 +02:00
Timo Moeller	d9e8b522a1	Add "no answer" aggregation to Transformersreader (#259 ) * Add no answer aggregation * Change to covariant type annotation * Remove n_best_per_passage from transformersreader	2020-08-06 17:32:55 +02:00
Tanay Soni	5937f9cf16	Deprecate Tags for Document Stores (#286 )	2020-08-04 14:24:12 +02:00
Malte Pietsch	29a15c0d59	Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243 )	2020-07-31 11:34:06 +02:00
Malte Pietsch	99a6a34047	Upgrade to new FARM / Transformers / PyTorch versions (#212 )	2020-07-14 18:53:15 +02:00

106 Commits