haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-30 19:36:23 +00:00

Author	SHA1	Message	Date
Branden Chan	77d4c2ca1c	Benchmark milvus (#850 ) * Add milvus benchmarking support * Add latest docstring and tutorial changes * Edit config * Disable docker interactive mode * Add milvus index type support * Adjust FAISS and Milvus node branching * Remove duplicate in config * Revert method for speedup * Add latest docstring and tutorial changes * Add latest benchmark run * Add latest docstring and tutorial changes * Add json files * Revert "Add latest docstring and tutorial changes" This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923. * Add latest docstring and tutorial changes * Revert "Add latest docstring and tutorial changes" This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b. * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-13 14:54:15 +02:00
Markus Paff	b87daed62b	fixed link to dpr (#962 )	2021-04-13 09:45:04 +02:00
Timo Moeller	837dea4e6d	Integrate sentence transformers into benchmarks (#843 ) * Integrate sentence transformers into benchmarks * Add doc store asserts * switch data downloads from s3 client to https. add license info * Fix mypy, revert config Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-09 17:24:16 +02:00
Julian Risch	d38c07e0ee	knowledge graph example (#934 ) * Add knowledge graph module * Fix type hint * Add graph retriver module * Change type annotations, change return format * Add graph retriever that executes questions as sparql queries * Linking only those entities that are in the knowledge graph * Added logging and using relations extracted from Knowledge graph for linking * Preventing entity linking from linking the same token to multiple entities * Pruning triples that have no variables for select and count queries * Support knowledge graphs with Pipelines * Add text2sparql * Entity linking and relation linking consider more special cases now based on evaluation on labelled data * Separating example code from KGQA implementation * Add eval on combined extarctive and kg questions * Remove references to hp-test * Add fields sparql_query and long_answer_list to metadata * Removing modular Question2SPARQL approach * Removing additional classes used for modular kgqa approach * preparing lcquad data * change graph db * Translating namespaces in knowledge graph queries * Creating graphdb index and loading triples from .ttl file * Fetching graph config files, triples and model from S3 * Fix incompatibility issues with BaseGraphRetriever and BaseComponent * Removing unused utility functions * Adding doc strings and tutorial header * Adding sparqlwrapper dependency * Moving tutorial header * Sorting tutorials by number within name of notebook * Add latest docstring and tutorial changes * Creating test cases for knowledge graph * Changing knowledge graph example to harry potter * Add latest docstring and tutorial changes * Adapting the tutorial notebook to harry potter example * Add GraphDB fixture for tests * Add latest docstring and tutorial changes * Added GraphDB docker launch to CI * Use correct GraphDB fixture * Check if GraphDB instance is already running * Renaming question/query and incorporating other feedback from Timo and Tanay * Removed type annotation * Add latest docstring and tutorial changes Co-authored-by: oryx1729 <oryx1729@protonmail.com> Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-08 14:05:33 +02:00
oryx1729	8c68699e1c	Refactor REST APIs to use Pipelines (#922 )	2021-04-07 17:53:32 +02:00
Timo Moeller	5d2b16f3cc	Update farm version (#936 ) * Update farm version * Add new DPR loading, fix dpr param name * Add QA model confidence as answer probability, fix prams in test	2021-04-01 18:23:05 +02:00
Branden Chan	d77152c469	WIP: Add evaluation nodes for Pipelines (#904 ) * Add main eval fns * WIP: make pipeline_eval.py run * Fix typo * Add support for no_answers * Add latest docstring and tutorial changes * Working pipeline eval * Add timing of nodes * Add latest docstring and tutorial changes * Refactor and clean * Update tutorial script * Set default params * Update tutorials * Fix indent * Add latest docstring and tutorial changes * Address mypy issues * Add test * Fix mypy error * Clear outputs * Add doc strings * Incorporate reviewer feedback * Add latest docstring and tutorial changes * Revert query counting * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-01 17:35:18 +02:00
Lalit Pagaria	e904deefa7	Add Markdown file convertor (#875 )	2021-03-23 16:31:26 +01:00
oryx1729	e9f0076dbd	Fix execution of Pipelines with parallel nodes (#901 )	2021-03-18 12:41:30 +01:00
oryx1729	e0a118fd9a	Add support for parallel paths in Pipeline (#884 )	2021-03-10 18:17:23 +01:00
oryx1729	f3fb9aacce	Fix validation for `split_respect_sentence_boundary` in Preprocessor (#869 )	2021-03-04 15:09:08 +01:00
Malte Pietsch	e641bff7a6	Allow more options for elasticsearch client (auth, multiple hosts) (#845 ) * allow more options for elasticsearch client (auth, multiple hosts) * Add latest docstring and tutorial changes * fix mypy * Add latest docstring and tutorial changes * test client connection via ping() Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-19 14:29:59 +01:00
Tanay Soni	07907f9eac	Add support for indexing pipelines (#816 )	2021-02-16 16:24:28 +01:00
Malte Pietsch	47aae14efa	relax assert precision of arrays	2021-02-15 14:52:13 +01:00
Lalit Pagaria	5bd94ac5f7	Adding Translator (standalone component & wrapper for pipelines) (#782 ) * Adding translator with many generic input parameter support * Making dict_key as generic * Fixing mypy issue * Adding pipeline and using opus models * Add latest docstring and tutorial changes * Adding test cases for end-to-end translation for generator, summerizer etc * raise error join and merge nodes * Fix test failure * add docstrings. add usage documentation. rm skip_special_tokens param * Add latest docstring and tutorial changes * fix code snippets in md * Adding few extra configuration parameters and fixing tests * Fixingmypy issue and updating usage document * fix for mypy issue in pipeline.py * reverting renaming of pytest_collection_modifyitems method * Addressing review comments * setting skip_special_tokens to True * removing model_max_length argument as None type is not supported to many models * Removing padding parameter. Better to leave it as default otherwise it cause tensor size miss match error. If this option required by used then it can be added later. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-12 15:58:26 +01:00
oryx1729	4059805d89	Fix ElasticsearchDocumentStore.query_by_embedding() (#823 )	2021-02-12 14:57:06 +01:00
oryx1729	c4607cbd98	Revamp CI (#825 )	2021-02-12 13:38:54 +01:00
Tanay Soni	fd5c5dd23c	Introduce incremental updates for embeddings in document stores (#812 )	2021-02-09 21:25:01 +01:00
Malte Pietsch	ac9f92466f	Allow custom encoding for pdftotext (Russian characters, German umlauts etc). Fix version in download instructions (#813 ) * fix encoding of pdftotext. fix version in download instructions * fix test * Add latest docstring and tutorial changes * make latin-1 default encoding again * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-09 13:42:43 +01:00
Tanay Soni	f95b70df38	Fix file upload API (#808 )	2021-02-05 12:17:38 +01:00
Branden Chan	f3a3b73d9b	Choose correct similarity fns during benchmark runs & re-run benchmarks (#773 ) * Adapt to new dataset_from_dicts return signature * rename fn * Align similarity fn in benchmark doc store * Better choice of similarity fn * Increase postgres wait time * Add more expected returned variables * update benchmark results * Fix typo * update all benchmark runs * multiply stats by 100 * Specify similarity fns for website Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-03 11:45:18 +01:00
Tanay Soni	8a5dc8f826	Load Pipeline with YAML config file (#785 )	2021-02-02 17:32:17 +01:00
Timo Moeller	f3ccd59045	Improve preprocessing and adding of eval data (#780 ) * Remove empty document when splitting text * Move error message of problematic ids to a highler level	2021-02-01 17:08:27 +01:00
Tanay Soni	b87dd244c1	Get metadata values for a key from Elasticsearch (#776 )	2021-02-01 16:13:26 +01:00
brandenchan	5665d55ab4	Remove duplicate file	2021-02-01 15:43:53 +01:00
Pavel Soriano	16b8291091	SQuAD to DPR dataset converter (#765 ) * Create squad_to_dpr.py First commit of the squad2dpr script. * adding review corrections/improvements * Merge master 5bf351e * Move script, add docstring * Add type hints Co-authored-by: brandenchan <brandenchan@icloud.com>	2021-02-01 15:40:43 +01:00
Lalit Pagaria	9f7f95221f	Milvus integration (#771 ) * Initial commit for Milvus integration * Add latest docstring and tutorial changes * Updating implementation of Milvus document store * Add latest docstring and tutorial changes * Adding tests and updating doc string * Add latest docstring and tutorial changes * Fixing issue caught by tests * Addressing review comments * Fixing mypy detected issue * Fixing issue caught in test about sorting of vector ids * fixing test * Fixing generator test failure * update docstrings * Addressing review comments about multiple network call while fetching embedding from milvus server * Add latest docstring and tutorial changes * Ignoring mypy issue while converting vector_id to int Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-29 13:29:12 +01:00
Tanay Soni	d9f011da9a	Add flag for use of window queries in SQLDocumentStore (#768 )	2021-01-25 12:54:34 +01:00
Tanay Soni	46307d1571	Remove quotes around placeholders in Elasticsearch custom query (#762 )	2021-01-25 12:46:43 +01:00
Tanay Soni	f0aa879a1c	Fix delete_all_documents for the SQLDocumentStore (#761 )	2021-01-22 14:39:24 +01:00
Tanay Soni	337376c81d	Add `batch_size` and generators to document stores. (#733 ) * Add batch update of embeddings in document stores * Resolve merge conflict * Remove document ordering dependency in tests * Adjust index buffer size for tests * Adjust ES Scroll Slice * Use generator for document store pagination * Add pagination for InMemoryDocumentStore * Fix missing index parameter in FAISS update_embeddings() * Fix FAISS update_embeddings() * Update FAISS tests * Update eval tests * Revert code formatting change * Fix document count in FAISS update embeddings * Fix vector_ids reset in SQLDocumentStore * Update doctrings * Update docstring	2021-01-21 16:00:08 +01:00
Timo Moeller	7522d2d1b0	Increase FARM to Version 0.6.2 (#755 ) * Increase farm version * Fix test	2021-01-21 10:15:41 +01:00
Timo Moeller	4803da009a	Using PreProcessor functions on eval data (#751 ) * Add eval data splitting * Adjust for split by passage, add test and test data, adjust docstrings, add max_docs to highler level fct	2021-01-20 14:40:10 +01:00
Tanay Soni	aa8a3666c3	Support filters for DensePassageRetriever + InMemoryDocumentStore (#754 )	2021-01-20 12:52:52 +01:00
bogdankostic	7709b6cee0	Make batchwise adding of evaluation data possible (#717 ) * Make batchwise adding of evaluation data possible * Fix typos in docstrings * Merge add_eval_data and add_eval_data_batchwise * Improve import statements * Move add_eval_data to BaseDocumentStore * Add batch_size param to write_documents and write_labels in EsDocStore * Adjust docstring Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-12 17:54:43 +01:00
Tanay Soni	281f9ff970	Fix SQLite errors in tests (#723 )	2021-01-11 13:24:38 +01:00
Lalit Pagaria	75d0ebd076	Add Summarizer (standalone + node in custom pipelines + SearchSummarizationPipeline) (#698 ) * Integration of SummarizationQAPipeline with Haystack. * Moving summarizer tests because of OOM issue * Fixing typo * Splitting summarizer test in separate ci step * Removing sysctl configuration as we already running elastic search in docker container * fixing mypy issue * update parameter names and docstrings * update parameter names in BaseSummarizer * rename pipeline * change return type of summarizer from answer to document * change scope of doc store fixture * revert scope * temp. disable test_faiss_index_save_and_load() * fix mypy. change order for mypy in CI Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-08 14:29:46 +01:00
Lalit Pagaria	3a9a756810	Using Columns names instead of ORM to get all documents (#620 ) * Using Columns name instead of ORM object for get all documents call * Separating meta search from documents. This way it will optimize the memory not duplicating document.text * Fixing mypy issue * SQLite have limit on number of host variable hence using batching to fetch meta information * Query meta only if meta field is not Null in DocOrm * Add batch_size to other functions except label * meta can be none so fix that issue * Dummy commit to trigger CI * Using chunked dictionary * Upgrading faiss * reverting change related to faiss upgrade * Changing DB name in test_faiss_retrieving test as it might interfere with exiting files by corrupting DB file * Updating doc string related to batch_size * Update docstring for batch_size Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-06 15:56:19 +01:00
Branden Chan	bb8aba18e0	Create Preprocessing Tutorial (#706 ) * WIP: First version of preprocessing tutorial * stride renamed overlap, ipynb and py files created * rename split_stride in test * Update preprocessor api documentation * define order for markdown files * define order of modules in api docs * Add colab links * Incorporate review feedback Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>	2021-01-06 15:54:05 +01:00
Tanay Soni	0e4eec9499	Add tests for custom embedding field (#640 )	2020-12-17 09:18:57 +01:00
Tanay Soni	4c2804e38e	Add support for aggregating scores in JoinDocuments node (#683 )	2020-12-16 15:54:58 +01:00
Tanay Soni	33fe597949	Cleanup Pytest Fixtures (#639 )	2020-12-14 18:15:44 +01:00
Malte Pietsch	149d98a0fd	Add latest benchmark run (#652 ) * add latest benchmark run * update templates and fix small json errors * Change scale Co-authored-by: brandenchan <brandenchan@icloud.com>	2020-12-10 16:25:51 +01:00
Timo Moeller	efc754b166	Redone: Fix concatenation of sentences in PreProcessor. Add stride for word-based splits with sentence boundaries (#641 ) * Update preprocessor.py Concatenation of sentences done correctly. Stride functionality enabled for splitting by words while respecting sentence boundaries. * Simplify code, add test Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>	2020-12-09 16:12:36 +01:00
Tanay Soni	4152ad8426	Enable dynamic parameter updates for the FARMReader (#650 )	2020-12-07 14:07:20 +01:00
Tanay Soni	8e52b48e1d	Add pipelines for GenerativeQA & FAQs (#645 )	2020-12-03 10:27:06 +01:00
Malte Pietsch	216787ed34	Fix benchmarks (#648 ) * disable fasttokenizer, increase ES timeout for delete requests * add session.close() * fix deletion of docs	2020-12-02 16:59:42 +01:00
Tanay Soni	5e62e54875	Rename question parameter to query (#614 )	2020-11-30 17:50:04 +01:00
Tanay Soni	ea976ba5b5	Add return_embedding parameter for get_all_documents() (#615 )	2020-11-26 10:32:30 +01:00
Tanay Soni	e3a68aedaf	Add support for building custom Search Pipelines (#596 )	2020-11-20 17:41:08 +01:00

240 Commits