haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-09-17 12:13:35 +00:00

Author	SHA1	Message	Date
Malte Pietsch	9b1924a54a	Revert TOP_K_PER_CANDIDATE value to 3	2021-02-15 14:30:04 +01:00
Malte Pietsch	0eaae3c0dd	Fix UI when API returns fewer answers than expected (#828) * fix ui for few answers from api. add top_k_per_sample env * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-15 14:27:17 +01:00
brandenchan	fe47e3a45e	Fix link in documentation	2021-02-15 11:15:54 +01:00
Malte Pietsch	6798192d40	Add API endpoint to export accuracy metrics from user feedback + created_at timestamp (#803) * WIP feedback metrics * fix filters and zero division * add created_at and model_name fields to labels * add created_at value * remove debug log level * fix attribute init * move timestamp creation down to docstore / db level * fix import	2021-02-15 10:48:59 +01:00
brandenchan	03cda26d85	Fix link in Tutorial 8	2021-02-15 10:45:27 +01:00
Lalit Pagaria	5bd94ac5f7	Adding Translator (standalone component & wrapper for pipelines) (#782 ) * Adding translator with many generic input parameter support * Making dict_key as generic * Fixing mypy issue * Adding pipeline and using opus models * Add latest docstring and tutorial changes * Adding test cases for end-to-end translation for generator, summerizer etc * raise error join and merge nodes * Fix test failure * add docstrings. add usage documentation. rm skip_special_tokens param * Add latest docstring and tutorial changes * fix code snippets in md * Adding few extra configuration parameters and fixing tests * Fixingmypy issue and updating usage document * fix for mypy issue in pipeline.py * reverting renaming of pytest_collection_modifyitems method * Addressing review comments * setting skip_special_tokens to True * removing model_max_length argument as None type is not supported to many models * Removing padding parameter. Better to leave it as default otherwise it cause tensor size miss match error. If this option required by used then it can be added later. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-12 15:58:26 +01:00
oryx1729	4059805d89	Fix ElasticsearchDocumentStore.query_by_embedding() (#823)	2021-02-12 14:57:06 +01:00
Pavel Soriano	8adf5b4737	Allow non-standard Tokenizers (e.g. CamemBERT) for DPR via new arg (#811 ) * added parameter to infer DPR tokenizers class * Add latest docstring and tutorial changes * Update docstring. fix mypy * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-12 14:17:55 +01:00
oryx1729	c4607cbd98	Revamp CI (#825)	2021-02-12 13:38:54 +01:00
Branden Chan	c807f0d050	Add key concepts diagram	2021-02-12 12:49:22 +01:00
Tanay Soni	8b0031bfc1	Remove conditional import of FAISS for Windows (#819)	2021-02-12 12:15:23 +01:00
Branden Chan	a1983ad84e	Add new images	2021-02-11 15:10:00 +01:00
Branden Chan	db0364c728	Fix uvloop version to maintain Python<3.7 support uvloop released v0.15 which requires Python >=3.7. This commit fixes the version so that Haystack can be directly installed in colab using pip	2021-02-10 19:16:53 +01:00
Tanay Soni	fd5c5dd23c	Introduce incremental updates for embeddings in document stores (#812)	2021-02-09 21:25:01 +01:00
Malte Pietsch	e91518ee00	Update tutorials (torch versions, ES version, replace Finder with Pipeline) (#814 ) * remove manual torch install on colab * update elasticsearch version everywhere to 7.9.2 * fix FAQPipeline * update tutorials with new pipelines * Add latest docstring and tutorial changes * revert faqpipeline change. fix field names in tutorial 4 * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-09 14:56:54 +01:00
Malte Pietsch	ac9f92466f	Allow custom encoding for pdftotext (Russian characters, German umlauts etc). Fix version in download instructions (#813 ) * fix encoding of pdftotext. fix version in download instructions * fix test * Add latest docstring and tutorial changes * make latin-1 default encoding again * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-09 13:42:43 +01:00
Tanay Soni	f95b70df38	Fix file upload API (#808)	2021-02-05 12:17:38 +01:00
Tanay Soni	7b18e324f2	Fix building Pipeline with YAML (#800)	2021-02-04 11:53:51 +01:00
Branden Chan	f3a3b73d9b	Choose correct similarity fns during benchmark runs & re-run benchmarks (#773 ) * Adapt to new dataset_from_dicts return signature * rename fn * Align similarity fn in benchmark doc store * Better choice of similarity fn * Increase postgres wait time * Add more expected returned variables * update benchmark results * Fix typo * update all benchmark runs * multiply stats by 100 * Specify similarity fns for website Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-03 11:45:18 +01:00
Tanay Soni	8a5dc8f826	Load Pipeline with YAML config file (#785)	2021-02-02 17:32:17 +01:00
Malte Pietsch	1318b55eec	Make tqdm progress bars optional (less verbose prod logs) (#796 ) * make dpr queries less verbose * add progress bar flag to more components * Add latest docstring and tutorial changes * add type * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-01 20:51:55 +01:00
Timo Moeller	f3ccd59045	Improve preprocessing and adding of eval data (#780) * Remove empty document when splitting text * Move error message of problematic ids to a highler level	2021-02-01 17:08:27 +01:00
Tanay Soni	b87dd244c1	Get metadata values for a key from Elasticsearch (#776)	2021-02-01 16:13:26 +01:00
brandenchan	5665d55ab4	Remove duplicate file	2021-02-01 15:43:53 +01:00
Pavel Soriano	16b8291091	SQuAD to DPR dataset converter (#765 ) * Create squad_to_dpr.py First commit of the squad2dpr script. * adding review corrections/improvements * Merge master 5bf351e * Move script, add docstring * Add type hints Co-authored-by: brandenchan <brandenchan@icloud.com>	2021-02-01 15:40:43 +01:00
Tanay Soni	5bf351ea7b	Fix refresh behaviour for Elasticsearch delete (#794)	2021-02-01 14:07:55 +01:00
Tanay Soni	d62355ca88	Fix mypy typing (#792)	2021-02-01 12:15:36 +01:00
Branden Chan	1dc74c7067	Add model versioning support (#784 ) * Add model versioning support * Add latest docstring and tutorial changes * Support DPR versioning * Add RAG versioning support * Add latest docstring and tutorial changes * Add summarizer support * Add Embedding Retriever support * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-01 11:42:36 +01:00
Malte Pietsch	2b05e801c3	Fix pdftotext dependency in CI (#788) * Fix pdftotext dependency in CI * udpate xpdf version * Fix version	2021-01-29 16:07:37 +01:00
Lalit Pagaria	9f7f95221f	Milvus integration (#771 ) * Initial commit for Milvus integration * Add latest docstring and tutorial changes * Updating implementation of Milvus document store * Add latest docstring and tutorial changes * Adding tests and updating doc string * Add latest docstring and tutorial changes * Fixing issue caught by tests * Addressing review comments * Fixing mypy detected issue * Fixing issue caught in test about sorting of vector ids * fixing test * Fixing generator test failure * update docstrings * Addressing review comments about multiple network call while fetching embedding from milvus server * Add latest docstring and tutorial changes * Ignoring mypy issue while converting vector_id to int Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-29 13:29:12 +01:00
brandenchan	6efa4f06c1	Add Streamlit UI Image	2021-01-27 17:01:29 +01:00
Timo Moeller	f94bd96ddf	Remove RAG todos after transformers update (#781)	2021-01-27 16:50:02 +01:00
Tanay Soni	d9f011da9a	Add flag for use of window queries in SQLDocumentStore (#768)	2021-01-25 12:54:34 +01:00
Tanay Soni	46307d1571	Remove quotes around placeholders in Elasticsearch custom query (#762)	2021-01-25 12:46:43 +01:00
Tanay Soni	f0aa879a1c	Fix delete_all_documents for the SQLDocumentStore (#761)	2021-01-22 14:39:24 +01:00
Markus Paff	aee90c5df9	Docs v0.7.0 (#757 ) * new docs version * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-01-22 10:28:33 +01:00
Malte Pietsch	50815421b0	bump haystack version v0.7.0	2021-01-21 16:02:33 +01:00
Tanay Soni	337376c81d	Add `batch_size` and generators to document stores. (#733 ) * Add batch update of embeddings in document stores * Resolve merge conflict * Remove document ordering dependency in tests * Adjust index buffer size for tests * Adjust ES Scroll Slice * Use generator for document store pagination * Add pagination for InMemoryDocumentStore * Fix missing index parameter in FAISS update_embeddings() * Fix FAISS update_embeddings() * Update FAISS tests * Update eval tests * Revert code formatting change * Fix document count in FAISS update embeddings * Fix vector_ids reset in SQLDocumentStore * Update doctrings * Update docstring	2021-01-21 16:00:08 +01:00
Markus Paff	0b583b8972	Generate docstrings and deploy to branches to Staging (Website) (#731 ) * test pre commit hook * test status * test on this branch * push generated docstrings and tutorials to branch * fixed syntax error * Add latest docstring and tutorial changes * add files before commit * catch commit error * separate generation from deployment * add deployment process for staging * add current branch to payload Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-01-21 11:01:09 +01:00
Markus Paff	0f62e0b2ee	Script for releasing docs (#736) * script for releasing docs * fix formatting	2021-01-21 10:58:54 +01:00
Timo Moeller	7522d2d1b0	Increase FARM to Version 0.6.2 (#755) * Increase farm version * Fix test	2021-01-21 10:15:41 +01:00
Branden Chan	725c03220f	Reduce memory consumption of fetch_archive_from_http (#737)	2021-01-21 09:57:55 +01:00
Timo Moeller	4803da009a	Using PreProcessor functions on eval data (#751 ) * Add eval data splitting * Adjust for split by passage, add test and test data, adjust docstrings, add max_docs to highler level fct	2021-01-20 14:40:10 +01:00
Tanay Soni	aa8a3666c3	Support filters for DensePassageRetriever + InMemoryDocumentStore (#754)	2021-01-20 12:52:52 +01:00
Rob192	35dcf23a4b	Use Path class in add_eval_data of haystack.document_store.base.py (#745 ) * use Path class in method add_eval_data of haystack.document_store.base.py * change type of jsonl_filename as squad_json_to_jsonl and add_eval_data are expecting string type	2021-01-19 12:08:49 +01:00
Andrey A	7a0b65a079	Add links to slack, twitter etc (#746) * Update README.md	2021-01-19 11:30:26 +01:00
Branden Chan	8d47a71b00	Fix Tutorial 9 (#734) * Add package download * Change dev to train file	2021-01-14 10:56:58 +01:00
Julian Risch	3331608e03	Adding a guard that prevents the tutorial code from being executed in every subprocess when using multiprocessing on windows (#729)	2021-01-13 18:17:54 +01:00
Branden Chan	a3a12bc95b	Remove broken link	2021-01-13 17:32:10 +01:00
brandenchan	01fd9940d8	Fix tutorial link	2021-01-13 15:29:25 +01:00

2539 Commits