haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-09 22:33:47 +00:00

Author	SHA1	Message	Date
Malte Pietsch	50815421b0	bump haystack version v0.7.0	2021-01-21 16:02:33 +01:00
Tanay Soni	337376c81d	Add `batch_size` and generators to document stores. (#733 ) * Add batch update of embeddings in document stores * Resolve merge conflict * Remove document ordering dependency in tests * Adjust index buffer size for tests * Adjust ES Scroll Slice * Use generator for document store pagination * Add pagination for InMemoryDocumentStore * Fix missing index parameter in FAISS update_embeddings() * Fix FAISS update_embeddings() * Update FAISS tests * Update eval tests * Revert code formatting change * Fix document count in FAISS update embeddings * Fix vector_ids reset in SQLDocumentStore * Update doctrings * Update docstring	2021-01-21 16:00:08 +01:00
Markus Paff	0b583b8972	Generate docstrings and deploy to branches to Staging (Website) (#731 ) * test pre commit hook * test status * test on this branch * push generated docstrings and tutorials to branch * fixed syntax error * Add latest docstring and tutorial changes * add files before commit * catch commit error * separate generation from deployment * add deployment process for staging * add current branch to payload Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-01-21 11:01:09 +01:00
Markus Paff	0f62e0b2ee	Script for releasing docs (#736 ) * script for releasing docs * fix formatting	2021-01-21 10:58:54 +01:00
Timo Moeller	7522d2d1b0	Increase FARM to Version 0.6.2 (#755 ) * Increase farm version * Fix test	2021-01-21 10:15:41 +01:00
Branden Chan	725c03220f	Reduce memory consumption of fetch_archive_from_http (#737 )	2021-01-21 09:57:55 +01:00
Timo Moeller	4803da009a	Using PreProcessor functions on eval data (#751 ) * Add eval data splitting * Adjust for split by passage, add test and test data, adjust docstrings, add max_docs to highler level fct	2021-01-20 14:40:10 +01:00
Tanay Soni	aa8a3666c3	Support filters for DensePassageRetriever + InMemoryDocumentStore (#754 )	2021-01-20 12:52:52 +01:00
Rob192	35dcf23a4b	Use Path class in add_eval_data of haystack.document_store.base.py (#745 ) * use Path class in method add_eval_data of haystack.document_store.base.py * change type of jsonl_filename as squad_json_to_jsonl and add_eval_data are expecting string type	2021-01-19 12:08:49 +01:00
Andrey A	7a0b65a079	Add links to slack, twitter etc (#746 ) * Update README.md	2021-01-19 11:30:26 +01:00
Branden Chan	8d47a71b00	Fix Tutorial 9 (#734 ) * Add package download * Change dev to train file	2021-01-14 10:56:58 +01:00
Julian Risch	3331608e03	Adding a guard that prevents the tutorial code from being executed in every subprocess when using multiprocessing on windows (#729 )	2021-01-13 18:17:54 +01:00
Branden Chan	a3a12bc95b	Remove broken link	2021-01-13 17:32:10 +01:00
brandenchan	01fd9940d8	Fix tutorial link	2021-01-13 15:29:25 +01:00
Branden Chan	7376185b65	Create DPR training tutorial (#708 ) * WIP: Start DPR training tutorial * Create basics of DPR Train tutorial * Update documentation * Allow DPR to be initialized without document store * WIP: Add param descriptions to DPR notebook * Clean tutorial * Improve loading * Make doc store optional when loading DPR * Satisfy mypy type check * Add links * Add tutorial header * Add colab badge * Clear outputs * Incorporate reviewer feedback * WIP: Start DPR training tutorial * Create basics of DPR Train tutorial * Update documentation * Allow DPR to be initialized without document store * WIP: Add param descriptions to DPR notebook * Clean tutorial * Improve loading * Make doc store optional when loading DPR * Satisfy mypy type check * Add links * Add tutorial header * Add colab badge * Clear outputs * Incorporate reviewer feedback * Add readme links * Regenerate tutorials * Add excitement * Fix typo * Fix hard negatives comment * Wrap tutorial for windows users * Fix mypy issue	2021-01-13 10:33:55 +01:00
bogdankostic	7709b6cee0	Make batchwise adding of evaluation data possible (#717 ) * Make batchwise adding of evaluation data possible * Fix typos in docstrings * Merge add_eval_data and add_eval_data_batchwise * Improve import statements * Move add_eval_data to BaseDocumentStore * Add batch_size param to write_documents and write_labels in EsDocStore * Adjust docstring Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-12 17:54:43 +01:00
Antonio Lanza	1f00599d2e	Change signature and docstring for ca_certs parameter for SSL connection (#730 )	2021-01-12 17:30:09 +01:00
Malte Pietsch	e9b5439b00	Rename label id field for elastic & add UPDATE_EXISTING_DOCUMENTS to API config (#728 ) * rename label id field for elastic * add UPDATE_EXISTING_DOCUMENTS param to API config	2021-01-12 13:00:56 +01:00
Malte Pietsch	b6e64ca42d	Add ID to label schema (#727 )	2021-01-12 10:02:40 +01:00
Markus Paff	3af3ee1a12	Automate docstring and tutorial generation with every push to master (#718 ) * automate docstring and tutorial generation with every push to master * test CI for current branch * fixed yaml syntax * add setupttools to install process * checkout repo * fixed command for shell script * install wheel as it is needed for CI * install mkdocs * test without shell script * use package from github actions * test other configuration * back to right config * cleaning script	2021-01-11 16:25:43 +01:00
Tanay Soni	281f9ff970	Fix SQLite errors in tests (#723 )	2021-01-11 13:24:38 +01:00
Malte Pietsch	fcc052b554	Pass custom label index name in api config (#724 )	2021-01-11 12:24:09 +01:00
Lalit Pagaria	88b5cbe736	Correcting pypi download badge (#722 )	2021-01-10 06:26:17 +01:00
Lalit Pagaria	75d0ebd076	Add Summarizer (standalone + node in custom pipelines + SearchSummarizationPipeline) (#698 ) * Integration of SummarizationQAPipeline with Haystack. * Moving summarizer tests because of OOM issue * Fixing typo * Splitting summarizer test in separate ci step * Removing sysctl configuration as we already running elastic search in docker container * fixing mypy issue * update parameter names and docstrings * update parameter names in BaseSummarizer * rename pipeline * change return type of summarizer from answer to document * change scope of doc store fixture * revert scope * temp. disable test_faiss_index_save_and_load() * fix mypy. change order for mypy in CI Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-08 14:29:46 +01:00
Lalit Pagaria	3a9a756810	Using Columns names instead of ORM to get all documents (#620 ) * Using Columns name instead of ORM object for get all documents call * Separating meta search from documents. This way it will optimize the memory not duplicating document.text * Fixing mypy issue * SQLite have limit on number of host variable hence using batching to fetch meta information * Query meta only if meta field is not Null in DocOrm * Add batch_size to other functions except label * meta can be none so fix that issue * Dummy commit to trigger CI * Using chunked dictionary * Upgrading faiss * reverting change related to faiss upgrade * Changing DB name in test_faiss_retrieving test as it might interfere with exiting files by corrupting DB file * Updating doc string related to batch_size * Update docstring for batch_size Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-06 15:56:19 +01:00
Branden Chan	bb8aba18e0	Create Preprocessing Tutorial (#706 ) * WIP: First version of preprocessing tutorial * stride renamed overlap, ipynb and py files created * rename split_stride in test * Update preprocessor api documentation * define order for markdown files * define order of modules in api docs * Add colab links * Incorporate review feedback Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>	2021-01-06 15:54:05 +01:00
Malte Pietsch	5db73d4107	Update stale bot	2021-01-05 08:29:24 +01:00
Malte Pietsch	74b0868d28	Fix GPU docker build (#703 )	2020-12-31 15:04:13 +01:00
Malte Pietsch	a284af3ae5	Remove sourcerer.io widget (#702 ) Fix #699	2020-12-30 09:57:02 +01:00
Tanmay Laud	7cd9e09491	Add basic demo UI via streamlit (#671 ) * Added starter code for frontend demo * worked on comments * Added Docker config for frontend * update docker file. restructure folder structure. minimal renamings and defaults * add screenshot to readme Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-12-27 13:36:09 +01:00
Lalit Pagaria	fc521fe293	Haystack logo is not visible on github mobile. Adding download badge. (#697 )	2020-12-27 12:58:48 +01:00
Malte Pietsch	737a47e2fe	Update readme	2020-12-26 12:14:52 +01:00
Malte Pietsch	a2e5e6b09e	Update pipeline documentation and readme (#693 ) * Update README.md * Update pipelines.md * Update pipelines.md * Update README.md	2020-12-22 13:34:28 +01:00
Malte Pietsch	94b7345505	Make use_gpu=True the default in tutorials (#692 ) * enable gpu args in tutorials * add info box for gpu runtime on colab	2020-12-22 07:58:12 +01:00
Markus Paff	b752da1cd5	Add docs v0.6.0 (#689 ) * new docs version * updated directory structure * Add pipelines page * Add Finder deprecation suggestion * header for pipelines file * Document MySQL support * Mention DPR train tutorial coming soon * Mention open distro ES * Update doc strings regarding similarity fn * Add link to API docs * Wrap pipelines docs in box * add api reference for pipelines * copied latest version to v0.6.0 * Remove space * Remove space * Copy to v0.6.0 Co-authored-by: brandenchan <brandenchan@icloud.com>	2020-12-18 12:47:27 +01:00
Tanay Soni	0e4eec9499	Add tests for custom embedding field (#640 )	2020-12-17 09:18:57 +01:00
Malte Pietsch	5b817387c2	Bump version to 0.6.0 v0.6.0	2020-12-17 06:31:22 +01:00
bogdankostic	a9bcabc42d	Fix saving tokenizers in DPR training + unify save and load dirs (#682 )	2020-12-16 17:09:47 +01:00
Tanay Soni	4c2804e38e	Add support for aggregating scores in JoinDocuments node (#683 )	2020-12-16 15:54:58 +01:00
demSd	143da4cb3f	Fix a typo in DPR args, num_negatives -> num_positives (#681 ) * fix a typo, num_negatives -> num_positives * default value for num_positives * Update dense.py	2020-12-15 10:10:41 +01:00
Tanay Soni	369e237fd4	Add DocumentStore for Open Distro Elasticsearch (#676 )	2020-12-15 09:28:40 +01:00
Tanay Soni	33fe597949	Cleanup Pytest Fixtures (#639 )	2020-12-14 18:15:44 +01:00
Branden Chan	d8154939fc	Scale dot product into probabilities (#667 ) * scale dot product * Add tip in documentation * Add recommendation boxes * WIP: Use similarity attribute in all doc stores * Implement similarity for InMemoryDS * Add FAISS support * Clean printout * Update documentation * Implement document field map	2020-12-11 12:10:24 +01:00
demSd	a0e146dde6	add gpu support for rag (#669 ) * add gpu support for rag * Update transformers.py	2020-12-11 12:08:01 +01:00
Malte Pietsch	149d98a0fd	Add latest benchmark run (#652 ) * add latest benchmark run * update templates and fix small json errors * Change scale Co-authored-by: brandenchan <brandenchan@icloud.com>	2020-12-10 16:25:51 +01:00
Timo Moeller	efc754b166	Redone: Fix concatenation of sentences in PreProcessor. Add stride for word-based splits with sentence boundaries (#641 ) * Update preprocessor.py Concatenation of sentences done correctly. Stride functionality enabled for splitting by words while respecting sentence boundaries. * Simplify code, add test Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>	2020-12-09 16:12:36 +01:00
Branden Chan	8c904d79d6	Fix links (#663 )	2020-12-08 10:28:31 +01:00
Tanay Soni	c4a5de59aa	Add set_node() for Pipeline (#659 )	2020-12-07 19:16:35 +01:00
Tanay Soni	4152ad8426	Enable dynamic parameter updates for the FARMReader (#650 )	2020-12-07 14:07:20 +01:00
Malte Pietsch	e6ada08d0e	Update query arg in Tutorial 7 (#656 )	2020-12-04 08:42:09 +01:00

3803 Commits