haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-09 14:23:43 +00:00

Author	SHA1	Message	Date
Zenahr Barzani	955e6f7b3a	Add explicit encoding mode to file_converter/txt.py (#478) * add explicit encoding mode * parameterize encoding Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-12 17:32:04 +02:00
Branden Chan	1cebcb7dda	Create time and performance benchmarks for all readers and retrievers (#339) * add time and perf benchmark for es * Add retriever benchmarking * Add Reader benchmarking * add nq to squad conversion * add conversion stats * clean benchmarks * Add link to dataset * Update imports * add first support for neg psgs * Refactor test * set max_seq_len * cleanup benchmark * begin retriever speed benchmarking * Add support for retriever query index benchmarking * improve reader eval, retriever speed benchmarking * improve retriever speed benchmarking * Add retriever accuracy benchmark * Add neg doc shuffling * Add top_n * 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging * Add models to sweep * add option for faiss index type * remove unneeded line * change faiss to faiss_flat * begin automatic benchmark script * remove existing postgres docker for benchmarking * Add data processing scripts * Remove shuffle in script bc data already shuffled * switch hnsw setup from 256 to 128 * change es similarity to dot product by default * Error includes stack trace * Change ES default timeout * remove delete_docs() from timing for indexing * Add support for website export * update website on push to benchmarks * add complete benchmarks results * new json format * removed NaN as is not a valid json token * fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches * update benchmarks for hnsw 128,20,80 * don't delete full index in delete_all_documents() * update texts for charts * update recall column for retriever * change scale and add units to desc * add units to legend * add axis titles. update desc * add html tags Co-authored-by: deepset <deepset@Crenolape.localdomain> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>	2020-10-12 13:34:42 +02:00
Malte Pietsch	8edeb844f7	Remove phi normalization from FAISS, support more index types, 3x speedup (#467) * remove phi normalization * add special case for hnsw * rename vector_size to vector_dim * fix loading. fix extra dim in tests * switch to new ES syntax for vector similarity * 3x sql speed up. cascade deletes. add train_index() * add docstrings. remove vector_dim from load() * delete docs from faiss and sql * fix delete of docs in test * relax type hint for faiss index * rename metric to metric_type Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>	2020-10-06 16:09:56 +02:00
Markus Paff	56852f820b	READ.me for Docstring Generation and remove not needed files (#468)	2020-10-06 15:16:56 +02:00
Markus Paff	25f34babce	Separate data and view for benchmarks (#451) * separate data and view for benchmarks * fixed typo	2020-10-06 10:30:19 +02:00
Lalit Pagaria	465ccbc12e	Allow multiple write calls to existing FAISS index. (#422) - Fixing issue when update_embeddings always create new FAISS index instead of clearing existing one. New index creation may not free existing used memory and cause memory leak. Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-05 12:01:20 +02:00
Futurne	072e32b38a	Fix filters in query_embedding for ElasticsearchDocumentStore (#464) Co-authored-by: Pierre Pereira <pierre.pereira@lexistems.com>	2020-10-05 11:25:07 +02:00
Tanay Soni	669c72d538	Enable bulk operations on vector IDs for FAISSDocumentStore (#460)	2020-10-02 14:43:25 +02:00
Malte Pietsch	029d1b75f2	Update docstring in DPR for embed_title (#459)	2020-10-02 13:41:33 +02:00
Lalit Pagaria	9b58374b7c	Skip file conversion if file type is not supported (#456) * Skip file converter if file type is not supported. Refer https://github.com/deepset-ai/haystack/issues/453 * Fixing issue reported by mypy * Addressing review comments	2020-10-01 14:47:45 +02:00
Malte Pietsch	a92ca04648	Update GPU docker & fix race condition with multiple workers (#436) * fix gpu CMD and set tag to latest * udpate dockerfiles. resolve race condition of index creation with multiple workers * update dockerfiles for preload. remove try catch for elastic index creation * add back try/catch. disable multiproc in default config to comply with --preload of gunicorn * change to pip3 for GPU dockerfile * remove --preload for gpu	2020-09-29 21:12:44 +02:00
Markus Paff	5d1e208186	Create deploy_website.yml (#450) Creates a dispatch event on push to master so that we can trigger a build in haystack-website. The website should always have the latest docs version	2020-09-29 19:49:04 +02:00
Tanay Soni	52000ff678	Add Docker setup for the annotation tool (#444)	2020-09-29 14:09:45 +02:00
Tanay Soni	93fd4aa72f	Update ONNX conversion for FARMReader (#438)	2020-09-28 16:10:32 +02:00
Malte Pietsch	bb4802ae6a	Make sentence-transformers usage more user-friendly (#439) Co-authored-by: guillim <guigloo@msn.com>	2020-09-28 15:34:23 +02:00
Malte Pietsch	cd19d65f1a	Update README.rst	2020-09-27 12:31:11 +02:00
Malte Pietsch	8a21843167	Update README.rst	2020-09-27 12:30:25 +02:00
Malte Pietsch	dfe244e287	Fix typos in roadmap (#434)	2020-09-25 11:28:46 +02:00
Malte Pietsch	0a123707e4	Fix typos in roadmap (#433)	2020-09-25 07:38:48 +02:00
Malte Pietsch	15c0064498	add roadmap section to docs (#432)	2020-09-24 23:43:40 +02:00
Markus Paff	6b35e38e12	Fixed tabs for haystack-website issue (#427)	2020-09-24 10:36:18 +02:00
Markus Paff	66a1893f79	Moved files to api directory (#418)	2020-09-22 11:48:26 +02:00
Guillim	29cbd1e4a1	Add embedding_field to existing index in ES (#415)	2020-09-22 10:25:58 +02:00
Guillim	fb5db59590	Remove useless line from Tutorial4_FAQ_style_QA (#416) * Update Tutorial4_FAQ_style_QA.py Used to be useful when `.apply()` was necessary, but not any longer * Update Tutorial4_FAQ_style_QA.ipynb	2020-09-22 09:01:04 +02:00
Markus Paff	8e044dc16f	Fix typo in documentation (#406) Co-authored-by: Antonio Lanza <antoniolanza1996@gmail.com>	2020-09-21 13:31:00 +02:00
Malte Pietsch	c5f1f9aa87	Update README.rst v0.4.0	2020-09-21 10:31:25 +02:00
Malte Pietsch	747e0c0046	Bump FARM to 0.4.9. Remove custom torch installation from colab tutorials (#404)	2020-09-21 10:26:12 +02:00
Malte Pietsch	271ff30262	fix type casting of embeddings for tutorial 4 (#402)	2020-09-18 18:10:50 +02:00
Malte Pietsch	0c5750fae0	Bump version to 0.4.0	2020-09-18 17:12:29 +02:00
Malte Pietsch	db6864d159	Fix type casting for vectors in FAISS (#399) * Fix type casting for vectors in FAISS Co-authored-by: philipp-bode <philipp.bode@student.hpi.de> * add type casts for elastic. refactor embedding retriever tests * fix case: empty embedding field * fix faiss tolerance * add assert in test_faiss_retrieving Co-authored-by: philipp-bode <philipp.bode@student.hpi.de>	2020-09-18 17:08:13 +02:00
Branden Chan	4ea4cfd282	Merge pull request #400 from deepset-ai/fix_imgs Fix images in readme	2020-09-18 15:01:20 +02:00
brandenchan	f4a1682570	Fix images	2020-09-18 14:58:03 +02:00
Malte Pietsch	d69133966d	Fix faiss test tolerance	2020-09-18 13:57:29 +02:00
Branden Chan	7fdb85d63a	Create documentation website (#272) * Skeleton of doc website * Flesh out documentation pages * Split concepts into their own rst files * add tutorial rsts * Consistent level 1 markdown headers in tutorials * Change theme to readthedocs * Turn bullet points into prose * Populate sections * Add more text * Add more sphinx files * Add more retriever documentation * combined all documenations in one structure * rename of src to _src as it was ignored by git * Incorporate MP2's changes * add benchmark bar charts * Adapt docstrings in Readers * Improvements to intro, creation of glossary * Adapt docstrings in Retrievers * Adapt docstrings in Finder * Adapt Docstrings of Finder * Updates to text * Edit text * update doc strings * proof read tutorials * Edit text * Edit text * Add stacked chart * populate graph with data * Switch Documentation to markdown (#386) * add way to generate markdown files to sphinx * changed from rst to markdown and extended sphinx for it * fix spelling * Clean titles * delete file * change spelling * add sections to document store usage * add basic rest api docs * fix readme in setup.py * Update Tutorials * Change section names * add windows note to pip install * update intro * new renderer for markdown files * Fix typos * delete dpr_utils.py * fix windows note in get started * Fix docstrings * deleted rest api docs in api * fixed typo * Fix docstring * revert readme to rst * Fix readme * Update setup.py Co-authored-by: deepset <deepset@Crenolape.localdomain> Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com> Co-authored-by: Bogdan Kostić <bogdankostic@web.de> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-09-18 12:57:32 +02:00
Malte Pietsch	4c503158a7	Fix duplicate vector ids in FAISS (#395) * fix duplicate vector ids in faiss * Add test Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com> * revert score change * switch to faiss_index.ntotal for ids. add tests Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>	2020-09-18 12:52:22 +02:00
Tanay Soni	0859da8f74	Fix document filtering in SQLDocumentStore (#396)	2020-09-18 12:22:52 +02:00
Tanay Soni	3399fc784d	Refactor file converter interface (#393)	2020-09-18 10:42:13 +02:00
Malte Pietsch	4e46d9d176	remove dpr_utils.py	2020-09-17 17:17:19 +02:00
Tanay Soni	06243dbda4	Move retriever probability calculations to document_store (#389)	2020-09-17 16:25:46 +02:00
Tanay Soni	03fa4a8740	Exclude embedding fields from the REST API (#390)	2020-09-17 14:37:01 +02:00
Malte Pietsch	3782646948	Add logo to readme (#384) * add logo image * add logo to readme * change img path to master * Update README.rst	2020-09-16 18:36:22 +02:00
Malte Pietsch	9727829cc6	Rename and restructure modules (database, indexing, schemas) (#379) * rename database to documentstore * move document, label, multilabel to haystack/schema.py * rename documentstore -> document_store * split indexing modules -> file_converter + preprocessor * fix order of imports * Update tutorial notebooks * fix torch version in tutorial 4	2020-09-16 18:33:23 +02:00
Malte Pietsch	bde33ddaaa	Bump FARM version to 0.4.8 and PyTorch >=1.5.1, <= 1.6.0 (#376) * bump farm version to 0.4.8 * move back to original transformers pipeline * remove dpr_utils and use transformers implementation * update tutorial notebooks	2020-09-16 17:24:40 +02:00
Lalit P	de5ad42e46	Adjust tests for MacOS (#374)	2020-09-15 15:04:46 +02:00
Tanay Soni	c0c2865e58	Add FAISS query scores (#368)	2020-09-11 13:59:38 +02:00
Tanay Soni	9d93ffbe54	Add Gunicorn timeout (#364)	2020-09-10 09:20:39 +02:00
maxupp	06e8be30ea	Add index arg to Finder.get_answers() and _via_similar_questions() (#362) Co-authored-by: Max Uppenkamp <max.uppenkamp@inform-software.com>	2020-09-09 12:39:13 +02:00
Malte Pietsch	b1cdc68d6c	Update README.rst	2020-09-09 11:47:17 +02:00
Malte Pietsch	d821e8d260	Bump FARM version to 0.4.7 (#340)	2020-09-04 17:29:14 +02:00
Tanay Soni	26e4e7ad7a	Use port 8000 in docs (#357)	2020-09-04 09:54:24 +02:00

3803 Commits