haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-13 20:10:45 +00:00

Author	SHA1	Message	Date
Malte Pietsch	df13a6830d	Update annotation docs for website (#505 ) * update annotation docs for website * add md file for docs * add user manual	2020-11-03 11:24:06 +01:00
Guillim	7a43d1a72d	Update readme path in Dockerfile (#537 ) * Update Dockerfile forgot to change the extension i believe * Update Dockerfile * Update Dockerfile-GPU	2020-11-03 10:19:18 +01:00
Malte Pietsch	f0969d8310	Update setup.py	2020-11-02 20:15:10 +01:00
Malte Pietsch	c363fefc6e	New readme (#534 ) * WIP readme to md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * delete rst readme	2020-11-02 20:03:22 +01:00
Malte Pietsch	50709a3f9d	Fix retriever mAP benchmarks	2020-11-02 19:55:58 +01:00
Lalit Pagaria	5d45992c84	Removing (deprecation) warnings (#530 ) 1. Few warnings need fix in FARM 2. Can't remove warning from docx library.	2020-11-02 15:18:43 +01:00
Yaser Martinez Palenzuela	f5419163e7	Add annotation tool manual to readme (#523 ) * Update README.md * Update README.md Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-11-02 10:51:50 +01:00
Branden Chan	eb9e9ceca2	Fix FARMReader.eval( ) handling of no_answers (#531 ) * Fix handling of no_answers * Remove commented out code * Remove extra spaces	2020-10-30 19:22:55 +01:00
kolk	72b637ae6d	DensePassageRetriever: Add Training, Refactor Inference to FARM modules (#527 ) * dpr training and inference code refactored with FARM modules * dpr test cases modified * docstring and default arguments updated * dpr training docstring updated * bugfix in dense retriever inference, DPR tutorials modified * Bump FARM to 0.5.0 * update README for DPR * dpr training and inference code refactored with FARM modules * dpr test cases modified * docstring and default arguments updated * dpr training docstring updated * bugfix in dense retriever inference, DPR tutorials modified * Bump FARM to 0.5.0 * update README for DPR * mypy errors fix * DPR instantiation bugfix * Fix DPR init in RAG Tutorial Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-30 19:22:06 +01:00
Lalit Pagaria	f13443054a	[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484 ) * Adding dummy generator implementation * Adding tutorial to try the model * Committing current non working code * Committing current update where we need to call generate function directly and need to convert embedding to tensor way * Addressing review comments. * Refactoring finder, and implementing rag_generator class. * Refined the implementation of RAGGenerator and now it is in clean shape * Renaming RAGGenerator to RAGenerator * Reverting change from finder.py and addressing review comments * Remove support for RagSequenceForGeneration * Utilizing embed_passage function from DensePassageRetriever * Adding sample test data to verify generator output * Updating testing script * Updating testing script * Fixing bug related to top_k * Updating latest farm dependency * Comment out farm dependency * Reverting changes from TransformersReader * Adding transformers dataset to compare transformers and haystack generator implementation * Using generator_encoder instead of question_encoder to generate context_input_ids * Adding workaround to install FARM dependency from master branch * Removing unnecessary changes * Fixing generator test * Removing transformers datasets * Fixing generator test * Some cleanup and updating TODO comments * Adding tutorial notebook * Updating tutorials with comments * Explicitly passing token model in RAG test * Addressing review comments * Fixing notebook * Refactoring tests to reduce memory footprint * Split generator tests in separate ci step and before running it reclaim memory by terminating containers * Moving tika dependent test to separate dir * Remove unwanted code * Brining reader under session scope * Farm is now session object hence restoring changes from default value * Updating assert for pdf converter * Dummy commit to trigger CI flow * REducing memory footprint required for generator tests * Fixing mypy issues * Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits * reducing changes * Fixing CI * changing elastic search ci * Fixing test error * Disabling return of embedding * Marking generator test as well * Refactoring tutorials * Increasing ES memory to 750M * Trying another fix for ES CI * Reverting CI changes * Splitting tests in CI * Generator and non-generator markers split * Adding pytest.ini to add markers and enable strict-markers option * Reducing elastic search container memory * Simplifying generator test by using documents with embedding directly * Bump up farm to 0.5.0	2020-10-30 18:06:02 +01:00
Branden Chan	fbf41e53ff	Merge pull request #529 from deepset-ai/fix_website Change metric to queries per second on benchmarks webpage	2020-10-29 10:40:04 +01:00
Branden Chan	7a9f32f264	Fix template	2020-10-29 10:30:03 +01:00
Branden Chan	3793205aa3	Merge branch 'master' into fix_website	2020-10-29 10:29:25 +01:00
Branden Chan	2ba5417f8e	Fix metric for benchmarks website page	2020-10-29 10:26:48 +01:00
bogdankostic	18d315d61a	Make returning predictions in evaluation possible (#524 ) * Make returning preds in evaluation possible * Make returning preds in evaluation possible * Add automated check if eval dict contains predictions	2020-10-28 09:55:31 +01:00
Branden Chan	4fa5d9c3eb	Merge pull request #522 from deepset-ai/automate_benchmarks Add --ci and --update-json to CLI for benchmarks	2020-10-27 12:56:47 +01:00
Branden Chan	8c4865ee5f	Rename n_docs variable to max_docs	2020-10-27 12:45:15 +01:00
Branden Chan	7c81dfdc3a	Address reviewer comments	2020-10-27 12:41:11 +01:00
Branden Chan	d5cb227909	Merge branch 'master' into automate_benchmarks	2020-10-27 11:50:49 +01:00
Lalit Pagaria	9521e180b3	Standardize behavior of DocumentStores to return embeddings (#514 ) * Adding support to return embedding along with other result via query_by_embedding function * Adding test case to check return embedding * By default for all tests but DPR tests: disable return_embedding flag * Reducing None test case and fixing query_by_embedding of ElasticsearchDocumentStore when it updating self.excluded_meta_data directly * Fixing mypy reported issue	2020-10-27 08:33:39 +01:00
Lalit Pagaria	abda994116	Pytest fix memory leak and put pytest marker on slow tests (#520 ) * Clear faiss_index during teardown * Marking slow test with pytest markers. So In future these test can be optimized. Also command line option can be added to skip them refer https://pytest.org/en/stable/example/simple.html#control-skipping-of-tests-according-to-command-line-option * Fixing test	2020-10-26 19:19:10 +01:00
Tanay Soni	db4151bbc0	Fix scoring in Elasticsearch for dot product (#517 )	2020-10-23 17:50:49 +02:00
Timo Moeller	def8fd617a	Make title info optional when evaluating on QA data (#494 ) * Add check for title present in QA file and make title extraction optional * Make missing title None	2020-10-23 11:06:56 +02:00
bogdankostic	f62117c232	Add urllib version requirement to colab notebooks (#509 )	2020-10-23 10:43:58 +02:00
Branden Chan	fbacdfd263	Add logging of error, add n_docs assert	2020-10-22 15:45:46 +02:00
Branden Chan	b0483cfd99	add readme	2020-10-22 15:32:56 +02:00
Tanay Soni	3bec264d76	Add filters for document count (#512 )	2020-10-22 12:42:13 +02:00
brandenchan	87e5f06fa8	add automatic json update	2020-10-21 17:59:44 +02:00
brandenchan	d3743d00e9	Merge branch 'master' into automate_benchmarks	2020-10-21 17:48:10 +02:00
Lalit Pagaria	63c12371b9	Change arg "model" to "model_name_or_path" in TransformersReader (#510 ) * Consistent parameter naming for TransformersReader along with removing unused imports as well. * Addressing review comments	2020-10-21 17:15:35 +02:00
Malte Pietsch	956543e239	Restructure checks in PreProcessor (#504 ) * restructure checks * fix variable name * Fix test	2020-10-20 06:43:59 +02:00
Malte Pietsch	c13abba6d6	Better defaults for PreProcessor & update docstring	2020-10-19 17:37:58 +02:00
Sanjay Kamath	dc16258dab	Updated the example code in readme for Indexing PDF / Docx files (#502 ) * Updated the example code to Indexing PDF / Docx files The example code was referencing a structure haystack.indexing which does not exist anymore. Modified this and the function "extract_pages" with "convert" * Update converter example in readme Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-19 15:04:33 +02:00
Malte Pietsch	11a3976945	update deletes. fix arg in run.py	2020-10-19 14:40:26 +02:00
Malte Pietsch	3434d5205d	Update doc string for ElasticsearchDocumentStore.write_documents() & sync markdown files (#501 ) * update doc string for ElasticsearchDocumentStore.write_documents() * update all markdowns with latest docstrings	2020-10-19 13:56:38 +02:00
Markus Paff	2531c8e061	Add versioning docs (#495 ) * add time and perf benchmark for es * Add retriever benchmarking * Add Reader benchmarking * add nq to squad conversion * add conversion stats * clean benchmarks * Add link to dataset * Update imports * add first support for neg psgs * Refactor test * set max_seq_len * cleanup benchmark * begin retriever speed benchmarking * Add support for retriever query index benchmarking * improve reader eval, retriever speed benchmarking * improve retriever speed benchmarking * Add retriever accuracy benchmark * Add neg doc shuffling * Add top_n * 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging * Add models to sweep * add option for faiss index type * remove unneeded line * change faiss to faiss_flat * begin automatic benchmark script * remove existing postgres docker for benchmarking * Add data processing scripts * Remove shuffle in script bc data already shuffled * switch hnsw setup from 256 to 128 * change es similarity to dot product by default * Error includes stack trace * Change ES default timeout * remove delete_docs() from timing for indexing * Add support for website export * update website on push to benchmarks * add complete benchmarks results * new json format * removed NaN as is not a valid json token * versioning for docs * unsaved changes * cleaning * cleaning * Edit format of benchmarks data * update also jsons in v0.4.0 Co-authored-by: brandenchan <brandenchan@icloud.com> Co-authored-by: deepset <deepset@Crenolape.localdomain> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-19 11:46:51 +02:00
Malte Pietsch	4a77dc7a02	Allow null filter value in api (#497 )	2020-10-16 18:44:15 +02:00
Malte Pietsch	5a885fc2d1	Fix meta data = None in PreProcessor (#496 )	2020-10-16 17:17:26 +02:00
Lalit Pagaria	b9da789475	Add Elasticsearch Query DSL compliant Query API (#471 )	2020-10-16 13:25:31 +02:00
brandenchan	b9bb8d6cc1	Fix try except	2020-10-16 12:16:32 +02:00
Malte Pietsch	5555274170	Make creation of label index optional in feedback and file_upload api	2020-10-15 19:03:58 +02:00
Malte Pietsch	bdbd1b323b	Add create_index and similarity metric to api config (#493 ) * make creation of label index optional * add params for rest api * reset tutorial flag	2020-10-15 18:41:36 +02:00
brandenchan	6d60cc9451	add automation pipeline	2020-10-15 18:12:17 +02:00
Malte Pietsch	ceb5c87da0	Make creation of label index optional (#490 )	2020-10-15 14:40:59 +02:00
Tanay Soni	974b37eded	Add PreProcessor to simplify splitting and cleaning of docs (#473 ) * Add PreProcessing * Adjust PDF conversion tests * Add tests for Preprocessing * Add requirement * Fix tests * Ignore decoding errors for TextConverter * Rename split_size to split_length * Adjust tests Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-15 10:42:08 +02:00
Lalit Pagaria	2e9f3c1512	Fix update_embeddings function in FAISSDocumentStore and add retriever fixture in tests (#481 ) * 1. Prevent update_embeddings function in FAISSDocumentStore to set faiss_index as None when document store does not have any docs. 2. cleaning up tests by adding fixture for retriever. * TfidfRetriever need document store with documents during initialization as it call fit() function in constructor so fixing it by checking self.paragraphs of None * Fix naming of retriever's fixture (embedded to embedding and tfid to tfidf)	2020-10-14 16:15:04 +02:00
Tanay Soni	ecaf7b8f0b	Add psycopg2 requirement	2020-10-14 12:28:33 +02:00
Tanay Soni	3c6a125380	Add deepcopy for meta dicts in answers (#485 )	2020-10-14 12:15:18 +02:00
Lalit Pagaria	12c4dd7b4b	Adjust requirements for Windows (#480 )	2020-10-13 17:12:24 +02:00
Antonio Lanza	3caaf99dcb	Add automatic mixed precision (AMP) support for reader training (#463 ) * Added automatic mixed precision (AMP) support for reader training * Added clearer comments on docstring	2020-10-12 21:53:05 +02:00

3803 Commits