haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-02-07 07:22:03 +00:00

Author	SHA1	Message	Date
Branden Chan	da97d81305	Change variable names (#1286 )	2021-07-14 14:03:34 +02:00
Julian Risch	2a90471c73	Encapsulate tutorial code in method (#1266 )	2021-07-09 17:08:19 +02:00
Branden Chan	efc03f72db	Make PreProcessor.process() work on lists of documents (#1163 ) * Add process_batch method * Rename methods * Fix doc string, satisfy mypy * Fix mypy CI * Fix typp * Update tutorial * Fix argument name * Change arg name * Incorporate reviewer feedback	2021-06-23 18:13:51 +02:00
Branden Chan	7dbd58f6be	Add about sections (#1195 )	2021-06-14 18:37:00 +02:00
vblagoje	2a5882578a	Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086 ) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com>	2021-06-14 17:53:43 +02:00
Branden Chan	783893c3d2	Tutorial update (#1166 ) * Add header / footer * Add Milvus example * Generate md files * Fix mypy CI	2021-06-11 11:09:15 +02:00
Branden Chan	aa6f768efa	Prevent merge of same questions on different documents during evaluation (#1119 ) * Fix duplicate question in Reader.eval() * Add duplicate question support in document store * Support duplicate questions in retriever eval * Update tutorial * Rename key_tuple * Change error message * Add warning when more than 6 labels * Allow for label grouping options * Add support for aggregating by label meta * Satisfy mypy * Fix duplicate question in Reader.eval() * Add duplicate question support in document store * Support duplicate questions in retriever eval * Update tutorial * Rename key_tuple * Change error message * Add warning when more than 6 labels * Allow for label grouping options * Add support for aggregating by label meta * Satisfy mypy * Make label field flexible, add docstrings * Satisfy mypy * Fix failing tests * Adjust docstring * Fix tutorial Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-06-02 12:09:03 +02:00
Julian Risch	a7ba146246	Removed comma from last item in json list (#1114 )	2021-06-01 12:32:21 +02:00
Julian Risch	40ceaf418a	Fixing grpcio-tools to version of colab's pre-installed grpcio (#1113 )	2021-05-31 19:09:10 +02:00
Julian Risch	84c34295a1	Re-ranking component for document search without QA (#1025 ) * Adding ranker similar to retriever and reader * Sort documents according to query-document similarity scores * Reranking and model training runs for small example * Added EvalRanker node * Calculate recall@k in EvalRetriever and EvalRanker nodes * Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers * Added mean reciprocal rank as metric for EvalDocuments * Fix bug that appeared when ranking documents with same score * Remove commented code for unimplmented eval() of Ranker node * Add documentation of k parameter in EvalDocuments * Add Ranker docu and renaming top_k param	2021-05-31 15:31:36 +02:00
Branden Chan	9827b3652e	Pipelines tutorial (#991 ) * Start Pipelines tutorial * Make Tutorial 11 run locally * Add colab compatibility * Fix pip install * Add ES install from source * Add ES install from source * Add pygraphviz installation * Incorporate reviewer feedback * Ensure print_answers() works for Generator output * Fix typo	2021-04-29 17:31:28 +02:00
Branden Chan	9626c0d65e	Update Documentation (#976 ) * Add api pages * Add latest docstring and tutorial changes * First sweep of usage docs * Add link to conversion script * Add import statements * Add summarization page * Add web crawler documentation * Add confidence scores usage * Add crawler api docs * Regenerate api docs * Update summarizer and translator api * Add api pages * Add latest docstring and tutorial changes * First sweep of usage docs * Add link to conversion script * Add import statements * Add summarization page * Add web crawler documentation * Add confidence scores usage * Add crawler api docs * Regenerate api docs * Update summarizer and translator api * Add indentation (pydoc-markdown 3.10.1) * Comment out metadata * Remove Finder deprecation message * Remove Finder in FAQ * Update tutorial link * Incorporate reviewer feedback * Regen api docs * Add type annotations Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-22 16:45:29 +02:00
Julian Risch	d38c07e0ee	knowledge graph example (#934 ) * Add knowledge graph module * Fix type hint * Add graph retriver module * Change type annotations, change return format * Add graph retriever that executes questions as sparql queries * Linking only those entities that are in the knowledge graph * Added logging and using relations extracted from Knowledge graph for linking * Preventing entity linking from linking the same token to multiple entities * Pruning triples that have no variables for select and count queries * Support knowledge graphs with Pipelines * Add text2sparql * Entity linking and relation linking consider more special cases now based on evaluation on labelled data * Separating example code from KGQA implementation * Add eval on combined extarctive and kg questions * Remove references to hp-test * Add fields sparql_query and long_answer_list to metadata * Removing modular Question2SPARQL approach * Removing additional classes used for modular kgqa approach * preparing lcquad data * change graph db * Translating namespaces in knowledge graph queries * Creating graphdb index and loading triples from .ttl file * Fetching graph config files, triples and model from S3 * Fix incompatibility issues with BaseGraphRetriever and BaseComponent * Removing unused utility functions * Adding doc strings and tutorial header * Adding sparqlwrapper dependency * Moving tutorial header * Sorting tutorials by number within name of notebook * Add latest docstring and tutorial changes * Creating test cases for knowledge graph * Changing knowledge graph example to harry potter * Add latest docstring and tutorial changes * Adapting the tutorial notebook to harry potter example * Add GraphDB fixture for tests * Add latest docstring and tutorial changes * Added GraphDB docker launch to CI * Use correct GraphDB fixture * Check if GraphDB instance is already running * Renaming question/query and incorporating other feedback from Timo and Tanay * Removed type annotation * Add latest docstring and tutorial changes Co-authored-by: oryx1729 <oryx1729@protonmail.com> Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-08 14:05:33 +02:00
Branden Chan	d77152c469	WIP: Add evaluation nodes for Pipelines (#904 ) * Add main eval fns * WIP: make pipeline_eval.py run * Fix typo * Add support for no_answers * Add latest docstring and tutorial changes * Working pipeline eval * Add timing of nodes * Add latest docstring and tutorial changes * Refactor and clean * Update tutorial script * Set default params * Update tutorials * Fix indent * Add latest docstring and tutorial changes * Address mypy issues * Add test * Fix mypy error * Clear outputs * Add doc strings * Incorporate reviewer feedback * Add latest docstring and tutorial changes * Revert query counting * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-01 17:35:18 +02:00
Timo Moeller	f954f0db38	Fix top_k param in RAG tutorials (#906 ) * Fix top_k param * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-03-18 18:00:21 +01:00
Branden Chan	24d0c4d42d	Fix DPR training batch size (#898 ) * Adjust batch size * Add latest docstring and tutorial changes * Update training results * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-03-17 18:33:59 +01:00
brandenchan	03cda26d85	Fix link in Tutorial 8	2021-02-15 10:45:27 +01:00
Malte Pietsch	e91518ee00	Update tutorials (torch versions, ES version, replace Finder with Pipeline) (#814 ) * remove manual torch install on colab * update elasticsearch version everywhere to 7.9.2 * fix FAQPipeline * update tutorials with new pipelines * Add latest docstring and tutorial changes * revert faqpipeline change. fix field names in tutorial 4 * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-09 14:56:54 +01:00
Branden Chan	8d47a71b00	Fix Tutorial 9 (#734 ) * Add package download * Change dev to train file	2021-01-14 10:56:58 +01:00
Julian Risch	3331608e03	Adding a guard that prevents the tutorial code from being executed in every subprocess when using multiprocessing on windows (#729 )	2021-01-13 18:17:54 +01:00
Branden Chan	7376185b65	Create DPR training tutorial (#708 ) * WIP: Start DPR training tutorial * Create basics of DPR Train tutorial * Update documentation * Allow DPR to be initialized without document store * WIP: Add param descriptions to DPR notebook * Clean tutorial * Improve loading * Make doc store optional when loading DPR * Satisfy mypy type check * Add links * Add tutorial header * Add colab badge * Clear outputs * Incorporate reviewer feedback * WIP: Start DPR training tutorial * Create basics of DPR Train tutorial * Update documentation * Allow DPR to be initialized without document store * WIP: Add param descriptions to DPR notebook * Clean tutorial * Improve loading * Make doc store optional when loading DPR * Satisfy mypy type check * Add links * Add tutorial header * Add colab badge * Clear outputs * Incorporate reviewer feedback * Add readme links * Regenerate tutorials * Add excitement * Fix typo * Fix hard negatives comment * Wrap tutorial for windows users * Fix mypy issue	2021-01-13 10:33:55 +01:00
Branden Chan	bb8aba18e0	Create Preprocessing Tutorial (#706 ) * WIP: First version of preprocessing tutorial * stride renamed overlap, ipynb and py files created * rename split_stride in test * Update preprocessor api documentation * define order for markdown files * define order of modules in api docs * Add colab links * Incorporate review feedback Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>	2021-01-06 15:54:05 +01:00
Malte Pietsch	94b7345505	Make use_gpu=True the default in tutorials (#692 ) * enable gpu args in tutorials * add info box for gpu runtime on colab	2020-12-22 07:58:12 +01:00
Branden Chan	d8154939fc	Scale dot product into probabilities (#667 ) * scale dot product * Add tip in documentation * Add recommendation boxes * WIP: Use similarity attribute in all doc stores * Implement similarity for InMemoryDS * Add FAISS support * Clean printout * Update documentation * Implement document field map	2020-12-11 12:10:24 +01:00
Branden Chan	8c904d79d6	Fix links (#663 )	2020-12-08 10:28:31 +01:00
Tanay Soni	8e52b48e1d	Add pipelines for GenerativeQA & FAQs (#645 )	2020-12-03 10:27:06 +01:00
Branden Chan	79555148ac	Add link to FAISS Info in documentation (#643 ) * Add link to FAISS info * Clean link	2020-12-02 15:24:22 +01:00
Branden Chan	1e8af84ecc	Make more changes to documentation (#578 ) * First batch of changes * Add RAG tutorial links * Prettify RAG tutorial * draft of generator doc * Add text * Complete generator page * Create optimization section * Split intro * Fix formatting tutorial 7	2020-11-19 14:58:27 +01:00
Branden Chan	e72f4f4299	Update Colab Torch Version (#576 ) * Update torch version * Update torch version	2020-11-11 13:55:10 +01:00
kolk	72b637ae6d	DensePassageRetriever: Add Training, Refactor Inference to FARM modules (#527 ) * dpr training and inference code refactored with FARM modules * dpr test cases modified * docstring and default arguments updated * dpr training docstring updated * bugfix in dense retriever inference, DPR tutorials modified * Bump FARM to 0.5.0 * update README for DPR * dpr training and inference code refactored with FARM modules * dpr test cases modified * docstring and default arguments updated * dpr training docstring updated * bugfix in dense retriever inference, DPR tutorials modified * Bump FARM to 0.5.0 * update README for DPR * mypy errors fix * DPR instantiation bugfix * Fix DPR init in RAG Tutorial Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-30 19:22:06 +01:00
Lalit Pagaria	f13443054a	[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484 ) * Adding dummy generator implementation * Adding tutorial to try the model * Committing current non working code * Committing current update where we need to call generate function directly and need to convert embedding to tensor way * Addressing review comments. * Refactoring finder, and implementing rag_generator class. * Refined the implementation of RAGGenerator and now it is in clean shape * Renaming RAGGenerator to RAGenerator * Reverting change from finder.py and addressing review comments * Remove support for RagSequenceForGeneration * Utilizing embed_passage function from DensePassageRetriever * Adding sample test data to verify generator output * Updating testing script * Updating testing script * Fixing bug related to top_k * Updating latest farm dependency * Comment out farm dependency * Reverting changes from TransformersReader * Adding transformers dataset to compare transformers and haystack generator implementation * Using generator_encoder instead of question_encoder to generate context_input_ids * Adding workaround to install FARM dependency from master branch * Removing unnecessary changes * Fixing generator test * Removing transformers datasets * Fixing generator test * Some cleanup and updating TODO comments * Adding tutorial notebook * Updating tutorials with comments * Explicitly passing token model in RAG test * Addressing review comments * Fixing notebook * Refactoring tests to reduce memory footprint * Split generator tests in separate ci step and before running it reclaim memory by terminating containers * Moving tika dependent test to separate dir * Remove unwanted code * Brining reader under session scope * Farm is now session object hence restoring changes from default value * Updating assert for pdf converter * Dummy commit to trigger CI flow * REducing memory footprint required for generator tests * Fixing mypy issues * Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits * reducing changes * Fixing CI * changing elastic search ci * Fixing test error * Disabling return of embedding * Marking generator test as well * Refactoring tutorials * Increasing ES memory to 750M * Trying another fix for ES CI * Reverting CI changes * Splitting tests in CI * Generator and non-generator markers split * Adding pytest.ini to add markers and enable strict-markers option * Reducing elastic search container memory * Simplifying generator test by using documents with embedding directly * Bump up farm to 0.5.0	2020-10-30 18:06:02 +01:00
Tanay Soni	db4151bbc0	Fix scoring in Elasticsearch for dot product (#517 )	2020-10-23 17:50:49 +02:00
bogdankostic	f62117c232	Add urllib version requirement to colab notebooks (#509 )	2020-10-23 10:43:58 +02:00
Lalit Pagaria	63c12371b9	Change arg "model" to "model_name_or_path" in TransformersReader (#510 ) * Consistent parameter naming for TransformersReader along with removing unused imports as well. * Addressing review comments	2020-10-21 17:15:35 +02:00
Malte Pietsch	bdbd1b323b	Add create_index and similarity metric to api config (#493 ) * make creation of label index optional * add params for rest api * reset tutorial flag	2020-10-15 18:41:36 +02:00
Guillim	fb5db59590	Remove useless line from Tutorial4_FAQ_style_QA (#416 ) * Update Tutorial4_FAQ_style_QA.py Used to be useful when `.apply()` was necessary, but not any longer * Update Tutorial4_FAQ_style_QA.ipynb	2020-09-22 09:01:04 +02:00
Malte Pietsch	747e0c0046	Bump FARM to 0.4.9. Remove custom torch installation from colab tutorials (#404 )	2020-09-21 10:26:12 +02:00
Malte Pietsch	271ff30262	fix type casting of embeddings for tutorial 4 (#402 )	2020-09-18 18:10:50 +02:00
Branden Chan	7fdb85d63a	Create documentation website (#272 ) * Skeleton of doc website * Flesh out documentation pages * Split concepts into their own rst files * add tutorial rsts * Consistent level 1 markdown headers in tutorials * Change theme to readthedocs * Turn bullet points into prose * Populate sections * Add more text * Add more sphinx files * Add more retriever documentation * combined all documenations in one structure * rename of src to _src as it was ignored by git * Incorporate MP2's changes * add benchmark bar charts * Adapt docstrings in Readers * Improvements to intro, creation of glossary * Adapt docstrings in Retrievers * Adapt docstrings in Finder * Adapt Docstrings of Finder * Updates to text * Edit text * update doc strings * proof read tutorials * Edit text * Edit text * Add stacked chart * populate graph with data * Switch Documentation to markdown (#386) * add way to generate markdown files to sphinx * changed from rst to markdown and extended sphinx for it * fix spelling * Clean titles * delete file * change spelling * add sections to document store usage * add basic rest api docs * fix readme in setup.py * Update Tutorials * Change section names * add windows note to pip install * update intro * new renderer for markdown files * Fix typos * delete dpr_utils.py * fix windows note in get started * Fix docstrings * deleted rest api docs in api * fixed typo * Fix docstring * revert readme to rst * Fix readme * Update setup.py Co-authored-by: deepset <deepset@Crenolape.localdomain> Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com> Co-authored-by: Bogdan Kostić <bogdankostic@web.de> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-09-18 12:57:32 +02:00
Malte Pietsch	9727829cc6	Rename and restructure modules (database, indexing, schemas) (#379 ) * rename database to documentstore * move document, label, multilabel to haystack/schema.py * rename documentstore -> document_store * split indexing modules -> file_converter + preprocessor * fix order of imports * Update tutorial notebooks * fix torch version in tutorial 4	2020-09-16 18:33:23 +02:00
Malte Pietsch	bde33ddaaa	Bump FARM version to 0.4.8 and PyTorch >=1.5.1, <= 1.6.0 (#376 ) * bump farm version to 0.4.8 * move back to original transformers pipeline * remove dpr_utils and use transformers implementation * update tutorial notebooks	2020-09-16 17:24:40 +02:00
brandenchan	b44b1ac6ec	Set top_k_per_candidate	2020-08-26 12:03:56 +02:00
kolk	f2b6cc761b	Refactor DPR from FB to Transformers codebase (#308 ) * change_HFBertEncoder to transformers DPREncoder * Removed BertTensorizer * model download relative path * Refactor model load * Tutorial5 DPR updated * fix print_eval_results typo * copy transformers DPR modules in dpr_utils and test * transformer v3.0.2 import errors fixed * remove dependency of DPRConfig on attribute use_return_tuple * Adjust transformers 302 locally to work with dpr * projection layer removed from DPR encoders * fixed mypy errors * transformers DPR compatible code added * transformers DPR compatibility added * bug fix in tutorial 6 notebook * Docstring update and variable naming issues fix * tutorial modified to reflect DPR variable naming change * title addition to passage use-cases handled * modified handling untitled batch * resolved mypy errors * typos in docstrings and comments fixed * cleaned DPR code and added new test cases * warnings added for non-bert model [SEP] token removal * changed warning to logger warning * title mask creation refactored * bug fix on cuda issues * tutorial 6 instantiates modified DPR * tutorial 5 modified * tutorial 5 ipython notebook modified: DPR instantiation * batch_size added to DPR instantiation * tutorial 5 jupyter notebook typos fixed * improved docstrings, fixed typos * Update docstring Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-08-25 20:16:00 +05:30
Branden Chan	a54d6a5bd7	Make Tutorials Work on Colab GPUs (#322 ) * Add pip install torch+cu	2020-08-19 14:52:50 +02:00
bogdankostic	72b1013560	Restructure update embeddings (#304 ) * Restructure update embeddings * Adapt FAISSDocStore * Adapt test and tutorial Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>	2020-08-18 14:04:31 +02:00
brandenchan	8a3eca05c3	Change to retriever eval top_k to match notebook	2020-08-18 11:39:49 +02:00
Tanay Soni	200bb4bafd	Refactor the DPR tutorial to use FAISS (#317 )	2020-08-17 13:30:02 +02:00
Timo Moeller	72e6867278	Aggregate label objects for same questions (#292 ) * Add aggregate labels obj, use in retriever eval function * Change launch ES param * Move aggregation from ES document store to base class * Fix type annotations	2020-08-07 11:24:41 +02:00
Malte Pietsch	29a15c0d59	Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243 )	2020-07-31 11:34:06 +02:00
Malte Pietsch	5b1be233d0	Update Tutorial 4	2020-07-17 19:31:00 +02:00

96 Commits