haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-16 18:13:54 +00:00

Author	SHA1	Message	Date
Markus Paff	39845c0624	Automate updates docstrings tutorials (#1461 ) * remove not needed githab actions and reactivate docstrings and tutorial generation * test workflow * update pydoc version * update python version * update watchdog * move to latest version pydoc-markdown * remove version check * Add latest docstring and tutorial changes * remove test workflow * test for param docstrings * pin pydoc-markdown version * add test workflow * pin watchdog version * Add latest docstring and tutorial changes * update original workflow and delete test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-09-17 13:44:31 +02:00
Bob van Luijt	c0cc8bc80f	Bump Weaviate version to 1.7.0 (#1412 ) * Bump Weaviate * Bump Weaviate * Bump Weaviate client * Bump Weaviate * Revert client version There is a change in the client API that needs to be addressed before bumping its version	2021-09-05 09:28:55 +02:00
Malte Pietsch	f3e7074c13	Remove stale bot	2021-09-03 17:39:24 +02:00
Malte Pietsch	6093bf9ff6	Fix Github action	2021-09-01 16:50:29 +02:00
Shahrukh Khan	4822536886	Add ImageToTextConverter and PDFToTextOCRConverter that utilize OCR (#1349 ) * add image.py converter * add PDFtoImageConverter * add init to PDFtoImageConverter and classes to __init__ * update imagetotext pipeline * update imagetotext pipeline * update imagetotext pipeline * update imagetotext pipeline * update imagetotext pipeline * update imagetotext pipeline * update imagetotext pipeline * revert change in base.py in file_conv * Update base.py * Update pdf.py * add ocr file_converter testcase & update dockerfile * fix tesseract exception message typo * fix _image_to_text doctstring * add tesseract installation to CI * add tesseract installation to CI * add content test for PDF OCR converter * update PDFToTextOCRConverter constructor doctsring * replace image files with tmp paths for image.py convert * replace image files with tmp paths for image.py convert * Update README.md Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-09-01 16:42:25 +02:00
Markus Paff	be8d305190	Editing docs read.me for new docs website workflow (#1372 ) * editing docs read.me for new docs website workflow * added new links to docs	2021-08-30 14:59:40 +02:00
oryx1729	bafa1b46de	Add Ray integration for Pipelines (#1255 )	2021-08-02 14:51:24 +02:00
Bob van Luijt	8dae844447	Bump Weaviate version to 1.5 (#1287 ) * bump Weaviate version to 1.5 * bump Weaviate version to 1.5	2021-07-15 08:26:22 +02:00
Malte Pietsch	5e23e72f31	Update issue templates	2021-06-30 12:12:07 +02:00
venuraja79	49886f88f0	Integrate Weaviate as another DocumentStore (#1064 ) * Annotation Tool: data is not persisted when using local version #853 * First version of weaviate * First version of weaviate * First version of weaviate * Updated comments * Updated comments * ran query, get and write tests * update embeddings, dynamic schema and filters implemented * Initial set of tests and fixes * Tests added for update_embeddings and delete documents * introduced duplicate documents fix * fixed mypy errors * Added Weaviate to requirements * Fix the weaviate docker env variables * Fixing test dependencies for now * Created weaviate test marker and fixed query * Update docstring * Add documentation * Bump up weaviate version * Bump up weaviate version in documentation * Bump up weaviate version in documentation * Updgrade weaviate version Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-06-10 09:43:53 +02:00
Lalit Pagaria	db17d73a82	Fixing issues caused due to mypy upgrade (#1165 )	2021-06-09 16:24:39 +02:00
Malte Pietsch	022f8586f6	Remove Python 3.6 support (#1059 ) * Remove Python 3.6 support * change cache key for CI	2021-06-01 15:24:44 +02:00
Malte Pietsch	25d1122773	Upgrade milvus to 1.1.0 (#1066 ) * upgrade milvus in CI to 1.1 * fix pymilvus * loose pymilvus requirement again * add date to cache keys * fix date var in action	2021-05-17 17:27:34 +02:00
Malte Pietsch	b1e8ebf81a	Create pull_request_template.md	2021-04-22 15:48:39 +02:00
oryx1729	8c1e411380	Fix update_embeddings() for FAISSDocumentStore (#978 )	2021-04-21 09:56:35 +02:00
Julian Risch	d38c07e0ee	knowledge graph example (#934 ) * Add knowledge graph module * Fix type hint * Add graph retriver module * Change type annotations, change return format * Add graph retriever that executes questions as sparql queries * Linking only those entities that are in the knowledge graph * Added logging and using relations extracted from Knowledge graph for linking * Preventing entity linking from linking the same token to multiple entities * Pruning triples that have no variables for select and count queries * Support knowledge graphs with Pipelines * Add text2sparql * Entity linking and relation linking consider more special cases now based on evaluation on labelled data * Separating example code from KGQA implementation * Add eval on combined extarctive and kg questions * Remove references to hp-test * Add fields sparql_query and long_answer_list to metadata * Removing modular Question2SPARQL approach * Removing additional classes used for modular kgqa approach * preparing lcquad data * change graph db * Translating namespaces in knowledge graph queries * Creating graphdb index and loading triples from .ttl file * Fetching graph config files, triples and model from S3 * Fix incompatibility issues with BaseGraphRetriever and BaseComponent * Removing unused utility functions * Adding doc strings and tutorial header * Adding sparqlwrapper dependency * Moving tutorial header * Sorting tutorials by number within name of notebook * Add latest docstring and tutorial changes * Creating test cases for knowledge graph * Changing knowledge graph example to harry potter * Add latest docstring and tutorial changes * Adapting the tutorial notebook to harry potter example * Add GraphDB fixture for tests * Add latest docstring and tutorial changes * Added GraphDB docker launch to CI * Use correct GraphDB fixture * Check if GraphDB instance is already running * Renaming question/query and incorporating other feedback from Timo and Tanay * Removed type annotation * Add latest docstring and tutorial changes Co-authored-by: oryx1729 <oryx1729@protonmail.com> Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-08 14:05:33 +02:00
Lalit Pagaria	e904deefa7	Add Markdown file convertor (#875 )	2021-03-23 16:31:26 +01:00
oryx1729	c4607cbd98	Revamp CI (#825 )	2021-02-12 13:38:54 +01:00
Malte Pietsch	2b05e801c3	Fix pdftotext dependency in CI (#788 ) * Fix pdftotext dependency in CI * udpate xpdf version * Fix version	2021-01-29 16:07:37 +01:00
Lalit Pagaria	9f7f95221f	Milvus integration (#771 ) * Initial commit for Milvus integration * Add latest docstring and tutorial changes * Updating implementation of Milvus document store * Add latest docstring and tutorial changes * Adding tests and updating doc string * Add latest docstring and tutorial changes * Fixing issue caught by tests * Addressing review comments * Fixing mypy detected issue * Fixing issue caught in test about sorting of vector ids * fixing test * Fixing generator test failure * update docstrings * Addressing review comments about multiple network call while fetching embedding from milvus server * Add latest docstring and tutorial changes * Ignoring mypy issue while converting vector_id to int Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-29 13:29:12 +01:00
Markus Paff	0b583b8972	Generate docstrings and deploy to branches to Staging (Website) (#731 ) * test pre commit hook * test status * test on this branch * push generated docstrings and tutorials to branch * fixed syntax error * Add latest docstring and tutorial changes * add files before commit * catch commit error * separate generation from deployment * add deployment process for staging * add current branch to payload Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-01-21 11:01:09 +01:00
Markus Paff	3af3ee1a12	Automate docstring and tutorial generation with every push to master (#718 ) * automate docstring and tutorial generation with every push to master * test CI for current branch * fixed yaml syntax * add setupttools to install process * checkout repo * fixed command for shell script * install wheel as it is needed for CI * install mkdocs * test without shell script * use package from github actions * test other configuration * back to right config * cleaning script	2021-01-11 16:25:43 +01:00
Lalit Pagaria	75d0ebd076	Add Summarizer (standalone + node in custom pipelines + SearchSummarizationPipeline) (#698 ) * Integration of SummarizationQAPipeline with Haystack. * Moving summarizer tests because of OOM issue * Fixing typo * Splitting summarizer test in separate ci step * Removing sysctl configuration as we already running elastic search in docker container * fixing mypy issue * update parameter names and docstrings * update parameter names in BaseSummarizer * rename pipeline * change return type of summarizer from answer to document * change scope of doc store fixture * revert scope * temp. disable test_faiss_index_save_and_load() * fix mypy. change order for mypy in CI Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-08 14:29:46 +01:00
Malte Pietsch	5db73d4107	Update stale bot	2021-01-05 08:29:24 +01:00
Tanay Soni	369e237fd4	Add DocumentStore for Open Distro Elasticsearch (#676 )	2020-12-15 09:28:40 +01:00
Tanay Soni	33fe597949	Cleanup Pytest Fixtures (#639 )	2020-12-14 18:15:44 +01:00
Tanay Soni	8e52b48e1d	Add pipelines for GenerativeQA & FAQs (#645 )	2020-12-03 10:27:06 +01:00
Ky-Anh Huynh	0edd127f35	Add formatting checks for shell scripts (#627 )	2020-11-26 14:36:35 +01:00
Malte Pietsch	0acafc403a	Automate benchmarks via CML (#518 ) * initial test cml * Update cml.yaml * WIP test workflow * switch to general ubuntu ami * switch to general ubuntu ami * disable gpu for tests * rm gpu infos * rm gpu infos * update token env * switch github token * add postgres * test db connection * fix typo * remove tty * add sleep for db * debug runner * debug removal postgres * debug: reset to working commit * debug: change github token * switch to new bot token * debug token * add back postgres * adjust network runner docker * add elastic * fix typo * adjust working dir * fix benchmark execution * enable s3 downloads * add query benchmark. fix path * add saving of markdown files * cat md files. add faiss+dpr. increase n_queries * switch to GPU instance * switch availability zone * switch to public aws DL ami * increase volume size * rm faiss. fix error logging * save markdown files * add reader benchmarks * add download of squad data * correct reader metric normalization * fix newlines between reports * fix max_docs for reader eval data. remove max_docs from ci run config * fix mypy. switch workflow trigger * try trigger for label * try trigger for label * change trigger syntax * debug machine shutdown with test workflow * add es and postgres to test workflow * Revert "add es and postgres to test workflow" This reverts commit 6f038d3d7f12eea924b54529e61b192858eaa9d5. * Revert "debug machine shutdown with test workflow" This reverts commit db70eabae8850b88e1d61fd79b04d4f49d54990a. * fix typo in action. set benchmark config back to original	2020-11-18 18:28:17 +01:00
Lalit Pagaria	f13443054a	[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484 ) * Adding dummy generator implementation * Adding tutorial to try the model * Committing current non working code * Committing current update where we need to call generate function directly and need to convert embedding to tensor way * Addressing review comments. * Refactoring finder, and implementing rag_generator class. * Refined the implementation of RAGGenerator and now it is in clean shape * Renaming RAGGenerator to RAGenerator * Reverting change from finder.py and addressing review comments * Remove support for RagSequenceForGeneration * Utilizing embed_passage function from DensePassageRetriever * Adding sample test data to verify generator output * Updating testing script * Updating testing script * Fixing bug related to top_k * Updating latest farm dependency * Comment out farm dependency * Reverting changes from TransformersReader * Adding transformers dataset to compare transformers and haystack generator implementation * Using generator_encoder instead of question_encoder to generate context_input_ids * Adding workaround to install FARM dependency from master branch * Removing unnecessary changes * Fixing generator test * Removing transformers datasets * Fixing generator test * Some cleanup and updating TODO comments * Adding tutorial notebook * Updating tutorials with comments * Explicitly passing token model in RAG test * Addressing review comments * Fixing notebook * Refactoring tests to reduce memory footprint * Split generator tests in separate ci step and before running it reclaim memory by terminating containers * Moving tika dependent test to separate dir * Remove unwanted code * Brining reader under session scope * Farm is now session object hence restoring changes from default value * Updating assert for pdf converter * Dummy commit to trigger CI flow * REducing memory footprint required for generator tests * Fixing mypy issues * Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits * reducing changes * Fixing CI * changing elastic search ci * Fixing test error * Disabling return of embedding * Marking generator test as well * Refactoring tutorials * Increasing ES memory to 750M * Trying another fix for ES CI * Reverting CI changes * Splitting tests in CI * Generator and non-generator markers split * Adding pytest.ini to add markers and enable strict-markers option * Reducing elastic search container memory * Simplifying generator test by using documents with embedding directly * Bump up farm to 0.5.0	2020-10-30 18:06:02 +01:00
Tanay Soni	db4151bbc0	Fix scoring in Elasticsearch for dot product (#517 )	2020-10-23 17:50:49 +02:00
Branden Chan	1cebcb7dda	Create time and performance benchmarks for all readers and retrievers (#339 ) * add time and perf benchmark for es * Add retriever benchmarking * Add Reader benchmarking * add nq to squad conversion * add conversion stats * clean benchmarks * Add link to dataset * Update imports * add first support for neg psgs * Refactor test * set max_seq_len * cleanup benchmark * begin retriever speed benchmarking * Add support for retriever query index benchmarking * improve reader eval, retriever speed benchmarking * improve retriever speed benchmarking * Add retriever accuracy benchmark * Add neg doc shuffling * Add top_n * 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging * Add models to sweep * add option for faiss index type * remove unneeded line * change faiss to faiss_flat * begin automatic benchmark script * remove existing postgres docker for benchmarking * Add data processing scripts * Remove shuffle in script bc data already shuffled * switch hnsw setup from 256 to 128 * change es similarity to dot product by default * Error includes stack trace * Change ES default timeout * remove delete_docs() from timing for indexing * Add support for website export * update website on push to benchmarks * add complete benchmarks results * new json format * removed NaN as is not a valid json token * fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches * update benchmarks for hnsw 128,20,80 * don't delete full index in delete_all_documents() * update texts for charts * update recall column for retriever * change scale and add units to desc * add units to legend * add axis titles. update desc * add html tags Co-authored-by: deepset <deepset@Crenolape.localdomain> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>	2020-10-12 13:34:42 +02:00
Markus Paff	5d1e208186	Create deploy_website.yml (#450 ) Creates a dispatch event on push to master so that we can trigger a build in haystack-website. The website should always have the latest docs version	2020-09-29 19:49:04 +02:00
Dany	f0222ecd27	Add Tika in CI	2020-08-17 11:35:33 +02:00
Tanay Soni	1637ce1184	Revert "Add Tika Converter (#314 )" This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.	2020-08-17 11:13:52 +02:00
Tanay Soni	5ef59b1901	Add Tika Converter (#314 )	2020-08-14 14:13:59 +02:00
Malte Pietsch	5023fde2be	Update issue templates	2020-07-13 10:45:58 +02:00
Tanay Soni	98f1a3f9a7	Add type hints and mypy checks (#138 )	2020-06-10 17:22:37 +02:00
Tanay Soni	180dc8cbd6	Start Elasticsearch with a Github Action (#142 )	2020-06-09 12:46:15 +02:00
Tanay Soni	160345f3d5	Update build workflow	2020-06-09 11:45:25 +02:00
Tanay Soni	c4592c1b9a	Create ci.yml	2020-06-09 11:36:27 +02:00
Malte Pietsch	0247e362f7	add stalebot (#131 )	2020-06-05 17:55:06 +02:00

542 Commits