haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-11 23:54:37 +00:00

Author	SHA1	Message	Date
Branden Chan	056be3354b	Add pipelines tutorial (#1013 )	2021-04-29 18:19:20 +02:00
Julian Risch	65f1da00cc	knowledge graph documentation (#979 ) * Create knowledge_graph.md * add doc strings to Text2SparqlRetriever * Add doc strings to GraphDBKnowledgeGraph * Make method calls unambiguous so its clear which class is meant	2021-04-27 16:44:40 +02:00
Markus Paff	cf8a622e35	Streamlit UI Evaluation mode (#920 ) * first running version of eval mode * restructuring, new naming of elements and testing * add new files to Docker, how to start with Haystack reference, remove not needed dependencies * Add latest docstring and tutorial changes * merged changes * fixing bugs after breaking changes from last release * newser version of states in streamlit, more docs for eval mode, eval file as env virable * eval file as env variable Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-22 17:30:17 +02:00
Branden Chan	9626c0d65e	Update Documentation (#976 ) * Add api pages * Add latest docstring and tutorial changes * First sweep of usage docs * Add link to conversion script * Add import statements * Add summarization page * Add web crawler documentation * Add confidence scores usage * Add crawler api docs * Regenerate api docs * Update summarizer and translator api * Add api pages * Add latest docstring and tutorial changes * First sweep of usage docs * Add link to conversion script * Add import statements * Add summarization page * Add web crawler documentation * Add confidence scores usage * Add crawler api docs * Regenerate api docs * Update summarizer and translator api * Add indentation (pydoc-markdown 3.10.1) * Comment out metadata * Remove Finder deprecation message * Remove Finder in FAQ * Update tutorial link * Incorporate reviewer feedback * Regen api docs * Add type annotations Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-22 16:45:29 +02:00
Markus Paff	0633dae4d0	new docs version (#964 )	2021-04-14 13:40:05 +02:00
Branden Chan	77d4c2ca1c	Benchmark milvus (#850 ) * Add milvus benchmarking support * Add latest docstring and tutorial changes * Edit config * Disable docker interactive mode * Add milvus index type support * Adjust FAISS and Milvus node branching * Remove duplicate in config * Revert method for speedup * Add latest docstring and tutorial changes * Add latest benchmark run * Add latest docstring and tutorial changes * Add json files * Revert "Add latest docstring and tutorial changes" This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923. * Add latest docstring and tutorial changes * Revert "Add latest docstring and tutorial changes" This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b. * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-13 14:54:15 +02:00
Markus Paff	b87daed62b	fixed link to dpr (#962 )	2021-04-13 09:45:04 +02:00
Markus Paff	dfb0282b74	Update milvus links and docstrings (#959 ) * update milvus links and docstrings * Add latest docstring and tutorial changes * new milvus version * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-12 14:38:57 +02:00
Timo Moeller	837dea4e6d	Integrate sentence transformers into benchmarks (#843 ) * Integrate sentence transformers into benchmarks * Add doc store asserts * switch data downloads from s3 client to https. add license info * Fix mypy, revert config Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-09 17:24:16 +02:00
Julian Risch	d38c07e0ee	knowledge graph example (#934 ) * Add knowledge graph module * Fix type hint * Add graph retriver module * Change type annotations, change return format * Add graph retriever that executes questions as sparql queries * Linking only those entities that are in the knowledge graph * Added logging and using relations extracted from Knowledge graph for linking * Preventing entity linking from linking the same token to multiple entities * Pruning triples that have no variables for select and count queries * Support knowledge graphs with Pipelines * Add text2sparql * Entity linking and relation linking consider more special cases now based on evaluation on labelled data * Separating example code from KGQA implementation * Add eval on combined extarctive and kg questions * Remove references to hp-test * Add fields sparql_query and long_answer_list to metadata * Removing modular Question2SPARQL approach * Removing additional classes used for modular kgqa approach * preparing lcquad data * change graph db * Translating namespaces in knowledge graph queries * Creating graphdb index and loading triples from .ttl file * Fetching graph config files, triples and model from S3 * Fix incompatibility issues with BaseGraphRetriever and BaseComponent * Removing unused utility functions * Adding doc strings and tutorial header * Adding sparqlwrapper dependency * Moving tutorial header * Sorting tutorials by number within name of notebook * Add latest docstring and tutorial changes * Creating test cases for knowledge graph * Changing knowledge graph example to harry potter * Add latest docstring and tutorial changes * Adapting the tutorial notebook to harry potter example * Add GraphDB fixture for tests * Add latest docstring and tutorial changes * Added GraphDB docker launch to CI * Use correct GraphDB fixture * Check if GraphDB instance is already running * Renaming question/query and incorporating other feedback from Timo and Tanay * Removed type annotation * Add latest docstring and tutorial changes Co-authored-by: oryx1729 <oryx1729@protonmail.com> Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-08 14:05:33 +02:00
oryx1729	8c68699e1c	Refactor REST APIs to use Pipelines (#922 )	2021-04-07 17:53:32 +02:00
Julian Risch	64ad953c6a	Adding indentation to markup files (#947 )	2021-04-07 11:36:11 +02:00
Timo Moeller	5d2b16f3cc	Update farm version (#936 ) * Update farm version * Add new DPR loading, fix dpr param name * Add QA model confidence as answer probability, fix prams in test	2021-04-01 18:23:05 +02:00
Branden Chan	d77152c469	WIP: Add evaluation nodes for Pipelines (#904 ) * Add main eval fns * WIP: make pipeline_eval.py run * Fix typo * Add support for no_answers * Add latest docstring and tutorial changes * Working pipeline eval * Add timing of nodes * Add latest docstring and tutorial changes * Refactor and clean * Update tutorial script * Set default params * Update tutorials * Fix indent * Add latest docstring and tutorial changes * Address mypy issues * Add test * Fix mypy error * Clear outputs * Add doc strings * Incorporate reviewer feedback * Add latest docstring and tutorial changes * Revert query counting * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-01 17:35:18 +02:00
lewtun	32050fdce3	Add Milvus to the retriever / document store table (#931 )	2021-03-29 09:53:26 +02:00
Guillim	55b7a820d4	Fixing inconsistency (#926 ) Fixing inconsistency between pipe and p in the doc	2021-03-26 18:55:02 +01:00
Timo Moeller	1244d16010	Better default value for mp chunksize (#923 ) * Better default value for mp chunksize * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-03-25 19:00:45 +01:00
Lalit Pagaria	e904deefa7	Add Markdown file convertor (#875 )	2021-03-23 16:31:26 +01:00
Timo Moeller	f954f0db38	Fix top_k param in RAG tutorials (#906 ) * Fix top_k param * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-03-18 18:00:21 +01:00
Timo Moeller	7b559fa4e8	Improve dpr conversion (#826 ) * Bugfix dpr conversion * Add latest docstring and tutorial changes * Fix preprocessor changes	2021-03-18 14:51:01 +01:00
oryx1729	e9f0076dbd	Fix execution of Pipelines with parallel nodes (#901 )	2021-03-18 12:41:30 +01:00
Branden Chan	24d0c4d42d	Fix DPR training batch size (#898 ) * Adjust batch size * Add latest docstring and tutorial changes * Update training results * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-03-17 18:33:59 +01:00
Mohamed Sayed	9ec2406a05	Remove broken tf-idf youtube link (#888 ) The youtube link is of a deleted video.	2021-03-11 14:23:05 +01:00
oryx1729	4b188b8102	Add runtime parameters to component initialization (#873 )	2021-03-04 12:18:12 +01:00
Branden Chan	325a4e4d14	Add Milvus Documentation (#838 ) * First commit * Add latest docstring and tutorial changes * Add DocStore external setup info * fixed tabs * Add Milvus recommendation Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>	2021-02-24 11:43:40 +01:00
Malte Pietsch	e641bff7a6	Allow more options for elasticsearch client (auth, multiple hosts) (#845 ) * allow more options for elasticsearch client (auth, multiple hosts) * Add latest docstring and tutorial changes * fix mypy * Add latest docstring and tutorial changes * test client connection via ping() Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-19 14:29:59 +01:00
Tanay Soni	07907f9eac	Add support for indexing pipelines (#816 )	2021-02-16 16:24:28 +01:00
Branden Chan	7030c94325	Revamp Readme (#820 ) * Text changes * Add new images * First improvements * Next iteration * Resize gif * Add bold * Update key concepts diagram * Center image * Initial import of a more detailed README.md * Slight changes to ToC, requirements and across the text. * Grammar and Streamlit UI png. * Unfix size of gif for mobile * Remove requirements, add formatting to numbered lists. * Formatting, remove img size options. * Another iteration of phrasing the note about open ports. * Rephrase the note about the docker ports. Co-authored-by: Andrey A <56412611+aantti@users.noreply.github.com>	2021-02-16 15:32:43 +01:00
brandenchan	fe47e3a45e	Fix link in documentation	2021-02-15 11:15:54 +01:00
Lalit Pagaria	5bd94ac5f7	Adding Translator (standalone component & wrapper for pipelines) (#782 ) * Adding translator with many generic input parameter support * Making dict_key as generic * Fixing mypy issue * Adding pipeline and using opus models * Add latest docstring and tutorial changes * Adding test cases for end-to-end translation for generator, summerizer etc * raise error join and merge nodes * Fix test failure * add docstrings. add usage documentation. rm skip_special_tokens param * Add latest docstring and tutorial changes * fix code snippets in md * Adding few extra configuration parameters and fixing tests * Fixingmypy issue and updating usage document * fix for mypy issue in pipeline.py * reverting renaming of pytest_collection_modifyitems method * Addressing review comments * setting skip_special_tokens to True * removing model_max_length argument as None type is not supported to many models * Removing padding parameter. Better to leave it as default otherwise it cause tensor size miss match error. If this option required by used then it can be added later. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-12 15:58:26 +01:00
Pavel Soriano	8adf5b4737	Allow non-standard Tokenizers (e.g. CamemBERT) for DPR via new arg (#811 ) * added parameter to infer DPR tokenizers class * Add latest docstring and tutorial changes * Update docstring. fix mypy * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-12 14:17:55 +01:00
Branden Chan	c807f0d050	Add key concepts diagram	2021-02-12 12:49:22 +01:00
Branden Chan	a1983ad84e	Add new images	2021-02-11 15:10:00 +01:00
Tanay Soni	fd5c5dd23c	Introduce incremental updates for embeddings in document stores (#812 )	2021-02-09 21:25:01 +01:00
Malte Pietsch	e91518ee00	Update tutorials (torch versions, ES version, replace Finder with Pipeline) (#814 ) * remove manual torch install on colab * update elasticsearch version everywhere to 7.9.2 * fix FAQPipeline * update tutorials with new pipelines * Add latest docstring and tutorial changes * revert faqpipeline change. fix field names in tutorial 4 * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-09 14:56:54 +01:00
Malte Pietsch	ac9f92466f	Allow custom encoding for pdftotext (Russian characters, German umlauts etc). Fix version in download instructions (#813 ) * fix encoding of pdftotext. fix version in download instructions * fix test * Add latest docstring and tutorial changes * make latin-1 default encoding again * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-09 13:42:43 +01:00
Tanay Soni	7b18e324f2	Fix building Pipeline with YAML (#800 )	2021-02-04 11:53:51 +01:00
Branden Chan	f3a3b73d9b	Choose correct similarity fns during benchmark runs & re-run benchmarks (#773 ) * Adapt to new dataset_from_dicts return signature * rename fn * Align similarity fn in benchmark doc store * Better choice of similarity fn * Increase postgres wait time * Add more expected returned variables * update benchmark results * Fix typo * update all benchmark runs * multiply stats by 100 * Specify similarity fns for website Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-03 11:45:18 +01:00
Tanay Soni	8a5dc8f826	Load Pipeline with YAML config file (#785 )	2021-02-02 17:32:17 +01:00
Malte Pietsch	1318b55eec	Make tqdm progress bars optional (less verbose prod logs) (#796 ) * make dpr queries less verbose * add progress bar flag to more components * Add latest docstring and tutorial changes * add type * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-01 20:51:55 +01:00
Tanay Soni	b87dd244c1	Get metadata values for a key from Elasticsearch (#776 )	2021-02-01 16:13:26 +01:00
Tanay Soni	d62355ca88	Fix mypy typing (#792 )	2021-02-01 12:15:36 +01:00
Branden Chan	1dc74c7067	Add model versioning support (#784 ) * Add model versioning support * Add latest docstring and tutorial changes * Support DPR versioning * Add RAG versioning support * Add latest docstring and tutorial changes * Add summarizer support * Add Embedding Retriever support * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-01 11:42:36 +01:00
Lalit Pagaria	9f7f95221f	Milvus integration (#771 ) * Initial commit for Milvus integration * Add latest docstring and tutorial changes * Updating implementation of Milvus document store * Add latest docstring and tutorial changes * Adding tests and updating doc string * Add latest docstring and tutorial changes * Fixing issue caught by tests * Addressing review comments * Fixing mypy detected issue * Fixing issue caught in test about sorting of vector ids * fixing test * Fixing generator test failure * update docstrings * Addressing review comments about multiple network call while fetching embedding from milvus server * Add latest docstring and tutorial changes * Ignoring mypy issue while converting vector_id to int Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-29 13:29:12 +01:00
brandenchan	6efa4f06c1	Add Streamlit UI Image	2021-01-27 17:01:29 +01:00
Tanay Soni	46307d1571	Remove quotes around placeholders in Elasticsearch custom query (#762 )	2021-01-25 12:46:43 +01:00
Tanay Soni	f0aa879a1c	Fix delete_all_documents for the SQLDocumentStore (#761 )	2021-01-22 14:39:24 +01:00
Markus Paff	aee90c5df9	Docs v0.7.0 (#757 ) * new docs version * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-01-22 10:28:33 +01:00
Markus Paff	0b583b8972	Generate docstrings and deploy to branches to Staging (Website) (#731 ) * test pre commit hook * test status * test on this branch * push generated docstrings and tutorials to branch * fixed syntax error * Add latest docstring and tutorial changes * add files before commit * catch commit error * separate generation from deployment * add deployment process for staging * add current branch to payload Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-01-21 11:01:09 +01:00
Markus Paff	0f62e0b2ee	Script for releasing docs (#736 ) * script for releasing docs * fix formatting	2021-01-21 10:58:54 +01:00

661 Commits