haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-06 07:38:46 +00:00

Author	SHA1	Message	Date
Guillim	0051a34ff9	Add root_path option to REST API for reverse proxy deployments (#982 )	2021-04-20 11:19:28 +02:00
oryx1729	4dd5a7a744	Make FAISS import optional (#971 )	2021-04-15 12:26:34 +02:00
oryx1729	237172f459	Make FAISS import conditional (#970 )	2021-04-14 17:34:01 +02:00
Mario Jäckle	84f90e82c5	feature(aws): add aws iam auth method (#965 ) Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>	2021-04-14 16:34:24 +02:00
oryx1729	5bb66940a9	Fix equality check in preprocessor (#969 )	2021-04-14 16:03:48 +02:00
Markus Paff	0633dae4d0	new docs version (#964 )	2021-04-14 13:40:05 +02:00
oryx1729	bba1d80aef	Update Haystack version v0.8.0	2021-04-13 16:31:19 +02:00
Branden Chan	77d4c2ca1c	Benchmark milvus (#850 ) * Add milvus benchmarking support * Add latest docstring and tutorial changes * Edit config * Disable docker interactive mode * Add milvus index type support * Adjust FAISS and Milvus node branching * Remove duplicate in config * Revert method for speedup * Add latest docstring and tutorial changes * Add latest benchmark run * Add latest docstring and tutorial changes * Add json files * Revert "Add latest docstring and tutorial changes" This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923. * Add latest docstring and tutorial changes * Revert "Add latest docstring and tutorial changes" This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b. * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-13 14:54:15 +02:00
Markus Paff	b87daed62b	fixed link to dpr (#962 )	2021-04-13 09:45:04 +02:00
Julian Risch	8333a13d6f	Adding tutorial on knowledge graphs to README	2021-04-12 15:26:02 +02:00
Markus Paff	dfb0282b74	Update milvus links and docstrings (#959 ) * update milvus links and docstrings * Add latest docstring and tutorial changes * new milvus version * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-12 14:38:57 +02:00
oryx1729	406f7fa679	Disable Gunicorn preload option (#960 )	2021-04-12 12:46:52 +02:00
Timo Moeller	837dea4e6d	Integrate sentence transformers into benchmarks (#843 ) * Integrate sentence transformers into benchmarks * Add doc store asserts * switch data downloads from s3 client to https. add license info * Fix mypy, revert config Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-09 17:24:16 +02:00
Julian Risch	d38c07e0ee	knowledge graph example (#934 ) * Add knowledge graph module * Fix type hint * Add graph retriver module * Change type annotations, change return format * Add graph retriever that executes questions as sparql queries * Linking only those entities that are in the knowledge graph * Added logging and using relations extracted from Knowledge graph for linking * Preventing entity linking from linking the same token to multiple entities * Pruning triples that have no variables for select and count queries * Support knowledge graphs with Pipelines * Add text2sparql * Entity linking and relation linking consider more special cases now based on evaluation on labelled data * Separating example code from KGQA implementation * Add eval on combined extarctive and kg questions * Remove references to hp-test * Add fields sparql_query and long_answer_list to metadata * Removing modular Question2SPARQL approach * Removing additional classes used for modular kgqa approach * preparing lcquad data * change graph db * Translating namespaces in knowledge graph queries * Creating graphdb index and loading triples from .ttl file * Fetching graph config files, triples and model from S3 * Fix incompatibility issues with BaseGraphRetriever and BaseComponent * Removing unused utility functions * Adding doc strings and tutorial header * Adding sparqlwrapper dependency * Moving tutorial header * Sorting tutorials by number within name of notebook * Add latest docstring and tutorial changes * Creating test cases for knowledge graph * Changing knowledge graph example to harry potter * Add latest docstring and tutorial changes * Adapting the tutorial notebook to harry potter example * Add GraphDB fixture for tests * Add latest docstring and tutorial changes * Added GraphDB docker launch to CI * Use correct GraphDB fixture * Check if GraphDB instance is already running * Renaming question/query and incorporating other feedback from Timo and Tanay * Removed type annotation * Add latest docstring and tutorial changes Co-authored-by: oryx1729 <oryx1729@protonmail.com> Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-08 14:05:33 +02:00
oryx1729	fc6368c191	Fix passing a list of values as param (#952 )	2021-04-07 19:50:50 +02:00
oryx1729	8c68699e1c	Refactor REST APIs to use Pipelines (#922 )	2021-04-07 17:53:32 +02:00
Julian Risch	64ad953c6a	Adding indentation to markup files (#947 )	2021-04-07 11:36:11 +02:00
lewtun	8894c4fae9	Reduce precision in pipeline eval print functions (#943 ) A proposal to reduce the precision shown in the `EvalRetriever.print` and `EvalReader.print` to 4 significant figures. If the user wants the full precision, they can access the class attributes directly. Before ``` Retriever ----------------- has_answer recall: 0.8739495798319328 (208/238) no_answer recall: 1.00 (120/120) (no_answer samples are always treated as correctly retrieved) recall: 0.9162011173184358 (328 / 358) ``` After ``` Retriever ----------------- has_answer recall: 0.8739 (208/238) no_answer recall: 1.00 (120/120) (no_answer samples are always treated as correctly retrieved) recall: 0.9162 (328 / 358) ```	2021-04-06 05:11:29 +02:00
lewtun	41a1c8329d	Fix division by zero error in EvalRetriever (#938 ) If the first query in the evaluation returns a document with `no_answer=True` we got a division by zero error because neither `self.has_answer_correct` or `self.has_answer_count` get incremented. This fix moves the `self.has_answer_recall` calculation within the if-else block.	2021-04-03 18:13:36 +02:00
Timo Moeller	5d2b16f3cc	Update farm version (#936 ) * Update farm version * Add new DPR loading, fix dpr param name * Add QA model confidence as answer probability, fix prams in test	2021-04-01 18:23:05 +02:00
Branden Chan	d77152c469	WIP: Add evaluation nodes for Pipelines (#904 ) * Add main eval fns * WIP: make pipeline_eval.py run * Fix typo * Add support for no_answers * Add latest docstring and tutorial changes * Working pipeline eval * Add timing of nodes * Add latest docstring and tutorial changes * Refactor and clean * Update tutorial script * Set default params * Update tutorials * Fix indent * Add latest docstring and tutorial changes * Address mypy issues * Add test * Fix mypy error * Clear outputs * Add doc strings * Incorporate reviewer feedback * Add latest docstring and tutorial changes * Revert query counting * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-01 17:35:18 +02:00
lewtun	32050fdce3	Add Milvus to the retriever / document store table (#931 )	2021-03-29 09:53:26 +02:00
Guillim	55b7a820d4	Fixing inconsistency (#926 ) Fixing inconsistency between pipe and p in the doc	2021-03-26 18:55:02 +01:00
Timo Moeller	1244d16010	Better default value for mp chunksize (#923 ) * Better default value for mp chunksize * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-03-25 19:00:45 +01:00
Peter Adorjan	cafa1230da	Warning instead of Exception in FAISS and Milvus filtering (#913 )	2021-03-23 17:49:47 +01:00
Lalit Pagaria	e904deefa7	Add Markdown file convertor (#875 )	2021-03-23 16:31:26 +01:00
Moshe Berchansky	47dc069afe	Fix for allocate memory exception by specifing max_processes (#910 )	2021-03-19 18:11:25 +01:00
Timo Moeller	f954f0db38	Fix top_k param in RAG tutorials (#906 ) * Fix top_k param * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-03-18 18:00:21 +01:00
Branden Chan	26093452a4	Add code of conduct	2021-03-18 16:39:16 +01:00
Timo Moeller	7b559fa4e8	Improve dpr conversion (#826 ) * Bugfix dpr conversion * Add latest docstring and tutorial changes * Fix preprocessor changes	2021-03-18 14:51:01 +01:00
oryx1729	e9f0076dbd	Fix execution of Pipelines with parallel nodes (#901 )	2021-03-18 12:41:30 +01:00
Branden Chan	24d0c4d42d	Fix DPR training batch size (#898 ) * Adjust batch size * Add latest docstring and tutorial changes * Update training results * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-03-17 18:33:59 +01:00
Peter Demin	992277e812	Run Grammarly over README.md (#890 ) * Run Grammarly over README.md * Update README.md Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com> * Update README.md Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com> * Update README.md Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com> * Update README.md Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com> * Update README.md Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com> * Update README.md Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com> * Update README.md Co-authored-by: Andrey A. <56412611+aantti@users.noreply.github.com>	2021-03-16 18:00:57 +03:00
Mohamed Sayed	9ec2406a05	Remove broken tf-idf youtube link (#888 ) The youtube link is of a deleted video.	2021-03-11 14:23:05 +01:00
Malte Pietsch	91007c15dc	Add abstract run() method to Basecomponent (#887 )	2021-03-11 12:47:10 +01:00
oryx1729	e0a118fd9a	Add support for parallel paths in Pipeline (#884 )	2021-03-10 18:17:23 +01:00
oryx1729	6d00eff796	Add PDF converter in Dockerfiles (#877 )	2021-03-08 09:55:11 +01:00
Malte Pietsch	81b83293c0	Update docker-compose.yml	2021-03-05 10:55:36 +01:00
Eric Lam	5484b8883b	Fix error when is_impossible not exist (#870 )	2021-03-04 18:42:42 +01:00
oryx1729	f3fb9aacce	Fix validation for `split_respect_sentence_boundary` in Preprocessor (#869 )	2021-03-04 15:09:08 +01:00
oryx1729	4b188b8102	Add runtime parameters to component initialization (#873 )	2021-03-04 12:18:12 +01:00
Paul Klyvis	1b609114b8	Fix elasticsearch auth modes (#871 ) Co-authored-by: Paulius Klyvis <paul@convious.com>	2021-03-02 16:24:31 +01:00
Eric Lam	db75498278	Fix error when is_impossible not is_impossible and json dump encoding error (#868 ) * Fix error when is_impossible not is_impossible and json dump encoding in multilingual data Fixing #867 * Fix file encoding, all file open with utf-8	2021-03-02 13:54:58 +01:00
Malte Pietsch	762f194b27	Fix boolean `progress_bar` for disabling tqdm progressbar (#863 )	2021-02-26 10:49:31 +01:00
Branden Chan	325a4e4d14	Add Milvus Documentation (#838 ) * First commit * Add latest docstring and tutorial changes * Add DocStore external setup info * fixed tabs * Add Milvus recommendation Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>	2021-02-24 11:43:40 +01:00
venuraja79	e930d8a717	Annotation Tool: data is not persisted when using local version #853 (#855 )	2021-02-21 15:35:45 +01:00
Tu NGUYEN	ba91a90dd6	Fix download ntlk preprocessor (#852 )	2021-02-21 10:17:50 +01:00
Malte Pietsch	e641bff7a6	Allow more options for elasticsearch client (auth, multiple hosts) (#845 ) * allow more options for elasticsearch client (auth, multiple hosts) * Add latest docstring and tutorial changes * fix mypy * Add latest docstring and tutorial changes * test client connection via ping() Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-19 14:29:59 +01:00
Divya Yeruva	6c3ec540a4	Add crawler to get texts from websites (#775 ) * add fetch_data_from_url to extract data and store as files * corrected a typo * corrected variable name error * correction of urlparse error * type error * added selenium, urllib to requirements * removed urllib * minor changes and added function to find out inpage navigation links * quick duplicate links fix * quick type annotation fix * created seperate module for crawler * type error fix * type error fix * import fix * quick type error fix * addee return description * updated include type to list * refactor modules. Add Crawler class. rename params. * add basic pipeline compatibility * update docstrings * fix mypy issues * update args, docstrings, return filepaths * fix mypy * make urls optional in init Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-18 12:00:49 +01:00
Malte Pietsch	d700592c9a	Update GPU Dockerimage (Cuda 11, Fix faiss)(#836 )	2021-02-17 12:40:00 +01:00

3597 Commits