haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-30 08:37:20 +00:00

Author	SHA1	Message	Date
bogdankostic	e2ec0d1c15	feat: FAISS in OpenSearch: check existing index (#3101 ) * Add check for mapping for existing indices * Add test * Check if "method" field exists	2022-08-25 17:33:26 +02:00
Julian Risch	cc9d39c360	increase version to next release candidate (#3100 )	2022-08-25 15:55:34 +02:00
Julian Risch	0950db5032	chore: increase version to 1.7.2 for patch release (#3097 ) * schema update * schema update audio nodes * schema update audio param type v1.7.2	2022-08-25 13:55:28 +02:00
Sebastian	0cf0568dd0	fix: Use use_auth_token in all cases when loading from the HF Hub (#3094 ) * Making sure to pass on use_auth_token to all from_pretrained calls	2022-08-25 10:30:03 +02:00
Sara Zan	e92ea4fccb	refactor: rename `master` into `main` in documentation and links (#3063 ) * master->main * revert master rename * Revert change to sphinx link and rename master schema	2022-08-24 19:05:12 +02:00
tstadel	92046ce5b5	feat: FAISS in OpenSearch: Support HNSW for dot product and l2 (#3029 ) * support faiss hnsw * blacken * update docs * improve similarity check * add tests * update schema * set ef_search param correctly * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * regenerate docs Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-24 16:43:48 +02:00
James Briggs	9b1b03002f	update to PineconeDocumentStore to remove dependency on SQL db (#2749 ) * update to PineconeDocumentStore to remove dependency on SQL db * Update Documentation & Code Style * typing fixes * Update Documentation & Code Style * fixed embedding generator to yield Documents * Update Documentation & Code Style * fixes for final typing issues * fixes for pylint * Update Documentation & Code Style * uncomment pinecone tests * added new params to docstrings * Update Documentation & Code Style * Update Documentation & Code Style * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update Documentation & Code Style * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * changes based on comments, updated errors and install * Update Documentation & Code Style * mypy * implement simple filtering in pinecone mock * typo * typo in reverse * account for missing meta key in filtering * typo * added metadata filtering to describe index * added handling for users switching indexes in same doc store, and handling duplicate docs in write * syntax tweaks * added index option to document/embedding count calls * labels implementation in progress * added metadata fields to be indexed for pinecone tests * further changes to mock * WIP implementation of labels+multilabels * switched to rely on labels namespace rather than filter * simpler delete_labels * label fixes, remove debug code * Apply docstring fixes Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * pylint * docs * temporarily un-mock Pinecone * Small Pinecone test suite * pylint * Add fake test key to pass the None check * Add again fake test key to pass the None check * Add Pinecone to default docstores and fix filters * Fix field name * Change field name * Change field value * Remove comments * forgot to upgrade pyproject.toml Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sarazanzo94@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-24 13:27:15 +02:00
Stefano Fiorucci	891707ecaa	bug: handle `Optional` params in schema validation (#2980 ) * not working draft * first draft * fix * revert json schema * better schema * improvements, support different python versions * little simplification * improvements and more tests * Revert "Merge branch 'handle_optional_params' into origin/main" This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34. * fix git mess * handle optional params; schema * test null values Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-24 10:40:19 +02:00
Ofek Lev	f6a4a14790	refactor: update package metadata (#3079 ) * Update package metadata * fix yaml * remove Python version cap * address review	2022-08-24 09:46:21 +02:00
Branden Chan	6d4031d8f6	Add OpenAI Answer Generator API (#3050 ) * Add OpenAI Answer Generator API * Regen tutorials * Regen md files * Incorporate reviewer feedback * Incorporate reviewer feedback * Incorporate reviewer feedback * Incorporate reviewer feedback	2022-08-24 09:20:08 +02:00
Malte Pietsch	76af0444cc	feat: add progressbar to upload_files() of deepset Cloud client (#3069 )	2022-08-23 20:51:08 +02:00
Sebastian	3ea57801ae	feat: Early stopping can be used in Reader and Retriever training (#3071 ) * Add option to set early stopping in training * Moved EarlyStopping to haystack/utils/early_stopping.py and added EarlyStopping to training Dense retrievers.	2022-08-23 14:18:12 +02:00
bogdankostic	b03de53716	Use `random_sample` instead of `ndarray` for random array (#3083 )	2022-08-22 13:19:45 +02:00
Daniel Bichuetti	149224fe3a	fix: Crawler quits ChromeDriver on destruction (#3070 ) * Close Chrome and Selenium WebDriver on destruction * Fix failed pre-commit hook	2022-08-22 13:08:16 +02:00
Daniel Bichuetti	d715d0202d	fix: update ChromeDriver options on restricted environments and add ChromeDriver options as function parameter (#3043 ) * Fix when env does not exist * Fix missed line * Set conservative chromedriver options * Set default options based on environment * Fix removed line * Updated documentation * Generate new schemas manually * Add arguments via iterator and helper function * Pre-push doc format * Use imported Option vs full namespace access * Manually update schema * Manually add documentation and schema * Fix language and documentation * Fix typo * Auto generated docs * Updated documentation	2022-08-22 12:59:33 +02:00
David G	e715dee17d	docs:fixed typo (or old documentation) in ipynb tutorial 3 (#3033 ) * Update Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb Just fixed the key in the document dictionary format so `write_documents()` won't raise an error. By the way the `write_documents()` error is really explicative * Run convert_notebooks_into_webpages.py Co-authored-by: David Gervasoni <david.gervasoni@trix.ai>	2022-08-22 12:56:30 +02:00
Massimiliano Pippi	97a8d30512	feat: Allow exact list matching with field in Elasticsearch filtering (#2988 ) * ES filtering - allow exact list matching with field typing fix Update Documentation & Code Style remove default hit limit in filtering queries Update Documentation & Code Style pytest es list eq filter Update Documentation & Code Style * review feedback * fixed test Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>	2022-08-22 12:42:37 +02:00
Daniel Bichuetti	d5e36ce6b4	fix(translator): write translated text to output documents, while keeping input untouched (#3077 ) * Set translated text on a copy of original document * Return new translated list * Manually generated docs TODO: check pre-commit * Hook generated file * Rename variables for better maintenance * fix(translator): prevent inputs from being changed * fix: manual update translator docs * style(translator): explicit type declaration on List * docs(translator): re-run pre-commit hook * style(translator): ignore mypy wrong type check * docs(translator): re-run pre-commit hook	2022-08-22 04:07:05 -04:00
Julian Risch	bc6f71b5ba	chore: increase version to next release candidate (#3067 ) * increase version to next release candidate * generate schema files	2022-08-19 14:49:50 +02:00
Julian Risch	eb0f0da0fd	Prepare 1.7.1 release (#3061 ) * prepare 1.7.1 release * Fix schemas * Update haystack/json-schemas/haystack-pipeline-1.7.1.schema.json Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * change back main to master * remove newline at end of file * generate schema file with no newline Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> v1.7.1	2022-08-19 13:24:40 +02:00
Vladimir Blagojevic	be127e5b61	Trigger build failure Slack notify only on main repo (not forks) (#3039 ) Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-08-18 06:51:39 -04:00
Massimiliano Pippi	af24ffae55	feat: take the list of models to cache instead of hardcoding one (#3060 ) * take the list of models to cache as an input * let nltk find the cache dir on its own	2022-08-18 11:55:29 +02:00
tstadel	1027ab3624	Bump Version to 1.7.1rc (#3041 ) * bump version to 1.7.1rc * update openapi	2022-08-18 10:31:57 +02:00
James Briggs	82c9cff3d9	test: update filtering of Pinecone mock to imitate doc store (#3020 ) * updated filtering of doc store to imitate pinecone * Update test/mocks/pinecone.py	2022-08-18 09:57:08 +02:00
Sebastian	74b7c2c12a	Pin pyworld to <=0.2.12 (#3047 )	2022-08-17 08:11:28 +02:00
Massimiliano Pippi	2328097ce0	rename the default branch name (#3045 )	2022-08-16 20:24:58 +02:00
Tuana Celik	2298155a20	changing Slack to Discord (#3040 ) * changing Slack to Discord * Update README.md * updating contributing	2022-08-15 15:56:16 +03:00
tstadel	baefd32b6f	Upgrade to v1.7.0 and copy docs folder (#3014 ) * update version to 1.7.0 * copy docs * update openapi * generate schemas * make update_json_schema() idempotent * update docs, schema and openapi v1.7.0	2022-08-15 14:20:30 +02:00
Julian Risch	d61755322f	chore: fix typo in API docs (#3023 ) * chore: fix typo in API docs * fix openapi Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-08-15 13:25:20 +02:00
tstadel	0aa0c68785	Fix broken `MultiLabel` serialization (#3037 ) * Fix MultiLabel serialization * update docs * better comment * remove unused imports * remove unused imports (2)	2022-08-15 13:09:18 +02:00
Branden Chan	ff38a20863	docs: update File Classifier Docstring (#3018 ) * Update docstring * Trigger pre-commit hook * Trigger pre-commit hook * Incorporate reviewer feedback * Incorporate reviewer feedback	2022-08-15 12:37:28 +02:00
Branden Chan	7312f99584	Update Summarizer Docs (#3032 ) * Change text to content * Change text to content	2022-08-15 12:35:41 +02:00
bogdankostic	3a849d6c07	bug: Make `TranslationWrapperPipeline` work with `QuestionAnswerGenerationPipeline` (#3034 ) * Overwrite output_translator's run method with run_batch * Fix mypy * Revert change * Overwrite run method only with QuestionAnswerGenerationPipeline	2022-08-15 10:05:34 +02:00
Malte Pietsch	1b422ab657	feat: Enable isolated node eval for answer generator nodes (incl. OpenAI Node) (#3036 ) * enable isolated node eval for answer generator nodes * adjust comment * remove unused import * fix mypy Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-08-14 12:11:23 +02:00
Stefano Fiorucci	4f261a4575	docs: extend tutorial14 about query classification (#3013 ) * first draft for tutorial extension * forgotten markdown * improved tutorial * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * add markdown * first draft for tutorial extension * forgotten markdown * improved tutorial * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * add markdown * little corrections * little corrections and add py tutorial * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * update tutorial webpage * fix typo Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-08-12 17:59:47 +02:00
Igor Tarlinskiy	5b06658670	Forbid the key `id` from `Document`s to be written in `WeaviateDocumentStore` (#2846 ) * Raise error upon duplicate document key found within meta info * value error msg fix * Update Documentation & Code Style * Raise exception instead of asserting * Update Documentation & Code Style * add test	2022-08-12 17:50:54 +02:00
Dmitry Goryunov	da7836a931	feat: Support embedding dimensions on DeepsetCloudDocumentStore (#2995 ) * Add embedding_dim to dc store * Remove similarity from query params, it is not used * Remove unused `return_embedding` parameter * Remove unused param * Update the documentation * Update schemas * Revert openapi changes * Revert openapi changes * Fix openapi * Fix json schema * Improve docstrings Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Improve logs Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update the docs * Fix similarity Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-12 11:46:52 +02:00
tstadel	c0fbe45c02	feat: Add `delete_all_files()` to `FileClient` (#3025 ) * add delete_all_files() * rename `file` to `files` * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * streamline "If set to None" and "to the API call" Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-12 11:20:30 +02:00
tstadel	668fd548a6	Fix `embeddings_field_supports_similarity` of `OpenSearchDocumentStore` when creating index (#3030 ) * fix embeddings_field_supports_similarity when creating index * fix test	2022-08-12 11:19:59 +02:00
James Briggs	26c938a8e6	test: add meta fields for meta_config to be used during testing (#3021 ) * added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore * Add documentation on metadata filtering in docstring * docs Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-12 10:27:56 +02:00
bogdankostic	81a5949103	ci: Increase Weaviate's disk usage + print docker logs (#3026 )	2022-08-11 18:13:43 +02:00
Sebastian	44e2b1beed	Resolving issue 2853: no answer logic in FARMReader (#2856 ) * Update FARMReader.eval_on_file to be consistent with FARMReader.eval * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-08-11 16:45:03 +02:00
Sara Zan	fc8ecbf20c	Move `azure-core` pin into the dev dependency list (#3022 )	2022-08-11 15:16:43 +02:00
Zoltan Fedor	408d8e6ff5	Enable the `JoinDocuments` node to work with documents with `score=None` (#2984 ) * Enable the `JoinDocuments` node to work with documents with `score=None` This fixes #2983 As of now, the `JoinDocuments` node will error out if any of the documents has `score=None` - which is possible, as some retrievers are not able to provide a score, like the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate. The reason for the error is that the `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`. There was a very similar issue for `JoinAnswers` too, which was addressed by this PR: https://github.com/deepset-ai/haystack/pull/2436 This solution applies the same solution to `JoinDocuments` - so both the `JoinAnswers` and `JoinDocuments` now will have the same additional argument to disable sorting when that is required. The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which allows the user to turn off the sorting of documents by score, but keeps the current functionality of sorting being performed as the default. * Fixing test bug * Addressing PR review comments - Extending unit tests - Simplifying logic * Making the sorting work even with no scores By making the no score being sorted as -Inf * Forgot to commit the change in `join_docs.py` * [EMPTY] Re-trigger CI * Added an INFO log if the `JoinDocuments` is sorting while some of the docs have `score=None` * Adjusting the arguments of `any()` * [EMPTY] Re-trigger CI	2022-08-11 10:43:25 +02:00
Massimiliano Pippi	2cd65e99b8	revert Remove pipes (#3006 )	2022-08-11 10:42:22 +02:00
Zoltan Fedor	aafa017c17	Refactoring the `Raypipeline.run` method - merging it with the `Pipeline.run` (#2981 ) * Refactoring the `Raypipeline.run` method - merging it with the `Pipeline.run` This is to fix #2968 * Bug: variable `i` was already in use * Removing unused imports * Removing unused import * [EMPTY] Re-trigger CI * Addressing concerns raised pre-review - Removing the attempt to try to make it without the need for `JoinDocuments` - it is okay to fail without `JoinDocuments` for certain pipelines. * Refactoring based on reviews	2022-08-11 09:50:14 +02:00
Zoltan Fedor	f4128d3581	Adding support for additional distance/similarity metrics for Weaviate (#3001 ) * Adding support for additional distance metrics for Weaviate Fixes #3000 * Updating the docs * Fixing error texts * Fixing issues raised by the review * Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity` * [EMPTY] Re-trigger CI * Fixing things based on review * [EMPTY] Re-trigger CI	2022-08-11 09:48:21 +02:00
Florian Hardow	0b39ce6431	fetch experiment run results from dc (#2960 ) * feat: fetch results for DeepsetCloudExperiments * chore: test DC fetch predictions for eval run * chore: switch to dict iteration with .items() * chore: update DC url to fetch predictions from * chore: update doc strings for fetching eval run results * chore: update DeepsetCloudExperiments description, change function names for fetching predictions of an eval run * chore: test for DeepsetCloudExperiments.get_run_results * chore: adjust request mock for test_get_eval_run_results * chore: push first row of dataframe into variable for test checks * chore: adjust mock data to correct data types * chore: make documentation more readable with line breaks * chore: update documentation for eval run result fetching	2022-08-10 15:02:36 +02:00
Stefano Fiorucci	5778b6f9e9	fix run_batch unbound error (#3016 )	2022-08-10 12:59:15 +02:00
James Briggs	5d4e3bd7ca	convert to set so not relying on correct order (#3015 )	2022-08-10 12:57:31 +02:00

1491 Commits