haystack

Mirror of https://github.com/deepset-ai/haystack.git, synced 2025-12-22 20:49:46 +00:00

Author	SHA1	Message	Date
Daniel Bichuetti	621e1af74c	refactor: improve support for dataclasses (#3142) * refactor: improve support for dataclasses * refactor: refactor class init * refactor: remove unused import * refactor: testing 3.7 diffs * refactor: checking meta where is Optional * refactor: reverting some changes on 3.7 * refactor: remove unused imports * build: manual pre-commit run * doc: run doc pre-commit manually * refactor: post initialization hack for 3.7-3.10 compat. TODO: investigate another method to improve 3.7 compatibility. * doc: force pre-commit * refactor: refactored for both Python 3.7 and 3.9 * docs: manually run pre-commit hooks * docs: run api docs manually * docs: fix wrong comment * refactor: change no type-checked test code * docs: update primitives * docs: api documentation * docs: api documentation * refactor: minor test refactoring * refactor: remove unused enumeration on test * refactor: remove unneeded dir in gitignore * refactor: exclude all private fields and change meta def * refactor: add pydantic comment * refactor: fix for mypy on Python 3.7 * refactor: revert custom init * docs: update docs to new pydoc-markdown style * Update test/nodes/test_generator.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-09-09 11:31:37 +02:00
Daniel Bichuetti	1a6cbca9b6	feat: add health check endpoint to rest api (#3168) * feat: add /health endpoint to rest api * refactor: adjust to new dir structure * fix: add new rest api dependency * docs: add new openapi schema * docs: manual black run * refactor: remove some sys-wide details * docs: minor description changes * docs: minor description changes * docs: generate openapi schemas * tests: improved tests * refactor: add cls method decorator	2022-09-08 18:24:16 +02:00
Vladimir Blagojevic	84acb6584f	Type all parameter constructors, add model_version optional parameter where applicable (#3152)	2022-09-06 05:05:42 -04:00
Daniel Bichuetti	e1f399284f	refactor: update dependencies and remove pins (#3147) * refactor: remove azure-core, pydoc and hf-hub pins * fix: remove extra-comma * fix: force minimum version of azure forms recognizer * refactor: allow newer ocr libs * refactor: update more dependencies and container versions * refactor: remove extra comment * docs: pre-commit manual run * refactor: remove unnecessary dependency * tests: update weaviate container image version	2022-09-05 14:30:35 +02:00
Branden Chan	d4722c2ec5	Document FARMReader.train() evaluation report log level (#3129) * Mention evaluation report logging level * Mention evaluation report logging level	2022-09-01 10:58:47 +02:00
Vladimir Blagojevic	356537c883	Standardize devices parameter and device initialization (#3062) * Use devices parameter and initialize devices consistently	2022-08-31 15:30:31 +02:00
Julian Risch	f010a17f04	increase version to next release candidate (#3115)	2022-08-29 17:05:44 +02:00
Julian Risch	4e518cdddd	chore: increase version for 1.8 release (#3109) * increase version for 1.8 release * ignore missing-timeout for pylint	2022-08-26 15:00:14 +02:00
Julian Risch	3e3ff33cdd	feat: add batch evaluation method for pipelines (#2942) * add basic pipeline.eval_batch for qa without filters * black formatting * pydoc-markdown * remove batch eval tests failing due to bugs * remove comment * explain commented out tests * avoid code duplication * black * mypy * pydoc markdown * add batch option to execute_eval_run * pydoc markdown * Apply documentation suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply documentation suggestion from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * add documentation based on review comments * black * black * schema updates * remove duplicate tests * add separate method for column reordering * merge _build_eval_dataframe methods * pylint ignore in function * change type annotation of queries to list only * one-liner addressing review comment on params dict * markdown files updated Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-25 17:50:57 +02:00
Julian Risch	cc9d39c360	increase version to next release candidate (#3100)	2022-08-25 15:55:34 +02:00
Julian Risch	0950db5032	chore: increase version to 1.7.2 for patch release (#3097) * schema update * schema update audio nodes * schema update audio param type	2022-08-25 13:55:28 +02:00
Sebastian	0cf0568dd0	fix: Use use_auth_token in all cases when loading from the HF Hub (#3094) * Making sure to pass on use_auth_token to all from_pretrained calls	2022-08-25 10:30:03 +02:00
Sara Zan	e92ea4fccb	refactor: rename `master` into `main` in documentation and links (#3063) * master->main * revert master rename * Revert change to sphinx link and rename master schema	2022-08-24 19:05:12 +02:00
tstadel	92046ce5b5	feat: FAISS in OpenSearch: Support HNSW for dot product and l2 (#3029) * support faiss hnsw * blacken * update docs * improve similarity check * add tests * update schema * set ef_search param correctly * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * regenerate docs Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-24 16:43:48 +02:00
James Briggs	9b1b03002f	update to PineconeDocumentStore to remove dependency on SQL db (#2749) * update to PineconeDocumentStore to remove dependency on SQL db * Update Documentation & Code Style * typing fixes * Update Documentation & Code Style * fixed embedding generator to yield Documents * Update Documentation & Code Style * fixes for final typing issues * fixes for pylint * Update Documentation & Code Style * uncomment pinecone tests * added new params to docstrings * Update Documentation & Code Style * Update Documentation & Code Style * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update Documentation & Code Style * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * changes based on comments, updated errors and install * Update Documentation & Code Style * mypy * implement simple filtering in pinecone mock * typo * typo in reverse * account for missing meta key in filtering * typo * added metadata filtering to describe index * added handling for users switching indexes in same doc store, and handling duplicate docs in write * syntax tweaks * added index option to document/embedding count calls * labels implementation in progress * added metadata fields to be indexed for pinecone tests * further changes to mock * WIP implementation of labels+multilabels * switched to rely on labels namespace rather than filter * simpler delete_labels * label fixes, remove debug code * Apply docstring fixes Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * pylint * docs * temporarily un-mock Pinecone * Small Pinecone test suite * pylint * Add fake test key to pass the None check * Add again fake test key to pass the None check * Add Pinecone to default docstores and fix filters * Fix field name * Change field name * Change field value * Remove comments * forgot to upgrade pyproject.toml Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sarazanzo94@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-24 13:27:15 +02:00
Branden Chan	6d4031d8f6	Add OpenAI Answer Generator API (#3050) * Add OpenAI Answer Generator API * Regen tutorials * Regen md files * Incorporate reviewer feedback * Incorporate reviewer feedback * Incorporate reviewer feedback * Incorporate reviewer feedback	2022-08-24 09:20:08 +02:00
Sebastian	3ea57801ae	feat: Early stopping can be used in Reader and Retriever training (#3071) * Add option to set early stopping in training * Moved EarlyStopping to haystack/utils/early_stopping.py and added EarlyStopping to training Dense retrievers.	2022-08-23 14:18:12 +02:00
Daniel Bichuetti	d715d0202d	fix: update ChromeDriver options on restricted environments and add ChromeDriver options as function parameter (#3043) * Fix when env does not exist * Fix missed line * Set conservative chromedriver options * Set default options based on environment * Fix removed line * Updated documentation * Generate new schemas manually * Add arguments via iterator and helper function * Pre-push doc format * Use imported Option vs full namespace access * Manually update schema * Manually add documentation and schema * Fix language and documentation * Fix typo * Auto generated docs * Updated documentation	2022-08-22 12:59:33 +02:00
Daniel Bichuetti	d5e36ce6b4	fix(translator): write translated text to output documents, while keeping input untouched (#3077) * Set translated text on a copy of original document * Return new translated list * Manually generated docs TODO: check pre-commit * Hook generated file * Rename variables for better maintenance * fix(translator): prevent inputs from being changed * fix: manual update translator docs * style(translator): explicit type declaration on List * docs(translator): re-run pre-commit hook * style(translator): ignore mypy wrong type check * docs(translator): re-run pre-commit hook	2022-08-22 04:07:05 -04:00
Julian Risch	bc6f71b5ba	chore: increase version to next release candidate (#3067) * increase version to next release candidate * generate schema files	2022-08-19 14:49:50 +02:00
Julian Risch	eb0f0da0fd	Prepare 1.7.1 release (#3061) * prepare 1.7.1 release * Fix schemas * Update haystack/json-schemas/haystack-pipeline-1.7.1.schema.json Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * change back main to master * remove newline at end of file * generate schema file with no newline Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-19 13:24:40 +02:00
tstadel	1027ab3624	Bump Version to 1.7.1rc (#3041) * bump version to 1.7.1rc * update openapi	2022-08-18 10:31:57 +02:00
tstadel	baefd32b6f	Upgrade to v1.7.0 and copy docs folder (#3014) * update version to 1.7.0 * copy docs * update openapi * generate schemas * make update_json_schema() idempotent * update docs, schema and openapi	2022-08-15 14:20:30 +02:00
Julian Risch	d61755322f	chore: fix typo in API docs (#3023) * chore: fix typo in API docs * fix openapi Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-08-15 13:25:20 +02:00
tstadel	0aa0c68785	Fix broken `MultiLabel` serialization (#3037) * Fix MultiLabel serialization * update docs * better comment * remove unused imports * remove unused imports (2)	2022-08-15 13:09:18 +02:00
Branden Chan	ff38a20863	docs: update File Classifier Docstring (#3018) * Update docstring * Trigger pre-commit hook * Trigger pre-commit hook * Incorporate reviewer feedback * Incorporate reviewer feedback	2022-08-15 12:37:28 +02:00
Branden Chan	7312f99584	Update Summarizer Docs (#3032) * Change text to content * Change text to content	2022-08-15 12:35:41 +02:00
bogdankostic	3a849d6c07	bug: Make `TranslationWrapperPipeline` work with `QuestionAnswerGenerationPipeline` (#3034) * Overwrite output_translator's run method with run_batch * Fix mypy * Revert change * Overwrite run method only with QuestionAnswerGenerationPipeline	2022-08-15 10:05:34 +02:00
Dmitry Goryunov	da7836a931	feat: Support embedding dimensions on DeepsetCloudDocumentStore (#2995) * Add embedding_dim to dc store * Remove similarity from query params, it is not used * Remove unused `return_embedding` parameter * Remove unused param * Update the documentation * Update schemas * Revert openapi changes * Revert openapi changes * Fix openapi * Fix json schema * Improve docstrings Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Improve logs Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update the docs * Fix similarity Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-12 11:46:52 +02:00
James Briggs	26c938a8e6	test: add meta fields for meta_config to be used during testing (#3021) * added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore * Add documentation on metadata filtering in docstring * docs Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-12 10:27:56 +02:00
Sebastian	44e2b1beed	Resolving issue 2853: no answer logic in FARMReader (#2856) * Update FARMReader.eval_on_file to be consistent with FARMReader.eval * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-08-11 16:45:03 +02:00
Zoltan Fedor	408d8e6ff5	Enable the `JoinDocuments` node to work with documents with `score=None` (#2984) * Enable the `JoinDocuments` node to work with documents with `score=None` This fixes #2983 As of now, the `JoinDocuments` node will error out if any of the documents has `score=None` - which is possible, as some retrievers are not able to provide a score, like the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate. The reason for the error is that the `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`. There was a very similar issue for `JoinAnswers` too, which was addressed by this PR: https://github.com/deepset-ai/haystack/pull/2436 This PR applies the same solution to `JoinDocuments` - so both the `JoinAnswers` and `JoinDocuments` now will have the same additional argument to disable sorting when that is required. The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which allows the user to turn off the sorting of documents by score, but keeps the current functionality of sorting being performed as the default. * Fixing test bug * Addressing PR review comments - Extending unit tests - Simplifying logic * Making the sorting work even with no scores By making the no score being sorted as -Inf * Forgot to commit the change in `join_docs.py` * [EMPTY] Re-trigger CI * Added an INFO log if the `JoinDocuments` is sorting while some of the docs have `score=None` * Adjusting the arguments of `any()` * [EMPTY] Re-trigger CI	2022-08-11 10:43:25 +02:00
Massimiliano Pippi	2cd65e99b8	revert Remove pipes (#3006)	2022-08-11 10:42:22 +02:00
Zoltan Fedor	f4128d3581	Adding support for additional distance/similarity metrics for Weaviate (#3001) * Adding support for additional distance metrics for Weaviate Fixes #3000 * Updating the docs * Fixing error texts * Fixing issues raised by the review * Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity` * [EMPTY] Re-trigger CI * Fixing things based on review * [EMPTY] Re-trigger CI	2022-08-11 09:48:21 +02:00
bogdankostic	5c3bfad078	feat: Add page number to Documents coming from PDFConverters and PreProcessor (#2932) * Add page number to Documents coming from PDFConverters and PreProcessor * Fix mypy * Update API Docs * Update API Docs * Remove unused imports * Generate JSON schema * Generate JSON schema * Make test variable shorter * Make regex a separate function * Move counting of page breaks to a function * Generate JSON schema * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update API Documentation * Don't create instance for testing staticmethod * Update haystack/nodes/preprocessor/preprocessor.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-09 15:55:27 +02:00
Branden Chan	dfeb171686	Add API page for util functions (#2863) * Clean OpenAIAnswerGenerator docstrings * Incorporate reviewer feedback * Update Documentation & Code Style * Improve id_hash_keys description * Simplify id_hash_keys description * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-08-09 14:53:45 +02:00
Stefano Fiorucci	4a63484916	feat: Extend `TransformersQueryClassifier`: clean version (#2965) * extend query classifier in one commit * variable number of outgoing edges * improve tests * fix unused import * lightweight approach * fix _calculate_outgoing_edges * remove duplicate label validation * Remove print	2022-08-09 09:43:33 +02:00
MichelBartels	c91316e862	feat: add gradient accumulation in FARMReader (#2925) * expose gradient accumulation to train function of FARMReader * add documentation for gradient accumulation * Update Documentation & Code Style * doc string improvements Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * doc string improvements Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * doc string improvements Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-08 18:42:21 +02:00
Vladimir Blagojevic	d1f8b7118c	Add progress bar to batch run component ops (#2864) * Add progress bar to batch run component ops * Update docs * Update schema * PR review: thanks Bogdan	2022-08-08 09:32:44 -04:00
Sara Zan	1a0a4c8836	Remove pipes from code block (#2973) * Remove pipes * Generate md	2022-08-05 19:18:57 +02:00
Vladimir Blagojevic	4f8d11c591	Update Seq2SeqGenerator API documentation (#2970) * Seq2SeqGenerator - update API docs	2022-08-05 17:39:23 +02:00
Vladimir Blagojevic	762a12fcb1	Print eval reports improvements (#2941)	2022-08-04 11:21:27 -04:00
Bilge Yücel	489699bd98	Fix docs code format for sentence transformers (#2957) Co-authored-by: bilge4 <bilge@techwolf.ai>	2022-08-04 12:31:42 +02:00
Vladimir Blagojevic	368828fd4a	Component batch_size should be defined rather than Optional (#2958) * Ensure batch_size for components is defined rather than Optional * PR review - update schema	2022-08-04 12:20:28 +02:00
Francesco Castelli	1b238c880b	Generalize <sep>, <pad> and </s> tokens of QuestionGenerator node (#2769) * fixed tokens in question generation * simplified assignment * same behavior also for pad and eos * use skip_special_tokens in batch_decode * fixed black error and update docs * fixed schemas ci error * JSON schemas * Add git diff to debug schema issues * opensearch schema was missing * Add missing instruction in the workflow error message * typo	2022-08-03 18:51:34 +02:00
Zoltan Fedor	1e20818328	Ability to run Ray Serve detached (#2945) * Ability to run Ray Serve detached Fixes #2944 Ability to run Ray Serve detached - to allow running multiple instances of the app (HA). See https://docs.ray.io/en/latest/serve/package-ref.html#core-apis * Generating the docs * Re-trigger the CI pipeline * Retrigger the CI Pipeline * Typo in docstrings * Fixing docstring and typing issues * Regenerating docs * [EMPTY] Re-trigger CI * [EMPTY] Re-trigger CI * Refactoring to allow any number of args for the `serve.start()` method There seem to be additional arguments of the `serve.start()` method, so we should probably cover all of them at once, instead of only the `detached` option. * [EMPTY] Re-trigger CI * Test whether the ServeControllerClient in fact has the supplied `detached` parameter	2022-08-03 18:49:03 +02:00
Zoltan Fedor	7b97bbbff0	Extending the Ray Serve integration to allow attributes for Serve deployments (#2918) * Extending the Ray Serve integration to allow attributes for Serve deployments This closes #2917 We should be able to set Ray Serve attributes for the nodes of pipelines, like amount of GPU to use, max_concurrent_queries, etc. Now this is possible from the pipeline yaml file for each node of the pipeline. * Ran black and regenerated the json schemas * Fixing the JSON Schema generation * Trying to fix the schema CI test issue * Fixing the test and the schemas Python 3.8 was generating a different schema than Python 3.7 is creating in the CI. You MUST use Python 3.7 to generate the schemas, otherwise the CIs will fail. * Merge the two Ray pipeline test cases * Generate the JSON schemas again after `$ pip install .[all]` * Removing `haystack/json-schemas/haystack-pipeline-1.16.schema.json` This was generated by the JSON generator, but based on @ZanSara's instructions, I am removing it. * Making changes based on @ZanSara's request - the newly requested test is failing * Fixing the JSON schema generation again * Renaming `replicas` and moving it under `serve_deployment_kwargs` * add extras validation, untested * Documentation update * Black * [EMPTY] Re-trigger CI Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-08-03 16:38:22 +02:00
tstadel	2c56305ed3	Fix serialization of numpy arrays and pandas dataframes in REST API (#2838) * correct serialization of numpy arrays and pandas dataframes * Update Documentation & Code Style * set additional json_encoders globally * Update Documentation & Code Style * add tests for non primitive return types Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-08-02 09:49:32 +02:00
Massimiliano Pippi	e7627c3f8b	Use opensearch-py in OpenSearchDocumentStore (#2691) * add Opensearch extras * let OpenSearchDocumentStore use opensearch-py * Update Documentation & Code Style * fix a bug found after adding tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-07-28 10:04:49 +02:00
Zoltan Fedor	adb2b2c312	Add support for BM25 with the Weaviate document store (#2860) * Upgrading Weaviate used for testing to 1.14.1 from 1.11.0 This has also brought up an issue with one of the tests filtering for value "a". This test has started to fail, as "a" is a default stopword in Weaviate, so I have changed this test to look for value "c" instead of value "a" to get around the stopword issue. * Weaviate client upgrade From v3.3.3 to v3.6.0 * Adding BM25 Retrieval to Weaviate Weaviate now supports BM25 retrieval in experimental mode and with some limitations (like it cannot be combined with filters). This commit adds support for inverted index (BM25) querying against Weaviate. * Running Black on the recent code changes * Update Documentation & Code Style * Fixing linting issues after code changes by black * The BM25 query needs to be in all lowercase for now The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate. See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119 * Fixing method parameter docstring to highlight that they are not supported in Weaviate * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-07-27 10:07:13 +02:00


299 Commits