haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-15 00:57:12 +00:00

Author	SHA1	Message	Date
Malte Pietsch	7e79a48540	bug: reactivate benchmarks with quick fixes (#2766 ) * quick fix benchmark runs to make them work with current haystack version * fix minor typo * update readme. fix minor things to make benchmarks run again * Update Documentation & Code Style * fix typo in readme * update result files for reader and retriever querying * reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs) * change default memory allocation back to normal. add note to readme * add first indexing results * add memory to docker cmd * full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-09-20 10:22:08 +02:00
Massimiliano Pippi	9399ddf949	fix pydoc-markdown hook (#3238 )	2022-09-19 18:20:35 +02:00
Massimiliano Pippi	8fbccbda82	fix: handle Documents containing dataframes in Multilabel constructor (#3237 ) * format * fix docs	2022-09-19 14:59:20 +02:00
Malte Pietsch	3134b0d679	fix: type of `temperature` param and adjust defaults for `OpenAIAnswerGenerator` (#3073 ) * fix: type of temperature param and adjust defaults * update schema * update api docs	2022-09-16 14:11:33 +02:00
Daniel Bichuetti	df1f4205b6	feat: add public layout-base extraction support on PDFToTextConverter (#3137 ) * feat(PDFToTextConverter): add option to get text in physical layout order * test: add physical layout extraction test to PDFToTextConverter * refactor: change layout parameter attribution places * docs: manually trigger pre-commits * docs: generate new docs to comply with pydoc-markdown style	2022-09-13 16:55:21 +02:00
Bijay Gurung	21aedc644f	feat: Add option to use MultipleNegativesRankingLoss for EmbeddingRetriever training with sentence-transformers (#3164 ) * Add option to use MultipleNegativesRankingLoss Add option to use MultipleNegativesRankingLoss for EmbeddingRetriever training with sentence-transformers * Move out losses into separate retriever/_losses.py module * Remove unused import in retriever/_losses.py * Apply documentation suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-09-12 09:38:04 +02:00
Sebastian	fc07799206	feat: Updates docs and types for language param in PreProcessor (#3186 ) * Small update to language param docs in PreProcessor	2022-09-12 08:52:52 +02:00
Daniel Bichuetti	621e1af74c	refactor: improve support for dataclasses (#3142 ) * refactor: improve support for dataclasses * refactor: refactor class init * refactor: remove unused import * refactor: testing 3.7 diffs * refactor: checking meta where is Optional * refactor: reverting some changes on 3.7 * refactor: remove unused imports * build: manual pre-commit run * doc: run doc pre-commit manually * refactor: post initialization hack for 3.7-3.10 compat. TODO: investigate another method to improve 3.7 compatibility. * doc: force pre-commit * refactor: refactored for both Python 3.7 and 3.9 * docs: manually run pre-commit hooks * docs: run api docs manually * docs: fix wrong comment * refactor: change no type-checked test code * docs: update primitives * docs: api documentation * docs: api documentation * refactor: minor test refactoring * refactor: remove unused enumeration on test * refactor: remove unneeded dir in gitignore * refactor: exclude all private fields and change meta def * refactor: add pydantic comment * refactor: fix for mypy on Python 3.7 * refactor: revert custom init * docs: update docs to new pydoc-markdown style * Update test/nodes/test_generator.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-09-09 11:31:37 +02:00
Daniel Bichuetti	1a6cbca9b6	feat: add health check endpoint to rest api (#3168 ) * feat: add /health endpoint to rest api * refactor: adjust to new dir structure * fix: add new rest api dependency * docs: add new openapi schema * docs: manual black run * refactor: remove some sys-wide details * docs: minor description changes * docs: minor description changes * docs: generate openapi schemas * tests: improved tests * refactor: add cls method decorator	2022-09-08 18:24:16 +02:00
Vladimir Blagojevic	84acb6584f	Type all parameter constructors, add model_version optional parameter where applicable (#3152 )	2022-09-06 05:05:42 -04:00
Daniel Bichuetti	e1f399284f	refactor: update dependencies and remove pins (#3147 ) * refactor: remove azure-core, pydoc and hf-hub pins * fix: remove extra-comma * fix: force minimum version of azure forms recognizer * refactor: allow newer ocr libs * refactor: update more dependencies and container versions * refactor: remove extra comment * docs: pre-commit manual run * refactor: remove unnecessary dependency * tests: update weaviate container image version	2022-09-05 14:30:35 +02:00
Branden Chan	d4722c2ec5	Document FARMReader.train() evaluation report log level (#3129 ) * Mention evaluation report logging level * Mention evaluation report logging level	2022-09-01 10:58:47 +02:00
Vladimir Blagojevic	356537c883	Standardize devices parameter and device initialization (#3062 ) * Use devices parameter and initialize devices consistently	2022-08-31 15:30:31 +02:00
Julian Risch	f010a17f04	increase version to next release candidate (#3115 )	2022-08-29 17:05:44 +02:00
Julian Risch	4e518cdddd	chore: increase version for 1.8 release (#3109 ) * increase version for 1.8 release * ignore missing-timeout for pylint	2022-08-26 15:00:14 +02:00
Julian Risch	3e3ff33cdd	feat: add batch evaluation method for pipelines (#2942 ) * add basic pipeline.eval_batch for qa without filters * black formatting * pydoc-markdown * remove batch eval tests failing due to bugs * remove comment * explain commented out tests * avoid code duplication * black * mypy * pydoc markdown * add batch option to execute_eval_run * pydoc markdown * Apply documentation suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply documentation suggestion from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * add documentation based on review comments * black * black * schema updates * remove duplicate tests * add separate method for column reordering * merge _build_eval_dataframe methods * pylint ignore in function * change type annotation of queries to list only * one-liner addressing review comment on params dict * markdown files updated Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-25 17:50:57 +02:00
Julian Risch	cc9d39c360	increase version to next release candidate (#3100 )	2022-08-25 15:55:34 +02:00
Julian Risch	0950db5032	chore: increase version to 1.7.2 for patch release (#3097 ) * schema update * schema update audio nodes * schema update audio param type	2022-08-25 13:55:28 +02:00
Sebastian	0cf0568dd0	fix: Use use_auth_token in all cases when loading from the HF Hub (#3094 ) * Making sure to pass on use_auth_token to all from_pretrained calls	2022-08-25 10:30:03 +02:00
Sara Zan	e92ea4fccb	refactor: rename `master` into `main` in documentation and links (#3063 ) * master->main * revert master rename * Revert change to sphinx link and rename master schema	2022-08-24 19:05:12 +02:00
tstadel	92046ce5b5	feat: FAISS in OpenSearch: Support HNSW for dot product and l2 (#3029 ) * support faiss hnsw * blacken * update docs * improve similarity check * add tests * update schema * set ef_search param correctly * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * regenerate docs Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-24 16:43:48 +02:00
James Briggs	9b1b03002f	update to PineconeDocumentStore to remove dependency on SQL db (#2749 ) * update to PineconeDocumentStore to remove dependency on SQL db * Update Documentation & Code Style * typing fixes * Update Documentation & Code Style * fixed embedding generator to yield Documents * Update Documentation & Code Style * fixes for final typing issues * fixes for pylint * Update Documentation & Code Style * uncomment pinecone tests * added new params to docstrings * Update Documentation & Code Style * Update Documentation & Code Style * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update Documentation & Code Style * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * Update haystack/document_stores/pinecone.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com> * changes based on comments, updated errors and install * Update Documentation & Code Style * mypy * implement simple filtering in pinecone mock * typo * typo in reverse * account for missing meta key in filtering * typo * added metadata filtering to describe index * added handling for users switching indexes in same doc store, and handling duplicate docs in write * syntax tweaks * added index option to document/embedding count calls * labels implementation in progress * added metadata fields to be indexed for pinecone tests * further changes to mock * WIP implementation of labels+multilabels * switched to rely on labels namespace rather than filter * simpler delete_labels * label fixes, remove debug code * Apply docstring fixes Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * pylint * docs * temporarily un-mock Pinecone * Small Pinecone test suite * pylint * Add fake test key to pass the None check * Add again fake test key to pass the None check * Add Pinecone to default docstores and fix filters * Fix field name * Change field name * Change field value * Remove comments * forgot to upgrade pyproject.toml Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sarazanzo94@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-24 13:27:15 +02:00
Branden Chan	6d4031d8f6	Add OpenAI Answer Generator API (#3050 ) * Add OpenAI Answer Generator API * Regen tutorials * Regen md files * Incorporate reviewer feedback * Incorporate reviewer feedback * Incorporate reviewer feedback * Incorporate reviewer feedback	2022-08-24 09:20:08 +02:00
Sebastian	3ea57801ae	feat: Early stopping can be used in Reader and Retriever training (#3071 ) * Add option to set early stopping in training * Moved EarlyStopping to haystack/utils/early_stopping.py and added EarlyStopping to training Dense retrievers.	2022-08-23 14:18:12 +02:00
Daniel Bichuetti	d715d0202d	fix: update ChromeDriver options on restricted environments and add ChromeDriver options as function parameter (#3043 ) * Fix when env does not exist * Fix missed line * Set conservative chromedriver options * Set default options based on environment * Fix removed line * Updated documentation * Generate new schemas manually * Add arguments via iterator and helper function * Pre-push doc format * Use imported Option vs full namespace access * Manually update schema * Manually add documentation and schema * Fix language and documentation * Fix typo * Auto generated docs * Updated documentation	2022-08-22 12:59:33 +02:00
Daniel Bichuetti	d5e36ce6b4	fix(translator): write translated text to output documents, while keeping input untouched (#3077 ) * Set translated text on a copy of original document * Return new translated list * Manually generated docs TODO: check pre-commit * Hook generated file * Rename variables for better maintenance * fix(translator): prevent inputs from being changed * fix: manual update translator docs * style(translator): explicit type declaration on List * docs(translator): re-run pre-commit hook * style(translator): ignore mypy wrong type check * docs(translator): re-run pre-commit hook	2022-08-22 04:07:05 -04:00
Julian Risch	bc6f71b5ba	chore: increase version to next release candidate (#3067 ) * increase version to next release candidate * generate schema files	2022-08-19 14:49:50 +02:00
Julian Risch	eb0f0da0fd	Prepare 1.7.1 release (#3061 ) * prepare 1.7.1 release * Fix schemas * Update haystack/json-schemas/haystack-pipeline-1.7.1.schema.json Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * change back main to master * remove newline at end of file * generate schema file with no newline Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-19 13:24:40 +02:00
tstadel	1027ab3624	Bump Version to 1.7.1rc (#3041 ) * bump version to 1.7.1rc * update openapi	2022-08-18 10:31:57 +02:00
tstadel	baefd32b6f	Upgrade to v1.7.0 and copy docs folder (#3014 ) * update version to 1.7.0 * copy docs * update openapi * generate schemas * make update_json_schema() idempotent * update docs, schema and openapi	2022-08-15 14:20:30 +02:00
Julian Risch	d61755322f	chore: fix typo in API docs (#3023 ) * chore: fix typo in API docs * fix openapi Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-08-15 13:25:20 +02:00
tstadel	0aa0c68785	Fix broken `MultiLabel` serialization (#3037 ) * Fix MultiLabel serialization * update docs * better comment * remove unused imports * remove unused imports (2)	2022-08-15 13:09:18 +02:00
Branden Chan	ff38a20863	docs: update File Classifier Docstring (#3018 ) * Update docstring * Trigger pre-commit hook * Trigger pre-commit hook * Incorporate reviewer feedback * Incorporate reviewer feedback	2022-08-15 12:37:28 +02:00
Branden Chan	7312f99584	Update Summarizer Docs (#3032 ) * Change text to content * Change text to content	2022-08-15 12:35:41 +02:00
bogdankostic	3a849d6c07	bug: Make `TranslationWrapperPipeline` work with `QuestionAnswerGenerationPipeline` (#3034 ) * Overwrite output_translator's run method with run_batch * Fix mypy * Revert change * Overwrite run method only with QuestionAnswerGenerationPipeline	2022-08-15 10:05:34 +02:00
Dmitry Goryunov	da7836a931	feat: Support embedding dimensions on DeepsetCloudDocumentStore (#2995 ) * Add embedding_dim to dc store * Remove similarity from query params, it is not used * Remove unused `return_embedding` parameter * Remove unused param * Update the documentation * Update schemas * Revert openapi changes * Revert openapi changes * Fix openapi * Fix json schema * Improve docstrings Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Improve logs Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update the docs * Fix similarity Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-12 11:46:52 +02:00
James Briggs	26c938a8e6	test: add meta fields for meta_config to be used during testing (#3021 ) * added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore * Add documentation on metadata filtering in docstring * docs Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-12 10:27:56 +02:00
Sebastian	44e2b1beed	Resolving issue 2853: no answer logic in FARMReader (#2856 ) * Update FARMReader.eval_on_file to be consistent with FARMReader.eval * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-08-11 16:45:03 +02:00
Zoltan Fedor	408d8e6ff5	Enable the `JoinDocuments` node to work with documents with `score=None` (#2984 ) * Enable the `JoinDocuments` node to work with documents with `score=None` This fixes #2983 As of now, the `JoinDocuments` node will error out if any of the documents has `score=None` - which is possible, as some retrievers are not able to provide a score, like the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate. The reason for the error is that the `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`. There was a very similar issue for `JoinAnswers` too, which was addressed by this PR: https://github.com/deepset-ai/haystack/pull/2436 This solution applies the same approach to `JoinDocuments` - so both the `JoinAnswers` and `JoinDocuments` now will have the same additional argument to disable sorting when that is required. The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which allows the user to turn off the sorting of documents by score, but keeps the current functionality of sorting being performed as the default. * Fixing test bug * Addressing PR review comments - Extending unit tests - Simplifying logic * Making the sorting work even with no scores By making the no score being sorted as -Inf * Forgot to commit the change in `join_docs.py` * [EMPTY] Re-trigger CI * Added an INFO log if the `JoinDocuments` is sorting while some of the docs have `score=None` * Adjusting the arguments of `any()` * [EMPTY] Re-trigger CI	2022-08-11 10:43:25 +02:00
Massimiliano Pippi	2cd65e99b8	revert Remove pipes (#3006 )	2022-08-11 10:42:22 +02:00
Zoltan Fedor	f4128d3581	Adding support for additional distance/similarity metrics for Weaviate (#3001 ) * Adding support for additional distance metrics for Weaviate Fixes #3000 * Updating the docs * Fixing error texts * Fixing issues raised by the review * Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity` * [EMPTY] Re-trigger CI * Fixing things based on review * [EMPTY] Re-trigger CI	2022-08-11 09:48:21 +02:00
bogdankostic	5c3bfad078	feat: Add page number to Documents coming from PDFConverters and PreProcessor (#2932 ) * Add page number to Documents coming from PDFConverters and PreProcessor * Fix mypy * Update API Docs * Update API Docs * Remove unused imports * Generate JSON schema * Generate JSON schema * Make test variable shorter * Make regex a separate function * Move counting of page breaks to a function * Generate JSON schema * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update API Documentation * Don't create instance for testing staticmethod * Update haystack/nodes/preprocessor/preprocessor.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-09 15:55:27 +02:00
Branden Chan	dfeb171686	Add API page for util functions (#2863 ) * Clean OpenAIAnswerGenerator docstrings * Incorporate reviewer feedback * Update Documentation & Code Style * Improve id_hash_keys description * Simplify id_hash_keys description * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-08-09 14:53:45 +02:00
Stefano Fiorucci	4a63484916	feat: Extend `TransformersQueryClassifier`: clean version (#2965 ) * extend query classifier in one commit * variable number of outgoing edges * improve tests * fix unused import * lightweight approach * fix _calculate_outgoing_edges * remove duplicate label validation * Remove print	2022-08-09 09:43:33 +02:00
MichelBartels	c91316e862	feat: add gradient accumulation in FARMReader (#2925 ) * expose gradient accumulation to train function of FARMReader * add documentation for gradient accumulation * Update Documentation & Code Style * doc string improvements Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * doc string improvements Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * doc string improvements Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-08-08 18:42:21 +02:00
Vladimir Blagojevic	d1f8b7118c	Add progress bar to batch run component ops (#2864 ) * Add progress bar to batch run component ops * Update docs * Update schema * PR review: thanks Bogdan	2022-08-08 09:32:44 -04:00
Sara Zan	1a0a4c8836	Remove pipes from code block (#2973 ) * Remove pipes * Generate md	2022-08-05 19:18:57 +02:00
Vladimir Blagojevic	4f8d11c591	Update Seq2SeqGenerator API documentation (#2970 ) * Seq2SeqGenerator - update API docs	2022-08-05 17:39:23 +02:00
Vladimir Blagojevic	762a12fcb1	Print eval reports improvements (#2941 )	2022-08-04 11:21:27 -04:00
Bilge Yücel	489699bd98	Fix docs code format for sentence transformers (#2957 ) Co-authored-by: bilge4 <bilge@techwolf.ai>	2022-08-04 12:31:42 +02:00

306 Commits