haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-22 00:11:14 +00:00

Author	SHA1	Message	Date
nickchomey	e6767fccef	bugfix for TranslationWrapperPipeline (#3290 ) * bugfix for TranslationWrapperPipeline * Update standard_pipelines.py * Update haystack/pipelines/standard_pipelines.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-10-04 09:44:48 +02:00
Jeff Risberg	ad8fbe56ee	bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node (#3170 ) * don't send the list of inputs back as an output in the running of a node. * updated documentation * Update pydoc-markdown.py * added test case for pipeline join fix Co-authored-by: JeffRisberg <jrisberg@aol.com>	2022-09-30 13:27:17 +02:00
Stefano Fiorucci	e2e6887ee8	Improve TransformersDocumentClassifier tests (#3270 )	2022-09-27 13:25:34 +02:00
Taner Topal	24d4591307	docs: Fix a docstring in ray.py	2022-09-27 09:05:04 +02:00
Vladimir Blagojevic	9582a423a2	fix: ONNX FARMReader model conversion is broken (#3211 )	2022-09-26 09:18:12 -04:00
Stefano Fiorucci	b579b9d54a	bug: make `ElasticSearchDocumentStore` use `batch_size` in `get_documents_by_id` (#3166 ) * use batch_size * try to fix git mess * improve docstrings * fix	2022-09-26 13:21:59 +02:00
Vladimir Blagojevic	9ca3ccae98	fix:MostSimilarDocumentsPipeline doesn't have pipeline property (#3265 ) * Add comments and a unit test * More unit tests for MostSimilarDocumentsPipeline	2022-09-23 09:46:48 -04:00
Vladimir Blagojevic	eba7cf51b1	chore: Remove Update API documentation hook (#3271 ) * Remove Update API documentation hook * Remove .github/utils/pydoc-markdown.py file	2022-09-23 08:54:08 -04:00
tstadel	05a86b9d3d	feat: FAISS in OpenSearch: Support HNSW for cosine (#3217 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * split PR * update docs * manually revert tutorial doc change * Fix embedding type * set integration marker correctly * make BaseDocumentStore.normalize_embedding static * format * fix handling of opensearch_faiss param * fix merge * add DenseRetriever typing * organize imports in conftest.py * organize imports in conftest.py (2) * fix DenseRetriever import * add opensearch-tests-linux	2022-09-23 13:26:49 +02:00
tstadel	4fa9d2d8e7	Fix milvus and faiss tests not running (#3263 ) * fix milvus and faiss tests not running * fix schema manually * fix test_dpr_embedding test for milvus * pip freeze on milvus tests * fix milvus1 tests being executed: fix all_doc_stores order * Revert "pip freeze on milvus tests" This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab. * make infer_required_doc_store more robust * don't skip tests without docstore requirements * use markers for docstore tests	2022-09-22 17:46:49 +02:00
Massimiliano Pippi	2b803a265b	run checks on release branches (#3267 )	2022-09-22 16:25:34 +02:00
Vladimir Blagojevic	820742cac7	Fix schema for 1.10.x (#3269 )	2022-09-22 15:20:51 +02:00
tstadel	b10e2c392e	chore: add `DenseRetriever` abstraction (#3252 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * update docs * fix imports * import DenseRetriever normally * update docs * fix deepcopy of documents * update schema * Revert "update schema" This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f. * fix schema for ci manually	2022-09-21 19:08:54 +02:00
Branden Chan	492a8046d8	docs: sync Haystack API with Readme (#3223 ) * First pass at syncing Haystack API with Readme * Reapply changes * Regularize slugs * Regularize slugs * Regularize slugs * Set category id and regen * Trigger workflow * Delete old md files * Test sync * Undo test string * Incorporate reviewer feedback * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Change name of pydoc-markdown scripts * Test on the fly API generation and sync * Remove version tag * Test version tag * Test version tag * Test version tag * Revert test docstring * Revert md file changes * Revert md file changes * Revert script naming * Test on the fly generation and sync * Adjust for on the fly generation and sync * Revert test string * Remove old documentation workflow * Set workflow to work on main * Change readme version name	2022-09-21 17:18:34 +02:00
Massimiliano Pippi	8f76d64f6f	chore: bump release number for unstable version (#3251 ) * bump version for unstable * allow generation of rc schemas * update schemas	2022-09-21 16:58:06 +02:00
Vladimir Blagojevic	938e6fda5b	Classify pipeline's type based on its components (#3132 ) * Add pipeline get_type mehod * Add pipeline uptime * Add pipeline telemetry event sending * Send pipeline telemetry once a day (at most) * Add pipeline invocation counter, change invocation counter logic * Update allowed telemetry parameters - allow pipeline parameters * PR review: add unit test	2022-09-21 14:53:42 +02:00
Stefano Fiorucci	89247b804c	refactor: make `TransformersDocumentClassifier` output consistent between different types of classification (#3224 ) * make output consistent * make output consistent * added tests for details * better tests * Update test_document_classifier.py * make black happy * Update test_document_classifier.py * Update test_document_classifier.py	2022-09-21 13:16:03 +02:00
Massimiliano Pippi	15bb6c2ea2	remove tutorials from the repo (#3244 )	2022-09-20 18:32:45 +02:00
Tuana Celik	336c144e72	chore: updating colab links in older docs versions (#3250 ) * updating colab links to tutorial 1 * remaining tutorials Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-09-20 18:15:29 +02:00
Vladimir Blagojevic	fe31896fcb	Proper retrieval of answers for batch eval (#3245 ) * Proper retrieval of answers and documents for batch eval	2022-09-20 08:16:03 -04:00
Malte Pietsch	7e79a48540	bug: reactivate benchmarks with quick fixes (#2766 ) * quick fix benchmark runs to make them work with current haystack version * fix minor typo * update readme. fix minor things to make benchmarks run again * Update Documentation & Code Style * fix typo in readme * update result files for reader and retriever querying * reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs) * change default memory allocation back to normal. add note to readme * add first indexing results * add memory to docker cmd * full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-09-20 10:22:08 +02:00
Massimiliano Pippi	9399ddf949	fix pydoc-markdown hook (#3238 )	2022-09-19 18:20:35 +02:00
Sara Zan	dcb132ba59	chore: remove f-strings from logs for performance reasons (#3212 ) * Use the %s syntax on all debug messages * Use the %s syntax on some more debug messages * Use the %s syntax on info messages * Use the %s syntax on warning messages * Use the %s syntax on error and exception messages * mypy * pylint * trogger tutorials execution in CI * trigger tutorials execution on CI * black * remove embeddings from repr * fix Document `__repr__` * address feedback * mypy	2022-09-19 18:18:32 +02:00
Massimiliano Pippi	8fbccbda82	fix: handle Documents containing dataframes in Multilabel constructor (#3237 ) * format * fix docs	2022-09-19 14:59:20 +02:00
banjocustard	19af6f4e40	bug: fix pdftotext installation verification (#3233 )	2022-09-19 11:32:58 +02:00
Massimiliano Pippi	859c303c16	include fontconfig in the final image and fix tagging (#3230 )	2022-09-16 15:33:24 +02:00
Malte Pietsch	3134b0d679	fix: type of `temperature` param and adjust defaults for `OpenAIAnswerGenerator` (#3073 ) * fix: type of temperature param and adjust defaults * update schema * update api docs	2022-09-16 14:11:33 +02:00
Massimiliano Pippi	4ddeb7b14b	chore: fix Windows CI (#3222 ) * replicate issue * pin openjdk version * not sure it's needed	2022-09-16 13:08:30 +02:00
nickchomey	42c963f54b	Update rest_api Docker Compose yamls for recent refactoring of rest_api (#3197 ) * update rest_api yamls for recent refactoring * Update docker-compose.yml	2022-09-15 19:47:40 +02:00
Anam Saatvik Reddy	f50b496f03	bug: fix embedding_dim mismatch in DocumentStore (#3183 ) * match index dim with embed dim (deepset-ai#3090) * aligned messages across all docstores * aligned messages across all docstores (deepset-ai#3090) * aligned messages across all docstores (deepset-ai#3090)	2022-09-15 15:23:53 +02:00
Sara Zan	768583d00c	chore: disable Windows ES tests on CI (#3220 ) * disable Windows ES tests * Add comments	2022-09-15 15:18:29 +02:00
Daniel Bichuetti	df1f4205b6	feat: add public layout-base extraction support on PDFToTextConverter (#3137 ) * feat(PDFToTextConverter): add option to get text in physical layout order * test: add physical layout extraction test to PDFToTextConverter * refactor: change layout parameter attribution places * docs: manually trigger pre-commits * docs: generate new docs to comply with pydoc-markdown style	2022-09-13 16:55:21 +02:00
Kristof Herrmann	da1cc577ae	feat: exponential backoff with exp decreasing batch size for opensearch client (#3194 ) * Validate custom_mapping properly as an object * Remove related test * black * feat: exponential backoff with exp dec batch size * added docstring and split doc lsit * fix * fix mypy * fix * catch generic exception * added test * mypy ignore * fixed no attribute * added test * added tests * revert strange merge conflicts * revert merge conflict again * Update haystack/document_stores/elasticsearch.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * done * adjust test * remove not required caplog * fixed comments Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-09-13 14:30:30 +01:00
Sara Zan	b47c93989b	remove imports redirect (#3204 )	2022-09-13 11:16:39 +01:00
Sara Zan	49b1c8856e	test: lower low boundary for accuracy in `test_calculate_context_similarity_on_non_matching_contexts` (#3199 ) * Change min value * revert test change and pin rapidfuzz<2.8.0 * duplicate	2022-09-13 09:32:38 +02:00
Massimiliano Pippi	64b0c43885	refactoring: reimplement Docker strategy (#3162 ) * setup base images * add cpu flavor * use the same Dockerfile for cpu and gpu * better naming, add docs * add docker workflow * add missing image input * change cwd for bake * also push api images * try conditional tagging for releases * revert testing code * update docker readme * document variable override * use Python 3.10 * allow empty HAYSTACK_EXTRAS * Apply suggestions from code review Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * remove repo description step, can't make it work so far * add docs to the last step as it's tricky * manage tags for the newest images * tests are passing, checking in the last bit Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-09-12 16:33:56 +02:00
Bijay Gurung	21aedc644f	feat: Add option to use MultipleNegativesRankingLoss for EmbeddingRetriever training with sentence-transformers (#3164 ) * Add option to use MultipleNegativesRankingLoss Add option to use MultipleNegativesRankingLoss for EmbeddingRetriever training with sentence-transformers * Move out losses into separate retriever/_losses.py module * Remove unused import in retriever/_losses.py * Apply documentation suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-09-12 09:38:04 +02:00
Sebastian	fc07799206	feat: Updates docs and types for language param in PreProcessor (#3186 ) * Small update to language param docs in PreProcessor	2022-09-12 08:52:52 +02:00
Sara Zan	96bb9b5905	bug: validate `custom_mapping` as an object (#3189 ) * Validate custom_mapping properly as an object * Remove related test * black	2022-09-09 18:03:29 +02:00
Daniel Bichuetti	621e1af74c	refactor: improve support for dataclasses (#3142 ) * refactor: improve support for dataclasses * refactor: refactor class init * refactor: remove unused import * refactor: testing 3.7 diffs * refactor: checking meta where is Optional * refactor: reverting some changes on 3.7 * refactor: remove unused imports * build: manual pre-commit run * doc: run doc pre-commit manually * refactor: post initialization hack for 3.7-3.10 compat. TODO: investigate another method to improve 3.7 compatibility. * doc: force pre-commit * refactor: refactored for both Python 3.7 and 3.9 * docs: manually run pre-commit hooks * docs: run api docs manually * docs: fix wrong comment * refactor: change no type-checked test code * docs: update primitives * docs: api documentation * docs: api documentation * refactor: minor test refactoring * refactor: remova unused enumeration on test * refactor: remove unneeded dir in gitignore * refactor: exclude all private fields and change meta def * refactor: add pydantic comment * refactor : fix for mypy on Python 3.7 * refactor: revert custom init * docs: update docs to new pydoc-markdown style * Update test/nodes/test_generator.py Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-09-09 11:31:37 +02:00
Daniel Bichuetti	1a6cbca9b6	feat: add health check endpoint to rest api (#3168 ) * feat: add /health endpoint to rest api * refactor: adjust to new dir structure * fix: add new rest api dependency * docs: add new openapi schema * docs: manual black run * refactor: remove some sys-wide details * docs: minor description changes * docs: minor description changes * docs: generate openapi schemas * tests: improved tests * refactor: add cls method decorator	2022-09-08 18:24:16 +02:00
Vladimir Blagojevic	e0d73f3ae0	Replace torch.device(cuda) with torch.device(cuda:0) in devices initialization (#3184 )	2022-09-08 09:36:38 -04:00
Vladimir Blagojevic	20880c9d41	Add 15 min timeout for downloading cached HF models (#3179 )	2022-09-07 08:35:09 -04:00
Sebastian	62e7c19011	fix: Reduce GPU to CPU copies at inference (#3127 ) * Send matrix from gpu to cpu once instead of individual elements * Moved location of if statement so it would be triggered only when needed. Provides very modest speedup for large top_k_per_sample	2022-09-07 11:00:05 +02:00
Steven Haley	9a750f7032	docs: Fix the word length splitting; should be set to 100 not 1,000 (#3133 ) * Fix the word length splitting; should be set to 100 not 1,000 due to limitations of transformer models * Update documentation for tutorial change	2022-09-07 10:57:54 +02:00
Vladimir Blagojevic	84acb6584f	Type all parameter constructors, add model_version optional parameter where applicable (#3152 )	2022-09-06 05:05:42 -04:00
Sebastian	20c2320434	Fix for torch device (#3161 )	2022-09-06 09:03:52 +02:00
Massimiliano Pippi	6790eaf7d8	refactor: update package strategy in rest_api (#3148 ) * update packaging * fix author metadata * add newline * add empty readme * fix path to pipeline files * fix pylint job * fix metadata	2022-09-05 16:58:43 +02:00
Massimiliano Pippi	e2110644c4	docs: add tests types to CONTRIBUTING.md (#3158 ) * Update CONTRIBUTING.md Add the outcome of #2811 to the developers docs Ideally, newly added tests will follow those requirements while we progressively adapt the existing tests to the new model. * address review comments	2022-09-05 16:56:48 +02:00
Daniel Bichuetti	e1f399284f	refactor: update dependencies and remove pins (#3147 ) * refactor: remove azure-core, pydoc and hf-hub pins * fix: remove extra-comma * fix: force minimum version of azure forms recognizer * refactor: allow newer ocr libs * refactor: update more dependencies and container versions * refactor: remove extra comment * docs: pre-commit manual run * refactor: remove unnecessary dependency * tests: update weaviate container image version	2022-09-05 14:30:35 +02:00

3803 Commits