haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-29 16:36:34 +00:00

Author	SHA1	Message	Date
Massimiliano Pippi	a9a4156731	[Weaviate] Exit the while loop when we query less documents than available (#2537 ) * exit the while loop when we query less documents than available in Weaviate * use monkeypatch fixture, remove unused markers * we know key is there, use brackets to get the value * use custom exception * add warning message when we hit the QUERY_MAXIMUM_RESULTS problem * restore pytest marker * removed unused import * make the warning message more clear	2022-05-20 09:07:03 +02:00
Sara Zan	fd2ca359fe	Validation for Ray pipelines (#2545 ) * Ray pipelines now validate * Update Documentation & Code Style * rename Ray pipeline in tests * Add extras:ray to the test pipeline * pylint Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-19 19:40:03 +02:00
Sara Zan	89bb1ca139	[CI refactoring] Improve `autoformat.yml` (#2556 ) * Restructure autoformat to run a single script * Reduce diff for autoforma.yml * Reduce diff on linux_ci.yml	2022-05-18 20:02:43 +02:00
tstadel	f6e3a63906	Prevent losing names of utilized components when loaded from config (#2525 ) * Prevent losing names of utilized components when loaded from config * Update Documentation & Code Style * update test * fix failing tests * Update Documentation & Code Style * fix even more tests * Update Documentation & Code Style * incorporate review feedback Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-18 14:17:54 +02:00
tstadel	110b9c2b0a	Warnings for write operations of `DeepsetCloudDocumentStore` (#2565 ) * log inputs to write operations * Update Documentation & Code Style * adjust tests * simplify by using decorator for write operation functions * Update Documentation & Code Style * fix comma * fix comma in test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-17 17:53:55 +02:00
Stefano Fiorucci	686a19b35d	added launch_tika method (#2567 ) * added launch_tika method * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-17 17:53:04 +02:00
Julian Risch	5a1e98e3ff	Update scriptrunner module path for streamlit ui (#2566 ) * Pin streamlit version to <1.9.0 * update scriptrunner module path for streamlit ui	2022-05-17 16:06:44 +02:00
Julian Risch	70ca1e9fc6	Smaller demo instance type (#2564 ) This PR changes the instance type of the public Haystack demo from p3.2xlarge to g4dn.2xlarge. g4dn.2xlarge has 1 GPU, 8 vCPUs, 32 GiB of memory p3.2xlarge had 1 GPU, 8 vCPUs, 61 GiB of memory which results in 75% lower costs with g4dn.2xlarge. I also tried out the even smaller g4dn.xlarge, which has 1 GPU, 4 vCPUs, 16 GiB of memory. However, the memory was not enough to run the demo. I tried out multiple requests at the same time and it worked well with g4dn.2xlarge. Requests are slightly slower as with the more powerful instance type but it's hard to notice.	2022-05-17 12:47:15 +02:00
MichelBartels	a952ba240f	Include meta data when computing embeddings in EmbeddingRetriever (#2559 ) * include meta data when calculating embeddings in EmbeddingRetriever * Update Documentation & Code Style * fix None meta field * remove default values * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-17 12:37:04 +02:00
Sara Zan	ff4303c51b	[CI refactoring] Categorize tests into folders (#2554 ) * Categorize tests into folders * Fix linux_ci.yml and an import * Wrong path	2022-05-17 09:55:53 +01:00
Sara Zan	81223f8cd1	[CI refactoring] Avoid `ray==1.12.0` on Windows (#2562 ) * Avoid ray 1.12.0 on windows (bug) * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-17 09:55:16 +01:00
MichelBartels	686e9d24ef	Documenting output score of JoinDocuments when using concatenation (#2561 ) * add documentation regarding the score of JoinDocuments when using concatenation * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-16 18:30:07 +02:00
Ivan Lopez	a2a99f79b1	Fix docker image tag with semantic version for releases (#2548 ) * Fix docker tag with semantic version for releases * Prepend latest docker tag with tagprefix in cache-from	2022-05-16 13:26:33 +02:00
ClaMnc	2b11981b08	set top_k to 5 in SAS to be consistent (#2550 ) * set top_k to 5 in SAS to be consistent * set top_k to 5 in SAS to be consistent	2022-05-16 10:29:03 +02:00
Sara Zan	00aa1f41d7	convert_files_to_docs typo (#2546 )	2022-05-13 16:38:43 +02:00
Agnieszka Marzec	2d03a26045	Minor lg changes (#2533 ) * Minor lg change * Update Documentation & Code Style * Fix missing articles * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-13 16:12:22 +02:00
Agnieszka Marzec	1ae5a1449b	Update run() and run_batch() params descriptions in API (#2542 ) * Update run() and run_batch() params descriptions * Update Documentation & Code Style * Update api params descriptions * Update Documentation & Code Style * Fix typo Co-authored-by: bogdankostic <bogdankostic@web.de> * Add Bogdan's suggestions * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-05-13 15:11:01 +02:00
bogdankostic	300ee1ac83	Upgrade torch version to 1.11 (#2538 ) * Bump torch version * Upgrade torch version in torch-scatter	2022-05-13 14:45:53 +02:00
MichelBartels	4f22942cb0	Handle transformers pipeline flattening lists of length 1 (#2531 ) * Handle transformers pipeline flattening lists of length 1 * Consider case where only one document is passed * Change position of fix * Make use of top_k_per_candidate in predict method * Fix predict method Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-05-12 16:11:38 +02:00
tstadel	771ed0bb1d	Remove wrong retriever top_1 metrics from `print_eval_report` (#2510 ) * remove wrong retriever top_1 metrics * Update Documentation & Code Style * don't show wrong examples frame when n_wrong_examples is 0 * Update Documentation & Code Style * Update Documentation & Code Style * only use farm reader during eval tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-12 12:34:11 +02:00
bogdankostic	738e008020	Add `run_batch` method to all nodes and `Pipeline` to allow batch querying (#2481 ) * Add run_batch methods for batch querying * Update Documentation & Code Style * Fix mypy * Update Documentation & Code Style * Fix mypy * Fix linter * Fix tests * Update Documentation & Code Style * Fix tests * Update Documentation & Code Style * Fix mypy * Fix rest api test * Update Documentation & Code Style * Add Doc strings * Update Documentation & Code Style * Add batch_size as attribute to nodes supporting batching * Adapt error messages * Adapt type of filters in retrievers * Revert change about truncation_warning in summarizer * Unify multiple_doc_lists tests * Use smaller models in extractor tests * Add return types to JoinAnswers and RouteDocuments * Adapt return statements in reader's run_batch method * Allow list of filters * Adapt error messages * Update Documentation & Code Style * Fix tests * Fix mypy * Adapt print_questions * Remove disabling warning about too many public methods * Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py * Add type check * Update Documentation & Code Style * Adapt tutorial 11 * Update Documentation & Code Style * Add query_batch method for DCDocStore * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-11 11:11:00 +02:00
bogdankostic	5378a9ab48	Fix tutorials 4, 7 and 8 (#2526 ) * Fix tutorials 4, 7 and 8 * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-11 09:17:05 +02:00
bogdankostic	4581b91e83	Make `DeepsetCloudDocumentStore` work with non-existing index (#2513 ) * Make DeepsetCloudDocumentStore work with non-existing index * Update Documentation & Code Style * Add tests * Update Documentation & Code Style * Fix tests, adapt warning messages + lowercase deepset * Update Documentation & Code Style * Fix typo in test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-10 15:21:35 +02:00
Massimiliano Pippi	7595bb49ab	rearrange contributing guidelines (#2515 ) * rearrange contributing guidelines * revert unneeded change to README	2022-05-10 14:59:46 +02:00
Branden Chan	43bfea6f3d	Add sort arg to JoinAnswers (#2436 ) * Add sort arg to JoinAnswers * Update Documentation & Code Style * Change naming and docstring * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-10 11:47:00 +02:00
Sara Zan	15a9ff6f67	PR template mention of enabling Actions (#2523 ) * Update version to 1.4.1rc0 * Add hint of enabling action on the fork in the PR template * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-10 09:46:09 +02:00
Sara Zan	3d8bdf3cb6	Remove safe import from `ElasticsearchDocumentStore` (#2522 ) * Update version to 1.4.1rc0 * Elasticsearch is not an optional dependency * Fix import path * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-09 18:07:42 +02:00
Gabriel Altay	988568882a	fix small typo in Document doc string (#2520 ) * fix small typo in Document doc string Was going through the tutorial, then digging through the code and just noticed a small typo * generate markdown file changes from docstrings Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2022-05-09 18:04:33 +02:00
bogdankostic	bce84577c6	Upgrade transformers version to 4.18.0 (#2514 ) * Upgrade transformers version to 4.18.0 * Adapt tokenization test to upgrade * Adapt tokenization test to upgrade	2022-05-06 16:57:13 +02:00
Branden Chan	caf1336424	Adjust pydoc markdown config so methods shown with classes (#2511 ) * add_member_class_prefix: true * Update Documentation & Code Style * Trigger redeploy * Trigger redeploy * Fix pydoc param * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-06 16:00:08 +02:00
Sara Zan	1ed407cb5a	Update version to 1.4.1rc0 (#2509 ) * Update version to 1.4.1rc0 * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-06 11:46:31 +02:00
Julian Risch	081b886aa1	Release v1.4.0 (#2502 ) * delete unneeded files of last release * add v1.4.0 docs with updated links * upgrade version number * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> v1.4.0	2022-05-05 12:24:45 +02:00
Sara Zan	f3e0ba4be9	Fix `OpenSearchDocumentStore`'s `__init__` (#2498 ) * Move super in OpenSearchDocumentStore and add small test * Update Documentation & Code Style * Add Opensearch container to the CI Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-05 10:38:09 +02:00
MichelBartels	c7e39e5225	Replace TableTextRetriever with EmbeddingRetriever in Tutorial 15 (#2479 ) * replace TableTextRetriever with EmbeddingRetriever in Tutorial 15 * Update Documentation & Code Style * fix bug * Update Documentation & Code Style * update tutorial 15 outputs Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-20-212.eu-west-1.compute.internal>	2022-05-05 10:12:44 +02:00
MichelBartels	5d98810a17	Raise error if torch-scatter is not installed or wrong version is installed (#2486 ) * automatically download correct torch-scatter version * raise error if torch-scatter is not installed * Update Documentation & Code Style * catch all import errors and fix linter * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-05 10:12:10 +02:00
Julian Risch	1418f0c603	change milvus links from 2.0.0 to 2.0.x (#2496 ) * change milvus links from 2.0.0 to 2.0.x * Update Documentation & Code Style * fix two broken links * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-04 18:30:50 +02:00
Julian Risch	fa277bcea8	Upgrade xpdf to 4.04 in Exception text (#2488 ) * Upgrade xpdf to 4.04 in Exception text * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-04 17:42:32 +02:00
Sara Zan	f8e02310bf	Validate YAML files without loading the nodes (#2438 ) * Remove BasePipeline and make a module for RayPipeline * Can load pipelines from yaml, plenty of issues left * Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it * Fix pipeline tests * Move some tests out of test_pipeline.py and create MockDenseRetriever * myoy and pylint (silencing too-many-public-methods) * Fix issue found in some yaml files and in schema files * Fix paths to YAML and fix some typos in Ray * Fix eval tests * Simplify MockDenseRetriever * Fix Ray test * Accidentally pushed merge coinflict, fixed * Typo in schemas * Typo in _json_schema.py * Slightly reduce noisyness of version validation warnings * Fix version logs tests * Fix version logs tests again * remove seemingly unused file * Add check and test to avoid adding the same node to the pipeline twice * Update Documentation & Code Style * Revert config to pipeline_config * Remo0ve unused import * Complete reverting to pipeline_config * Some more stray config= * Update Documentation & Code Style * Feedback * Move back other_nodes tests into pipeline tests temporarily * Update Documentation & Code Style * Fixing tests * Update Documentation & Code Style * Fixing ray and standard pipeline tests * Rename colliding load() methods in dense retrievers and faiss * Update Documentation & Code Style * Fix mypy on ray.py as well * Add check for no root node * Fix tests to use load_from_directory and load_index * Try to workaround the disabled add_node of RayPipeline * Update Documentation & Code Style * Fix Ray test * Fix FAISS tests * Relax class check in _add_node_to_pipeline_graph * Update Documentation & Code Style * Try to fix mypy in ray.py * unused import * Try another fix for Ray * Fix connector tests * Update Documentation & Code Style * Fix ray * Update Documentation & Code Style * use BaseComponent.load() in pipelines/base.py * another round of feedback * stray BaseComponent.load() * Update Documentation & Code Style * Fix FAISS tests too Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-05-04 17:39:06 +02:00
Sara Zan	01ea4bf21f	Change default encoding for `PDFToTextConverter` from `Latin 1` to `UTF-8` (#2420 ) * Change default encoding for PDFToTextConverter * Update Documentation & Code Style * Improve docstring * Update Documentation & Code Style * Add list of ligatures to ignore and add the possibility to modify such list at need * Add docstring * Add tests * Rename parameter * Update Documentation & Code Style * Move implementation into the base converter to make mypy happier * Update Documentation & Code Style * mypy and pylint * mypy * move encoding parameter to init of PDFToTextConverter * Update Documentation & Code Style * make utf8 default and fix mypy * Update Documentation & Code Style * Update Documentation & Code Style * remove note on encoding in tutorial8 * Update Documentation & Code Style * skip OCRConverter and test converter.run * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2022-05-04 17:01:45 +02:00
bogdankostic	a4e603ce87	Deprecate `Milvus1DocumentStore` (#2495 ) * Add warning message * Update doc string * Update Documentation & Code Style * Change DeprecationWarning to FutureWarning Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-04 15:09:57 +02:00
Julian Risch	970c476615	Align TransformersReader defaults with FARMReader (#2490 ) * Align TransformersReader defaults with vFARMReader * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-04 10:04:18 +02:00
James Briggs	a0bf34036f	fix dot_product metric in pinecone (#2494 )	2022-05-04 09:24:10 +02:00
Ahmed Nabil	9cdd719a6d	Update `xpdfreader` package installation (#2491 ) This Update will fix this exception `Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite. ` Now, converting PDFs wouldn't have any issues.	2022-05-03 18:09:41 +02:00
Tuana Celik	b6e369d1ca	changing the name of the retrievers from es_retriever to retriever (#2487 ) * changing the name of the retrievers from es_retriever to retriever * Update Documentation & Code Style * name fix 2 * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-03 18:08:23 +02:00
tstadel	509944f47d	Add support for positional args in pipeline.get_config() (#2478 ) * add support for positional args in pipeline.get_config() * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-02 14:41:07 +02:00
tstadel	7d6b3fe954	Add flag to disable scaling scores to probabilities (#2454 ) * add scale_scores_to_probabilities flag * Update Documentation & Code Style * fix tests * fix sql mypy * Update Documentation & Code Style * fix responses * Update Documentation & Code Style * rename to scale_score_to_probability + docstrings * use BaseDocumentStore.score_to_probability in elasticsearch and milvus2 * Update Documentation & Code Style * fix tests * Update Documentation & Code Style * add tests * improve naming * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-02 13:35:07 +02:00
tstadel	32ee271225	fix reader.eval and reader.eval_on_file output (#2476 )	2022-04-29 13:52:24 +02:00
Tuana Celik	e2b85e2913	Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever (#2461 ) * Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever * adding missed init file * Update Documentation & Code Style * fixed docstring * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-29 10:16:02 +02:00
MichelBartels	0a80cc452c	Linearize tables in EmbeddingRetriever (#2462 ) * Linearize tables in EmbeddingRetriever * Update Documentation & Code Style * Fix typing * Update Documentation & Code Style * simplify table linearization method + make it private * Update Documentation & Code Style * fix typing Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-28 12:06:59 +02:00
Jonathan Gallon	25b87e8cf0	Add support for aliases in elasticsearch document store (#2448 ) * Add support for aliases in elasticsearch document store * Add alias support for OpenSearch * Missing variable index * Update Documentation & Code Style * Add unit test for elasticsearch alias support * Fix unit test when index is not compatible with haystack * Fix auto format conflict * Add comment explaining for loop for alias * Update Documentation & Code Style Co-authored-by: Jonathan Gallon <jonathan.gallon@totalenergies.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2022-04-28 10:10:37 +02:00

1461 Commits