haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-22 00:11:14 +00:00

Author	SHA1	Message	Date
Sara Zan	8d7439c623	Move autoformat-check.yml into tests.yml (#2635 )	2022-06-10 18:22:16 +02:00
Sara Zan	e5423b1515	Fix markers in GPL tests (#2652 )	2022-06-10 06:42:19 -04:00
Sara Zan	33a51fa915	[CI Refactoring] Move unrelated tests out of `test_pipeline.py` (#2573 ) * move unrelated tests out of test_pipeline.py * Update Documentation & Code Style * fix fixture name * Typo * Make sure all docs are Documents in routedocuments tests * Fix tests * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-10 11:45:13 +02:00
Vladimir Blagojevic	b13c32eb9c	Add GPL API docs, unit tests update (#2634 ) * Update test_label_generator.py * GPL increase default batch size to 16 * GPL - API docs * GPL - split unit tests * Make devs aware of multilingual GPL * Create separate train/save test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-10 05:25:28 -04:00
Agnieszka Marzec	f90649fab1	Update docstrings for GPL (#2633 ) * Update docstrings * Update Documentation & Code Style * Update wrong param description Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-10 10:04:06 +02:00
Stefano Fiorucci	c178f60e3a	Make crawler extract also hidden text (#2642 ) * make crawler extract also hidden text * Update Documentation & Code Style * try to adapt test for extract_hidden_text * Update Documentation & Code Style * fix test bug * fix bug in test * added test for hidden text" * Update Documentation & Code Style * fix bug in test * Update Documentation & Code Style * fix test * Update Documentation & Code Style * fix other test bug Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-10 09:51:41 +02:00
tstadel	c8f9e1b76c	Create target folder if not exists in EvalResult.save() (#2647 ) * Create target folder if not exists in EvalResult.save() * log out dir	2022-06-09 19:26:12 +02:00
Sara Zan	9968c373d2	make 'ready for review' an event that triggers the tests (#2643 )	2022-06-09 09:23:38 +02:00
tstadel	293a3b53d2	Fix params being changed during pipeline.eval() (#2638 )	2022-06-08 19:43:09 +02:00
Massimiliano Pippi	374155fd5c	Move Opensearch document store in its own module (#2603 ) * move OpenSearchDocumentStore into its own Python module * Update Documentation & Code Style * mark test with (sigh) elasticsearch * skip opensearch tests on windows Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-08 16:37:23 +02:00
tstadel	df6ebeb087	Do not show success message on failed evalset upload (#2639 ) * Do not show success message on failed evalset upload * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-08 08:31:25 +02:00
Sara Zan	c17969e001	Fix failing `Crawler` test (#2640 ) * Make tests insensitive to ordering of crawled pages * fix docstring	2022-06-07 18:14:43 +02:00
Sara Zan	c2d2faf31e	Add directive in `tests.yml` (#2637 )	2022-06-07 13:31:19 +02:00
Sara Zan	59608ca474	[CI Refactoring] Workflow refactoring (#2576 ) * Unify CI tests (from #2466) * Update Documentation & Code Style * Change folder names * Fix markers list * Remove marker 'slow', replaced with 'integration' * Soften children check * Start ES first so it has time to boot while Python is setup * Run the full workflow * Try to make pip upgrade on Windows * Set KG tests as integration * Update Documentation & Code Style * typo * faster pylint * Make Pylint use the cache * filter diff files for pylint * debug pylint statement * revert pylint changes * Remove path from asserted log (fails on Windows) * Skip preprocessor test on Windows * Tackling Windows specific failures * Fix pytest command for windows suites * Remove \ from command * Move poppler test into integration * Skip opensearch test on windows * Add tolerance in reader sas score for Windows * Another pytorch approx * Raise time limit for unit tests :( * Skip poppler test on Windows CI * Specify to pull with FF only in docs check * temporarily run the docs check immediately * Allow merge commit for now * Try without fetch depth * Accelerating test * Accelerating test * Add repository and ref alongside fetch-depth * Separate out code&docs check from tests * Use setup-python cache * Delete custom action * Remove the pull step in the docs check, will find a way to run on bot commits * Add requirements.txt in .github for caching * Actually install dependencies * Change deps group for pylint * Unclear why the requirements.txt is still required :/ * Fix the code check python setup * Install all deps for pylint * Make the autoformat check depend on tests and doc updates workflows * Try installing dependencies in another order * Try again to install the deps * quoting the paths * Ad back the requirements * Try again to install rest_api and ui * Change deps group * Duplicate haystack install line * See if the cache is the problem * Disable also in mypy, who knows * split the install step * Split 
install step everywhere * Revert "Separate out code&docs check from tests" This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd. * Add back the action * Proactive support for audio (see text2speech branch) * Fix label generator tests * Remove install of libsndfile1 on win temporarily * exclude audio tests on win * install ffmpeg for integration tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-07 09:23:03 +02:00
Sara Zan	83648b9bc0	[CI refactoring] Rewrite `Crawler` tests (#2557 ) * Rewrite crawler tests (very slow) and fix small crawler bug * Update Documentation & Code Style * compile the regex only once * Factor out the html files & add content check to most tests * Clarify that even starting URLs can be excluded * Update Documentation & Code Style * Change signature * Fix failing test * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-06 17:52:37 +02:00
bogdankostic	0a4477d315	Fix streamlit version to <1.10 in UI dependencies (#2630 ) * Trigger code-and-docs-check * Upgrade azure-ai-formrecognizer to 3.2.0b4 * Revert "Upgrade azure-ai-formrecognizer to 3.2.0b4" This reverts commit 21c3fc7e9b79b94143fb2d6009544a5cae9cf560. * Fix streamlit version to <1.10 * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-03 15:01:00 +02:00
Ryan Russell	c1b7948e10	Improve Docs Readability (#2617 ) Signed-off-by: Ryan Russell <git@ryanrussell.org>	2022-06-03 09:57:40 +02:00
Julian Risch	3c6fcc3e42	Bump version to next release candidate (#2627 ) * bump version to next release candidate * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-02 18:58:44 +02:00
Julian Risch	4ca331c0a7	Bump version to v1.5.0 and copy docs folder (#2625 ) * bump version to v1.5.0 and copy docs folder * Update Documentation & Code Style * update links to v1.5.0 * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> v1.5.0	2022-06-02 17:20:42 +02:00
Vladimir Blagojevic	e10a3fba74	Add Generative Pseudo Labeling (#2388 )	2022-06-02 10:12:47 -04:00
bogdankostic	61d9429c25	Simplify loading of `EmbeddingRetriever` (#2619 ) * Infer model format for EmbeddingRetriever automatically * Update Documentation & Code Style * Adapt conftest to automatic inference of model_format * Update Documentation & Code Style * Fix tests * Update Documentation & Code Style * Fix tests * Adapt tutorials * Update Documentation & Code Style * Add test for similarity scores with sentence transformers * Adapt doc string and warning message * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-02 15:05:29 +02:00
Sara Zan	ca19521c25	Fix new PyLint errors (#2624 ) * unnecessary-lambda-assignment * consider-using-generator * implicit-str-concat * consider-using-generator * consider-using-generator * implicit-str-concat * consider-using-generator * disable unnecessary-lambda-assignment * implicit-str-concat * Update Documentation & Code Style * implicit-str-concat * Remove no-self-use Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-02 13:45:36 +02:00
bogdankostic	a617ab950b	Fix number of returned values in `get_metadata_values_by_key` (#2614 ) * Apply pagination in get_metdata_values_by_key * Update Documentation & Code Style * Adapt test * Fix test_eval.py by using pytest.approx Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-01 10:21:28 +02:00
tstadel	6b78990a38	Fix Pipeline.get_config() for forked pipelines (#2616 ) * Fix Pipeline.get_config() for forked pipelines * exclude root nodes * minor quickfix	2022-05-31 21:26:53 +02:00
tstadel	0efad96e08	DC SDK: Add possibility to upload evaluation sets to DC (#2610 ) * Add possibility to upload evaluation sets to DC * fix test_eval sas comparisons * quickwin docstring feedback changes * Add hint about annotation tool and mark optional and required columns * minor changes to docstrings	2022-05-31 17:08:19 +02:00
tstadel	fc25adf959	Create eval runs on deepset Cloud (#2534 ) * add EvaluationRunClient * Update Documentation & Code Style * temporarily resolve names to ids * Update Documentation & Code Style * add delete and update methods * minor fixes * add experiments facade * dummy implement start_run() * start eval runs added * Update Documentation & Code Style * fix merge * switch to names on api level * add create eval_run test * Update Documentation & Code Style * further tests added * update docstrings * add docstrings * add missing tags param, fix docstrings * refactor _get_evaluation_sets * fix mypy Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-30 18:18:19 +02:00
bogdankostic	0395533a78	Add `run_batch` for standard pipelines (#2595 ) * Add run_batch for standard pipelines * Update Documentation & Code Style * Fix mypy * Remove code duplication * Fix linter Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-27 10:42:48 +02:00
Julian Risch	b2a2c10fae	Update milvus installation instructions to v2 (#2598 )	2022-05-25 17:22:04 +02:00
tstadel	dd8dc588b1	fix eval with context matching in table qa use cases (#2597 )	2022-05-25 16:26:29 +02:00
tstadel	b6986ea25d	avoid empty api_endpoint (#2588 )	2022-05-25 08:51:04 +02:00
tstadel	7caca41c5d	Support context matching in `pipeline.eval()` (#2482 ) * calculate context pred metrics * Update Documentation & Code Style * extend doc_relevance_col values * fix import order * Update Documentation & Code Style * fix mypy * fix typings literal import * add option for custom document_id_field * Update Documentation & Code Style * fix tests and dataframe col-order * Update Documentation & Code Style * rename content to context in eval dataframe * add backward compatibility to EvaluationResult.load() * Update Documentation & Code Style * add docstrings * Update Documentation & Code Style * support sas * Update Documentation & Code Style * add answer_scope param * Update Documentation & Code Style * rework doc_relevance_col and keep document_id col in case of custom_document_id_field * Update Documentation & Code Style * improve docstrings * Update Documentation & Code Style * rename document_relevance_criterion into document_scope * Update Documentation & Code Style * add document_scope and answer_scope to print_eval_report * support all new features in execute_eval_run() * fix imports * fix mypy * Update Documentation & Code Style * rename pred_label_sas_grid into pred_label_matrix * update dataframe schema and sorting * Update Documentation & Code Style * pass through context_matching params and extend document_scope test * Update Documentation & Code Style * add answer_scope tests * fix context_matching_threshold for document metrics * shorten dataframe apply calls * Update Documentation & Code Style * fix queries getting lost if nothing was retrieved * Update Documentation & Code Style * Update Documentation & Code Style * use document_id scopes * Update Documentation & Code Style * fix answer_scope literal * Update Documentation & Code Style * update the docs (lg changes) * Update Documentation & Code Style * update tutorial 5 * Update Documentation & Code Style * fix tests * Add minor lg updates * final docstring changes * fix single quotes 
in docstrings * Update Documentation & Code Style * dataframe scopes added for each column * better docstrings for context_matching params * Update Documentation & Code Style * fix summarizer eval test * Update Documentation & Code Style * fix test * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2022-05-24 18:11:52 +02:00
tstadel	a70c6a2d4f	Fix knn params for aws managed opensearch (#2581 )	2022-05-24 18:10:05 +02:00
bogdankostic	1ab2b977c0	Fix crawler (#2591 )	2022-05-24 12:34:31 +02:00
bogdankostic	867695ad0c	Change signature of queries param in batch methods (#2575 ) * Change signature of queries param in batch methods * Update Documentation & Code Style * Fix mypy * Remove unused import * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-24 12:33:45 +02:00
Julian Risch	075ed7fbcb	Remove encoding option from PDFToTextOCRConverter (#2553 ) * remove encoding option from PDFToTextOCRConverter * Update Documentation & Code Style * add unused 'encoding' param to PDFToTextOCRConverter * Update Documentation & Code Style * call run instead of convert to use ligature replacing * Update Documentation & Code Style * add text to check installed poppler version * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-24 11:31:32 +02:00
Sara Zan	7ab0239e31	Do not copy `_component_config` in `get_components_definitions` (#2574 ) * Do not deepcopy in get_components_definitions * Update Documentation & Code Style * comment * unused import * Add test to ensure env vars don't overwrite _component_config * Update Documentation & Code Style * Add test for get_config * Add test to show the rename is not sufficient * Update Documentation & Code Style * copy only if it's strictly necessary * Update Documentation & Code Style * Apply suggestions from code review Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> * review feedback Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-05-24 09:53:59 +02:00
dimitrisna	5bda63a6c0	Add training checkpoint in retriever trainer (#2543 ) * Update dense.py * Update dense.py * Update dense.py * Update dense.py * Update dense.py * Update dense.py * Update dense.py * Update dense.py * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-24 09:51:26 +02:00
Agnieszka Marzec	dd83f71a8f	Minor lg updates to doc strings (#2585 ) * Minor lg updates to doc strings * Update all models descriptions	2022-05-24 09:35:13 +02:00
Agnieszka Marzec	ebd54b225b	Update Ray pipeline docs with validation info (#2590 ) * Update Ray pipeline docs * Add Sara's suggestion * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-24 09:29:52 +02:00
tstadel	3ab4dac58d	Upload files to deepset Cloud (#2570 ) * added upload_files * Update Documentation & Code Style * expose file client via DeepsetCloud facade * Update Documentation & Code Style * tests added * Update Documentation & Code Style * always read file in binary mode and guess mimetype * add delete and list functions * fix method literals Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-23 17:05:56 +02:00
tstadel	0e83535108	Show search endpoint after deepset Cloud deployment (#2569 ) * show try-out-message after deployment * better messages * Update Documentation & Code Style * tests added * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-23 14:19:31 +02:00
MichelBartels	16b0fdd804	Add DeBERTaV2/V3 support (#2097 ) * add debertav2/v3 * update comments * Apply Black * assume support for fast deberta tokenizer * Apply Black * update required transformers version for deberta * fix mismatched vocab error * Update Documentation & Code Style * update debertav2 doc string Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-23 09:55:14 +02:00
Massimiliano Pippi	a9a4156731	[Weaviate] Exit the while loop when we query less documents than available (#2537 ) * exit the while loop when we query less documents than available in Weaviate * use monkeypatch fixture, remove unused markers * we know key is there, use brackets to get the value * use custom exception * add warning message when we hit the QUERY_MAXIMUM_RESULTS problem * restore pytest marker * removed unused import * make the warning message more clear	2022-05-20 09:07:03 +02:00
Sara Zan	fd2ca359fe	Validation for Ray pipelines (#2545 ) * Ray pipelines now validate * Update Documentation & Code Style * rename Ray pipeline in tests * Add extras:ray to the test pipeline * pylint Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-19 19:40:03 +02:00
Sara Zan	89bb1ca139	[CI refactoring] Improve `autoformat.yml` (#2556 ) * Restructure autoformat to run a single script * Reduce diff for autoformat.yml * Reduce diff on linux_ci.yml	2022-05-18 20:02:43 +02:00
tstadel	f6e3a63906	Prevent losing names of utilized components when loaded from config (#2525 ) * Prevent losing names of utilized components when loaded from config * Update Documentation & Code Style * update test * fix failing tests * Update Documentation & Code Style * fix even more tests * Update Documentation & Code Style * incorporate review feedback Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-18 14:17:54 +02:00
tstadel	110b9c2b0a	Warnings for write operations of `DeepsetCloudDocumentStore` (#2565 ) * log inputs to write operations * Update Documentation & Code Style * adjust tests * simplify by using decorator for write operation functions * Update Documentation & Code Style * fix comma * fix comma in test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-17 17:53:55 +02:00
Stefano Fiorucci	686a19b35d	added launch_tika method (#2567 ) * added launch_tika method * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-17 17:53:04 +02:00
Julian Risch	5a1e98e3ff	Update scriptrunner module path for streamlit ui (#2566 ) * Pin streamlit version to <1.9.0 * update scriptrunner module path for streamlit ui	2022-05-17 16:06:44 +02:00
Julian Risch	70ca1e9fc6	Smaller demo instance type (#2564 ) This PR changes the instance type of the public Haystack demo from p3.2xlarge to g4dn.2xlarge. g4dn.2xlarge has 1 GPU, 8 vCPUs, and 32 GiB of memory; p3.2xlarge had 1 GPU, 8 vCPUs, and 61 GiB of memory. The switch results in 75% lower costs with g4dn.2xlarge. I also tried out the even smaller g4dn.xlarge, which has 1 GPU, 4 vCPUs, and 16 GiB of memory. However, the memory was not enough to run the demo. I tried out multiple requests at the same time and it worked well with g4dn.2xlarge. Requests are slightly slower than with the more powerful instance type, but it's hard to notice.	2022-05-17 12:47:15 +02:00

3803 Commits