1305 Commits

Author SHA1 Message Date
Massimiliano Pippi
bb729ab95f
wait for postgres to be ready before data migrations (#2654) 2022-06-10 19:30:57 +02:00
Sara Zan
54518ac790
[CI Refactoring] Refactor Document fixtures in tests (#2577)
* Refactor document fixtures

* Add embedding files

* Update Documentation & Code Style

* Indentation issue

* Update Documentation & Code Style

* Fix type conversion in conftest.py

* Update Documentation & Code Style

* mypy on sql.py

* mypy on crawler.py

* mypy on pinecone.py

* Adapt retriever tests

* Update Documentation & Code Style

* mypy on crawler.py

* Update Documentation & Code Style

* mypy on crawler.py again

* Update Documentation & Code Style

* mypy fix was too rough

* Fix some more tests

* Update Documentation & Code Style

* Skip meaningless test on FilterRetriever

* Make embedding values less specific

* Update Documentation & Code Style

* Use stable IDs in retriever tests that depend on it

* Remove needless fixtures

* docs_with_ids

* Update Documentation & Code Style

* Typo

* Fix retriever tests

* Fix reader tests

* Update Documentation & Code Style

* Workaround #2626

* Update Documentation & Code Style

* Fix label generator tests

* Reorder vectors

* remove print

* Update Documentation & Code Style

* Update Documentation & Code Style

* git tags leftover

* Update Documentation & Code Style

* fix last failing test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 18:22:48 +02:00
Sara Zan
8d7439c623
Move autoformat-check.yml into tests.yml (#2635) 2022-06-10 18:22:16 +02:00
Sara Zan
e5423b1515
Fix markers in GPL tests (#2652) 2022-06-10 06:42:19 -04:00
Sara Zan
33a51fa915
[CI Refactoring] Move unrelated tests out of test_pipeline.py (#2573)
* move unrelated tests out of test_pipeline.py

* Update Documentation & Code Style

* fix fixture name

* Typo

* Make sure all docs are Documents in routedocuments tests

* Fix tests

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 11:45:13 +02:00
Vladimir Blagojevic
b13c32eb9c
Add GPL API docs, unit tests update (#2634)
* Update test_label_generator.py

* GPL increase default batch size to 16

* GPL - API docs

* GPL - split unit tests

* Make devs aware of multilingual GPL

* Create separate train/save test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 05:25:28 -04:00
Agnieszka Marzec
f90649fab1
Update docstrings for GPL (#2633)
* Update docstrings

* Update Documentation & Code Style

* Update wrong param description

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 10:04:06 +02:00
Stefano Fiorucci
c178f60e3a
Make crawler extract also hidden text (#2642)
* make crawler extract also hidden text

* Update Documentation & Code Style

* try to adapt test for extract_hidden_text

* Update Documentation & Code Style

* fix test bug

* fix bug in test

* added test for hidden text

* Update Documentation & Code Style

* fix bug in test

* Update Documentation & Code Style

* fix test

* Update Documentation & Code Style

* fix other test bug

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 09:51:41 +02:00
tstadel
c8f9e1b76c
Create target folder if not exists in EvalResult.save() (#2647)
* Create target folder if not exists in EvalResult.save()

* log out dir
2022-06-09 19:26:12 +02:00
Sara Zan
9968c373d2
make 'ready for review' an event that triggers the tests (#2643) 2022-06-09 09:23:38 +02:00
tstadel
293a3b53d2
Fix params being changed during pipeline.eval() (#2638) 2022-06-08 19:43:09 +02:00
Massimiliano Pippi
374155fd5c
Move Opensearch document store in its own module (#2603)
* move OpenSearchDocumentStore into its own Python module

* Update Documentation & Code Style

* mark test with (sigh) elasticsearch

* skip opensearch tests on windows

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-08 16:37:23 +02:00
tstadel
df6ebeb087
Do not show success message on failed evalset upload (#2639)
* Do not show success message on failed evalset upload

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-08 08:31:25 +02:00
Sara Zan
c17969e001
Fix failing Crawler test (#2640)
* Make tests insensitive to ordering of crawled pages

* fix docstring
2022-06-07 18:14:43 +02:00
Sara Zan
c2d2faf31e
Add directive in tests.yml (#2637) 2022-06-07 13:31:19 +02:00
Sara Zan
59608ca474
[CI Refactoring] Workflow refactoring (#2576)
* Unify CI tests (from #2466)

* Update Documentation & Code Style

* Change folder names

* Fix markers list

* Remove marker 'slow', replaced with 'integration'

* Soften children check

* Start ES first so it has time to boot while Python is setup

* Run the full workflow

* Try to make pip upgrade on Windows

* Set KG tests as integration

* Update Documentation & Code Style

* typo

* faster pylint

* Make Pylint use the cache

* filter diff files for pylint

* debug pylint statement

* revert pylint changes

* Remove path from asserted log (fails on Windows)

* Skip preprocessor test on Windows

* Tackling Windows specific failures

* Fix pytest command for windows suites

* Remove \ from command

* Move poppler test into integration

* Skip opensearch test on windows

* Add tolerance in reader sas score for Windows

* Another pytorch approx

* Raise time limit for unit tests :(

* Skip poppler test on Windows CI

* Specify to pull with FF only in docs check

* temporarily run the docs check immediately

* Allow merge commit for now

* Try without fetch depth

* Accelerating test

* Accelerating test

* Add repository and ref alongside fetch-depth

* Separate out code&docs check from tests

* Use setup-python cache

* Delete custom action

* Remove the pull step in the docs check, will find a way to run on bot commits

* Add requirements.txt in .github for caching

* Actually install dependencies

* Change deps group for pylint

* Unclear why the requirements.txt is still required :/

* Fix the code check python setup

* Install all deps for pylint

* Make the autoformat check depend on tests and doc updates workflows

* Try installing dependencies in another order

* Try again to install the deps

* quoting the paths

* Add back the requirements

* Try again to install rest_api and ui

* Change deps group

* Duplicate haystack install line

* See if the cache is the problem

* Disable also in mypy, who knows

* split the install step

* Split install step everywhere

* Revert "Separate out code&docs check from tests"

This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.

* Add back the action

* Proactive support for audio (see text2speech branch)

* Fix label generator tests

* Remove install of libsndfile1 on win temporarily

* exclude audio tests on win

* install ffmpeg for integration tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-07 09:23:03 +02:00
Sara Zan
83648b9bc0
[CI refactoring] Rewrite Crawler tests (#2557)
* Rewrite crawler tests (very slow) and fix small crawler bug

* Update Documentation & Code Style

* compile the regex only once

* Factor out the html files & add content check to most tests

* Clarify that even starting URLs can be excluded

* Update Documentation & Code Style

* Change signature

* Fix failing test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-06 17:52:37 +02:00
bogdankostic
0a4477d315
Fix streamlit version to <1.10 in UI dependencies (#2630)
* Trigger code-and-docs-check

* Upgrade azure-ai-formrecognizer to 3.2.0b4

* Revert "Upgrade azure-ai-formrecognizer to 3.2.0b4"

This reverts commit 21c3fc7e9b79b94143fb2d6009544a5cae9cf560.

* Fix streamlit version to <1.10

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-03 15:01:00 +02:00
Ryan Russell
c1b7948e10
Improve Docs Readability (#2617)
Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-06-03 09:57:40 +02:00
Julian Risch
3c6fcc3e42
Bump version to next release candidate (#2627)
* bump version to next release candidate

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 18:58:44 +02:00
Julian Risch
4ca331c0a7
Bump version to v1.5.0 and copy docs folder (#2625)
* bump version to v1.5.0 and copy docs folder

* Update Documentation & Code Style

* update links to v1.5.0

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
v1.5.0
2022-06-02 17:20:42 +02:00
Vladimir Blagojevic
e10a3fba74
Add Generative Pseudo Labeling (#2388) 2022-06-02 10:12:47 -04:00
bogdankostic
61d9429c25
Simplify loading of EmbeddingRetriever (#2619)
* Infer model format for EmbeddingRetriever automatically

* Update Documentation & Code Style

* Adapt conftest to automatic inference of model_format

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Adapt tutorials

* Update Documentation & Code Style

* Add test for similarity scores with sentence transformers

* Adapt doc string and warning message

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 15:05:29 +02:00
Sara Zan
ca19521c25
Fix new PyLint errors (#2624)
* unnecessary-lambda-assignment

* consider-using-generator

* implicit-str-concat

* consider-using-generator

* consider-using-generator

* implicit-str-concat

* consider-using-generator

* disable unnecessary-lambda-assignment

* implicit-str-concat

* Update Documentation & Code Style

* implicit-str-concat

* Remove no-self-use

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 13:45:36 +02:00
bogdankostic
a617ab950b
Fix number of returned values in get_metadata_values_by_key (#2614)
* Apply pagination in get_metadata_values_by_key

* Update Documentation & Code Style

* Adapt test

* Fix test_eval.py by using pytest.approx

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-01 10:21:28 +02:00
tstadel
6b78990a38
Fix Pipeline.get_config() for forked pipelines (#2616)
* Fix Pipeline.get_config() for forked pipelines

* exclude root nodes

* minor quickfix
2022-05-31 21:26:53 +02:00
tstadel
0efad96e08
DC SDK: Add possibility to upload evaluation sets to DC (#2610)
* Add possibility to upload evaluation sets to DC

* fix test_eval sas comparisons

* quickwin docstring feedback changes

* Add hint about annotation tool and mark optional and required columns

* minor changes to docstrings
2022-05-31 17:08:19 +02:00
tstadel
fc25adf959
Create eval runs on deepset Cloud (#2534)
* add EvaluationRunClient

* Update Documentation & Code Style

* temporarily resolve names to ids

* Update Documentation & Code Style

* add delete and update methods

* minor fixes

* add experiments facade

* dummy implement start_run()

* start eval runs added

* Update Documentation & Code Style

* fix merge

* switch to names on api level

* add create eval_run test

* Update Documentation & Code Style

* further tests added

* update docstrings

* add docstrings

* add missing tags param, fix docstrings

* refactor _get_evaluation_sets

* fix mypy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-30 18:18:19 +02:00
bogdankostic
0395533a78
Add run_batch for standard pipelines (#2595)
* Add run_batch for standard pipelines

* Update Documentation & Code Style

* Fix mypy

* Remove code duplication

* Fix linter

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-27 10:42:48 +02:00
Julian Risch
b2a2c10fae
Update milvus installation instructions to v2 (#2598) 2022-05-25 17:22:04 +02:00
tstadel
dd8dc588b1
fix eval with context matching in table qa use cases (#2597) 2022-05-25 16:26:29 +02:00
tstadel
b6986ea25d
avoid empty api_endpoint (#2588) 2022-05-25 08:51:04 +02:00
tstadel
7caca41c5d
Support context matching in pipeline.eval() (#2482)
* calculate context pred metrics

* Update Documentation & Code Style

* extend doc_relevance_col values

* fix import order

* Update Documentation & Code Style

* fix mypy

* fix typings literal import

* add option for custom document_id_field

* Update Documentation & Code Style

* fix tests and dataframe col-order

* Update Documentation & Code Style

* rename content to context in eval dataframe

* add backward compatibility to EvaluationResult.load()

* Update Documentation & Code Style

* add docstrings

* Update Documentation & Code Style

* support sas

* Update Documentation & Code Style

* add answer_scope param

* Update Documentation & Code Style

* rework doc_relevance_col and keep document_id col in case of custom_document_id_field

* Update Documentation & Code Style

* improve docstrings

* Update Documentation & Code Style

* rename document_relevance_criterion into document_scope

* Update Documentation & Code Style

* add document_scope and answer_scope to print_eval_report

* support all new features in execute_eval_run()

* fix imports

* fix mypy

* Update Documentation & Code Style

* rename pred_label_sas_grid into pred_label_matrix

* update dataframe schema and sorting

* Update Documentation & Code Style

* pass through context_matching params and extend document_scope test

* Update Documentation & Code Style

* add answer_scope tests

* fix context_matching_threshold for document metrics

* shorten dataframe apply calls

* Update Documentation & Code Style

* fix queries getting lost if nothing was retrieved

* Update Documentation & Code Style

* Update Documentation & Code Style

* use document_id scopes

* Update Documentation & Code Style

* fix answer_scope literal

* Update Documentation & Code Style

* update the docs (lg changes)

* Update Documentation & Code Style

* update tutorial 5

* Update Documentation & Code Style

* fix tests

* Add minor lg updates

* final docstring changes

* fix single quotes in docstrings

* Update Documentation & Code Style

* dataframe scopes added for each column

* better docstrings for context_matching params

* Update Documentation & Code Style

* fix summarizer eval test

* Update Documentation & Code Style

* fix test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2022-05-24 18:11:52 +02:00
tstadel
a70c6a2d4f
Fix knn params for aws managed opensearch (#2581) 2022-05-24 18:10:05 +02:00
bogdankostic
1ab2b977c0
Fix crawler (#2591) 2022-05-24 12:34:31 +02:00
bogdankostic
867695ad0c
Change signature of queries param in batch methods (#2575)
* Change signature of queries param in batch methods

* Update Documentation & Code Style

* Fix mypy

* Remove unused import

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 12:33:45 +02:00
Julian Risch
075ed7fbcb
Remove encoding option from PDFToTextOCRConverter (#2553)
* remove encoding option from PDFToTextOCRConverter

* Update Documentation & Code Style

* add unused 'encoding' param to PDFToTextOCRConverter

* Update Documentation & Code Style

* call run instead of convert to use ligature replacing

* Update Documentation & Code Style

* add text to check installed poppler version

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 11:31:32 +02:00
Sara Zan
7ab0239e31
Do not copy _component_config in get_components_definitions (#2574)
* Do not deepcopy in get_components_definitions

* Update Documentation & Code Style

* comment

* unused import

* Add test to ensure env vars don't overwrite _component_config

* Update Documentation & Code Style

* Add test for get_config

* Add test to show the rename is not sufficient

* Update Documentation & Code Style

* copy only if it's strictly necessary

* Update Documentation & Code Style

* Apply suggestions from code review

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>

* review feedback

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-24 09:53:59 +02:00
dimitrisna
5bda63a6c0
Add training checkpoint in retriever trainer (#2543)
* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update dense.py

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 09:51:26 +02:00
Agnieszka Marzec
dd83f71a8f
Minor lg updates to doc strings (#2585)
* Minor lg updates to doc strings

* Update all models descriptions
2022-05-24 09:35:13 +02:00
Agnieszka Marzec
ebd54b225b
Update Ray pipeline docs with validation info (#2590)
* Update Ray pipeline docs

* Add Sara's suggestion

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 09:29:52 +02:00
tstadel
3ab4dac58d
Upload files to deepset Cloud (#2570)
* added upload_files

* Update Documentation & Code Style

* expose file client via DeepsetCloud facade

* Update Documentation & Code Style

* tests added

* Update Documentation & Code Style

* always read file in binary mode and guess mimetype

* add delete and list functions

* fix method literals

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-23 17:05:56 +02:00
tstadel
0e83535108
Show search endpoint after deepset Cloud deployment (#2569)
* show try-out-message after deployment

* better messages

* Update Documentation & Code Style

* tests added

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-23 14:19:31 +02:00
MichelBartels
16b0fdd804
Add DeBERTaV2/V3 support (#2097)
* add debertav2/v3

* update comments

* Apply Black

* assume support for fast deberta tokenizer

* Apply Black

* update required transformers version for deberta

* fix mismatched vocab error

* Update Documentation & Code Style

* update debertav2 doc string

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-23 09:55:14 +02:00
Massimiliano Pippi
a9a4156731
[Weaviate] Exit the while loop when we query less documents than available (#2537)
* exit the while loop when we query less documents than available in Weaviate

* use monkeypatch fixture, remove unused markers

* we know key is there, use brackets to get the value

* use custom exception

* add warning message when we hit the QUERY_MAXIMUM_RESULTS problem

* restore pytest marker

* removed unused import

* make the warning message more clear
2022-05-20 09:07:03 +02:00
Sara Zan
fd2ca359fe
Validation for Ray pipelines (#2545)
* Ray pipelines now validate

* Update Documentation & Code Style

* rename Ray pipeline in tests

* Add extras:ray to the test pipeline

* pylint

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-19 19:40:03 +02:00
Sara Zan
89bb1ca139
[CI refactoring] Improve autoformat.yml (#2556)
* Restructure autoformat to run a single script

* Reduce diff for autoformat.yml

* Reduce diff on linux_ci.yml
2022-05-18 20:02:43 +02:00
tstadel
f6e3a63906
Prevent losing names of utilized components when loaded from config (#2525)
* Prevent losing names of utilized components when loaded from config

* Update Documentation & Code Style

* update test

* fix failing tests

* Update Documentation & Code Style

* fix even more tests

* Update Documentation & Code Style

* incorporate review feedback

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-18 14:17:54 +02:00
tstadel
110b9c2b0a
Warnings for write operations of DeepsetCloudDocumentStore (#2565)
* log inputs to write operations

* Update Documentation & Code Style

* adjust tests

* simplify by using decorator for write operation functions

* Update Documentation & Code Style

* fix comma

* fix comma in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 17:53:55 +02:00
Stefano Fiorucci
686a19b35d
added launch_tika method (#2567)
* added launch_tika method

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 17:53:04 +02:00