Massimiliano Pippi
bb729ab95f
wait for postgres to be ready before data migrations ( #2654 )
2022-06-10 19:30:57 +02:00
Sara Zan
54518ac790
[CI Refactoring] Refactor Document fixtures in tests ( #2577 )
* Refactor document fixtures
* Add embedding files
* Update Documentation & Code Style
* Indentation issue
* Update Documentation & Code Style
* Fix type conversion in conftest.py
* Update Documentation & Code Style
* mypy on sql.py
* mypy on crawler.py
* mypy on pinecone.py
* Adapt retriever tests
* Update Documentation & Code Style
* mypy on crawler.py
* Update Documentation & Code Style
* mypy on crawler.py again
* Update Documentation & Code Style
* mypy fix was too rough
* Fix some more tests
* Update Documentation & Code Style
* Skip meaningless test on FilterRetriever
* Make embedding values less specific
* Update Documentation & Code Style
* Use stable IDs in retriever tests that depend on it
* Remove needless fixtures
* docs_with_ids
* Update Documentation & Code Style
* Typo
* Fix retriever tests
* Fix reader tests
* Update Documentation & Code Style
* Workaround #2626
* Update Documentation & Code Style
* Fix label generator tests
* Reorder vectors
* remove print
* Update Documentation & Code Style
* Update Documentation & Code Style
* git tags leftover
* Update Documentation & Code Style
* fix last failing test
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 18:22:48 +02:00
Sara Zan
8d7439c623
Move autoformat-check.yml into tests.yml ( #2635 )
2022-06-10 18:22:16 +02:00
Sara Zan
e5423b1515
Fix markers in GPL tests ( #2652 )
2022-06-10 06:42:19 -04:00
Sara Zan
33a51fa915
[CI Refactoring] Move unrelated tests out of test_pipeline.py ( #2573 )
* move unrelated tests out of test_pipeline.py
* Update Documentation & Code Style
* fix fixture name
* Typo
* Make sure all docs are Documents in routedocuments tests
* Fix tests
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 11:45:13 +02:00
Vladimir Blagojevic
b13c32eb9c
Add GPL API docs, unit tests update ( #2634 )
* Update test_label_generator.py
* GPL increase default batch size to 16
* GPL - API docs
* GPL - split unit tests
* Make devs aware of multilingual GPL
* Create separate train/save test
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 05:25:28 -04:00
Agnieszka Marzec
f90649fab1
Update docstrings for GPL ( #2633 )
* Update docstrings
* Update Documentation & Code Style
* Update wrong param description
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 10:04:06 +02:00
Stefano Fiorucci
c178f60e3a
Make crawler extract also hidden text ( #2642 )
* make crawler extract also hidden text
* Update Documentation & Code Style
* try to adapt test for extract_hidden_text
* Update Documentation & Code Style
* fix test bug
* fix bug in test
* added test for hidden text
* Update Documentation & Code Style
* fix bug in test
* Update Documentation & Code Style
* fix test
* Update Documentation & Code Style
* fix other test bug
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 09:51:41 +02:00
tstadel
c8f9e1b76c
Create target folder if not exists in EvalResult.save() ( #2647 )
* Create target folder if not exists in EvalResult.save()
* log out dir
2022-06-09 19:26:12 +02:00
Sara Zan
9968c373d2
make 'ready for review' an event that triggers the tests ( #2643 )
2022-06-09 09:23:38 +02:00
tstadel
293a3b53d2
Fix params being changed during pipeline.eval() ( #2638 )
2022-06-08 19:43:09 +02:00
Massimiliano Pippi
374155fd5c
Move Opensearch document store in its own module ( #2603 )
* move OpenSearchDocumentStore into its own Python module
* Update Documentation & Code Style
* mark test with (sigh) elasticsearch
* skip opensearch tests on windows
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-08 16:37:23 +02:00
tstadel
df6ebeb087
Do not show success message on failed evalset upload ( #2639 )
* Do not show success message on failed evalset upload
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-08 08:31:25 +02:00
Sara Zan
c17969e001
Fix failing Crawler test ( #2640 )
* Make tests insensitive to ordering of crawled pages
* fix docstring
2022-06-07 18:14:43 +02:00
Sara Zan
c2d2faf31e
Add directive in tests.yml ( #2637 )
2022-06-07 13:31:19 +02:00
Sara Zan
59608ca474
[CI Refactoring] Workflow refactoring ( #2576 )
* Unify CI tests (from #2466 )
* Update Documentation & Code Style
* Change folder names
* Fix markers list
* Remove marker 'slow', replaced with 'integration'
* Soften children check
* Start ES first so it has time to boot while Python is setup
* Run the full workflow
* Try to make pip upgrade on Windows
* Set KG tests as integration
* Update Documentation & Code Style
* typo
* faster pylint
* Make Pylint use the cache
* filter diff files for pylint
* debug pylint statement
* revert pylint changes
* Remove path from asserted log (fails on Windows)
* Skip preprocessor test on Windows
* Tackling Windows specific failures
* Fix pytest command for windows suites
* Remove \ from command
* Move poppler test into integration
* Skip opensearch test on windows
* Add tolerance in reader sas score for Windows
* Another pytorch approx
* Raise time limit for unit tests :(
* Skip poppler test on Windows CI
* Specify to pull with FF only in docs check
* temporarily run the docs check immediately
* Allow merge commit for now
* Try without fetch depth
* Accelerating test
* Accelerating test
* Add repository and ref alongside fetch-depth
* Separate out code&docs check from tests
* Use setup-python cache
* Delete custom action
* Remove the pull step in the docs check, will find a way to run on bot commits
* Add requirements.txt in .github for caching
* Actually install dependencies
* Change deps group for pylint
* Unclear why the requirements.txt is still required :/
* Fix the code check python setup
* Install all deps for pylint
* Make the autoformat check depend on tests and doc updates workflows
* Try installing dependencies in another order
* Try again to install the deps
* quoting the paths
* Add back the requirements
* Try again to install rest_api and ui
* Change deps group
* Duplicate haystack install line
* See if the cache is the problem
* Disable also in mypy, who knows
* split the install step
* Split install step everywhere
* Revert "Separate out code&docs check from tests"
This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.
* Add back the action
* Proactive support for audio (see text2speech branch)
* Fix label generator tests
* Remove install of libsndfile1 on win temporarily
* exclude audio tests on win
* install ffmpeg for integration tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-07 09:23:03 +02:00
Sara Zan
83648b9bc0
[CI refactoring] Rewrite Crawler tests ( #2557 )
* Rewrite crawler tests (very slow) and fix small crawler bug
* Update Documentation & Code Style
* compile the regex only once
* Factor out the html files & add content check to most tests
* Clarify that even starting URLs can be excluded
* Update Documentation & Code Style
* Change signature
* Fix failing test
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-06 17:52:37 +02:00
bogdankostic
0a4477d315
Fix streamlit version to <1.10 in UI dependencies ( #2630 )
* Trigger code-and-docs-check
* Upgrade azure-ai-formrecognizer to 3.2.0b4
* Revert "Upgrade azure-ai-formrecognizer to 3.2.0b4"
This reverts commit 21c3fc7e9b79b94143fb2d6009544a5cae9cf560.
* Fix streamlit version to <1.10
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-03 15:01:00 +02:00
Ryan Russell
c1b7948e10
Improve Docs Readability ( #2617 )
Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-06-03 09:57:40 +02:00
Julian Risch
3c6fcc3e42
Bump version to next release candidate ( #2627 )
* bump version to next release candidate
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 18:58:44 +02:00
Julian Risch
4ca331c0a7
Bump version to v1.5.0 and copy docs folder ( #2625 )
* bump version to v1.5.0 and copy docs folder
* Update Documentation & Code Style
* update links to v1.5.0
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
v1.5.0
2022-06-02 17:20:42 +02:00
Vladimir Blagojevic
e10a3fba74
Add Generative Pseudo Labeling ( #2388 )
2022-06-02 10:12:47 -04:00
bogdankostic
61d9429c25
Simplify loading of EmbeddingRetriever ( #2619 )
* Infer model format for EmbeddingRetriever automatically
* Update Documentation & Code Style
* Adapt conftest to automatic inference of model_format
* Update Documentation & Code Style
* Fix tests
* Update Documentation & Code Style
* Fix tests
* Adapt tutorials
* Update Documentation & Code Style
* Add test for similarity scores with sentence transformers
* Adapt doc string and warning message
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 15:05:29 +02:00
Sara Zan
ca19521c25
Fix new PyLint errors ( #2624 )
* unnecessary-lambda-assignment
* consider-using-generator
* implicit-str-concat
* consider-using-generator
* consider-using-generator
* implicit-str-concat
* consider-using-generator
* disable unnecessary-lambda-assignment
* implicit-str-concat
* Update Documentation & Code Style
* implicit-str-concat
* Remove no-self-use
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 13:45:36 +02:00
bogdankostic
a617ab950b
Fix number of returned values in get_metadata_values_by_key ( #2614 )
* Apply pagination in get_metadata_values_by_key
* Update Documentation & Code Style
* Adapt test
* Fix test_eval.py by using pytest.approx
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-01 10:21:28 +02:00
tstadel
6b78990a38
Fix Pipeline.get_config() for forked pipelines ( #2616 )
* Fix Pipeline.get_config() for forked pipelines
* exclude root nodes
* minor quickfix
2022-05-31 21:26:53 +02:00
tstadel
0efad96e08
DC SDK: Add possibility to upload evaluation sets to DC ( #2610 )
* Add possibility to upload evaluation sets to DC
* fix test_eval sas comparisons
* quickwin docstring feedback changes
* Add hint about annotation tool and mark optional and required columns
* minor changes to docstrings
2022-05-31 17:08:19 +02:00
tstadel
fc25adf959
Create eval runs on deepset Cloud ( #2534 )
* add EvaluationRunClient
* Update Documentation & Code Style
* temporarily resolve names to ids
* Update Documentation & Code Style
* add delete and update methods
* minor fixes
* add experiments facade
* dummy implement start_run()
* start eval runs added
* Update Documentation & Code Style
* fix merge
* switch to names on api level
* add create eval_run test
* Update Documentation & Code Style
* further tests added
* update docstrings
* add docstrings
* add missing tags param, fix docstrings
* refactor _get_evaluation_sets
* fix mypy
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-30 18:18:19 +02:00
bogdankostic
0395533a78
Add run_batch for standard pipelines ( #2595 )
* Add run_batch for standard pipelines
* Update Documentation & Code Style
* Fix mypy
* Remove code duplication
* Fix linter
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-27 10:42:48 +02:00
Julian Risch
b2a2c10fae
Update milvus installation instructions to v2 ( #2598 )
2022-05-25 17:22:04 +02:00
tstadel
dd8dc588b1
fix eval with context matching in table qa use cases ( #2597 )
2022-05-25 16:26:29 +02:00
tstadel
b6986ea25d
avoid empty api_endpoint ( #2588 )
2022-05-25 08:51:04 +02:00
tstadel
7caca41c5d
Support context matching in pipeline.eval() ( #2482 )
* calculate context pred metrics
* Update Documentation & Code Style
* extend doc_relevance_col values
* fix import order
* Update Documentation & Code Style
* fix mypy
* fix typings literal import
* add option for custom document_id_field
* Update Documentation & Code Style
* fix tests and dataframe col-order
* Update Documentation & Code Style
* rename content to context in eval dataframe
* add backward compatibility to EvaluationResult.load()
* Update Documentation & Code Style
* add docstrings
* Update Documentation & Code Style
* support sas
* Update Documentation & Code Style
* add answer_scope param
* Update Documentation & Code Style
* rework doc_relevance_col and keep document_id col in case of custom_document_id_field
* Update Documentation & Code Style
* improve docstrings
* Update Documentation & Code Style
* rename document_relevance_criterion into document_scope
* Update Documentation & Code Style
* add document_scope and answer_scope to print_eval_report
* support all new features in execute_eval_run()
* fix imports
* fix mypy
* Update Documentation & Code Style
* rename pred_label_sas_grid into pred_label_matrix
* update dataframe schema and sorting
* Update Documentation & Code Style
* pass through context_matching params and extend document_scope test
* Update Documentation & Code Style
* add answer_scope tests
* fix context_matching_threshold for document metrics
* shorten dataframe apply calls
* Update Documentation & Code Style
* fix queries getting lost if nothing was retrieved
* Update Documentation & Code Style
* Update Documentation & Code Style
* use document_id scopes
* Update Documentation & Code Style
* fix answer_scope literal
* Update Documentation & Code Style
* update the docs (lg changes)
* Update Documentation & Code Style
* update tutorial 5
* Update Documentation & Code Style
* fix tests
* Add minor lg updates
* final docstring changes
* fix single quotes in docstrings
* Update Documentation & Code Style
* dataframe scopes added for each column
* better docstrings for context_matching params
* Update Documentation & Code Style
* fix summarizer eval test
* Update Documentation & Code Style
* fix test
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2022-05-24 18:11:52 +02:00
tstadel
a70c6a2d4f
Fix knn params for aws managed opensearch ( #2581 )
2022-05-24 18:10:05 +02:00
bogdankostic
1ab2b977c0
Fix crawler ( #2591 )
2022-05-24 12:34:31 +02:00
bogdankostic
867695ad0c
Change signature of queries param in batch methods ( #2575 )
* Change signature of queries param in batch methods
* Update Documentation & Code Style
* Fix mypy
* Remove unused import
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 12:33:45 +02:00
Julian Risch
075ed7fbcb
Remove encoding option from PDFToTextOCRConverter ( #2553 )
* remove encoding option from PDFToTextOCRConverter
* Update Documentation & Code Style
* add unused 'encoding' param to PDFToTextOCRConverter
* Update Documentation & Code Style
* call run instead of convert to use ligature replacing
* Update Documentation & Code Style
* add text to check installed poppler version
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 11:31:32 +02:00
Sara Zan
7ab0239e31
Do not copy _component_config in get_components_definitions ( #2574 )
* Do not deepcopy in get_components_definitions
* Update Documentation & Code Style
* comment
* unused import
* Add test to ensure env vars don't overwrite _component_config
* Update Documentation & Code Style
* Add test for get_config
* Add test to show the rename is not sufficient
* Update Documentation & Code Style
* copy only if it's strictly necessary
* Update Documentation & Code Style
* Apply suggestions from code review
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* review feedback
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-24 09:53:59 +02:00
dimitrisna
5bda63a6c0
Add training checkpoint in retriever trainer ( #2543 )
* Update dense.py
* Update dense.py
* Update dense.py
* Update dense.py
* Update dense.py
* Update dense.py
* Update dense.py
* Update dense.py
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 09:51:26 +02:00
Agnieszka Marzec
dd83f71a8f
Minor lg updates to doc strings ( #2585 )
* Minor lg updates to doc strings
* Update all models descriptions
2022-05-24 09:35:13 +02:00
Agnieszka Marzec
ebd54b225b
Update Ray pipeline docs with validation info ( #2590 )
* Update Ray pipeline docs
* Add Sara's suggestion
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 09:29:52 +02:00
tstadel
3ab4dac58d
Upload files to deepset Cloud ( #2570 )
* added upload_files
* Update Documentation & Code Style
* expose file client via DeepsetCloud facade
* Update Documentation & Code Style
* tests added
* Update Documentation & Code Style
* always read file in binary mode and guess mimetype
* add delete and list functions
* fix method literals
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-23 17:05:56 +02:00
tstadel
0e83535108
Show search endpoint after deepset Cloud deployment ( #2569 )
* show try-out-message after deployment
* better messages
* Update Documentation & Code Style
* tests added
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-23 14:19:31 +02:00
MichelBartels
16b0fdd804
Add DeBERTaV2/V3 support ( #2097 )
* add debertav2/v3
* update comments
* Apply Black
* assume support for fast deberta tokenizer
* Apply Black
* update required transformers version for deberta
* fix mismatched vocab error
* Update Documentation & Code Style
* update debertav2 doc string
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-23 09:55:14 +02:00
Massimiliano Pippi
a9a4156731
[Weaviate] Exit the while loop when we query less documents than available ( #2537 )
* exit the while loop when we query less documents than available in Weaviate
* use monkeypatch fixture, remove unused markers
* we know key is there, use brackets to get the value
* use custom exception
* add warning message when we hit the QUERY_MAXIMUM_RESULTS problem
* restore pytest marker
* removed unused import
* make the warning message more clear
2022-05-20 09:07:03 +02:00
Sara Zan
fd2ca359fe
Validation for Ray pipelines ( #2545 )
* Ray pipelines now validate
* Update Documentation & Code Style
* rename Ray pipeline in tests
* Add extras:ray to the test pipeline
* pylint
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-19 19:40:03 +02:00
Sara Zan
89bb1ca139
[CI refactoring] Improve autoformat.yml ( #2556 )
* Restructure autoformat to run a single script
* Reduce diff for autoformat.yml
* Reduce diff on linux_ci.yml
2022-05-18 20:02:43 +02:00
tstadel
f6e3a63906
Prevent losing names of utilized components when loaded from config ( #2525 )
* Prevent losing names of utilized components when loaded from config
* Update Documentation & Code Style
* update test
* fix failing tests
* Update Documentation & Code Style
* fix even more tests
* Update Documentation & Code Style
* incorporate review feedback
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-18 14:17:54 +02:00
tstadel
110b9c2b0a
Warnings for write operations of DeepsetCloudDocumentStore ( #2565 )
* log inputs to write operations
* Update Documentation & Code Style
* adjust tests
* simplify by using decorator for write operation functions
* Update Documentation & Code Style
* fix comma
* fix comma in test
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 17:53:55 +02:00
Stefano Fiorucci
686a19b35d
added launch_tika method ( #2567 )
* added launch_tika method
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-17 17:53:04 +02:00