542 Commits

Author SHA1 Message Date
Sara Zan
11cf94a965
Pipeline's YAML: syntax validation (#2226)
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes

* Make error composition work properly

* Clarify typing

* Help mypy a bit more

* Update Documentation & Code Style

* Enable autogenerated docs for Milvus1 and 2 separately

* Revert "Enable autogenerated docs for Milvus1 and 2 separately"

This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.

* Update Documentation & Code Style

* Re-enable 'additionalProperties: False'

* Add pipeline.type to JSON Schema, was somehow forgotten

* Disable additionalProperties on the pipeline properties too

* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)

* Call super in PipelineValidationError

* Improve _read_pipeline_config_from_yaml's error handling

* Fix generate_json_schema.py to include document stores

* Fix json schemas (retro-fix 1.1.0 again)

* Improve custom errors printing, add link to docs

* Add function in BaseComponent to list its subclasses in a module

* Make some document stores base classes abstract

* Add marker 'integration' in pytest flags

* Slightly improve validation of pipelines at load

* Adding tests for YAML loading and validation

* Make custom_query Optional for validation issues

* Fix bug in _read_pipeline_config_from_yaml

* Improve error handling in BasePipeline and Pipeline and add DAG check

* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)

* Simplify errors slightly

* Add some YAML validation tests

* Remove load_from_config from BasePipeline, it was never used anyway

* Improve tests

* Include json-schemas in package

* Fix conftest imports

* Make BasePipeline abstract

* Improve mocking by making the test independent from the YAML version

* Add exportable_to_yaml decorator to forget about set_config on mock nodes

* Fix mypy errors

* Comment out one monkeypatch

* Fix typing again

* Improve error message for validation

* Add required properties to pipelines

* Fix YAML version for REST API YAMLs to 1.2.0

* Fix load_from_yaml call in load_from_deepset_cloud

* fix HaystackError.__getattr__

* Add super().__init__() in most nodes and docstores, comment out set_config

* Remove type from REST API pipelines

* Remove useless init from doc2answers

* Call super in Seq2SeqGenerator

* Typo in deepsetcloud.py

* Fix rest api indexing error mismatch and mock version of JSON schema in all tests

* Working on pipeline tests

* Improve errors printing slightly

* Add back test_pipeline.yaml

* _json_schema.py supports different versions with identical schemas

* Add type to 0.7 schema for backwards compatibility

* Fix small bug in _json_schema.py

* Try alternative to generate json schemas on the CI

* Update Documentation & Code Style

* Make linux CI match autoformat CI

* Fix super-init-not-called

* Accidentally committed file

* Update Documentation & Code Style

* fix test_summarizer_translation.py's import

* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args

* Fix json schema for ray tests too

* Update Documentation & Code Style

* Reintroduce validation

* Use unstable version in tests and rest api

* Make unstable support the latest versions

* Update Documentation & Code Style

* Remove needless fixture

* Make type in pipeline optional in the strings validation

* Fix schemas

* Fix string validation for pipeline type

* Improve validate_config_strings

* Remove type from test pipelines

* Update Documentation & Code Style

* Fix test_pipeline

* Removing more type from pipelines

* Temporary CI patch

* Fix issue with exportable_to_yaml never invoking the wrapped init

* rm stray file

* pipeline tests are green again

* Linux CI now needs .[all] to generate the schema

* Bugfixes, pipeline tests seem to be green

* Typo in version after merge

* Implement missing methods in Weaviate

* Trying to avoid FAISS tests from running in the Milvus1 test suite

* Fix some stray test paths and faiss index dumping

* Fix pytest markers list

* Temporarily disable cache to be able to see tests failures

* Fix pyproject.toml syntax

* Use only tmp_path

* Fix preprocessor signature after merge

* Fix faiss bug

* Fix Ray test

* Fix documentation issue by removing quotes from faiss type

* Update Documentation & Code Style

* use document properly in preprocessor tests

* Update Documentation & Code Style

* make preprocessor capable of handling documents

* import document

* Revert support for documents in preprocessor, do later

* Fix bug in _json_schema.py that was breaking validation

* re-enable cache

* Update Documentation & Code Style

* Simplify calling _json_schema.py from the CI

* Remove redundant ABC inheritance

* Ensure exportable_to_yaml works only on implementations

* Rename subclass to class_ in Meta

* Make run() and get_config() abstract in BasePipeline

* Revert unintended change in preprocessor

* Move outgoing_edges_input_node check inside try block

* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX

* Add check for a RecursionError on validate_config_strings

* Address usages of _pipeline_config in data silo and elasticsearch

* Rename _pipeline_config into _init_parameters

* Fix pytest marker and remove unused imports

* Remove most redundant ABCs

* Rename _init_parameters into _component_configuration

* Remove set_config and type from _component_configuration's dict

* Remove last instances of set_config and replace with super().__init__()

* Implement __init_subclass__ approach

* Simplify checks on the existence of _component_configuration

* Fix faiss issue

* Dynamic generation of node schemas & weed out old schemas

* Add debatable test

* Add docstring to debatable test

* Positive diff between schemas implemented

* Improve diff printing

* Rename REST API YAML files to trigger IDE validation

* Fix typing issues

* Fix more typing

* Typo in YAML filename

* Remove needless type:ignore

* Add tests

* Fix tests & validation feedback for accessory classes in custom nodes

* Refactor RAGeneratorType out

* Fix broken import in conftest

* Improve source error handling

* Remove unused import in test_eval.py breaking tests

* Fix changed error message in tests matches too

* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions

* Fix path to generate_openapi_specs.py in autoformat.yml

* Update Documentation & Code Style

* Add test for FAISSDocumentStore-like situations (superclass with init params)

* Update Documentation & Code Style

* Fix indentation

* Remove commented set_config

* Store model_name_or_path in FARMReader to use in DistillationDataSilo

* Rename _component_configuration into _component_config

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-15 11:17:26 +01:00
Sara Zan
982ec4435e
Make windows CI more resistant to cache misses (#2263) 2022-03-10 15:11:34 +01:00
Sara Zan
18a6545055
Create milvus2 containers outside of haystack/ (#2300) 2022-03-10 14:55:15 +01:00
tstadel
dde9d59271
fix pip backtracking issue (#2281)
* fix pip backtracking issue

* restrict azure-core version

* Remove the trailing comma

* Add skip_magic_trailing_comma in pyproject.toml for pydoc compatibility

* Pin pydoc-markdown _again_

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-07 19:25:33 +01:00
MichelBartels
2c423ba063
Introduce support for pymilvus>=2.0.0 (#2126)
* update remaining occurrences of get_connection

* fix milvus2 import and fix wrong extra references

* change MilvusDocumentStore to Milvus1DocumentStore

* update milvus docstrings to reflect updated dependency management

* enable milvus 2 tests

* fix milvus2 env variable processing

* fix dropping collections for each milvus 2 test

* make Milvus 2 doc store tests work

* allow user to specify consistency level

* First attempt at running Milvus2 in the CI

* Install the correct pymilvus

* add batch deletion for milvus2

* change default from milvus 1 to milvus 2

* make milvus2 the default in the docstores extra

* Switch milvus1 and milvus2 in base test run on CI

* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'

* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py

* Enable autogenerated docs for Milvus1 and 2 separately

* Partial fix to docstring of Milvus2DocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-24 17:43:38 +01:00
Sara Zan
15c70bdb9f
Generate haystack-pipeline-1.2.0.schema.json (#2239)
* Trigger generation of the json schema for 1.2.0

* Remove path filters for `autoformat.yml`

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-24 11:45:21 +01:00
Sara Zan
d1b7761504
Generate JSON schema index for Schemastore (#2225)
* Generate JSON schema index

* Add index file

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 09:29:44 +01:00
Sara Zan
8de1aa3e43
Pylint: solve or silence locally rare warnings (#2170)
* Remove invalid-envvar-default and logging-too-many-args

* Remove import-self, access-member-before-definition and deprecated-argument

* Remove used-before-assignment by restructuring type import

* Remove unneeded-not

* Silence unnecessary-lambda (it's necessary)

* Remove pointless-string-statement

* Update Documentation & Code Style

* Silenced unsupported-membership-test (probably a real bug, can't fix though)

* Remove trailing-newlines

* Remove super-init-not-called and silence invalid-sequence-index (it's valid)

* Remove invalid-envvar-default in ui

* Remove more warnings from pyproject.toml than are actually sorted out in code; CI will fail

* Linting all modules together is more readable

* Update Documentation & Code Style

* Typo in pylint disable comment

* Simplify long boolean statement

* Simplify init call in FAISS

* Fix inconsistent-return-statements

* Fix useless-super-delegation

* Fix useless-else-on-loop

* Fix another inconsistent-return-statements

* Move back pylint disable comment moved by black

* Fix consider-using-set-comprehension

* Fix another consider-using-set-comprehension

* Silence non-parent-init-called

* Update pylint exclusion list

* Update Documentation & Code Style

* Resolve unnecessary-else-after-break

* Fix superfluous-parens

* Fix no-else-break

* Remove is_correctly_retrieved along with its pylint issue

* Update exclusions list

* Silence constructor issue in squad_data.py (method is already broken)

* Fix too-many-return-statements

* Fix use-dict-literal

* Fix consider-using-from-import and useless-object-inheritance

* Update exclusion list

* Fix simplifiable-if-statements

* Fix one consider-using-dict-items

* Fix another consider-using-dict-items

* Fix a third consider-using-dict-items

* Fix last consider-using-dict-items

* Fix three use-a-generator

* Silence import errors on numba, tensorboardX and apex, but add comments & logs

* Fix couple of mypy issues

* Fix another typing issue

* Silence mypy, was conflicting with more meaningful pylint issue

* Fix no-else-continue

* Silence unsubscriptable-object and fix an import error with importlib.metadata

* Update Documentation & Code Style

* Fix all no-else-raise

* Update Documentation & Code Style

* Fix inverted parameters in simplified if switch

* Change [test] to [all] in some jobs (for typing and linting)

* Add comment in haystack/schema.py on pydantic's dataclasses

* Move comment from get_documents_by_id into _convert_weaviate_result_to_document in weaviate.py

* Add comment on pylint silencing

* Fix bug introduced in rest_api/controller/search.py

* Update Documentation & Code Style

* Add ADR about Pydantic dataclasses

* Update pydantic-dataclasses.md

* Add link to Pydantic docs on Dataclasses

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 20:16:14 +01:00
Kristof Herrmann
9096ddad34
Adding a minimal haystack gpu build (#2185)
* added minimal gpu image

* Update Documentation & Code Style

* removed old installations

* build minimal images in gh action

* quotes to single quotes

* switched repos

* fix

* fix ordering

* move to deepset dockerhub acc

* update nvidia/cuda image to match newest torch+cu

* refactor Dockerfile-GPU-minimal to optimize build

* Remove spurious doc changes from tutorial6

* install ocr dependencies and pdftotext

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Ivan Lopez <ivan@askai.net>
2022-02-21 13:34:44 +01:00
Sara Zan
2a840ee248
YAML versioning (#2209)
* Make YAML files get the same version as Haystack and throw warning at load in case of mismatch

* Update version of most YAMLs in the codebase (aesthetic change, only to avoid the warning).

* Remove quotes from version in tests

* Fix version in generate_json_schema.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 12:22:37 +01:00
Sara Zan
abc1057869
Disable autoformat.yml on master (#2198)
* disable autoformat.yml on master

* Add a note in CONTRIBUTING.md about branch name
2022-02-16 16:58:12 +01:00
Sara Zan
4e940be859
Allow Linux CI to push changes to forks (#2182)
* Add explicit reference to repo name to allow CI to push code back

* Run test matrix only on tested code changes

* Isolate the bot to check if it works

* Clarify situation with a comment

* Simplify autoformat.yml

* Add code and docs check

* Add git pull to make sure to fetch changes if they were created

* Add cache to autoformat.yml too

* Add information on forks in CONTRIBUTING.md

* Add a note about code quality tools in CONTRIBUTING.md

* Add image file types to the CI exclusion list

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 16:28:55 +01:00
tstadel
1bdd1f48fd
Fix windows ci tests (#2144)
* move commandline args to global conftest

* correct test exclude paths

* Update Documentation & Code Style

* exclude test_generator_pipeline_with_translator from windows ci

* exclude further oom tests

* enable log_cli

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 21:29:05 +01:00
Sara Zan
40328a57b6
Introduce pylint & other improvements on the CI (#2130)
* Make mypy check also ui and rest_api, fix ui

* Remove explicit type packages from extras, mypy now downloads them

* Make pylint and mypy run on every file except tests

* Rename tasks

* Change cache key

* Fix mypy errors in rest_api

* Normalize python versions to avoid cache misses

* Add all exclusions to make pylint pass

* Run mypy on rest_api and ui as well

* test if installing the package really changes outcome

* Comment out installation of packages

* Experiment: randomize tests

* Add fallback installation steps on cache misses

* Remove randomization

* Add comment on cache

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 18:27:12 +01:00
Sara Zan
9dc89d2bd2
Fix dependency related build issues in Dockerfiles (#2135)
* Fix a path issue in Dockerfile-GPU

* Fix paths in Dockerfile-GPU

* Add workflow_dispatch to docker build task

* Remove reference to optional component from ui/, not needed anymore

* Move pytorch installation last to avoid replacing it later

* Remove optional import from rest_api too, no more needed

* Change path in ui/Dockerfile

* ui container works again

* Complete review of import paths

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 17:35:18 +01:00
Sara Zan
a095aea21e
Reintroduce push on master trigger for Linux CI (#2127)
* Reintroduce push on master trigger with Linux CI

* Reintroduce trigger for freshly opened PRs too
2022-02-04 18:06:23 +01:00
Sara Zan
957e78ed9e
Upgrade pydoc-markdown & refactor GitHub Actions (#2117)
* Upgrade pydoc-markdown and fix the YAMLs to work with it

* Pin pydoc-markdown to major version

* Generalize pydoc-markdown workflow

* Make a single Action to perform all tasks that require committing into the local branch

* Merge the code updates and the docs in the Linux CI to prevent the bot from always showing the pipeline as green

* Installing Jupyter deps for Black

* Build cache before running generation tasks

* Add check not to run the code generation on master

* Simplify push action

* Add more test deps in setup.cfg and remove from GH Action workflow

* Remove forced upgrades on pip install

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 15:45:09 +01:00
Sara Zan
a59bca3661
Apply black formatting (#2115)
* Testing black on ui/

* Applying black on docstores

* Add latest docstring and tutorial changes

* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too

* Remove comments

* Relax constraints on pydoc-markdown

* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade

* Fix a couple of bugs

* Add a type: ignore that was missing somehow

* Give path to black

* Apply Black

* Apply Black

* Relocate a couple of type: ignore

* Update documentation

* Make Linux CI run after applying Black

* Triggering Black

* Apply Black

* Remove dependency, does not work well

* Remove manually double trailing commas

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 13:43:18 +01:00
Sara Zan
767f0025c6
Make ui and rest proper packages (#2098)
* Adding simple setup.py to ui/ and rest_api and remove respective extras from main setup.cfg

* Make 'pip install rest_api/' fetch the local Haystack instead of downloading from pypi

* Add some comments to the new setup.py files and fix the Dockerfiles

* Add version info to 'farm-haystack-ui'

* Fix the OpenAPI Specs workflow

* Install rest_api and ui properly on the CI too

* Make the workflow see changes on every setup file

* Fix workflow cache keys

* Add license to rest_api and ui
2022-02-02 16:14:12 +01:00
Sara Zan
009c89fc53
Revert "Make the docstring bot work only on master" (#2114)
* Revert "Make the docstring bot work only on master (#2078)"

This reverts commit 649d07405770cd59696d0120107a3b2f0aafe7c2.

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-02 16:08:34 +01:00
Sebastián Ramírez
3c768071d5
Add JSON Schema autogeneration for Pipeline YAML files (#2020)
* 🎨 Update type annotations to allow their extraction for JSON Schema

* Add main script doing all the work to generate the JSON Schema

* Add GitHub Action dependency to generate JSON Schema

* Update JSON Schema generation script to allow easily generating the schema without making a PR

* 👷 Add GitHub Action to generate JSON Schema

* 💚 Fix CI GitHub Action

* 💚 Update GitHub Action environment variables

* Add initial JSON Schema

* Add latest docstring and tutorial changes

* 🐛 Do not allow extra params not defined in each model

* ♻️ Make any additional properties invalid

* Make other additional properties invalid in all the levels in pipelines

* ♻️ Do not include Base classes as possible nodes

* 🍱 Update JSON Schema

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-02 15:00:41 +01:00
Sara Zan
649d074057
Make the docstring bot work only on master (#2078) 2022-02-01 14:09:55 +01:00
Sara Zan
07cf3c614a
Disable cache on the CI (#2083)
* Disable cache on the CI

* Reintroduce paths

* Add most files to the cache key

* remove date and path from cache key

* Try double install with cache

* Try to cache more stuff, on a per-commit basis

* Fix windows CI too

* Add comment on how to speed up the CI with better caching
2022-01-28 17:21:23 +01:00
tstadel
1b1e44e771
install haystack in editable mode for ci (#2082) 2022-01-28 09:59:28 +01:00
Sara Zan
713771095b
Autogenerate OpenAPI specs file (#2047)
* Add docstrings to the REST API endpoint to have them included in the OpenAPI specs

* Attempt at making the GitHub CI generate the OpenAPI specs

* Missing __init__.py was breaking rest_api import

* Add comment on dummy pipeline

* Create separate workflow file for the OpenAPI specs generation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
2022-01-27 13:06:01 +01:00
Sara Zan
3c02aa50d0
Remove run_docker_gpu.sh (#2003)
* Remove run_docker_gpu.sh

* remove shell formatting check from CI

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2022-01-27 12:20:43 +01:00
Sara Zan
d470b9d0bd
Improve dependency management (#1994)
* First attempt at using setup.cfg for dependency management

* Trying the new package on the CI and in Docker too

* Add composite extras_require

* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports

* Fix bug on class import and rephrase error message

* Introduce typing for optional modules and add type: ignore in sparse.py

* Include importlib_metadata backport for py3.7

* Add colab group to extra_requires

* Fix pillow version

* Fix grpcio

* Separate out the crawler as another extra

* Make paths relative in rest_api and ui

* Update the test matrix in the CI

* Add try catch statements around the optional imports too to account for direct imports

* Never mix direct deps with self-references and add ES deps to the base install

* Refactor several paths in tests to make them insensitive to the execution path

* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix

* Wrap pdf conversion utils into safe_import

* Update some tutorials and revert Milvus1 as default for now, see #2067

* Fix mypy config


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 18:12:55 +01:00
oryx1729
cb881b6fa9
Disable pip cache for Dockerfiles (#2015) 2022-01-19 10:26:17 +01:00
oryx1729
854af92dc5
Update docker_build.yml 2022-01-04 17:46:34 +01:00
oryx1729
2910f67718
Use long Commit ID for Docker tags (#1946) 2022-01-04 17:39:49 +01:00
oryx1729
00c823cdff
Add GitHub Action for Docker Build for GPU (#1916) 2022-01-04 14:33:13 +01:00
bogdankostic
39573cf0a9
Add ParsrConverter (#1931)
* Add ParsrConverter

* Fix typing error + add Parsr to Linux CI

* Fix valid_language for all converters + fix context generation for ParsrConverter

* Remove ParsrConverter test from WindowsCI

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-12-30 10:15:11 +01:00
bogdankostic
74c80e0c71
Set mypy version to 0.910 (#1899) 2021-12-16 14:02:04 +01:00
bogdankostic
4edec04c2c
Add improvements to AzureConverter (#1896)
* Add some improvements to AzureConverter

* Adapt docstring + use Path instead of str

* Fix mypy version to 0.910
2021-12-16 12:45:24 +01:00
Ivan Lopez
86f5688f47
fix wrong branch and repo, add cloudwatch agent (#1877) 2021-12-13 20:32:25 +01:00
Sara Zan
de71b944d7
Fix typo in the Windows CI UI deps (#1876)
* Fix typo in the WindowsCI UI deps

* Force a deps cache miss
2021-12-13 15:49:44 +01:00
Fabrice Depaulis
77d52ad215
Rely api healthcheck on status code rather than json decoding (#1871)
* Rely api healthcheck on status code rather than json decoding

* Install UI dependencies on the Linux and Windows CI

Co-authored-by: Fabrice Depaulis <fabrice.depaulis@orange.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2021-12-10 18:05:23 +01:00
Ivan Lopez
4f6dc36869
Deploy demo (#1837)
* Add GH Actions workflow for demo deployment

* update demo ec2 instance type

* remove redundant docker-compose build

* add custom demo command and env vars

* deploy demo on updates to workflow resources
2021-12-03 15:58:47 +01:00
Malte Pietsch
90ced1b246
Update release.yml 2021-12-03 13:23:55 +01:00
Malte Pietsch
e5599bd337
Extend categories for release notes (#1841) 2021-12-03 13:19:45 +01:00
Malte Pietsch
4e76129004
Add config for github release notes (#1840) 2021-12-03 12:27:58 +01:00
bogdankostic
a19a9f548b
Upgrade torch to v1.10.0 (#1789)
* Upgrade torch to v1.10.0

* Adapt torch version for torch-scatter in TableQA tutorial

* Add latest docstring and tutorial changes

* Make torch version more flexible

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-23 11:49:46 +01:00
tstadel
0021668394
exclude test_summarizer_translation.py for windows_ci (#1759) 2021-11-16 10:13:16 +01:00
tstadel
956d5bba43
Split summarizer tests in order to make windows CI work again (#1757)
* separate testfile for summarizer with translation

* Add latest docstring and tutorial changes

* import SPLIT_DOCS from test_summarizer

* add workflow_dispatch to windows_ci

* add workflow_dispatch to linux_ci

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-15 18:49:49 +01:00
Julian Risch
892ce4a760
Make weaviate more compliant to other doc stores (UUIDs and dummy embeddings) (#1656)
* create uuid and dummy embedding in weaviate doc store

* handle and test for duplicate non-uuid-formatted ids in weaviate

* add uuid and dummy embedding to doc strings

* Add latest docstring and tutorial changes

* Upgrade weaviate

* Include weaviate in common doc store test cases

* Add latest docstring and tutorial changes

* Exclude weaviate doc store from eval tests

* Incorporate index name in uuid generation

* Ignore mypy error

* Fix typo

* Restore DOCS without uuid and embeddings generated by weaviate

* Supply docs for retriever tests as fixture

* Limit scope of fixture to function instead of session

* Add comments

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-04 09:27:12 +01:00
Lalit Pagaria
e5b4b62d75
Add CI for windows runner (#1458)
* Feat: Removing use of temp file while downloading archive from url along with adding CI for windows and mac platform

* Windows CI by default installing pytorch gpu hence updating CI to pick cpu version

* fixing mac cache build issue

* updating windows pip install command for torch

* another attempt

* updating ci

* Adding sudo

* fixing ls failure on windows

* another attempt to fix build issue

* Saving env variable of test files

* Adding debug log

* Github action differ on windows

* adding debug

* another attempt

* Windows have different ways to receive env

* fixing template

* minor fix

* Adding debug

* Removing use of json

* Adding back fromJson

* addin toJson

* removing print

* another attempt

* disabling parallel run at least for testing

* installing docker for mac runner

* correcting docker install command

* Linux Docker containers are not supported on Windows

* Removing mac changes

* Upgrading pytorch

* using lts pytorch

* Separating win and ubuntu

* Install java 11

* enabling linux container env

* docker cli command

* docker cli command

* start elastic service

* List all service

* correcting service name

* Attempt to fix multiple test run

* convert to json

* another attempt to check

* Updating build cache step

* attempt

* Add tika

* Separating windows CI

* Changing CI name

* Skipping test which does not work in windows

* Skipping tests for windows

* create cleanup function in conftest

* adding skipif marker on tests

* Run windows PR on only push to master

* Addressing review comments

* Enabling windows ci for this PR

* Tika init is being called when importing tika function

* handling tika import issue

* handling tika import issue in test

* Fixing import issue

* removing tika fixture

* Removing fixture from tests

* Disable windows ci on pull request

* Add back extra pytorch install step

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-29 10:22:28 +02:00
Malte Pietsch
3d58e81b5e
Switch from dataclass to pydantic dataclass & Fix Swagger API Docs (#1598)
* test pydantic dataclasses

* Add latest docstring and tutorial changes

* enable pydantic mypy plugin

* switch to pydantic dataclasses and implement custom to_json and from_json

* clean up

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-18 14:38:14 +02:00
bogdankostic
655d721371
Add Table Reader (#1446)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* Add first draft of TableReader

* rename label.question -> label.query. Allow sorting of Answers.

* Add calculation of answer scores

* WIP primitives

* Adapt input and output to new primitives

* Add doc strings

* Add tests

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclasses-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* Adapt answer format

* Add latest docstring and tutorial changes

* keep meta data of answers in reader.run()

* Fix mypy

* fix meta handling

* address review feedback

* Add latest docstring and tutorial changes

* Allow inference on GPU

* Remove automatic aggregation

* Add automatic aggregation

* Add latest docstring and tutorial changes

* Add torch-scatter dependency

* Add wheel to torch-scatter dependency

* Fix requirements

* Fix requirements

* Fix requirements

* Adapt setup.py to allow for wheels

* Fix requirements

* Fix requirements

* Add type hints and code snippet

* Add latest docstring and tutorial changes

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-15 16:34:48 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label (#1398)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* rename label.question -> label.query. Allow sorting of Answers.

* WIP primitives

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclasses-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* keep meta data of answers in reader.run()

* fix meta handling

* address review feedback

* Add latest docstring and tutorial changes

* make document=None for open domain labels

* add import

* fix print utils

* fix rest api

* address review feedback

* Add latest docstring and tutorial changes

* fix mypy

Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Markus Paff
5b1b875374
fixed workflow conflict with introducing new one (#1472)
2021-09-17 23:44:45 +02:00