75 Commits

Author SHA1 Message Date
Sara Zan
96a538b182
Pylint (import related warnings) and REST API improvements (#2326)
* remove duplicate imports

* fix ungrouped-imports

* Fix wrong-import-position

* Fix unused-import

* pyproject.toml

* Working on wrong-import-order

* Solve wrong-import-order

* fix Pool import

* Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py

* remove Converter from modeling

* Fix mypy issues on adaptive_model.py

* create es_converter.py

* remove converter import

* change import path in tests

* Restructure REST API to not rely on global vars from search.apy and improve tests

* Fix openapi generator

* Move variable initialization

* Change type of FilterRequest.filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-12 16:41:05 +02:00
Sara Zan
5454d57bfa
Fix YAML pipeline paths in docker-compose.yml (#2335)
* Rename YAML files in docker-compose files

* Make read_pipeline_config_from_yaml fail on wrong path

* Validate indexing config in rest api

* Update Documentation & Code Style

* Add note about autocompletion of YAML

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 14:47:04 +01:00
Julian Risch
ac5617e757
Add basic telemetry features (#2314)
* add basic telemetry features

* change pipeline_config to _component_config

* Update Documentation & Code Style

* add super().__init__() calls to error classes

* make posthog mock work with python 3.7

* Update Documentation & Code Style

* update link to docs web page

* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)

* add comment on send_event in BaseComponent.init() and fix mypy

* mock NonPrivateParameters and fix pylint undefined-variable

* Update Documentation & Code Style

* check model path contains multiple /

* add test for writing to file

* add test for en-/disable telemetry

* Update Documentation & Code Style

* merge file deletion methods and ignore pylint global statement

* Update Documentation & Code Style

* set env variable in demo to activate telemetry

* fix mock of HAYSTACK_TELEMETRY_ENABLED

* fix mypy and linter

* add CI as env variable to execution contexts

* remove threading, add test for custom error event

* Update Documentation & Code Style

* simplify config/log file deletion

* add test for final event being sent

* force writing config file in test

* make test compatible with python 3.7

* switch to posthog production server

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 11:58:51 +01:00
Sara Zan
11cf94a965
Pipeline's YAML: syntax validation (#2226)
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes

* Make error composition work properly

* Clarify typing

* Help mypy a bit more

* Update Documentation & Code Style

* Enable autogenerated docs for Milvus1 and 2 separately

* Revert "Enable autogenerated docs for Milvus1 and 2 separately"

This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.

* Update Documentation & Code Style

* Re-enable 'additionalProperties: False'

* Add pipeline.type to JSON Schema, was somehow forgotten

* Disable additionalProperties on the pipeline properties too

* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)

* Cal super in PipelineValidationError

* Improve _read_pipeline_config_from_yaml's error handling

* Fix generate_json_schema.py to include document stores

* Fix json schemas (retro-fix 1.1.0 again)

* Improve custom errors printing, add link to docs

* Add function in BaseComponent to list its subclasses in a module

* Make some document stores base classes abstract

* Add marker 'integration' in pytest flags

* Slighly improve validation of pipelines at load

* Adding tests for YAML loading and validation

* Make custom_query Optional for validation issues

* Fix bug in _read_pipeline_config_from_yaml

* Improve error handling in BasePipeline and Pipeline and add DAG check

* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)

* Simplify errors slightly

* Add some YAML validation tests

* Remove load_from_config from BasePipeline, it was never used anyway

* Improve tests

* Include json-schemas in package

* Fix conftest imports

* Make BasePipeline abstract

* Improve mocking by making the test independent from the YAML version

* Add exportable_to_yaml decorator to forget about set_config on mock nodes

* Fix mypy errors

* Comment out one monkeypatch

* Fix typing again

* Improve error message for validation

* Add required properties to pipelines

* Fix YAML version for REST API YAMLs to 1.2.0

* Fix load_from_yaml call in load_from_deepset_cloud

* fix HaystackError.__getattr__

* Add super().__init__()in most nodes and docstore, comment set_config

* Remove type from REST API pipelines

* Remove useless init from doc2answers

* Call super in Seq3SeqGenerator

* Typo in deepsetcloud.py

* Fix rest api indexing error mismatch and mock version of JSON schema in all tests

* Working on pipeline tests

* Improve errors printing slightly

* Add back test_pipeline.yaml

* _json_schema.py supports different versions with identical schemas

* Add type to 0.7 schema for backwards compatibility

* Fix small bug in _json_schema.py

* Try alternative to generate json schemas on the CI

* Update Documentation & Code Style

* Make linux CI match autoformat CI

* Fix super-init-not-called

* Accidentally committed file

* Update Documentation & Code Style

* fix test_summarizer_translation.py's import

* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args

* Fix json schema for ray tests too

* Update Documentation & Code Style

* Reintroduce validation

* Usa unstable version in tests and rest api

* Make unstable support the latest versions

* Update Documentation & Code Style

* Remove needless fixture

* Make type in pipeline optional in the strings validation

* Fix schemas

* Fix string validation for pipeline type

* Improve validate_config_strings

* Remove type from test p[ipelines

* Update Documentation & Code Style

* Fix test_pipeline

* Removing more type from pipelines

* Temporary CI patc

* Fix issue with exportable_to_yaml never invoking the wrapped init

* rm stray file

* pipeline tests are green again

* Linux CI now needs .[all] to generate the schema

* Bugfixes, pipeline tests seems to be green

* Typo in version after merge

* Implement missing methods in Weaviate

* Trying to avoid FAISS tests from running in the Milvus1 test suite

* Fix some stray test paths and faiss index dumping

* Fix pytest markers list

* Temporarily disable cache to be able to see tests failures

* Fix pyproject.toml syntax

* Use only tmp_path

* Fix preprocessor signature after merge

* Fix faiss bug

* Fix Ray test

* Fix documentation issue by removing quotes from faiss type

* Update Documentation & Code Style

* use document properly in preprocessor tests

* Update Documentation & Code Style

* make preprocessor capable of handling documents

* import document

* Revert support for documents in preprocessor, do later

* Fix bug in _json_schema.py that was breaking validation

* re-enable cache

* Update Documentation & Code Style

* Simplify calling _json_schema.py from the CI

* Remove redundant ABC inheritance

* Ensure exportable_to_yaml works only on implementations

* Rename subclass to class_ in Meta

* Make run() and get_config() abstract in BasePipeline

* Revert unintended change in preprocessor

* Move outgoing_edges_input_node check inside try block

* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX

* Add check for a RecursionError on validate_config_strings

* Address usages of _pipeline_config in data silo and elasticsearch

* Rename _pipeline_config into _init_parameters

* Fix pytest marker and remove unused imports

* Remove most redundant ABCs

* Rename _init_parameters into _component_configuration

* Remove set_config and type from _component_configuration's dict

* Remove last instances of set_config and replace with super().__init__()

* Implement __init_subclass__ approach

* Simplify checks on the existence of _component_configuration

* Fix faiss issue

* Dynamic generation of node schemas & weed out old schemas

* Add debatable test

* Add docstring to debatable test

* Positive diff between schemas implemented

* Improve diff printing

* Rename REST API YAML files to trigger IDE validation

* Fix typing issues

* Fix more typing

* Typo in YAML filename

* Remove needless type:ignore

* Add tests

* Fix tests & validation feedback for accessory classes in custom nodes

* Refactor RAGeneratorType out

* Fix broken import in conftest

* Improve source error handling

* Remove unused import in test_eval.py breaking tests

* Fix changed error message in tests matches too

* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions

* Fix path to generate_openapi_specs.py in autoformat.yml

* Update Documentation & Code Style

* Add test for FAISSDocumentStore-like situations (superclass with init params)

* Update Documentation & Code Style

* Fix indentation

* Remove commented set_config

* Store model_name_or_path in FARMReader to use in DistillationDataSilo

* Rename _component_configuration into _component_config

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-15 11:17:26 +01:00
tstadel
f7a01624e0
Refactor Pipeline peripherals (#2253)
* move peripheral stuff to utils, add more and better tests

* Update Documentation & Code Style

* move config related peripherals to config module, fix tests

* Update Documentation & Code Style

* remove unnecessary list comprehensions

* apply ZanSara's feedback

* remove classes in pipeline utils

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-03 14:14:42 +01:00
tstadel
fe03ca70de
Fix Pipeline.components (#2215)
* add components property, improve get_document_store()

* Update Documentation & Code Style

* use pipeline.get_document_store() instead of retriever.document_store

* add tests

* Update Documentation & Code Style

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-22 15:01:07 +01:00
Sara Zan
8de1aa3e43
Pylint: solve or silence locally rare warnings (#2170)
* Remove invalid-envvar-default and logging-too-many-args

* Remove import-self, access-member-before-definition and deprecated-argument

* Remove used-before-assignment by restructuring type import

* Remove unneeded-not

* Silence unnecessary-lambda (it's necessary)

* Remove pointless-string-statement

* Update Documentation & Code Style

* Silenced unsupported-membership-test (probably a real bug, can't fix though)

* Remove trailing-newlines

* Remove super-init-not-called and slience invalid-sequence-index (it's valid)

* Remove invalid-envvar-default in ui

* Remove some more warnings from pyproject.toml than actually solrted in code, CI will fail

* Linting all modules together is more readable

* Update Documentation & Code Style

* Typo in pylint disable comment

* Simplify long boolean statement

* Simplify init call in FAISS

* Fix inconsistent-return-statements

* Fix useless-super-delegation

* Fix useless-else-on-loop

* Fix another inconsistent-return-statements

* Move back pylint disable comment moved by black

* Fix consider-using-set-comprehension

* Fix another consider-using-set-comprehension

* Silence non-parent-init-called

* Update pylint exclusion list

* Update Documentation & Code Style

* Resolve unnecessary-else-after-break

* Fix superfluous-parens

* Fix no-else-break

* Remove is_correctly_retrieved along with its pylint issue

* Update exclusions list

* Silence constructor issue in squad_data.py (method is already broken)

* Fix too-many-return-statements

* Fix use-dict-literal

* Fix consider-using-from-import and useless-object-inheritance

* Update exclusion list

* Fix simplifiable-if-statements

* Fix one consider-using-dict-items

* Fix another consider-using-dict-items

* Fix a third consider-using-dict-items

* Fix last consider-using-dict-items

* Fix three use-a-generator

* Silence import errors on numba, tensorboardX and apex, but add comments & logs

* Fix couple of mypy issues

* Fix another typing issue

* Silence mypy, was conflicting with more meaningful pylint issue

* Fix no-else-continue

* Silence unsubscriptable-object and fix an import error with importlib.metadata

* Update Documentation & Code Style

* Fix all no-else-raise

* Update Documentation & Code Style

* Fix inverted parameters in simplified if switch

* Change [test] to [all] in some jobs (for typing and linting)

* Add comment in haystack/schema.py on pydantic's dataclasses

* Move comment from get_documents_by_id into _convert_weaviate_result_to_document in weaviate.py

* Add comment on pylint silencing

* Fix bug introduced rest_api/controller/search.py

* Update Documentation & Code Style

* Add ADR about Pydantic dataclasses

* Update pydantic-dataclasses.md

* Add link to Pydantic docs on Dataclasses

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 20:16:14 +01:00
Sara Zan
4e940be859
Allow Linux CI to push changes to forks (#2182)
* Add explicit reference to repo name to allow CI to push code back

* Run test matrix only on tested code changes

* Isolate the bot to check if it works

* Clarify situation with a comment

* Simplify autoformat.yml

* Add code and docs check

* Add git pull to make sure to fetch changes if they were created

* Add cache to autoformat.yml too

* Add information on forks in CONTRIBUTING.md

* Add a not about code quality tools in CONTRIBUTING.md

* Add image file types to the CI exclusion list

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 16:28:55 +01:00
Sara Zan
b0d82e9fbb
Update url in POST /file-uploads (#2193) 2022-02-16 13:06:05 +01:00
Sara Zan
13a9bc6a99
Fix bug on REST API for queries on empty document stores (#2161)
* Handle no answers and no documents scenarios in '_process_request'

* Fix tests

* Change return type in '_process_request'

* Return to use dicts

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 11:10:59 +01:00
Sara Zan
00795bd71e
Add type check for meta on REST API & add tests (#2184)
* Add type check for meta & add tests

* Improve tests

* Handle properly the ValueError ad an HTTPException

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-16 10:32:22 +01:00
Sara Zan
be8f50c9e3
Add DELETE /feedback for testing and make the label's id generate server-side (#2159)
* Add DELETE /feedback for testing and make the ID generate server-side

* Make sure to delete only user generated labels

* Reduce fixture scope, was too broad

* Make test a bit more generic

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-14 11:43:26 +01:00
Sara Zan
40328a57b6
Introduce pylint & other improvements on the CI (#2130)
* Make mypy check also ui and rest_api, fix ui

* Remove explicit type packages from extras, mypy now downloads them

* Make pylint and mypy run on every file except tests

* Rename tasks

* Change cache key

* Fix mypy errors in rest_api

* Normalize python versions to avoid cache misses

* Add all exclusions to make pylint pass

* Run mypy on rest_api and ui as well

* test if installing the package really changes outcome

* Comment out installation of packages

* Experiment: randomize tests

* Add fallback installation steps on cache misses

* Remove randomization

* Add comment on cache

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-09 18:27:12 +01:00
Sara Zan
a59bca3661
Apply black formatting (#2115)
* Testing black on ui/

* Applying black on docstores

* Add latest docstring and tutorial changes

* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too

* Remove comments

* Relax constraints on pydoc-markdown

* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade

* Fix a couple of bugs

* Add a type: ignore that was missing somehow

* Give path to black

* Apply Black

* Apply Black

* Relocate a couple of type: ignore

* Update documentation

* Make Linux CI run after applying Black

* Triggering Black

* Apply Black

* Remove dependency, does not work well

* Remove manually double trailing commas

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 13:43:18 +01:00
Kristof Herrmann
7764b6992c
DC SDK - load pipeline from deepset cloud (#2013)
* initial load_from_dc

* typo

* adjusted api endpoint

* removed kwargs

* added _load_from_dict

* refactor pipeline loading mechanism

* renaming load_from_dc api

* renaming

* fixed errors

* fix comments and environment variable overrides

* Add latest docstring and tutorial changes

* fix outdated YAML examples

* Add latest docstring and tutorial changes

* Introduce readonly DCDocumentStore (without labels support) (#1991)

* minimal DCDocumentStore

* support filters

* implement get_documents_by_id

* handle not existing documents

* add docstrings

* auth added

* add tests

* generate docs

* Add latest docstring and tutorial changes

* add responses to dev dependencies

* fix tests

* support query() and quey_by_embedding()

* Add latest docstring and tutorial changes

* query tests added

* read api_key and api_endpoint from env

* Add latest docstring and tutorial changes

* support query() and quey_by_embedding()

* query tests added

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* support dynamic similarity and return_embedding values

* Add latest docstring and tutorial changes

* adjust KeywordDocumentStore description

* refactoring

* Add latest docstring and tutorial changes

* implement get_document_count and raise on all not implemented methods

* Add latest docstring and tutorial changes

* don't use abbreviation DC in comments and errors

* Add latest docstring and tutorial changes

* docstring added to KeywordDocumentStore

* Add latest docstring and tutorial changes

* enhanced api key set

* split tests into two parts

* change setup.py in order to work around build cache

* added link

* Add latest docstring and tutorial changes

* rename DCDocumentStore to DeepsetCloudDocumentStore

* Add latest docstring and tutorial changes

* remove dc.py

* reinsert link to docs

* fix imports

* Add latest docstring and tutorial changes

* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>

* introduce DeepsetCloudAdapter

* Add latest docstring and tutorial changes

* introduce DeepsetCloudClient

* Add latest docstring and tutorial changes

* use json api for pipeline_config

* indexing pipeline test added

* pseudo change to force cache eviction

* revert pseudo change to force cache eviction

* remove conftest duplicates

* minor formatting and docstring fixes

* fix tests when MOCK_DC=False

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-01-28 17:32:56 +01:00
Sara Zan
713771095b
Autogenerate OpenAPI specs file (#2047)
* Add docstrings to the REST API endpoint to have them included in the OpenAPI specs

* Attempt at make GitHub CI generate the OpenAPI specs

* Missing __init__.py was breaking rest_api import

* Add comment on dummy pipeline

* Create separate workflow file for the OpenAPI specs generation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
2022-01-27 13:06:01 +01:00
Julian Risch
5079c6847a
Convert doc embedding from ndarray to list of float for REST API (#1901)
* convert ndarray doc embedding to list of float

* check type of embedding of each doc individually

* Fix in case documents is None
2022-01-26 18:20:44 +01:00
Sara Zan
d470b9d0bd
Improve dependency management (#1994)
* Fist attempt at using setup.cfg for dependency management

* Trying the new package on the CI and in Docker too

* Add composite extras_require

* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports

* Fix bug on class import and rephrase error message

* Introduce typing for optional modules and add type: ignore in sparse.py

* Include importlib_metadata backport for py3.7

* Add colab group to extra_requires

* Fix pillow version

* Fix grpcio

* Separate out the crawler as another extra

* Make paths relative in rest_api and ui

* Update the test matrix in the CI

* Add try catch statements around the optional imports too to account for direct imports

* Never mix direct deps with self-references and add ES deps to the base install

* Refactor several paths in tests to make them insensitive to the execution path

* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix

* Wrap pdf conversion utils into safe_import

* Update some tutorials and rever Milvus1 as default for now, see #2067

* Fix mypy config


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 18:12:55 +01:00
Sara Zan
983b20f28d
Demo UI fix debug info (#1846)
* Fix debug info

* Make enter to run work better

* Reintroduce default question in the eval dataset

* Outputting valid json instead of a Python dict
2021-12-06 18:55:39 +01:00
Sara Zan
935689e630
Demo UI add env vars & other small fixes (#1828)
* Add more env vars to the streamlit ui

* Add some more questions to the random ones

* Relax a statuscode check and rename env vars

* Make query error message more descriptive

* Add log message

* Align docker-compose with and without GPU

* Typo in pipeline filename

* Remove prefix from var in docker_compose

* Align docker-compose.yml and add small sleep to the initialized poller to prevent spamming

* Fix the name of the dockerfile used to build the GPU image
2021-11-30 18:11:54 +01:00
Sara Zan
c29f960c47
Fix UI demo feedback (#1816)
* Fix the feedback function of the demo with a workaround

* Some docstring

* Update tests and rename methods in feedback.py

* Fix tests

* Remove operation_ids

* Add a couple of status code checks
2021-11-29 17:03:54 +01:00
Julian Risch
845905e418
ignore empty filters parameter (#1783)
* ignore empty filters parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-22 09:36:14 +01:00
Sara Zan
d81897535e
Public demo (#1747)
* Queries now run only when pressing RUN. File upload hidden. Question is not sent if the textbox is empty.

* Add latest docstring and tutorial changes

* Tidy up: remove needless state, add comments, fix minor bugs

* Had to add results to the status to avoid some bugs in eval mode

* Added 'credits'

* Add footers, update requirements, some random questions for the evaluation

* Add requested changes

* Temporary rollback the UI to the old GoT dataset

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-19 11:34:32 +01:00
Malte Pietsch
c0892717a0
Fix usage of filters in /query endpoint in REST API (#1774)
* WIP filter refactoring

* fix filter formatting

* remove inplace modification of filters
2021-11-18 18:13:03 +01:00
Malte Pietsch
b28dd823ef
Improve open api spec (#1700)
* improve open api spec

* move to automatic generation of better operationIDs
2021-11-11 09:40:58 +01:00
Julian Risch
c8df4763f8
disable file upload for InMemoryDocStore (#1677) 2021-11-01 10:39:13 +01:00
Sara Zan
13510aa753
Refactoring of the haystack package (#1624)
* Files moved, imports all broken

* Fix most imports and docstrings into

* Fix the paths to the modules in the API docs

* Add latest docstring and tutorial changes

* Add a few pipelines that were lost in the inports

* Fix a bunch of mypy warnings

* Add latest docstring and tutorial changes

* Create a file_classifier module

* Add docs for file_classifier

* Fixed most circular imports, now the REST API can start

* Add latest docstring and tutorial changes

* Tackling more mypy issues

* Reintroduce  from FARM and fix last mypy issues hopefully

* Re-enable old-style imports

* Fix some more import from the top-level  package in an attempt to sort out circular imports

* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests

* Change document_store into document_stores

* Update imports in tutorials

* Add latest docstring and tutorial changes

* Probably fixes summarizer tests

* Improve the old-style import allowing module imports (should work)

* Try to fix the docs

* Remove dedicated KnowledgeGraph page from autodocs

* Remove dedicated GraphRetriever page from autodocs

* Fix generate_docstrings.sh with an updated list of yaml files to look for

* Fix some more modules in the docs

* Fix the document stores docs too

* Fix a small issue on Tutorial14

* Add latest docstring and tutorial changes

* Add deprecation warning to old-style imports

* Remove stray folder and import Dict into dense.py

* Change import path for MLFlowLogger

* Add old loggers path to the import path aliases

* Fix debug output of convert_ipynb.py

* Fix circular import on BaseRetriever

* Missed one merge block

* re-run tutorial 5

* Fix imports in tutorial 5

* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base

* Add latest docstring and tutorial changes

* Fix typo in utils __init__

* Fix a few more imports

* Fix benchmarks too

* New-style imports in test_knowledge_graph

* Rollback setup.py

* Rollback squad_to_dpr too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
Sara Zan
96c05c34e4
Pipeline node names validation (#1601)
* Add node names validation

* Add tests

* Improve test and test that params exists before validating

* Fix the REST API

* Use minilm-uncased-squad2 instead of roberta-base-squad2

* Use roberta model for test_pipeline.yaml

* Turn off TOKENIZERS_PARALLELISM in generator tests (#1605)

* Account for non-targeted parameters

* Restore previous parameters handling in the rest api

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2021-10-19 15:22:44 +02:00
Malte Pietsch
3d58e81b5e
Switch from dataclass to pydantic dataclass & Fix Swagger API Docs (#1598)
* test pydantic dataclasses

* Add latest docstring and tutorial changes

* enable pydantic mypy plugin

* switch to pydentic dataclasses and implement custom to_json from_json

* clean up

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-18 14:38:14 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label (#1398)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* renam label.question -> label.query. Allow sorting of Answers.

* WIP primitives

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclass-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* keep meta data of answers in reader.run()

* fix meta handling

* adress review feedback

* Add latest docstring and tutorial changes

* make document=None for open domain labels

* add import

* fix print utils

* fix rest api

* adress review feedback

* Add latest docstring and tutorial changes

* fix mypy

Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Sara Zan
6354528336
Add /documents/get_by_filters endpoint (#1580)
* Add endpoint to get documents by filter

* Add test for /documents/get_by_filter and extend the delete documents test

* Add rest_api/file-upload to .gitignore

* Make sure the document store is empty for each test

* Improve docstrings of delete_documents_by_filters and get_documents_by_filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 10:53:54 +02:00
Sara Zan
3539e6b041
Fix circular import in the REST API (#1556)
* Fix circular import in the REST API

* remove unneeded import in test

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-04 21:18:23 +02:00
Sara Zan
af4a44fcbd
WIP Add rest api endpoint to delete documents by filter (#1546)
* Add rest api endpoint to delete documents by filter.

* Remove parametrization of rest api tests

* Make the paths in rest_api/config.py absolute

* Fix path to pipelines.yaml

* Restructuring test_rest_api.py to be able to test only my endpoint (and to make the suite more structured)

* Convert DELETE /documents into POST /documents/delete_by_filters

Co-authored by:  sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>
2021-10-04 11:21:00 +02:00
Sara Zan
2de5385ac2
Add "API is loading" message in the UI (#1493)
* Create the /initialized endpoint

* Now showing an error message if the connection fails, and a 'Haystack is loading' message while workers are starting up

* Improve the appearance of the various messages

* Newline at the end of file
2021-09-27 16:40:25 +02:00
oryx1729
3deff26b60
Fix Search REST API when filters are None (#1431) 2021-09-10 14:47:34 +02:00
oryx1729
1f859694f1
Add support for Dense Retrievers in REST API Indexing Pipeline (#1430) 2021-09-10 11:53:32 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00
Malte Pietsch
2a226daac4
Add simple docs2answer node to allow FAQ style QA / Doc search in API (#1361)
* minimal docs2answer node

* enable logs again
2021-08-20 17:01:55 +02:00
Ikram Ali
29e140196b
[pipeline] Allow for batch indexing when using Pipelines fix #1168 (#1231)
* [pipeline] Allow for batch indexing when using Pipelines fix #1168

* [pipeline] Test case fixed fix #1168

* [file_converter] Path.suffix updated #1168

* [file_converter] meta can be one of these three cases:
                 A single dict that is applied to all files
                 One dict for each file being converted
                 None #1168

* [file_converter] mypy error fixed.

* [file_converter] mypy error fixed.

* [rest_api] batch file upload introduced in indexing API.

* [test_case] Test_api file upload parameter name updated.

* [ui] Streamlit file upload parameter updated.
2021-06-30 14:13:46 +02:00
Guillim
73a4f9825a
Add env var CONCURRENT_REQUEST_PER_WORKER (#1235)
* we create an env var `CONCURRENT_REQUEST_PER_WORKER` following your naming convention, (I came a few commit backwards to find the original name)

* default to 4
2021-06-29 07:44:25 +02:00
Malte Pietsch
2c964db62d
Relax typing for meta data in REST API (#1224) 2021-06-24 12:34:42 +02:00
Malte Pietsch
2caeea000e
Small UI and REST API fixes (#1223)
* small fixes

* change default question
2021-06-24 09:53:08 +02:00
oryx1729
afee4f36ce
Add scaffold for defining custom components for Pipelines (#1205) 2021-06-23 12:01:54 +02:00
Bhadresh Savani
37a72d2f45
Add File Upload Functionality in UI (#995) 2021-04-30 10:46:30 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines (#922) 2021-04-07 17:53:32 +02:00
Malte Pietsch
0eaae3c0dd
Fix UI when API returns fewer answers than expected (#828)
* fix ui for few answers from api. add top_k_per_sample env

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-15 14:27:17 +01:00
Malte Pietsch
6798192d40
Add API endpoint to export accuracy metrics from user feedback + created_at timestamp (#803)
* WIP feedback metrics

* fix filters and zero division

* add created_at and model_name fields to labels

* add created_at value

* remove debug log level

* fix attribute init

* move timestamp creation down to docstore / db level

* fix import
2021-02-15 10:48:59 +01:00
Tanay Soni
f95b70df38
Fix file upload API (#808) 2021-02-05 12:17:38 +01:00
Malte Pietsch
e9b5439b00
Rename label id field for elastic & add UPDATE_EXISTING_DOCUMENTS to API config (#728)
* rename label id field for elastic

* add UPDATE_EXISTING_DOCUMENTS param to API config
2021-01-12 13:00:56 +01:00
Malte Pietsch
fcc052b554
Pass custom label index name in api config (#724) 2021-01-11 12:24:09 +01:00