haystack/test/test_weaviate.py
Sara Zan 11cf94a965
Pipeline's YAML: syntax validation (#2226)
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes

* Make error composition work properly

* Clarify typing

* Help mypy a bit more

* Update Documentation & Code Style

* Enable autogenerated docs for Milvus1 and 2 separately

* Revert "Enable autogenerated docs for Milvus1 and 2 separately"

This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.

* Update Documentation & Code Style

* Re-enable 'additionalProperties: False'

* Add pipeline.type to JSON Schema, was somehow forgotten

* Disable additionalProperties on the pipeline properties too

* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)

* Call super in PipelineValidationError

* Improve _read_pipeline_config_from_yaml's error handling

* Fix generate_json_schema.py to include document stores

* Fix json schemas (retro-fix 1.1.0 again)

* Improve custom errors printing, add link to docs

* Add function in BaseComponent to list its subclasses in a module

* Make some document stores base classes abstract

* Add marker 'integration' in pytest flags

* Slightly improve validation of pipelines at load

* Adding tests for YAML loading and validation

* Make custom_query Optional for validation issues

* Fix bug in _read_pipeline_config_from_yaml

* Improve error handling in BasePipeline and Pipeline and add DAG check

* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)

* Simplify errors slightly

* Add some YAML validation tests

* Remove load_from_config from BasePipeline, it was never used anyway

* Improve tests

* Include json-schemas in package

* Fix conftest imports

* Make BasePipeline abstract

* Improve mocking by making the test independent from the YAML version

* Add exportable_to_yaml decorator to forget about set_config on mock nodes

* Fix mypy errors

* Comment out one monkeypatch

* Fix typing again

* Improve error message for validation

* Add required properties to pipelines

* Fix YAML version for REST API YAMLs to 1.2.0

* Fix load_from_yaml call in load_from_deepset_cloud

* fix HaystackError.__getattr__

* Add super().__init__() in most nodes and docstore, comment set_config

* Remove type from REST API pipelines

* Remove useless init from doc2answers

* Call super in Seq2SeqGenerator

* Typo in deepsetcloud.py

* Fix rest api indexing error mismatch and mock version of JSON schema in all tests

* Working on pipeline tests

* Improve errors printing slightly

* Add back test_pipeline.yaml

* _json_schema.py supports different versions with identical schemas

* Add type to 0.7 schema for backwards compatibility

* Fix small bug in _json_schema.py

* Try alternative to generate json schemas on the CI

* Update Documentation & Code Style

* Make linux CI match autoformat CI

* Fix super-init-not-called

* Accidentally committed file

* Update Documentation & Code Style

* fix test_summarizer_translation.py's import

* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args

* Fix json schema for ray tests too

* Update Documentation & Code Style

* Reintroduce validation

* Use unstable version in tests and rest api

* Make unstable support the latest versions

* Update Documentation & Code Style

* Remove needless fixture

* Make type in pipeline optional in the strings validation

* Fix schemas

* Fix string validation for pipeline type

* Improve validate_config_strings

* Remove type from test pipelines

* Update Documentation & Code Style

* Fix test_pipeline

* Removing more type from pipelines

* Temporary CI patch

* Fix issue with exportable_to_yaml never invoking the wrapped init

* rm stray file

* pipeline tests are green again

* Linux CI now needs .[all] to generate the schema

* Bugfixes, pipeline tests seem to be green

* Typo in version after merge

* Implement missing methods in Weaviate

* Trying to avoid FAISS tests from running in the Milvus1 test suite

* Fix some stray test paths and faiss index dumping

* Fix pytest markers list

* Temporarily disable cache to be able to see tests failures

* Fix pyproject.toml syntax

* Use only tmp_path

* Fix preprocessor signature after merge

* Fix faiss bug

* Fix Ray test

* Fix documentation issue by removing quotes from faiss type

* Update Documentation & Code Style

* use document properly in preprocessor tests

* Update Documentation & Code Style

* make preprocessor capable of handling documents

* import document

* Revert support for documents in preprocessor, do later

* Fix bug in _json_schema.py that was breaking validation

* re-enable cache

* Update Documentation & Code Style

* Simplify calling _json_schema.py from the CI

* Remove redundant ABC inheritance

* Ensure exportable_to_yaml works only on implementations

* Rename subclass to class_ in Meta

* Make run() and get_config() abstract in BasePipeline

* Revert unintended change in preprocessor

* Move outgoing_edges_input_node check inside try block

* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX

* Add check for a RecursionError on validate_config_strings

* Address usages of _pipeline_config in data silo and elasticsearch

* Rename _pipeline_config into _init_parameters

* Fix pytest marker and remove unused imports

* Remove most redundant ABCs

* Rename _init_parameters into _component_configuration

* Remove set_config and type from _component_configuration's dict

* Remove last instances of set_config and replace with super().__init__()

* Implement __init_subclass__ approach

* Simplify checks on the existence of _component_configuration

* Fix faiss issue

* Dynamic generation of node schemas & weed out old schemas

* Add debatable test

* Add docstring to debatable test

* Positive diff between schemas implemented

* Improve diff printing

* Rename REST API YAML files to trigger IDE validation

* Fix typing issues

* Fix more typing

* Typo in YAML filename

* Remove needless type:ignore

* Add tests

* Fix tests & validation feedback for accessory classes in custom nodes

* Refactor RAGeneratorType out

* Fix broken import in conftest

* Improve source error handling

* Remove unused import in test_eval.py breaking tests

* Fix changed error message in tests matches too

* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions

* Fix path to generate_openapi_specs.py in autoformat.yml

* Update Documentation & Code Style

* Add test for FAISSDocumentStore-like situations (superclass with init params)

* Update Documentation & Code Style

* Fix indentation

* Remove commented set_config

* Store model_name_or_path in FARMReader to use in DistillationDataSilo

* Rename _component_configuration into _component_config

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-15 11:17:26 +01:00

108 lines
3.8 KiB
Python

import numpy as np
import pytest
from haystack.schema import Document
from .conftest import get_document_store
import uuid


embedding_dim = 768


def get_uuid():
    return str(uuid.uuid4())


# The first document's id is intentionally not a valid UUID; the others carry random embeddings.
DOCUMENTS = [
    {"content": "text1", "id": "not a correct uuid", "key": "a"},
    {"content": "text2", "id": get_uuid(), "key": "b", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
    {"content": "text3", "id": get_uuid(), "key": "b", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
    {"content": "text4", "id": get_uuid(), "key": "b", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
    {"content": "text5", "id": get_uuid(), "key": "b", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
]

DOCUMENTS_XS = [
    # current "dict" format for a document
    {
        "content": "My name is Carla and I live in Berlin",
        "id": get_uuid(),
        "meta": {"metafield": "test1", "name": "filename1"},
        "embedding": np.random.rand(embedding_dim).astype(np.float32),
    },
    # meta_field at the top level for backward compatibility
    {
        "content": "My name is Paul and I live in New York",
        "id": get_uuid(),
        "metafield": "test2",
        "name": "filename2",
        "embedding": np.random.rand(embedding_dim).astype(np.float32),
    },
    # Document object for a doc
    Document(
        content="My name is Christelle and I live in Paris",
        id=get_uuid(),
        meta={"metafield": "test3", "name": "filename3"},
        embedding=np.random.rand(embedding_dim).astype(np.float32),
    ),
]


@pytest.fixture(params=["weaviate"])
def document_store_with_docs(request, tmp_path):
    document_store = get_document_store(request.param, tmp_path=tmp_path)
    document_store.write_documents(DOCUMENTS_XS)
    yield document_store
    # Teardown: remove the documents once the test has finished
    document_store.delete_documents()


@pytest.fixture(params=["weaviate"])
def document_store(request, tmp_path):
    document_store = get_document_store(request.param, tmp_path=tmp_path)
    yield document_store
    # Teardown: remove the documents once the test has finished
    document_store.delete_documents()


@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
@pytest.mark.parametrize("batch_size", [2])
def test_weaviate_write_docs(document_store, batch_size):
    # Write in small batches
    for i in range(0, len(DOCUMENTS), batch_size):
        document_store.write_documents(DOCUMENTS[i : i + batch_size])

    documents_indexed = document_store.get_all_documents()
    assert len(documents_indexed) == len(DOCUMENTS)

    documents_indexed = document_store.get_all_documents(batch_size=batch_size)
    assert len(documents_indexed) == len(DOCUMENTS)


@pytest.mark.weaviate
@pytest.mark.parametrize("document_store_with_docs", ["weaviate"], indirect=True)
def test_query_by_embedding(document_store_with_docs):
    # Without top_k or filters, all three documents written by the fixture are returned
    docs = document_store_with_docs.query_by_embedding(np.random.rand(embedding_dim).astype(np.float32))
    assert len(docs) == 3

    # top_k caps the number of results
    docs = document_store_with_docs.query_by_embedding(np.random.rand(embedding_dim).astype(np.float32), top_k=1)
    assert len(docs) == 1

    # A filter on the "name" field narrows the results to the single matching document
    docs = document_store_with_docs.query_by_embedding(
        np.random.rand(embedding_dim).astype(np.float32), filters={"name": ["filename2"]}
    )
    assert len(docs) == 1


@pytest.mark.weaviate
@pytest.mark.parametrize("document_store_with_docs", ["weaviate"], indirect=True)
def test_query(document_store_with_docs):
    query_text = "My name is Carla and I live in Berlin"
    # Passing a query string is expected to raise
    with pytest.raises(Exception):
        docs = document_store_with_docs.query(query_text)

    docs = document_store_with_docs.query(filters={"name": ["filename2"]})
    assert len(docs) == 1

    docs = document_store_with_docs.query(filters={"content": [query_text.lower()]})
    assert len(docs) == 1

    docs = document_store_with_docs.query(filters={"content": ["live"]})
    assert len(docs) == 3
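

The batched write loop in test_weaviate_write_docs can be tried without a running Weaviate instance. The following is a minimal sketch that assumes a hypothetical in-memory stand-in: `InMemoryStore` and `write_in_batches` are illustrative names and are not part of Haystack. It shows only how slicing with a step of `batch_size` covers every document, including a final short batch.

```python
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a document store; not a Haystack class."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def write_documents(self, documents: List[dict]) -> None:
        # Index each document by its id; a repeated id overwrites the earlier entry.
        for doc in documents:
            self._docs[doc["id"]] = doc

    def get_all_documents(self) -> List[dict]:
        return list(self._docs.values())


def write_in_batches(store: InMemoryStore, documents: List[dict], batch_size: int) -> int:
    """Write documents in slices of batch_size and return how many batches were written."""
    batches = 0
    for i in range(0, len(documents), batch_size):
        # The last slice may be shorter than batch_size; slicing handles that safely.
        store.write_documents(documents[i : i + batch_size])
        batches += 1
    return batches


docs = [{"id": str(n), "content": f"text{n}"} for n in range(5)]
store = InMemoryStore()
num_batches = write_in_batches(store, docs, batch_size=2)
```

With five documents and a batch size of 2, the loop writes slices of 2, 2, and 1 documents, which mirrors the sizes used in the test above.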