Make weaviate more compliant with other doc stores (UUIDs and dummy embeddings) (#1656)

* create uuid and dummy embedding in weaviate doc store

* handle and test for duplicate non-uuid-formatted ids in weaviate

* add uuid and dummy embedding to doc strings

* Add latest docstring and tutorial changes

* Upgrade weaviate

* Include weaviate in common doc store test cases

* Add latest docstring and tutorial changes

* Exclude weaviate doc store from eval tests

* Incorporate index name in uuid generation

* Ignore mypy error

* Fix typo

* Restore DOCS without uuid and embeddings generated by weaviate

* Supply docs for retriever tests as fixture

* Limit scope of fixture to function instead of session

* Add comments

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Julian Risch 2021-11-04 09:27:12 +01:00 committed by GitHub
parent 4ca1937775
commit 892ce4a760
12 changed files with 267 additions and 410 deletions

View File

@ -78,7 +78,7 @@ jobs:
run: docker run -d -p 19530:19530 -p 19121:19121 milvusdb/milvus:1.1.0-cpu-d050721-5e559c
- name: Run Weaviate
run: docker run -d -p 8080:8080 --name haystack_test_weaviate --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.0
run: docker run -d -p 8080:8080 --name haystack_test_weaviate --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.2
- name: Run GraphDB
run: docker run -d -p 7200:7200 --name haystack_test_graphdb deepset/graphdb-free:9.4.1-adoptopenjdk11

View File

@ -33,7 +33,7 @@ You can launch them like this:
```
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms128m -Xmx128m" elasticsearch:7.9.2
docker run -d -p 19530:19530 -p 19121:19121 milvusdb/milvus:1.1.0-cpu-d050721-5e559c
docker run -d -p 8080:8080 --name haystack_test_weaviate --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.0
docker run -d -p 8080:8080 --name haystack_test_weaviate --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.2
docker run -d -p 7200:7200 --name haystack_test_graphdb deepset/graphdb-free:9.4.1-adoptopenjdk11
docker run -d -p 9998:9998 -e "TIKA_CHILD_JAVA_OPTS=-JXms128m" -e "TIKA_CHILD_JAVA_OPTS=-JXmx128m" apache/tika:1.24.1
```

View File

@ -1677,7 +1677,8 @@ Weaviate is a cloud-native, modular, real-time vector search engine built to sca
Some of the key differences in contrast to FAISS & Milvus:
1. Stores everything in one place: documents, metadata and vectors - so less network overhead when scaling this up
2. Allows combination of vector search and scalar filtering, i.e. you can filter for a certain tag and do dense retrieval on that subset
3. Has less variety of ANN algorithms, as of now only HNSW.
3. Has less variety of ANN algorithms, as of now only HNSW.
4. Requires document ids to be in uuid format. If wrongly formatted ids are provided at indexing time, they will be replaced with uuids automatically.
The Weaviate Python client is used to connect to the server; more details are here:
https://weaviate-python-client.readthedocs.io/en/docs/weaviate.html
@ -1735,7 +1736,7 @@ The current implementation is not supporting the storage of labels, so you canno
| get_document_by_id(id: str, index: Optional[str] = None) -> Optional[Document]
```
Fetch a document by specifying its text id string
Fetch a document by specifying its uuid string
<a name="weaviate.WeaviateDocumentStore.get_documents_by_id"></a>
#### get\_documents\_by\_id
@ -1744,7 +1745,7 @@ Fetch a document by specifying its text id string
| get_documents_by_id(ids: List[str], index: Optional[str] = None, batch_size: int = 10_000) -> List[Document]
```
Fetch documents by specifying a list of text id strings.
Fetch documents by specifying a list of uuid strings.
<a name="weaviate.WeaviateDocumentStore.write_documents"></a>
#### write\_documents
@ -1757,8 +1758,7 @@ Add new documents to the DocumentStore.
**Arguments**:
- `documents`: List of `Dicts` or List of `Documents`. Passing an Embedding/Vector is mandatory in case weaviate is not
configured with a module. If a module is configured, the embedding is automatically generated by Weaviate.
- `documents`: List of `Dicts` or List of `Documents`. A dummy embedding vector for each document is automatically generated if it is not provided. The document id needs to be in uuid format. Otherwise, a correctly formatted uuid will be automatically generated based on the provided id.
- `index`: index name for storing the docs and metadata
- `batch_size`: When working with large numbers of documents, batching can help reduce memory footprint.
- `duplicate_documents`: Handle duplicate documents based on parameter options.
@ -1785,6 +1785,15 @@ None
Update the metadata dictionary of a document by specifying its string id.
<a name="weaviate.WeaviateDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(filters: Optional[Dict[str, List[str]]] = None, index: Optional[str] = None) -> int
```
Return the number of embeddings in the document store, which is the same as the number of documents, since every document has a default embedding.
<a name="weaviate.WeaviateDocumentStore.get_document_count"></a>
#### get\_document\_count
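The deterministic uuid handling documented above can be sketched in isolation. The snippet below is a minimal illustration mirroring the documented behavior; `sanitize_id` is a hypothetical standalone helper, not the document store's actual API:

```python
import hashlib
import re
import uuid

# Pattern for a well-formed uuid, as used by the document store.
UUID_PATTERN = re.compile(r'^[\da-f]{8}-([\da-f]{4}-){3}[\da-f]{12}$', re.IGNORECASE)

def sanitize_id(doc_id: str, index: str) -> str:
    """Return doc_id unchanged if it is already a uuid; otherwise derive a
    deterministic uuid from the id and the index name."""
    if UUID_PATTERN.match(doc_id):
        return doc_id
    # sha256 yields 64 hex chars; every second char gives the 32 a uuid needs.
    hashed = hashlib.sha256((doc_id + index).encode("utf-8"))
    return str(uuid.UUID(hashed.hexdigest()[::2]))

# The same non-uuid id always maps to the same uuid within one index ...
assert sanitize_id("doc_1", "Document") == sanitize_id("doc_1", "Document")
# ... but to a different uuid in another index.
assert sanitize_id("doc_1", "Document") != sanitize_id("doc_1", "OtherIndex")
```

Because the hash input includes the index name, two documents with the same provided id land on the same uuid only within the same index, which is what makes duplicate detection work per index.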

View File

@ -130,7 +130,7 @@ document_store = SQLDocumentStore()
The `WeaviateDocumentStore` requires a running Weaviate Server.
You can start a basic instance like this (see the [Weaviate docs](https://www.semi.technology/developers/weaviate/current/) for details):
```
docker run -d -p 8080:8080 --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.0
docker run -d -p 8080:8080 --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.2
```
Afterwards, you can use it in Haystack:

View File

@ -1,3 +1,6 @@
import hashlib
import re
import uuid
from typing import Dict, Generator, List, Optional, Union
import logging
@ -13,6 +16,7 @@ from weaviate import ObjectsBatchRequest
logger = logging.getLogger(__name__)
UUID_PATTERN = re.compile(r'^[\da-f]{8}-([\da-f]{4}-){3}[\da-f]{12}$', re.IGNORECASE)
class WeaviateDocumentStore(BaseDocumentStore):
@ -24,7 +28,8 @@ class WeaviateDocumentStore(BaseDocumentStore):
Some of the key differences in contrast to FAISS & Milvus:
1. Stores everything in one place: documents, metadata and vectors - so less network overhead when scaling this up
2. Allows combination of vector search and scalar filtering, i.e. you can filter for a certain tag and do dense retrieval on that subset
3. Has less variety of ANN algorithms, as of now only HNSW.
3. Has less variety of ANN algorithms, as of now only HNSW.
4. Requires document ids to be in uuid format. If wrongly formatted ids are provided at indexing time, they will be replaced with uuids automatically.
The Weaviate Python client is used to connect to the server; more details are here:
https://weaviate-python-client.readthedocs.io/en/docs/weaviate.html
@ -120,7 +125,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
f"Initial connection to Weaviate failed. Make sure you run Weaviate instance "
f"at `{weaviate_url}` and that it has finished the initial ramp up (can take > 30s)."
)
self.index = index
self.index = self._sanitize_index_name(index)
self.embedding_dim = embedding_dim
self.content_field = content_field
self.name_field = name_field
@ -133,6 +138,15 @@ class WeaviateDocumentStore(BaseDocumentStore):
self.duplicate_documents = duplicate_documents
self._create_schema_and_index_if_not_exist(self.index)
self.uuid_format_warning_raised = False
def _sanitize_index_name(self, index: Optional[str]) -> Optional[str]:
if index is None:
return None
elif "_" in index:
return ''.join(x.capitalize() for x in index.split('_'))
else:
return index[0].upper() + index[1:]
def _create_schema_and_index_if_not_exist(
self,
@ -142,7 +156,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
Create a new index (schema/class in Weaviate) for storing documents in case an
index (schema) with that name doesn't already exist.
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
if self.custom_schema:
schema = self.custom_schema
@ -239,7 +253,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
}
def get_document_by_id(self, id: str, index: Optional[str] = None) -> Optional[Document]:
"""Fetch a document by specifying its text id string"""
"""Fetch a document by specifying its uuid string"""
# Sample result dict from a get method
'''{'class': 'Document',
'creationTimeUnix': 1621075584724,
@ -248,8 +262,11 @@ class WeaviateDocumentStore(BaseDocumentStore):
'name': 'name_5',
'content': 'text_5'},
'vector': []}'''
index = index or self.index
index = self._sanitize_index_name(index) or self.index
document = None
id = self._sanitize_id(id=id, index=index)
result = self.weaviate_client.data_object.get_by_id(id, with_vector=True)
if result:
document = self._convert_weaviate_result_to_document(result, return_embedding=True)
@ -258,23 +275,40 @@ class WeaviateDocumentStore(BaseDocumentStore):
def get_documents_by_id(self, ids: List[str], index: Optional[str] = None,
batch_size: int = 10_000) -> List[Document]:
"""
Fetch documents by specifying a list of text id strings.
Fetch documents by specifying a list of uuid strings.
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
documents = []
#TODO: better implementation with multiple where filters instead of chatty call below?
for id in ids:
id = self._sanitize_id(id=id, index=index)
result = self.weaviate_client.data_object.get_by_id(id, with_vector=True)
if result:
document = self._convert_weaviate_result_to_document(result, return_embedding=True)
documents.append(document)
return documents
def _sanitize_id(self, id: str, index: Optional[str] = None) -> str:
"""
Generate a valid uuid if the provided id is not in uuid format.
Two documents with the same provided id and index name will get the same uuid.
"""
index = self._sanitize_index_name(index) or self.index
if not UUID_PATTERN.match(id):
hashed_id = hashlib.sha256((id+index).encode('utf-8')) #type: ignore
generated_uuid = str(uuid.UUID(hashed_id.hexdigest()[::2]))
if not self.uuid_format_warning_raised:
logger.warning(
f"Document id {id} is not in uuid format. Such ids will be replaced by uuids, in this case {generated_uuid}.")
self.uuid_format_warning_raised = True
id = generated_uuid
return id
def _get_current_properties(self, index: Optional[str] = None) -> List[str]:
"""
Get all the existing properties in the schema.
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
cur_properties = []
for class_item in self.weaviate_client.schema.get()['classes']:
if class_item['class'] == index:
@ -309,7 +343,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
"""
Updates the schema with a new property.
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
property_dict = {
"dataType": [
"string"
@ -331,8 +365,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
"""
Add new documents to the DocumentStore.
:param documents: List of `Dicts` or List of `Documents`. Passing an Embedding/Vector is mandatory in case weaviate is not
configured with a module. If a module is configured, the embedding is automatically generated by Weaviate.
:param documents: List of `Dicts` or List of `Documents`. A dummy embedding vector for each document is automatically generated if it is not provided. The document id needs to be in uuid format. Otherwise, a correctly formatted uuid will be automatically generated based on the provided id.
:param index: index name for storing the docs and metadata
:param batch_size: When working with large numbers of documents, batching can help reduce memory footprint.
:param duplicate_documents: Handle duplicate documents based on parameter options.
@ -344,7 +377,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
:raises DuplicateDocumentError: Exception triggered on duplicate document
:return: None
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
self._create_schema_and_index_if_not_exist(index)
field_map = self._create_document_field_map()
@ -361,9 +394,30 @@ class WeaviateDocumentStore(BaseDocumentStore):
current_properties = self._get_current_properties(index)
document_objects = [Document.from_dict(d, field_map=field_map) if isinstance(d, dict) else d for d in documents]
# Weaviate has strict requirements for what ids can be used.
# We check the id format and sanitize it if no uuid was provided.
# Duplicate document ids will be mapped to the same generated uuid.
for do in document_objects:
do.id = self._sanitize_id(id=do.id, index=index)
document_objects = self._handle_duplicate_documents(documents=document_objects,
index=index,
duplicate_documents=duplicate_documents)
# Weaviate requires that documents contain a vector in order to be indexed. These lines add a
# dummy vector so that indexing can still happen
dummy_embed_warning_raised = False
for do in document_objects:
if do.embedding is None:
dummy_embedding = np.random.rand(self.embedding_dim).astype(np.float32)
do.embedding = dummy_embedding
if not dummy_embed_warning_raised:
logger.warning("No embedding found in Document object being written into Weaviate. A dummy "
"embedding is being supplied so that indexing can still take place. This "
"embedding should be overwritten in order to perform vector similarity searches.")
dummy_embed_warning_raised = True
batched_documents = get_batches_from_generator(document_objects, batch_size)
with tqdm(total=len(document_objects), disable=not self.progress_bar) as progress_bar:
for document_batch in batched_documents:
@ -417,11 +471,17 @@ class WeaviateDocumentStore(BaseDocumentStore):
"""
self.weaviate_client.data_object.update(meta, class_name=self.index, uuid=id)
def get_embedding_count(self, filters: Optional[Dict[str, List[str]]] = None, index: Optional[str] = None) -> int:
"""
Return the number of embeddings in the document store, which is the same as the number of documents, since every document has a default embedding.
"""
return self.get_document_count(filters=filters, index=index)
def get_document_count(self, filters: Optional[Dict[str, List[str]]] = None, index: Optional[str] = None) -> int:
"""
Return the number of documents in the document store.
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
doc_count = 0
if filters:
filter_dict = self._build_filter_clause(filters=filters)
@ -457,7 +517,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
:param return_embedding: Whether to return the document embeddings.
:param batch_size: When working with large numbers of documents, batching can help reduce memory footprint.
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
result = self.get_all_documents_generator(
index=index, filters=filters, return_embedding=return_embedding, batch_size=batch_size
)
@ -474,7 +534,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
"""
Return all documents in a specific index in the document store
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
# Build the properties to retrieve from Weaviate
properties = self._get_current_properties(index)
@ -516,8 +576,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
:param batch_size: When working with large numbers of documents, batching can help reduce memory footprint.
"""
if index is None:
index = self.index
index = self._sanitize_index_name(index) or self.index
if return_embedding is None:
return_embedding = self.return_embedding
@ -546,7 +605,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
https://www.semi.technology/developers/weaviate/current/graphql-references/filters.html
:param index: The name of the index in the DocumentStore from which to retrieve documents
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
# Build the properties to retrieve from Weaviate
properties = self._get_current_properties(index)
@ -597,7 +656,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
"""
if return_embedding is None:
return_embedding = self.return_embedding
index = index or self.index
index = self._sanitize_index_name(index) or self.index
# Build the properties to retrieve from Weaviate
properties = self._get_current_properties(index)
@ -658,8 +717,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
:param batch_size: When working with large numbers of documents, batching can help reduce memory footprint.
:return: None
"""
if index is None:
index = self.index
index = self._sanitize_index_name(index) or self.index
if not self.embedding_field:
raise RuntimeError("Specify the arg `embedding_field` when initializing WeaviateDocumentStore()")
@ -718,7 +776,11 @@ class WeaviateDocumentStore(BaseDocumentStore):
have their ID in the list).
:return: None
"""
index = index or self.index
index = self._sanitize_index_name(index) or self.index
# create index if it doesn't exist yet
self._create_schema_and_index_if_not_exist(index)
if not filters and not ids:
self.weaviate_client.schema.delete_class(index)
self._create_schema_and_index_if_not_exist(index)
@ -728,7 +790,3 @@ class WeaviateDocumentStore(BaseDocumentStore):
docs_to_delete = [doc for doc in docs_to_delete if doc.id in ids]
for doc in docs_to_delete:
self.weaviate_client.data_object.delete(doc.id)
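The dummy-embedding fallback added to the write path above can be sketched on its own. This is an illustrative helper under the assumption of a 768-dimensional embedding space; `ensure_embedding` is a hypothetical name, not part of the store's API:

```python
import numpy as np

def ensure_embedding(embedding, embedding_dim=768):
    """Return the given vector unchanged, or a random float32 placeholder so
    that Weaviate can index the document. The placeholder is meant to be
    overwritten later (e.g. via update_embeddings) before similarity search."""
    if embedding is None:
        return np.random.rand(embedding_dim).astype(np.float32)
    return embedding

vec = ensure_embedding(None)
assert vec.shape == (768,) and vec.dtype == np.float32
```

The design choice mirrors the diff: Weaviate refuses documents without a vector, so supplying a placeholder keeps indexing working, at the cost that `return_embedding=False` semantics differ from other stores (a dummy vector exists instead of `None`).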

View File

@ -9,6 +9,7 @@ from haystack.utils.doc_store import (
launch_milvus,
launch_open_distro_es,
launch_opensearch,
launch_weaviate,
stop_opensearch,
stop_service,
)

View File

@ -50,6 +50,20 @@ def launch_opensearch(sleep=15):
time.sleep(sleep)
def launch_weaviate(sleep=15):
# Start a Weaviate server via Docker
logger.info("Starting Weaviate ...")
status = subprocess.run(
["docker run -d -p 8080:8080 --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.2"], shell=True
)
if status.returncode:
logger.warning("Tried to start Weaviate through Docker but this failed. "
"It is likely that there is already an existing Weaviate instance running. ")
else:
time.sleep(sleep)
def stop_opensearch():
logger.info("Stopping OpenSearch...")
status = subprocess.run(['docker stop opensearch'], shell=True)

View File

@ -57,13 +57,6 @@ def pytest_generate_tests(metafunc):
break
# for all others that don't have explicit parametrization, we add the ones from the CLI arg
if 'document_store' in metafunc.fixturenames and not found_mark_parametrize_document_store:
# TODO: Remove the following if-condition once weaviate is fully compliant
# Background: Currently, weaviate is not fully compliant (e.g. "_" in "meta_field", problems with uuids ...)
# Therefore, we have separate tests in test_weaviate.py and we don't want to parametrize our generic
# tests (e.g. in test_document_store.py) with the weaviate fixture. However, we still need the weaviate option
# in the CLI arg as we want to skip test_weaviate.py if weaviate is not selected from CLI
if "weaviate" in selected_doc_stores:
selected_doc_stores.remove("weaviate")
metafunc.parametrize("document_store", selected_doc_stores, indirect=True)
@ -172,7 +165,7 @@ def weaviate_fixture():
shell=True
)
status = subprocess.run(
['docker run -d --name haystack_test_weaviate -p 8080:8080 semitechnologies/weaviate:1.4.0'],
['docker run -d --name haystack_test_weaviate -p 8080:8080 semitechnologies/weaviate:1.7.2'],
shell=True
)
if status.returncode:
@ -447,8 +440,7 @@ def get_retriever(retriever_type, document_store):
return retriever
@pytest.fixture(params=["elasticsearch", "faiss", "memory", "milvus"])
# @pytest.fixture(params=["memory"])
@pytest.fixture(params=["elasticsearch", "faiss", "memory", "milvus", "weaviate"])
def document_store_with_docs(request, test_docs_xs):
document_store = get_document_store(request.param)
document_store.write_documents(test_docs_xs)
@ -516,7 +508,7 @@ def get_document_store(document_store_type, embedding_dim=768, embedding_field="
elif document_store_type == "weaviate":
document_store = WeaviateDocumentStore(
weaviate_url="http://localhost:8080",
index=index.replace('_','').title(),
index=index,
similarity=similarity
)
document_store.weaviate_client.schema.delete_all()
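The conftest change above passes the raw index name because the store now capitalizes names itself. A standalone sketch of that capitalization logic, mirroring `_sanitize_index_name` from the diff (an illustrative copy, not an import from the library):

```python
def sanitize_index_name(index):
    """Weaviate class names must start with a capital letter, so snake_case
    index names are converted to CamelCase and other names are capitalized."""
    if index is None:
        return None
    if "_" in index:
        return "".join(part.capitalize() for part in index.split("_"))
    return index[0].upper() + index[1:]

assert sanitize_index_name("haystack_test_one") == "HaystackTestOne"
assert sanitize_index_name("document") == "Document"
```

This also explains why the tests below rename `haystack_test_1` to `haystack_test_one`: `"1".capitalize()` is still `"1"`, so numeric suffixes survive sanitization but collide with Weaviate's class-name rules, while spelled-out suffixes map cleanly.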

View File

@ -4,6 +4,8 @@ import pytest
from elasticsearch import Elasticsearch
from conftest import get_document_store
from haystack.document_stores import WeaviateDocumentStore
from haystack.errors import DuplicateDocumentError
from haystack.schema import Document, Label, Answer, Span
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
from haystack.document_stores.faiss import FAISSDocumentStore
@ -49,6 +51,7 @@ def test_write_with_duplicate_doc_ids(document_store):
document_store.write_documents(documents, duplicate_documents="fail")
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss", "memory", "milvus", "weaviate"], indirect=True)
def test_write_with_duplicate_doc_ids_custom_index(document_store):
documents = [
Document(
@ -62,9 +65,24 @@ def test_write_with_duplicate_doc_ids_custom_index(document_store):
]
document_store.delete_documents(index="haystack_custom_test")
document_store.write_documents(documents, index="haystack_custom_test", duplicate_documents="skip")
with pytest.raises(Exception):
with pytest.raises(DuplicateDocumentError):
document_store.write_documents(documents, index="haystack_custom_test", duplicate_documents="fail")
# Weaviate manipulates document objects in-place when writing them to an index.
# It generates a uuid based on the provided id and the index name where the document is added to.
# We need to get rid of these generated uuids for this test and therefore reset the document objects.
# As a result, the documents will receive a fresh uuid based on their id_hash_keys and a different index name.
if isinstance(document_store, WeaviateDocumentStore):
documents = [
Document(
content="Doc1",
id_hash_keys=["key1"]
),
Document(
content="Doc2",
id_hash_keys=["key1"]
)
]
# writing to the default, empty index should still work
document_store.write_documents(documents, duplicate_documents="fail")
@ -220,13 +238,13 @@ def test_write_document_index(document_store):
{"content": "text1", "id": "1"},
{"content": "text2", "id": "2"},
]
document_store.write_documents([documents[0]], index="haystack_test_1")
assert len(document_store.get_all_documents(index="haystack_test_1")) == 1
document_store.write_documents([documents[0]], index="haystack_test_one")
assert len(document_store.get_all_documents(index="haystack_test_one")) == 1
document_store.write_documents([documents[1]], index="haystack_test_2")
assert len(document_store.get_all_documents(index="haystack_test_2")) == 1
document_store.write_documents([documents[1]], index="haystack_test_two")
assert len(document_store.get_all_documents(index="haystack_test_two")) == 1
assert len(document_store.get_all_documents(index="haystack_test_1")) == 1
assert len(document_store.get_all_documents(index="haystack_test_one")) == 1
assert len(document_store.get_all_documents()) == 0
@ -237,13 +255,15 @@ def test_document_with_embeddings(document_store):
{"content": "text3", "id": "3", "embedding": np.random.rand(768).astype(np.float32).tolist()},
{"content": "text4", "id": "4", "embedding": np.random.rand(768).astype(np.float32)},
]
document_store.write_documents(documents, index="haystack_test_1")
assert len(document_store.get_all_documents(index="haystack_test_1")) == 4
document_store.write_documents(documents, index="haystack_test_one")
assert len(document_store.get_all_documents(index="haystack_test_one")) == 4
documents_without_embedding = document_store.get_all_documents(index="haystack_test_1", return_embedding=False)
assert documents_without_embedding[0].embedding is None
if not isinstance(document_store, WeaviateDocumentStore):
# weaviate is excluded because it would return dummy vectors instead of None
documents_without_embedding = document_store.get_all_documents(index="haystack_test_one", return_embedding=False)
assert documents_without_embedding[0].embedding is None
documents_with_embedding = document_store.get_all_documents(index="haystack_test_1", return_embedding=True)
documents_with_embedding = document_store.get_all_documents(index="haystack_test_one", return_embedding=True)
assert isinstance(documents_with_embedding[0].embedding, (list, np.ndarray))
@ -254,15 +274,15 @@ def test_update_embeddings(document_store, retriever):
documents.append({"content": f"text_{i}", "id": str(i), "meta_field": f"value_{i}"})
documents.append({"content": "text_0", "id": "6", "meta_field": "value_0"})
document_store.write_documents(documents, index="haystack_test_1")
document_store.update_embeddings(retriever, index="haystack_test_1", batch_size=3)
documents = document_store.get_all_documents(index="haystack_test_1", return_embedding=True)
document_store.write_documents(documents, index="haystack_test_one")
document_store.update_embeddings(retriever, index="haystack_test_one", batch_size=3)
documents = document_store.get_all_documents(index="haystack_test_one", return_embedding=True)
assert len(documents) == 7
for doc in documents:
assert type(doc.embedding) is np.ndarray
documents = document_store.get_all_documents(
index="haystack_test_1",
index="haystack_test_one",
filters={"meta_field": ["value_0"]},
return_embedding=True,
)
@ -272,53 +292,57 @@ def test_update_embeddings(document_store, retriever):
np.testing.assert_array_almost_equal(documents[0].embedding, documents[1].embedding, decimal=4)
documents = document_store.get_all_documents(
index="haystack_test_1",
index="haystack_test_one",
filters={"meta_field": ["value_0", "value_5"]},
return_embedding=True,
)
documents_with_value_0 = [doc for doc in documents if doc.meta["meta_field"] == "value_0"]
documents_with_value_5 = [doc for doc in documents if doc.meta["meta_field"] == "value_5"]
np.testing.assert_raises(
AssertionError,
np.testing.assert_array_equal,
documents[0].embedding,
documents[1].embedding
documents_with_value_0[0].embedding,
documents_with_value_5[0].embedding
)
doc = {"content": "text_7", "id": "7", "meta_field": "value_7",
"embedding": retriever.embed_queries(texts=["a random string"])[0]}
document_store.write_documents([doc], index="haystack_test_1")
document_store.write_documents([doc], index="haystack_test_one")
documents = []
for i in range(8, 11):
documents.append({"content": f"text_{i}", "id": str(i), "meta_field": f"value_{i}"})
document_store.write_documents(documents, index="haystack_test_1")
document_store.write_documents(documents, index="haystack_test_one")
doc_before_update = document_store.get_all_documents(index="haystack_test_1", filters={"meta_field": ["value_7"]})[0]
doc_before_update = document_store.get_all_documents(index="haystack_test_one", filters={"meta_field": ["value_7"]})[0]
embedding_before_update = doc_before_update.embedding
# test updating only documents without embeddings
document_store.update_embeddings(retriever, index="haystack_test_1", batch_size=3, update_existing_embeddings=False)
doc_after_update = document_store.get_all_documents(index="haystack_test_1", filters={"meta_field": ["value_7"]})[0]
embedding_after_update = doc_after_update.embedding
np.testing.assert_array_equal(embedding_before_update, embedding_after_update)
if not isinstance(document_store, WeaviateDocumentStore):
# All the documents in a Weaviate store have an embedding by default. "update_existing_embeddings=False" is not allowed
document_store.update_embeddings(retriever, index="haystack_test_one", batch_size=3, update_existing_embeddings=False)
doc_after_update = document_store.get_all_documents(index="haystack_test_one", filters={"meta_field": ["value_7"]})[0]
embedding_after_update = doc_after_update.embedding
np.testing.assert_array_equal(embedding_before_update, embedding_after_update)
# test updating with filters
if isinstance(document_store, FAISSDocumentStore):
with pytest.raises(Exception):
document_store.update_embeddings(
retriever, index="haystack_test_1", update_existing_embeddings=True, filters={"meta_field": ["value"]}
retriever, index="haystack_test_one", update_existing_embeddings=True, filters={"meta_field": ["value"]}
)
else:
document_store.update_embeddings(
retriever, index="haystack_test_1", batch_size=3, filters={"meta_field": ["value_0", "value_1"]}
retriever, index="haystack_test_one", batch_size=3, filters={"meta_field": ["value_0", "value_1"]}
)
doc_after_update = document_store.get_all_documents(index="haystack_test_1", filters={"meta_field": ["value_7"]})[0]
doc_after_update = document_store.get_all_documents(index="haystack_test_one", filters={"meta_field": ["value_7"]})[0]
embedding_after_update = doc_after_update.embedding
np.testing.assert_array_equal(embedding_before_update, embedding_after_update)
# test update all embeddings
document_store.update_embeddings(retriever, index="haystack_test_1", batch_size=3, update_existing_embeddings=True)
assert document_store.get_embedding_count(index="haystack_test_1") == 11
doc_after_update = document_store.get_all_documents(index="haystack_test_1", filters={"meta_field": ["value_7"]})[0]
document_store.update_embeddings(retriever, index="haystack_test_one", batch_size=3, update_existing_embeddings=True)
assert document_store.get_embedding_count(index="haystack_test_one") == 11
doc_after_update = document_store.get_all_documents(index="haystack_test_one", filters={"meta_field": ["value_7"]})[0]
embedding_after_update = doc_after_update.embedding
np.testing.assert_raises(AssertionError, np.testing.assert_array_equal, embedding_before_update, embedding_after_update)
@ -326,9 +350,12 @@ def test_update_embeddings(document_store, retriever):
documents = []
for i in range(12, 15):
documents.append({"content": f"text_{i}", "id": str(i), "meta_field": f"value_{i}"})
document_store.write_documents(documents, index="haystack_test_1")
document_store.update_embeddings(retriever, index="haystack_test_1", batch_size=3, update_existing_embeddings=False)
assert document_store.get_embedding_count(index="haystack_test_1") == 14
document_store.write_documents(documents, index="haystack_test_one")
if not isinstance(document_store, WeaviateDocumentStore):
# All the documents in a Weaviate store have an embedding by default. "update_existing_embeddings=False" is not allowed
document_store.update_embeddings(retriever, index="haystack_test_one", batch_size=3, update_existing_embeddings=False)
assert document_store.get_embedding_count(index="haystack_test_one") == 14
@pytest.mark.parametrize("retriever", ["table_text_retriever"], indirect=True)
@ -354,16 +381,16 @@ def test_update_embeddings_table_text_retriever(document_store, retriever):
"meta_field": "value_table_0",
"content_type": "table"})
document_store.write_documents(documents, index="haystack_test_1")
document_store.update_embeddings(retriever, index="haystack_test_1", batch_size=3)
documents = document_store.get_all_documents(index="haystack_test_1", return_embedding=True)
document_store.write_documents(documents, index="haystack_test_one")
document_store.update_embeddings(retriever, index="haystack_test_one", batch_size=3)
documents = document_store.get_all_documents(index="haystack_test_one", return_embedding=True)
assert len(documents) == 8
for doc in documents:
assert type(doc.embedding) is np.ndarray
# Check if Documents with same content (text) get same embedding
documents = document_store.get_all_documents(
index="haystack_test_1",
index="haystack_test_one",
filters={"meta_field": ["value_text_0"]},
return_embedding=True,
)
@ -374,7 +401,7 @@ def test_update_embeddings_table_text_retriever(document_store, retriever):
# Check if Documents with same content (table) get same embedding
documents = document_store.get_all_documents(
index="haystack_test_1",
index="haystack_test_one",
filters={"meta_field": ["value_table_0"]},
return_embedding=True,
)
@ -385,7 +412,7 @@ def test_update_embeddings_table_text_retriever(document_store, retriever):
# Check if Documents wih different content (text) get different embedding
documents = document_store.get_all_documents(
index="haystack_test_1",
index="haystack_test_one",
filters={"meta_field": ["value_text_1", "value_text_2"]},
return_embedding=True,
)
@ -398,7 +425,7 @@ def test_update_embeddings_table_text_retriever(document_store, retriever):
# Check if Documents with different content (table) get different embeddings
documents = document_store.get_all_documents(
index="haystack_test_1",
index="haystack_test_one",
filters={"meta_field": ["value_table_1", "value_table_2"]},
return_embedding=True,
)
@ -411,7 +438,7 @@ def test_update_embeddings_table_text_retriever(document_store, retriever):
# Check if Documents with different content (table + text) get different embeddings
documents = document_store.get_all_documents(
index="haystack_test_1",
index="haystack_test_one",
filters={"meta_field": ["value_text_1", "value_table_1"]},
return_embedding=True,
)
@ -438,17 +465,25 @@ def test_delete_documents(document_store_with_docs):
documents = document_store_with_docs.get_all_documents()
assert len(documents) == 0
def test_delete_documents_by_id(document_store_with_docs):
doc_ids = [doc.id for doc in document_store_with_docs.get_all_documents()]
assert len(doc_ids) == 3
docs_to_delete = doc_ids[0:2]
document_store_with_docs.delete_documents(ids=docs_to_delete)
def test_delete_documents_with_filters(document_store_with_docs):
document_store_with_docs.delete_documents(filters={"meta_field": ["test1", "test2"]})
documents = document_store_with_docs.get_all_documents()
assert len(documents) == 1
assert documents[0].id == doc_ids[2]
assert documents[0].meta["meta_field"] == "test3"
def test_delete_documents_by_id(document_store_with_docs):
docs_to_delete = document_store_with_docs.get_all_documents(filters={"meta_field": ["test1", "test2"]})
docs_not_to_delete = document_store_with_docs.get_all_documents(filters={"meta_field": ["test3"]})
document_store_with_docs.delete_documents(ids=[doc.id for doc in docs_to_delete])
all_docs_left = document_store_with_docs.get_all_documents()
assert len(all_docs_left) == 1
assert all_docs_left[0].meta["meta_field"] == "test3"
all_ids_left = [doc.id for doc in all_docs_left]
assert all(doc.id in all_ids_left for doc in docs_not_to_delete)
def test_delete_documents_by_id_with_filters(document_store_with_docs):
@ -464,14 +499,9 @@ def test_delete_documents_by_id_with_filters(document_store_with_docs):
all_ids_left = [doc.id for doc in all_docs_left]
assert all(doc.id in all_ids_left for doc in docs_not_to_delete)
def test_delete_documents_with_filters(document_store_with_docs):
document_store_with_docs.delete_documents(filters={"meta_field": ["test1", "test2"]})
documents = document_store_with_docs.get_all_documents()
assert len(documents) == 1
assert documents[0].meta["meta_field"] == "test3"
# exclude weaviate because it does not support storing labels
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss", "memory", "milvus"], indirect=True)
def test_labels(document_store):
label = Label(
query="question1",
@ -556,6 +586,8 @@ def test_labels(document_store):
assert len(labels) == 0
# exclude weaviate because it does not support storing labels
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss", "memory", "milvus"], indirect=True)
def test_multilabel(document_store):
labels =[
Label(
@ -666,8 +698,9 @@ def test_multilabel(document_store):
assert len(docs) == 0
# exclude weaviate because it does not support storing labels
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss", "memory", "milvus"], indirect=True)
def test_multilabel_no_answer(document_store):
labels = [
Label(
query="question",


@ -4,6 +4,8 @@ from haystack.nodes.preprocessor import PreProcessor
from haystack.nodes.evaluator import EvalAnswers, EvalDocuments
from haystack.pipelines.base import Pipeline
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss", "memory", "milvus"], indirect=True)
@pytest.mark.parametrize("batch_size", [None, 20])
def test_add_eval_data(document_store, batch_size):
# add eval data (SQUAD format)
@ -47,6 +49,7 @@ def test_add_eval_data(document_store, batch_size):
assert doc.content[start:end] == "France"
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss", "memory", "milvus"], indirect=True)
@pytest.mark.parametrize("reader", ["farm"], indirect=True)
def test_eval_reader(reader, document_store: BaseDocumentStore):
# add eval data (SQUAD format)
@ -136,6 +139,7 @@ def test_eval_pipeline(document_store: BaseDocumentStore, reader, retriever):
assert eval_reader.top_k_em == eval_reader_vanila.top_k_em
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss", "memory", "milvus"], indirect=True)
def test_eval_data_split_word(document_store):
# splitting by word
preprocessor = PreProcessor(
@ -160,6 +164,7 @@ def test_eval_data_split_word(document_store):
assert len(set(labels[0].document_ids)) == 2
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss", "memory", "milvus"], indirect=True)
def test_eval_data_split_passage(document_store):
# splitting by passage
preprocessor = PreProcessor(


@ -4,6 +4,8 @@ import numpy as np
import pandas as pd
import pytest
from elasticsearch import Elasticsearch
from haystack.document_stores import WeaviateDocumentStore
from haystack.schema import Document
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
from haystack.document_stores.faiss import FAISSDocumentStore
@ -13,32 +15,35 @@ from haystack.nodes.retriever.sparse import ElasticsearchRetriever, Elasticsearc
from transformers import DPRContextEncoderTokenizerFast, DPRQuestionEncoderTokenizerFast
DOCS = [
Document(
content="""Aaron Aaron ( or ; ""Ahärôn"") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother's spokesman (""prophet"") to the Pharaoh. Part of the Law (Torah) that Moses received from""",
meta={"name": "0"},
id="1",
),
Document(
content="""Democratic Republic of the Congo to the south. Angola's capital, Luanda, lies on the Atlantic coast in the northwest of the country. Angola, although located in a tropical zone, has a climate that is not characterized for this region, due to the confluence of three factors: As a result, Angola's climate is characterized by two seasons: rainfall from October to April and drought, known as ""Cacimbo"", from May to August, drier, as the name implies, and with lower temperatures. On the other hand, while the coastline has high rainfall rates, decreasing from North to South and from to , with""",
id="2",
),
Document(
content="""Schopenhauer, describing him as an ultimately shallow thinker: ""Schopenhauer has quite a crude mind ... where real depth starts, his comes to an end."" His friend Bertrand Russell had a low opinion on the philosopher, and attacked him in his famous ""History of Western Philosophy"" for hypocritically praising asceticism yet not acting upon it. On the opposite isle of Russell on the foundations of mathematics, the Dutch mathematician L. E. J. Brouwer incorporated the ideas of Kant and Schopenhauer in intuitionism, where mathematics is considered a purely mental activity, instead of an analytic activity wherein objective properties of reality are""",
meta={"name": "1"},
id="3",
),
Document(
content="""The Dothraki vocabulary was created by David J. Peterson well in advance of the adaptation. HBO hired the Language Creatio""",
meta={"name": "2"},
id="4",
),
Document(
content="""The title of the episode refers to the Great Sept of Baelor, the main religious building in King's Landing, where the episode's pivotal scene takes place. In the world created by George R. R. Martin""",
meta={},
id="5",
),
]
@pytest.fixture()
def docs():
documents = [
Document(
content="""Aaron Aaron ( or ; ""Ahärôn"") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother's spokesman (""prophet"") to the Pharaoh. Part of the Law (Torah) that Moses received from""",
meta={"name": "0"},
id="1",
),
Document(
content="""Democratic Republic of the Congo to the south. Angola's capital, Luanda, lies on the Atlantic coast in the northwest of the country. Angola, although located in a tropical zone, has a climate that is not characterized for this region, due to the confluence of three factors: As a result, Angola's climate is characterized by two seasons: rainfall from October to April and drought, known as ""Cacimbo"", from May to August, drier, as the name implies, and with lower temperatures. On the other hand, while the coastline has high rainfall rates, decreasing from North to South and from to , with""",
id="2",
),
Document(
content="""Schopenhauer, describing him as an ultimately shallow thinker: ""Schopenhauer has quite a crude mind ... where real depth starts, his comes to an end."" His friend Bertrand Russell had a low opinion on the philosopher, and attacked him in his famous ""History of Western Philosophy"" for hypocritically praising asceticism yet not acting upon it. On the opposite isle of Russell on the foundations of mathematics, the Dutch mathematician L. E. J. Brouwer incorporated the ideas of Kant and Schopenhauer in intuitionism, where mathematics is considered a purely mental activity, instead of an analytic activity wherein objective properties of reality are""",
meta={"name": "1"},
id="3",
),
Document(
content="""The Dothraki vocabulary was created by David J. Peterson well in advance of the adaptation. HBO hired the Language Creatio""",
meta={"name": "2"},
id="4",
),
Document(
content="""The title of the episode refers to the Great Sept of Baelor, the main religious building in King's Landing, where the episode's pivotal scene takes place. In the world created by George R. R. Martin""",
meta={},
id="5",
),
]
return documents
# TODO: check if this works with only the "memory" arg
@pytest.mark.parametrize(
@ -142,10 +147,10 @@ def test_elasticsearch_custom_query():
@pytest.mark.slow
@pytest.mark.parametrize("retriever", ["dpr"], indirect=True)
def test_dpr_embedding(document_store, retriever):
def test_dpr_embedding(document_store, retriever, docs):
document_store.return_embedding = True
document_store.write_documents(DOCS)
document_store.write_documents(docs)
document_store.update_embeddings(retriever=retriever)
time.sleep(1)
@ -165,10 +170,19 @@ def test_dpr_embedding(document_store, retriever):
@pytest.mark.slow
@pytest.mark.parametrize("retriever", ["retribert"], indirect=True)
@pytest.mark.vector_dim(128)
def test_retribert_embedding(document_store, retriever):
def test_retribert_embedding(document_store, retriever, docs):
if isinstance(document_store, WeaviateDocumentStore):
# Weaviate sets the embedding dimension to 768 as soon as it is initialized.
# We need 128 here and therefore initialize a new WeaviateDocumentStore.
document_store = WeaviateDocumentStore(
weaviate_url="http://localhost:8080",
index="haystack_test",
embedding_dim=128
)
document_store.weaviate_client.schema.delete_all()
document_store._create_schema_and_index_if_not_exist()
document_store.return_embedding = True
document_store.write_documents(DOCS)
document_store.write_documents(docs)
document_store.update_embeddings(retriever=retriever)
time.sleep(1)
@ -184,10 +198,10 @@ def test_retribert_embedding(document_store, retriever):
@pytest.mark.parametrize("retriever", ["table_text_retriever"], indirect=True)
@pytest.mark.parametrize("document_store", ["elasticsearch"], indirect=True)
@pytest.mark.vector_dim(512)
def test_table_text_retriever_embedding(document_store, retriever):
def test_table_text_retriever_embedding(document_store, retriever, docs):
document_store.return_embedding = True
document_store.write_documents(DOCS)
document_store.write_documents(docs)
table_data = {
"Mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
"Height": ["8848m", "8,611 m", "8 586m", "8 516 m", "8,485m"]


@ -6,11 +6,13 @@ import uuid
embedding_dim = 768
def get_uuid():
return str(uuid.uuid4())
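The helper above generates fresh random UUIDs for test data. The PR also handles documents whose supplied ids are not UUID-formatted (see the `"not a correct uuid"` entry below) by deriving a deterministic UUID that incorporates the index name. A hedged sketch of that mapping, with a hypothetical helper name (not the actual Haystack implementation):

```python
import uuid

def to_weaviate_id(doc_id: str, index: str) -> str:
    # Hypothetical helper: keep ids that are already valid UUIDs;
    # otherwise derive a deterministic UUID from the index name plus
    # the original id, so the same (index, id) pair always maps to
    # the same UUID and duplicate writes can be detected.
    try:
        return str(uuid.UUID(doc_id))
    except ValueError:
        return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{index}/{doc_id}"))
```

Including the index name in the derivation means the same non-UUID id written to two different indexes yields two distinct Weaviate object ids.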
DOCUMENTS = [
{"content": "text1", "id":get_uuid(), "key": "a", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
{"content": "text1", "id":"not a correct uuid", "key": "a"},
{"content": "text2", "id":get_uuid(), "key": "b", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
{"content": "text3", "id":get_uuid(), "key": "b", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
{"content": "text4", "id":get_uuid(), "key": "b", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
@ -26,6 +28,7 @@ DOCUMENTS_XS = [
Document(content="My name is Christelle and I live in Paris", id=get_uuid(), meta={"metafield": "test3", "name": "filename3"}, embedding=np.random.rand(embedding_dim).astype(np.float32))
]
@pytest.fixture(params=["weaviate"])
def document_store_with_docs(request):
document_store = get_document_store(request.param)
@ -33,56 +36,13 @@ def document_store_with_docs(request):
yield document_store
document_store.delete_documents()
@pytest.fixture(params=["weaviate"])
def document_store(request):
document_store = get_document_store(request.param)
yield document_store
document_store.delete_documents()
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store_with_docs", ["weaviate"], indirect=True)
def test_get_all_documents_without_filters(document_store_with_docs):
documents = document_store_with_docs.get_all_documents()
assert all(isinstance(d, Document) for d in documents)
assert len(documents) == 3
assert {d.meta["name"] for d in documents} == {"filename1", "filename2", "filename3"}
assert {d.meta["metafield"] for d in documents} == {"test1", "test2", "test3"}
@pytest.mark.weaviate
def test_get_all_documents_with_correct_filters(document_store_with_docs):
documents = document_store_with_docs.get_all_documents(filters={"metafield": ["test2"]})
assert len(documents) == 1
assert documents[0].meta["name"] == "filename2"
documents = document_store_with_docs.get_all_documents(filters={"metafield": ["test1", "test3"]})
assert len(documents) == 2
assert {d.meta["name"] for d in documents} == {"filename1", "filename3"}
assert {d.meta["metafield"] for d in documents} == {"test1", "test3"}
@pytest.mark.weaviate
def test_get_all_documents_with_incorrect_filter_name(document_store_with_docs):
documents = document_store_with_docs.get_all_documents(filters={"incorrectmetafield": ["test2"]})
assert len(documents) == 0
@pytest.mark.weaviate
def test_get_all_documents_with_incorrect_filter_value(document_store_with_docs):
documents = document_store_with_docs.get_all_documents(filters={"metafield": ["incorrect_value"]})
assert len(documents) == 0
@pytest.mark.weaviate
def test_get_documents_by_id(document_store_with_docs):
documents = document_store_with_docs.get_all_documents()
doc = document_store_with_docs.get_document_by_id(documents[0].id)
assert doc.id == documents[0].id
assert doc.content == documents[0].content
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
def test_get_document_count(document_store):
document_store.write_documents(DOCUMENTS)
assert document_store.get_document_count() == 5
assert document_store.get_document_count(filters={"key": ["a"]}) == 1
assert document_store.get_document_count(filters={"key": ["b"]}) == 4
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
@ -98,189 +58,6 @@ def test_weaviate_write_docs(document_store, batch_size):
documents_indexed = document_store.get_all_documents(batch_size=batch_size)
assert len(documents_indexed) == len(DOCUMENTS)
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
def test_get_all_document_filter_duplicate_value(document_store):
documents = [
Document(
content="Doc1",
meta={"fone": "f0"},
id=get_uuid(),
embedding=np.random.rand(embedding_dim).astype(np.float32)
),
Document(
content="Doc1",
meta={"fone": "f1", "metaid": "0"},
id=get_uuid(),
embedding=np.random.rand(embedding_dim).astype(np.float32)
),
Document(
content="Doc2",
meta={"fthree": "f0"},
id=get_uuid(),
embedding=np.random.rand(embedding_dim).astype(np.float32)
)
]
document_store.write_documents(documents)
documents = document_store.get_all_documents(filters={"fone": ["f1"]})
assert documents[0].content == "Doc1"
assert len(documents) == 1
assert {d.meta["metaid"] for d in documents} == {"0"}
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
def test_get_all_documents_generator(document_store):
document_store.write_documents(DOCUMENTS)
assert len(list(document_store.get_all_documents_generator(batch_size=2))) == 5
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
def test_write_with_duplicate_doc_ids(document_store):
id = get_uuid()
documents = [
Document(
content="Doc1",
id=id,
embedding=np.random.rand(embedding_dim).astype(np.float32)
),
Document(
content="Doc2",
id=id,
embedding=np.random.rand(embedding_dim).astype(np.float32)
)
]
document_store.write_documents(documents, duplicate_documents="skip")
with pytest.raises(Exception):
document_store.write_documents(documents, duplicate_documents="fail")
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
@pytest.mark.parametrize("update_existing_documents", [True, False])
def test_update_existing_documents(document_store, update_existing_documents):
id = str(uuid.uuid4())
original_docs = [
{"content": "text1_orig", "id": id, "metafieldforcount": "a", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
]
updated_docs = [
{"content": "text1_new", "id": id, "metafieldforcount": "a", "embedding": np.random.rand(embedding_dim).astype(np.float32)},
]
document_store.update_existing_documents = update_existing_documents
document_store.write_documents(original_docs)
assert document_store.get_document_count() == 1
if update_existing_documents:
document_store.write_documents(updated_docs, duplicate_documents="overwrite")
else:
with pytest.raises(Exception):
document_store.write_documents(updated_docs, duplicate_documents="fail")
stored_docs = document_store.get_all_documents()
assert len(stored_docs) == 1
if update_existing_documents:
assert stored_docs[0].content == updated_docs[0]["content"]
else:
assert stored_docs[0].content == original_docs[0]["content"]
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
def test_write_document_meta(document_store):
uid1 = get_uuid()
uid2 = get_uuid()
uid3 = get_uuid()
uid4 = get_uuid()
documents = [
{"content": "dict_without_meta", "id": uid1, "embedding": np.random.rand(embedding_dim).astype(np.float32)},
{"content": "dict_with_meta", "metafield": "test2", "name": "filename2", "id": uid2, "embedding": np.random.rand(embedding_dim).astype(np.float32)},
Document(content="document_object_without_meta", id=uid3, embedding=np.random.rand(embedding_dim).astype(np.float32)),
Document(content="document_object_with_meta", meta={"metafield": "test4", "name": "filename3"}, id=uid4, embedding=np.random.rand(embedding_dim).astype(np.float32)),
]
document_store.write_documents(documents)
documents_in_store = document_store.get_all_documents()
assert len(documents_in_store) == 4
assert not document_store.get_document_by_id(uid1).meta
assert document_store.get_document_by_id(uid2).meta["metafield"] == "test2"
assert not document_store.get_document_by_id(uid3).meta
assert document_store.get_document_by_id(uid4).meta["metafield"] == "test4"
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
def test_write_document_index(document_store):
documents = [
{"content": "text1", "id": uuid.uuid4(), "embedding": np.random.rand(embedding_dim).astype(np.float32)},
{"content": "text2", "id": uuid.uuid4(), "embedding": np.random.rand(embedding_dim).astype(np.float32)},
]
document_store.write_documents([documents[0]], index="Haystackone")
assert len(document_store.get_all_documents(index="Haystackone")) == 1
document_store.write_documents([documents[1]], index="Haystacktwo")
assert len(document_store.get_all_documents(index="Haystacktwo")) == 1
assert len(document_store.get_all_documents(index="Haystackone")) == 1
assert len(document_store.get_all_documents()) == 0
@pytest.mark.weaviate
@pytest.mark.parametrize("retriever", ["dpr", "embedding"], indirect=True)
@pytest.mark.parametrize("document_store", ["weaviate"], indirect=True)
def test_update_embeddings(document_store, retriever):
documents = []
for i in range(6):
documents.append({"content": f"text_{i}", "id": str(uuid.uuid4()), "metafield": f"value_{i}", "embedding": np.random.rand(embedding_dim).astype(np.float32)})
documents.append({"content": "text_0", "id": str(uuid.uuid4()), "metafield": "value_0", "embedding": np.random.rand(embedding_dim).astype(np.float32)})
document_store.write_documents(documents, index="HaystackTestOne")
document_store.update_embeddings(retriever, index="HaystackTestOne", batch_size=3)
documents = document_store.get_all_documents(index="HaystackTestOne", return_embedding=True)
assert len(documents) == 7
for doc in documents:
assert type(doc.embedding) is np.ndarray
documents = document_store.get_all_documents(
index="HaystackTestOne",
filters={"metafield": ["value_0"]},
return_embedding=True,
)
assert len(documents) == 2
for doc in documents:
assert doc.meta["metafield"] == "value_0"
np.testing.assert_array_almost_equal(documents[0].embedding, documents[1].embedding, decimal=4)
documents = document_store.get_all_documents(
index="HaystackTestOne",
filters={"metafield": ["value_1", "value_5"]},
return_embedding=True,
)
np.testing.assert_raises(
AssertionError,
np.testing.assert_array_equal,
documents[0].embedding,
documents[1].embedding
)
doc = {"content": "text_7", "id": str(uuid.uuid4()), "metafield": "value_7",
"embedding": retriever.embed_queries(texts=["a random string"])[0]}
document_store.write_documents([doc], index="HaystackTestOne")
doc_before_update = document_store.get_all_documents(index="HaystackTestOne", filters={"metafield": ["value_7"]})[0]
embedding_before_update = doc_before_update.embedding
document_store.update_embeddings(
retriever, index="HaystackTestOne", batch_size=3, filters={"metafield": ["value_0", "value_1"]}
)
doc_after_update = document_store.get_all_documents(index="HaystackTestOne", filters={"metafield": ["value_7"]})[0]
embedding_after_update = doc_after_update.embedding
np.testing.assert_array_equal(embedding_before_update, embedding_after_update)
# test update all embeddings
document_store.update_embeddings(retriever, index="HaystackTestOne", batch_size=3, update_existing_embeddings=True)
assert document_store.get_document_count(index="HaystackTestOne") == 8
doc_after_update = document_store.get_all_documents(index="HaystackTestOne", filters={"metafield": ["value_7"]})[0]
embedding_after_update = doc_after_update.embedding
np.testing.assert_raises(AssertionError, np.testing.assert_array_equal, embedding_before_update, embedding_after_update)
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store_with_docs", ["weaviate"], indirect=True)
@ -311,49 +88,3 @@ def test_query(document_store_with_docs):
docs = document_store_with_docs.query(filters={"content":['live']})
assert len(docs) == 3
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store_with_docs", ["weaviate"], indirect=True)
def test_delete_documents(document_store_with_docs):
assert len(document_store_with_docs.get_all_documents()) == 3
document_store_with_docs.delete_documents()
documents = document_store_with_docs.get_all_documents()
assert len(documents) == 0
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store_with_docs", ["weaviate"], indirect=True)
def test_delete_documents_with_filters(document_store_with_docs):
assert len(document_store_with_docs.get_all_documents()) == 3
document_store_with_docs.delete_documents(filters={"metafield": ["test1", "test2"]})
documents = document_store_with_docs.get_all_documents()
assert len(documents) == 1
assert documents[0].meta["metafield"] == "test3"
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store_with_docs", ["weaviate"], indirect=True)
def test_delete_documents_by_id(document_store_with_docs):
assert len(document_store_with_docs.get_all_documents()) == 3
ids_to_delete = [doc.id for doc in document_store_with_docs.get_all_documents()[0:2]]
document_store_with_docs.delete_documents(ids=ids_to_delete)
documents = document_store_with_docs.get_all_documents()
assert len(documents) == 1
assert documents[0].id not in ids_to_delete
@pytest.mark.weaviate
@pytest.mark.parametrize("document_store_with_docs", ["weaviate"], indirect=True)
def test_delete_documents_by_id_with_filters(document_store_with_docs):
docs_to_delete = document_store_with_docs.get_all_documents(filters={"metafield": ["test1", "test2"]})
docs_not_to_delete = document_store_with_docs.get_all_documents(filters={"metafield": ["test3"]})
document_store_with_docs.delete_documents(ids=[doc.id for doc in docs_to_delete], filters={"metafield": ["test1"]})
all_docs_left = document_store_with_docs.get_all_documents()
assert len(all_docs_left) == 2
assert all(doc.meta["metafield"] != "test1" for doc in all_docs_left)
all_ids_left = [doc.id for doc in all_docs_left]
assert all(doc.id in all_ids_left for doc in docs_not_to_delete)