haystack/test/test_retriever.py
Sara Zan 11cf94a965
Pipeline's YAML: syntax validation (#2226)
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes

* Make error composition work properly

* Clarify typing

* Help mypy a bit more

* Update Documentation & Code Style

* Enable autogenerated docs for Milvus1 and 2 separately

* Revert "Enable autogenerated docs for Milvus1 and 2 separately"

This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.

* Update Documentation & Code Style

* Re-enable 'additionalProperties: False'

* Add pipeline.type to JSON Schema, was somehow forgotten

* Disable additionalProperties on the pipeline properties too

* Fix json-schemas for 1.1.0 and 1.2.0 (this should not be needed again in the future)

* Call super in PipelineValidationError

* Improve _read_pipeline_config_from_yaml's error handling

* Fix generate_json_schema.py to include document stores

* Fix json schemas (retro-fix 1.1.0 again)

* Improve custom errors printing, add link to docs

* Add function in BaseComponent to list its subclasses in a module

* Make some document stores base classes abstract

* Add marker 'integration' in pytest flags

* Slightly improve validation of pipelines at load

* Adding tests for YAML loading and validation

* Make custom_query Optional for validation issues

* Fix bug in _read_pipeline_config_from_yaml

* Improve error handling in BasePipeline and Pipeline and add DAG check

* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)

* Simplify errors slightly

* Add some YAML validation tests

* Remove load_from_config from BasePipeline, it was never used anyway

* Improve tests

* Include json-schemas in package

* Fix conftest imports

* Make BasePipeline abstract

* Improve mocking by making the test independent from the YAML version

* Add exportable_to_yaml decorator so mock nodes no longer need set_config

* Fix mypy errors

* Comment out one monkeypatch

* Fix typing again

* Improve error message for validation

* Add required properties to pipelines

* Fix YAML version for REST API YAMLs to 1.2.0

* Fix load_from_yaml call in load_from_deepset_cloud

* Fix HaystackError.__getattr__

* Add super().__init__() in most nodes and document stores, comment out set_config

* Remove type from REST API pipelines

* Remove useless init from doc2answers

* Call super in Seq2SeqGenerator

* Typo in deepsetcloud.py

* Fix rest api indexing error mismatch and mock version of JSON schema in all tests

* Working on pipeline tests

* Improve errors printing slightly

* Add back test_pipeline.yaml

* _json_schema.py supports different versions with identical schemas

* Add type to 0.7 schema for backwards compatibility

* Fix small bug in _json_schema.py

* Try alternative to generate json schemas on the CI

* Update Documentation & Code Style

* Make linux CI match autoformat CI

* Fix super-init-not-called

* Remove accidentally committed file

* Update Documentation & Code Style

* Fix test_summarizer_translation.py's import

* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args

* Fix json schema for ray tests too

* Update Documentation & Code Style

* Reintroduce validation

* Use unstable version in tests and REST API

* Make unstable support the latest versions

* Update Documentation & Code Style

* Remove needless fixture

* Make type in pipeline optional in the strings validation

* Fix schemas

* Fix string validation for pipeline type

* Improve validate_config_strings

* Remove type from test pipelines

* Update Documentation & Code Style

* Fix test_pipeline

* Remove more type fields from pipelines

* Temporary CI patch

* Fix issue with exportable_to_yaml never invoking the wrapped init

* Remove stray file

* Pipeline tests are green again

* Linux CI now needs .[all] to generate the schema

* Bugfixes; pipeline tests seem to be green

* Typo in version after merge

* Implement missing methods in Weaviate

* Try to prevent FAISS tests from running in the Milvus1 test suite

* Fix some stray test paths and faiss index dumping

* Fix pytest markers list

* Temporarily disable cache to be able to see tests failures

* Fix pyproject.toml syntax

* Use only tmp_path

* Fix preprocessor signature after merge

* Fix faiss bug

* Fix Ray test

* Fix documentation issue by removing quotes from faiss type

* Update Documentation & Code Style

* Use document properly in preprocessor tests

* Update Documentation & Code Style

* Make preprocessor capable of handling documents

* Import document

* Revert support for documents in preprocessor, do later

* Fix bug in _json_schema.py that was breaking validation

* Re-enable cache

* Update Documentation & Code Style

* Simplify calling _json_schema.py from the CI

* Remove redundant ABC inheritance

* Ensure exportable_to_yaml works only on implementations

* Rename subclass to class_ in Meta

* Make run() and get_config() abstract in BasePipeline

* Revert unintended change in preprocessor

* Move outgoing_edges_input_node check inside try block

* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX

* Add check for a RecursionError on validate_config_strings

* Address usages of _pipeline_config in data silo and elasticsearch

* Rename _pipeline_config into _init_parameters

* Fix pytest marker and remove unused imports

* Remove most redundant ABCs

* Rename _init_parameters into _component_configuration

* Remove set_config and type from _component_configuration's dict

* Remove last instances of set_config and replace with super().__init__()

* Implement __init_subclass__ approach (sketched at the end of this changelog)

* Simplify checks on the existence of _component_configuration

* Fix faiss issue

* Dynamic generation of node schemas & weed out old schemas

* Add debatable test

* Add docstring to debatable test

* Implement positive diff between schemas

* Improve diff printing

* Rename REST API YAML files to trigger IDE validation

* Fix typing issues

* Fix more typing

* Typo in YAML filename

* Remove needless type:ignore

* Add tests

* Fix tests & validation feedback for accessory classes in custom nodes

* Refactor RAGeneratorType out

* Fix broken import in conftest

* Improve source error handling

* Remove unused import in test_eval.py breaking tests

* Fix tests that match the changed error messages too

* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions

* Fix path to generate_openapi_specs.py in autoformat.yml

* Update Documentation & Code Style

* Add test for FAISSDocumentStore-like situations (superclass with init params)

* Update Documentation & Code Style

* Fix indentation

* Remove commented set_config

* Store model_name_or_path in FARMReader to use in DistillationDataSilo

* Rename _component_configuration into _component_config

* Update Documentation & Code Style
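
For reference, the "__init_subclass__ approach" from the bullets above boils down to
having BaseComponent wrap each subclass' __init__ so that every node records its own
init parameters. A minimal, hypothetical sketch of the idea (the names follow the
bullets above; the real implementation also has to handle positional arguments,
defaults, and the YAML export itself):

    class BaseComponent:
        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            original_init = cls.__init__

            def init_and_record(self, *args, **init_kwargs):
                # Record keyword arguments before running the subclass' own __init__,
                # so the node can later be exported to YAML without calling set_config().
                if not hasattr(self, "_component_config"):
                    self._component_config = {"type": type(self).__name__, "params": {}}
                self._component_config["params"].update(init_kwargs)
                original_init(self, *args, **init_kwargs)

            cls.__init__ = init_and_record

This is what lets the last instances of set_config be replaced with plain
super().__init__() calls.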

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-15 11:17:26 +01:00

import time
import numpy as np
import pandas as pd
import pytest
from pathlib import Path
from elasticsearch import Elasticsearch
from haystack.document_stores import WeaviateDocumentStore
from haystack.schema import Document
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
from haystack.document_stores.faiss import FAISSDocumentStore
from haystack.document_stores import MilvusDocumentStore
from haystack.nodes.retriever.dense import DensePassageRetriever, TableTextRetriever
from haystack.nodes.retriever.sparse import ElasticsearchRetriever, ElasticsearchFilterOnlyRetriever, TfidfRetriever
from transformers import DPRContextEncoderTokenizerFast, DPRQuestionEncoderTokenizerFast
from .conftest import SAMPLES_PATH


@pytest.fixture()
def docs():
documents = [
Document(
content="""Aaron Aaron ( or ; ""Ahärôn"") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother's spokesman (""prophet"") to the Pharaoh. Part of the Law (Torah) that Moses received from""",
meta={"name": "0"},
id="1",
),
Document(
content="""Democratic Republic of the Congo to the south. Angola's capital, Luanda, lies on the Atlantic coast in the northwest of the country. Angola, although located in a tropical zone, has a climate that is not characterized for this region, due to the confluence of three factors: As a result, Angola's climate is characterized by two seasons: rainfall from October to April and drought, known as ""Cacimbo"", from May to August, drier, as the name implies, and with lower temperatures. On the other hand, while the coastline has high rainfall rates, decreasing from North to South and from to , with""",
id="2",
),
Document(
content="""Schopenhauer, describing him as an ultimately shallow thinker: ""Schopenhauer has quite a crude mind ... where real depth starts, his comes to an end."" His friend Bertrand Russell had a low opinion on the philosopher, and attacked him in his famous ""History of Western Philosophy"" for hypocritically praising asceticism yet not acting upon it. On the opposite isle of Russell on the foundations of mathematics, the Dutch mathematician L. E. J. Brouwer incorporated the ideas of Kant and Schopenhauer in intuitionism, where mathematics is considered a purely mental activity, instead of an analytic activity wherein objective properties of reality are""",
meta={"name": "1"},
id="3",
),
Document(
content="""The Dothraki vocabulary was created by David J. Peterson well in advance of the adaptation. HBO hired the Language Creatio""",
meta={"name": "2"},
id="4",
),
Document(
content="""The title of the episode refers to the Great Sept of Baelor, the main religious building in King's Landing, where the episode's pivotal scene takes place. In the world created by George R. R. Martin""",
meta={},
id="5",
),
]
return documents


# TODO: check if this works with only the "memory" arg
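# `indirect=True` passes each string id to the fixture of the same name
# (retriever_with_docs / document_store_with_docs in conftest.py), which builds
# the actual retriever and document store instances.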
@pytest.mark.parametrize(
"retriever_with_docs,document_store_with_docs",
[
("dpr", "elasticsearch"),
("dpr", "faiss"),
("dpr", "memory"),
("dpr", "milvus1"),
("embedding", "elasticsearch"),
("embedding", "faiss"),
("embedding", "memory"),
("embedding", "milvus1"),
("elasticsearch", "elasticsearch"),
("es_filter_only", "elasticsearch"),
("tfidf", "memory"),
],
indirect=True,
)
def test_retrieval(retriever_with_docs, document_store_with_docs):
if not isinstance(retriever_with_docs, (ElasticsearchRetriever, ElasticsearchFilterOnlyRetriever, TfidfRetriever)):
document_store_with_docs.update_embeddings(retriever_with_docs)
# test without filters
res = retriever_with_docs.retrieve(query="Who lives in Berlin?")
assert res[0].content == "My name is Carla and I live in Berlin"
assert len(res) == 5
assert res[0].meta["name"] == "filename1"
# test with filters
if not isinstance(document_store_with_docs, (FAISSDocumentStore, MilvusDocumentStore)) and not isinstance(
retriever_with_docs, TfidfRetriever
):
# single filter
result = retriever_with_docs.retrieve(query="godzilla", filters={"name": ["filename3"]}, top_k=5)
assert len(result) == 1
assert type(result[0]) == Document
assert result[0].content == "My name is Christelle and I live in Paris"
assert result[0].meta["name"] == "filename3"
# multiple filters
result = retriever_with_docs.retrieve(
query="godzilla", filters={"name": ["filename2"], "meta_field": ["test2", "test3"]}, top_k=5
)
assert len(result) == 1
assert type(result[0]) == Document
assert result[0].meta["name"] == "filename2"
result = retriever_with_docs.retrieve(
query="godzilla", filters={"name": ["filename1"], "meta_field": ["test2", "test3"]}, top_k=5
)
assert len(result) == 0


@pytest.mark.elasticsearch
def test_elasticsearch_custom_query():
client = Elasticsearch()
client.indices.delete(index="haystack_test_custom", ignore=[404])
document_store = ElasticsearchDocumentStore(
index="haystack_test_custom", content_field="custom_text_field", embedding_field="custom_embedding_field"
)
documents = [
{"content": "test_1", "meta": {"year": "2019"}},
{"content": "test_2", "meta": {"year": "2020"}},
{"content": "test_3", "meta": {"year": "2021"}},
{"content": "test_4", "meta": {"year": "2021"}},
{"content": "test_5", "meta": {"year": "2021"}},
]
document_store.write_documents(documents)
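# custom_query uses $-style placeholders: ${query} is replaced with the query string
# at retrieve() time, and any remaining placeholders (here ${years}) are filled in
# from the `filters` argument.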
# test custom "terms" query
retriever = ElasticsearchRetriever(
document_store=document_store,
custom_query="""
{
"size": 10,
"query": {
"bool": {
"should": [{
"multi_match": {"query": ${query}, "type": "most_fields", "fields": ["content"]}}],
"filter": [{"terms": {"year": ${years}}}]}}}""",
)
results = retriever.retrieve(query="test", filters={"years": ["2020", "2021"]})
assert len(results) == 4
# test custom "term" query
retriever = ElasticsearchRetriever(
document_store=document_store,
custom_query="""
{
"size": 10,
"query": {
"bool": {
"should": [{
"multi_match": {"query": ${query}, "type": "most_fields", "fields": ["content"]}}],
"filter": [{"term": {"year": ${years}}}]}}}""",
)
results = retriever.retrieve(query="test", filters={"years": "2021"})
assert len(results) == 3


@pytest.mark.slow
@pytest.mark.parametrize(
"document_store", ["elasticsearch", "faiss", "memory", "milvus1", "milvus", "weaviate"], indirect=True
)
@pytest.mark.parametrize("retriever", ["dpr"], indirect=True)
def test_dpr_embedding(document_store, retriever, docs):
document_store.return_embedding = True
document_store.write_documents(docs)
document_store.update_embeddings(retriever=retriever)
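# brief pause so stores with near-real-time indexing (e.g. Elasticsearch) can
# refresh before we read the embeddings back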
time.sleep(1)
# always normalize vector as faiss returns normalized vectors and other document stores do not
doc_1 = document_store.get_document_by_id("1").embedding
doc_1 /= np.linalg.norm(doc_1)
assert len(doc_1) == 768
assert abs(doc_1[0] - (-0.0250)) < 0.001
doc_2 = document_store.get_document_by_id("2").embedding
doc_2 /= np.linalg.norm(doc_2)
assert abs(doc_2[0] - (-0.0314)) < 0.001
doc_3 = document_store.get_document_by_id("3").embedding
doc_3 /= np.linalg.norm(doc_3)
assert abs(doc_3[0] - (-0.0200)) < 0.001
doc_4 = document_store.get_document_by_id("4").embedding
doc_4 /= np.linalg.norm(doc_4)
assert abs(doc_4[0] - (-0.0070)) < 0.001
doc_5 = document_store.get_document_by_id("5").embedding
doc_5 /= np.linalg.norm(doc_5)
assert abs(doc_5[0] - (-0.0049)) < 0.001


@pytest.mark.slow
@pytest.mark.parametrize(
"document_store", ["elasticsearch", "faiss", "memory", "milvus1", "milvus", "weaviate"], indirect=True
)
@pytest.mark.parametrize("retriever", ["retribert"], indirect=True)
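# `embedding_dim` is a custom marker read by the document_store fixture to size the
# embedding field; RetriBERT produces 128-dimensional embeddings.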
@pytest.mark.embedding_dim(128)
def test_retribert_embedding(document_store, retriever, docs):
if isinstance(document_store, WeaviateDocumentStore):
# Weaviate sets the embedding dimension to 768 as soon as it is initialized.
# We need 128 here and therefore initialize a new WeaviateDocumentStore.
document_store = WeaviateDocumentStore(
weaviate_url="http://localhost:8080", index="haystack_test", embedding_dim=128
)
document_store.weaviate_client.schema.delete_all()
document_store._create_schema_and_index_if_not_exist()
document_store.return_embedding = True
document_store.write_documents(docs)
document_store.update_embeddings(retriever=retriever)
time.sleep(1)
assert len(document_store.get_document_by_id("1").embedding) == 128
assert abs(document_store.get_document_by_id("1").embedding[0]) < 0.6
assert abs(document_store.get_document_by_id("2").embedding[0]) < 0.03
assert abs(document_store.get_document_by_id("3").embedding[0]) < 0.095
assert abs(document_store.get_document_by_id("4").embedding[0]) < 0.3
assert abs(document_store.get_document_by_id("5").embedding[0]) < 0.32


@pytest.mark.slow
@pytest.mark.parametrize("retriever", ["table_text_retriever"], indirect=True)
@pytest.mark.parametrize("document_store", ["elasticsearch"], indirect=True)
@pytest.mark.embedding_dim(512)
def test_table_text_retriever_embedding(document_store, retriever, docs):
document_store.return_embedding = True
document_store.write_documents(docs)
table_data = {
"Mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
"Height": ["8848m", "8,611 m", "8 586m", "8 516 m", "8,485m"],
}
table = pd.DataFrame(table_data)
table_doc = Document(content=table, content_type="table", id="6")
document_store.write_documents([table_doc])
document_store.update_embeddings(retriever=retriever)
time.sleep(1)
doc_1 = document_store.get_document_by_id("1")
assert len(doc_1.embedding) == 512
assert abs(doc_1.embedding[0] - (0.0593)) < 0.001
doc_2 = document_store.get_document_by_id("2")
assert abs(doc_2.embedding[0] - (0.9031)) < 0.001
doc_3 = document_store.get_document_by_id("3")
assert abs(doc_3.embedding[0] - (0.1366)) < 0.001
doc_4 = document_store.get_document_by_id("4")
assert abs(doc_4.embedding[0] - (0.0575)) < 0.001
doc_5 = document_store.get_document_by_id("5")
assert abs(doc_5.embedding[0] - (0.1486)) < 0.001
doc_6 = document_store.get_document_by_id("6")
assert len(doc_6.embedding) == 512
assert abs(doc_6.embedding[0] - (0.2745)) < 0.001


@pytest.mark.parametrize("retriever", ["dpr"], indirect=True)
@pytest.mark.parametrize("document_store", ["memory"], indirect=True)
def test_dpr_saving_and_loading(tmp_path, retriever, document_store):
retriever.save(f"{tmp_path}/test_dpr_save")
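# Summing all weights gives a cheap fingerprint to check that the model survived
# the save/load round-trip without comparing every tensor.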
def sum_params(model):
s = []
for p in model.parameters():
n = p.cpu().data.numpy()
s.append(np.sum(n))
return sum(s)
original_sum_query = sum_params(retriever.query_encoder)
original_sum_passage = sum_params(retriever.passage_encoder)
del retriever
loaded_retriever = DensePassageRetriever.load(f"{tmp_path}/test_dpr_save", document_store)
loaded_sum_query = sum_params(loaded_retriever.query_encoder)
loaded_sum_passage = sum_params(loaded_retriever.passage_encoder)
assert abs(original_sum_query - loaded_sum_query) < 0.1
assert abs(original_sum_passage - loaded_sum_passage) < 0.1
# comparison of weights (RAM-intensive!)
# for p1, p2 in zip(retriever.query_encoder.parameters(), loaded_retriever.query_encoder.parameters()):
# assert (p1.data.ne(p2.data).sum() == 0)
#
# for p1, p2 in zip(retriever.passage_encoder.parameters(), loaded_retriever.passage_encoder.parameters()):
# assert (p1.data.ne(p2.data).sum() == 0)
# attributes
assert loaded_retriever.processor.embed_title == True
assert loaded_retriever.batch_size == 16
assert loaded_retriever.processor.max_seq_len_passage == 256
assert loaded_retriever.processor.max_seq_len_query == 64
# Tokenizer
assert isinstance(loaded_retriever.passage_tokenizer, DPRContextEncoderTokenizerFast)
assert isinstance(loaded_retriever.query_tokenizer, DPRQuestionEncoderTokenizerFast)
assert loaded_retriever.passage_tokenizer.do_lower_case == True
assert loaded_retriever.query_tokenizer.do_lower_case == True
assert loaded_retriever.passage_tokenizer.vocab_size == 30522
assert loaded_retriever.query_tokenizer.vocab_size == 30522
assert loaded_retriever.passage_tokenizer.model_max_length == 512
assert loaded_retriever.query_tokenizer.model_max_length == 512


@pytest.mark.parametrize("retriever", ["table_text_retriever"], indirect=True)
@pytest.mark.embedding_dim(512)
def test_table_text_retriever_saving_and_loading(tmp_path, retriever, document_store):
retriever.save(f"{tmp_path}/test_table_text_retriever_save")
def sum_params(model):
s = []
for p in model.parameters():
n = p.cpu().data.numpy()
s.append(np.sum(n))
return sum(s)
original_sum_query = sum_params(retriever.query_encoder)
original_sum_passage = sum_params(retriever.passage_encoder)
original_sum_table = sum_params(retriever.table_encoder)
del retriever
loaded_retriever = TableTextRetriever.load(f"{tmp_path}/test_table_text_retriever_save", document_store)
loaded_sum_query = sum_params(loaded_retriever.query_encoder)
loaded_sum_passage = sum_params(loaded_retriever.passage_encoder)
loaded_sum_table = sum_params(loaded_retriever.table_encoder)
assert abs(original_sum_query - loaded_sum_query) < 0.1
assert abs(original_sum_passage - loaded_sum_passage) < 0.1
assert abs(original_sum_table - loaded_sum_table) < 0.01
# attributes
assert loaded_retriever.processor.embed_meta_fields == ["name", "section_title", "caption"]
assert loaded_retriever.batch_size == 16
assert loaded_retriever.processor.max_seq_len_passage == 256
assert loaded_retriever.processor.max_seq_len_table == 256
assert loaded_retriever.processor.max_seq_len_query == 64
# Tokenizer
assert isinstance(loaded_retriever.passage_tokenizer, DPRContextEncoderTokenizerFast)
assert isinstance(loaded_retriever.table_tokenizer, DPRContextEncoderTokenizerFast)
assert isinstance(loaded_retriever.query_tokenizer, DPRQuestionEncoderTokenizerFast)
assert loaded_retriever.passage_tokenizer.do_lower_case == True
assert loaded_retriever.table_tokenizer.do_lower_case == True
assert loaded_retriever.query_tokenizer.do_lower_case == True
assert loaded_retriever.passage_tokenizer.vocab_size == 30522
assert loaded_retriever.table_tokenizer.vocab_size == 30522
assert loaded_retriever.query_tokenizer.vocab_size == 30522
assert loaded_retriever.passage_tokenizer.model_max_length == 512
assert loaded_retriever.table_tokenizer.model_max_length == 512
assert loaded_retriever.query_tokenizer.model_max_length == 512


@pytest.mark.embedding_dim(128)
def test_table_text_retriever_training(document_store):
retriever = TableTextRetriever(
document_store=document_store,
query_embedding_model="prajjwal1/bert-tiny",
passage_embedding_model="prajjwal1/bert-tiny",
table_embedding_model="prajjwal1/bert-tiny",
use_gpu=False,
)
retriever.train(
data_dir=SAMPLES_PATH / "mmr",
train_filename="sample.json",
n_epochs=1,
n_gpu=0,
save_dir="test_table_text_retriever_train",
)
# Load trained model
retriever = TableTextRetriever.load(load_dir="test_table_text_retriever_train", document_store=document_store)


@pytest.mark.elasticsearch
def test_elasticsearch_highlight():
client = Elasticsearch()
client.indices.delete(index="haystack_hl_test", ignore=[404])
# Map both the content and title fields as "text" so that search runs on both fields.
document_store = ElasticsearchDocumentStore(
index="haystack_hl_test",
content_field="title",
custom_mapping={"mappings": {"properties": {"content": {"type": "text"}, "title": {"type": "text"}}}},
)
documents = [
{
"title": "Green tea components",
"meta": {
"content": "The green tea plant contains a range of healthy compounds that make it into the final drink"
},
"id": "1",
},
{
"title": "Green tea catechin",
"meta": {"content": "Green tea contains a catechin called epigallocatechin-3-gallate (EGCG)."},
"id": "2",
},
{
"title": "Minerals in Green tea",
"meta": {"content": "Green tea also has small amounts of minerals that can benefit your health."},
"id": "3",
},
{
"title": "Green tea Benefits",
"meta": {"content": "Green tea does more than just keep you alert, it may also help boost brain function."},
"id": "4",
},
]
document_store.write_documents(documents)
# Enable highlighting on both the "title" and "content" fields using a custom query
retriever_1 = ElasticsearchRetriever(
document_store=document_store,
custom_query="""{
"size": 20,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": ${query},
"fields": [
"content^3",
"title^5"
]
}
}
]
}
},
"highlight": {
"pre_tags": [
"**"
],
"post_tags": [
"**"
],
"number_of_fragments": 3,
"fragment_size": 5,
"fields": {
"content": {},
"title": {}
}
}
}""",
)
results = retriever_1.retrieve(query="is green tea healthy")
assert len(results[0].meta["highlighted"]) == 2
assert results[0].meta["highlighted"]["title"] == ["**Green**", "**tea** components"]
assert results[0].meta["highlighted"]["content"] == ["The **green**", "**tea** plant", "range of **healthy**"]
# Enable highlighting on the "title" field only using a custom query
retriever_2 = ElasticsearchRetriever(
document_store=document_store,
custom_query="""{
"size": 20,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": ${query},
"fields": [
"content^3",
"title^5"
]
}
}
]
}
},
"highlight": {
"pre_tags": [
"**"
],
"post_tags": [
"**"
],
"number_of_fragments": 3,
"fragment_size": 5,
"fields": {
"title": {}
}
}
}""",
)
results = retriever_2.retrieve(query="is green tea healthy")
assert len(results[0].meta["highlighted"]) == 1
assert results[0].meta["highlighted"]["title"] == ["**Green**", "**tea** components"]