haystack/e2e/modeling/test_adaptive_qa_inference.py

import pytest

from haystack.modeling.infer import Inferencer


@pytest.mark.parametrize("multiprocessing_chunksize", [None, 2])
def test_qa_format_and_results(multiprocessing_chunksize):
    qa_inputs_dicts = [
        {
            "questions": ["In what country is Normandy"],
            "text": "The Normans are an ethnic group that arose in Normandy, a northern region "
            "of France, from contact between Viking settlers and indigenous Franks and Gallo-Romans",
        },
        {
            "questions": ["Who counted the game among the best ever made?"],
            "text": "Twilight Princess was released to universal critical acclaim and commercial success. It received "
            "perfect scores from major publications such as 1UP.com, Computer and Video Games, Electronic "
            "Gaming Monthly, Game Informer, GamesRadar, and GameSpy. On the review aggregators GameRankings "
            "and Metacritic, Twilight Princess has average scores of 95% and 95 for the Wii version and scores "
            "of 95% and 96 for the GameCube version. GameTrailers in their review called it one of the "
            "greatest games ever created.",
        },
    ]
    ground_truths = ["France", "GameTrailers"]

    adaptive_model_qa = Inferencer.load(
        "deepset/bert-medium-squad2-distilled", task_type="question_answering", batch_size=16, gpu=False
    )
    results = adaptive_model_qa.inference_from_dicts(
        dicts=qa_inputs_dicts, multiprocessing_chunksize=multiprocessing_chunksize
    )

    # Sample results (illustrative values only; actual scores, offsets, and context depend on the model and inputs)
    # [
    #     {
    #         "task": "qa",
    #         "predictions": [
    #             {
    #                 "question": "In what country is Normandy",
    #                 "question_id": "None",
    #                 "ground_truth": None,
    #                 "answers": [
    #                     {
    #                         "score": 1.1272038221359253,
    #                         "probability": -1,
    #                         "answer": "France",
    #                         "offset_answer_start": 54,
    #                         "offset_answer_end": 60,
    #                         "context": "The Normans gave their name to Normandy, a region in France.",
    #                         "offset_context_start": 0,
    #                         "offset_context_end": 60,
    #                         "document_id": None,
    #                     }
    #                 ]
    #             }
    #         ],
    #     }
    # ]
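    # list() materializes the results in case they are returned lazily; the first entry holds the predictions
    predictions = list(results)[0]["predictions"]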

    for prediction, ground_truth, qa_input_dict in zip(predictions, ground_truths, qa_inputs_dicts):
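        # The prediction must echo the question that was asked
        assert prediction["question"] == qa_input_dict["questions"][0]
        # Only the first answer of each prediction is checked
        answer = prediction["answers"][0]
        # The answer must be a span of the returned context and match the ground truth exactly
        assert answer["answer"] in answer["context"]
        assert answer["answer"] == ground_truth
        # The answer must expose exactly these keys, no more and no fewer
        assert {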
            "answer",
            "score",
            "probability",
            "offset_answer_start",
            "offset_answer_end",
            "context",
            "offset_context_start",
            "offset_context_end",
            "document_id",
        } == answer.keys()