haystack/test/conftest.py

# SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai>
#
# SPDX-License-Identifier: Apache-2.0
import asyncio
import time
from pathlib import Path
from typing import Dict, Generator
from unittest.mock import Mock
import pytest
from haystack import component, tracing
from haystack.core.pipeline.breakpoint import load_pipeline_snapshot
from haystack.testing.test_utils import set_all_seeds
from test.tracing.utils import SpyingTracer
set_all_seeds(0)
# Tracing is disabled by default to avoid failures in CI
tracing.disable_tracing()
@pytest.fixture()
def waiting_component():
    @component
    class Waiter:
        @component.output_types(waited_for=int)
        def run(self, wait_for: int) -> Dict[str, int]:
            time.sleep(wait_for)
            return {"waited_for": wait_for}

        @component.output_types(waited_for=int)
        async def run_async(self, wait_for: int) -> Dict[str, int]:
            await asyncio.sleep(wait_for)
            return {"waited_for": wait_for}

    return Waiter

@pytest.fixture()
def mock_tokenizer():
    """
    Tokenizes the string by splitting on spaces.
    """
    tokenizer = Mock()
    tokenizer.encode = lambda text: text.split()
    tokenizer.decode = lambda tokens: " ".join(tokens)
    return tokenizer

@pytest.fixture()
def test_files_path():
    return Path(__file__).parent / "test_files"

@pytest.fixture(autouse=True)
def request_blocker(request: pytest.FixtureRequest, monkeypatch):
    """
    This fixture is applied automatically to all tests.
    Those that are not marked as integration will have the requests module
    monkeypatched to avoid making HTTP requests by mistake.
    """
    marker = request.node.get_closest_marker("integration")
    if marker is not None:
        return

    def urlopen_mock(self, method, url, *args, **kwargs):
        raise RuntimeError(f"The test was about to {method} {self.scheme}://{self.host}{url}")

    monkeypatch.setattr("urllib3.connectionpool.HTTPConnectionPool.urlopen", urlopen_mock)

@pytest.fixture()
def spying_tracer() -> Generator[SpyingTracer, None, None]:
    tracer = SpyingTracer()
    tracing.enable_tracing(tracer)
    tracer.is_content_tracing_enabled = True

    yield tracer

    # Make sure to disable tracing after the test to avoid affecting other tests
    tracing.disable_tracing()

def load_and_resume_pipeline_snapshot(pipeline, output_directory: Path, component_name: str, data: Dict = None) -> Dict:
    """
    Utility function to load and resume a pipeline snapshot from a breakpoint file.

    :param pipeline: The pipeline instance to resume
    :param output_directory: Directory containing the breakpoint files
    :param component_name: Component name to look for in breakpoint files
    :param data: Data to pass to the pipeline run (defaults to empty dict)

    :returns:
        Dict containing the pipeline run results

    :raises:
        ValueError: If no breakpoint file is found for the given component
    """
    data = data or {}

    # Resume from the first file whose name starts with the component name.
    for full_path in output_directory.glob("*"):
        if full_path.name.startswith(component_name):
            pipeline_snapshot = load_pipeline_snapshot(full_path)
            return pipeline.run(data=data, pipeline_snapshot=pipeline_snapshot)

    # The loop returns on the first match, so reaching this point means nothing matched.
    msg = f"No files found for {component_name} in {output_directory}."
    raise ValueError(msg)

@pytest.fixture()
def base64_image_string():
    return "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+ip1sAAAAASUVORK5CYII="