claude-task-master/.taskmaster/docs/tdd-workflow-phase-1-orchestrator.md
Ralph Khreish ccb87a516a
feat: implement tdd workflow (#1309)
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-18 16:29:03 +02:00

Phase 1: Core Rails - State Machine & Orchestration

Objective

Build the WorkflowOrchestrator as a state machine that guides AI sessions through the TDD workflow rather than executing code directly.

Architecture Overview

Execution Model

The orchestrator acts as a state manager and guide, not a code executor:

┌─────────────────────────────────────────────────────────────┐
│                    Claude Code (MCP Client)                  │
│  - Queries "what to do next"                                │
│  - Executes work (writes tests, code, runs commands)        │
│  - Reports completion                                        │
└────────────────┬────────────────────────────────────────────┘
                 │ MCP Protocol
                 ▼
┌─────────────────────────────────────────────────────────────┐
│              WorkflowOrchestrator (tm-core)                  │
│  - Maintains state machine (RED → GREEN → COMMIT)           │
│  - Returns work units with context                          │
│  - Validates preconditions                                  │
│  - Records progress                                         │
│  - Persists state for resumability                          │
└─────────────────────────────────────────────────────────────┘
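The query, execute, report cycle in the diagram can be sketched as a small client loop. This is an illustration under assumptions, not the planned implementation: the two orchestrator methods follow the API section, while `driveRun`, `Executor`, and `QueueOrchestrator` are hypothetical names introduced here.

```typescript
type Phase = 'RED' | 'GREEN' | 'COMMIT';

interface WorkUnit {
  id: string;
  phase: Phase;
  subtaskId: string;
  action: string;
}

interface WorkUnitResult {
  success: boolean;
  phase: Phase;
  error?: string;
}

// Only the two methods the loop needs; signatures follow the API section.
interface OrchestratorClient {
  getNextWorkUnit(runId: string): Promise<WorkUnit | null>;
  completeWorkUnit(runId: string, workUnitId: string, result: WorkUnitResult): Promise<void>;
}

// Hypothetical: whatever actually writes tests and code and runs commands.
type Executor = (unit: WorkUnit) => Promise<WorkUnitResult>;

// Query -> execute -> report, until no work is left or a unit fails.
// Returns the number of work units that completed successfully.
async function driveRun(
  client: OrchestratorClient,
  runId: string,
  execute: Executor
): Promise<number> {
  let completed = 0;
  for (;;) {
    const unit = await client.getNextWorkUnit(runId);
    if (unit === null) break;
    const result = await execute(unit);
    await client.completeWorkUnit(runId, unit.id, result);
    if (!result.success) break; // retries and intervention are left to the caller
    completed += 1;
  }
  return completed;
}

// Hypothetical in-memory stand-in that serves a fixed queue of work units.
class QueueOrchestrator implements OrchestratorClient {
  private readonly queue: WorkUnit[];
  readonly reported: string[] = [];

  constructor(units: WorkUnit[]) {
    this.queue = [...units];
  }

  async getNextWorkUnit(_runId: string): Promise<WorkUnit | null> {
    return this.queue[0] ?? null;
  }

  async completeWorkUnit(
    _runId: string,
    workUnitId: string,
    result: WorkUnitResult
  ): Promise<void> {
    this.reported.push(workUnitId);
    if (result.success) this.queue.shift();
  }
}
```

The loop never inspects phase-specific details; it only relays work units and results, which is what lets any executor (Claude Code, a human, another tool) sit on the client side.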

Why This Approach?

  1. Separation of Concerns: State management separate from code execution
  2. Leverage Existing Tools: Uses Claude Code's capabilities instead of reimplementing
  3. Human-in-the-Loop: Easy to inspect state and intervene at any phase
  4. Simpler Implementation: Orchestrator is pure logic, no AI model integration needed
  5. Flexible Executors: Any tool (Claude Code, human, other AI) can execute work units

Core Components

1. WorkflowOrchestrator Service

Location: packages/tm-core/src/services/workflow-orchestrator.service.ts

Responsibilities:

  • Track current phase (RED/GREEN/COMMIT) per subtask
  • Generate work units with context for each phase
  • Validate phase completion criteria
  • Advance state machine on successful completion
  • Handle errors and retry logic
  • Persist run state for resumability

API:

interface WorkflowOrchestrator {
  // Start a new autopilot run
  startRun(taskId: string, options?: RunOptions): Promise<RunContext>;

  // Get next work unit to execute
  getNextWorkUnit(runId: string): Promise<WorkUnit | null>;

  // Report work unit completion
  completeWorkUnit(
    runId: string,
    workUnitId: string,
    result: WorkUnitResult
  ): Promise<void>;

  // Get current run state
  getRunState(runId: string): Promise<RunState>;

  // Pause/resume
  pauseRun(runId: string): Promise<void>;
  resumeRun(runId: string): Promise<void>;
}

interface WorkUnit {
  id: string;                    // Unique work unit ID
  phase: 'RED' | 'GREEN' | 'COMMIT';
  subtaskId: string;             // e.g., "42.1"
  action: string;                // Human-readable description
  context: WorkUnitContext;      // All info needed to execute
  preconditions: Precondition[]; // Checks before execution
}

interface WorkUnitContext {
  taskId: string;
  taskTitle: string;
  subtaskTitle: string;
  subtaskDescription: string;
  dependencies: string[];        // Completed subtask IDs
  testCommand: string;           // e.g., "npm test"

  // Phase-specific context
  redPhase?: {
    testFile: string;            // Where to create test
    testFramework: string;       // e.g., "vitest"
    acceptanceCriteria: string[];
  };

  greenPhase?: {
    testFile: string;            // Test to make pass
    implementationHints: string[];
    expectedFiles: string[];     // Files likely to modify
  };

  commitPhase?: {
    commitMessage: string;       // Pre-generated message
    filesToCommit: string[];     // Files modified in RED+GREEN
  };
}

interface WorkUnitResult {
  success: boolean;
  phase: 'RED' | 'GREEN' | 'COMMIT';

  // RED phase results
  testsCreated?: string[];
  testsFailed?: number;

  // GREEN phase results
  testsPassed?: number;
  filesModified?: string[];
  attempts?: number;

  // COMMIT phase results
  commitSha?: string;

  // Common
  error?: string;
  logs?: string;
}

interface RunState {
  runId: string;
  taskId: string;
  status: 'running' | 'paused' | 'completed' | 'failed';
  currentPhase: 'RED' | 'GREEN' | 'COMMIT';
  currentSubtask: string;
  completedSubtasks: string[];
  failedSubtasks: string[];
  startTime: Date;
  lastUpdateTime: Date;

  // Resumability
  checkpoint: {
    subtaskId: string;
    phase: 'RED' | 'GREEN' | 'COMMIT';
    attemptNumber: number;
  };
}
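To make the shapes above concrete, here is a sketch of assembling a RED work unit. It assumes a few things the design does not state: the `Precondition` shape, the `SubtaskInput` type, and the `buildRedWorkUnit` and `workUnitId` helpers are hypothetical. The ID format mirrors the `workUnit-42.1-RED` identifier used in the usage example.

```typescript
type Phase = 'RED' | 'GREEN' | 'COMMIT';

// Precondition is referenced but not defined in the design; this shape is an assumption.
interface Precondition {
  description: string;
}

// Trimmed to the common fields plus the RED-specific context.
interface WorkUnitContext {
  taskId: string;
  taskTitle: string;
  subtaskTitle: string;
  subtaskDescription: string;
  dependencies: string[];
  testCommand: string;
  redPhase?: {
    testFile: string;
    testFramework: string;
    acceptanceCriteria: string[];
  };
}

interface WorkUnit {
  id: string;
  phase: Phase;
  subtaskId: string;
  action: string;
  context: WorkUnitContext;
  preconditions: Precondition[];
}

// Hypothetical input: the subtask fields the orchestrator would load from the task store.
interface SubtaskInput {
  taskId: string;
  taskTitle: string;
  subtaskId: string;
  title: string;
  description: string;
  completedDependencies: string[];
  acceptanceCriteria: string[];
}

// ID format copied from the usage example ("workUnit-42.1-RED").
function workUnitId(subtaskId: string, phase: Phase): string {
  return `workUnit-${subtaskId}-${phase}`;
}

function buildRedWorkUnit(
  subtask: SubtaskInput,
  testFile: string,
  testFramework: string,
  testCommand: string
): WorkUnit {
  return {
    id: workUnitId(subtask.subtaskId, 'RED'),
    phase: 'RED',
    subtaskId: subtask.subtaskId,
    action: `Generate failing tests for ${subtask.title}`,
    context: {
      taskId: subtask.taskId,
      taskTitle: subtask.taskTitle,
      subtaskTitle: subtask.title,
      subtaskDescription: subtask.description,
      dependencies: subtask.completedDependencies,
      testCommand,
      redPhase: {
        testFile,
        testFramework,
        acceptanceCriteria: subtask.acceptanceCriteria,
      },
    },
    preconditions: [{ description: 'No uncommitted changes' }],
  };
}
```

GREEN and COMMIT units would follow the same pattern, filling `greenPhase` or `commitPhase` instead of `redPhase`.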

2. State Machine Logic

Phase Transitions:

START → RED(subtask 1) → GREEN(subtask 1) → COMMIT(subtask 1)
                                               ↓
                        RED(subtask 2) ← ─ ─ ─ ┘
                             ↓
                        GREEN(subtask 2)
                             ↓
                        COMMIT(subtask 2)
                             ↓
                           (repeat for remaining subtasks)
                             ↓
                        FINALIZE → END

Phase Rules:

  • RED: Can transition to GREEN only if tests have been created and are failing
  • GREEN: Can transition to COMMIT only if all tests are passing (attempt < maxAttempts)
  • COMMIT: Can transition to the next RED only if the commit succeeded
  • FINALIZE: Can start only after all subtasks are completed
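The phase rules can be expressed as a pure transition function. The sketch below is one possible reading, not the planned implementation: `Position`, `Report`, `Transition`, and `advance` are hypothetical names, and the attempt check follows the literal `attempt < maxAttempts` wording, whose exact bound is an assumption worth confirming.

```typescript
type Phase = 'RED' | 'GREEN' | 'COMMIT';
type Step = Phase | 'FINALIZE';

interface Position {
  subtaskIndex: number; // zero-based index into the ordered subtask list
  step: Step;
}

// Subset of WorkUnitResult that the rules inspect.
interface Report {
  success: boolean;
  testsCreated?: string[];
  testsFailed?: number;
  testsPassed?: number;
  attempts?: number;
  commitSha?: string;
}

type Transition =
  | { ok: true; next: Position | 'END' }
  | { ok: false; reason: string };

function advance(
  pos: Position,
  report: Report,
  subtaskCount: number,
  maxAttempts: number
): Transition {
  switch (pos.step) {
    case 'RED': {
      const created = (report.testsCreated ?? []).length > 0;
      const failing = (report.testsFailed ?? 0) > 0;
      if (!created || !failing) {
        return { ok: false, reason: 'RED requires created, failing tests' };
      }
      return { ok: true, next: { subtaskIndex: pos.subtaskIndex, step: 'GREEN' } };
    }
    case 'GREEN': {
      const passing = report.success && (report.testsPassed ?? 0) > 0;
      // Literal reading of "attempt < maxAttempts"; a missing count is treated as 1.
      const withinBudget = (report.attempts ?? 1) < maxAttempts;
      if (!passing || !withinBudget) {
        return { ok: false, reason: 'GREEN requires passing tests within the attempt budget' };
      }
      return { ok: true, next: { subtaskIndex: pos.subtaskIndex, step: 'COMMIT' } };
    }
    case 'COMMIT': {
      if (!report.success || !report.commitSha) {
        return { ok: false, reason: 'COMMIT requires a successful commit' };
      }
      const isLast = pos.subtaskIndex + 1 >= subtaskCount;
      const next: Position = isLast
        ? { subtaskIndex: pos.subtaskIndex, step: 'FINALIZE' }
        : { subtaskIndex: pos.subtaskIndex + 1, step: 'RED' };
      return { ok: true, next };
    }
    case 'FINALIZE':
      return { ok: true, next: 'END' };
  }
}
```

Keeping the function free of I/O means the state machine can be unit-tested with plain objects, and a rejected transition carries a reason that can be written to the run log.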

Preconditions:

  • RED: No uncommitted changes (except changes staged by a previous GREEN attempt that failed)
  • GREEN: RED phase complete, tests exist and are failing
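  • COMMIT: GREEN phase complete, all tests passing, coverage meets threshold (coverage enforcement is listed as out of scope for Phase 1, so this check may be advisory until Phase 2)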
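A precondition check along these lines could be a pure function over a snapshot of the workspace. The sketch below is an assumption-heavy illustration: the `Snapshot` type and `unmetPreconditions` function are hypothetical, and the coverage check only runs when a threshold is supplied, since the design lists coverage enforcement as out of scope for Phase 1.

```typescript
type Phase = 'RED' | 'GREEN' | 'COMMIT';

// Hypothetical snapshot assembled from adapter output and run state.
interface Snapshot {
  uncommittedChanges: boolean;
  changesFromFailedGreen: boolean; // leftover work from a GREEN attempt that failed
  redComplete: boolean;
  greenComplete: boolean;
  testsExist: boolean;
  failingTests: number;
  coveragePct?: number; // undefined when no coverage data was reported
}

// Returns a description of every precondition that is not met (empty = good to go).
function unmetPreconditions(
  phase: Phase,
  s: Snapshot,
  coverageThreshold?: number
): string[] {
  const unmet: string[] = [];
  switch (phase) {
    case 'RED':
      if (s.uncommittedChanges && !s.changesFromFailedGreen) {
        unmet.push('Working tree has uncommitted changes');
      }
      break;
    case 'GREEN':
      if (!s.redComplete) unmet.push('RED phase is not complete');
      if (!s.testsExist) unmet.push('No tests exist');
      if (s.failingTests === 0) unmet.push('Tests are not failing');
      break;
    case 'COMMIT':
      if (!s.greenComplete) unmet.push('GREEN phase is not complete');
      if (s.failingTests > 0) unmet.push('Tests are still failing');
      // Only checked when the caller passes a threshold.
      if (
        coverageThreshold !== undefined &&
        (s.coveragePct === undefined || s.coveragePct < coverageThreshold)
      ) {
        unmet.push('Coverage is below the threshold');
      }
      break;
  }
  return unmet;
}
```

Returning the full list of unmet conditions, instead of stopping at the first, gives the executor or a human everything needed to fix the workspace in one pass.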

3. MCP Integration

New MCP Tools (expose WorkflowOrchestrator via MCP):

// Start an autopilot run
mcp__task_master_ai__autopilot_start(taskId: string, dryRun?: boolean)

// Get next work unit
mcp__task_master_ai__autopilot_next_work_unit(runId: string)

// Complete current work unit
mcp__task_master_ai__autopilot_complete_work_unit(
  runId: string,
  workUnitId: string,
  result: WorkUnitResult
)

// Get run state
mcp__task_master_ai__autopilot_get_state(runId: string)

// Pause/resume
mcp__task_master_ai__autopilot_pause(runId: string)
mcp__task_master_ai__autopilot_resume(runId: string)
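The mapping from tool names to orchestrator methods can be sketched without committing to any MCP server library. Everything below other than the tool names and method names is an assumption: `OrchestratorLike`, `callTool`, and `requireString` are hypothetical, return types are reduced to `unknown` because `RunContext` and `RunOptions` are not spelled out here, and the `dryRun` flag is omitted.

```typescript
// Reduced view of WorkflowOrchestrator used only for this illustration.
interface OrchestratorLike {
  startRun(taskId: string): Promise<unknown>;
  getNextWorkUnit(runId: string): Promise<unknown>;
  completeWorkUnit(runId: string, workUnitId: string, result: unknown): Promise<void>;
  getRunState(runId: string): Promise<unknown>;
  pauseRun(runId: string): Promise<void>;
  resumeRun(runId: string): Promise<void>;
}

type ToolArgs = Record<string, unknown>;
type ToolHandler = (o: OrchestratorLike, args: ToolArgs) => Promise<unknown>;

// Tool arguments arrive as untyped JSON, so validate before forwarding.
function requireString(args: ToolArgs, key: string): string {
  const value = args[key];
  if (typeof value !== 'string' || value.length === 0) {
    throw new Error(`Missing string argument: ${key}`);
  }
  return value;
}

const PREFIX = 'mcp__task_master_ai__autopilot_';

const handlers = new Map<string, ToolHandler>([
  [`${PREFIX}start`, (o, a) => o.startRun(requireString(a, 'taskId'))],
  [`${PREFIX}next_work_unit`, (o, a) => o.getNextWorkUnit(requireString(a, 'runId'))],
  [
    `${PREFIX}complete_work_unit`,
    (o, a) =>
      o.completeWorkUnit(requireString(a, 'runId'), requireString(a, 'workUnitId'), a['result']),
  ],
  [`${PREFIX}get_state`, (o, a) => o.getRunState(requireString(a, 'runId'))],
  [`${PREFIX}pause`, (o, a) => o.pauseRun(requireString(a, 'runId'))],
  [`${PREFIX}resume`, (o, a) => o.resumeRun(requireString(a, 'runId'))],
]);

// Looks up the handler for a tool name and forwards the validated arguments.
async function callTool(
  orchestrator: OrchestratorLike,
  name: string,
  args: ToolArgs
): Promise<unknown> {
  const handler = handlers.get(name);
  if (handler === undefined) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(orchestrator, args);
}
```

Each handler is a thin pass-through, which keeps all workflow logic inside the orchestrator and leaves the MCP layer responsible only for naming and argument validation.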

4. Git/Test Adapters

GitAdapter (packages/tm-core/src/services/git-adapter.service.ts):

  • Check working tree status
  • Validate branch state
  • Read git config (user, remote, default branch)
  • Does NOT execute git commands (that is the executor's job)
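One way to check working tree status without running git is to have the executor supply the text of `git status --porcelain` and parse it. The sketch below assumes that arrangement, which the design does not spell out; `summarizePorcelain` and `WorkingTreeSummary` are hypothetical names. It handles the v1 short format (two status characters, a space, then the path) and does not unquote paths containing special characters.

```typescript
interface WorkingTreeSummary {
  staged: string[];    // changes recorded in the index
  unstaged: string[];  // changes in the working tree only
  untracked: string[]; // files git does not track yet
  clean: boolean;
}

// Parses `git status --porcelain` (v1) output that was captured by the executor.
function summarizePorcelain(output: string): WorkingTreeSummary {
  const staged: string[] = [];
  const unstaged: string[] = [];
  const untracked: string[] = [];

  for (const line of output.split(/\r?\n/)) {
    // Shortest valid entry is "XY p": two status chars, a space, one path char.
    if (line.length < 4) continue;
    const indexStatus = line.charAt(0);
    const treeStatus = line.charAt(1);

    let path = line.slice(3);
    const arrow = path.indexOf(' -> ');
    if (arrow !== -1) path = path.slice(arrow + 4); // renames appear as "old -> new"

    if (indexStatus === '?' && treeStatus === '?') {
      untracked.push(path);
      continue;
    }
    if (indexStatus === '!' && treeStatus === '!') continue; // ignored files

    if (indexStatus !== ' ') staged.push(path);
    if (treeStatus !== ' ') unstaged.push(path);
  }

  const clean = staged.length === 0 && unstaged.length === 0 && untracked.length === 0;
  return { staged, unstaged, untracked, clean };
}
```

Lines are deliberately not trimmed, because a leading space in the first column is meaningful: it marks a file that is modified in the working tree but not staged.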

TestAdapter (packages/tm-core/src/services/test-adapter.service.ts):

  • Detect test framework from package.json
  • Parse test output (failures, passes, coverage)
  • Validate coverage thresholds
  • Does NOT run tests (that is the executor's job)
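Framework detection from package.json might look like the sketch below. The list of recognized frameworks and the fallback to the `test` script are assumptions made for illustration, and `detectTestFramework` is a hypothetical name; the caller is expected to pass in already-parsed package.json content.

```typescript
// The subset of package.json fields this sketch inspects.
interface PackageJsonLike {
  scripts?: Record<string, string>;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Assumed candidate list, checked in this order.
const KNOWN_FRAMEWORKS = ['vitest', 'jest', 'mocha'] as const;
type Framework = (typeof KNOWN_FRAMEWORKS)[number];

function detectTestFramework(pkg: PackageJsonLike): Framework | null {
  // Prefer an explicit dependency entry.
  const deps: Record<string, string> = {
    ...pkg.dependencies,
    ...pkg.devDependencies,
  };
  for (const name of KNOWN_FRAMEWORKS) {
    if (Object.prototype.hasOwnProperty.call(deps, name)) return name;
  }

  // Fall back to the command named in the "test" script, if any.
  const words = (pkg.scripts?.test ?? '').split(/\s+/);
  for (const name of KNOWN_FRAMEWORKS) {
    if (words.includes(name)) return name;
  }

  return null;
}
```

Returning `null` instead of guessing lets the orchestrator surface an explicit configuration prompt when no framework can be identified.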

5. Run State Persistence

Storage Location: .taskmaster/reports/runs/<runId>/

Files:

  • state.json - Current run state (for resumability)
  • log.jsonl - Event stream (timestamped work unit completions)
  • manifest.json - Run metadata
  • work-units.json - All work units generated for this run

Example state.json:

{
  "runId": "run-2025-01-15-142033",
  "taskId": "42",
  "status": "paused",
  "currentPhase": "GREEN",
  "currentSubtask": "42.2",
  "completedSubtasks": ["42.1"],
  "failedSubtasks": [],
  "checkpoint": {
    "subtaskId": "42.2",
    "phase": "GREEN",
    "attemptNumber": 2
  },
  "startTime": "2025-01-15T14:20:33Z",
  "lastUpdateTime": "2025-01-15T14:35:12Z"
}
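Because RunState holds Date values while state.json stores ISO strings, loading a checkpoint needs a small conversion step. The following is a minimal sketch under assumptions: `serializeRunState`, `parseRunState`, `toLogLine`, and the `RunEvent` shape for log.jsonl are hypothetical, and reading or writing the files themselves is left to the caller.

```typescript
type Phase = 'RED' | 'GREEN' | 'COMMIT';

interface RunState {
  runId: string;
  taskId: string;
  status: 'running' | 'paused' | 'completed' | 'failed';
  currentPhase: Phase;
  currentSubtask: string;
  completedSubtasks: string[];
  failedSubtasks: string[];
  startTime: Date;
  lastUpdateTime: Date;
  checkpoint: {
    subtaskId: string;
    phase: Phase;
    attemptNumber: number;
  };
}

// Date fields become ISO 8601 strings via Date.prototype.toJSON.
function serializeRunState(state: RunState): string {
  return JSON.stringify(state, null, 2);
}

// Revives the two timestamp fields back into Date objects.
// No schema validation is performed here; a real loader would likely add it.
function parseRunState(json: string): RunState {
  const dateKeys = ['startTime', 'lastUpdateTime'];
  return JSON.parse(json, (key: string, value: unknown) =>
    dateKeys.includes(key) && typeof value === 'string' ? new Date(value) : value
  ) as RunState;
}

// Hypothetical event shape for log.jsonl: one JSON object per line.
interface RunEvent {
  type: string;
  timestamp: Date;
  workUnitId?: string;
}

function toLogLine(event: RunEvent): string {
  return JSON.stringify(event) + '\n';
}
```

Appending one line per event keeps log.jsonl safe to write incrementally, while state.json can be overwritten as a whole after each completed work unit.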

Implementation Plan

Step 1: WorkflowOrchestrator Skeleton

  • Create workflow-orchestrator.service.ts with interfaces
  • Implement state machine logic (phase transitions)
  • Add run state persistence (state.json, log.jsonl)
  • Write unit tests for state machine

Step 2: Work Unit Generation

  • Implement getNextWorkUnit() with context assembly
  • Generate RED phase work units (test file paths, criteria)
  • Generate GREEN phase work units (implementation hints)
  • Generate COMMIT phase work units (commit messages)

Step 3: Git/Test Adapters

  • Create GitAdapter for status checks only
  • Create TestAdapter for output parsing only
  • Add precondition validation using adapters
  • Write adapter unit tests

Step 4: MCP Integration

  • Add MCP tool definitions in packages/mcp-server/src/tools/
  • Wire up WorkflowOrchestrator to MCP tools
  • Test MCP tools via Claude Code
  • Document MCP workflow in CLAUDE.md

Step 5: CLI Integration

  • Update autopilot.command.ts to call WorkflowOrchestrator
  • Add --interactive mode that shows work units and waits for completion
  • Add --resume flag to continue paused runs
  • Test end-to-end flow

Step 6: Integration Testing

  • Create test task with 2-3 subtasks
  • Run autopilot start → get work unit → complete → repeat
  • Verify state persistence and resumability
  • Test failure scenarios (test failures, git issues)

Success Criteria

  • WorkflowOrchestrator can generate work units for all phases
  • MCP tools allow Claude Code to query and complete work units
  • State persists correctly between work unit completions
  • Run can be paused and resumed from checkpoint
  • Adapters validate preconditions without executing commands
  • End-to-end: Claude Code can complete a simple task via work units

Out of Scope (Phase 1)

  • Actual git operations (branch creation, commits) - executor handles this
  • Actual test execution - executor handles this
  • PR creation - deferred to Phase 2
  • TUI interface - deferred to Phase 3
  • Coverage enforcement - deferred to Phase 2

Example Usage Flow

# Terminal 1: Claude Code session
$ claude

# In Claude Code (via MCP):
> Start autopilot for task 42
[Calls mcp__task_master_ai__autopilot_start("42")]
→ Run started: run-2025-01-15-142033

> Get next work unit
[Calls mcp__task_master_ai__autopilot_next_work_unit(run-2025-01-15-142033)]
→ Work unit: RED phase for subtask 42.1
→ Action: Generate failing tests for metrics schema
→ Test file: src/__tests__/schema.test.js
→ Framework: vitest

> [Claude Code creates test file, runs tests]

> Complete work unit
[Calls mcp__task_master_ai__autopilot_complete_work_unit(
  run-2025-01-15-142033,
  workUnit-42.1-RED,
  { success: true, phase: 'RED', testsCreated: ['src/__tests__/schema.test.js'], testsFailed: 3 }
)]
→ Work unit completed. State saved.

> Get next work unit
[Calls mcp__task_master_ai__autopilot_next_work_unit(run-2025-01-15-142033)]
→ Work unit: GREEN phase for subtask 42.1
→ Action: Implement code to pass failing tests
→ Test file: src/__tests__/schema.test.js
→ Expected implementation: src/schema.js

> [Claude Code implements schema.js, runs tests, confirms all pass]

> Complete work unit
[...]
→ Work unit completed. Ready for COMMIT.

> Get next work unit
[...]
→ Work unit: COMMIT phase for subtask 42.1
→ Commit message: "feat(metrics): add metrics schema (task 42.1)"
→ Files to commit: src/__tests__/schema.test.js, src/schema.js

> [Claude Code stages files and commits]

> Complete work unit
[...]
→ Subtask 42.1 complete! Moving to 42.2...

Dependencies

  • Existing TaskService (task loading, status updates)
  • Existing PreflightChecker (environment validation)
  • Existing TaskLoaderService (dependency ordering)
  • MCP server infrastructure

Estimated Effort

7-10 days

Next Phase

Phase 2 will add:

  • PR creation via gh CLI
  • Coverage enforcement
  • Enhanced error recovery
  • Full resumability testing