claude-task-master/.taskmaster/docs/tdd-workflow-phase-1-core-rails.md
Ralph Khreish ccb87a516a
feat: implement tdd workflow (#1309)
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-18 16:29:03 +02:00

36 KiB

Phase 1: Core Rails - Autonomous TDD Workflow

Objective

Implement the core autonomous TDD workflow with safe git operations, test generation/execution, and commit gating.

Scope

  • WorkflowOrchestrator with event stream
  • GitAdapter and TestResultValidator
  • Subtask loop (RED → GREEN → COMMIT)
  • CLI commands for AI agent orchestration
  • MCP tools for AI agent orchestration
  • Test result validation (AI reports, TaskMaster validates)
  • Commit creation with enhanced metadata
  • Branch/tag mapping
  • Global storage for state and activity logs
  • Framework-agnostic design (AI runs tests, not TaskMaster)
  • Run report persistence

Key Design Decisions

Global Storage (~/.taskmaster/)

  • Why: Keeps project directory clean, client-friendly, no tooling evidence in PRs
  • What: All runtime state, logs, and throwaway artifacts
  • Where: ~/.taskmaster/projects/<project-path>/runs/<run-id>/

Dual System: State + Activity Log

  • State (state.json): For orchestration, tells AI what to do next, mutable
  • Activity Log (activity.jsonl): For debugging/audit, append-only event stream
  • Separation: Optimizes for different use cases (fast reads vs. complete history)

Enhanced Commit Messages

  • Why: Enables future task-checker bot validation without external dependencies
  • What: Embeds task ID, phase, tag, test counts, coverage in commit body
  • Benefit: PR contains full context for review and automated validation

Worktree Support

  • Why: Enables parallel autonomous agents on different branches
  • How: Each worktree has independent global state directory
  • Isolation: No conflicts, complete separation

Framework-Agnostic Test Execution

  • AI runs tests: AI agent knows project context and test framework (npm test, pytest, go test)
  • TaskMaster validates: Only checks that RED fails and GREEN passes
  • No framework detection: TaskMaster doesn't need to know Jest vs Vitest vs pytest
  • Trust but verify: AI reports results, TaskMaster validates they make sense
  • Language agnostic: Works with any language/framework without TaskMaster changes

AI Agent Orchestration Model

  • Who executes: User's AI agent (Claude Code, Cursor, Windsurf, etc.) - not TaskMaster
  • TaskMaster's role: Workflow orchestration, validation, commit creation
  • AI agent's role: Code generation, test execution, result reporting
  • Communication: Via CLI commands or MCP tools
  • State-driven: AI agent reads state.json to know what to do next

Separation of Concerns:

TaskMaster Responsibilities AI Agent Responsibilities
Workflow state machine Generate tests
Validate phase transitions Run tests (knows test framework)
Create commits with metadata Implement code
Store activity logs Report test results
Manage git operations Understand project context
Track progress Choose appropriate test commands

Flow:

AI Agent                                    TaskMaster
   │                                            │
   ├──► tm autopilot start 1                    │
   │                                            ├──► Creates state, branch
   │                                            ├──► Returns: "next action: RED phase for 1.1"
   │                                            │
   ├──► tm autopilot next                       │
   │                                            ├──► Reads state.json
   │                                            ├──► Returns: { phase: "red", subtask: "1.1", context: {...} }
   │                                            │
   │    Generate tests (AI does this)           │
   │    npm test (AI runs this)                 │
   │    Results: 3 failed, 0 passed             │
   │                                            │
   ├──► tm autopilot complete red 1.1 \         │
   │    --results="failed:3,passed:0"           │
   │                                            ├──► Validates: tests failed ✓
   │                                            ├──► Updates state to GREEN
   │                                            ├──► Returns: "next action: GREEN phase"
   │                                            │
   ├──► tm autopilot next                       │
   │                                            ├──► Returns: { phase: "green", subtask: "1.1" }
   │                                            │
   │    Implement code (AI does this)           │
   │    npm test (AI runs this)                 │
   │    Results: 3 passed, 0 failed             │
   │                                            │
   ├──► tm autopilot complete green 1.1 \       │
   │    --results="passed:3,failed:0"           │
   │                                            ├──► Validates: tests passed ✓
   │                                            ├──► Updates state to COMMIT
   │                                            ├──► Returns: "next action: COMMIT phase"
   │                                            │
   ├──► tm autopilot commit 1.1                 │
   │                                            ├──► Detects changed files (git status)
   │                                            ├──► Stages files
   │                                            ├──► Creates commit with metadata
   │                                            ├──► Updates state to next subtask
   │                                            ├──► Returns: { sha: "a1b2c3d", nextAction: {...} }
   │                                            │
   └──► Loop continues...                        │

Key principle: AI agent is the domain expert (knows the codebase, frameworks, tools). TaskMaster is the workflow expert (knows TDD process, state management, git operations).

Deliverables

1. WorkflowOrchestrator (packages/tm-core/src/services/workflow-orchestrator.ts)

Responsibilities:

  • State machine driving phases: Preflight → Branch/Tag → SubtaskIter → Finalize
  • Event emission for progress tracking
  • Coordination of Git, Test, and Executor adapters
  • Run state persistence

API:

class WorkflowOrchestrator {
  async executeTask(taskId: string, options: AutopilotOptions): Promise<RunResult>
  async resume(runId: string): Promise<RunResult>
  on(event: string, handler: (data: any) => void): void

  // Events emitted:
  // - 'phase:start' { phase, timestamp }
  // - 'phase:complete' { phase, status, timestamp }
  // - 'subtask:start' { subtaskId, phase }
  // - 'subtask:complete' { subtaskId, phase, status }
  // - 'test:run' { subtaskId, phase, results }
  // - 'commit:created' { subtaskId, sha, message }
  // - 'error' { phase, error, recoverable }
}

State Machine Phases:

  1. Preflight - validate environment
  2. BranchSetup - create branch, set tag
  3. SubtaskLoop - for each subtask: RED → GREEN → COMMIT
  4. Finalize - full test suite, coverage check
  5. Complete - run report, cleanup

2. GitAdapter (packages/tm-core/src/services/git-adapter.ts)

Responsibilities:

  • All git operations with safety checks
  • Branch name generation from tag/task
  • Confirmation gates for destructive operations

API:

class GitAdapter {
  async isWorkingTreeClean(): Promise<boolean>
  async getCurrentBranch(): Promise<string>
  async getDefaultBranch(): Promise<string>
  async createBranch(name: string): Promise<void>
  async checkoutBranch(name: string): Promise<void>
  async commit(message: string, files?: string[]): Promise<string>
  async push(branch: string, remote?: string): Promise<void>

  // Safety checks
  async assertNotOnDefaultBranch(): Promise<void>
  async assertCleanOrConfirm(): Promise<void>

  // Branch naming
  generateBranchName(tag: string, taskId: string, slug: string): string
}

Guardrails:

  • Never allow commits on default branch
  • Always check working tree before branch creation
  • Confirm destructive operations unless --no-confirm flag

3. Test Result Validator (packages/tm-core/src/services/test-result-validator.ts)

Responsibilities:

  • Validate test results reported by AI agent
  • Ensure RED phase has failing tests
  • Ensure GREEN phase has passing tests
  • Enforce coverage thresholds (if provided)

API:

class TestResultValidator {
  async validateRedPhase(results: TestResults): Promise<ValidationResult>
  async validateGreenPhase(results: TestResults, coverage?: number): Promise<ValidationResult>
  async meetsThresholds(coverage: number): Promise<boolean>
}

interface TestResults {
  passed: number
  failed: number
  skipped?: number
  total: number
}

interface ValidationResult {
  valid: boolean
  message: string
  suggestion?: string
}

Validation Logic:

async function validateRedPhase(results: TestResults): ValidationResult {
  if (results.failed === 0) {
    return {
      valid: false,
      message: "RED phase requires failing tests. All tests passed.",
      suggestion: "Verify tests are checking expected behavior. Tests should fail before implementation."
    }
  }

  if (results.passed > 0) {
    return {
      valid: true,
      message: `RED phase valid: ${results.failed} failing, ${results.passed} passing (existing tests)`,
      warning: "Some tests passing - ensure new tests are failing"
    }
  }

  return {
    valid: true,
    message: `RED phase complete: ${results.failed} tests failing as expected`
  }
}

async function validateGreenPhase(results: TestResults): ValidationResult {
  if (results.failed > 0) {
    return {
      valid: false,
      message: `GREEN phase incomplete: ${results.failed} tests still failing`,
      suggestion: "Continue implementing until all tests pass or retry GREEN phase"
    }
  }

  return {
    valid: true,
    message: `GREEN phase complete: ${results.passed} tests passing`
  }
}

Note: AI agent is responsible for:

  • Running test commands (knows npm test vs pytest vs go test)
  • Parsing test output
  • Reporting results to TaskMaster

TaskMaster only validates the reported numbers make sense for the phase.

4. Test Generation Integration

Use Surgical Test Generator:

  • Load prompt from .claude/agents/surgical-test-generator.md
  • Compose with task/subtask context
  • Generate tests via executor (Claude)
  • Write test files to detected locations

Prompt Composition:

async function composeRedPrompt(subtask: Subtask, context: ProjectContext): Promise<string> {
  const systemPrompts = [
    loadFile('.cursor/rules/git_workflow.mdc'),
    loadFile('.cursor/rules/test_workflow.mdc'),
    loadFile('.claude/agents/surgical-test-generator.md')
  ]

  const taskContext = formatTaskContext(subtask)
  const instruction = formatRedInstruction(subtask, context)

  return [
    ...systemPrompts,
    '<TASK CONTEXT>',
    taskContext,
    '<INSTRUCTION>',
    instruction
  ].join('\n\n')
}

5. Subtask Loop Implementation

RED Phase:

  1. TaskMaster returns RED action with subtask context
  2. AI agent generates tests (TaskMaster not involved)
  3. AI agent writes test files (TaskMaster not involved)
  4. AI agent runs tests using project's test command (e.g., npm test)
  5. AI agent reports results: tm autopilot complete red <id> --results="failed:3,passed:0"
  6. TaskMaster validates: tests should have failures
  7. If validation fails (tests passed), return error with suggestion
  8. If validation passes, update state to GREEN, store results in activity log
  9. Return next action (GREEN phase)

GREEN Phase:

  1. TaskMaster returns GREEN action with subtask context
  2. AI agent implements code (TaskMaster not involved)
  3. AI agent runs tests using project's test command
  4. AI agent reports results: tm autopilot complete green <id> --results="passed:5,failed:0" --coverage="85"
  5. TaskMaster validates: all tests should pass
  6. If validation fails (tests still failing):
    • Increment attempt counter
    • If under max attempts: return GREEN action again with attempt number
    • If max attempts reached: save state, emit pause event, return resumable checkpoint
  7. If validation passes: update state to COMMIT, store results in activity log
  8. Return next action (COMMIT phase)

COMMIT Phase:

  1. TaskMaster receives commit command: tm autopilot commit <id>
  2. Detect changed files: git status --porcelain
  3. Validate coverage meets thresholds (if provided and threshold configured)
  4. Generate conventional commit message with task metadata
  5. Stage files: git add <files>
  6. Create commit: git commit -m "<message>"
  7. Update subtask status to 'done' in tasks.json
  8. Log commit event to activity.jsonl
  9. Update state to next subtask's RED phase
  10. Return next action

Key changes from original design:

  • AI agent runs all test commands (framework agnostic)
  • TaskMaster only validates reported results
  • No test framework detection needed
  • No test execution by TaskMaster
  • AI agent is trusted to report accurate results

6. Branch & Tag Management

Integration with existing tag system:

  • Use scripts/modules/task-manager/tag-management.js
  • Explicit tag switching when branch created
  • Store branch ↔ tag mapping in run state

Branch Naming:

  • Pattern from config: {tag}/task-{id}-{slug}
  • Default: analytics/task-42-user-metrics
  • Sanitize: lowercase, replace spaces with hyphens

7. Global Storage & State Management

Philosophy:

  • All runtime state, logs, and throwaway artifacts stored globally in ~/.taskmaster/
  • Project directory stays clean - only code changes and tasks.json versioned
  • Enables single-player autonomous mode without polluting PRs
  • Client-friendly: no evidence of tooling in source code

Global directory structure:

~/.taskmaster/
├── projects/
│   └── <project-path-normalized>/
│       ├── runs/
│       │   └── <tag>__task-<id>__<timestamp>/
│       │       ├── manifest.json          # run metadata
│       │       ├── activity.jsonl         # event stream (debugging)
│       │       ├── state.json             # resumable checkpoint
│       │       ├── commits.txt            # commit SHAs
│       │       └── test-results/
│       │           ├── subtask-1.1-red.json
│       │           ├── subtask-1.1-green.json
│       │           └── final-suite.json
│       └── tags/
│           └── <tag-name>/
│               └── current-run.json       # active run pointer
└── cache/
    └── templates/                          # shared templates

Project path normalization:

function getProjectStoragePath(projectRoot: string): string {
  const normalized = projectRoot
    .replace(/\//g, '-')
    .replace(/^-/, '')

  return path.join(os.homedir(), '.taskmaster', 'projects', normalized)
  // Example: ~/.taskmaster/projects/-Volumes-Workspace-contrib-task-master-claude-task-master
}

Run ID generation:

function generateRunId(tag: string, taskId: string): string {
  const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
  return `${tag}__task-${taskId}__${timestamp}`
  // Example: tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z
}

manifest.json:

{
  "runId": "tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z",
  "projectRoot": "/Volumes/Workspace/contrib/task-master/claude-task-master",
  "taskId": "1",
  "tag": "tdd-workflow-phase-0",
  "branch": "tdd-phase-0-implementation",
  "startTime": "2025-10-07T14:30:00Z",
  "endTime": null,
  "status": "in-progress",
  "currentPhase": "subtask-loop",
  "currentSubtask": "1.2",
  "subtasksCompleted": ["1.1"],
  "subtasksFailed": [],
  "totalCommits": 1
}

state.json (orchestration state):

{
  "runId": "tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z",
  "taskId": "1",
  "tag": "tdd-workflow-phase-0",
  "branch": "tdd-phase-0-implementation",
  "currentSubtask": "1.2",
  "currentPhase": "green",
  "attemptNumber": 1,
  "maxAttempts": 3,
  "completedSubtasks": ["1.1"],
  "pendingSubtasks": ["1.2", "1.3", "1.4"],
  "nextAction": {
    "type": "implement",
    "subtask": "1.2",
    "phase": "green",
    "context": {
      "testFile": "src/__tests__/preflight.test.ts",
      "failingTests": [
        "should detect test runner from package.json",
        "should validate git working tree"
      ],
      "implementationFiles": ["src/services/preflight-checker.ts"]
    }
  },
  "lastUpdated": "2025-10-07T14:31:45Z",
  "canResume": true
}

activity.jsonl (append-only event log):

{"ts":"2025-10-07T14:30:00Z","event":"phase:start","phase":"preflight","status":"ok"}
{"ts":"2025-10-07T14:30:15Z","event":"phase:complete","phase":"preflight","checks":{"git":true,"test":true,"tools":true}}
{"ts":"2025-10-07T14:30:20Z","event":"branch:created","branch":"tdd-phase-0-implementation"}
{"ts":"2025-10-07T14:30:22Z","event":"tag:switched","from":"master","to":"tdd-workflow-phase-0"}
{"ts":"2025-10-07T14:30:25Z","event":"subtask:start","subtaskId":"1.1","phase":"red"}
{"ts":"2025-10-07T14:31:10Z","event":"test:generated","files":["src/__tests__/autopilot.test.ts"],"testCount":3}
{"ts":"2025-10-07T14:31:15Z","event":"test:run","subtaskId":"1.1","phase":"red","passed":0,"failed":3,"status":"expected"}
{"ts":"2025-10-07T14:31:20Z","event":"phase:transition","from":"red","to":"green"}
{"ts":"2025-10-07T14:32:45Z","event":"code:modified","files":["src/commands/autopilot.ts"],"linesChanged":"+58,-0"}
{"ts":"2025-10-07T14:33:00Z","event":"test:run","subtaskId":"1.1","phase":"green","attempt":1,"passed":3,"failed":0,"status":"success"}
{"ts":"2025-10-07T14:33:15Z","event":"commit:created","subtaskId":"1.1","sha":"a1b2c3d","message":"feat(cli): add autopilot command skeleton (task 1.1)"}
{"ts":"2025-10-07T14:33:20Z","event":"subtask:complete","subtaskId":"1.1","duration":"180s"}

current-run.json (active run pointer):

{
  "runId": "tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z",
  "taskId": "1",
  "tag": "tdd-workflow-phase-0",
  "startTime": "2025-10-07T14:30:00Z",
  "status": "in-progress"
}

What stays in project (versioned):

<project>/
├── .taskmaster/
│   ├── tasks/
│   │   └── tasks.json              # ✅ Versioned (task definitions)
│   └── config.json                 # ✅ Versioned (shared config)
└── .gitignore                       # Add: .taskmaster/state/, .taskmaster/reports/

State vs Activity Log:

State File (state.json) Activity Log (activity.jsonl)
Current position Full history
What to do next What happened
Mutable (updated) Immutable (append-only)
For orchestration For debugging/audit
Single JSON object Line-delimited JSON
Small (~2KB) Can grow large

Resume logic:

async function resumeWorkflow(): Promise<void> {
  // 1. Find active run
  const currentRun = await loadJSON('~/.taskmaster/projects/<project>/tags/<tag>/current-run.json')

  // 2. Load state from that run
  const state = await loadJSON(`~/.taskmaster/projects/<project>/runs/${currentRun.runId}/state.json`)

  // 3. Continue from checkpoint
  return orchestrator.resumeFrom(state)
}

8. Enhanced Commit Message Format

Purpose:

  • Embed task context in commits for future validation
  • Enable task-checker bot to verify alignment
  • Provide audit trail without needing external logs in PR

Commit message template:

{type}({scope}): {summary} (task {taskId})

{detailed description}

Task: #{taskId} - {taskTitle}
Phase: {phaseName}
Tag: {tagName}

Tests: {testCount} passing
Coverage: {coveragePercent}% lines

Example commit:

feat(cli): add autopilot command skeleton (task 1.1)

Implements AutopilotCommand class with Commander.js integration.
Adds argument parsing for task ID and dry-run flag. Includes basic
command registration and help text following existing CLI patterns.

Task: #1.1 - Create command structure
Phase: Phase 0 - Spike
Tag: tdd-workflow-phase-0

Tests: 3 passing
Coverage: 92% lines

Conventional commit types:

  • feat - New feature or capability
  • fix - Bug fix
  • test - Test-only changes
  • refactor - Code restructuring without behavior change
  • docs - Documentation updates
  • chore - Build/tooling changes

Scope determination:

function determineScope(files: string[]): string {
  // Extract common scope from changed files
  const scopes = files.map(f => {
    if (f.startsWith('apps/cli/')) return 'cli'
    if (f.startsWith('packages/tm-core/')) return 'core'
    if (f.startsWith('packages/tm-mcp/')) return 'mcp'
    return 'misc'
  })

  // Use most common scope
  return mode(scopes)
}

Commit validation (future task-checker bot):

async function validateCommit(commit: Commit, task: Task): Promise<ValidationResult> {
  const taskId = extractTaskId(commit.message)  // "1.1"
  const task = await loadTask(taskId)

  return aiChecker.validate({
    commitDiff: commit.diff,
    commitMessage: commit.message,
    taskDescription: task.description,
    acceptanceCriteria: task.acceptanceCriteria,
    testStrategy: task.testStrategy
  })
}

9. CLI Commands for AI Agent Orchestration

New CLI commands (all under tm autopilot namespace):

# Start workflow - creates branch, initializes state
tm autopilot start <taskId> [options]
  --branch <name>       # Override branch name
  --no-confirm          # Skip confirmations
  --max-attempts <n>    # Override max GREEN attempts

# Get next action from state
tm autopilot next [options]
  --json                # Output as JSON for parsing

# Complete a phase and report test results
tm autopilot complete <phase> <subtaskId> --results="<passed:n,failed:n>" [options]
  # phase: red | green
  --results <passed:n,failed:n>  # Required: test results from AI
  --coverage <percentage>        # Optional: coverage percentage
  --files <file1,file2>          # Optional: files changed (auto-detected if omitted)

# Create commit (called by AI after GREEN passes)
tm autopilot commit <subtaskId> [options]
  --message <msg>       # Override commit message

# Resume from interrupted run
tm autopilot resume [options]
  --run-id <id>         # Specific run to resume

# Get current status
tm autopilot status
  --json                # Output as JSON

# Watch activity log in real-time
tm autopilot watch

# Abort current run
tm autopilot abort [options]
  --cleanup             # Delete branch and state

Command details:

tm autopilot start <taskId>

  • Creates global state directory
  • Creates feature branch
  • Switches tag
  • Initializes state.json with first subtask
  • Returns next action (RED phase for first subtask)

tm autopilot next

  • Reads ~/.taskmaster/projects/<project>/tags/<tag>/current-run.json
  • Reads ~/.taskmaster/projects/<project>/runs/<run-id>/state.json
  • Returns next action with full context

Output:

{
  "action": "red",
  "subtask": {
    "id": "1.1",
    "title": "Create command structure",
    "description": "...",
    "testStrategy": "..."
  },
  "context": {
    "projectRoot": "/path/to/project",
    "testPattern": "**/*.test.ts",
    "existingTests": []
  },
  "instructions": "Generate tests for this subtask. Tests should fail initially."
}

tm autopilot complete <phase> <subtaskId>

  • Receives test results from AI agent
  • Validates phase completion:
    • RED: Ensures reported results show failures
    • GREEN: Ensures reported results show all tests passing
  • Updates state to next phase
  • Logs event to activity.jsonl with test results
  • Returns next action

Examples:

# After AI generates tests and runs them
tm autopilot complete red 1.1 --results="failed:3,passed:0"

# After AI implements code and runs tests
tm autopilot complete green 1.1 --results="passed:3,failed:0" --coverage="92"

# With existing passing tests
tm autopilot complete red 1.1 --results="failed:3,passed:12"

tm autopilot commit <subtaskId>

  • Generates commit message from template
  • Stages files
  • Creates commit with enhanced message
  • Updates subtask status to 'done'
  • Updates state to next subtask
  • Returns next action

tm autopilot status

{
  "runId": "tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z",
  "taskId": "1",
  "currentSubtask": "1.2",
  "currentPhase": "green",
  "attemptNumber": 1,
  "progress": {
    "completed": ["1.1"],
    "current": "1.2",
    "remaining": ["1.3", "1.4"]
  },
  "commits": 1,
  "startTime": "2025-10-07T14:30:00Z",
  "duration": "5m 30s"
}

10. MCP Tools for AI Agent Orchestration

New MCP tools (add to packages/tm-mcp/src/tools/):

// autopilot_start
{
  name: "autopilot_start",
  description: "Start autonomous TDD workflow for a task",
  parameters: {
    taskId: string,
    options?: {
      branch?: string,
      maxAttempts?: number
    }
  },
  returns: {
    runId: string,
    branch: string,
    nextAction: NextAction
  }
}

// autopilot_next
{
  name: "autopilot_next",
  description: "Get next action from workflow state",
  parameters: {
    projectRoot?: string  // defaults to current
  },
  returns: {
    action: "red" | "green" | "commit" | "complete",
    subtask: Subtask,
    context: Context,
    instructions: string
  }
}

// autopilot_complete_phase
{
  name: "autopilot_complete_phase",
  description: "Report test results and validate phase completion",
  parameters: {
    phase: "red" | "green",
    subtaskId: string,
    testResults: {
      passed: number,
      failed: number,
      skipped?: number
    },
    coverage?: number,  // Optional coverage percentage
    files?: string[]    // Optional, auto-detected if not provided
  },
  returns: {
    validated: boolean,
    message: string,
    suggestion?: string,
    nextAction: NextAction
  }
}

// autopilot_commit
{
  name: "autopilot_commit",
  description: "Create commit for completed subtask",
  parameters: {
    subtaskId: string,
    files?: string[],
    message?: string  // Override
  },
  returns: {
    commitSha: string,
    message: string,
    nextAction: NextAction
  }
}

// autopilot_status
{
  name: "autopilot_status",
  description: "Get current workflow status",
  parameters: {
    projectRoot?: string
  },
  returns: {
    runId: string,
    taskId: string,
    currentSubtask: string,
    currentPhase: string,
    progress: Progress,
    commits: number
  }
}

// autopilot_resume
{
  name: "autopilot_resume",
  description: "Resume interrupted workflow",
  parameters: {
    runId?: string  // defaults to current
  },
  returns: {
    resumed: boolean,
    nextAction: NextAction
  }
}

MCP tool usage example (Claude Code session):

// AI agent calls MCP tools
const { runId, nextAction } = await mcp.autopilot_start({ taskId: "1" })

while (nextAction.action !== "complete") {
  const action = await mcp.autopilot_next()

  if (action.action === "red") {
    // AI generates tests
    const tests = await generateTests(action.subtask, action.context)
    await writeFiles(tests)

    // AI runs tests (using project's test command)
    const testOutput = await runCommand("npm test")  // or pytest, go test, etc.
    const results = parseTestOutput(testOutput)

    // Report results to TaskMaster
    const validation = await mcp.autopilot_complete_phase({
      phase: "red",
      subtaskId: action.subtask.id,
      testResults: {
        passed: results.passed,
        failed: results.failed,
        skipped: results.skipped
      }
    })

    if (!validation.validated) {
      console.error(validation.message)
      // Handle validation failure
    }
  }

  if (action.action === "green") {
    // AI implements code
    const impl = await implementCode(action.subtask, action.context)
    await writeFiles(impl)

    // AI runs tests again
    const testOutput = await runCommand("npm test")
    const results = parseTestOutput(testOutput)
    const coverage = parseCoverage(testOutput)

    // Report results to TaskMaster
    const validation = await mcp.autopilot_complete_phase({
      phase: "green",
      subtaskId: action.subtask.id,
      testResults: {
        passed: results.passed,
        failed: results.failed
      },
      coverage: coverage.lines
    })

    if (!validation.validated) {
      console.log(validation.message, validation.suggestion)
      // Retry or handle failure
    }
  }

  if (action.action === "commit") {
    // TaskMaster creates the commit
    const { commitSha, nextAction: next } = await mcp.autopilot_commit({
      subtaskId: action.subtask.id
    })

    nextAction = next
  }
}

11. AI Agent Instructions (CLAUDE.md integration)

Add to .claude/CLAUDE.md or .cursor/rules/:

## TaskMaster Autonomous Workflow

When working on tasks with `tm autopilot`:

1. **Start workflow:**
   ```bash
   tm autopilot start <taskId>
   ```

2. **Loop until complete:**
   ```bash
   # Get next action
   NEXT=$(tm autopilot next --json)
   ACTION=$(echo $NEXT | jq -r '.action')
   SUBTASK=$(echo $NEXT | jq -r '.subtask.id')

   case $ACTION in
     red)
       # 1. Generate tests based on instructions
       # 2. Write test files
       # 3. Run tests yourself (you know the test command)
       npm test  # or pytest, go test, cargo test, etc.

       # 4. Report results to TaskMaster
       tm autopilot complete red $SUBTASK --results="failed:3,passed:0"
       ;;

     green)
       # 1. Implement code to pass tests
       # 2. Write implementation files
       # 3. Run tests yourself
       npm test

       # 4. Report results to TaskMaster (include coverage if available)
       tm autopilot complete green $SUBTASK --results="passed:3,failed:0" --coverage="92"
       ;;

     commit)
       # TaskMaster handles git operations
       tm autopilot commit $SUBTASK
       ;;

     complete)
       echo "Workflow complete!"
       ;;
   esac
   ```

3. **State is preserved** - you can stop/resume anytime with `tm autopilot resume`

**Important:** You are responsible for:
- Running test commands (TaskMaster doesn't know your test framework)
- Parsing test output (passed/failed counts)
- Reporting accurate results

**Via MCP:**
Use `autopilot_*` tools for the same workflow with better integration.

Example AI agent prompt:

You are working autonomously on Task Master tasks using the autopilot workflow.

Instructions:
1. Call `tm autopilot next --json` to get your next action
2. Read the action type and context
3. Execute the action:
   - **RED**:
     * Generate tests that fail initially
     * Run tests: `npm test` (or appropriate test command)
     * Report: `tm autopilot complete red <id> --results="failed:n,passed:n"`
   - **GREEN**:
     * Implement code to pass the tests
     * Run tests: `npm test`
     * Report: `tm autopilot complete green <id> --results="passed:n,failed:n" --coverage="nn"`
   - **COMMIT**:
     * Call: `tm autopilot commit <id>` (TaskMaster handles git)
4. Repeat until action is "complete"

Always:
- Follow TDD principles (RED → GREEN → COMMIT)
- YOU run the tests (TaskMaster doesn't know test frameworks)
- Report accurate test results (passed/failed counts)
- Write minimal code to pass tests
- Check `tm autopilot status` if unsure of current state

You are responsible for:
- Knowing which test command to run (npm test, pytest, go test, etc.)
- Parsing test output to get pass/fail counts
- Understanding the project's testing framework
- Running tests after generating/implementing code

TaskMaster is responsible for:
- Validating your reported results make sense (RED should fail, GREEN should pass)
- Creating properly formatted git commits
- Managing workflow state and transitions

Success Criteria

  • Can execute a simple task end-to-end without manual intervention
  • All commits made on feature branch, never on default branch
  • Tests are generated before implementation (RED → GREEN order enforced)
  • Only commits when tests pass and coverage meets threshold
  • Run state is persisted and can be inspected post-run
  • Clear error messages when things go wrong
  • Orchestrator events allow CLI to show live progress

Configuration

Add to .taskmaster/config.json (versioned):

{
  "autopilot": {
    "enabled": true,
    "requireCleanWorkingTree": true,
    "commitTemplate": "{type}({scope}): {msg} (task {taskId})",
    "defaultCommitType": "feat",
    "maxGreenAttempts": 3,
    "testTimeout": 300000,
    "storage": {
      "location": "global",
      "basePath": "~/.taskmaster"
    }
  },
  "test": {
    "runner": "auto",
    "coverageThresholds": {
      "lines": 80,
      "branches": 80,
      "functions": 80,
      "statements": 80
    },
    "targetedRunPattern": "**/*.test.js"
  },
  "git": {
    "branchPattern": "{tag}/task-{id}-{slug}",
    "defaultRemote": "origin"
  }
}

Update .gitignore (keep project clean):

# TaskMaster runtime artifacts (stored globally, not needed in repo)
.taskmaster/state/
.taskmaster/reports/

# Keep these versioned
!.taskmaster/tasks/
!.taskmaster/config.json
!.taskmaster/docs/
!.taskmaster/templates/

Out of Scope (defer to Phase 2)

  • PR creation (gh integration)
  • Resume functionality (--resume flag)
  • Lint/format step
  • Multiple executor support (only Claude)

Implementation Order

Phase 1A: Infrastructure (Week 1)

  1. Global storage utilities (path normalization, run ID generation)
  2. Activity log writer (append-only JSONL)
  3. State manager (load/save/update state.json)
  4. GitAdapter with safety checks
  5. TestResultValidator (validate RED/GREEN phase results)

Phase 1B: Orchestration Core (Week 1-2)

  1. WorkflowOrchestrator state machine skeleton
  2. State transitions (Preflight → BranchSetup → SubtaskLoop → Finalize)
  3. Event emitter for activity logging
  4. Enhanced commit message generator

Phase 1C: TDD Loop (Week 2)

  1. RED phase validator (ensure tests fail)
  2. GREEN phase validator (ensure tests pass)
  3. COMMIT phase implementation (staging, committing)
  4. Subtask progression logic

Phase 1D: CLI Interface (Week 2-3)

  1. tm autopilot start command
  2. tm autopilot next command
  3. tm autopilot complete command
  4. tm autopilot commit command
  5. tm autopilot status command
  6. tm autopilot resume command

Phase 1E: MCP Interface (Week 3)

  1. autopilot_start tool
  2. autopilot_next tool
  3. autopilot_complete_phase tool
  4. autopilot_commit tool
  5. autopilot_status tool
  6. autopilot_resume tool

Phase 1F: Integration (Week 3)

  1. AI agent instruction templates
  2. Error handling and recovery
  3. Integration tests
  4. Documentation

Testing Strategy

  • Unit tests for global storage (path normalization, state management)
  • Unit tests for activity log (JSONL append, parsing)
  • Unit tests for each adapter (mock git/test commands)
  • Integration tests with real git repo (temporary directory)
  • End-to-end test with sample task in test project
  • Verify no commits on default branch (security test)
  • Verify commit gating works (force test failure, ensure no commit)
  • Verify enhanced commit messages include task context
  • Test resume from state checkpoint
  • Verify project directory stays clean (no runtime artifacts)

Dependencies

  • Phase 0 completed (CLI skeleton, preflight checks)
  • Existing TaskService and executor infrastructure
  • Surgical Test Generator prompt file exists

Estimated Effort

2-3 weeks

Risks & Mitigations

  • Risk: Test generation produces invalid/wrong tests

    • Mitigation: Use Surgical Test Generator prompt, add manual review step in early iterations
  • Risk: Implementation attempts timeout/fail repeatedly

    • Mitigation: Max attempts with pause/resume; store state for manual intervention
  • Risk: Coverage parsing fails on different test frameworks

    • Mitigation: Start with one framework (vitest), add parsers incrementally
  • Risk: Git operations fail (conflicts, permissions)

    • Mitigation: Detailed error messages, save state before destructive ops

Validation

Test with:

  • Simple task (1 subtask, clear requirements)
  • Medium task (3 subtasks with dependencies)
  • Task requiring multiple GREEN attempts
  • Task with dirty working tree (should error)
  • Task on default branch (should error)
  • Project without test command (should error with helpful message)
  • Verify global storage created in ~/.taskmaster/projects/<project>/
  • Verify activity log is valid JSONL and streamable
  • Verify state.json allows resumption
  • Verify commit messages include task metadata
  • Verify project directory contains no runtime artifacts after run
  • Test with multiple worktrees (independent state per worktree)