Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
36 KiB
Phase 1: Core Rails - Autonomous TDD Workflow
Objective
Implement the core autonomous TDD workflow with safe git operations, test generation/execution, and commit gating.
Scope
- WorkflowOrchestrator with event stream
- GitAdapter and TestResultValidator
- Subtask loop (RED → GREEN → COMMIT)
- CLI commands for AI agent orchestration
- MCP tools for AI agent orchestration
- Test result validation (AI reports, TaskMaster validates)
- Commit creation with enhanced metadata
- Branch/tag mapping
- Global storage for state and activity logs
- Framework-agnostic design (AI runs tests, not TaskMaster)
- Run report persistence
Key Design Decisions
Global Storage (~/.taskmaster/)
- Why: Keeps project directory clean, client-friendly, no tooling evidence in PRs
- What: All runtime state, logs, and throwaway artifacts
- Where:
~/.taskmaster/projects/<project-path>/runs/<run-id>/
Dual System: State + Activity Log
- State (
state.json): For orchestration, tells AI what to do next, mutable - Activity Log (
activity.jsonl): For debugging/audit, append-only event stream - Separation: Optimizes for different use cases (fast reads vs. complete history)
Enhanced Commit Messages
- Why: Enables future task-checker bot validation without external dependencies
- What: Embeds task ID, phase, tag, test counts, coverage in commit body
- Benefit: PR contains full context for review and automated validation
Worktree Support
- Why: Enables parallel autonomous agents on different branches
- How: Each worktree has independent global state directory
- Isolation: No conflicts, complete separation
Framework-Agnostic Test Execution
- AI runs tests: AI agent knows project context and test framework (npm test, pytest, go test)
- TaskMaster validates: Only checks that RED fails and GREEN passes
- No framework detection: TaskMaster doesn't need to know Jest vs Vitest vs pytest
- Trust but verify: AI reports results, TaskMaster validates they make sense
- Language agnostic: Works with any language/framework without TaskMaster changes
AI Agent Orchestration Model
- Who executes: User's AI agent (Claude Code, Cursor, Windsurf, etc.) - not TaskMaster
- TaskMaster's role: Workflow orchestration, validation, commit creation
- AI agent's role: Code generation, test execution, result reporting
- Communication: Via CLI commands or MCP tools
- State-driven: AI agent reads
state.jsonto know what to do next
Separation of Concerns:
| TaskMaster Responsibilities | AI Agent Responsibilities |
|---|---|
| Workflow state machine | Generate tests |
| Validate phase transitions | Run tests (knows test framework) |
| Create commits with metadata | Implement code |
| Store activity logs | Report test results |
| Manage git operations | Understand project context |
| Track progress | Choose appropriate test commands |
Flow:
AI Agent TaskMaster
│ │
├──► tm autopilot start 1 │
│ ├──► Creates state, branch
│ ├──► Returns: "next action: RED phase for 1.1"
│ │
├──► tm autopilot next │
│ ├──► Reads state.json
│ ├──► Returns: { phase: "red", subtask: "1.1", context: {...} }
│ │
│ Generate tests (AI does this) │
│ npm test (AI runs this) │
│ Results: 3 failed, 0 passed │
│ │
├──► tm autopilot complete red 1.1 \ │
│ --results="failed:3,passed:0" │
│ ├──► Validates: tests failed ✓
│ ├──► Updates state to GREEN
│ ├──► Returns: "next action: GREEN phase"
│ │
├──► tm autopilot next │
│ ├──► Returns: { phase: "green", subtask: "1.1" }
│ │
│ Implement code (AI does this) │
│ npm test (AI runs this) │
│ Results: 3 passed, 0 failed │
│ │
├──► tm autopilot complete green 1.1 \ │
│ --results="passed:3,failed:0" │
│ ├──► Validates: tests passed ✓
│ ├──► Updates state to COMMIT
│ ├──► Returns: "next action: COMMIT phase"
│ │
├──► tm autopilot commit 1.1 │
│ ├──► Detects changed files (git status)
│ ├──► Stages files
│ ├──► Creates commit with metadata
│ ├──► Updates state to next subtask
│ ├──► Returns: { sha: "a1b2c3d", nextAction: {...} }
│ │
└──► Loop continues... │
Key principle: AI agent is the domain expert (knows the codebase, frameworks, tools). TaskMaster is the workflow expert (knows TDD process, state management, git operations).
Deliverables
1. WorkflowOrchestrator (packages/tm-core/src/services/workflow-orchestrator.ts)
Responsibilities:
- State machine driving phases: Preflight → Branch/Tag → SubtaskIter → Finalize
- Event emission for progress tracking
- Coordination of Git, Test, and Executor adapters
- Run state persistence
API:
class WorkflowOrchestrator {
async executeTask(taskId: string, options: AutopilotOptions): Promise<RunResult>
async resume(runId: string): Promise<RunResult>
on(event: string, handler: (data: any) => void): void
// Events emitted:
// - 'phase:start' { phase, timestamp }
// - 'phase:complete' { phase, status, timestamp }
// - 'subtask:start' { subtaskId, phase }
// - 'subtask:complete' { subtaskId, phase, status }
// - 'test:run' { subtaskId, phase, results }
// - 'commit:created' { subtaskId, sha, message }
// - 'error' { phase, error, recoverable }
}
State Machine Phases:
- Preflight - validate environment
- BranchSetup - create branch, set tag
- SubtaskLoop - for each subtask: RED → GREEN → COMMIT
- Finalize - full test suite, coverage check
- Complete - run report, cleanup
2. GitAdapter (packages/tm-core/src/services/git-adapter.ts)
Responsibilities:
- All git operations with safety checks
- Branch name generation from tag/task
- Confirmation gates for destructive operations
API:
class GitAdapter {
async isWorkingTreeClean(): Promise<boolean>
async getCurrentBranch(): Promise<string>
async getDefaultBranch(): Promise<string>
async createBranch(name: string): Promise<void>
async checkoutBranch(name: string): Promise<void>
async commit(message: string, files?: string[]): Promise<string>
async push(branch: string, remote?: string): Promise<void>
// Safety checks
async assertNotOnDefaultBranch(): Promise<void>
async assertCleanOrConfirm(): Promise<void>
// Branch naming
generateBranchName(tag: string, taskId: string, slug: string): string
}
Guardrails:
- Never allow commits on default branch
- Always check working tree before branch creation
- Confirm destructive operations unless
--no-confirmflag
3. Test Result Validator (packages/tm-core/src/services/test-result-validator.ts)
Responsibilities:
- Validate test results reported by AI agent
- Ensure RED phase has failing tests
- Ensure GREEN phase has passing tests
- Enforce coverage thresholds (if provided)
API:
class TestResultValidator {
async validateRedPhase(results: TestResults): Promise<ValidationResult>
async validateGreenPhase(results: TestResults, coverage?: number): Promise<ValidationResult>
async meetsThresholds(coverage: number): Promise<boolean>
}
interface TestResults {
passed: number
failed: number
skipped?: number
total: number
}
interface ValidationResult {
valid: boolean
message: string
suggestion?: string
}
Validation Logic:
async function validateRedPhase(results: TestResults): ValidationResult {
if (results.failed === 0) {
return {
valid: false,
message: "RED phase requires failing tests. All tests passed.",
suggestion: "Verify tests are checking expected behavior. Tests should fail before implementation."
}
}
if (results.passed > 0) {
return {
valid: true,
message: `RED phase valid: ${results.failed} failing, ${results.passed} passing (existing tests)`,
warning: "Some tests passing - ensure new tests are failing"
}
}
return {
valid: true,
message: `RED phase complete: ${results.failed} tests failing as expected`
}
}
async function validateGreenPhase(results: TestResults): ValidationResult {
if (results.failed > 0) {
return {
valid: false,
message: `GREEN phase incomplete: ${results.failed} tests still failing`,
suggestion: "Continue implementing until all tests pass or retry GREEN phase"
}
}
return {
valid: true,
message: `GREEN phase complete: ${results.passed} tests passing`
}
}
Note: AI agent is responsible for:
- Running test commands (knows npm test vs pytest vs go test)
- Parsing test output
- Reporting results to TaskMaster
TaskMaster only validates the reported numbers make sense for the phase.
4. Test Generation Integration
Use Surgical Test Generator:
- Load prompt from
.claude/agents/surgical-test-generator.md - Compose with task/subtask context
- Generate tests via executor (Claude)
- Write test files to detected locations
Prompt Composition:
async function composeRedPrompt(subtask: Subtask, context: ProjectContext): Promise<string> {
const systemPrompts = [
loadFile('.cursor/rules/git_workflow.mdc'),
loadFile('.cursor/rules/test_workflow.mdc'),
loadFile('.claude/agents/surgical-test-generator.md')
]
const taskContext = formatTaskContext(subtask)
const instruction = formatRedInstruction(subtask, context)
return [
...systemPrompts,
'<TASK CONTEXT>',
taskContext,
'<INSTRUCTION>',
instruction
].join('\n\n')
}
5. Subtask Loop Implementation
RED Phase:
- TaskMaster returns RED action with subtask context
- AI agent generates tests (TaskMaster not involved)
- AI agent writes test files (TaskMaster not involved)
- AI agent runs tests using project's test command (e.g., npm test)
- AI agent reports results:
tm autopilot complete red <id> --results="failed:3,passed:0" - TaskMaster validates: tests should have failures
- If validation fails (tests passed), return error with suggestion
- If validation passes, update state to GREEN, store results in activity log
- Return next action (GREEN phase)
GREEN Phase:
- TaskMaster returns GREEN action with subtask context
- AI agent implements code (TaskMaster not involved)
- AI agent runs tests using project's test command
- AI agent reports results:
tm autopilot complete green <id> --results="passed:5,failed:0" --coverage="85" - TaskMaster validates: all tests should pass
- If validation fails (tests still failing):
- Increment attempt counter
- If under max attempts: return GREEN action again with attempt number
- If max attempts reached: save state, emit pause event, return resumable checkpoint
- If validation passes: update state to COMMIT, store results in activity log
- Return next action (COMMIT phase)
COMMIT Phase:
- TaskMaster receives commit command:
tm autopilot commit <id> - Detect changed files:
git status --porcelain - Validate coverage meets thresholds (if provided and threshold configured)
- Generate conventional commit message with task metadata
- Stage files:
git add <files> - Create commit:
git commit -m "<message>" - Update subtask status to 'done' in tasks.json
- Log commit event to activity.jsonl
- Update state to next subtask's RED phase
- Return next action
Key changes from original design:
- AI agent runs all test commands (framework agnostic)
- TaskMaster only validates reported results
- No test framework detection needed
- No test execution by TaskMaster
- AI agent is trusted to report accurate results
6. Branch & Tag Management
Integration with existing tag system:
- Use
scripts/modules/task-manager/tag-management.js - Explicit tag switching when branch created
- Store branch ↔ tag mapping in run state
Branch Naming:
- Pattern from config:
{tag}/task-{id}-{slug} - Default:
analytics/task-42-user-metrics - Sanitize: lowercase, replace spaces with hyphens
7. Global Storage & State Management
Philosophy:
- All runtime state, logs, and throwaway artifacts stored globally in
~/.taskmaster/ - Project directory stays clean - only code changes and tasks.json versioned
- Enables single-player autonomous mode without polluting PRs
- Client-friendly: no evidence of tooling in source code
Global directory structure:
~/.taskmaster/
├── projects/
│ └── <project-path-normalized>/
│ ├── runs/
│ │ └── <tag>__task-<id>__<timestamp>/
│ │ ├── manifest.json # run metadata
│ │ ├── activity.jsonl # event stream (debugging)
│ │ ├── state.json # resumable checkpoint
│ │ ├── commits.txt # commit SHAs
│ │ └── test-results/
│ │ ├── subtask-1.1-red.json
│ │ ├── subtask-1.1-green.json
│ │ └── final-suite.json
│ └── tags/
│ └── <tag-name>/
│ └── current-run.json # active run pointer
└── cache/
└── templates/ # shared templates
Project path normalization:
function getProjectStoragePath(projectRoot: string): string {
const normalized = projectRoot
.replace(/\//g, '-')
.replace(/^-/, '')
return path.join(os.homedir(), '.taskmaster', 'projects', normalized)
// Example: ~/.taskmaster/projects/-Volumes-Workspace-contrib-task-master-claude-task-master
}
Run ID generation:
function generateRunId(tag: string, taskId: string): string {
const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
return `${tag}__task-${taskId}__${timestamp}`
// Example: tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z
}
manifest.json:
{
"runId": "tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z",
"projectRoot": "/Volumes/Workspace/contrib/task-master/claude-task-master",
"taskId": "1",
"tag": "tdd-workflow-phase-0",
"branch": "tdd-phase-0-implementation",
"startTime": "2025-10-07T14:30:00Z",
"endTime": null,
"status": "in-progress",
"currentPhase": "subtask-loop",
"currentSubtask": "1.2",
"subtasksCompleted": ["1.1"],
"subtasksFailed": [],
"totalCommits": 1
}
state.json (orchestration state):
{
"runId": "tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z",
"taskId": "1",
"tag": "tdd-workflow-phase-0",
"branch": "tdd-phase-0-implementation",
"currentSubtask": "1.2",
"currentPhase": "green",
"attemptNumber": 1,
"maxAttempts": 3,
"completedSubtasks": ["1.1"],
"pendingSubtasks": ["1.2", "1.3", "1.4"],
"nextAction": {
"type": "implement",
"subtask": "1.2",
"phase": "green",
"context": {
"testFile": "src/__tests__/preflight.test.ts",
"failingTests": [
"should detect test runner from package.json",
"should validate git working tree"
],
"implementationFiles": ["src/services/preflight-checker.ts"]
}
},
"lastUpdated": "2025-10-07T14:31:45Z",
"canResume": true
}
activity.jsonl (append-only event log):
{"ts":"2025-10-07T14:30:00Z","event":"phase:start","phase":"preflight","status":"ok"}
{"ts":"2025-10-07T14:30:15Z","event":"phase:complete","phase":"preflight","checks":{"git":true,"test":true,"tools":true}}
{"ts":"2025-10-07T14:30:20Z","event":"branch:created","branch":"tdd-phase-0-implementation"}
{"ts":"2025-10-07T14:30:22Z","event":"tag:switched","from":"master","to":"tdd-workflow-phase-0"}
{"ts":"2025-10-07T14:30:25Z","event":"subtask:start","subtaskId":"1.1","phase":"red"}
{"ts":"2025-10-07T14:31:10Z","event":"test:generated","files":["src/__tests__/autopilot.test.ts"],"testCount":3}
{"ts":"2025-10-07T14:31:15Z","event":"test:run","subtaskId":"1.1","phase":"red","passed":0,"failed":3,"status":"expected"}
{"ts":"2025-10-07T14:31:20Z","event":"phase:transition","from":"red","to":"green"}
{"ts":"2025-10-07T14:32:45Z","event":"code:modified","files":["src/commands/autopilot.ts"],"linesChanged":"+58,-0"}
{"ts":"2025-10-07T14:33:00Z","event":"test:run","subtaskId":"1.1","phase":"green","attempt":1,"passed":3,"failed":0,"status":"success"}
{"ts":"2025-10-07T14:33:15Z","event":"commit:created","subtaskId":"1.1","sha":"a1b2c3d","message":"feat(cli): add autopilot command skeleton (task 1.1)"}
{"ts":"2025-10-07T14:33:20Z","event":"subtask:complete","subtaskId":"1.1","duration":"180s"}
current-run.json (active run pointer):
{
"runId": "tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z",
"taskId": "1",
"tag": "tdd-workflow-phase-0",
"startTime": "2025-10-07T14:30:00Z",
"status": "in-progress"
}
What stays in project (versioned):
<project>/
├── .taskmaster/
│ ├── tasks/
│ │ └── tasks.json # ✅ Versioned (task definitions)
│ └── config.json # ✅ Versioned (shared config)
└── .gitignore # Add: .taskmaster/state/, .taskmaster/reports/
State vs Activity Log:
| State File (state.json) | Activity Log (activity.jsonl) |
|---|---|
| Current position | Full history |
| What to do next | What happened |
| Mutable (updated) | Immutable (append-only) |
| For orchestration | For debugging/audit |
| Single JSON object | Line-delimited JSON |
| Small (~2KB) | Can grow large |
Resume logic:
async function resumeWorkflow(): Promise<void> {
// 1. Find active run
const currentRun = await loadJSON('~/.taskmaster/projects/<project>/tags/<tag>/current-run.json')
// 2. Load state from that run
const state = await loadJSON(`~/.taskmaster/projects/<project>/runs/${currentRun.runId}/state.json`)
// 3. Continue from checkpoint
return orchestrator.resumeFrom(state)
}
8. Enhanced Commit Message Format
Purpose:
- Embed task context in commits for future validation
- Enable task-checker bot to verify alignment
- Provide audit trail without needing external logs in PR
Commit message template:
{type}({scope}): {summary} (task {taskId})
{detailed description}
Task: #{taskId} - {taskTitle}
Phase: {phaseName}
Tag: {tagName}
Tests: {testCount} passing
Coverage: {coveragePercent}% lines
Example commit:
feat(cli): add autopilot command skeleton (task 1.1)
Implements AutopilotCommand class with Commander.js integration.
Adds argument parsing for task ID and dry-run flag. Includes basic
command registration and help text following existing CLI patterns.
Task: #1.1 - Create command structure
Phase: Phase 0 - Spike
Tag: tdd-workflow-phase-0
Tests: 3 passing
Coverage: 92% lines
Conventional commit types:
feat- New feature or capabilityfix- Bug fixtest- Test-only changesrefactor- Code restructuring without behavior changedocs- Documentation updateschore- Build/tooling changes
Scope determination:
function determineScope(files: string[]): string {
// Extract common scope from changed files
const scopes = files.map(f => {
if (f.startsWith('apps/cli/')) return 'cli'
if (f.startsWith('packages/tm-core/')) return 'core'
if (f.startsWith('packages/tm-mcp/')) return 'mcp'
return 'misc'
})
// Use most common scope
return mode(scopes)
}
Commit validation (future task-checker bot):
async function validateCommit(commit: Commit, task: Task): Promise<ValidationResult> {
const taskId = extractTaskId(commit.message) // "1.1"
const task = await loadTask(taskId)
return aiChecker.validate({
commitDiff: commit.diff,
commitMessage: commit.message,
taskDescription: task.description,
acceptanceCriteria: task.acceptanceCriteria,
testStrategy: task.testStrategy
})
}
9. CLI Commands for AI Agent Orchestration
New CLI commands (all under tm autopilot namespace):
# Start workflow - creates branch, initializes state
tm autopilot start <taskId> [options]
--branch <name> # Override branch name
--no-confirm # Skip confirmations
--max-attempts <n> # Override max GREEN attempts
# Get next action from state
tm autopilot next [options]
--json # Output as JSON for parsing
# Complete a phase and report test results
tm autopilot complete <phase> <subtaskId> --results="<passed:n,failed:n>" [options]
# phase: red | green
--results <passed:n,failed:n> # Required: test results from AI
--coverage <percentage> # Optional: coverage percentage
--files <file1,file2> # Optional: files changed (auto-detected if omitted)
# Create commit (called by AI after GREEN passes)
tm autopilot commit <subtaskId> [options]
--message <msg> # Override commit message
# Resume from interrupted run
tm autopilot resume [options]
--run-id <id> # Specific run to resume
# Get current status
tm autopilot status
--json # Output as JSON
# Watch activity log in real-time
tm autopilot watch
# Abort current run
tm autopilot abort [options]
--cleanup # Delete branch and state
Command details:
tm autopilot start <taskId>
- Creates global state directory
- Creates feature branch
- Switches tag
- Initializes state.json with first subtask
- Returns next action (RED phase for first subtask)
tm autopilot next
- Reads
~/.taskmaster/projects/<project>/tags/<tag>/current-run.json - Reads
~/.taskmaster/projects/<project>/runs/<run-id>/state.json - Returns next action with full context
Output:
{
"action": "red",
"subtask": {
"id": "1.1",
"title": "Create command structure",
"description": "...",
"testStrategy": "..."
},
"context": {
"projectRoot": "/path/to/project",
"testPattern": "**/*.test.ts",
"existingTests": []
},
"instructions": "Generate tests for this subtask. Tests should fail initially."
}
tm autopilot complete <phase> <subtaskId>
- Receives test results from AI agent
- Validates phase completion:
- RED: Ensures reported results show failures
- GREEN: Ensures reported results show all tests passing
- Updates state to next phase
- Logs event to activity.jsonl with test results
- Returns next action
Examples:
# After AI generates tests and runs them
tm autopilot complete red 1.1 --results="failed:3,passed:0"
# After AI implements code and runs tests
tm autopilot complete green 1.1 --results="passed:3,failed:0" --coverage="92"
# With existing passing tests
tm autopilot complete red 1.1 --results="failed:3,passed:12"
tm autopilot commit <subtaskId>
- Generates commit message from template
- Stages files
- Creates commit with enhanced message
- Updates subtask status to 'done'
- Updates state to next subtask
- Returns next action
tm autopilot status
{
"runId": "tdd-workflow-phase-0__task-1__2025-10-07T14-30-00-000Z",
"taskId": "1",
"currentSubtask": "1.2",
"currentPhase": "green",
"attemptNumber": 1,
"progress": {
"completed": ["1.1"],
"current": "1.2",
"remaining": ["1.3", "1.4"]
},
"commits": 1,
"startTime": "2025-10-07T14:30:00Z",
"duration": "5m 30s"
}
10. MCP Tools for AI Agent Orchestration
New MCP tools (add to packages/tm-mcp/src/tools/):
// autopilot_start
{
name: "autopilot_start",
description: "Start autonomous TDD workflow for a task",
parameters: {
taskId: string,
options?: {
branch?: string,
maxAttempts?: number
}
},
returns: {
runId: string,
branch: string,
nextAction: NextAction
}
}
// autopilot_next
{
name: "autopilot_next",
description: "Get next action from workflow state",
parameters: {
projectRoot?: string // defaults to current
},
returns: {
action: "red" | "green" | "commit" | "complete",
subtask: Subtask,
context: Context,
instructions: string
}
}
// autopilot_complete_phase
{
name: "autopilot_complete_phase",
description: "Report test results and validate phase completion",
parameters: {
phase: "red" | "green",
subtaskId: string,
testResults: {
passed: number,
failed: number,
skipped?: number
},
coverage?: number, // Optional coverage percentage
files?: string[] // Optional, auto-detected if not provided
},
returns: {
validated: boolean,
message: string,
suggestion?: string,
nextAction: NextAction
}
}
// autopilot_commit
{
name: "autopilot_commit",
description: "Create commit for completed subtask",
parameters: {
subtaskId: string,
files?: string[],
message?: string // Override
},
returns: {
commitSha: string,
message: string,
nextAction: NextAction
}
}
// autopilot_status
{
name: "autopilot_status",
description: "Get current workflow status",
parameters: {
projectRoot?: string
},
returns: {
runId: string,
taskId: string,
currentSubtask: string,
currentPhase: string,
progress: Progress,
commits: number
}
}
// autopilot_resume
{
name: "autopilot_resume",
description: "Resume interrupted workflow",
parameters: {
runId?: string // defaults to current
},
returns: {
resumed: boolean,
nextAction: NextAction
}
}
MCP tool usage example (Claude Code session):
// AI agent calls MCP tools
const { runId, nextAction } = await mcp.autopilot_start({ taskId: "1" })
while (nextAction.action !== "complete") {
const action = await mcp.autopilot_next()
if (action.action === "red") {
// AI generates tests
const tests = await generateTests(action.subtask, action.context)
await writeFiles(tests)
// AI runs tests (using project's test command)
const testOutput = await runCommand("npm test") // or pytest, go test, etc.
const results = parseTestOutput(testOutput)
// Report results to TaskMaster
const validation = await mcp.autopilot_complete_phase({
phase: "red",
subtaskId: action.subtask.id,
testResults: {
passed: results.passed,
failed: results.failed,
skipped: results.skipped
}
})
if (!validation.validated) {
console.error(validation.message)
// Handle validation failure
}
}
if (action.action === "green") {
// AI implements code
const impl = await implementCode(action.subtask, action.context)
await writeFiles(impl)
// AI runs tests again
const testOutput = await runCommand("npm test")
const results = parseTestOutput(testOutput)
const coverage = parseCoverage(testOutput)
// Report results to TaskMaster
const validation = await mcp.autopilot_complete_phase({
phase: "green",
subtaskId: action.subtask.id,
testResults: {
passed: results.passed,
failed: results.failed
},
coverage: coverage.lines
})
if (!validation.validated) {
console.log(validation.message, validation.suggestion)
// Retry or handle failure
}
}
if (action.action === "commit") {
// TaskMaster creates the commit
const { commitSha, nextAction: next } = await mcp.autopilot_commit({
subtaskId: action.subtask.id
})
nextAction = next
}
}
11. AI Agent Instructions (CLAUDE.md integration)
Add to .claude/CLAUDE.md or .cursor/rules/:
## TaskMaster Autonomous Workflow
When working on tasks with `tm autopilot`:
1. **Start workflow:**
```bash
tm autopilot start <taskId>
```
2. **Loop until complete:**
```bash
# Get next action
NEXT=$(tm autopilot next --json)
ACTION=$(echo $NEXT | jq -r '.action')
SUBTASK=$(echo $NEXT | jq -r '.subtask.id')
case $ACTION in
red)
# 1. Generate tests based on instructions
# 2. Write test files
# 3. Run tests yourself (you know the test command)
npm test # or pytest, go test, cargo test, etc.
# 4. Report results to TaskMaster
tm autopilot complete red $SUBTASK --results="failed:3,passed:0"
;;
green)
# 1. Implement code to pass tests
# 2. Write implementation files
# 3. Run tests yourself
npm test
# 4. Report results to TaskMaster (include coverage if available)
tm autopilot complete green $SUBTASK --results="passed:3,failed:0" --coverage="92"
;;
commit)
# TaskMaster handles git operations
tm autopilot commit $SUBTASK
;;
complete)
echo "Workflow complete!"
;;
esac
```
3. **State is preserved** - you can stop/resume anytime with `tm autopilot resume`
**Important:** You are responsible for:
- Running test commands (TaskMaster doesn't know your test framework)
- Parsing test output (passed/failed counts)
- Reporting accurate results
**Via MCP:**
Use `autopilot_*` tools for the same workflow with better integration.
Example AI agent prompt:
You are working autonomously on Task Master tasks using the autopilot workflow.
Instructions:
1. Call `tm autopilot next --json` to get your next action
2. Read the action type and context
3. Execute the action:
- **RED**:
* Generate tests that fail initially
* Run tests: `npm test` (or appropriate test command)
* Report: `tm autopilot complete red <id> --results="failed:n,passed:n"`
- **GREEN**:
* Implement code to pass the tests
* Run tests: `npm test`
* Report: `tm autopilot complete green <id> --results="passed:n,failed:n" --coverage="nn"`
- **COMMIT**:
* Call: `tm autopilot commit <id>` (TaskMaster handles git)
4. Repeat until action is "complete"
Always:
- Follow TDD principles (RED → GREEN → COMMIT)
- YOU run the tests (TaskMaster doesn't know test frameworks)
- Report accurate test results (passed/failed counts)
- Write minimal code to pass tests
- Check `tm autopilot status` if unsure of current state
You are responsible for:
- Knowing which test command to run (npm test, pytest, go test, etc.)
- Parsing test output to get pass/fail counts
- Understanding the project's testing framework
- Running tests after generating/implementing code
TaskMaster is responsible for:
- Validating your reported results make sense (RED should fail, GREEN should pass)
- Creating properly formatted git commits
- Managing workflow state and transitions
Success Criteria
- Can execute a simple task end-to-end without manual intervention
- All commits made on feature branch, never on default branch
- Tests are generated before implementation (RED → GREEN order enforced)
- Only commits when tests pass and coverage meets threshold
- Run state is persisted and can be inspected post-run
- Clear error messages when things go wrong
- Orchestrator events allow CLI to show live progress
Configuration
Add to .taskmaster/config.json (versioned):
{
"autopilot": {
"enabled": true,
"requireCleanWorkingTree": true,
"commitTemplate": "{type}({scope}): {msg} (task {taskId})",
"defaultCommitType": "feat",
"maxGreenAttempts": 3,
"testTimeout": 300000,
"storage": {
"location": "global",
"basePath": "~/.taskmaster"
}
},
"test": {
"runner": "auto",
"coverageThresholds": {
"lines": 80,
"branches": 80,
"functions": 80,
"statements": 80
},
"targetedRunPattern": "**/*.test.js"
},
"git": {
"branchPattern": "{tag}/task-{id}-{slug}",
"defaultRemote": "origin"
}
}
Update .gitignore (keep project clean):
# TaskMaster runtime artifacts (stored globally, not needed in repo)
.taskmaster/state/
.taskmaster/reports/
# Keep these versioned
!.taskmaster/tasks/
!.taskmaster/config.json
!.taskmaster/docs/
!.taskmaster/templates/
Out of Scope (defer to Phase 2)
- PR creation (gh integration)
- Resume functionality (
--resumeflag) - Lint/format step
- Multiple executor support (only Claude)
Implementation Order
Phase 1A: Infrastructure (Week 1)
- Global storage utilities (path normalization, run ID generation)
- Activity log writer (append-only JSONL)
- State manager (load/save/update state.json)
- GitAdapter with safety checks
- TestResultValidator (validate RED/GREEN phase results)
Phase 1B: Orchestration Core (Week 1-2)
- WorkflowOrchestrator state machine skeleton
- State transitions (Preflight → BranchSetup → SubtaskLoop → Finalize)
- Event emitter for activity logging
- Enhanced commit message generator
Phase 1C: TDD Loop (Week 2)
- RED phase validator (ensure tests fail)
- GREEN phase validator (ensure tests pass)
- COMMIT phase implementation (staging, committing)
- Subtask progression logic
Phase 1D: CLI Interface (Week 2-3)
tm autopilot startcommandtm autopilot nextcommandtm autopilot completecommandtm autopilot commitcommandtm autopilot statuscommandtm autopilot resumecommand
Phase 1E: MCP Interface (Week 3)
autopilot_starttoolautopilot_nexttoolautopilot_complete_phasetoolautopilot_committoolautopilot_statustoolautopilot_resumetool
Phase 1F: Integration (Week 3)
- AI agent instruction templates
- Error handling and recovery
- Integration tests
- Documentation
Testing Strategy
- Unit tests for global storage (path normalization, state management)
- Unit tests for activity log (JSONL append, parsing)
- Unit tests for each adapter (mock git/test commands)
- Integration tests with real git repo (temporary directory)
- End-to-end test with sample task in test project
- Verify no commits on default branch (security test)
- Verify commit gating works (force test failure, ensure no commit)
- Verify enhanced commit messages include task context
- Test resume from state checkpoint
- Verify project directory stays clean (no runtime artifacts)
Dependencies
- Phase 0 completed (CLI skeleton, preflight checks)
- Existing TaskService and executor infrastructure
- Surgical Test Generator prompt file exists
Estimated Effort
2-3 weeks
Risks & Mitigations
-
Risk: Test generation produces invalid/wrong tests
- Mitigation: Use Surgical Test Generator prompt, add manual review step in early iterations
-
Risk: Implementation attempts timeout/fail repeatedly
- Mitigation: Max attempts with pause/resume; store state for manual intervention
-
Risk: Coverage parsing fails on different test frameworks
- Mitigation: Start with one framework (vitest), add parsers incrementally
-
Risk: Git operations fail (conflicts, permissions)
- Mitigation: Detailed error messages, save state before destructive ops
Validation
Test with:
- Simple task (1 subtask, clear requirements)
- Medium task (3 subtasks with dependencies)
- Task requiring multiple GREEN attempts
- Task with dirty working tree (should error)
- Task on default branch (should error)
- Project without test command (should error with helpful message)
- Verify global storage created in
~/.taskmaster/projects/<project>/ - Verify activity log is valid JSONL and streamable
- Verify state.json allows resumption
- Verify commit messages include task metadata
- Verify project directory contains no runtime artifacts after run
- Test with multiple worktrees (independent state per worktree)