Runner

Source files: src/runner/types.ts, src/runner/adapters/, src/runner/models.ts, src/runner/execute.ts

The runner invokes LLM CLIs as child processes. Each CLI has a different interface: Claude takes the prompt as a positional argument (claude --print "prompt"), while Codex, Gemini, and OpenCode read it from stdin. The adapter pattern hides these differences behind a common RunnerAdapter interface, so the execution loop does not need to know which LLM it is talking to. It calls adapter.buildCommand(prompt, model) and gets back a uniform CommandSpec that the process spawner can execute.

Tests run sequentially because LLM CLIs are rate-limited, so parallel invocation would trigger throttling or errors. Each test goes through the full cycle (build prompt, spawn process, capture output, parse results) before the next one starts.

Types

`CommandSpec`

Describes how to invoke an LLM CLI:

interface CommandSpec {
  command: string;   // CLI binary name (e.g. "claude")
  args: string[];    // Command-line arguments
  stdin?: string;    // Optional stdin input
}

`RunnerAdapter`

Interface every adapter implements:

interface RunnerAdapter {
  buildCommand(prompt: string, model?: string): CommandSpec;
  parseRawOutput(raw: string): string;
}

Adapter pattern

Each supported LLM CLI has a dedicated adapter in src/runner/adapters/:

Adapter	CLI	Prompt delivery	`parseRawOutput` behaviour
`claude.ts`	`claude`	Positional arg (`--print <prompt>`)	Extracts `.result` or `.text` from JSON wrapper, falls back to raw
`codex.ts`	`codex`	stdin	Pass-through
`gemini.ts`	`gemini`	stdin	Pass-through
`opencode.ts`	`opencode`	stdin	Pass-through
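
The two delivery styles in the table can be sketched as follows. This is a minimal illustration, not the project's actual adapter code: the `--model` flag on the Claude command and the empty argument list on the stdin-based adapter are assumptions, since this document only shows `--print <prompt>` and does not list the other CLIs' flags.

```typescript
interface CommandSpec {
  command: string;
  args: string[];
  stdin?: string;
}

interface RunnerAdapter {
  buildCommand(prompt: string, model?: string): CommandSpec;
  parseRawOutput(raw: string): string;
}

// Positional-argument style: the prompt travels in args, stdin stays unset.
const claudeAdapter: RunnerAdapter = {
  buildCommand(prompt, model) {
    // Assumption: the model is selected with a --model flag.
    const modelArgs = model ? ["--model", model] : [];
    return { command: "claude", args: ["--print", prompt, ...modelArgs] };
  },
  parseRawOutput(raw) {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (parsed !== null && typeof parsed === "object") {
        const wrapper = parsed as Record<string, unknown>;
        if (typeof wrapper.result === "string") return wrapper.result;
        if (typeof wrapper.text === "string") return wrapper.text;
      }
    } catch {
      // Not JSON: fall through and return the raw output unchanged.
    }
    return raw;
  },
};

// stdin style: the prompt travels in stdin, output is passed through as-is.
const codexAdapter: RunnerAdapter = {
  buildCommand(prompt) {
    // Real flags (for example, model selection) are not documented here,
    // so none are guessed in this sketch.
    return { command: "codex", args: [], stdin: prompt };
  },
  parseRawOutput(raw) {
    return raw;
  },
};
```

Because both objects satisfy RunnerAdapter, the caller handles the returned CommandSpec the same way regardless of where the prompt ended up.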

Adapter registry

src/runner/adapters/index.ts maps runner names to adapter instances:

const adapterRegistry: Record<string, RunnerAdapter> = {
  claude: claudeAdapter,
  codex: codexAdapter,
  gemini: geminiAdapter,
  opencode: opencodeAdapter,
};

function resolveAdapter(runner: string): RunnerAdapter

resolveAdapter throws an error listing the supported runners if the name is unknown.
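
A lookup of this kind might look like the sketch below. The stub adapters and the exact wording of the error message are placeholders for illustration; only the registry shape and the resolveAdapter signature come from this document.

```typescript
interface CommandSpec {
  command: string;
  args: string[];
  stdin?: string;
}

interface RunnerAdapter {
  buildCommand(prompt: string, model?: string): CommandSpec;
  parseRawOutput(raw: string): string;
}

// Hypothetical stub so the sketch is self-contained; real adapters differ.
const stubAdapter = (command: string): RunnerAdapter => ({
  buildCommand: (prompt) => ({ command, args: [], stdin: prompt }),
  parseRawOutput: (raw) => raw,
});

const adapterRegistry: Record<string, RunnerAdapter> = {
  claude: stubAdapter("claude"),
  codex: stubAdapter("codex"),
  gemini: stubAdapter("gemini"),
  opencode: stubAdapter("opencode"),
};

function resolveAdapter(runner: string): RunnerAdapter {
  // Own-property check so names like "toString" are not treated as runners.
  const known = Object.prototype.hasOwnProperty.call(adapterRegistry, runner);
  if (!known) {
    const supported = Object.keys(adapterRegistry).join(", ");
    throw new Error(`Unknown runner "${runner}". Supported runners: ${supported}`);
  }
  return adapterRegistry[runner];
}
```

Building the list of supported names from the registry keys keeps the error message in sync when a new adapter is registered.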

Model matrix

src/runner/models.ts maps each (runner, capability) pair to a concrete model ID:

Runner	`high`	`balanced`	`fast`
claude	`claude-opus-4-6`	`claude-sonnet-4-6`	`claude-haiku-4-5-20251001`
codex	`o3`	`o4-mini`	`gpt-4.1-mini`
gemini	`gemini-2.5-pro`	`gemini-2.5-flash`	`gemini-2.5-flash-lite`
opencode	`o3`	`o4-mini`	`gpt-4.1-mini`
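
One way to express this matrix in code is a nested record keyed by runner and capability. The names `Runner`, `Capability`, `MODEL_MATRIX`, and `resolveModel` are illustrative and may not match what src/runner/models.ts exports; the model IDs are copied from the table above.

```typescript
type Runner = "claude" | "codex" | "gemini" | "opencode";
type Capability = "high" | "balanced" | "fast";

// Typing the matrix as a full Record makes the compiler reject a missing cell.
const MODEL_MATRIX: Record<Runner, Record<Capability, string>> = {
  claude: {
    high: "claude-opus-4-6",
    balanced: "claude-sonnet-4-6",
    fast: "claude-haiku-4-5-20251001",
  },
  codex: { high: "o3", balanced: "o4-mini", fast: "gpt-4.1-mini" },
  gemini: {
    high: "gemini-2.5-pro",
    balanced: "gemini-2.5-flash",
    fast: "gemini-2.5-flash-lite",
  },
  opencode: { high: "o3", balanced: "o4-mini", fast: "gpt-4.1-mini" },
};

function resolveModel(runner: Runner, capability: Capability): string {
  return MODEL_MATRIX[runner][capability];
}
```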

Execution loop

executeTests() in src/runner/execute.ts is the core orchestrator:

1. Resolves the adapter and model from config
2. Iterates over tests sequentially (LLM CLIs are rate-limited)
3. For each test:
   - Builds the prompt
   - Builds the CLI command via the adapter
   - Spawns the process (src/utils/process.ts)
   - Parses the response
   - Retries up to 3 times on empty responses
   - Fires progress callbacks for terminal output
4. Aggregates all results into a RunResult
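
The per-test part of the loop can be sketched roughly as below. This is not the code in execute.ts: the `Spawn` type stands in for the spawner in src/utils/process.ts, `runSequentially` and `TestOutcome` are hypothetical names, and "up to 3 times" is read here as one initial attempt plus at most three retries, which is an assumption.

```typescript
interface CommandSpec {
  command: string;
  args: string[];
  stdin?: string;
}

interface RunnerAdapter {
  buildCommand(prompt: string, model?: string): CommandSpec;
  parseRawOutput(raw: string): string;
}

// Hypothetical stand-in for the process spawner: runs a command, resolves with stdout.
type Spawn = (spec: CommandSpec) => Promise<string>;

interface TestOutcome {
  name: string;
  response: string;
  attempts: number;
}

const MAX_RETRIES = 3; // assumption: retries in addition to the first attempt

async function runSequentially(
  tests: { name: string; prompt: string }[],
  adapter: RunnerAdapter,
  spawn: Spawn,
  model?: string,
): Promise<TestOutcome[]> {
  const outcomes: TestOutcome[] = [];
  // A plain for...of with await keeps exactly one CLI process in flight at a time.
  for (const test of tests) {
    const spec = adapter.buildCommand(test.prompt, model);
    let response = "";
    let attempts = 0;
    while (attempts <= MAX_RETRIES) {
      attempts += 1;
      response = adapter.parseRawOutput(await spawn(spec)).trim();
      if (response !== "") break; // only empty responses trigger a retry
    }
    outcomes.push({ name: test.name, response, attempts });
  }
  return outcomes;
}
```

Passing the spawner in as a parameter is a choice made for this sketch so the loop can be exercised with a fake process in tests.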

`RunResult`

interface RunResult {
  tests: TestRunResult[];
  summary: {
    total: number;
    passed: number;
    failed: number;
    errored: number;
    invalid: number;
    skipped: number;
  };
  status: "pass" | "fail" | "error";
  timestamp: string;
}

Status precedence

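The overall run status follows the precedence error > fail > pass: any errored test makes the run error; otherwise any failed test makes it fail; the run is pass only when there are neither. Invalid and skipped results don’t affect the overall status.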
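
Expressed as code, the precedence rule might look like this. The function name `overallStatus` is hypothetical, and deriving the status from the summary counts (rather than from the individual test results) is an assumption made for brevity.

```typescript
type RunStatus = "pass" | "fail" | "error";

interface RunSummary {
  total: number;
  passed: number;
  failed: number;
  errored: number;
  invalid: number;
  skipped: number;
}

function overallStatus(summary: RunSummary): RunStatus {
  if (summary.errored > 0) return "error"; // highest precedence
  if (summary.failed > 0) return "fail";
  return "pass"; // invalid and skipped counts are deliberately ignored
}
```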

Progress callbacks

The execution loop accepts optional callbacks for live terminal feedback:

Callback	Fires when
`onTestStart`	A test begins execution
`onTestRetry`	An empty response triggers a retry
`onTestComplete`	A test finishes (success or failure)
`onDebugOutput`	Raw stdout/stderr is captured (debug mode only)
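
As a rough illustration of how a caller might wire these up, the sketch below builds a set of callbacks that write one line per event. The callback names come from the table above, but their parameter lists are not given in this document, so every signature here, along with `ProgressCallbacks` and `makeLineReporter`, is hypothetical.

```typescript
// Hypothetical signatures: the real callbacks may receive different arguments.
interface ProgressCallbacks {
  onTestStart?: (testName: string) => void;
  onTestRetry?: (testName: string, attempt: number) => void;
  onTestComplete?: (testName: string, status: string) => void;
  onDebugOutput?: (raw: string) => void;
}

// Builds callbacks that report each event as a single line of text.
function makeLineReporter(
  write: (line: string) => void,
  debug = false,
): ProgressCallbacks {
  return {
    onTestStart: (name) => write(`start ${name}`),
    onTestRetry: (name, attempt) => write(`retry ${name} (attempt ${attempt})`),
    onTestComplete: (name, status) => write(`done ${name}: ${status}`),
    // Left undefined outside debug mode, so the loop has nothing to call.
    onDebugOutput: debug ? (raw) => write(`debug ${raw}`) : undefined,
  };
}

// Usage: print progress lines to the terminal.
const reporter = makeLineReporter((line) => console.log(line));
reporter.onTestStart?.("example-test");
```

Since every callback is optional, the caller invokes them with optional chaining, as in the usage line.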