Technical Overview
semtest-runner is a pipeline-based CLI tool. Data flows forward through discrete stages, each transforming it into the input the next stage needs. The mental model: load config, find test files, construct prompts, invoke LLMs, parse responses, validate results, generate reports, print a summary. Each stage is a separate module with clear inputs and outputs — the config module produces a SemtestConfig, discovery produces SemanticTest[], the runner produces RunResult, and so on. This makes the codebase easy to navigate: if you want to understand how prompts work, you read src/prompt/builder.ts. If you want to understand how LLM output is parsed, you read src/parser/result.ts.
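As a rough illustration of this staged design, the sketch below models three of the stages as plain functions whose output type matches the next stage's input type. The field names (`testDir`, `total`), the `Stage` and `pipe` helpers, and the stub bodies are assumptions made for the example; they are not taken from the semtest-runner source.

```typescript
// Hypothetical, simplified shapes. The real types live in the modules named above.
interface SemtestConfig { testDir: string }
interface SemanticTest { name: string; path: string; content: string }
interface RunResult { status: "pass" | "fail" | "error"; total: number }

// A stage is a function from the previous stage's output to the next stage's input.
type Stage<In, Out> = (input: In) => Out;

// Stub stages: each one only demonstrates the type it consumes and produces.
const loadConfig: Stage<string, SemtestConfig> = (testDir) => ({ testDir });

const discover: Stage<SemtestConfig, SemanticTest[]> = (config) => [
  { name: "login", path: `${config.testDir}/login.md`, content: "# Login" },
];

const run: Stage<SemanticTest[], RunResult> = (tests) => ({
  status: tests.length > 0 ? "pass" : "error",
  total: tests.length,
});

// Compose two stages; the compiler rejects stages whose types do not line up.
function pipe<A, B, C>(f: Stage<A, B>, g: Stage<B, C>): Stage<A, C> {
  return (input) => g(f(input));
}

const pipeline = pipe(pipe(loadConfig, discover), run);
const result = pipeline("tests");
console.log(result);
```

Because every stage boundary is a named type, swapping out one stage only requires that the replacement keep the same input and output types.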
Execution flow
When a user runs semtest run, the following pipeline executes:
- CLI parses flags and arguments via Commander
- Config Loader finds and validates `semtest.config.ts` using jiti + Zod
- Discovery scans the test directory for `.md` files (or resolves specific file paths)
- Execute Loop iterates over each test sequentially:
  - Prompt Builder constructs the full LLM prompt from the test file content
  - Adapter Registry resolves the correct CLI adapter (claude, codex, gemini, opencode)
  - Model Resolver maps the capability level (high/balanced/fast) to a concrete model ID
  - Process Spawner invokes the LLM CLI as a child process
  - Parser extracts JSON results from the raw LLM output (with fallback strategies)
  - Retries up to 3 times on empty responses
- Validation checks for duplicate IDs, missing IDs, and invalid results
- Reports are generated (Markdown + JSON) and written to the output directory
- Terminal Output prints a summary with colour-coded results
- Exit code is determined: 0 = pass, 1 = fail, 2 = error
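The retry, parsing, and exit-code steps above can be sketched as follows. This is a minimal sketch under stated assumptions: "up to 3 times" is read as three attempts in total, the single fallback shown (slicing from the first `{` to the last `}`) is only one plausible strategy, and the names `invokeWithRetry`, `extractJson`, and `EXIT_CODES` are hypothetical rather than the tool's actual API. The LLM call is injected as a function so the example runs without spawning a process.

```typescript
type Status = "pass" | "fail" | "error";

// Exit codes as listed in the pipeline description.
const EXIT_CODES: Record<Status, number> = { pass: 0, fail: 1, error: 2 };

// Assumption: "retries up to 3 times" means three attempts in total.
const MAX_ATTEMPTS = 3;

// Call the LLM until it returns non-empty output or the attempts run out.
function invokeWithRetry(invoke: () => string): { output: string; attempts: number } {
  let output = "";
  let attempts = 0;
  while (attempts < MAX_ATTEMPTS) {
    attempts += 1;
    output = invoke().trim();
    if (output !== "") break;
  }
  return { output, attempts };
}

// Try strict JSON first, then fall back to the outermost {...} span.
// The fallback is illustrative; the real parser may use different strategies.
function extractJson(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    // Not pure JSON; fall through to the fallback below.
  }
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;
  try {
    return JSON.parse(raw.slice(start, end + 1));
  } catch {
    return undefined;
  }
}

// Simulated LLM: the first reply is empty, the second wraps JSON in prose.
const replies = ["", 'Here you go: {"id": "t1", "result": "pass"}'];
let call = 0;
const { output, attempts } = invokeWithRetry(() => replies[call++] ?? "");

const parsed = extractJson(output);
const status: Status = parsed === undefined ? "error" : "pass";
const exitCode = EXIT_CODES[status];
console.log({ attempts, parsed, exitCode });
```

In this simulated run the first empty reply triggers one retry, the fallback recovers the JSON object from the surrounding prose, and the exit code resolves to 0.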
Module diagram
The diagram below shows how modules connect. Dashed lines indicate optional or feedback paths.
Legend
| Colour | Category | Modules |
|---|---|---|
| Blue | Entry point | CLI |
| Amber | Core pipeline | Config, Discovery, Execute Loop, Prompt Builder, Adapter Registry, Model Resolver, Process Spawner, Parser, Validation |
| Green | Output layer | MD Report, JSON Report, Terminal Output, Debug Output |
Key types
The most important types flow through the pipeline:
| Type | Module | Purpose |
|---|---|---|
| `SemtestConfig` | `config/schema` | Validated configuration object |
| `SemanticTest` | `discovery/tests` | Discovered test file with name, path, and content |
| `CommandSpec` | `runner/types` | Command + args + optional stdin for a CLI invocation |
| `RunnerAdapter` | `runner/types` | Interface each LLM adapter implements |
| `TestResult` | `parser/result` | Parsed result for a single test scenario |
| `TestRunResult` | `runner/execute` | Result tied to its source file |
| `RunResult` | `runner/execute` | Full run output with summary and status |
| `CIResult` | `report/json` | JSON report shape consumed by CI |
| `ValidationResult` | `validation/results` | Validation issues found post-run |
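To make a few of these types concrete, here is a hedged sketch of how `SemanticTest`, `CommandSpec`, and `RunnerAdapter` might look, based only on the descriptions in the table. The exact field names, the `buildCommand` method, the `resolveModel` helper, the `example-llm` command with its `--model` flag, and the model IDs are all hypothetical; none of them describe a real CLI or the actual semtest-runner definitions.

```typescript
// Capability levels named in the pipeline description.
type Capability = "high" | "balanced" | "fast";

// Discovered test file with name, path, and content.
interface SemanticTest {
  name: string;
  path: string;
  content: string;
}

// Command + args + optional stdin for a CLI invocation.
interface CommandSpec {
  command: string;
  args: string[];
  stdin?: string;
}

// Hypothetical adapter shape: turns a prompt and a model ID into a command.
interface RunnerAdapter {
  name: string;
  buildCommand(prompt: string, model: string): CommandSpec;
}

// Hypothetical model table for a made-up adapter.
const MODEL_IDS: Record<Capability, string> = {
  high: "example-large",
  balanced: "example-medium",
  fast: "example-small",
};

function resolveModel(capability: Capability): string {
  return MODEL_IDS[capability];
}

// A made-up adapter; "example-llm" and "--model" are placeholders, not a real CLI.
const exampleAdapter: RunnerAdapter = {
  name: "example",
  buildCommand: (prompt, model) => ({
    command: "example-llm",
    args: ["--model", model],
    stdin: prompt,
  }),
};

const loginTest: SemanticTest = {
  name: "login",
  path: "tests/login.md",
  content: "Check that login works",
};

const spec = exampleAdapter.buildCommand(loginTest.content, resolveModel("fast"));
console.log(spec);
```

Keeping `CommandSpec` as plain data means the process spawner does not need to know which adapter produced it.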