Technical Overview

semtest-runner is a pipeline-based CLI tool. Data flows forward through discrete stages, each transforming it into the input the next stage needs. The mental model: load config, find test files, construct prompts, invoke LLMs, parse responses, validate results, generate reports, print a summary. Each stage is a separate module with clear inputs and outputs — the config module produces a SemtestConfig, discovery produces SemanticTest[], the runner produces RunResult, and so on. This makes the codebase easy to navigate: if you want to understand how prompts work, you read src/prompt/builder.ts. If you want to understand how LLM output is parsed, you read src/parser/result.ts.
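
The "one stage feeds the next" idea can be sketched with plain typed functions. This is a minimal illustration, not the project's actual code: the `Stage` type, the `pipe` helper, the stub stages, and every field name (`testDir`, `status`, `total`) are assumptions made for the example. Only the type names `SemtestConfig`, `SemanticTest`, and `RunResult` come from the description above.

```typescript
// Hypothetical, simplified shapes; the real types will carry more fields.
interface SemtestConfig { testDir: string }
interface SemanticTest { name: string; path: string; content: string }
interface RunResult { status: "pass" | "fail" | "error"; total: number }

// A stage is a function from the previous stage's output to its own output.
type Stage<In, Out> = (input: In) => Out;

// Compose two stages; the compiler rejects stages whose types do not line up.
function pipe<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return (input) => second(first(input));
}

// Stub stages standing in for the config, discovery, and runner modules.
const loadConfig: Stage<string, SemtestConfig> = (cwd) => ({
  testDir: `${cwd}/tests`,
});

const discover: Stage<SemtestConfig, SemanticTest[]> = (config) => [
  { name: "login", path: `${config.testDir}/login.md`, content: "# Login" },
];

const run: Stage<SemanticTest[], RunResult> = (tests) => ({
  status: "pass",
  total: tests.length,
});

// Data flows forward: cwd -> SemtestConfig -> SemanticTest[] -> RunResult.
const pipeline = pipe(pipe(loadConfig, discover), run);
const result = pipeline("/repo");
```

Swapping `discover` and `run` in the `pipe` calls would fail to compile, which is the practical benefit of giving each stage an explicit input and output type.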

When a user runs semtest run, the following pipeline executes:

  1. CLI parses flags and arguments via Commander
  2. Config Loader finds and validates semtest.config.ts using jiti + Zod
  3. Discovery scans the test directory for .md files (or resolves specific file paths)
  4. Execute Loop iterates over each test sequentially:
    • Prompt Builder constructs the full LLM prompt from the test file content
    • Adapter Registry resolves the correct CLI adapter (claude, codex, gemini, opencode)
    • Model Resolver maps the capability level (high/balanced/fast) to a concrete model ID
    • Process Spawner invokes the LLM CLI as a child process
    • Parser extracts JSON results from the raw LLM output (with fallback strategies)
    • The invocation is retried up to 3 times on empty responses
  5. Validation checks for duplicate IDs, missing IDs, and invalid results
  6. Reports are generated (Markdown + JSON) and written to the output directory
  7. Terminal Output prints a summary with colour-coded results
  8. Exit code is determined: 0 = pass, 1 = fail, 2 = error
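
Three of the steps above (fallback parsing, retrying on empty output, and exit-code mapping) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the `TestResult` fields, and the two parsing strategies are hypothetical, and the sketch reads "retries up to 3 times" as three retries after the first attempt. The LLM call is modelled as a synchronous callback for brevity; a real child-process invocation would be asynchronous.

```typescript
// Hypothetical result shape, for illustration only.
interface TestResult { id: string; result: string }

// Parse text as JSON and accept it only if it is an array.
function tryParse(text: string): TestResult[] | null {
  try {
    const value: unknown = JSON.parse(text);
    return Array.isArray(value) ? (value as TestResult[]) : null;
  } catch {
    return null;
  }
}

// Assumed fallback order: whole output first, then the outermost [...] span.
function extractResults(raw: string): TestResult[] | null {
  const direct = tryParse(raw.trim());
  if (direct) return direct;
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  return start !== -1 && end > start
    ? tryParse(raw.slice(start, end + 1))
    : null;
}

// Assumption: one initial attempt plus at most three retries.
const MAX_RETRIES = 3;

function invokeWithRetry(invoke: () => string): string {
  let output = invoke();
  for (let retry = 0; retry < MAX_RETRIES && output.trim() === ""; retry++) {
    output = invoke();
  }
  return output;
}

// Exit codes as listed in step 8: 0 = pass, 1 = fail, 2 = error.
type RunStatus = "pass" | "fail" | "error";
const EXIT_CODES: Record<RunStatus, number> = { pass: 0, fail: 1, error: 2 };
```

Keeping the retry wrapper separate from the parser means an empty response is retried before any parsing is attempted, while a non-empty but unparseable response falls through to the fallback strategies.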

The diagram below shows how modules connect. Dashed lines indicate optional or feedback paths.

[Diagram: module connections between CLI (cli.ts), Config Loader, Discovery, Prompt Builder, Execute Loop, Adapter Registry, Model Resolver, Process Spawner, Parser, Validation, MD Report, JSON Report, Terminal Output, and Debug Output]

| Colour | Category      | Modules |
|--------|---------------|---------|
| Blue   | Entry point   | CLI |
| Amber  | Core pipeline | Config, Discovery, Execute Loop, Prompt Builder, Adapter Registry, Model Resolver, Process Spawner, Parser, Validation |
| Green  | Output layer  | MD Report, JSON Report, Terminal Output, Debug Output |

The most important types flow through the pipeline:

| Type             | Module             | Purpose |
|------------------|--------------------|---------|
| SemtestConfig    | config/schema      | Validated configuration object |
| SemanticTest     | discovery/tests    | Discovered test file with name, path, and content |
| CommandSpec      | runner/types       | Command + args + optional stdin for a CLI invocation |
| RunnerAdapter    | runner/types       | Interface each LLM adapter implements |
| TestResult       | parser/result      | Parsed result for a single test scenario |
| TestRunResult    | runner/execute     | Result tied to its source file |
| RunResult        | runner/execute     | Full run output with summary and status |
| CIResult         | report/json        | JSON report shape consumed by CI |
| ValidationResult | validation/results | Validation issues found post-run |
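
To make the relationship between TestResult and ValidationResult more concrete, here is a minimal sketch of a post-run check for duplicate IDs, missing IDs, and invalid results. The field names, the `validate` function, its `expectedIds` parameter, and the set of allowed result values are all assumptions for illustration; the actual shapes in parser/result and validation/results may differ.

```typescript
// Hypothetical shapes; field names are assumptions, not the project's types.
interface TestResult { id: string; result: string }

interface ValidationResult {
  duplicateIds: string[];
  missingIds: string[];
  invalidResults: string[]; // IDs whose result value is not recognised
}

// Assumed set of valid result values.
const ALLOWED_RESULTS = new Set<string>(["pass", "fail"]);

function validate(
  expectedIds: string[],
  results: TestResult[],
): ValidationResult {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  const invalidResults: string[] = [];

  for (const r of results) {
    // An ID reported more than once is recorded once as a duplicate.
    if (seen.has(r.id)) duplicates.add(r.id);
    seen.add(r.id);
    if (!ALLOWED_RESULTS.has(r.result)) invalidResults.push(r.id);
  }

  // Expected IDs that never appeared in the parsed output.
  const missingIds = expectedIds.filter((id) => !seen.has(id));

  return { duplicateIds: Array.from(duplicates), missingIds, invalidResults };
}

const report = validate(
  ["A", "B", "C"],
  [
    { id: "A", result: "pass" },
    { id: "A", result: "fail" },
    { id: "B", result: "maybe" },
  ],
);
```

In this example `report` flags A as a duplicate, C as missing, and B as having an invalid result value.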