
What is semtest-runner?

Verifying that a codebase meets high-level specifications — architecture decisions, naming conventions, structural patterns, design constraints — is manual and subjective. Code review catches some of it. Tribal knowledge covers the rest. Nothing enforces it systematically.

Traditional tests can’t help here. You can’t write a unit test for “all adapters should implement the RunnerAdapter interface” or “the config schema should use Zod with runtime validation.” These are semantic properties of the codebase, not function input/output pairs.

semtest-runner codifies those specifications as test files (any format — Markdown, plain text, PDF, JSON, or any file type an LLM can parse), then uses an LLM CLI to read the codebase and evaluate each specification programmatically.

The workflow:

  1. Write expectations in a Markdown file (plain English, structured with headings)
  2. Run semtest run
  3. The tool discovers all test files, constructs a prompt for each one, invokes an LLM CLI, and parses the structured results
  4. Reports are generated — a human-readable Markdown report and a machine-readable JSON report for CI
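
Step 3 has to cope with LLM CLIs that do not always return bare JSON. The object may be wrapped in a Markdown fence or surrounded by prose. The sketch below shows one way a resilient parser can handle that. It is an illustration, not the code in src/parser/. The function name parseLlmJson and its three fallback strategies are assumptions.

```typescript
// Hypothetical sketch of resilient JSON extraction from LLM CLI stdout.
// Strategies are tried in order; returns null when none yields valid JSON.
const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence in this snippet

// Returns the parsed value, or undefined if the text is not valid JSON.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

export function parseLlmJson(stdout: string): unknown {
  const trimmed = stdout.trim();

  // 1. The whole output is already JSON.
  const direct = tryParse(trimmed);
  if (direct !== undefined) return direct;

  // 2. JSON wrapped in a Markdown code fence, optionally tagged "json".
  const fenced = new RegExp(`${FENCE}(?:json)?\\s*([\\s\\S]*?)${FENCE}`).exec(trimmed);
  if (fenced) {
    const inner = tryParse((fenced[1] ?? "").trim());
    if (inner !== undefined) return inner;
  }

  // 3. A JSON object embedded in prose: take the outermost pair of braces.
  const start = trimmed.indexOf("{");
  const end = trimmed.lastIndexOf("}");
  if (start !== -1 && end > start) {
    const embedded = tryParse(trimmed.slice(start, end + 1));
    if (embedded !== undefined) return embedded;
  }

  return null;
}
```

Trying the strict parse first keeps well-behaved output on the fast path. The brace-slicing fallback is the most permissive strategy, so it runs last.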

For example, a test file named config-schema.md might look like this:

    # Config Schema Validation

    ## Validates runner names

    The config schema should restrict the `llm.runner` field to exactly
    "claude", "codex", "gemini", or "opencode". Any other value should
    fail validation.

    ## Provides defaults

    Fields like `output`, `strict`, `debug`, and `timestamp` should have
    sensible defaults so a minimal config works without specifying them.

Running semtest run over a project containing this file and two other test files produces output like:

    ✔ [1/3] config-schema.md · 2 passed
    ✗ [2/3] adapter-pattern.md · 1 passed, 1 failed
    ✔ [3/3] project-structure.md · 3 passed

    Semantic tests completed
    Report: semantic-test-results/latest.md
    CI Output: semantic-test-results/ci-results.json
    Passed: 6
    Failed: 1
    FAILED
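
The progress lines and verdict above can be derived from per-file pass/fail counts. The following is a minimal sketch that assumes a hypothetical FileResult shape. The real report and output code may differ. The PASSED verdict for a clean run is also an assumption, since only the failing case is shown.

```typescript
// Hypothetical per-file result; the real RunResult type may differ.
interface FileResult {
  file: string;
  passed: number;
  failed: number;
}

// Formats one progress line, e.g. "✔ [1/3] config-schema.md · 2 passed".
function progressLine(result: FileResult, index: number, total: number): string {
  const icon = result.failed > 0 ? "✗" : "✔";
  const counts =
    result.failed > 0
      ? `${result.passed} passed, ${result.failed} failed`
      : `${result.passed} passed`;
  return `${icon} [${index + 1}/${total}] ${result.file} · ${counts}`;
}

// Totals across all files plus the final verdict.
function summarize(results: FileResult[]): { passed: number; failed: number; verdict: string } {
  const passed = results.reduce((sum, r) => sum + r.passed, 0);
  const failed = results.reduce((sum, r) => sum + r.failed, 0);
  // "FAILED" matches the output above; "PASSED" is an assumed counterpart.
  return { passed, failed, verdict: failed > 0 ? "FAILED" : "PASSED" };
}
```

Calling `results.map((r, i) => progressLine(r, i, results.length))` on the three files above reproduces the three progress lines. `summarize()` yields the 6 passed / 1 failed totals.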

Every tool in the stack serves a specific purpose. This section explains what each one is, what role it plays in semtest-runner, and where it lives in the codebase.

TypeScript is the implementation language for the entire package. Every module, type, and interface is written in TypeScript with strict mode enabled — the compiler catches type errors at compile time rather than letting them surface at runtime.

The project is ESM-only ("type": "module" in package.json). There is no CommonJS: all imports and exports use import/export syntax exclusively. The compilation target is ES2022, so modern JavaScript features such as top-level await and private class fields are emitted as written rather than down-levelled to older syntax.

Types flow through the entire pipeline: the config schema defines the SemtestConfig type, discovery produces SemanticTest[], the runner returns RunResult, and reports consume CIResult. This end-to-end typing means a change to any data structure is caught by the compiler everywhere it’s used.
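
The stage-to-stage typing can be sketched as follows. Only the four type names come from this page. Every field, the Discover/Run function types, and runPipeline() are hypothetical. Tests run sequentially here purely for simplicity.

```typescript
// Only the type names come from the docs; every field is hypothetical.
interface SemtestConfig { testDir: string }
interface SemanticTest { path: string; content: string }
interface RunResult { test: SemanticTest; passed: number; failed: number }
interface CIResult { results: RunResult[]; passed: number; failed: number; success: boolean }

// Each stage's signature names the type it consumes and the type it produces.
type Discover = (config: SemtestConfig) => Promise<SemanticTest[]>;
type Run = (test: SemanticTest) => Promise<RunResult>;

async function runPipeline(config: SemtestConfig, discover: Discover, run: Run): Promise<CIResult> {
  const results: RunResult[] = [];
  // Sequential for simplicity: one (stubbed or real) LLM invocation per test file.
  for (const test of await discover(config)) {
    results.push(await run(test));
  }
  const passed = results.reduce((sum, r) => sum + r.passed, 0);
  const failed = results.reduce((sum, r) => sum + r.failed, 0);
  return { results, passed, failed, success: failed === 0 };
}
```

If a field such as RunResult.passed were renamed, the compiler would flag the reduce call above and every other consumer. That is the end-to-end checking this paragraph describes.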

tsup is a zero-config TypeScript bundler powered by esbuild. Its role is to take the TypeScript source in src/ and produce distributable JavaScript in dist/.

It produces two entry points:

  • dist/cli.js — the CLI binary, referenced by the bin field in package.json. This is what runs when you type semtest.
  • dist/index.js — the public API for programmatic use. This is what consumers get when they import from @thulanek/semtest-runner.

tsup also generates .d.ts type declaration files so consumers get TypeScript autocompletion even though they’re importing compiled JavaScript. Code splitting is enabled, which means shared code between the CLI and API entry points is bundled once rather than duplicated.

Why tsup over raw tsc? tsup is faster (esbuild is written in Go and transpiles without type checking, so it is far faster than tsc). It also handles ESM output correctly without extra configuration and generates both JavaScript and declarations from a single command. tsc alone can't bundle and needs more configuration to produce clean ESM output.

Config: packages/semtest-runner/tsup.config.ts
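
A tsup.config.ts matching this description might look like the sketch below. The option names (entry, format, target, dts, splitting, clean) are real tsup options. The values are inferred from the text above rather than copied from the project's file, and clean in particular is an assumption.

```typescript
// tsup.config.ts: a sketch consistent with the description above, not the project's actual file.
import { defineConfig } from "tsup";

export default defineConfig({
  // Two entry points: dist/cli.js (the bin) and dist/index.js (the public API).
  entry: {
    cli: "src/cli.ts",
    index: "src/index.ts",
  },
  format: ["esm"], // ESM-only package
  target: "es2022", // matches the TypeScript compilation target
  dts: true, // emit .d.ts declarations alongside the JavaScript
  splitting: true, // shared code goes into chunks instead of being duplicated
  clean: true, // assumption: empty dist/ before each build
});
```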

Zod is a TypeScript-first schema validation library. Its role in semtest-runner is to validate the semtest.config.ts configuration file at runtime.

Why runtime validation matters: TypeScript types are erased when code runs. The SemtestConfig type exists at compile time, but when a user’s config file is loaded at runtime, it’s just a JavaScript object that could contain anything — wrong types, invalid runner names, missing fields. Zod catches these problems at config load time with clear error messages instead of letting them cause cryptic crashes later in the pipeline.

Specific capabilities used:

  • Schema definition with defaults — a minimal config works because Zod fills in defaults for output, strict, debug, timestamp, and other optional fields
  • z.enum() — restricts llm.runner to exactly "claude" | "codex" | "gemini" | "opencode". Any other string fails validation immediately.
  • z.infer<> — generates TypeScript types directly from the schema. The schema is the single source of truth for both runtime validation and compile-time types. There’s no separate type definition that could drift out of sync.
  • Two derived types: SemtestConfig (the fully resolved output — all fields present after defaults are applied) and SemtestUserConfig (the partial input — fields are optional where defaults exist)

Where: src/config/schema.ts
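
Below is a minimal sketch of what src/config/schema.ts could contain, using Zod's z.object, z.enum, .default, z.infer and z.input. The field names come from this page. The default values are assumptions, including "semantic-test-results" for output, which is borrowed from the report paths shown earlier. The identity implementation of defineConfig() is also an assumption.

```typescript
import { z } from "zod";

// Field names come from the docs; default values are assumptions.
export const configSchema = z.object({
  llm: z.object({
    // Only these four runner names pass validation.
    runner: z.enum(["claude", "codex", "gemini", "opencode"]),
  }),
  output: z.string().default("semantic-test-results"),
  strict: z.boolean().default(false),
  debug: z.boolean().default(false),
  timestamp: z.boolean().default(false),
});

// Output type: every defaulted field is present after parsing.
export type SemtestConfig = z.infer<typeof configSchema>;
// Input type: defaulted fields are optional.
export type SemtestUserConfig = z.input<typeof configSchema>;

// Hypothetical identity helper that exists only to give config files autocompletion.
export function defineConfig(config: SemtestUserConfig): SemtestUserConfig {
  return config;
}
```

`configSchema.parse({ llm: { runner: "claude" } })` returns a full SemtestConfig with the defaults filled in. A runner such as "gpt" throws a ZodError naming the allowed values.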

Commander is a mature Node.js CLI framework. Its role is to define the semtest run command interface — the subcommand, its arguments, and its flags.

It provides:

  • The run subcommand as the primary action
  • Positional arguments for targeting specific test files (semtest run auth.md api.md)
  • Flag parsing: --strict, --debug, --timestamp, --include-passing, --skip-validation, --extensions
  • Automatic --help text generation from the command definition

Why Commander: it handles the exact pattern semtest-runner needs (a subcommand, optional positional arguments, and boolean and value flags). It is well documented and familiar to most contributors who have worked on Node.js CLI tools.

Where: src/cli.ts
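
The command surface described above could be declared roughly as follows. This sketch uses Commander's documented Command, argument() and option() API. The createProgram wrapper, the flag descriptions, and the comma-separated format of --extensions are hypothetical.

```typescript
import { Command } from "commander";

type RunHandler = (files: string[], options: Record<string, unknown>) => void;

// Sketch of the `semtest run` definition; flag names come from the docs.
export function createProgram(onRun: RunHandler): Command {
  const program = new Command().name("semtest");

  program
    .command("run")
    .description("Run semantic tests")
    .argument("[files...]", "specific test files to run (default: all)")
    .option("--strict", "enable strict mode")
    .option("--debug", "enable debug output")
    .option("--timestamp", "add timestamps to reports")
    .option("--include-passing", "include passing tests in reports")
    .option("--skip-validation", "skip post-run result validation")
    .option("--extensions <list>", "test file extensions (assumed comma-separated)")
    .action((files: string[], options: Record<string, unknown>) => onRun(files, options));

  return program;
}
```

Commander converts kebab-case flags to camelCase option keys, so --include-passing arrives as options.includePassing. In the real CLI, the program would be parsed from process.argv.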

jiti is a runtime TypeScript/ESM loader: it can load a .ts file directly, transpiling it in memory, without a separate compile step.

Its role in semtest-runner is to load semtest.config.ts at runtime. This matters because users write their config in TypeScript to get autocompletion from the defineConfig() helper. However, Node.js cannot import .ts files on its own, apart from the built-in type stripping in recent releases. jiti bridges this gap by transpiling the config on the fly when it is loaded, so no separate build step is needed for the config file.

jiti also handles .js and .mjs config files transparently, so users who prefer plain JavaScript aren’t forced into TypeScript.

Where: src/config/loader.ts — the loadConfig() function uses jiti to import whatever config file it finds.
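
The loading step might look like the sketch below, assuming the jiti v2 API (createJiti() and jiti.import() with { default: true }). The candidate file names, the search order, and the helper names findConfigFile and loadRawConfig are illustrative. The real loadConfig() may differ.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Assumed candidates, based on the .ts/.js/.mjs support described above.
const CONFIG_FILES = ["semtest.config.ts", "semtest.config.js", "semtest.config.mjs"];

// Returns the first config file that exists in `cwd`, or null.
export function findConfigFile(cwd: string): string | null {
  for (const name of CONFIG_FILES) {
    const candidate = join(cwd, name);
    if (existsSync(candidate)) return candidate;
  }
  return null;
}

// Loads the config file's default export through jiti (v2 API assumed).
// Returns undefined when no config file is found.
export async function loadRawConfig(cwd: string): Promise<unknown> {
  const file = findConfigFile(cwd);
  if (file === null) return undefined;
  // Imported lazily so jiti is only loaded when a config file exists.
  const { createJiti } = await import("jiti");
  const jiti = createJiti(file);
  return jiti.import(file, { default: true });
}
```

The value returned by loadRawConfig() is still untyped. Passing it through the Zod schema from src/config/schema.ts is what turns it into a validated SemtestConfig.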

ANSI escape codes are special character sequences that terminals interpret as colour and style instructions. semtest-runner uses them to provide coloured terminal output — green for pass, red for fail, yellow for errors, cyan for spinners, dim for secondary info, bold for the final verdict.

The project uses raw ANSI sequences directly (e.g., \x1b[32m for green) rather than a colour library like chalk or picocolors. The colors utility object wraps these sequences into named functions: colors.green(), colors.red(), colors.bold(), and so on.

The colour system respects the NO_COLOR environment variable, per the no-color.org standard, and also honours FORCE_COLOR=0. When either disables colour, all colour functions return the input string unchanged and no escape codes are emitted.

Where: src/output/progress.ts (colour definitions, spinner), src/output/terminal.ts (summary output)
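
A minimal sketch of such a colour helper follows. The createColors() factory and its env parameter are assumptions, added so the example can be tested without touching process.env. The escape sequences are standard ANSI SGR codes.

```typescript
type Env = Record<string, string | undefined>;

// ANSI SGR sequences as [open, close] pairs.
const CODES = {
  red: ["\x1b[31m", "\x1b[39m"],
  green: ["\x1b[32m", "\x1b[39m"],
  yellow: ["\x1b[33m", "\x1b[39m"],
  cyan: ["\x1b[36m", "\x1b[39m"],
  bold: ["\x1b[1m", "\x1b[22m"],
  dim: ["\x1b[2m", "\x1b[22m"],
} as const;

type ColorName = keyof typeof CODES;

// no-color.org: NO_COLOR disables colour when present and non-empty.
// FORCE_COLOR=0 is a separate, widely used opt-out convention.
export function colorEnabled(env: Env): boolean {
  const noColor = env.NO_COLOR !== undefined && env.NO_COLOR !== "";
  return !noColor && env.FORCE_COLOR !== "0";
}

export function createColors(env: Env): Record<ColorName, (text: string) => string> {
  const enabled = colorEnabled(env);
  // With colour disabled, every function returns its input unchanged.
  const wrap =
    ([open, close]: readonly [string, string]) =>
    (text: string): string =>
      enabled ? `${open}${text}${close}` : text;
  return {
    red: wrap(CODES.red),
    green: wrap(CODES.green),
    yellow: wrap(CODES.yellow),
    cyan: wrap(CODES.cyan),
    bold: wrap(CODES.bold),
    dim: wrap(CODES.dim),
  };
}
```

In the CLI this would be called once as `createColors(process.env)`. Closing each colour with its specific reset (39 for foreground, 22 for bold/dim) avoids cancelling unrelated styles, which a blanket `\x1b[0m` would do.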

The spinner provides live feedback while LLM invocations are in progress (which can take seconds to minutes per test file). It uses Unicode Braille characters (⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏) as animation frames, cycling at 80ms intervals via setInterval. This creates a smooth spinning effect by overwriting the current terminal line using \r (carriage return).

The system detects whether stdout is a TTY. In a terminal, it shows the animated spinner. In piped or CI output (where animation would produce garbage), it prints static lines instead.
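
A sketch of the spinner logic described above, using the same ten Braille frames and the 80 ms interval. The createSpinner() name, the OutStream interface, and the exact static-line format for non-TTY output are assumptions. `\x1b[2K` (erase entire line) is a standard ANSI sequence.

```typescript
const FRAMES = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];
const INTERVAL_MS = 80;

// Minimal stream shape; process.stdout satisfies it.
interface OutStream {
  isTTY?: boolean;
  write(chunk: string): unknown;
}

export function createSpinner(stream: OutStream, text: string) {
  let timer: ReturnType<typeof setInterval> | undefined;
  let frame = 0;

  return {
    start(): void {
      if (!stream.isTTY) {
        // Piped/CI output: one static line, no animation or escape codes.
        stream.write(`${text}\n`);
        return;
      }
      timer = setInterval(() => {
        // \r returns to column 0 so each frame overwrites the previous one.
        stream.write(`\r${FRAMES[frame % FRAMES.length]} ${text}`);
        frame++;
      }, INTERVAL_MS);
    },
    stop(finalLine: string): void {
      if (timer !== undefined) {
        clearInterval(timer);
        timer = undefined;
        // Erase the spinner line, then print the final result on it.
        stream.write(`\r\x1b[2K${finalLine}\n`);
      } else {
        stream.write(`${finalLine}\n`);
      }
    },
  };
}
```

Passing process.stdout works because it exposes both write() and isTTY. Tests can pass a fake stream instead.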

LLM CLIs are invoked using child_process.spawn(), which creates a subprocess running the chosen CLI tool (e.g., claude, codex, gemini).

How it works:

  • stdin: the prompt is written to the subprocess’s stdin (except Claude, which takes it as a positional argument)
  • stdout/stderr: captured as string buffers, then passed to the parser
  • Exit codes: monitored to detect CLI-level failures

Where: src/utils/process.ts — the runCommand() function takes a CommandSpec (command name, arguments, optional stdin) and returns { stdout, stderr, exitCode }.
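
A sketch of runCommand() built on child_process.spawn(). CommandSpec's fields follow the description above (command name, arguments, optional stdin). The exact type definitions and the error handling are assumptions.

```typescript
import { spawn } from "node:child_process";

// Field names mirror the description above; exact types are assumptions.
export interface CommandSpec {
  command: string;
  args: string[];
  stdin?: string;
}

export interface CommandResult {
  stdout: string;
  stderr: string;
  exitCode: number | null; // null if the process was terminated by a signal
}

export function runCommand(spec: CommandSpec): Promise<CommandResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(spec.command, spec.args);
    let stdout = "";
    let stderr = "";

    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });

    // "error" fires if the CLI cannot be started, e.g. ENOENT when it is not installed.
    child.on("error", reject);
    // "close" fires after the process exits and its stdio streams have closed.
    child.on("close", (code) => resolve({ stdout, stderr, exitCode: code }));

    // Ignore EPIPE if the CLI exits without reading its stdin.
    child.stdin.on("error", () => {});
    if (spec.stdin !== undefined) child.stdin.write(spec.stdin);
    child.stdin.end();
  });
}
```

Resolving on "close" rather than "exit" guarantees that all buffered stdout has been received before the parser sees it.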

Vitest is a modern test framework built for Vite/ESM projects. It runs the unit test suite in packages/semtest-runner/tests/.

Why Vitest over Jest: the project is ESM-only, and Jest has historically poor ESM support requiring extensive transform configuration. Vitest is ESM-native — it requires zero transform config and works out of the box with TypeScript and ESM imports.

Turborepo is a build system for JavaScript monorepos. It orchestrates build, dev, and test tasks across the three workspaces.

Key behaviours:

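  • Topological builds: the ^build dependency runs the build tasks of a workspace's dependencies first, so packages build before the apps that depend on them
  • Task caching: Turborepo hashes each task's inputs and, when nothing has changed, restores the cached outputs instead of rerunning the task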
  • Parallel execution: independent tasks across workspaces run in parallel

Where: turbo.json at the repo root.

pnpm manages workspace dependencies and hoisting. pnpm-workspace.yaml defines the monorepo structure.

pnpm uses strict dependency resolution by default: each package can only resolve the dependencies it declares itself, not transitive dependencies or packages hoisted from other workspaces. The project sets shamefully-hoist=true in .npmrc, which produces a flat, npm-style node_modules, to work around a local dependency resolution issue with Zod versions (this doesn't affect CI).


    semtest-runner/
    ├── packages/
    │   └── semtest-runner/          # The main package (@thulanek/semtest-runner)
    │       ├── src/
    │       │   ├── cli.ts           # Commander CLI entry point
    │       │   ├── index.ts         # Public API re-exports
    │       │   ├── config/          # Zod schema + jiti config loader
    │       │   ├── discovery/       # Test file discovery
    │       │   ├── prompt/          # LLM prompt construction
    │       │   ├── runner/          # Adapter pattern for LLM CLIs
    │       │   │   └── adapters/    # claude, codex, gemini, opencode
    │       │   ├── parser/          # Resilient JSON parsing
    │       │   ├── validation/      # Post-run result validation
    │       │   ├── report/          # Markdown + JSON report generation
    │       │   ├── output/          # Terminal spinner + summary
    │       │   └── utils/           # Process spawning, filesystem helpers
    │       ├── tests/               # Vitest unit tests
    │       ├── tsup.config.ts       # Bundler configuration
    │       └── tsconfig.json        # TypeScript configuration
    ├── apps/
    │   ├── docs/                    # Public documentation (Astro Starlight)
    │   └── internal-docs/           # Contributor documentation (these docs)
    ├── turbo.json                   # Turborepo task definitions
    ├── pnpm-workspace.yaml          # Workspace definitions
    └── package.json                 # Root package (private, scripts only)

  • packages/semtest-runner/ — the only publishable package. Contains all source code, tests, and build config. Published to GitHub Packages as @thulanek/semtest-runner.
  • apps/docs/ — public-facing documentation site. Built with Astro Starlight and Tailwind CSS v4. Runs on port 4321 in dev.
  • apps/internal-docs/ — contributor documentation (what you’re reading now). Same stack as the public docs. Runs on port 4322 in dev.