Running Tests

Basic usage

# Run all tests in the configured directory
semtest run

# Run specific test files
semtest run auth-middleware.md api-routes.txt

# Run with full paths
semtest run semtests/auth-middleware.md

When file arguments are provided, each one is resolved against the current working directory first, then against the configured tests directory.
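The lookup order can be sketched as follows. This is a minimal illustration, not semtest's actual implementation: resolveTestFile, joinPath, and the Exists predicate are hypothetical names, and the sketch assumes the tests directory is given relative to the current working directory.

```typescript
// Hypothetical helper types and functions; none of these are part of semtest's API.
type Exists = (path: string) => boolean;

// Join two path segments with exactly one "/" between them.
function joinPath(base: string, rest: string): string {
  return `${base.replace(/\/+$/, "")}/${rest.replace(/^\/+/, "")}`;
}

// Try the current working directory first, then the configured tests directory.
// Assumption: testsDir is relative to cwd.
function resolveTestFile(
  arg: string,
  cwd: string,
  testsDir: string,
  exists: Exists,
): string | undefined {
  const candidates = [
    joinPath(cwd, arg),
    joinPath(joinPath(cwd, testsDir), arg),
  ];
  return candidates.find((candidate) => exists(candidate));
}

// Usage with an in-memory "file system" so the sketch runs anywhere.
const files = new Set(["/project/semtests/auth-middleware.md"]);
const fileExists: Exists = (p) => files.has(p);

// Both argument styles from the examples above reach the same file.
console.log(resolveTestFile("auth-middleware.md", "/project", "semtests/", fileExists));
console.log(resolveTestFile("semtests/auth-middleware.md", "/project", "semtests/", fileExists));
```

Under these assumptions both calls print /project/semtests/auth-middleware.md: the bare file name falls through to the tests directory, while the full path matches against the working directory straight away.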

CLI flags

Flag	Type	Default	Description
`--timestamp`	boolean	`false`	Generate a timestamped copy of the Markdown report
`--include-passing`	boolean	`false`	Include passing tests in the Markdown report
`--strict`	boolean	`false`	Exit code 2 if validation issues are found
`--skip-validation`	boolean	`false`	Skip post-run validation entirely
`--extensions <exts>`	string	(all files)	Comma-separated file extensions (e.g. `.md,.txt`)
`--debug`	boolean	`false`	Log raw LLM output to `{output}/debug/`
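The --extensions flag takes a comma-separated string, while the config file uses an array for the same setting. A rough sketch of how one maps to the other, assuming simple splitting and trimming (parseExtensions is a hypothetical helper, and semtest may normalise the values differently):

```typescript
// Hypothetical helper: turn ".md,.txt" into [".md", ".txt"].
// Assumption: entries are trimmed and empty entries are dropped.
function parseExtensions(value: string): string[] {
  return value
    .split(",")
    .map((ext) => ext.trim())
    .filter((ext) => ext.length > 0);
}

console.log(parseExtensions(".md,.txt"));
```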

Config file options

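All CLI flags can also be set in semtest.config.ts. Option names use camelCase there (for example, --include-passing becomes includePassing), and extensions is an array rather than a comma-separated string: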

import { defineConfig } from "@thulanek/semtest-runner";

export default defineConfig({
  tests: "semtests/",
  output: "semantic-test-results/",
  llm: {
    runner: "claude",
    capability: "balanced",
  },
  strict: true,
  debug: true,
  timestamp: true,
  includePassing: false,
  extensions: [".md", ".txt"],
});

Flag precedence

Flags passed on the command line override config file values, which in turn override schema defaults:

CLI flag > config file > schema default

For example, if the config has strict: true and you run semtest run without --strict, strict mode is still enabled, because an omitted flag does not override the config value. A flag you pass explicitly always wins.
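The precedence chain can be sketched with a per-option fallback. This is an illustration under stated assumptions rather than semtest's real code: ResolvedOptions, schemaDefaults, and resolveOptions are hypothetical names, and only two options are shown.

```typescript
// Hypothetical option shape covering two of the documented flags.
interface ResolvedOptions {
  strict: boolean;
  debug: boolean;
}

// Defaults taken from the CLI flags table.
const schemaDefaults: ResolvedOptions = { strict: false, debug: false };

// An omitted value is undefined, so `??` falls through to the next layer:
// CLI flag > config file > schema default.
function resolveOptions(
  cli: Partial<ResolvedOptions>,
  config: Partial<ResolvedOptions>,
): ResolvedOptions {
  return {
    strict: cli.strict ?? config.strict ?? schemaDefaults.strict,
    debug: cli.debug ?? config.debug ?? schemaDefaults.debug,
  };
}

// Config sets strict: true and the CLI omits --strict, so strict stays enabled.
console.log(resolveOptions({}, { strict: true }));
```

The sketch uses ?? rather than || so that an explicit false from a higher layer is respected instead of being treated as missing.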

Exit codes

Code	Meaning	When
`0`	Pass	All tests passed
`1`	Fail	At least one test failed (but no errors)
`2`	Error	LLM subprocess error, parse error, or `--strict` with validation issues

Precedence: error (2) > fail (1) > pass (0)
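One way to read the table and its precedence rule is as a small function over per-test outcomes. The Outcome type and exitCodeFor function are hypothetical and only mirror the rules stated above:

```typescript
// Hypothetical outcome labels for individual tests.
type Outcome = "pass" | "fail" | "error";

// Apply the documented precedence: error (2) > fail (1) > pass (0).
// Validation issues only count as an error when strict mode is on.
function exitCodeFor(
  outcomes: Outcome[],
  strict: boolean,
  validationIssues: number,
): 0 | 1 | 2 {
  const hasError = outcomes.some((outcome) => outcome === "error");
  if (hasError || (strict && validationIssues > 0)) return 2;
  if (outcomes.some((outcome) => outcome === "fail")) return 1;
  return 0;
}

console.log(exitCodeFor(["pass", "fail", "error"], false, 0)); // 2
console.log(exitCodeFor(["pass", "fail"], false, 0)); // 1
console.log(exitCodeFor(["pass"], true, 1)); // 2
```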

Debug mode

When --debug is enabled:

- A debug/ directory is created inside the output directory
- For each test file, a JSON file is written containing all retry attempts
- Each attempt includes the raw stdout, stderr, and exitCode from the LLM CLI

semtest run --debug

This is useful for diagnosing unexpected LLM responses or retry behaviour.
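When inspecting a debug file, a small script can pick out the attempts that exited with a non-zero code. The field names stdout, stderr, and exitCode come from the description above; everything else in this sketch is an assumption, in particular that the file parses to a plain array of attempts. Check a real debug file before relying on that shape.

```typescript
// Fields named in the docs. The surrounding file layout is assumed.
interface DebugAttempt {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Assumption: the JSON text is an array of attempts.
function failedAttempts(json: string): DebugAttempt[] {
  const attempts = JSON.parse(json) as DebugAttempt[];
  return attempts.filter((attempt) => attempt.exitCode !== 0);
}

// Made-up sample data for illustration only.
const sample = JSON.stringify([
  { stdout: "", stderr: "timeout", exitCode: 1 },
  { stdout: "{\"result\":\"pass\"}", stderr: "", exitCode: 0 },
]);

console.log(failedAttempts(sample).length); // 1
```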