Writing Tests

What are semantic tests?

Semantic tests are files that describe the expected behaviour of a codebase in natural language. They can be any file type an LLM can parse: .md, .txt, .pdf, .json, .ts, or anything else. Instead of asserting on specific function outputs, they describe what should be true about the code, and an LLM evaluates whether the codebase satisfies those expectations.

File location

Test files live in the directory set by the tests option in your config (default: semantic-tests/). The runner discovers all files in this directory.

semantic-tests/
├── adapter-pattern.md
├── cli-interface.txt
├── config-schema.md
├── esm-compliance.md
└── project-structure.txt
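
To make the discovery step concrete, here is a minimal sketch of how files in the tests directory could be listed. The function name `discoverTestFiles` is hypothetical, and the sketch assumes only files directly inside the directory are picked up; the actual runner may recurse into subdirectories or apply filters.

```typescript
import { mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper, not the runner's real API: returns the path of every
// regular file directly inside testsDir, sorted for a stable order.
function discoverTestFiles(testsDir: string): string[] {
  return readdirSync(testsDir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => join(testsDir, entry.name))
    .sort();
}

// Usage: create a throwaway directory with two test files and list them.
const demoDir = mkdtempSync(join(tmpdir(), "semantic-tests-"));
for (const name of ["cli-interface.txt", "adapter-pattern.md"]) {
  writeFileSync(join(demoDir, name), "# placeholder\n");
}
const discovered = discoverTestFiles(demoDir);
console.log(discovered);
```

Because no extension filter is applied, any file type placed in the directory is treated as a test, which matches the "any file type" rule described earlier.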

File structure

A semantic test file should have:

A heading with an ID — used by the LLM to identify the test scenario
An expectation section — what the code should do
A behaviour section — specific details to verify
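
As a rough illustration of the three parts listed above, the following sketch checks a Markdown test file for them before it is handed to the runner. The names `TestFileCheck` and `checkTestFile` are hypothetical. The sketch assumes the `# Title - id: some-id` heading form and the `## Expectation` / `## Behaviour` section headings used in the examples on this page. The runner itself leaves ID extraction to the LLM, so this is only an optional pre-flight check.

```typescript
// Hypothetical result shape for the pre-flight check.
interface TestFileCheck {
  id: string | null;
  hasExpectation: boolean;
  hasBehaviour: boolean;
}

function checkTestFile(content: string): TestFileCheck {
  // Match a top-level heading such as "# Project Structure - id: project-structure".
  const idMatch = content.match(/^#\s+.*\bid:\s*([A-Za-z0-9_-]+)\s*$/m);
  return {
    id: idMatch?.[1] ?? null,
    hasExpectation: /^##\s+Expectation\s*$/m.test(content),
    hasBehaviour: /^##\s+Behaviour\s*$/m.test(content),
  };
}

// Usage: a well-formed file yields an ID and both sections.
const exampleFile = [
  "# Project Structure - id: project-structure",
  "",
  "## Expectation",
  "",
  "All TypeScript source code lives under `src/`.",
  "",
  "## Behaviour",
  "",
  "The `src/` directory contains one subdirectory per module.",
].join("\n");
console.log(checkTestFile(exampleFile));
```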

Example: Single test per file

# Project Structure - id: project-structure

## Expectation

The project should follow a well-organized directory layout with all
TypeScript source code under `src/`, organized into logical subdirectories
for each concern: `config`, `discovery`, `prompt`, `parser`, `runner`,
`report`, `output`, `validation`, and `utils`.

## Behaviour

The `src/` directory should contain subdirectories for each module:
`config/` (Zod schema and config loader), `discovery/` (test file discovery),
`prompt/` (LLM prompt construction), `parser/` (JSON result parsing),
`runner/` (LLM CLI adapters and execution), `report/` (Markdown and JSON
report generation), `output/` (terminal progress and summary),
`validation/` (result validation), and `utils/` (process and filesystem
helpers).

Example: Architecture verification

# Adapter Pattern - id: adapter-pattern

## Expectation

The runner module should implement an adapter pattern where each supported
LLM CLI (claude, codex, gemini, opencode) has a dedicated adapter
conforming to a shared `RunnerAdapter` interface, with a registry that
resolves the correct adapter by runner name.

## Behaviour

Four adapter files should exist under `src/runner/adapters/`: `claude.ts`,
`codex.ts`, `gemini.ts`, and `opencode.ts`. Each adapter should implement
the `RunnerAdapter` interface, which defines `buildCommand()` and
`parseRawOutput()`. The Claude adapter should pass the prompt as a
positional argument (not via stdin), while the other adapters should pass
the prompt via stdin.
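
To show what kind of code the architecture test above describes, here is a hypothetical sketch of an adapter interface and registry that would satisfy it. Only the names `RunnerAdapter`, `buildCommand()`, and `parseRawOutput()` come from the example; the `Command` shape, the method signatures, the `resolveAdapter` function, and the adapter bodies are assumptions made for illustration, not the project's actual code or the real CLI invocations.

```typescript
// Assumed command shape: a binary, its arguments, and optional stdin input.
interface Command {
  bin: string;
  args: string[];
  stdin?: string;
}

// Interface and method names are from the test; the signatures are assumed.
interface RunnerAdapter {
  buildCommand(prompt: string): Command;
  parseRawOutput(raw: string): string;
}

const claudeAdapter: RunnerAdapter = {
  // The prompt is passed as a positional argument, not via stdin.
  buildCommand: (prompt) => ({ bin: "claude", args: [prompt] }),
  parseRawOutput: (raw) => raw.trim(),
};

const codexAdapter: RunnerAdapter = {
  // The prompt is passed via stdin.
  buildCommand: (prompt) => ({ bin: "codex", args: [], stdin: prompt }),
  parseRawOutput: (raw) => raw.trim(),
};

// Registry that resolves the adapter by runner name (two of four shown).
const registry: Record<string, RunnerAdapter> = {
  claude: claudeAdapter,
  codex: codexAdapter,
};

function resolveAdapter(name: string): RunnerAdapter {
  const adapter = registry[name];
  if (!adapter) throw new Error(`Unknown runner: ${name}`);
  return adapter;
}
```

Everything the test asserts (file names, the interface, positional argument versus stdin) can be confirmed by reading code like this, which is what lets an LLM evaluate it.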

Tips for writing effective tests

Be specific — mention exact file paths, function names, and expected types when possible
Use IDs — include id: my-id in headings so the LLM extracts consistent identifiers
One concern per file — each file should test one aspect of the codebase
Describe observable facts — focus on what can be verified by reading the code (file existence, exports, types, patterns)
Avoid implementation details — describe what the code does, not how it does it internally (unless that’s what you’re testing)