sklab evaluate#
Static checks plus LLM quality review with 0–100 scoring.
Also available over HTTP. The Evaluate Skills endpoint runs the same logic on the server against a GitHub repository. This is useful when you want results in a browser, in a CI job without installing Python, or from an agent that reaches skill-lab.dev directly.
Usage#
```bash
sklab evaluate [SKILL_PATH] [OPTIONS]
```
Runs 37 static checks across Structure, Naming, Description, Content, and Security, then sends the skill to an LLM judge that scores it on 9 criteria across the Activation and Instruction axes. Use --skip-review for a static-only run, or --format json to emit the same payload shape as the Evaluate Skills endpoint.
Arguments#
| Argument | Required | Description |
|---|---|---|
| SKILL_PATH | no | Path to the skill directory. Defaults to the current directory. |
Options#
| Flag | Value | Description |
|---|---|---|
| --output, -o | <PATH> | Write the report to a file (implies --format json if --format is not set). |
| --format, -f | json or console (default: console) | Output format. |
| --verbose, -V | flag | Show all checks (including passing ones) and LLM reasoning. |
| --spec-only, -s | flag | Run only the checks required by the Agent Skills spec. |
| --all, -a | flag | Discover and evaluate every skill under the current directory. |
| --repo | flag | Discover and evaluate every skill from the git repo root. |
| --skip-review | flag | Skip the LLM judge (static checks only). |
| --model, -m | <MODEL_ID> (default: claude-haiku-4-5-20251001) | Model for the LLM judge. Supports Anthropic, OpenAI (gpt-*), and Gemini (gemini-*) models; the provider is auto-detected from the prefix. |
| --optimize | flag | Automatically chain into sklab optimize after evaluation (no interactive prompt). |
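As a rough illustration of the prefix-based provider detection described for --model, the sketch below maps a model ID to the API key variable the LLM judge would need. The function name judge_key_var is hypothetical and not part of sklab. The gpt-* and gemini-* prefixes and the variable names come from this page; treating every other model ID as Anthropic is an assumption of the sketch.

```bash
# Hypothetical helper, not part of sklab: map a model ID to the API key
# variable the LLM judge would need.
judge_key_var() {
  case "$1" in
    gpt-*)    echo "OPENAI_API_KEY" ;;
    gemini-*) echo "GEMINI_API_KEY" ;;
    *)        echo "ANTHROPIC_API_KEY" ;;  # assumption: anything else is Anthropic
  esac
}

# Example: warn before an LLM review if the matching key is missing.
model="claude-haiku-4-5-20251001"
key_var="$(judge_key_var "$model")"
eval "key_value=\${$key_var:-}"   # portable indirect lookup of the variable
if [ -z "$key_value" ]; then
  echo "warning: $key_var is not set; consider --skip-review" >&2
fi
```

A check like this could run in a CI step before sklab evaluate, so a missing key is reported up front rather than midway through a run.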
Examples#
Evaluate one skill
```bash
$ sklab evaluate ./my-skill
```
JSON report to disk
```bash
$ sklab evaluate ./my-skill -f json -o report.json
```
Static checks only (no API key)
```bash
$ sklab evaluate ./my-skill --skip-review
```
Every skill in the current repo
```bash
$ sklab evaluate --repo
```
Evaluate then optimize in one step
```bash
$ sklab evaluate ./my-skill --optimize
```
Output#
Console rendering groups checks by dimension with pass/fail status and the LLM judge's per-criterion scores. With --format json, the output matches the /v1/repos/{owner}/{repo}/evaluate response payload.
Exit Codes#
| Code | Meaning |
|---|---|
| 0 | All high-severity checks passed. |
| 1 | One or more checks failed, or a CLI error occurred. |
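The exit codes above can be consumed from a script roughly as follows. This is a minimal sketch: report_status is a hypothetical wrapper, not something sklab provides, and the catch-all branch for codes other than 0 and 1 is a defensive assumption, since only those two codes are documented here.

```bash
# Hypothetical wrapper: run a command and describe its exit status using
# the meanings from the Exit Codes table.
report_status() {
  status=0
  "$@" || status=$?   # capture the status without tripping `set -e`
  case "$status" in
    0) echo "evaluate: all high-severity checks passed" ;;
    1) echo "evaluate: check failure or CLI error" ;;
    *) echo "evaluate: unexpected exit code $status" ;;  # undocumented code
  esac
  return "$status"
}

# Usage in a CI step (requires sklab on PATH):
#   report_status sklab evaluate ./my-skill --skip-review || exit 1
```

Because code 1 covers both failed checks and CLI errors, a script that needs to tell them apart would have to inspect the report itself, for example the JSON written with --output.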
Notes#
- LLM review requires ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY. The env var is selected from the model prefix.
- --all and --repo are mutually exclusive, and cannot be combined with a positional SKILL_PATH.
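The target-selection rule in the last note can be sketched as a pre-flight check. The function check_targets is hypothetical and only mirrors the documented rule; it is not how sklab parses its arguments. It assumes that --output, --format, and --model (and their short forms) are the only flags that take a separate value, as listed in the Options table.

```bash
# Hypothetical pre-flight check: succeed only if at most one of --all,
# --repo, or a positional SKILL_PATH selects the evaluation target.
check_targets() (
  selectors=0
  skip_next=0
  for arg in "$@"; do
    if [ "$skip_next" -eq 1 ]; then
      skip_next=0          # this argument is the value of the previous flag
      continue
    fi
    case "$arg" in
      --all|-a|--repo) selectors=$((selectors + 1)) ;;
      --output|-o|--format|-f|--model|-m) skip_next=1 ;;
      -*) ;;               # other flags do not select a target
      *) selectors=$((selectors + 1)) ;;   # treated as SKILL_PATH
    esac
  done
  [ "$selectors" -le 1 ]
)
```

For example, check_targets ./my-skill -f json -o report.json succeeds, while check_targets --all --repo and check_targets --all ./my-skill both fail.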