sklab evaluate

Static checks plus LLM quality review with 0–100 scoring.

Also available over HTTP. The Evaluate Skills endpoint runs the same logic on the server against a GitHub repository, which is useful when you want results in a browser, in a CI job that does not install Python, or from an agent that reaches skill-lab.dev directly.

Usage

```bash
sklab evaluate [SKILL_PATH] [OPTIONS]
```

Runs 37 static checks across Structure, Naming, Description, Content, and Security, then sends the skill to an LLM judge that scores it on 9 criteria across the Activation and Instruction axes. Use --skip-review for a static-only run, or --format json to emit the same payload shape as the /v1/repos/{owner}/{repo}/evaluate endpoint.

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| SKILL_PATH | no | Path to the skill directory. Defaults to the current directory. |

Options

| Flag | Value | Description |
| --- | --- | --- |
| --output, -o | <PATH> | Write the report to a file (implies --format json if --format is not set). |
| --format, -f | json or console (default: console) | Output format. |
| --verbose, -V | flag | Show all checks (including passing ones) and LLM reasoning. |
| --spec-only, -s | flag | Run only the checks required by the Agent Skills spec. |
| --all, -a | flag | Discover and evaluate every skill under the current directory. |
| --repo | flag | Discover and evaluate every skill from the git repo root. |
| --skip-review | flag | Skip the LLM judge (static checks only). |
| --model, -m | <MODEL_ID> (default: claude-haiku-4-5-20251001) | Model for the LLM judge. Supports Anthropic, OpenAI (gpt-*), and Gemini (gemini-*) models; the provider is auto-detected from the prefix. |
| --optimize | flag | Automatically chain into sklab optimize after evaluation (no interactive prompt). |
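
The provider auto-detection described for --model, together with the API key requirement listed under Notes, can be sketched as a small pre-flight helper. This is an illustration rather than sklab's actual implementation: the function names are hypothetical, and treating every model ID that is not gpt-* or gemini-* as an Anthropic model is an assumption.

```bash
# Hypothetical helpers, not part of sklab.
# Assumption: any model ID that is not gpt-* or gemini-* belongs to Anthropic.
api_key_env_for_model() {
  case "$1" in
    gpt-*)    echo "OPENAI_API_KEY" ;;
    gemini-*) echo "GEMINI_API_KEY" ;;
    *)        echo "ANTHROPIC_API_KEY" ;;
  esac
}

# Succeeds when the key variable matching the model is set and non-empty.
# The body runs in a subshell so the scratch variables do not leak.
has_api_key_for_model() (
  var_name=$(api_key_env_for_model "$1")
  eval "value=\${$var_name:-}"
  [ -n "$value" ]
)

# Prints --skip-review when the key is missing, and nothing otherwise.
review_flag_for_model() {
  if ! has_api_key_for_model "$1"; then
    echo "--skip-review"
  fi
}

# Example (not run here):
#   MODEL=claude-haiku-4-5-20251001
#   sklab evaluate ./my-skill --model "$MODEL" $(review_flag_for_model "$MODEL")
```

With a wrapper like this, a run on a machine without the matching key falls back to static checks only.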

Examples

Evaluate one skill

```bash
$ sklab evaluate ./my-skill
```

JSON report to disk

```bash
$ sklab evaluate ./my-skill -f json -o report.json
```

Static checks only (no API key)

```bash
$ sklab evaluate ./my-skill --skip-review
```

Every skill in the current repo

```bash
$ sklab evaluate --repo
```

Evaluate then optimize in one step

```bash
$ sklab evaluate ./my-skill --optimize
```

Output

Console rendering groups checks by dimension with pass/fail status and the LLM judge's per-criterion scores. With --format json, the output matches the /v1/repos/{owner}/{repo}/evaluate response payload.

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | All high-severity checks passed. |
| 1 | One or more high-severity checks failed, or a CLI error occurred. |
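
Because the exit code is the only signal a shell needs, a CI step can gate on it directly. The sketch below wraps any command in a hypothetical gate function (the name and the messages are illustrative, not part of sklab) and passes the exit status through unchanged.

```bash
# Hypothetical CI gate; `gate` is an illustrative name, not part of sklab.
# Runs the given command and reports the outcome based only on its exit status.
gate() {
  if "$@"; then
    echo "evaluate: exit 0, all high-severity checks passed"
  else
    status=$?  # exit status of the command that was just run
    echo "evaluate: exit $status, one or more checks failed or a CLI error occurred" >&2
    return "$status"
  fi
}

# Example (not run here), static checks only so no API key is needed:
#   gate sklab evaluate ./my-skill --skip-review
```

Exit code 1 covers both failed checks and CLI errors, so a pipeline that needs to tell them apart would have to inspect the report as well.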

Notes

  • LLM review requires ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY. The env var is selected from the model prefix.
  • --all and --repo are mutually exclusive, and cannot be combined with a positional SKILL_PATH.
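
The scope rule in the last note amounts to "at most one of --all, --repo, or a positional SKILL_PATH", which a wrapper script can check before invoking the CLI. A minimal sketch, assuming the options that take a value are exactly --output, --format, and --model as listed under Options; valid_scope is a hypothetical name, and combined short flags such as -as are not handled.

```bash
# Hypothetical pre-check, not part of sklab.
# Succeeds when at most one of --all, --repo, or a positional path is given.
valid_scope() (
  all=0; repo=0; path=0
  while [ $# -gt 0 ]; do
    case "$1" in
      -a|--all) all=1 ;;
      --repo)   repo=1 ;;
      -o|--output|-f|--format|-m|--model)
        # These options consume the next argument as their value.
        [ $# -gt 1 ] && shift ;;
      -*) ;;          # remaining flags take no value
      *)  path=1 ;;   # anything else is the positional SKILL_PATH
    esac
    shift
  done
  [ $((all + repo + path)) -le 1 ]
)

# Example (not run here):
#   valid_scope "$@" && sklab evaluate "$@"
```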