Skill Lab · the evaluation layer

Prove every skill is worth shipping.

Skill Lab grades every SKILL.md in a public GitHub repo against 37 quality and security checks, and it can rewrite the skills that fail. No clone, no sign-up.

The trick

Swap one word in any GitHub URL.

That's the entire onboarding. Works on any public repo containing SKILL.md files.

→ try anthropics/skills
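The page doesn't say which word changes. The browser URL in the CLI section (skill-lab.dev/anthropics/skills) suggests a plausible reading: github.com becomes skill-lab.dev. Below is a minimal Python sketch under that assumption. `to_skill_lab_url` is a hypothetical helper, not part of Skill Lab.

```python
from urllib.parse import urlparse, urlunparse

# Assumption: the "one word" swap replaces the github.com host with
# skill-lab.dev, matching the browser URL shown in the CLI section.
GITHUB_HOSTS = ("github.com", "www.github.com")
SKILL_LAB_HOST = "skill-lab.dev"


def to_skill_lab_url(github_url: str) -> str:
    """Rewrite a public GitHub repo URL into its (assumed) Skill Lab URL."""
    parts = urlparse(github_url)
    if parts.netloc.lower() not in GITHUB_HOSTS:
        raise ValueError(f"not a github.com URL: {github_url!r}")
    # This sketch keeps only owner/repo; deeper paths (tree/, blob/) are dropped.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected /owner/repo in {github_url!r}")
    owner, repo = segments[0], segments[1].removesuffix(".git")
    return urlunparse(("https", SKILL_LAB_HOST, f"/{owner}/{repo}", "", "", ""))


print(to_skill_lab_url("https://github.com/anthropics/skills"))
# https://skill-lab.dev/anthropics/skills
```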
§ I
How it works

Three steps, backed by real endpoints.

Skill Lab reads every SKILL.md in a public GitHub repo and runs the same 37 checks that the sklab CLI runs locally. It then lets you run the LLM-powered judge, optimize, and triggers passes on demand.

01 · GET /v1/repos/:o/:r/evaluate

Scan

Paste any GitHub URL. Skill Lab fetches every SKILL.md through the GitHub API: no clone, no sign-up. Results are cached by commit SHA.
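Caching by commit SHA is safe because a SHA names fixed content. A report computed for one commit never goes stale, and a new push yields a new SHA and therefore a fresh scan. The client-side memo below illustrates the idea. It is not Skill Lab's server code. The fetch function, the report shape, and the SHAs are hypothetical placeholders.

```python
from typing import Callable, Dict, Tuple

# Hypothetical shapes: a report is any dict; a fetcher runs the checks for one commit.
Report = dict
Fetcher = Callable[[str, str, str], Report]


class ShaCache:
    """Memoize evaluation reports by (owner, repo, commit SHA)."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._reports: Dict[Tuple[str, str, str], Report] = {}

    def evaluate(self, owner: str, repo: str, sha: str) -> Report:
        key = (owner, repo, sha)
        if key not in self._reports:
            # Only an unseen commit triggers a real fetch.
            self._reports[key] = self._fetch(owner, repo, sha)
        return self._reports[key]


calls = []


def fake_fetch(owner: str, repo: str, sha: str) -> Report:
    calls.append(sha)  # record every real fetch
    return {"sha": sha}


cache = ShaCache(fake_fetch)
cache.evaluate("anthropics", "skills", "abc123")
cache.evaluate("anthropics", "skills", "abc123")  # same SHA: served from the cache
cache.evaluate("anthropics", "skills", "def456")  # new commit: fetched again
print(calls)  # ['abc123', 'def456']
```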

02 · 37 rules · 5 dimensions

Check

Structure, naming, description, content, security. Every failure ships with a severity and a one-line fix.
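Each failure pairs a rule with a severity and a one-line fix. The sketch below shows what a failure record and two of the checks might look like. It uses the rule IDs that appear in the optimize example below (`naming.format`, `description.not-empty`). The kebab-case rule, the severity levels, and the fix wording are all assumptions, not the rubric's actual definitions.

```python
import re
from dataclasses import dataclass
from typing import List

# The five dimensions this page names.
DIMENSIONS = ("structure", "naming", "description", "content", "security")

# Assumption: names must be lowercase kebab-case, e.g. "refund-handler".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


@dataclass(frozen=True)
class Failure:
    rule_id: str   # "<dimension>.<rule>", e.g. "naming.format"
    severity: str  # hypothetical levels such as "error" or "warning"
    fix: str       # the one-line suggested fix

    @property
    def dimension(self) -> str:
        return self.rule_id.split(".", 1)[0]

    def __post_init__(self) -> None:
        # Reject rule IDs outside the five known dimensions.
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension in {self.rule_id!r}")


def check_frontmatter(frontmatter: dict) -> List[Failure]:
    """Two illustrative checks; the real rubric has 37."""
    failures: List[Failure] = []
    name = str(frontmatter.get("name") or "")
    if not KEBAB_CASE.match(name):
        failures.append(Failure(
            "naming.format", "error",
            "Rename to lowercase with hyphens, e.g. 'refund-handler'."))
    description = str(frontmatter.get("description") or "").strip()
    if not description:
        failures.append(Failure(
            "description.not-empty", "error",
            "Add a description that says when the skill should be used."))
    return failures


# Frontmatter of the weak refund-handler skill; an empty YAML value parses as None.
weak = check_frontmatter({"name": "Refund Handler", "description": None})
print([f.rule_id for f in weak])  # ['naming.format', 'description.not-empty']
```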

03 · POST /v1/.../optimize

Improve

Optional LLM passes: a judge verdict, an optimize rewrite that raises the score, and a triggers test plan. Each pass returns JSON, ready for CI.
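Because results come back as JSON, a CI job can gate merges on them. The sketch below assumes the response exposes a numeric top-level `score` and a `failures` list. Those field names are guesses, since the page doesn't document the schema. The threshold of 80 is likewise an arbitrary example.

```python
import json
import sys

# Hypothetical CI threshold; pick whatever bar your repo needs.
MIN_SCORE = 80


def gate(response_text: str, min_score: int = MIN_SCORE) -> int:
    """Return an exit code: 0 if the reported score clears the bar, else 1.

    Assumes a JSON object with a numeric "score" and an optional
    "failures" list; the real response fields may be named differently.
    """
    payload = json.loads(response_text)
    score = payload["score"]
    failures = payload.get("failures", [])
    if score < min_score:
        print(f"score {score} < {min_score} ({len(failures)} failing checks)",
              file=sys.stderr)
        return 1
    return 0
```

In a pipeline step, you would save the API response and call `sys.exit(gate(text))` so a low score fails the job.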

§ II
Setting the bar

How real skills score.

Six public skills from anthropics/skills, re-evaluated on every deploy. We pick the skills — the scores are whatever the scanner finds.

scanned May 21, 2026 · refreshed on every deploy
§ III
What 'optimize' actually does

One call, a rewritten SKILL.md.

POST /v1/repos/:o/:r/optimize returns the original SKILL.md, a higher-scoring rewrite, and the deltas between them. Below is a frozen example for a deliberately weak refund-handler skill. The numbers are illustrative, but the response shape is real.

Original
score 46 · failures 19
---
name: Refund Handler
description:
---

Handle customer refund requests. Look up the order, check the refund
policy, and issue a refund if eligible.
Optimized
score 89 · failures 4
---
name: refund-handler
description: Use when a customer asks for a refund. Looks up the order, applies the refund policy, and either issues the refund or routes the request for human review.
---

# Refund Handler

Use this skill when a customer requests a refund. The skill verifies
eligibility against the refund policy and either issues the refund
directly or escalates to a human reviewer.

## When to use

- Customer explicitly asks for a refund, return, or money back
- A previous order had a defect, shipping issue, or pricing error

## Inputs

- `order_id` — required. The order being refunded.
- `reason` — customer-supplied; preserved verbatim for audit.

## Steps

1. Look up the order via `scripts/get_order.py`.
2. Check eligibility: within 30 days, marked delivered, not previously refunded.
3. If eligible, issue the refund via the payments API.
4. Otherwise, escalate to a human reviewer with the reason and order summary.

## Example

```
> Refund order 81022 — wrong size
✓ Eligible · refunded $42.00 to original payment method
```

## Safety

- Never refund without a verified order ID.
- Do not promise refund amounts before eligibility passes.

Δ score: +43
fewer failures: 15
checks that flipped: description.not-empty, naming.format, content.description-actionable, content.has-examples, content.scripts-referenced
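The panel's figures are plain differences between the two scans: 89 - 46 = 43 points, and 19 - 4 = 15 fewer failures. The sketch below assumes each scan exposes a score and the set of failing rule IDs, which is a hypothetical shape. The demo failure set is made up and is not the example's real 19 failures. A rewrite can also break a check that used to pass, so the net count of fewer failures need not equal the number of flipped checks. For that reason the sketch reports both.

```python
from typing import Iterable


def deltas(before_score: int, before_failing: Iterable[str],
           after_score: int, after_failing: Iterable[str]) -> dict:
    """Differences between two scans of the same skill."""
    before, after = set(before_failing), set(after_failing)
    return {
        "score_delta": after_score - before_score,
        # Net change in failure count: checks fixed minus checks newly broken.
        "fewer_failures": len(before) - len(after),
        # Failed before the rewrite, pass after it.
        "flipped": sorted(before - after),
        # Passed before the rewrite, fail after it.
        "regressed": sorted(after - before),
    }


# The example's scores, paired with a hypothetical three-failure set.
d = deltas(46, {"naming.format", "description.not-empty", "content.has-examples"},
           89, set())
print(d["score_delta"], d["fewer_failures"])  # 43 3
```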
§ IV
Web ↔ CLI

Same checks, on a server or your laptop.

sklab is the CLI that ships with the skill-lab PyPI package: same 37 checks, same judge, same optimizer. Run it in CI, or on a directory that hasn't been pushed yet.
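On an unpushed directory, the scan starts the same way it does on the server: find every SKILL.md and read its frontmatter. Below is a minimal standard-library sketch of that discovery step. `find_skills` and `read_frontmatter` are hypothetical helpers, not sklab's API. The line-based parser stands in for a proper YAML parser.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def find_skills(root: str) -> List[Path]:
    """Every SKILL.md under root, sorted for a stable order."""
    return sorted(Path(root).rglob("SKILL.md"))


def read_frontmatter(path: Path) -> Dict[str, str]:
    """Read flat `key: value` pairs between the leading '---' lines.

    Illustration only: real frontmatter is YAML and deserves a YAML parser.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


# Demo on a throwaway directory that has never been pushed anywhere.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp, "refund-handler", "SKILL.md")
    skill.parent.mkdir()
    skill.write_text(
        "---\nname: refund-handler\n"
        "description: Use when a customer asks for a refund.\n---\n\n# Refund Handler\n",
        encoding="utf-8",
    )
    found = [(p.parent.name, read_frontmatter(p)) for p in find_skills(tmp)]

print(found)
```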

# scan a repo from anywhere
curl https://api.skill-lab.dev/v1/repos/anthropics/skills/evaluate

# or just open it in a browser
open https://skill-lab.dev/anthropics/skills
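The same request can be made from Python using only the standard library, which is handy in a CI step that already runs Python. The base URL comes from the curl example above. The response schema isn't documented here, so the sketch returns the decoded JSON untouched. The `Accept` header is an assumption.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL from the curl example; the response is returned as decoded JSON.
API_BASE = "https://api.skill-lab.dev/v1"


def evaluate_url(owner: str, repo: str) -> str:
    """Build the evaluate endpoint URL for a public GitHub repo."""
    return f"{API_BASE}/repos/{quote(owner, safe='')}/{quote(repo, safe='')}/evaluate"


def evaluate(owner: str, repo: str, timeout: float = 30.0) -> dict:
    """GET the evaluate endpoint and decode the JSON body (assumed to be an object)."""
    request = urllib.request.Request(
        evaluate_url(owner, repo), headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `evaluate("anthropics", "skills")` makes a live network request. `evaluate_url` alone is pure and safe to unit-test.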
§ try it

Stop shipping skills you can't measure.

Paste the URL of any public GitHub repo with SKILL.md files, and Skill Lab will scan it against the rubric.

or pip install skill-lab for local runs