Skill

Skill answers one question: how well are you prompting? The LLM judge checks every scored prompt against seven habits, marks each habit as present or missing, and gives the prompt a letter grade. Your Skill score is the average prompt grade across the window, on a 0–10 scale.

Headline — APG

Average Prompt Grade. The mean grade across every prompt the judge marked ok in this window.
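As a rough illustration of the definition above, here is a minimal Python sketch. It assumes each prompt record already carries a numeric `grade` on the 0–10 scale (the letter-to-number mapping is not described here) alongside its `judge_status`. The record shape, the `grade` field name, and the `parse_error` status value are hypothetical.

```python
from statistics import mean
from typing import Optional


def average_prompt_grade(prompts: list[dict]) -> Optional[float]:
    """Mean grade over prompts the judge marked ok; None if there are none."""
    # Assumption: "grade" is a number on the 0-10 scale.
    grades = [p["grade"] for p in prompts if p.get("judge_status") == "ok"]
    return mean(grades) if grades else None


# Hypothetical window of three prompts; the last one failed grading.
window = [
    {"grade": 8.0, "judge_status": "ok"},
    {"grade": 6.0, "judge_status": "ok"},
    {"grade": 2.0, "judge_status": "parse_error"},  # not ok, so ignored
]

apg = average_prompt_grade(window)
print(apg)  # 7.0
```

Returning `None` for an empty window, rather than 0, is a design choice in this sketch so that "no graded prompts" cannot be mistaken for a very low score.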

Supporting numbers

Metric	What it tells you
Δ-APG	Change vs the previous window of the same length — are you prompting better or worse?
Worst pillar	The habit group with the lowest hit rate across the last 30 days. This is what to work on first.
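One way to read Δ-APG is as the APG of the current window minus the APG of the window immediately before it, with both windows the same length. The sketch below follows that reading. The `scored_at` timestamp field, the numeric `grade` field, and the half-open window boundaries are assumptions made for illustration, not details confirmed by this page.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


def window_apg(prompts: list[dict], start: datetime, end: datetime) -> Optional[float]:
    """APG over ok prompts scored in the half-open interval [start, end)."""
    grades = [
        p["grade"]
        for p in prompts
        if p["judge_status"] == "ok" and start <= p["scored_at"] < end
    ]
    return mean(grades) if grades else None


def delta_apg(prompts: list[dict], now: datetime, window: timedelta) -> Optional[float]:
    """Current-window APG minus the APG of the preceding window of equal length."""
    current = window_apg(prompts, now - window, now)
    previous = window_apg(prompts, now - 2 * window, now - window)
    if current is None or previous is None:
        return None  # no comparison possible without data in both windows
    return current - previous


# Hypothetical data: two prompts this week, one the week before.
now = datetime(2024, 1, 15)
prompts = [
    {"grade": 8.0, "judge_status": "ok", "scored_at": datetime(2024, 1, 10)},
    {"grade": 7.0, "judge_status": "ok", "scored_at": datetime(2024, 1, 12)},
    {"grade": 6.0, "judge_status": "ok", "scored_at": datetime(2024, 1, 3)},
]

delta = delta_apg(prompts, now, timedelta(days=7))
print(delta)  # 1.5
```

A positive value means prompting improved relative to the previous window; a negative value means it got worse.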

The four pillars

The seven rubric checks group into four pillars:

Pillar	Question it asks
Clarity	Did you tell the model what you actually want?
Context	Did you give the model what it needs to answer?
Verification	Did you ask the model to check its work?
Workflow	Did you keep the session tight and on track?

The seven checks

Check	Hint
Goal explicit	States the outcome in concrete, observable terms
Scope bounded	Limits files, features, or surface area touched
References concrete	Cites files, lines, or `@`-references the model can open
Context sufficient	Includes errors, examples, or artifacts the model needs
Verification requested	Asks for tests, expected output, or success criteria
Root-cause oriented	Targets the underlying cause, not just the symptom
Plan first	Opens non-trivial work with an explore-or-plan turn before code

Not every check applies to every prompt. The judge decides which checks are applicable based on the prompt's intent (fix, plan, explore, etc.).
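To show how applicability could feed into a per-pillar hit rate and the "Worst pillar" number, here is a hedged Python sketch. This page does not say which checks belong to which pillar, so the `PILLARS` mapping below is an assumed grouping. The check keys and the convention that `None` (or a missing key) means "not applicable" are also hypothetical. The caller is expected to pass in only the prompts from the period of interest.

```python
from typing import Optional

# ASSUMPTION: this check-to-pillar grouping is illustrative only.
PILLARS = {
    "Clarity": ["goal_explicit", "scope_bounded"],
    "Context": ["references_concrete", "context_sufficient"],
    "Verification": ["verification_requested"],
    "Workflow": ["root_cause_oriented", "plan_first"],
}


def pillar_hit_rates(prompts: list[dict]) -> dict[str, Optional[float]]:
    """Share of applicable checks marked present, per pillar."""
    rates: dict[str, Optional[float]] = {}
    for pillar, checks in PILLARS.items():
        hits = applicable = 0
        for prompt in prompts:
            for check in checks:
                verdict = prompt["checks"].get(check)  # True, False, or None
                if verdict is None:
                    continue  # not applicable to this prompt's intent
                applicable += 1
                hits += 1 if verdict else 0
        rates[pillar] = hits / applicable if applicable else None
    return rates


def worst_pillar(prompts: list[dict]) -> Optional[str]:
    """Pillar with the lowest hit rate among pillars with any applicable check."""
    rated = {k: v for k, v in pillar_hit_rates(prompts).items() if v is not None}
    return min(rated, key=rated.get) if rated else None


# Hypothetical verdicts for two prompts.
prompts = [
    {"checks": {"goal_explicit": True, "scope_bounded": True,
                "references_concrete": True, "context_sufficient": False,
                "verification_requested": False, "root_cause_oriented": True,
                "plan_first": None}},
    {"checks": {"goal_explicit": True, "scope_bounded": False,
                "references_concrete": True, "context_sufficient": True,
                "verification_requested": False, "plan_first": True}},
]

print(worst_pillar(prompts))  # Verification
```

Checks that are not applicable are left out of the denominator, so a pillar is never penalised for a habit the prompt had no reason to show.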

What is excluded

Prompts where the judge could not parse a verdict (judge_status ≠ ok) are excluded so a single bad grading run doesn’t drag your score around.

Bands

Scores fall into four bands, from lowest to highest: Poor · Weak · Fair · Good. The threshold for the B baseline is 7.0, the same anchor PRISM uses.

Why it matters

Speed measures throughput; Skill measures how clean the inputs behind that throughput were. High Speed with low Skill usually means you are getting lucky. Improving Skill is what makes the throughput repeatable.