Skill

Skill answers one question: how well are you prompting? Every scored prompt gets a letter grade from the LLM judge. The judge looks at seven habits and marks each as present or missing. Your Skill score is the average prompt grade across the window, on a 0–10 scale.

Average Prompt Grade (APG). The mean grade across every prompt the judge marked ok in this window.

| Metric | What it tells you |
| --- | --- |
| Δ-APG | Change versus the previous window of the same length: are you prompting better or worse? |
| Worst pillar | The habit group with the lowest hit rate across the last 30 days. This is what to work on first. |
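
As a rough illustration of the Δ-APG row, the sketch below compares the mean grade in the current window with the mean grade in the window of equal length just before it. The function names, the `(timestamp, grade)` tuple layout, and the choice to return `None` when either window is empty are assumptions made for this example, not the tool's actual implementation. Grades are assumed to be already on the 0–10 scale and already filtered to prompts the judge marked ok.

```python
from datetime import datetime, timedelta
from statistics import mean


def window_mean(grades, start, end):
    """Mean grade for prompts with start <= timestamp < end, or None if empty."""
    in_window = [grade for ts, grade in grades if start <= ts < end]
    return mean(in_window) if in_window else None


def delta_apg(grades, now, window):
    """Current-window APG minus the APG of the preceding window of equal length."""
    current = window_mean(grades, now - window, now)
    previous = window_mean(grades, now - 2 * window, now - window)
    if current is None or previous is None:
        return None  # assumption: no comparison when either window has no data
    return current - previous


# Hypothetical sample data: (timestamp, grade on the 0-10 scale).
now = datetime(2024, 6, 30)
grades = [
    (datetime(2024, 6, 18), 6.0),
    (datetime(2024, 6, 20), 7.0),
    (datetime(2024, 6, 25), 8.0),
    (datetime(2024, 6, 28), 7.0),
]
change = delta_apg(grades, now, timedelta(days=7))
print(change)
```

A positive value means the current window averaged higher than the previous one; a negative value means it averaged lower.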

The seven rubric checks group into four pillars:

| Pillar | Question it asks |
| --- | --- |
| Clarity | Did you tell the model what you actually want? |
| Context | Did you give the model what it needs to answer? |
| Verification | Did you ask the model to check its work? |
| Workflow | Did you keep the session tight and on track? |

| Check | Hint |
| --- | --- |
| Goal explicit | States the outcome in concrete, observable terms |
| Scope bounded | Limits files, features, or surface area touched |
| References concrete | Cites files, lines, or @-references the model can open |
| Context sufficient | Includes errors, examples, or artifacts the model needs |
| Verification requested | Asks for tests, expected output, or success criteria |
| Root-cause oriented | Targets the underlying cause, not just the symptom |
| Plan first | Opens non-trivial work with an explore-or-plan turn before code |
Plan firstOpens non-trivial work with an explore-or-plan turn before code

Not every check applies to every prompt — the judge marks each as applicable based on intent (fix, plan, explore, etc.).
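
One way to picture how applicability could feed into the Worst pillar metric is sketched below. It assumes a hit rate is the share of applicable checks marked present, and that each check result already carries its pillar name. Both are assumptions for illustration, as are the record fields (`pillar`, `applicable`, `present`) and the function names; the page does not specify which check belongs to which pillar, so the sample data omits check names.

```python
from collections import defaultdict


def pillar_hit_rates(check_results):
    """Return {pillar: hits / applicable checks}, skipping non-applicable checks."""
    hits = defaultdict(int)
    applicable = defaultdict(int)
    for result in check_results:
        if not result["applicable"]:
            continue  # checks the judge marked not applicable do not count
        applicable[result["pillar"]] += 1
        if result["present"]:
            hits[result["pillar"]] += 1
    return {pillar: hits[pillar] / total for pillar, total in applicable.items()}


def worst_pillar(check_results):
    """Pillar with the lowest hit rate, or None when nothing was applicable."""
    rates = pillar_hit_rates(check_results)
    return min(rates, key=rates.get) if rates else None


# Hypothetical judge output for a handful of prompts.
results = [
    {"pillar": "Clarity", "applicable": True, "present": True},
    {"pillar": "Clarity", "applicable": True, "present": True},
    {"pillar": "Context", "applicable": True, "present": True},
    {"pillar": "Context", "applicable": True, "present": False},
    {"pillar": "Verification", "applicable": True, "present": False},
    {"pillar": "Verification", "applicable": False, "present": False},
    {"pillar": "Workflow", "applicable": True, "present": True},
]
print(worst_pillar(results))
```

Under these assumptions, a check that is not applicable neither helps nor hurts a pillar, so a planning prompt is not penalized for lacking, say, a test request.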

Prompts where the judge could not parse a verdict (judge_status ≠ ok) are excluded so a single bad grading run doesn’t drag your score around.
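
The exclusion rule can be sketched as a filter applied before averaging. This is a minimal example under stated assumptions: the record layout, the function name, and the `parse_error` status value are hypothetical, and grades are assumed to be numeric on the 0–10 scale. Only the `judge_status` field and the `ok` value come from the text above.

```python
from statistics import mean


def average_prompt_grade(prompts):
    """Mean grade over prompts whose judge_status is "ok"; None if there are none."""
    graded = [p["grade"] for p in prompts if p["judge_status"] == "ok"]
    return mean(graded) if graded else None


# Hypothetical window of prompts; the third one failed to parse and is skipped.
prompts = [
    {"grade": 8.0, "judge_status": "ok"},
    {"grade": 6.0, "judge_status": "ok"},
    {"grade": None, "judge_status": "parse_error"},
]
apg = average_prompt_grade(prompts)
print(apg)
```

Because the failed run is dropped rather than counted as zero, the average reflects only prompts the judge actually graded.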

Scores fall into four bands: Poor · Weak · Fair · Good. The threshold for the B baseline is 7.0, the same anchor PRISM uses.

Speed measures throughput; Skill measures how clean the inputs were that produced it. High Speed with low Skill usually means you are getting lucky — improving Skill is what makes the throughput repeatable.