Skill
Skill answers one question: how well are you prompting? Every scored prompt gets a letter grade from the LLM judge. The judge looks at seven habits and marks each as present or missing. Your Skill score is the average prompt grade across the window, on a 0–10 scale.
Headline — APG
Average Prompt Grade. The mean grade across every prompt the judge marked ok in this window.
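As a rough illustration, APG can be sketched as a filtered mean. This assumes each graded prompt carries a numeric `grade` on the 0–10 scale and a `judge_status` string. The `GradedPrompt` type, its fields, and the `parse_error` status are hypothetical and not the product's actual schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class GradedPrompt:
    # Hypothetical record shape: a 0-10 grade plus the judge's status.
    grade: float
    judge_status: str


def average_prompt_grade(prompts: Sequence[GradedPrompt]) -> Optional[float]:
    """Mean grade over prompts whose judge_status is "ok"; None if there are none."""
    ok_grades = [p.grade for p in prompts if p.judge_status == "ok"]
    if not ok_grades:
        return None  # no usable verdicts in this window
    return mean(ok_grades)


window = [
    GradedPrompt(8.0, "ok"),
    GradedPrompt(6.0, "ok"),
    GradedPrompt(0.0, "parse_error"),  # hypothetical non-ok status, excluded
]
print(average_prompt_grade(window))  # 7.0
```

Returning `None` instead of `0.0` for an empty window keeps a window with no usable verdicts from reading as a failing score.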
Supporting numbers
| Metric | What it tells you |
|---|---|
| Δ-APG | Change vs the previous window of the same length — are you prompting better or worse? |
| Worst pillar | The habit group with the lowest hit rate across the last 30 days. This is what to work on first. |
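Here is a minimal sketch of both supporting numbers. It assumes Δ-APG is a plain difference (current APG minus previous APG) and that per-pillar hit rates have already been computed over the last 30 days. The function names, the `(timestamp, grade)` input shape, and the example hit rates are all hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical input: (timestamp, grade) pairs, already filtered to judge_status == ok.
Graded = Tuple[datetime, float]


def apg_between(prompts: List[Graded], start: datetime, end: datetime) -> Optional[float]:
    """Mean grade for prompts with start <= timestamp < end, or None if empty."""
    grades = [g for ts, g in prompts if start <= ts < end]
    return mean(grades) if grades else None


def delta_apg(prompts: List[Graded], end: datetime, length: timedelta) -> Optional[float]:
    """Current-window APG minus the APG of the previous window of equal length."""
    current = apg_between(prompts, end - length, end)
    previous = apg_between(prompts, end - 2 * length, end - length)
    if current is None or previous is None:
        return None  # not enough data to compare
    return current - previous


def worst_pillar(hit_rates: Dict[str, float]) -> str:
    """Name of the pillar with the lowest hit rate (assumed 0.0 to 1.0)."""
    return min(hit_rates, key=lambda name: hit_rates[name])


end = datetime(2024, 6, 30)
week = timedelta(days=7)
prompts = [
    (datetime(2024, 6, 20), 6.0),  # previous window
    (datetime(2024, 6, 25), 8.0),  # current window
    (datetime(2024, 6, 27), 7.0),  # current window
]
print(delta_apg(prompts, end, week))  # 1.5

# Illustrative 30-day hit rates, not real data.
rates = {"Clarity": 0.8, "Context": 0.55, "Verification": 0.4, "Workflow": 0.7}
print(worst_pillar(rates))  # Verification
```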
The four pillars
The seven rubric checks group into four pillars:
| Pillar | Question it asks |
|---|---|
| Clarity | Did you tell the model what you actually want? |
| Context | Did you give the model what it needs to answer? |
| Verification | Did you ask the model to check its work? |
| Workflow | Did you keep the session tight and on track? |
The seven checks
| Check | Hint |
|---|---|
| Goal explicit | States the outcome in concrete, observable terms |
| Scope bounded | Limits files, features, or surface area touched |
| References concrete | Cites files, lines, or @-references the model can open |
| Context sufficient | Includes errors, examples, or artifacts the model needs |
| Verification requested | Asks for tests, expected output, or success criteria |
| Root-cause oriented | Targets the underlying cause, not just the symptom |
| Plan first | Opens non-trivial work with an explore-or-plan turn before code |
Not every check applies to every prompt: the judge marks each check as applicable or not based on the prompt's intent (fix, plan, explore, and so on).
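One way to turn per-prompt verdicts into a hit rate is to skip checks that the judge marked not applicable, so they count neither for nor against you. That skipping rule is an assumption, and so is the verdict shape used here: a mapping from check name to `True` (present), `False` (missing), or `None` (not applicable). The `hit_rate` helper is hypothetical.

```python
from typing import List, Mapping, Optional

# Hypothetical verdict shape: check name -> True (present), False (missing),
# or None (judge marked the check not applicable to this prompt's intent).
Verdict = Mapping[str, Optional[bool]]


def hit_rate(verdicts: List[Verdict], checks: List[str]) -> Optional[float]:
    """Share of applicable checks marked present, across all prompts.

    Checks marked not applicable (None) or absent from a verdict are skipped.
    Returns None if no listed check applied to any prompt.
    """
    hits = 0
    applicable = 0
    for verdict in verdicts:
        for check in checks:
            result = verdict.get(check)
            if result is None:
                continue  # not applicable to this prompt
            applicable += 1
            hits += int(result)
    return hits / applicable if applicable else None


verdicts = [
    {"Goal explicit": True, "Plan first": None, "Verification requested": False},
    {"Goal explicit": True, "Plan first": True, "Verification requested": True},
]
print(hit_rate(verdicts, ["Verification requested"]))  # 0.5
print(hit_rate(verdicts, ["Goal explicit", "Plan first", "Verification requested"]))  # 0.8
```

If you pass the checks that make up one pillar, you get that pillar's rate. This page does not say which checks belong to which pillar, so the caller supplies that grouping.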
What is excluded
Prompts whose judge verdict could not be parsed (judge_status ≠ ok) are excluded, so a single bad grading run doesn't drag your score around.
Scores are labeled Poor · Weak · Fair · Good. The threshold for the B baseline is 7.0, the same anchor PRISM uses.
Why it matters
Speed measures throughput; Skill measures how clean the inputs behind that throughput were. High Speed with low Skill usually means you are getting lucky. Improving Skill is what makes the throughput repeatable.