Measurement
The PRISM Score is built from four facts written for every session. This page explains where each fact comes from, how the engine decides when a session starts and ends, and which LLM agents are involved.
For the formula and the per-page breakdown, see PRISM Score. For the full algorithm spec, see Algorithm Overview.
The four facts
Every closed session writes four facts. The score then combines them.
| Fact | Source | What it answers |
|---|---|---|
| Substance floor | Deterministic detector | Did real work happen? — ≥3 turns OR ≥10 net lines of code OR ≥1 mutating tool call |
| Goal complete | LLM outcome judge | Did the session land its goal under per-intent criteria? |
| Rework | Deterministic detector | Did a later session revert or rewrite this one? |
| Intent established | LLM rubric judge | Did the session commit to a clear class (Question, Bug fix, Feature, etc.)? |
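A session crushes when all four land in the right state: it passes the substance floor, its goal is complete, no later session reworks it, and its intent is established. See PRISM Score → What “crushed” means.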
Substance floor
A deterministic filter; no LLM is involved. The floor catches sessions that didn’t do real work and keeps them from padding the denominator.
A session passes the floor if any of these is true:
- ≥3 turns between user and assistant.
- ≥10 net lines of code added (across all files touched).
- ≥1 mutating tool call — Edit, Write, Bash with side effects, etc.
Sessions that fail the floor land on the Trivia page. They aren’t penalized — they’re filtered out of both sides of the ratio.
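The floor is an OR of three cheap checks, so it can be sketched as a pure function. This is a minimal sketch, not the engine's detector: the `SessionStats` and `ToolCall` types and the `route` helper are hypothetical, and it assumes an upstream step has already flagged which tool calls mutate state (Edit, Write, Bash with side effects).

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    # Assumption: whether a call mutates state (Edit, Write, Bash with side
    # effects, ...) is classified upstream; this sketch only reads the flag.
    mutating: bool


@dataclass
class SessionStats:
    turns: int            # user/assistant turns in the session
    net_lines_added: int  # net lines of code added across all touched files
    tool_calls: list[ToolCall] = field(default_factory=list)


def passes_substance_floor(s: SessionStats) -> bool:
    """A session passes if ANY one of the three conditions holds."""
    return (
        s.turns >= 3
        or s.net_lines_added >= 10
        or any(call.mutating for call in s.tool_calls)
    )


def route(s: SessionStats) -> str:
    # Below-floor sessions go to Trivia and are left out of the ratio entirely.
    return "scored" if passes_substance_floor(s) else "trivia"
```

For example, `route(SessionStats(turns=1, net_lines_added=4))` returns `"trivia"`, while adding a single `ToolCall("Edit", mutating=True)` to the same session flips it to `"scored"`.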
Outcome judge — per-intent criteria
The outcome judge is one LLM call per session. It takes the transcript plus the rubric judge’s intent classification and decides goal_complete against intent-specific rules:
| Intent | goal_complete = TRUE when… | Silent-completion floor |
|---|---|---|
| Question | User accepted the answer (no follow-up clarification) | 0.50 |
| Investigation | Reached a stated conclusion | 0.60 |
| Review | Produced an actionable verdict | 0.60 |
| Plan / Spec | Plan accepted in-session | 0.60 |
| Small change | Change applied + no immediate revert | 0.50 |
| Bug fix | Fix applied + verification evidence (test pass, error gone, repro confirmed) | 0.75 |
| Feature | Scaffolded matching scope + acceptance + (tests OR explicit “tests later”) | 0.75 |
| Refactor | Behavior-preservation evidence (tests green, lint clean, type check passes) | 0.75 |
The silent-completion floor is the minimum confidence the judge needs, from circumstantial evidence alone, to mark goal_complete when no explicit “this worked” signal is present. Bug fixes, features, and refactors carry the highest floor (0.75), so a fix without a verifying test usually won’t crush.
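One way to read the floor is as a threshold applied only when explicit evidence is missing. The sketch below assumes the judge reports two things, an explicit-success flag and a confidence in [0, 1]; those field names and the `decide_goal_complete` helper are hypothetical, and in the real engine the decision is made inside the LLM call rather than in post-processing.

```python
# Per-intent silent-completion floors, copied from the table above.
SILENT_COMPLETION_FLOOR = {
    "Question": 0.50,
    "Investigation": 0.60,
    "Review": 0.60,
    "Plan / Spec": 0.60,
    "Small change": 0.50,
    "Bug fix": 0.75,
    "Feature": 0.75,
    "Refactor": 0.75,
}


def decide_goal_complete(intent: str, explicit_success: bool, confidence: float) -> bool:
    """Hypothetical post-processing of an outcome-judge verdict.

    explicit_success: the transcript carries an explicit signal that the
        per-intent criteria were met.
    confidence: the judge's confidence, from circumstantial evidence alone,
        that the criteria were met.
    """
    if explicit_success:
        return True
    # No explicit signal: circumstantial evidence must clear the intent's floor.
    return confidence >= SILENT_COMPLETION_FLOOR[intent]
```

Under this reading, a bug fix judged at 0.70 confidence with no explicit signal does not complete, while a question at 0.50 does.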
Rework detection
A deterministic check that runs after the session has closed. If a later session reverts or rewrites the same code, the earlier session is downgraded: it didn’t land, even if the outcome judge thought it did. This stops “claim it works, fix it tomorrow” from inflating the score.
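The page doesn't specify how "reverts or rewrites" is measured. As one hypothetical approach, a detector could track which lines each session added and flag the session when later sessions delete enough of them. The line identity (file path plus line text), the `SessionEdits` type, and the 0.5 threshold are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field

# Hypothetical: fraction of an earlier session's added lines that later
# sessions must delete before the earlier session counts as reworked.
REWORK_THRESHOLD = 0.5


@dataclass
class SessionEdits:
    session_id: str
    # Lines identified by (file path, line text); a simplification for the sketch.
    added: set[tuple[str, str]] = field(default_factory=set)
    removed: set[tuple[str, str]] = field(default_factory=set)


def is_reworked(earlier: SessionEdits, later_sessions: list[SessionEdits],
                threshold: float = REWORK_THRESHOLD) -> bool:
    """True if later sessions removed enough of the lines `earlier` added.

    The caller is assumed to pass only sessions that started after
    `earlier` closed.
    """
    if not earlier.added:
        return False
    undone: set[tuple[str, str]] = set()
    for later in later_sessions:
        # Lines this session added that a later session deleted.
        undone |= earlier.added & later.removed
    return len(undone) / len(earlier.added) >= threshold
```

Because the check only compares sets after the fact, it stays deterministic and needs no LLM call, matching the description above.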
Session boundaries
A session closes when any of these signals fires. It never closes on elapsed time alone.
| Signal | Detection |
|---|---|
| /clear | OTel event from the plugin; explicit context wipe |
| Topic shift | LLM comparison of consecutive user turns; closes if the topic differs materially |
| Git commit | A commit lands touching files the session modified |
/compact is not a boundary. It compresses the transcript but preserves the goal. Use /compact when you want more context room; use /clear when you’re starting something new.
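The boundary rules above can be sketched as a single predicate over incoming events. This is a minimal sketch under stated assumptions: the `Signal` and `Event` types are hypothetical, and the topic-shift verdict is assumed to arrive precomputed from the Sub-Session Tracker rather than being decided here.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Signal(Enum):
    CLEAR = auto()        # /clear OTel event from the plugin
    COMPACT = auto()      # /compact: never a boundary
    TOPIC_SHIFT = auto()  # verdict from the Sub-Session Tracker
    GIT_COMMIT = auto()   # a commit landed


@dataclass
class Event:
    kind: Signal
    topic_differs: bool = False                 # only meaningful for TOPIC_SHIFT
    commit_files: frozenset[str] = frozenset()  # only meaningful for GIT_COMMIT


def closes_session(event: Event, session_files: set[str]) -> bool:
    if event.kind is Signal.CLEAR:
        return True
    if event.kind is Signal.TOPIC_SHIFT:
        return event.topic_differs
    if event.kind is Signal.GIT_COMMIT:
        # Close only if the commit touches a file this session modified.
        return bool(event.commit_files & session_files)
    # COMPACT never closes a session; elapsed time is not an event at all.
    return False
```

Note that there is no timer branch: a session with no qualifying event simply stays open.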
Anti-fragmentation merge
/clear can fire mid-task: someone clears to get the context window back, then keeps working. A post-close merger rejoins the two sessions when both of these are true:
- The next session starts within 10 minutes of the close.
- They touch ≥50% of the same files.
The merger is what makes /clear safe to treat as a hard boundary. Without it, /clear-spam would tank the score of diligent developers.
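The two merge conditions translate directly into a check on a pair of consecutive sessions. This sketch assumes the 10-minute window is inclusive and reads "≥50% of the same files" as Jaccard overlap (shared files divided by files touched by either session); the page doesn't pin down either detail, and the `Session` type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MERGE_WINDOW = timedelta(minutes=10)
MIN_FILE_OVERLAP = 0.5


@dataclass
class Session:
    started_at: datetime
    closed_at: datetime
    files: frozenset[str]


def file_overlap(a: frozenset[str], b: frozenset[str]) -> float:
    # Assumption: overlap = shared files / files touched by either session.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_merge(prev: Session, nxt: Session) -> bool:
    """True if `nxt` looks like a continuation of `prev` after a /clear."""
    gap = nxt.started_at - prev.closed_at
    within_window = timedelta(0) <= gap <= MERGE_WINDOW
    return within_window and file_overlap(prev.files, nxt.files) >= MIN_FILE_OVERLAP
```

With this reading, a session on `{a.py, b.py}` followed five minutes later by one on `{a.py, b.py, c.py}` merges (overlap 2/3), but the same follow-up starting eleven minutes later does not.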
The LLM agents
Four LLM agents back PRISM Score v3. All share one contract (Haiku 4.5 by default, temperature = 0, JSON-only output, and the judge version pinned on every persisted row):
| # | Agent | Job |
|---|---|---|
| 1 | Sub-Session Tracker | Topic-shift boundary detection |
| 2 | Language Detection | Tags the session’s primary language |
| 3 | Rubric Judge | Intent classification + 7-boolean rubric + letter grade + title and summary |
| 4 | Outcome Judge | goal_complete against per-intent criteria |
Every transcript that leaves the engine goes through the existing redaction path before any agent sees it. Tenant opt-out is honored.
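Two parts of the shared contract, JSON-only output and a pinned judge version, can be enforced at the point where an agent's reply is turned into a row. This is a minimal sketch: the `AgentContract` type, the `"haiku-4.5"` label (a placeholder, not a real API model ID), and the version tag are hypothetical, and the model call and redaction step are not shown.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContract:
    model: str = "haiku-4.5"                 # placeholder label, not a real model ID
    temperature: float = 0.0
    judge_version: str = "outcome-judge@v1"  # hypothetical version tag


def parse_judge_output(raw: str, contract: AgentContract) -> dict:
    """Enforce JSON-only output and pin the judge version onto the row."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"judge returned non-JSON output: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("judge output must be a JSON object")
    # The pinned fields are written last so the model cannot override them.
    return {**payload, "judge_version": contract.judge_version, "model": contract.model}
```

Pinning the version on each row means a later judge upgrade can be compared against, or backfilled over, rows scored by the earlier judge.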
Update cadence
Every page in the PRISM Score v3 group updates within a few minutes of a session closing. The scoring pipeline is event-driven: it runs as soon as the engine sees a boundary fire, not on a timer.
For the full data pipeline (ingest → NATS → S3 → DataFusion → scoring), see Architecture.