Measurement

The PRISM Score is built from four facts written for every session. This page explains where each fact comes from, how the engine decides when a session starts and ends, and which LLM agents are involved.

For the formula and the per-page breakdown, see PRISM Score. For the full algorithm spec, see Algorithm Overview.

Every closed session writes four facts. The score then combines them.

| Fact | Source | What it answers |
| --- | --- | --- |
| Substance floor | Deterministic detector | Did real work happen? (≥3 turns OR ≥10 net lines of code OR ≥1 mutating tool call) |
| Goal complete | LLM outcome judge | Did the session land its goal under per-intent criteria? |
| Rework | Deterministic detector | Did a later session revert or rewrite this one? |
| Intent established | LLM rubric judge | Did the session commit to a clear class (Question, Bug fix, Feature, etc.)? |

A session crushes when all four land in the right state. See PRISM Score → What “crushed” means.
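As a rough illustration, the four facts can be modeled as one record per closed session. This is a minimal sketch, not the engine's schema: the field names and the reading of "the right state" (floor passed, goal complete, not reworked, intent established) are assumptions, and the PRISM Score page remains the authoritative definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionFacts:
    """Hypothetical record of the four facts written for a closed session."""
    passes_floor: bool        # substance floor (deterministic)
    goal_complete: bool       # outcome judge (LLM)
    reworked: bool            # rework detector (deterministic)
    intent_established: bool  # rubric judge (LLM)


def is_crushed(facts: SessionFacts) -> bool:
    # Assumed "right state": every positive fact is true and no rework was found.
    return (
        facts.passes_floor
        and facts.goal_complete
        and not facts.reworked
        and facts.intent_established
    )
```

Under this reading, a session that the outcome judge marked complete but that a later session reworked would not count as crushed.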

The substance floor is a deterministic filter; no LLM is involved. It catches sessions that didn’t do real work and stops them from padding the denominator.

A session passes the floor if any of these is true:

  • ≥3 turns between user and assistant.
  • ≥10 net lines of code added (across all files touched).
  • ≥1 mutating tool call — Edit, Write, Bash with side effects, etc.

Sessions that fail the floor land on the Trivia page. They aren’t penalized — they’re filtered out of both sides of the ratio.
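The three conditions above reduce to a single boolean check. The sketch below assumes the turn count, net lines added, and mutating tool calls have already been computed; the function and parameter names are illustrative, not the engine's API.

```python
def passes_substance_floor(
    turns: int,
    net_lines_added: int,
    mutating_tool_calls: int,
) -> bool:
    """Return True if any one of the three floor conditions holds."""
    return (
        turns >= 3                   # at least 3 turns between user and assistant
        or net_lines_added >= 10     # at least 10 net lines of code, all files combined
        or mutating_tool_calls >= 1  # at least 1 mutating tool call
    )
```

For example, a two-turn session with a single Edit call passes, while a two-turn session that only read files and added nothing does not.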

The outcome judge is one LLM call per session. It takes the transcript plus the rubric judge’s intent classification, and decides goal_complete against intent-specific rules:

| Intent | goal_complete = TRUE when… | Silent-completion floor |
| --- | --- | --- |
| Question | User accepted the answer (no follow-up clarification) | 0.50 |
| Investigation | Reached a stated conclusion | 0.60 |
| Review | Produced an actionable verdict | 0.60 |
| Plan / Spec | Plan accepted in-session | 0.60 |
| Small change | Change applied + no immediate revert | 0.50 |
| Bug fix | Fix applied + verification evidence (test pass, error gone, repro confirmed) | 0.75 |
| Feature | Scaffolded matching scope + acceptance + (tests OR explicit “tests later”) | 0.75 |
| Refactor | Behavior-preservation evidence (tests green, lint clean, type check passes) | 0.75 |

The silent-completion floor is the confidence the judge needs from circumstantial evidence alone when no explicit “this worked” signal is present. Bug fixes, features, and refactors require the highest confidence — a fix without a verifying test usually won’t crush.
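The floors can be read as a lookup table keyed by intent. The sketch below covers only the case where no explicit success signal is present. It assumes the judge reports a confidence between 0 and 1 and that meeting the floor exactly is enough (an inclusive comparison); neither detail is confirmed here, and the names are hypothetical.

```python
# Silent-completion floors, copied from the table above.
SILENT_COMPLETION_FLOOR = {
    "Question": 0.50,
    "Investigation": 0.60,
    "Review": 0.60,
    "Plan / Spec": 0.60,
    "Small change": 0.50,
    "Bug fix": 0.75,
    "Feature": 0.75,
    "Refactor": 0.75,
}


def meets_silent_floor(intent: str, circumstantial_confidence: float) -> bool:
    """Check circumstantial confidence against the per-intent floor.

    Assumption: the comparison is inclusive (>=). Unknown intents raise KeyError.
    """
    return circumstantial_confidence >= SILENT_COMPLETION_FLOOR[intent]
```

With these numbers, a confidence of 0.70 would clear the floor for a Question but not for a Bug fix.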

Rework detection is a deterministic check that runs after the session has closed. If a later session reverts or rewrites the same code, the earlier session is downgraded: it didn’t land, even if the outcome judge thought it did. This stops “claim it works, fix it tomorrow” from inflating the score.

A session closes when any of these fire — never on time alone.

| Signal | Detection |
| --- | --- |
| /clear | OTel event from the plugin (explicit context wipe) |
| Topic shift | LLM comparison of consecutive user turns; closes if the topic differs materially |
| Git commit | A commit lands touching files the session modified |

/compact is not a boundary. It compresses the transcript but preserves the goal. Use /compact when you want more context room; use /clear when you’re starting something new.
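The boundary rules can be sketched as a membership test over observed signals. The enum and function names below are hypothetical. `IDLE_TIMEOUT` is included only to show that elapsed time never closes a session by itself, and the topic-shift signal is assumed to have already been produced by the LLM comparison.

```python
from collections.abc import Iterable
from enum import Enum


class Signal(Enum):
    CLEAR = "/clear"
    TOPIC_SHIFT = "topic_shift"
    GIT_COMMIT = "git_commit"
    COMPACT = "/compact"            # compresses the transcript, keeps the goal
    IDLE_TIMEOUT = "idle_timeout"   # hypothetical; time alone never closes a session


# Only these three signals close a session.
BOUNDARY_SIGNALS = frozenset({Signal.CLEAR, Signal.TOPIC_SHIFT, Signal.GIT_COMMIT})


def commit_is_boundary(commit_files: set[str], session_files: set[str]) -> bool:
    """A commit counts only if it touches a file the session modified."""
    return bool(commit_files & session_files)


def closes_session(signals: Iterable[Signal]) -> bool:
    """Return True if any observed signal is a boundary signal."""
    return any(signal in BOUNDARY_SIGNALS for signal in signals)
```

A caller would emit `Signal.GIT_COMMIT` only after `commit_is_boundary` returns True for the commit in question.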

/clear can fire mid-task — someone clears to get the context window back, then keeps working. A post-close merger rejoins sessions when both are true:

  • The next session starts within 10 minutes of the close.
  • They touch ≥50% of the same files.

The merger is what makes /clear safe to treat as a hard boundary. Without it, /clear-spam would tank the score of diligent developers.
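The two merge conditions can be sketched as follows. The text does not say which set the 50% is measured against, so this example assumes overlap is the number of shared files divided by the size of the smaller file set. That choice, the inclusive 10-minute bound, and all names are assumptions made for illustration.

```python
from datetime import datetime, timedelta

MERGE_WINDOW = timedelta(minutes=10)
MIN_FILE_OVERLAP = 0.5


def file_overlap(prev_files: set[str], next_files: set[str]) -> float:
    """Shared files as a fraction of the smaller set (assumed denominator)."""
    if not prev_files or not next_files:
        return 0.0
    shared = prev_files & next_files
    return len(shared) / min(len(prev_files), len(next_files))


def should_merge(
    prev_closed_at: datetime,
    next_started_at: datetime,
    prev_files: set[str],
    next_files: set[str],
) -> bool:
    """Both conditions must hold: a short gap and enough shared files."""
    gap = next_started_at - prev_closed_at
    within_window = timedelta(0) <= gap <= MERGE_WINDOW
    return within_window and file_overlap(prev_files, next_files) >= MIN_FILE_OVERLAP
```

A different denominator, such as the union of both sets, would make the merger stricter for sessions that also touch many unrelated files.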

Four LLM agents back v3. All share one contract (Haiku 4.5 default, temperature = 0, JSON-only output, judge version pinned per persisted row):

| # | Agent | Job |
| --- | --- | --- |
| 1 | Sub-Session Tracker | Topic-shift boundary detection |
| 2 | Language Detection | Tags the session’s primary language |
| 3 | Rubric Judge | Intent classification + 7-boolean rubric + letter grade + title and summary |
| 4 | Outcome Judge | goal_complete against per-intent criteria |
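One way to picture the shared contract is a single frozen config object plus a strict JSON parse at persistence time. Everything named here is hypothetical: the `model` string is a placeholder rather than a real API model identifier, and the row layout and version string format are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContract:
    """Settings every agent shares (placeholder values, see lead-in)."""
    model: str = "haiku-4.5"   # placeholder label for the default model
    temperature: float = 0.0   # temperature = 0
    json_only: bool = True     # agents must return JSON and nothing else


AGENTS = (
    "Sub-Session Tracker",
    "Language Detection",
    "Rubric Judge",
    "Outcome Judge",
)


def build_row(agent: str, judge_version: str, raw_output: str) -> dict:
    """Parse JSON-only output and pin the judge version on the row."""
    if agent not in AGENTS:
        raise ValueError(f"unknown agent: {agent}")
    payload = json.loads(raw_output)  # raises json.JSONDecodeError on non-JSON text
    return {"agent": agent, "judge_version": judge_version, "output": payload}
```

Pinning the version per row means a later change to a judge's prompt or model can be told apart from rows written by the earlier version.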

Every transcript that leaves the engine goes through the existing redaction path before any agent sees it. Tenant opt-out is honored.

Every page in the PRISM Score v3 group updates within a few minutes of a session closing. The scoring pipeline is event-driven — it runs as soon as the engine sees the boundary fire, not on a timer.

For the full data pipeline (ingest → NATS → S3 → DataFusion → scoring), see Architecture.