Sub-Session Efficiency (SSE)

Weight in Skill: 45% — the heaviest input. SSE is also the headline Efficiency score shown on the /prism hub.

SSE measures whether a sub-session (a coherent block of related turns) produced a good outcome for what it cost — tokens, turns, human time, and response time — all measured against your own baseline, not a hard-coded target.

SSE_raw = Σ wᵢ · clip( ln( baselineᵢ / actualᵢ ), −1, +1 )
SSE_display = asymmetric_map(SSE_raw) (0–10 scale; matching your baseline maps to 7.5)

The four axes and default weights (apps/prism-engine/src/intelligence/efficiency/formula.rs):

Axis           Weight  Better when…
Tokens         30%     Fewer tokens for the same outcome
Turns          25%     Fewer turns to reach a solution
Human time     30%     Less wall-clock time per sub-session
Response time  15%     Faster model responses
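A minimal Rust sketch of the SSE_raw formula, assuming the default weights in the table and strictly positive measurements. The Axes struct, its field names, and the WEIGHTS array are illustrative stand-ins, not the actual types in formula.rs.

```rust
/// Per-axis measurements for one sub-session. Hypothetical struct for
/// illustration; the real types in formula.rs may differ.
#[derive(Debug, Clone, Copy)]
pub struct Axes {
    pub tokens: f64,
    pub turns: f64,
    pub human_time_secs: f64,
    pub response_time_secs: f64,
}

/// Default weights from the table, in the same order as `Axes::as_array`.
const WEIGHTS: [f64; 4] = [0.30, 0.25, 0.30, 0.15];

impl Axes {
    fn as_array(&self) -> [f64; 4] {
        [self.tokens, self.turns, self.human_time_secs, self.response_time_secs]
    }
}

/// SSE_raw = Σ wᵢ · clip(ln(baselineᵢ / actualᵢ), −1, +1).
/// Assumes every measurement is strictly positive (no zero or NaN handling).
pub fn sse_raw(baseline: &Axes, actual: &Axes) -> f64 {
    baseline
        .as_array()
        .iter()
        .zip(actual.as_array().iter())
        .zip(WEIGHTS.iter())
        // Positive when actual is below baseline (better), negative when above.
        .map(|((b, a), w)| w * (b / a).ln().clamp(-1.0, 1.0))
        .sum()
}
```

Identical baseline and actual values give exactly 0.0. Halving every axis gives ln 2 ≈ 0.69, because the default weights sum to 1.0, so the weighted sum is also a weighted mean.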

Each axis is compared to baseline via ln(baseline / actual). Positive means better than baseline, negative means worse. Log-space makes a 2× improvement and a 2× regression exactly symmetric in magnitude (±ln 2 ≈ ±0.69). A linear ratio would not be: measured as baseline / actual − 1, a 3× improvement scores +2 while a 3× regression scores only about −0.67, so a single great run would cancel out three bad ones.
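A quick numeric check of the symmetry argument, as a Rust sketch. log_score follows the formula on this page; linear_score (baseline / actual − 1) is a hypothetical linear alternative, included only for comparison.

```rust
/// Log-space axis score, as in the formula: ln(baseline / actual).
fn log_score(baseline: f64, actual: f64) -> f64 {
    (baseline / actual).ln()
}

/// Hypothetical linear alternative, used here only for comparison:
/// baseline / actual − 1 (zero at baseline, positive when better).
fn linear_score(baseline: f64, actual: f64) -> f64 {
    baseline / actual - 1.0
}
```

Under the linear version, one 3× improvement (+2.0) exactly offsets three 3× regressions (about −0.67 each). In log space the same mix nets out at −2 · ln 3 ≈ −2.2, so the regressions are not hidden.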

Each axis is clipped to [−1, +1] in log-space (≈ 0.37× to 2.7× in linear) before the weighted mean. Without clipping, a single extreme outlier on one axis could dominate the whole score. Clipping keeps any single axis from owning the number.
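The clipping step on its own, as a small Rust sketch (the function names are hypothetical). It checks the linear-space bounds, e^−1 ≈ 0.37 and e^+1 ≈ 2.72, and shows why clipping matters: an outlier 20× token improvement would contribute 0.30 × ln 20 ≈ 0.90 to SSE_raw unclipped, but only 0.30 once clipped.

```rust
/// Clip one axis's log-space score to [−1, +1] before weighting.
fn clip_axis(log_score: f64) -> f64 {
    log_score.clamp(-1.0, 1.0)
}

/// Linear ratios (baseline / actual) where the clip starts to bite: e^−1 and e^+1.
fn linear_clip_bounds() -> (f64, f64) {
    ((-1.0f64).exp(), 1.0f64.exp())
}

/// Weighted contribution of one axis to SSE_raw, with or without clipping.
fn contribution(weight: f64, log_score: f64, clipped: bool) -> f64 {
    if clipped {
        weight * clip_axis(log_score)
    } else {
        weight * log_score
    }
}
```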

Once SSE_raw is computed, it maps to the 0–10 display scale via:

SSE_raw ≥ 0 → min(7.5 + 5.0 · SSE_raw, 10.0)
SSE_raw < 0 → max(7.5 + 7.5 · SSE_raw, 0.0)

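Regressions fall faster than improvements rise: below baseline each unit of SSE_raw costs 7.5 points, while above baseline it earns only 5.0, and the score caps at 10.0 once SSE_raw reaches 0.5. The baseline reads as a B (7.5), not as a middle-of-the-road 5, because consistently matching your own baseline is already a B grade.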
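The display mapping translates directly into code. This Rust sketch mirrors the two-branch formula; the name asymmetric_map comes from the formula at the top of the page, but the signature is an assumption.

```rust
/// Map SSE_raw onto the 0–10 display scale. Matching baseline (0.0) reads as 7.5.
fn asymmetric_map(sse_raw: f64) -> f64 {
    if sse_raw >= 0.0 {
        // Improvements earn 5 points per unit, capped at 10.0.
        (7.5 + 5.0 * sse_raw).min(10.0)
    } else {
        // Regressions cost 7.5 points per unit, floored at 0.0.
        (7.5 + 7.5 * sse_raw).max(0.0)
    }
}
```

With the default weights, SSE_raw is a weighted mean of values clipped to [−1, +1], so it never goes below −1. The 0.0 floor is therefore reached only when every axis sits at the lower clip, while the 10.0 ceiling is reached at SSE_raw = 0.5.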

Your baseline is derived from your own recent history (roughly the last 8 weeks, populated by the baseline worker in apps/prism-engine/src/intelligence/efficiency/baseline_populator.rs). New developers without enough history fall back to an organization-level baseline until their own settles.
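A sketch of the fallback rule for a single axis. Everything concrete here is an assumption for illustration: the MIN_PERSONAL_SAMPLES threshold, using the median of recent values as the baseline, and the choose_baseline name. The page only states that roughly 8 weeks of personal history are used and that an organization-level baseline fills in until that history settles.

```rust
/// Hypothetical minimum number of sub-sessions before a personal baseline is
/// trusted. The real threshold is not documented on this page.
const MIN_PERSONAL_SAMPLES: usize = 20;

/// Where a baseline value came from.
#[derive(Debug, PartialEq)]
enum BaselineSource {
    Personal,
    Organization,
}

/// Pick the baseline for one axis. `history` holds this developer's values for
/// the axis over the lookback window (roughly 8 weeks). Using the median is an
/// assumption for illustration, not necessarily what baseline_populator.rs does.
fn choose_baseline(history: &[f64], org_baseline: f64) -> (f64, BaselineSource) {
    if history.len() < MIN_PERSONAL_SAMPLES {
        // Not enough personal history yet: fall back to the org-level value.
        return (org_baseline, BaselineSource::Organization);
    }
    let mut sorted = history.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    let median = if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    };
    (median, BaselineSource::Personal)
}
```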

Common patterns that pull SSE down, in rough order of how often we see them:

  1. Bundled asks inflate token and turn counts — split them
  2. Retry storms (same prompt re-issued after failure) burn tokens with no outcome — add constraints
  3. Context bloat — long sessions without /compact or /clear degrade response quality and raise cost per turn
  4. Model overkill — Opus for typo fixes — raises tokens spent without improving outcome
  5. Verification after-the-fact — fixing what wasn’t checked raises turn counts
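To see how one of these patterns shows up in the score, here is a worked Rust example of a retry storm with hypothetical numbers: the sub-session burns twice its baseline tokens and turns, while, for simplicity, human time and response time stay at baseline. The helpers restate the formula and display mapping from this page under the default weights; the names and the array layout are illustrative.

```rust
/// Default axis weights, in order: tokens, turns, human time, response time.
const WEIGHTS: [f64; 4] = [0.30, 0.25, 0.30, 0.15];

/// Per-axis ln(baseline / actual), clipped to [−1, +1], then weighted and summed.
fn sse_raw(baseline: [f64; 4], actual: [f64; 4]) -> f64 {
    (0..4)
        .map(|i| WEIGHTS[i] * (baseline[i] / actual[i]).ln().clamp(-1.0, 1.0))
        .sum()
}

/// Asymmetric 0–10 display mapping; matching baseline reads as 7.5.
fn display(raw: f64) -> f64 {
    if raw >= 0.0 {
        (7.5 + 5.0 * raw).min(10.0)
    } else {
        (7.5 + 7.5 * raw).max(0.0)
    }
}
```

For example, a hypothetical baseline of [20_000.0, 6.0, 900.0, 8.0] against a retry-storm sub-session of [40_000.0, 12.0, 900.0, 8.0] puts both affected axes at ln(0.5) ≈ −0.69, inside the clip range. SSE_raw is then 0.55 × −0.69 ≈ −0.38, and the display drops from 7.5 to about 4.6.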

Improving PES and IE both tend to improve SSE too, which is why they’re de-weighted inside Skill.