Sub-Session Efficiency (SSE)
Weight in Skill: 45% — the heaviest input. SSE is also the headline Efficiency score shown on the /prism hub.
SSE measures whether a sub-session (a coherent block of related turns) produced a good outcome for what it cost — tokens, turns, human time, and response time — all measured against your own baseline, not a hard-coded target.
Formula
    SSE_raw     = Σ wᵢ · clip( ln( baselineᵢ / actualᵢ ), −1, +1 )
    SSE_display = asymmetric_map(SSE_raw)    (0–10 scale, centered at 7.5)

The four axes and default weights (apps/prism-engine/src/intelligence/efficiency/formula.rs):

| Axis | Weight | Better when… |
|---|---|---|
| Tokens | 30% | Fewer tokens for the same outcome |
| Turns | 25% | Fewer turns to reach a solution |
| Human time | 30% | Less wall-clock time per sub-session |
| Response time | 15% | Faster model responses |
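
As a minimal sketch of the formula and weights above, the following Python computes SSE_raw for one sub-session. It is an illustration, not the code in formula.rs: the dictionary keys, the `clip` and `sse_raw` names, and the sample numbers are assumptions made for this example. Only the weights and the [−1, +1] clip range come from this page.

```python
import math

# Default axis weights from the table above; they sum to 1.0.
# The dictionary keys are illustrative names, not identifiers from the engine.
WEIGHTS = {
    "tokens": 0.30,
    "turns": 0.25,
    "human_time": 0.30,
    "response_time": 0.15,
}


def clip(x, lo=-1.0, hi=1.0):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def sse_raw(baseline, actual, weights=WEIGHTS):
    """Weighted sum of clipped log-ratios; positive means better than baseline."""
    return sum(
        w * clip(math.log(baseline[axis] / actual[axis]))
        for axis, w in weights.items()
    )


# Hypothetical sub-session: half the baseline tokens, everything else on baseline.
baseline = {"tokens": 40_000, "turns": 12, "human_time": 1_800, "response_time": 9.0}
actual = {"tokens": 20_000, "turns": 12, "human_time": 1_800, "response_time": 9.0}

print(round(sse_raw(baseline, actual), 3))  # 0.3 * ln(2), i.e. 0.208
```

Because the weights sum to 1 and every axis is clipped to [−1, +1], SSE_raw itself always lands in [−1, +1].
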
Why log-ratios
Each axis is compared to baseline via ln(baseline / actual). Positive means better than baseline, negative means worse. Log-space makes a 2× improvement and a 2× regression exactly symmetric in magnitude (+0.69 and −0.69). A linear ratio is lopsided: a 3× improvement sits 2.0 above parity while a 3× regression sits only 0.67 below it, so a single great run would cancel out three equally bad ones.
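
The symmetry argument can be checked numerically. The sketch below compares the log-ratio with a naive linear score (ratio minus 1). The linear variant and all function names are hypothetical, introduced only for contrast.

```python
import math


def log_score(ratio):
    """Score a baseline/actual ratio in log-space (0 means on baseline)."""
    return math.log(ratio)


def linear_score(ratio):
    """Naive linear alternative: distance of the ratio from parity (1.0)."""
    return ratio - 1.0


def linear_imbalance(factor):
    """How many factor-x regressions one factor-x improvement cancels, linearly."""
    return linear_score(factor) / -linear_score(1.0 / factor)


for factor in (2.0, 3.0):
    # In log-space the gain and the loss sum to (approximately) zero.
    log_sum = log_score(factor) + log_score(1.0 / factor)
    print(factor, round(log_sum, 12), round(linear_imbalance(factor), 6))
```

In the linear version the imbalance equals the factor itself, so the larger the swing, the more a single good run would outweigh equally bad ones.
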
Why clipping
Each axis is clipped to [−1, +1] in log-space (≈ 0.37× to 2.7× in linear) before the weighted mean. Without clipping, a single extreme outlier on one axis could dominate the whole score. With clipping, an axis can move SSE_raw by at most its own weight, so no single axis owns the number.
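
To see the effect of clipping, consider a hypothetical sub-session that burns 50× the baseline tokens while the other three axes are modestly better than baseline. The sketch below scores it with and without the clip. The key names, the `weighted_score` helper, and the sample values are assumptions for illustration.

```python
import math

# Weights from the axis table; key names are illustrative.
WEIGHTS = {"tokens": 0.30, "turns": 0.25, "human_time": 0.30, "response_time": 0.15}


def clip(x, lo=-1.0, hi=1.0):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def weighted_score(log_ratios, clipped=True):
    """Weighted sum of per-axis log-ratios, optionally clipped to [-1, +1]."""
    return sum(
        WEIGHTS[axis] * (clip(r) if clipped else r)
        for axis, r in log_ratios.items()
    )


# Hypothetical per-axis values of ln(baseline / actual).
log_ratios = {
    "tokens": math.log(1 / 50),  # about -3.91, far outside the clip range
    "turns": 0.3,                # roughly 1.35x better than baseline
    "human_time": 0.3,
    "response_time": 0.3,
}

print(round(weighted_score(log_ratios, clipped=False), 3))  # -0.964
print(round(weighted_score(log_ratios, clipped=True), 3))   # -0.09
```

Unclipped, the token outlier alone drags the score close to the floor. Clipped, it costs at most its 0.30 weight and the other axes still register.
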
Why asymmetric display mapping
Once SSE_raw is computed, it maps to the 0–10 display scale via:

    SSE_raw ≥ 0 → min(7.5 + 5.0 · SSE_raw, 10.0)
    SSE_raw < 0 → max(7.5 + 7.5 · SSE_raw, 0.0)

Regressions fall faster than improvements rise (7.5 versus 5.0 display points per unit of SSE_raw). The baseline reads as a B (7.5), not as a middle-of-the-road 5, because consistently matching your own baseline is already a B grade.
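
The two-branch mapping translates directly into code. The following is a small Python sketch of it; the function name mirrors `asymmetric_map` from the formula, but the body is written from the rules on this page rather than taken from the engine source.

```python
def asymmetric_map(sse_raw):
    """Map SSE_raw to the 0-10 display scale; 0.0 (on baseline) maps to 7.5."""
    if sse_raw >= 0:
        # Improvements rise at 5.0 display points per unit, capped at 10.
        return min(7.5 + 5.0 * sse_raw, 10.0)
    # Regressions fall at 7.5 display points per unit, floored at 0.
    return max(7.5 + 7.5 * sse_raw, 0.0)


for raw in (-1.0, -0.2, 0.0, 0.2, 0.5):
    print(raw, asymmetric_map(raw))
# -1.0 0.0
# -0.2 6.0
# 0.0 7.5
# 0.2 8.5
# 0.5 10.0
```

An SSE_raw of +0.2 lifts the display score by 1.0 point, while −0.2 lowers it by 1.5. The display saturates at 10 once SSE_raw reaches +0.5 and hits 0 at −1.0.
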
Baselines
Your baseline is derived from your own recent history (roughly the last 8 weeks, populated by the baseline worker in apps/prism-engine/src/intelligence/efficiency/baseline_populator.rs). New developers without enough history fall back to an organization-level baseline until their own settles.
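
The fallback rule can be sketched as follows. This is a hypothetical illustration only: this page does not say which statistic the baseline worker uses or how much history counts as enough, so the median, the `MIN_SAMPLES` threshold, and the `pick_baseline` name are all assumptions.

```python
from statistics import median

# Hypothetical threshold: the page does not say how much history is "enough".
MIN_SAMPLES = 20


def pick_baseline(own_history, org_baseline, min_samples=MIN_SAMPLES):
    """Return per-axis baselines, using personal history when it is large enough.

    own_history: list of per-sub-session dicts from the recent window.
    org_baseline: dict of organization-level baselines used as the fallback.
    The median is an assumed statistic chosen for illustration.
    """
    if len(own_history) < min_samples:
        return dict(org_baseline)
    return {
        axis: median(session[axis] for session in own_history)
        for axis in org_baseline
    }


org = {"tokens": 35_000, "turns": 10}
new_dev = [{"tokens": 50_000, "turns": 14}] * 3
veteran = [{"tokens": 20_000 + 1_000 * i, "turns": 8} for i in range(21)]

print(pick_baseline(new_dev, org))  # {'tokens': 35000, 'turns': 10}
print(pick_baseline(veteran, org))  # {'tokens': 30000, 'turns': 8}
```
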
What moves SSE
In rough order of what we see most often:
- Bundled asks inflate token and turn counts — split them
- Retry storms (same prompt re-issued after failure) burn tokens with no outcome — add constraints
- Context bloat: long sessions without /compact or /clear degrade response quality and raise cost per turn
- Model overkill (Opus for typo fixes) raises tokens spent without improving outcome
- Verification after-the-fact — fixing what wasn’t checked raises turn counts
Improvements to PES and IE both tend to improve SSE too, which is why they’re de-weighted inside Skill.