Sub-Session Efficiency (SSE)

Weight in Skill: 45% — the heaviest input. SSE is also the headline Efficiency score shown on the /prism hub.

SSE measures whether a sub-session (a coherent block of related turns) produced a good outcome for what it cost — tokens, turns, human time, and response time — all measured against your own baseline, not a hard-coded target.

Formula

SSE_raw     = Σ wᵢ · clip( ln( baselineᵢ / actualᵢ ), −1, +1 )
SSE_display = asymmetric_map(SSE_raw)         (0–10 scale; baseline maps to 7.5)

The four axes and default weights (apps/prism-engine/src/intelligence/efficiency/formula.rs):

Axis	Weight	Better when…
Tokens	30%	Fewer tokens for the same outcome
Turns	25%	Fewer turns to reach a solution
Human time	30%	Less wall-clock time per sub-session
Response time	15%	Faster model responses
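
As a rough illustration of the formula with the default weights above, the following is a minimal sketch, not the code in formula.rs. The `Axis` struct, the `DEFAULT_WEIGHTS` constant, and the `default_axes` and `sse_raw` functions are hypothetical names introduced here, and the sketch assumes every baseline and actual value is strictly positive.

```rust
/// One efficiency axis: its weight plus baseline and actual measurements.
/// Field names are illustrative and not taken from formula.rs.
#[derive(Debug, Clone, Copy)]
pub struct Axis {
    pub weight: f64,
    pub baseline: f64,
    pub actual: f64,
}

/// Default weights in table order: tokens, turns, human time, response time.
pub const DEFAULT_WEIGHTS: [f64; 4] = [0.30, 0.25, 0.30, 0.15];

/// Pair baseline and actual measurements (same order as the weights) into axes.
pub fn default_axes(baseline: [f64; 4], actual: [f64; 4]) -> Vec<Axis> {
    (0..4)
        .map(|i| Axis {
            weight: DEFAULT_WEIGHTS[i],
            baseline: baseline[i],
            actual: actual[i],
        })
        .collect()
}

/// SSE_raw = sum of w_i * clip(ln(baseline_i / actual_i), -1, +1).
/// Assumes all baseline and actual values are strictly positive.
pub fn sse_raw(axes: &[Axis]) -> f64 {
    axes.iter()
        .map(|a| a.weight * (a.baseline / a.actual).ln().clamp(-1.0, 1.0))
        .sum()
}
```

Under these assumptions, a sub-session that exactly matches baseline on every axis yields an SSE_raw of 0, and one that halves token use while matching baseline elsewhere yields 0.30 · ln 2 ≈ 0.21.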

Why log-ratios

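Each axis is compared to its baseline via ln(baseline / actual). A positive value means better than baseline; a negative value means worse. In log-space, a 2× improvement and a 2× regression have exactly the same magnitude (±ln 2 ≈ ±0.69). A linear ratio does not: measured as baseline / actual − 1, a 3× improvement scores +2 while a 3× regression scores only about −0.67, so a single great run would cancel out three bad ones.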
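
The symmetry argument can be checked numerically. The sketch below contrasts the log-ratio with a linear alternative; `log_score` and `linear_score` are hypothetical helper names, and the linear form (baseline / actual − 1) is only one possible linear measure, chosen here for contrast.

```rust
/// Log-ratio score for one axis: positive means better than baseline.
pub fn log_score(baseline: f64, actual: f64) -> f64 {
    (baseline / actual).ln()
}

/// Linear alternative, shown only for contrast. SSE does not use this.
pub fn linear_score(baseline: f64, actual: f64) -> f64 {
    baseline / actual - 1.0
}
```

With a baseline of 100, an actual of 50 (2× better) and an actual of 200 (2× worse) give log scores of about +0.69 and −0.69, which sum to zero. The linear scores are +1.0 and −0.5, so the improvement outweighs the regression.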

Why clipping

Each axis is clipped to [−1, +1] in log-space before the weighted mean, which corresponds to a baseline / actual ratio between roughly 0.37× and 2.7×. Without clipping, a single extreme outlier on one axis could dominate the whole score. With clipping, no axis can move SSE_raw by more than its own weight in either direction.
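
A small sketch of the clipping step, assuming strictly positive inputs. `clipped_log_ratio` is a hypothetical name used for illustration.

```rust
/// Log-ratio for one axis, clipped to [-1, +1].
/// Assumes baseline and actual are strictly positive.
pub fn clipped_log_ratio(baseline: f64, actual: f64) -> f64 {
    (baseline / actual).ln().clamp(-1.0, 1.0)
}
```

For example, with a token baseline of 1000, a run that uses only 50 tokens has an unclipped log-ratio of ln 20 ≈ 3.0 but contributes as +1, and a run that uses 20000 tokens contributes as −1. A run that uses 800 tokens is inside the band and keeps its value of ln 1.25 ≈ 0.22.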

Why asymmetric display mapping

Once SSE_raw is computed, it maps to the 0–10 display scale via:

SSE_raw ≥ 0   →  min(7.5 + 5.0 · SSE_raw, 10.0)
SSE_raw < 0   →  max(7.5 + 7.5 · SSE_raw,  0.0)

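Regressions fall faster than improvements rise: below baseline the display score moves 7.5 points per unit of SSE_raw, above baseline only 5.0. The baseline maps to 7.5 (a B) rather than a middle-of-the-road 5, because consistently matching your own baseline is already B-level work.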
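
The two mapping rules above translate directly into a small function. This is a sketch written from the rules as stated here, not a copy of the engine's implementation. The name `asymmetric_map` follows the formula section; the signature is an assumption.

```rust
/// Map SSE_raw to the 0-10 display scale.
/// At or above baseline: slope 5.0, capped at 10.0.
/// Below baseline: slope 7.5, floored at 0.0.
pub fn asymmetric_map(sse_raw: f64) -> f64 {
    if sse_raw >= 0.0 {
        (7.5 + 5.0 * sse_raw).min(10.0)
    } else {
        (7.5 + 7.5 * sse_raw).max(0.0)
    }
}
```

Under this mapping an SSE_raw of +0.2 displays as 8.5 while −0.2 displays as 6.0. The ceiling of 10 is reached at SSE_raw = 0.5, and the floor of 0 at SSE_raw = −1.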

Baselines

Your baseline is derived from your own recent history (roughly the last 8 weeks, populated by the baseline worker in apps/prism-engine/src/intelligence/efficiency/baseline_populator.rs). New developers without enough history fall back to an organization-level baseline until their own history is long enough to produce a stable one.
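
The fallback rule might look something like the sketch below. Everything in it is hypothetical: the `Baseline` and `PersonalBaseline` types, the `effective_baseline` function, and especially the `MIN_SAMPLES` cutoff, since this document does not say how "enough history" is decided or how baseline_populator.rs represents it.

```rust
/// Per-axis baseline values. Field names and units are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Baseline {
    pub tokens: f64,
    pub turns: f64,
    pub human_secs: f64,
    pub response_secs: f64,
}

/// Hypothetical cutoff for "enough history", counted in recent sub-sessions.
/// The real criterion is not stated in this document.
pub const MIN_SAMPLES: usize = 20;

/// A developer's own baseline plus the number of sub-sessions behind it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PersonalBaseline {
    pub values: Baseline,
    pub samples: usize,
}

/// Use the personal baseline when it rests on enough history;
/// otherwise fall back to the organization-level baseline.
pub fn effective_baseline(personal: Option<PersonalBaseline>, org: Baseline) -> Baseline {
    match personal {
        Some(p) if p.samples >= MIN_SAMPLES => p.values,
        _ => org,
    }
}
```

Modeling the personal baseline as an `Option` keeps the "no history at all" and "too little history" cases on the same fallback path.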

What moves SSE

In rough order of what we see most often:

Bundled asks inflate token and turn counts — split them
Retry storms (same prompt re-issued after failure) burn tokens with no outcome — add constraints
Context bloat — long sessions without /compact or /clear degrade response quality and raise cost per turn
Model overkill (for example, Opus for typo fixes) raises token spend without improving the outcome
After-the-fact verification: fixing what wasn’t checked up front raises turn counts

Improving either PES or IE tends to improve SSE as well, which is why both are de-weighted inside Skill.