Iteration Efficiency (IE)

Weight in Skill: 15%. IE is the only Skill input that reads Layer 2 purely from turn counts and wall time — no LLM rubric, no retry classifier. It captures a single thing: how many user prompts you issue per hour of active coding time, against your own baseline.

The signal

tph = user_prompt_count / active_cli_hours

Lower tph means each prompt drove more autonomous Claude work (longer chains, more tool use per prompt). Higher tph means you’re sending shorter, more frequent prompts — typically a sign of retry storms or over-fragmented asks.
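As a rough illustration of the signal, here is a minimal Python sketch of the tph calculation. The function name `compute_tph` and the choice to return `None` when there is no active time are assumptions made for this example, not the engine's actual behavior.

```python
from typing import Optional


def compute_tph(user_prompt_count: int, active_cli_hours: float) -> Optional[float]:
    """Prompts per hour of active CLI time.

    Returns None when there was no active time (an assumption for this
    sketch: tph is treated as undefined rather than as zero or infinity).
    """
    if active_cli_hours <= 0:
        return None
    return user_prompt_count / active_cli_hours


# Same three hours of active work, two very different prompting styles.
print(compute_tph(24, 3.0))  # 8.0
print(compute_tph(60, 3.0))  # 20.0
```

The second session sends 2.5 times as many prompts in the same active time, which is the pattern IE is designed to surface.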

The formula

IE is band-centered at 5.0 (parity with your rolling median) rather than baseline-B like SSE. The score rises when your current tph drops below baseline and falls when it rises above.

IE = 5 + 5 × clip( ln(baseline_tph / current_tph), −1, 1 )

clip is applied in log-space to [−1, +1] (≈ 0.37× to 2.7× in linear terms for baseline_tph / current_tph), so IE always lands between 0 and 10 and a single extreme window can’t swing the number. Source: apps/prism-engine/src/metrics/layer2.rs::iteration_efficiency.
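The formula can be checked by hand with a short Python sketch. This is an illustration of the arithmetic only, not the Rust implementation referenced above; the helper names and the `ValueError` on non-positive inputs are assumptions for this example.

```python
import math


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def iteration_efficiency(baseline_tph: float, current_tph: float) -> float:
    """IE = 5 + 5 * clip(ln(baseline_tph / current_tph), -1, 1)."""
    if baseline_tph <= 0 or current_tph <= 0:
        # Assumption for this sketch: the log ratio is only defined
        # for positive tph values, so reject anything else.
        raise ValueError("tph values must be positive")
    log_ratio = math.log(baseline_tph / current_tph)
    return 5.0 + 5.0 * clip(log_ratio, -1.0, 1.0)


print(round(iteration_efficiency(12.0, 12.0), 2))  # 5.0  (at baseline)
print(round(iteration_efficiency(12.0, 8.0), 2))   # 7.03 (tph below baseline)
print(round(iteration_efficiency(12.0, 18.0), 2))  # 2.97 (tph above baseline)
print(round(iteration_efficiency(12.0, 2.0), 2))   # 10.0 (clipped at +1)
print(round(iteration_efficiency(12.0, 60.0), 2))  # 0.0  (clipped at -1)
```

Because the ratio is taken in log-space, a tph that is 1.5× below baseline and one that is 1.5× above it move the score by the same amount in opposite directions (about 2.03 points each).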

Score	Reading
> 5	Current tph is below your own baseline: fewer, more productive prompts
= 5	At baseline (or still in the learning window)
< 5	Current tph has risen above your baseline: more prompts per hour than you usually send

Baseline window

Your baseline is the median of trailing same-kind periods (weeks for a weekly view, days for a daily view):

First 4 weeks of telemetry — 2-period trailing window
After 4 weeks — 8-period trailing window

Until enough trailing periods exist, IE returns a neutral 5.0 and the dashboard shows a “learning” banner instead of a score. A current window with no active CLI work also returns 5.0 with the same banner.
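The baseline rules above can be sketched in Python as follows. Several details are assumptions made for this example rather than documented behavior: the exact week boundary between the two window sizes, the `MIN_PERIODS` threshold that stands in for "enough trailing periods", and the use of `None` to signal the learning state.

```python
from statistics import median
from typing import Optional, Sequence

MIN_PERIODS = 2  # hypothetical threshold for "enough trailing periods"


def window_size(weeks_of_telemetry: int) -> int:
    """2-period window during the first 4 weeks, 8-period window afterwards."""
    return 2 if weeks_of_telemetry <= 4 else 8


def baseline_tph(trailing_tph: Sequence[float], weeks_of_telemetry: int) -> Optional[float]:
    """Median tph over the most recent same-kind periods (oldest first).

    Returns None while still learning; a caller would then report the
    neutral 5.0 and show the learning banner instead of a score.
    """
    window = list(trailing_tph)[-window_size(weeks_of_telemetry):]
    if len(window) < MIN_PERIODS:
        return None
    return median(window)


print(baseline_tph([10.0], 1))        # None (still learning)
print(baseline_tph([10.0, 14.0], 3))  # 12.0 (2-period window)
history = [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 30.0]
print(baseline_tph(history, 12))      # 13.5 (last 8 periods only)
```

In the last call the outlier week at 30.0 barely moves the baseline, which is the practical reason to use a median rather than a mean.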

What moves IE up (lower tph is better)

Constrain on failure — when a prompt misses, add constraints to the next prompt rather than resending the same ask. Fewer prompts, same outcome.
Use plan mode for non-trivial work — one planning turn up front cuts exploratory tph.
Let chains finish — one prompt that triggers five tool calls is better for IE than five one-shot prompts.
Include the error in the next prompt — avoids a retry-same-prompt pattern that doubles your tph.
Split only when the split is real — bundling related sub-steps into one prompt keeps tph down; splitting unrelated tasks keeps SSE up. Don’t over-fragment.

IE vs. PES vs. SSE

For most developers, IE, PES, and SSE move together: a clearer prompt needs fewer follow-ups, which lowers tph and lifts IE, and the corresponding sub-session completes in fewer turns, which lifts SSE. That overlap is why IE is weighted 15% inside Skill while SSE (the outcome) carries 45% and PES (the leading predictor) carries 20%. Skill would double-count the same behavior if IE were weighted higher.

A divergence pattern to watch for: high IE, low SSE. That typically means long, autonomous chains are running but not producing verified results: the prompts are few, but the outputs don’t land. It is usually a sign of missing constraints or unverified acceptance.