Architecture
Optra Prism is three stacked measurement layers and one parallel intelligence pipeline. Layers are numbered bottom-up by data flow — signals rise from Layer 0 through Layer 1 into Layer 2 — and each layer holds exactly one kind of thing.
flowchart LR
L0["Layer 0 — Telemetry<br/>OpenTelemetry from Claude Code"] --> L1["Layer 1 — Measurement<br/>Metrics · PES · SSE"]
L1 --> L2["Layer 2 — Prism Score<br/>Speed · Skill · Efficiency"]
L1 -.-> PIQ["Insight<br/>Intent-adaptive agents"]
PIQ -.-> L1
style L0 fill:#22c55e,stroke:#16a34a,color:#fff
style L1 fill:#f59e0b,stroke:#d97706,color:#fff
style L2 fill:#8b5cf6,stroke:#7c3aed,color:#fff
style PIQ fill:#a855f7,stroke:#9333ea,color:#fff
Layer 0 — Telemetry
The single source of truth. Claude Code emits cost, tokens, events, links, and labels via OpenTelemetry. No git webhook, no repo scanner, no CI integration.
Three flags need to be set for full functionality:
CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_LOG_USER_PROMPTS=1
OTEL_LOG_TOOL_DETAILS=1
Without OTEL_LOG_USER_PROMPTS, Prompt Efficiency scoring and sub-session detection both run in degraded mode — the prompt text isn’t available to the rubric agents. Pure-counter metrics (tokens, turns, active time) are unaffected. Without OTEL_LOG_TOOL_DETAILS, Quality Retention file-path extraction breaks; sub-session efficiency still runs on timestamps and token counters.
The /prism:setup flow wires these flags up automatically.
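For a manual check outside /prism:setup, the flag requirements above can be sketched as a small Python helper. This is an illustrative sketch, not part of the plugin: the function name `missing_flags` is hypothetical, and the impact descriptions are paraphrased from the text above.

```python
import os

# Flag names come from the text above; the impact strings paraphrase it.
REQUIRED_FLAGS = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "telemetry export",
    "OTEL_LOG_USER_PROMPTS": "Prompt Efficiency scoring and sub-session detection",
    "OTEL_LOG_TOOL_DETAILS": "Quality Retention file-path extraction",
}


def missing_flags(env=None):
    """Return {flag: affected feature} for every flag that is not set to "1"."""
    env = os.environ if env is None else env
    return {
        flag: impact
        for flag, impact in REQUIRED_FLAGS.items()
        if env.get(flag) != "1"
    }


if __name__ == "__main__":
    for flag, impact in missing_flags().items():
        print(f"{flag} is not set to 1; affected: {impact}")
```

An empty result means all three flags are set in the current environment.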
Layer 1 — Measurement
Three parallel verticals, each measuring something structurally different:
- Metrics — aggregations over sub-session or week. Iteration Efficiency (IE), Context Reset Rate (CRR), Flow Continuity (FC), Response Latency Ratio, Quality Retention, Weekly Token Usage.
- Prompt Efficiency Score (PES) — per-prompt anchor. The Efficiency Rubric combines four LLM-judged dimensions — Context Leverage, Information Density, Turn Economy, Ambiguity Cost — into a single 0–10 score. Language-agnostic by construction: the agents judge semantic content, so prompts in any human language score the same for the same underlying behavior.
- Sub-Session Efficiency Score (SSE) — ground truth. Weighted log-mean of four baseline ratios (tokens, turns, human think time, response latency), clipped per-axis and mapped to 0–10 with a B-centered asymmetric curve. One goal = one sub-session = one score.
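The SSE aggregation step can be sketched as follows, under loud assumptions. "Weighted log-mean" is read here as a weighted mean taken in log space and exponentiated back (a weighted geometric mean). The equal weights, the clip bounds, and the axis key names are all hypothetical, since this page gives none of them, and the final mapping to 0–10 is left out because the curve is not specified here.

```python
import math

# Hypothetical values: this page does not give the weights or the clip bounds.
WEIGHTS = {"tokens": 0.25, "turns": 0.25, "think_time": 0.25, "latency": 0.25}
CLIP_LOW, CLIP_HIGH = 0.25, 4.0


def weighted_log_mean(ratios, weights=WEIGHTS):
    """Weighted geometric mean of per-axis ratios, each clipped before averaging."""
    total_weight = sum(weights.values())
    log_sum = 0.0
    for axis, weight in weights.items():
        # Per-axis clipping keeps one extreme ratio from dominating the mean.
        clipped = min(max(ratios[axis], CLIP_LOW), CLIP_HIGH)
        log_sum += weight * math.log(clipped)
    return math.exp(log_sum / total_weight)
```

Averaging in log space treats a ratio of 2 and a ratio of 0.5 as equal and opposite deviations from baseline, which a plain arithmetic mean would not.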
The two scoring verticals form a calibration pair: the per-prompt score predicts, the sub-session score verifies. Target correlation is Pearson ≥ 0.65 on a rolling window.
Layer 2 — The Prism Score
Three developer-facing scores, derived from Layer 1 by pure arithmetic — no LLM judgment at this layer.
| Score | Formula (summary) | Unit |
|---|---|---|
| Speed | Σ(active_cli_seconds) / 3600, discounted when Quality Retention < 85% | Hours/week |
| Skill | 100 × (0.45·SSE + 0.20·PES + 0.15·IE + 0.10·CRR + 0.10·FC) | 0–100 |
| Efficiency | Σ(tokens) / Σ(active_cli_hours) — lower is better | Tokens/hour |
See PRISM Scores for the full definitions and tier tables.
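The summary formulas in the table can be sketched in Python as below. Two assumptions are baked in: every Skill component is taken to be already normalized to [0, 1] with higher meaning better (the table does not say how the 0–10 scores or CRR are normalized), and the Quality Retention discount on Speed is omitted because its form is not given on this page. Function and key names are illustrative.

```python
# Weights copied from the Skill row of the table above.
SKILL_WEIGHTS = {"sse": 0.45, "pes": 0.20, "ie": 0.15, "crr": 0.10, "fc": 0.10}


def speed_hours(active_cli_seconds):
    """Undiscounted Speed: total active CLI time, in hours."""
    return sum(active_cli_seconds) / 3600


def skill(components):
    """Skill on 0-100, assuming each component is already normalized to [0, 1]."""
    return 100 * sum(SKILL_WEIGHTS[name] * components[name] for name in SKILL_WEIGHTS)


def efficiency(tokens, active_cli_seconds):
    """Tokens per active CLI hour; lower is better."""
    hours = speed_hours(active_cli_seconds)
    if hours == 0:
        raise ValueError("no active CLI time recorded")
    return sum(tokens) / hours
```

The Skill weights sum to 1.0, so a developer at the top of every normalized component lands at 100.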
Insight — intelligence pipeline
Insight runs alongside the layers, not above them. It’s a multi-agent pipeline that turns raw prompts into the per-prompt score that feeds Layer 1:
- Language detection + intent classification — classifies each prompt into one of seven intents (new_code, fix, refactor, question, meta, continuation, system_callback), with confidence.
- Rubric routing — intents that describe actionable work are routed to an intent-specific rubric agent (authoring, debugging, planning). Intents like question, meta, continuation, and system_callback short-circuit and don’t get rubric-scored.
- Rubric scoring — the rubric agent scores the four PES dimensions on a 0–10 scale.
- Aggregation + confidence — the four dimensions combine into the PES score; a confidence value is stored alongside for audit.
Everything Insight produces is auditable end-to-end: each agent’s output is logged so a human can see why a prompt scored the way it did.
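The rubric routing step can be illustrated with a short sketch. The set of short-circuit intents comes from the list above; the mapping from actionable intents to rubric agents is a guess for illustration only, since this page names the agents (authoring, debugging, planning) without saying which intent goes to which.

```python
# Intents that skip rubric scoring, as listed above.
SHORT_CIRCUIT = {"question", "meta", "continuation", "system_callback"}

# Hypothetical mapping: the source does not state which intent uses which agent.
RUBRIC_AGENT = {
    "new_code": "authoring",
    "fix": "debugging",
    "refactor": "planning",
}


def route(intent):
    """Return the rubric agent name for an intent, or None if it short-circuits."""
    if intent in SHORT_CIRCUIT:
        return None
    try:
        return RUBRIC_AGENT[intent]
    except KeyError:
        raise ValueError(f"unknown intent: {intent!r}") from None
```

Returning `None` rather than a default agent keeps short-circuited prompts visibly unscored, which fits the audit goal described above.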
How it flows end-to-end
sequenceDiagram
participant Dev as You
participant Plugin as Prism Plugin
participant Ingest as Ingest Service
participant Engine as Prism Engine
participant Dash as Dashboard
Dev->>Plugin: Write a prompt
Plugin-->>Dev: Single-line nudge (if needed)
Plugin->>Ingest: Send OTLP + prompt text
Ingest->>Engine: NATS publish
Engine->>Engine: Parquet write → Insight → SSE → Skill
Dev->>Dash: Open /prism
Dash-->>Dev: Speed / Skill / Efficiency
Everything the dashboard shows is recomputable from S3 Parquet + Postgres — no scores live only in memory.
Authentication
Plugin traffic is authenticated with your gck_* API key; the dashboard uses a separate login:
- The plugin includes your key on every request to Ingest
- Ingest validates the key and associates data with your organization
- Dashboard access uses Supabase login (email/password or OAuth)
Your key is stored locally in ~/.prism/config.json with restricted file permissions.
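As an illustration of what "restricted file permissions" can mean on a POSIX system, the sketch below writes a config file that only its owner can read or write. The `api_key` field name and the 0600 mode are assumptions; the actual schema and mode of ~/.prism/config.json are not given on this page.

```python
import json
import os
import stat


def write_config(path, api_key):
    """Write the key as JSON, readable and writable by the file owner only."""
    os.makedirs(os.path.dirname(path), mode=0o700, exist_ok=True)
    # Create the file with mode 0600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump({"api_key": api_key}, handle)  # "api_key" is a hypothetical field
    os.chmod(path, 0o600)  # also tightens a file that already existed


def permissions(path):
    """Return the permission bits of a file, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

Passing the mode to `os.open` instead of calling `chmod` after a normal `open` avoids a window in which the key sits on disk with default permissions.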
Two distinct gck_* keys
There are two kinds of gck_* keys in the system, and they do different jobs:
| Key | Who holds it | What it’s used for |
|---|---|---|
| Plugin key | You, locally in ~/.prism/config.json | Authenticates your Claude Code traffic when it’s redirected through the Optra gateway. This is the key you set with /prism:setup |
| Platform key | The Prism Engine, as OPTRA_GATEWAY_KEY in its environment | Authenticates server-side LLM calls for rubric scoring, session summaries, and the dashboard advisor |
The two are not interchangeable. The plugin key is scoped to your developer identity and gateway governance; the platform key is a service credential the engine uses to call the Optra gateway on your behalf. Mixing them up is the usual cause of “gateway 401” errors during engine boot.