Architecture

Optra Prism is three stacked measurement layers and one parallel intelligence pipeline. Layers are numbered bottom-up by data flow: signals rise from Layer 0 through Layer 1 into Layer 2. Each layer holds exactly one kind of thing: raw telemetry in Layer 0, measurements in Layer 1, and developer-facing scores in Layer 2.

flowchart LR
    L0["Layer 0 — Telemetry<br/>OpenTelemetry from Claude Code"] --> L1["Layer 1 — Measurement<br/>Metrics · PES · SSE"]
    L1 --> L2["Layer 2 — Prism Score<br/>Speed · Skill · Efficiency"]
    L1 -.-> PIQ["Insight<br/>Intent-adaptive agents"]
    PIQ -.-> L1

    style L0 fill:#22c55e,stroke:#16a34a,color:#fff
    style L1 fill:#f59e0b,stroke:#d97706,color:#fff
    style L2 fill:#8b5cf6,stroke:#7c3aed,color:#fff
    style PIQ fill:#a855f7,stroke:#9333ea,color:#fff

Layer 0 — Telemetry

The single source of truth. Claude Code emits cost, tokens, events, links, and labels via OpenTelemetry. No git webhook, no repo scanner, no CI integration.

Three flags need to be set for full functionality:

CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_LOG_USER_PROMPTS=1
OTEL_LOG_TOOL_DETAILS=1

Without OTEL_LOG_USER_PROMPTS, Prompt Efficiency scoring and sub-session detection both run in degraded mode, because the prompt text isn’t available to the rubric agents. Pure-counter metrics (tokens, turns, active time) are unaffected. Without OTEL_LOG_TOOL_DETAILS, Quality Retention can no longer extract file paths, but sub-session efficiency still runs on timestamps and token counters.

The /prism:setup flow wires these flags up automatically.
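
If you set the flags by hand instead, a quick check like the one below can confirm that all three are present. This is a minimal sketch, not part of the plugin: the function name `missing_telemetry_flags` is hypothetical, and it assumes a flag counts as enabled only when its value is exactly `1`, as in the listing above.

```python
import os
from typing import Mapping

# The three flags listed above.
REQUIRED_FLAGS = (
    "CLAUDE_CODE_ENABLE_TELEMETRY",
    "OTEL_LOG_USER_PROMPTS",
    "OTEL_LOG_TOOL_DETAILS",
)


def missing_telemetry_flags(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required flags that are not set to "1" in env.

    Assumption: only the literal value "1" counts as enabled.
    """
    return [name for name in REQUIRED_FLAGS if env.get(name) != "1"]


if __name__ == "__main__":
    missing = missing_telemetry_flags()
    if missing:
        print("Missing or disabled flags:", ", ".join(missing))
    else:
        print("All three telemetry flags are set.")
```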

Layer 1 — Measurement

Three parallel verticals, each measuring something structurally different:

Metrics — aggregations over sub-session or week. Iteration Efficiency (IE), Context Reset Rate (CRR), Flow Continuity (FC), Response Latency Ratio, Quality Retention, Weekly Token Usage.
Prompt Efficiency Score (PES) — per-prompt anchor. The Efficiency Rubric combines four LLM-judged dimensions — Context Leverage, Information Density, Turn Economy, Ambiguity Cost — into a single 0–10 score. Language-agnostic by construction: the agents judge semantic content, so prompts in any human language score the same for the same underlying behavior.
Sub-Session Efficiency Score (SSE) — ground truth. Weighted log-mean of four baseline ratios (tokens, turns, human think time, response latency), clipped per-axis and mapped to 0–10 with a B-centered asymmetric curve. One goal = one sub-session = one score.
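
The first half of the SSE computation, the clipped weighted log-mean, can be sketched as follows. The weights, the clip bounds, and the reading of each ratio as actual divided by baseline are all assumptions made for illustration; none of them is published here. The final mapping to 0–10 is left out because the curve's parameters are not given.

```python
import math

# Hypothetical values: the real weights and clip bounds are not documented here.
WEIGHTS = {"tokens": 0.25, "turns": 0.25, "think_time": 0.25, "latency": 0.25}
CLIP_LOW, CLIP_HIGH = 0.25, 4.0


def weighted_log_mean(ratios: dict[str, float]) -> float:
    """Weighted mean of ln(ratio) over the four axes, clipping each ratio first.

    Assumption: each ratio is actual / baseline, so 1.0 means "on baseline"
    and the result is 0.0 when every axis sits exactly on its baseline.
    """
    total = 0.0
    for axis, weight in WEIGHTS.items():
        # Per-axis clipping keeps one extreme axis from dominating the mean.
        clipped = min(max(ratios[axis], CLIP_LOW), CLIP_HIGH)
        total += weight * math.log(clipped)
    return total / sum(WEIGHTS.values())
```

Working in log space treats "twice the baseline" and "half the baseline" as equal and opposite deviations, which a plain arithmetic mean of ratios would not.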

The two scoring verticals form a calibration pair: the per-prompt score predicts, the sub-session score verifies. The target is a Pearson correlation of r ≥ 0.65 between the two, computed over a rolling window.
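
A calibration check of this kind could look like the sketch below. It assumes that PES has already been aggregated to one value per sub-session so the two series line up pairwise. The window size of 50 and the names `pearson_r` and `calibration_ok` are hypothetical; only the 0.65 threshold comes from the text.

```python
import math
from typing import Sequence

TARGET_R = 0.65  # target correlation stated above


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def calibration_ok(pes: Sequence[float], sse: Sequence[float], window: int = 50) -> bool:
    """Check the most recent `window` (PES, SSE) pairs against the target.

    Assumption: pes[i] and sse[i] describe the same sub-session.
    """
    return pearson_r(pes[-window:], sse[-window:]) >= TARGET_R
```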

Layer 2 — The Prism Score

Three developer-facing scores, derived from Layer 1 by pure arithmetic — no LLM judgment at this layer.

Score	Formula (summary)	Unit
Speed	`Σ(active_cli_seconds) / 3600`, discounted when Quality Retention < 85%	Hours/week
Skill	`100 × (0.45·SSE + 0.20·PES + 0.15·IE + 0.10·CRR + 0.10·FC)`	0–100
Efficiency	`Σ(tokens) / Σ(active_cli_hours)` — lower is better	Tokens/hour

See PRISM Scores for the full definitions and tier tables.
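
As a rough illustration of the arithmetic in the table, the sketch below computes the three scores from already-aggregated inputs. It makes two assumptions the summary does not spell out: every Skill component is normalized to the range 0–1 with higher meaning better (so SSE and PES would be rescaled from their 0–10 range first), and the Quality Retention discount on Speed is omitted because its form is not given here. Function names are hypothetical.

```python
from typing import Sequence


def speed_hours(active_cli_seconds: Sequence[float]) -> float:
    """Raw Speed in hours, before any Quality Retention discount."""
    return sum(active_cli_seconds) / 3600


def skill_score(sse: float, pes: float, ie: float, crr: float, fc: float) -> float:
    """Skill on a 0-100 scale using the weights from the table.

    Assumption: every input is already normalized to [0, 1], higher is better.
    """
    return 100 * (0.45 * sse + 0.20 * pes + 0.15 * ie + 0.10 * crr + 0.10 * fc)


def efficiency_tokens_per_hour(
    tokens: Sequence[float], active_cli_seconds: Sequence[float]
) -> float:
    """Tokens per active CLI hour; lower is better."""
    hours = speed_hours(active_cli_seconds)
    if hours == 0:
        raise ValueError("no active CLI time recorded")
    return sum(tokens) / hours
```

For example, `skill_score(0.8, 0.6, 0.7, 0.9, 0.5)` gives roughly 72.5, since the five weights sum to 1.0.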

Insight — intelligence pipeline

Insight runs alongside the layers, not above them. It’s a multi-agent pipeline that turns raw prompts into the per-prompt score that feeds Layer 1:

Language detection + intent classification — classifies each prompt into one of seven intents (new_code, fix, refactor, question, meta, continuation, system_callback), with confidence.
Rubric routing — intents that describe actionable work are routed to an intent-specific rubric agent (authoring, debugging, planning). Intents like question, meta, continuation, and system_callback short-circuit and don’t get rubric-scored.
Rubric scoring — the rubric agent scores the four PES dimensions on a 0–10 scale.
Aggregation + confidence — the four dimensions combine into the PES score; a confidence value is stored alongside for audit.
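
The routing and aggregation steps can be sketched as below. The short-circuit set comes from the list above. The intent-to-agent mapping and the unweighted mean used for aggregation are purely illustrative assumptions, since the real mapping and combination rule are not described here.

```python
from typing import Optional

# Intents that skip rubric scoring, as listed above.
SHORT_CIRCUIT_INTENTS = {"question", "meta", "continuation", "system_callback"}

# Hypothetical mapping: which agent handles which actionable intent is assumed.
RUBRIC_AGENT_FOR_INTENT = {
    "new_code": "authoring",
    "fix": "debugging",
    "refactor": "planning",
}

PES_DIMENSIONS = (
    "context_leverage",
    "information_density",
    "turn_economy",
    "ambiguity_cost",
)


def route(intent: str) -> Optional[str]:
    """Return the rubric agent for an intent, or None if it short-circuits.

    Raises KeyError for an intent outside the seven known ones.
    """
    if intent in SHORT_CIRCUIT_INTENTS:
        return None
    return RUBRIC_AGENT_FOR_INTENT[intent]


def aggregate_pes(dimension_scores: dict[str, float]) -> float:
    """Combine the four 0-10 dimension scores into one 0-10 value.

    Assumption: an unweighted mean stands in for the real combination rule.
    """
    return sum(dimension_scores[d] for d in PES_DIMENSIONS) / len(PES_DIMENSIONS)
```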

Everything Insight produces is auditable end-to-end: each agent’s output is logged so a human can see why a prompt scored the way it did.

How it flows end-to-end

sequenceDiagram
    participant Dev as You
    participant Plugin as Prism Plugin
    participant Ingest as Ingest Service
    participant Engine as Prism Engine
    participant Dash as Dashboard

    Dev->>Plugin: Write a prompt
    Plugin-->>Dev: Single-line nudge (if needed)
    Plugin->>Ingest: Send OTLP + prompt text
    Ingest->>Engine: NATS publish
    Engine->>Engine: Parquet write → Insight → SSE → Skill
    Dev->>Dash: Open /prism
    Dash-->>Dev: Speed / Skill / Efficiency

Everything the dashboard shows is recomputable from S3 Parquet + Postgres — no scores live only in memory.

Authentication

Requests from the plugin authenticate with your gck_* API key; the dashboard uses a separate login:

The plugin includes your key on every request to Ingest
Ingest validates the key and associates data with your organization
Dashboard access uses Supabase login (email/password or OAuth)

Your key is stored locally in ~/.prism/config.json with restricted file permissions.
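
What "restricted file permissions" means in practice is sketched below, assuming a POSIX system, owner-only mode `0o600`, and a JSON field named `api_key`. All three are assumptions for illustration; the plugin's actual file schema and mode are not documented here.

```python
import json
import os
import stat
from pathlib import Path


def write_config(path: Path, api_key: str) -> None:
    """Write the key to a JSON file that only the owner can read or write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create with mode 0o600 so a new file is never readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"api_key": api_key}, f)  # "api_key" is a hypothetical field name
    # The mode passed to os.open is ignored for existing files, so tighten it.
    os.chmod(path, 0o600)


def is_private(path: Path) -> bool:
    """True if neither group nor others have any permission bits set."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & 0o077 == 0
```

Running `is_private(Path.home() / ".prism" / "config.json")` is a quick way to confirm that the file has not been loosened by a backup tool or a manual copy.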

Two distinct `gck_*` keys

There are two kinds of gck_* keys in the system, and they do different jobs:

Key	Who holds it	What it’s used for
Plugin key	You, locally in `~/.prism/config.json`	Authenticates your Claude Code traffic when it’s redirected through the Optra gateway — the one you set with `/prism:setup`
Platform key	The Prism Engine, as `OPTRA_GATEWAY_KEY` in its environment	Authenticates server-side LLM calls for rubric scoring, session summaries, and the dashboard advisor

The two are not interchangeable. The plugin key is scoped to your developer identity and gateway governance; the platform key is a service credential the engine uses to call the Optra gateway on your behalf. Mixing them up is the usual cause of “gateway 401” errors during engine boot.