Analysis System Features Guide

Last Updated: 2026-03-04
Implemented by: Analysis System Master Plan (6 workstreams)

This document describes the analysis system features that were built, where they appear in the UI, and how to use them. For the strategic vision and architecture, see docs/reference/architecture/analysis-system.md.

Feature 1: Evidence Readiness Assessment (Two-Phase Analysis)

What it does: Before running an AI analysis, the system performs a fast evidence check that shows you what evidence is available. It completes in under 100ms with no AI cost.

Where it appears: In the Generate Analysis dialog on the Analysis page.

How to use it:

Navigate to your investigation's Analysis page
Click Generate Analysis
Select your analysis type (Question, Topic, Summary, Gap, or Error Check)
If applicable, select the specific question or topic
Click Generate Analysis — the evidence summary appears automatically
You'll see:
- Evidence Items: Total count of evidence in the investigation
- Linked to Question: How many evidence items are linked to the selected question (for question analyses)
- Background Docs: Count of background documents, with indicator of whether AI inclusion is enabled
- Source Type Breakdown: Distribution across content, attachments, and background docs
- Informational note (blue): Shown when no evidence is directly linked to the question — evidence will still be searched via semantic retrieval
- Blocking message (red): Shown only when zero evidence exists in the investigation
Click "Generate Analysis" to proceed

Why it matters: Gives investigators a quick snapshot of what evidence the AI will work with, without false precision from similarity scoring. The previous similarity-based readiness scoring was removed because abstract investigation questions ("Was there fraud?") consistently scored low against concrete evidence (witness statements, financial records), producing misleading "Insufficient" warnings that investigators learned to ignore.

API endpoint: POST /api/analysis/assess
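
The blue and red notices described above follow a simple rule that can be sketched as a pure function. This is only an illustration: the `EvidenceSummary` shape and its field names are assumptions, not the documented response of the assess endpoint, and it assumes background documents do not count toward the "zero evidence" check.

```typescript
// Hypothetical summary shape; the real response fields of the assess endpoint may differ.
interface EvidenceSummary {
  evidenceItems: number;           // total evidence items in the investigation
  linkedToQuestion: number | null; // null when the analysis type has no question
  backgroundDocs: number;          // assumed not to count toward the blocking check
}

type Notice = "blocking" | "info" | "none";

// Decide which notice the dialog shows, following the rules described above.
function noticeFor(summary: EvidenceSummary): Notice {
  // Red blocking message: only when the investigation has no evidence at all.
  if (summary.evidenceItems === 0) return "blocking";
  // Blue informational note: evidence exists, but none is linked to the question.
  if (summary.linkedToQuestion === 0) return "info";
  return "none";
}

console.log(noticeFor({ evidenceItems: 12, linkedToQuestion: 0, backgroundDocs: 3 })); // "info"
```

Because the informational case does not block generation, the dialog can still enable the final "Generate Analysis" button whenever the result is not "blocking".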

Feature 2: Async Quality Checks (Faithfulness & Coverage)

What it does: After an analysis is generated, the system automatically runs two quality checks in the background:

Faithfulness Check: Verifies that claims in the analysis are supported by the evidence
Coverage Check: Verifies that the analysis addresses all aspects of the question/topic

These run asynchronously and don't block the initial analysis result.

Where it appears: The analysis detail view in the Analysis page — look for the quality metrics badge and expandable panel.

How it works:

Generate an analysis (any type) — the initial request returns immediately and generation runs in the background
The UI polls for completion (every 3 seconds, up to 5 minutes)
Once the analysis is generated, quality checks start running automatically (status transitions: complete → checking → verified)
Within ~30-60 seconds, quality scores appear:
- Quality Confidence Badge in the analysis list: "established", "probable", "possible", "insufficient"
- Quality Metrics Panel (expandable) showing:
  - Faithfulness score (0-100%)
  - Coverage score (0-100%)
  - Evidence coverage stats (items considered vs. total)
  - Retrieval quality stats
  - Validation status
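
The polling behavior described above (every 3 seconds, up to 5 minutes) could look roughly like the helper below. It is a minimal sketch, not the UI's actual code: `pollUntil`, its options, and the injected `sleep` are hypothetical names, and the caller supplies whatever function fetches the current status.

```typescript
// Values taken from the description above.
const POLL_INTERVAL_MS = 3_000;
const POLL_TIMEOUT_MS = 5 * 60 * 1_000;

interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait in real time
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fetchOnce repeatedly until isDone returns true or the time budget is used up.
// Resolves to the final value, or null on timeout.
async function pollUntil<T>(
  fetchOnce: () => Promise<T>,
  isDone: (value: T) => boolean,
  options: PollOptions,
): Promise<T | null> {
  const sleep = options.sleep ?? defaultSleep;
  const maxAttempts = Math.floor(options.timeoutMs / options.intervalMs);
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const value = await fetchOnce();
    if (isDone(value)) return value;
    await sleep(options.intervalMs);
  }
  return null;
}
```

With the documented values this allows at most 100 status requests per analysis. A caller might pass a function that requests the quality-status endpoint and stop once the status reads "verified"; the response shape for that endpoint is not specified here, so that part is left to the caller.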

Quality confidence scoring: The confidence level is computed primarily from faithfulness and coverage scores. Retrieval similarity acts as a softening cap (can lower the level by at most one step) but cannot drag the score to "insufficient" on its own. This means an analysis with strong faithfulness and coverage will show at least "possible" even with low retrieval similarity.
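
To make the "softening cap" concrete, here is a sketch of one way such a rule could be written. The numeric thresholds and the use of the weaker of the two scores are assumptions made for illustration only; the actual algorithm lives in lib/ai/quality/confidence-calculator.ts and may differ.

```typescript
type Level = "insufficient" | "possible" | "probable" | "established";

// Hypothetical thresholds on the weaker of the two scores (0 to 1).
function baseLevel(faithfulness: number, coverage: number): Level {
  const weakest = Math.min(faithfulness, coverage);
  if (weakest >= 0.85) return "established";
  if (weakest >= 0.7) return "probable";
  if (weakest >= 0.5) return "possible";
  return "insufficient";
}

// One step down, but never from "possible" to "insufficient".
const SOFTENED: Record<Level, Level> = {
  established: "probable",
  probable: "possible",
  possible: "possible",
  insufficient: "insufficient",
};

function qualityConfidence(
  faithfulness: number,
  coverage: number,
  retrievalSimilarity: number,
): Level {
  const base = baseLevel(faithfulness, coverage);
  const lowRetrieval = retrievalSimilarity < 0.6; // hypothetical cut-off
  return lowRetrieval ? SOFTENED[base] : base;
}

console.log(qualityConfidence(0.9, 0.9, 0.3)); // "probable"
```

The lookup table encodes both constraints from the paragraph above: low retrieval similarity lowers the level by at most one step, and it cannot produce "insufficient" unless faithfulness and coverage already did.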

If quality checks fail: The analysis stays at "complete" status; a failed check never downgrades or invalidates the analysis itself. You can manually trigger a re-check via the quality-check endpoint.

API endpoints:

POST /api/analysis/{analysis_id}/quality-check — manually trigger quality checks
GET /api/analysis/{analysis_id}/quality-status — poll for quality check progress

Feature 3: Evidence Retrieval Transparency

What it does: Shows you exactly which evidence chunks the AI considered when generating an analysis, including similarity scores and whether each chunk was included or excluded.

Where it appears: In the analysis detail view, as an expandable "Evidence Considered" panel between "Evidence Cited" and "Analysis Feedback."

How to use it:

Generate an analysis, then click on it to expand
Look for the "Evidence Considered" section (loads on demand when you expand it)
You'll see each evidence chunk with:
- Rank: Order by relevance
- Source Title: Which evidence item it came from (resolved from parent evidence item title)
- Type Badge: "content", "attachment", or "background_doc"
- Included/Excluded Badge: Whether the chunk was used in the analysis context
- Similarity Bar: Color-coded relevance score:
  - Green (85%+): Highly relevant
  - Yellow (70-84%): Relevant
  - Orange (60-69%): Somewhat relevant
  - Gray (<60%): Low relevance
- Exclusion Reason: If excluded, explains why (e.g., "Below similarity threshold")
Summary stats show total chunks retrieved, included, and excluded
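
The color bands for the similarity bar map directly onto a small lookup function. The sketch below assumes the score arrives as a fraction between 0 and 1 and is rounded to a whole percentage before bucketing; both are assumptions, and `similarityBand` is a hypothetical name.

```typescript
interface SimilarityBand {
  color: "green" | "yellow" | "orange" | "gray";
  label: string;
}

// Buckets follow the ranges listed above: 85%+, 70-84%, 60-69%, below 60%.
function similarityBand(score: number): SimilarityBand {
  const percent = Math.round(score * 100);
  if (percent >= 85) return { color: "green", label: "Highly relevant" };
  if (percent >= 70) return { color: "yellow", label: "Relevant" };
  if (percent >= 60) return { color: "orange", label: "Somewhat relevant" };
  return { color: "gray", label: "Low relevance" };
}

console.log(similarityBand(0.65).color); // "orange"
```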

Why it matters: Answers the key trust question: "What evidence did the AI actually look at?" If you see that important evidence was excluded, you may want to re-run with different parameters.

API endpoint: GET /api/analysis/{analysis_id}/retrieval

Feature 4: Prompt Evaluation Framework (Developer Tool)

What it does: A CLI tool for systematically testing prompt quality against fixtures. Detects regressions when prompts are changed.

Where it appears: Command line only — not in the UI.

How to use it:

# Run all fixtures in offline mode (no LLM judge, fast)
npx tsx scripts/evaluate-prompts.ts --offline

# Run with LLM-as-judge scoring (requires AI credits)
npx tsx scripts/evaluate-prompts.ts

# Filter by prompt type
npx tsx scripts/evaluate-prompts.ts --offline --prompt-type analysis_question

# Filter by category
npx tsx scripts/evaluate-prompts.ts --offline --category adversarial

# Compare against a baseline
npx tsx scripts/evaluate-prompts.ts --offline --compare __tests__/evaluation-results/baseline.json

# Save results with version label
npx tsx scripts/evaluate-prompts.ts --offline --version v2.1 --output baseline-v2.1.json

What it checks:

Structural metrics (no LLM needed): JSON parseability, schema validation, citation counts, confidence levels, evidence assessment counts
Content metrics (LLM judge): Relevance, completeness, accuracy, professional tone
Expectation checks: Per-fixture pass/fail based on expected outcomes
Regression detection: Compares two runs and flags any metric that dropped >5%
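
As an illustration of the regression check, the sketch below compares two metric sets and flags drops above 5%. It assumes the threshold is a relative drop against the baseline value (the text above does not say whether it is relative or in absolute percentage points), and `findRegressions` and the `MetricSet` shape are hypothetical, not the framework's own types.

```typescript
type MetricSet = Record<string, number>;

interface Regression {
  metric: string;
  baseline: number;
  current: number;
  relativeDrop: number; // e.g. 0.1 means the metric fell by 10% of its baseline
}

// Flags every metric present in both runs whose value dropped by more than the threshold.
function findRegressions(
  baseline: MetricSet,
  current: MetricSet,
  threshold: number = 0.05,
): Regression[] {
  const regressions: Regression[] = [];
  for (const [metric, before] of Object.entries(baseline)) {
    const after = current[metric];
    // Skip metrics missing from the new run and baselines that cannot be divided by.
    if (after === undefined || before <= 0) continue;
    const relativeDrop = (before - after) / before;
    if (relativeDrop > threshold) {
      regressions.push({ metric, baseline: before, current: after, relativeDrop });
    }
  }
  return regressions;
}
```

Improvements and small fluctuations pass silently; only drops beyond the threshold are reported.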

Test fixtures location: __tests__/fixtures/prompts/{basic,edge,adversarial}/

Feature 5: Engagement Gates + Conclusion-Based Feedback

What it does: Ensures investigators engage with the analysis before recording their judgment. Uses a gated flow that requires reviewing evidence before providing feedback.

Where it appears: In the analysis detail view, within the conclusion section (purple box).

Engagement gates (must be completed in order):

Gate A — Detail Expansion: Judgment controls remain disabled until the investigator expands the analysis detail section at least once. A detail_reviewed_at timestamp is persisted.
Gate B — Citation Spot-Check: After Gate A, judgment remains disabled until the investigator opens at least one citation in the Evidence Side Panel. A citation_checked_at timestamp is persisted.

Gate progress survives page refresh and session changes.
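
The gate ordering can be modeled as a small state object keyed on the two persisted timestamps. This is a sketch under assumptions: only `detail_reviewed_at` and `citation_checked_at` come from the description above; the helper names are hypothetical, and it assumes a citation opened before Gate A does not count toward Gate B.

```typescript
interface GateState {
  detail_reviewed_at: string | null;  // set when the detail section is first expanded
  citation_checked_at: string | null; // set when the first citation is opened
}

// Gate A: keep the first timestamp if the section was already expanded before.
function recordDetailExpanded(state: GateState, nowIso: string): GateState {
  return { ...state, detail_reviewed_at: state.detail_reviewed_at ?? nowIso };
}

// Gate B: only counts once Gate A has been passed (assumption).
function recordCitationOpened(state: GateState, nowIso: string): GateState {
  if (state.detail_reviewed_at === null) return state;
  return { ...state, citation_checked_at: state.citation_checked_at ?? nowIso };
}

// Judgment controls unlock only when both gates have a timestamp.
function judgmentEnabled(state: GateState): boolean {
  return state.detail_reviewed_at !== null && state.citation_checked_at !== null;
}
```

Deriving the enabled state from persisted timestamps, rather than from in-memory UI flags, is what lets progress survive a page refresh.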

How to use it:

Open an analysis in the detail view
Expand the detail section (Gate A unlocks)
Click on at least one cited evidence item to view it (Gate B unlocks)
The conclusion section now shows three options:
- Agree — marks this analysis as trustworthy (maps to accepted)
- Disagree — prompts for a reason (maps to rejected)
- Unsure — prompts "What would help you decide?" and pre-populates regeneration feedback (maps to needs_revision)
To regenerate: Select a reason from the dropdown (Inaccurate conclusions, Missing evidence, Too vague, Wrong focus, Other), add direction text, click "Regenerate"
Each action is tracked with a timestamp and shown as a badge in the analysis list
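
The mapping from the three conclusion options to stored statuses can be expressed as a lookup. The status values (accepted, rejected, needs_revision) come from the list above; the payload shape and the `buildFeedback` helper are hypothetical and do not describe the real request body of the feedback endpoint.

```typescript
type Conclusion = "agree" | "disagree" | "unsure";
type FeedbackStatus = "accepted" | "rejected" | "needs_revision";

const STATUS_FOR: Record<Conclusion, FeedbackStatus> = {
  agree: "accepted",
  disagree: "rejected",
  unsure: "needs_revision",
};

// Hypothetical payload: the real feedback API may use different field names.
interface FeedbackPayload {
  status: FeedbackStatus;
  note?: string; // reason for disagreeing, or "what would help you decide?"
}

function buildFeedback(conclusion: Conclusion, note?: string): FeedbackPayload {
  const trimmed = note?.trim();
  const status = STATUS_FOR[conclusion];
  // Omit the note entirely when it is empty or whitespace.
  return trimmed ? { status, note: trimmed } : { status };
}

console.log(buildFeedback("unsure", "  Need the bank records  "));
```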

Feedback status badges appear in the analysis list:

Green "Agreed"
Red "Disagreed"
Blue "Regenerated"
Amber "Edited"
Purple "Needs Revision"
Gray "Viewed"

Metrics dashboard: GET /api/analysis/feedback-metrics?investigation_id={id} returns:

Acceptance/rejection/regeneration/edit rates
Breakdown by analysis type
Top regeneration reasons
Weekly trend (8 weeks)

API endpoints:

POST /api/analysis/{analysis_id}/feedback — record an action
GET /api/analysis/{analysis_id}/feedback — get feedback history

Feature 6: Prompt Editor Version History, Diff & Rollback

What it does: The admin prompt editor now includes full version management — view any historical version of a prompt template, compare it side-by-side with the current version, and roll back to a previous version.

Where it appears: Admin → Prompt Templates → Version History panel for any prompt.

How to use it:

Navigate to Admin → Prompt Templates
Select a prompt template
The Version History list shows up to 20 versions
View: Click any version number to see its complete system prompt and user prompt template
Diff: Click "Compare with Current" to see a side-by-side diff with red/green line-level highlighting
Rollback: Click "Revert to vN" and confirm. This creates a new version with the historical content (non-destructive — the rollback itself is tracked in history)
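
The non-destructive rollback described above amounts to appending a copy of the historical content as a new version. The sketch below shows that idea on an in-memory array; the `PromptVersion` fields and the `rolledBackFrom` marker are assumptions, not the actual schema.

```typescript
// Hypothetical version record; the real table columns may differ.
interface PromptVersion {
  version: number;
  systemPrompt: string;
  userPromptTemplate: string;
  rolledBackFrom?: number; // set when this version was created by a rollback
}

// Returns a new history with one extra version; existing entries are never modified.
function rollback(history: PromptVersion[], targetVersion: number): PromptVersion[] {
  const source = history.find((v) => v.version === targetVersion);
  if (source === undefined) {
    throw new Error(`Version ${targetVersion} not found`);
  }
  const nextVersion = Math.max(...history.map((v) => v.version)) + 1;
  return [
    ...history,
    {
      version: nextVersion,
      systemPrompt: source.systemPrompt,
      userPromptTemplate: source.userPromptTemplate,
      rolledBackFrom: targetVersion,
    },
  ];
}
```

Reverting to v1 from a history of v1 to v3 therefore yields v4 with v1's content, and v2 and v3 remain available for later comparison.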

API endpoints:

GET /api/admin/prompts/history?prompt_type=...&version=N — retrieve a historical version
POST /api/admin/prompts/history — execute rollback (creates new version with historical content)

Feature 7: AI Provider Routing (Admin Toggle)

What it does: Allows switching between AWS Bedrock and Anthropic Direct API for AI operations via an admin toggle. No redeployment required — the setting is database-backed with a 30-second cache TTL.

Where it appears: Admin → Settings.

How to use it:

Navigate to Admin → Settings
The current AI provider is shown as a card ("AWS Bedrock" or "Anthropic Direct")
Click to toggle between providers
The switch takes effect within 30 seconds for all new AI operations
If the ANTHROPIC_API_KEY environment variable is not set, the toggle to Anthropic will show a validation error

Why it exists: Bedrock quota limits can bottleneck throughput (e.g., 2 RPM on Sonnet). The Anthropic direct API offers higher rate limits. This toggle allows switching without code changes or redeployment.

Technical details:

Separate rate limiters per provider (Bedrock: 1 concurrent request, Anthropic: 5 concurrent requests)
All changes are audit-logged
Setting stored in app_settings table as key/value JSONB
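
The 30-second cache TTL explains why a toggle takes up to 30 seconds to reach new AI operations. A minimal sketch of such a cache follows; `createProviderCache`, the `Provider` values, and the injected `load` and `now` functions are hypothetical stand-ins for the real settings layer.

```typescript
type Provider = "bedrock" | "anthropic";

// Wraps a loader (for example a database read) with a time-based cache.
function createProviderCache(
  load: () => Promise<Provider>,
  ttlMs: number = 30_000,
  now: () => number = Date.now, // injectable clock for testing
): () => Promise<Provider> {
  let cached: { value: Provider; fetchedAt: number } | null = null;

  return async function getProvider(): Promise<Provider> {
    // Serve the cached value while it is younger than the TTL.
    if (cached !== null && now() - cached.fetchedAt < ttlMs) {
      return cached.value;
    }
    const value = await load();
    cached = { value, fetchedAt: now() };
    return value;
  };
}
```

Until the cached entry expires, callers keep seeing the previous provider, which matches the "within 30 seconds" behavior described in the steps above.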

API endpoint: GET/PUT /api/admin/settings

Feature 8: Future Output Types (Design Only)

What was created: Design specifications and test fixtures for 4 future analysis types. These are NOT implemented yet — they're ready for when the team decides to build them.

Future types:

Type	Purpose	Use Case
Comparative Analysis	Compare evidence across subjects/time periods	Procurement investigations with multiple vendors
Timeline Analysis	Reconstruct chronological events	Cybersecurity incidents, fraud timelines
Witness Credibility	Assess testimonial consistency	Interview-heavy investigations
Risk Assessment	Prioritize issues by impact/likelihood	Compliance audits, HIPAA reviews

Documentation: docs/reference/future-analysis-types.md
Implementation guide: docs/reference/adding-analysis-types.md
Test fixtures: __tests__/fixtures/prompts/future/f001-f004

Testing Checklist

Use the "Dr. Marcus Chen - Professional Conduct Review" or "The Disappearance of Mr. Davenheim" investigation to test these features.

Feature 1: Evidence Assessment

Open Analysis page → click "Generate Analysis"
Select "Question Analysis" → pick a question
Click "Generate Analysis" — evidence count summary should appear
Verify: evidence items count, linked to question count, background docs count
Verify: blue info note appears if no evidence is directly linked
Click "Generate Analysis" to proceed
Repeat for Summary type (no question/topic selection needed)

Feature 2: Quality Checks

Generate an analysis (any type) — should return immediately and poll for completion
Wait for the analysis to complete (~2-3 minutes); quality checks then start automatically
Look for the quality confidence badge in the analysis list (established/probable/possible/insufficient)
Click analysis → check for Quality Metrics Panel (expandable) with faithfulness, coverage, evidence coverage

Feature 3: Retrieval Transparency

Click on any completed analysis
Look for "Evidence Considered" expandable section
Verify: chunk list with source titles (not "Unknown Source"), similarity bars, included/excluded badges

Feature 5: Engagement Gates + Feedback

Click on a completed analysis
Verify judgment buttons are disabled
Expand the detail section (Gate A)
Click a cited evidence item to open it (Gate B)
Verify Agree/Disagree/Unsure buttons are now enabled in the conclusion section
Click "Agree" — verify green badge appears
On another analysis, click "Disagree" — verify prompt for reason, red badge
Try "Unsure" — verify it pre-populates regeneration feedback

Feature 6: Prompt Version History

Admin → Prompt Templates → select a prompt with multiple versions
Click a version number — verify full content view
Click "Compare with Current" — verify diff with red/green highlighting
Click "Revert to vN" — verify new version created with historical content

Feature 7: AI Provider Toggle

Admin → Settings → verify current provider card shown
Toggle provider → run analysis → verify it completes
Toggle back → run analysis → verify still works

Key Files Reference

File	Purpose
`app/api/analysis/assess/route.ts`	Evidence count assessment endpoint
`app/inquiries/.../evidence-assessment.tsx`	Assessment UI component
`lib/ai/quality/run-quality-checks.ts`	Async quality check orchestrator
`lib/ai/quality/confidence-calculator.ts`	Quality confidence scoring algorithm
`app/api/analysis/{id}/quality-check/route.ts`	Manual quality check trigger
`app/api/analysis/{id}/quality-status/route.ts`	Quality check polling
`app/api/analysis/{id}/retrieval/route.ts`	Retrieval transparency data
`app/inquiries/.../evidence-retrieval-panel.tsx`	Retrieval UI component
`lib/ai/evaluation/`	Prompt evaluation framework
`scripts/evaluate-prompts.ts`	Evaluation CLI runner
`app/api/analysis/{id}/feedback/route.ts`	User feedback tracking
`hooks/use-analysis-feedback.ts`	React feedback hook
`app/api/admin/prompts/history/route.ts`	Prompt version history & rollback
`app/admin/prompts/prompt-editor.tsx`	Prompt editor with diff/rollback UI
`lib/ai/client.ts`	Dual AI provider routing
`lib/settings/index.ts`	App settings DAL (provider toggle)
`app/api/admin/settings/route.ts`	Admin settings API
`app/admin/settings/page.tsx`	AI provider toggle UI
`components/file-viewer.tsx`	In-app file viewer component
`app/api/storage/view/route.ts`	File viewer S3 proxy endpoint
`app/api/background/extract-text/route.ts`	Background doc text extraction
