Security Remediation Plan

Created: 2026-01-31
Status: In Progress
Target Completion: Before Production Launch

This document tracks security and reliability issues identified in the 2026-01-31 senior engineer audit. Claude Code must read this file at the start of every session and continue work on incomplete items.

Session Continuity Protocol

At the start of each session:

Read this file
Check the "Current Sprint" section for in-progress work
Continue where the previous session left off
Update checkboxes and dates as work completes

Phase 1: Blocking Issues (Week 1)

These issues must be fixed before any other feature work.

1.1 Storage Route Authorization

Risk: HIGH - Authenticated users can access files from other organizations
Files:

app/api/storage/download/route.ts
app/api/storage/download-url/route.ts
app/api/storage/delete/route.ts
app/api/storage/upload/route.ts

Required Changes:

Extract organization_id from file path
Verify user has org membership via hasOrgRole()
Return 403 if not authorized
Add audit logging for access attempts
Add tests for authorization checks
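The steps above can be sketched as a single authorization function. This is a minimal sketch, not the code in app/api/storage/*. It assumes a hypothetical key layout of `orgs/<organization_id>/...`, uses a synchronous `isMember` callback as a stand-in for hasOrgRole() (whose real signature is not shown in this plan), and returns a status code rather than a framework response.

```typescript
// Hypothetical key layout: "orgs/<organization_id>/<rest>". The real layout
// used by app/api/storage/* may differ; adjust the pattern to match it.
const ORG_KEY = /^orgs\/([A-Za-z0-9-]+)\/.+$/;

export function extractOrgId(filePath: string): string | null {
  // Defensive: reject "." and ".." segments so a key cannot name one org in
  // its prefix while pointing somewhere else after normalization.
  if (filePath.split("/").some((seg) => seg === "." || seg === "..")) return null;
  const match = ORG_KEY.exec(filePath);
  return match ? match[1] : null;
}

// Stand-in for hasOrgRole(); synchronous here for brevity.
export type MembershipCheck = (userId: string, orgId: string) => boolean;

export interface AccessAuditEntry {
  userId: string;
  filePath: string;
  orgId: string | null;
  allowed: boolean;
}

export function authorizeStorageAccess(
  userId: string,
  filePath: string,
  isMember: MembershipCheck,
  audit: (entry: AccessAuditEntry) => void,
): { status: 200 | 403; orgId: string | null } {
  const orgId = extractOrgId(filePath);
  // An unparseable path is treated like a foreign org: deny, never guess.
  const allowed = orgId !== null && isMember(userId, orgId);
  // Log every attempt, allowed or not.
  audit({ userId, filePath, orgId, allowed });
  return { status: allowed ? 200 : 403, orgId };
}
```

Denying on an unparseable path, rather than falling back to a default org, keeps the failure mode closed. Returning the same 403 in both cases also avoids revealing whether a file exists under another organization.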

Verification:

```bash
# Run storage authorization tests
npm test -- storage-authorization.test
# Expected: 22 passed, 9 skipped (FormData tests skipped due to test env)
```

Completed: [x] Date: 2026-01-31

1.2 Audit Logging Resilience

Risk: HIGH - FedRAMP non-compliance if audit writes fail silently
File: lib/shared/audit.ts

Current Behavior: Returns false on failure; callers ignore the return value
Required Behavior: Failures must be visible and recoverable

Required Changes:

Option A: Throw on audit failure (breaks user operation)
Option B: Queue failed audits to dead letter table for retry
Option C: Alert on failure but don't block operation
Decision: Option C with enhancements (tracking, metrics, recovery logging)
Implement chosen approach
Add in-memory failure tracking with getAuditHealth() function
Add CloudWatch alarm for audit failures (post-launch)
Add strict mode option for critical operations
Add admin endpoint /api/admin/audit-health for monitoring

Implementation Details:

Failures tracked in memory with count and recent failure history
Console logs prefixed with [AUDIT FAILURE] include full entry JSON for recovery
getAuditHealth() returns failure count and recent failures
Admin endpoint at /api/admin/audit-health for monitoring
Optional strict mode: logAudit(entry, { strict: true }) throws on failure
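A minimal sketch of the tracking described above, assuming a synchronous, injectable writer (the real write in lib/shared/audit.ts is presumably an async database insert). `setAuditWriter`, `AuditFailure`, and the 50-entry cap are hypothetical; `logAudit`, `getAuditHealth`, the `[AUDIT FAILURE]` prefix, and `{ strict: true }` follow the plan.

```typescript
export interface AuditEntry {
  action: string;
  userId: string;
  [key: string]: unknown;
}

export interface AuditFailure {
  at: string; // ISO-8601 timestamp
  error: string;
  entry: AuditEntry;
}

export type AuditWriter = (entry: AuditEntry) => void; // throws on failure

const MAX_RECENT_FAILURES = 50; // hypothetical cap so memory stays bounded
let writeEntry: AuditWriter = () => {
  throw new Error("no audit writer configured");
};
let failureCount = 0;
const recentFailures: AuditFailure[] = [];

// Hypothetical hook for wiring the real store (or a test double).
export function setAuditWriter(writer: AuditWriter): void {
  writeEntry = writer;
}

export function logAudit(entry: AuditEntry, options: { strict?: boolean } = {}): boolean {
  try {
    writeEntry(entry);
    return true;
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    failureCount += 1;
    recentFailures.push({ at: new Date().toISOString(), error, entry });
    if (recentFailures.length > MAX_RECENT_FAILURES) recentFailures.shift();
    // Full entry JSON on one line so the record can be recovered from logs.
    console.error(`[AUDIT FAILURE] ${error} ${JSON.stringify(entry)}`);
    if (options.strict) throw new Error(`Audit write failed: ${error}`);
    return false;
  }
}

// Backing data for /api/admin/audit-health; returns a snapshot of the list.
export function getAuditHealth(): { failureCount: number; recentFailures: AuditFailure[] } {
  return { failureCount, recentFailures: [...recentFailures] };
}
```

Because the counters live in process memory, each instance reports only its own failures and the count resets on restart, so a log-based alarm on the `[AUDIT FAILURE]` prefix is still needed for a fleet-wide view.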

Verification:

Audit failures are tracked and logged with full details
Check CloudWatch alarm fires (requires CloudWatch setup)

Completed: [x] Date: 2026-01-31

1.3 Redis Rate Limiting

Risk: HIGH - Current in-memory rate limiting resets on cold start
File: lib/shared/rate-limit.ts

Status: CODE COMPLETE - AWAITING INFRASTRUCTURE

Infrastructure Required:

AWS ElastiCache Redis cluster in same VPC as Amplify
Security group allowing Amplify → ElastiCache access
VPC configuration for Amplify to access private subnets
Terraform updates for ElastiCache provisioning

Code Changes:

Install ioredis package
Create lib/shared/redis.ts client factory
Update checkRateLimit() to use Redis INCR + EXPIRE
Add checkRateLimitAsync() for Redis support
Add graceful fallback to in-memory if Redis unavailable
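The INCR + EXPIRE fixed-window pattern with an in-memory fallback might look like the sketch below. It illustrates the technique and is not the code in lib/shared/rate-limit.ts: the `CounterStore` interface, `MemoryCounterStore`, and the parameter list of `checkRateLimitAsync` are assumptions. The interface is shaped so that an ioredis client's promise-returning `incr` and `expire` would satisfy it.

```typescript
// Subset of Redis commands the limiter needs.
export interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

export interface RateLimitResult {
  allowed: boolean;
  count: number;
  backend: "redis" | "memory";
}

// In-process store with Redis-like semantics: a key has no TTL until expire()
// is called, and an expired key restarts at 1. State is per instance, which is
// exactly why this alone resets on a cold start.
export class MemoryCounterStore implements CounterStore {
  private counts = new Map<string, { count: number; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async incr(key: string): Promise<number> {
    const entry = this.counts.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      this.counts.set(key, { count: 1, expiresAt: Number.POSITIVE_INFINITY });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }

  async expire(key: string, seconds: number): Promise<number> {
    const entry = this.counts.get(key);
    if (!entry) return 0;
    entry.expiresAt = this.now() + seconds * 1000;
    return 1;
  }
}

// Fixed window: INCR the key, and set EXPIRE only on the first hit so later
// requests do not extend the window.
export async function checkRateLimitAsync(
  redis: CounterStore | null,
  memory: CounterStore,
  key: string,
  limit: number,
  windowSeconds: number,
): Promise<RateLimitResult> {
  const hit = async (store: CounterStore): Promise<number> => {
    const count = await store.incr(key);
    if (count === 1) await store.expire(key, windowSeconds);
    return count;
  };

  if (redis) {
    try {
      const count = await hit(redis);
      return { allowed: count <= limit, count, backend: "redis" };
    } catch (err) {
      // Degrade rather than fail the request; the limit becomes per-instance.
      console.warn("[RATE LIMIT] Redis unavailable, using in-memory fallback:", err);
    }
  }
  const count = await hit(memory);
  return { allowed: count <= limit, count, backend: "memory" };
}
```

Two caveats are worth checking against the real client. INCR and EXPIRE are separate round trips, so a crash between them can leave a key with no TTL; sending both in a MULTI/EXEC transaction (ioredis `multi()`) closes that gap. And the fallback only helps if a dead Redis fails fast: ioredis queues commands while reconnecting by default, so options such as `enableOfflineQueue` and `maxRetriesPerRequest` likely need tuning.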

Terraform Tasks:

Add ElastiCache module (infrastructure/terraform/modules/elasticache/)
Add security group for ElastiCache
Add ElastiCache to dev environment

To Deploy:

```bash
cd infrastructure/terraform/environments/dev
terraform init
terraform plan -var="redis_auth_token=YOUR_SECURE_TOKEN"
terraform apply -var="redis_auth_token=YOUR_SECURE_TOKEN"
```

Environment Variables (after Terraform apply):

```
REDIS_URL=rediss://<endpoint>:6379
REDIS_AUTH_TOKEN=<your_auth_token>
```
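A hypothetical helper for reading the two variables above. It returns `null` when `REDIS_URL` is unset, so the in-memory fallback takes over, and it rejects a plain `redis://` URL because the `rediss://` scheme is what signals TLS. The URL pattern is an assumption: it accepts only a host and an optional port, with the token supplied separately.

```typescript
export interface RedisConfig {
  host: string;
  port: number;
  password: string | undefined;
  tls: true;
}

// Assumes no credentials or path in the URL; the token comes from REDIS_AUTH_TOKEN.
const REDIS_URL_PATTERN = /^(rediss?):\/\/([^:/?#@]+)(?::(\d+))?\/?$/;

export function loadRedisConfig(env: Record<string, string | undefined>): RedisConfig | null {
  const raw = env.REDIS_URL?.trim();
  if (!raw) return null; // not configured: in-memory rate limiting only

  const match = REDIS_URL_PATTERN.exec(raw);
  if (!match) throw new Error("REDIS_URL must look like rediss://<host>:<port>");
  const [, scheme, host, port] = match;

  // "rediss" (double s) means TLS. ElastiCache only accepts an AUTH token with
  // in-transit encryption enabled, and plain redis:// would send it in cleartext.
  if (scheme !== "rediss") throw new Error("REDIS_URL must use the rediss:// (TLS) scheme");

  return {
    host,
    port: port ? Number(port) : 6379,
    password: env.REDIS_AUTH_TOKEN,
    tls: true,
  };
}
```

With ioredis, the result would map onto the constructor options `host`, `port`, `password`, and `tls` (an empty `tls: {}` object enables TLS with default settings).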

Verification:

Manual check: hit the rate limit, restart the server, and verify the limit is still enforced.

Completed: [x] Date: 2026-01-31 (code + Terraform, pending infrastructure deploy)

1.4 Fix Failing Tests

Risk: MEDIUM - Indicates test maintenance is not keeping pace with code changes
File: __tests__/integration/api/billing.test.ts

Issues:

createCheckoutSession mock expects 4 params, implementation passes 5
teamMembers undefined access in free plan test

Required Changes:

Update mock to include plan parameter
Fix undefined access in free plan test
Run full test suite, verify all passing
Mock server-only package in vitest.setup.ts

Verification:

```bash
npm test
# Result: 217 passed, 9 skipped, 166 todo
```

Completed: [x] Date: 2026-01-31

Phase 2: Technical Enforcement (Week 2)

Automated safeguards to prevent future issues.

2.1 API Route Wrapper

Purpose: Make secure patterns the default
File to create: lib/api/route-wrapper.ts

Features:

Automatic authentication check
Organization context extraction
Consistent error handling
Automatic audit logging for failures
Request timing/logging

Implementation:

```typescript
// Pattern for all routes to use (signature only)
export function createProtectedRoute(
  handler: ProtectedRouteHandler,
  options?: { requiredRole?: OrgRole; rateLimit?: RateLimitKey },
): RouteHandler;
```
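A minimal sketch of what the wrapper body could look like behind that signature. It is illustrative only: the role ladder, `Session`, the request and response shapes, and the extra `deps` parameter (used so the sketch runs without Next.js) are all hypothetical, and the rate-limit option is omitted for brevity. Error bodies follow the standard format from section 2.4.

```typescript
// Hypothetical role ladder; the real OrgRole union lives in the codebase.
type OrgRole = "viewer" | "member" | "admin" | "owner";
const ROLE_RANK: Record<OrgRole, number> = { viewer: 0, member: 1, admin: 2, owner: 3 };

interface Session { userId: string; orgId: string; role: OrgRole }
interface ApiRequest { headers: Record<string, string | undefined> }
interface ApiResponse { status: number; body: unknown }

type ProtectedRouteHandler = (req: ApiRequest, ctx: Session) => Promise<ApiResponse>;
type RouteHandler = (req: ApiRequest) => Promise<ApiResponse>;

// Injected here so the sketch is standalone; the real wrapper presumably imports these.
interface WrapperDeps {
  getSession: (req: ApiRequest) => Promise<Session | null>;
  audit: (event: { userId?: string; status: number; reason: string }) => void;
}

const errorBody = (code: string, message: string) => ({ error: { code, message } });

export function createProtectedRoute(
  handler: ProtectedRouteHandler,
  deps: WrapperDeps,
  options: { requiredRole?: OrgRole } = {},
): RouteHandler {
  return async (req) => {
    const started = Date.now();
    let session: Session | null = null;
    try {
      session = await deps.getSession(req);
      if (!session) {
        deps.audit({ status: 401, reason: "unauthenticated" });
        return { status: 401, body: errorBody("UNAUTHORIZED", "Authentication required") };
      }
      if (options.requiredRole && ROLE_RANK[session.role] < ROLE_RANK[options.requiredRole]) {
        deps.audit({ userId: session.userId, status: 403, reason: "insufficient_role" });
        return { status: 403, body: errorBody("FORBIDDEN", "Insufficient role") };
      }
      return await handler(req, session);
    } catch (err) {
      // Details stay in server logs; the client gets a generic message.
      console.error("[ROUTE ERROR]", err);
      deps.audit({ userId: session?.userId, status: 500, reason: "unhandled_error" });
      return { status: 500, body: errorBody("INTERNAL_ERROR", "An unexpected error occurred") };
    } finally {
      console.info(`[ROUTE] completed in ${Date.now() - started}ms`);
    }
  };
}
```

`return await handler(...)` inside the `try` is deliberate: returning the promise without awaiting it would let a rejected handler escape the `catch`, skipping both the 500 mapping and the audit event.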

Migration:

Create wrapper
Migrate 5 highest-risk routes first (storage/*)
Migrate remaining routes incrementally
Add CI check that all routes use wrapper (Phase 2.3)

Completed: [x] Date: 2026-01-31 (wrapper + storage routes)

2.2 Semgrep Security Scanning

Purpose: Catch security issues automatically in CI
File: .github/workflows/ci.yml

Required Changes:

Add Semgrep step to CI workflow
Configure rules for:
- Missing auth checks
- Unvalidated input
- SQL injection patterns
- Hardcoded secrets
Run initial scan, fix any findings (requires SEMGREP_APP_TOKEN)
Block PRs on security findings (job fails on findings)

Configuration:

```yaml
- name: Security scan
  uses: returntocorp/semgrep-action@v1
  with:
    config: >-
      p/typescript
      p/security-audit
      p/owasp-top-ten
      p/secrets
```

Completed: [x] Date: 2026-01-31 (CI config added, needs token for full scan)

2.3 CI Enforcement Gates

Purpose: Automated quality gates
File: .github/workflows/ci.yml

Required Changes:

Test failure blocks merge (verified - test job fails CI)
Add coverage threshold (70% minimum - warns, enforcement pending)
Add route wrapper check (commented out, enable when more routes migrated)
Configure Amplify to only deploy on CI success (infrastructure task)
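The route wrapper check could be a small script along these lines. The sketch is hypothetical: it takes file contents as a map rather than reading the repository, treats any `app/api/**/route.ts` that exports an HTTP method as a route, and accepts an allowlist for routes that intentionally skip the wrapper (a public webhook, for instance). Matching the string `createProtectedRoute` is a heuristic, not proof that every handler in the file is wrapped.

```typescript
const HTTP_METHODS = ["GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS"];
// Matches "export const GET", "export function POST", "export async function PUT", ...
const EXPORT_PATTERN = new RegExp(
  `export\\s+(?:const|async\\s+function|function)\\s+(${HTTP_METHODS.join("|")})\\b`,
);
const ROUTE_FILE = /(^|\/)app\/api\/.*\/route\.ts$/;

export function findUnwrappedRoutes(
  files: Record<string, string>,
  allowlist: ReadonlySet<string> = new Set(),
): string[] {
  return Object.entries(files)
    .filter(([path]) => ROUTE_FILE.test(path))
    .filter(([path]) => !allowlist.has(path))
    .filter(([, source]) => EXPORT_PATTERN.test(source))
    .filter(([, source]) => !source.includes("createProtectedRoute"))
    .map(([path]) => path)
    .sort();
}
```

In CI, a non-empty result would print the offending paths and exit non-zero. Shrinking the allowlist route by route is one way to enable the check before the migration is finished.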

Completed: [x] Date: 2026-01-31 (coverage check added with warning)

2.4 Error Handling Standardization

Purpose: Consistent error responses across all routes
File: lib/api/errors.ts

Standard Format: { error: { code: "ERROR_CODE", message: "Human readable message" } }
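A sketch of helpers that produce the standard format above. `ApiError`, `toErrorResponse`, and the code-to-status table are hypothetical; the real lib/api/errors.ts may name and organize these differently. Returning a plain `{ status, body }` keeps the sketch framework-free; in a Next.js route handler the pair would typically be passed to `NextResponse.json(body, { status })`.

```typescript
export type ErrorCode =
  | "UNAUTHORIZED"
  | "FORBIDDEN"
  | "NOT_FOUND"
  | "VALIDATION_ERROR"
  | "RATE_LIMITED"
  | "INTERNAL_ERROR";

const STATUS_BY_CODE: Record<ErrorCode, number> = {
  UNAUTHORIZED: 401,
  FORBIDDEN: 403,
  NOT_FOUND: 404,
  VALIDATION_ERROR: 400,
  RATE_LIMITED: 429,
  INTERNAL_ERROR: 500,
};

export class ApiError extends Error {
  readonly code: ErrorCode;

  constructor(code: ErrorCode, message: string) {
    super(message);
    this.name = "ApiError";
    this.code = code;
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }

  get status(): number {
    return STATUS_BY_CODE[this.code];
  }
}

export interface ErrorBody {
  error: { code: ErrorCode; message: string };
}

export function toErrorResponse(err: unknown): { status: number; body: ErrorBody } {
  if (err instanceof ApiError) {
    return { status: err.status, body: { error: { code: err.code, message: err.message } } };
  }
  // Unknown errors get a generic message so stack traces and SQL never reach clients.
  return {
    status: 500,
    body: { error: { code: "INTERNAL_ERROR", message: "An unexpected error occurred" } },
  };
}
```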

Required Changes:

Audit all routes for error handling patterns
Create migration checklist of routes to update
Update routes to use standardized errors
Add error codes to all responses for debugging

Routes Migrated:

Storage routes (using route wrapper)
Billing routes (checkout, portal, usage)
Auth routes (mfa/challenge, webauthn/credentials)
Analysis/AI routes (analysis/generate, ai-agent/chat, reports/generate-section)
Search route

All routes migrated: Error library now used across all API routes (storage, billing, admin, evidence, db/query, db/mutate, etc.)

Completed: [x] Date: 2026-01-31 (initial), 2026-02-04 (all routes)

Phase 3: Testing Debt (Weeks 3-4)

Critical path coverage.

3.1 Integration Test Infrastructure

Purpose: Test real service interactions
Requirement: Separate test database and test S3 bucket

Required Changes:

Create test environment configuration
Set up test database (can be same RDS, different schema)
Set up test S3 bucket
Create test user in Cognito (test@nquiry.ai exists)
Add integration test npm script (npm run test:integration)
Document test environment setup (docs/test-environment-setup.md)
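Since the test database can live in the same RDS instance under a different schema, a guard that refuses to run unless the configuration clearly points at test resources is cheap insurance. A minimal sketch, assuming hypothetical variable names (`TEST_DB_SCHEMA`, `TEST_S3_BUCKET`, `TEST_COGNITO_USER`) and naming conventions (a `test_` schema prefix and a `test` segment in the bucket name); the real conventions are whatever docs/test-environment-setup.md specifies.

```typescript
export interface TestEnv {
  dbSchema: string;
  s3Bucket: string;
  cognitoUser: string;
}

// Hypothetical variable names.
const REQUIRED = ["TEST_DB_SCHEMA", "TEST_S3_BUCKET", "TEST_COGNITO_USER"] as const;

export function loadTestEnv(env: Record<string, string | undefined>): TestEnv {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing integration test config: ${missing.join(", ")}`);
  }
  const dbSchema = env.TEST_DB_SCHEMA as string;
  const s3Bucket = env.TEST_S3_BUCKET as string;
  const cognitoUser = env.TEST_COGNITO_USER as string;

  // The schema is the isolation boundary when the RDS instance is shared,
  // so anything not explicitly prefixed as a test schema is refused.
  if (!dbSchema.startsWith("test_")) {
    throw new Error(`Refusing to run integration tests against schema "${dbSchema}"`);
  }
  // Bucket name must contain a standalone "test" segment, e.g. "acme-test-uploads".
  if (!/(^|-)test(-|$)/.test(s3Bucket)) {
    throw new Error(`Refusing to run integration tests against bucket "${s3Bucket}"`);
  }
  return { dbSchema, s3Bucket, cognitoUser };
}
```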

Completed: [x] Date: 2026-01-31 (partial - docs and config complete, infra pending)

3.2 Critical Path Integration Tests

One test per week until launch:

Week	Critical Path	Status	File
1	[x] User signup → org creation → first investigation	COMPLETE	`signup-to-investigation.test.ts`
2	[x] File upload → download → delete (with auth checks)	COMPLETE	`file-lifecycle.test.ts`
3	[x] Trial signup → checkout → subscription active	COMPLETE	`trial-to-subscription.test.ts`
4	[x] MFA setup → login with MFA challenge	COMPLETE	`mfa-flow.test.ts`
5	[x] AI analysis generation → quota tracking	COMPLETE	`ai-analysis-quota.test.ts`
6	[x] Account deletion (GDPR) with cascade verification	COMPLETE	`gdpr-deletion.test.ts`
7	[x] Data export (GDPR) completeness	COMPLETE	`gdpr-export.test.ts`
8	[x] Organization invitation → accept → access granted	COMPLETE	`org-invitation.test.ts`

Test results: __tests__/integration/critical-paths/ (140 tests passing, 4 skipped)

Completed: [x] Date: 2026-01-31

3.3 E2E Tests in CI

Purpose: Run Playwright tests automatically
File: .github/workflows/ci.yml

Required Changes:

Add E2E test credentials to GitHub Secrets (E2E_TEST_EMAIL, E2E_TEST_PASSWORD)
Add E2E test stage to CI workflow
Configure to run on main branch merges (not every PR - too slow)

CI Job Details:

Runs after lint-and-build and test jobs pass
Only triggers on push to main branch
Installs Playwright browsers
Uploads Playwright report as artifact on failure
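GitHub Actions does not expose repository secrets to a job automatically; each one has to be mapped into the job's `env` (for example `E2E_TEST_EMAIL: ${{ secrets.E2E_TEST_EMAIL }}`). A hypothetical helper for the Playwright setup can then fail fast with a clear message while the secrets are still missing, instead of timing out inside a login step. The function name is an assumption.

```typescript
export interface E2ECredentials {
  email: string;
  password: string;
}

export function requireE2ECredentials(env: Record<string, string | undefined>): E2ECredentials {
  const email = env.E2E_TEST_EMAIL?.trim();
  const password = env.E2E_TEST_PASSWORD;
  if (!email || !password) {
    const missing = [
      email ? null : "E2E_TEST_EMAIL",
      password ? null : "E2E_TEST_PASSWORD",
    ].filter((name): name is string => name !== null);
    // Report names only; never echo secret values into CI logs.
    throw new Error(
      `E2E credentials missing: ${missing.join(", ")}. Add them as GitHub Secrets and map them into the job env.`,
    );
  }
  return { email, password };
}
```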

Completed: [x] Date: 2026-01-31 (CI config added, secrets needed)

Phase 4: Process & Documentation (Ongoing)

4.1 CLAUDE.md Updates

Required Additions:

Reference this remediation plan in session protocol
Add API route requirements checklist
Add non-negotiable security requirements
Add "before marking complete" verification steps

Completed: [x] Date: 2026-01-31

4.2 Production Blockers Tracking

File: docs/production-blockers.md

Purpose: Track known technical debt that must be resolved before launch

Current Status:

0 open blockers
8 resolved blockers (PB-001 through PB-008)

Rule: Any TODO comment or known limitation gets added here immediately.
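One way to make this rule enforceable is a check that flags TODO-style comments which do not cite a blocker ID. The convention it assumes, writing `TODO(PB-###): ...` with the ID of the matching entry in docs/production-blockers.md, is hypothetical and not stated in this plan.

```typescript
const TODO_PATTERN = /\b(TODO|FIXME|HACK)\b.*$/;
const BLOCKER_ID = /\bPB-\d{3}\b/;

export interface UntrackedTodo {
  file: string;
  line: number; // 1-based
  text: string;
}

export function findUntrackedTodos(files: Record<string, string>): UntrackedTodo[] {
  const results: UntrackedTodo[] = [];
  for (const [file, source] of Object.entries(files)) {
    source.split("\n").forEach((lineText, i) => {
      const match = TODO_PATTERN.exec(lineText);
      // Flag only TODO-style comments whose remaining text lacks a PB-### reference.
      if (match && !BLOCKER_ID.test(match[0])) {
        results.push({ file, line: i + 1, text: lineText.trim() });
      }
    });
  }
  return results;
}
```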

Completed: [x] Date: 2026-01-31

4.3 Monthly Security Review Schedule

File: docs/admin/security/review-schedule.md

Week of Month	Focus Area	Checklist
1st	Auth & Authorization	All routes have auth, org checks enforced
2nd	Error Handling	Failures logged, no silent swallowing
3rd	Test Coverage	Coverage stable, critical paths tested
4th	Compliance	Audit logs complete, GDPR flows working

Also includes quarterly deep-dive checklists, escalation procedures, and a review log.

Completed: [x] Date: 2026-01-31

Phase 5: Pre-Launch Gate

5.1 Pre-Launch Checklist

File: docs/pre-launch-checklist.md

Comprehensive checklist covering:

Security (6 sections):

Authentication & Authorization checklist
Infrastructure Security checklist
Code Security checklist

Testing (2 sections):

Test Suite Health checklist
Critical Path Coverage tracking (8 paths)

Compliance (3 sections):

Audit Logging verification
GDPR compliance checklist
FedRAMP Readiness checklist

Operations (3 sections):

Monitoring & Alerting checklist
Backup & Recovery checklist
Deployment checklist

Documentation tracking + Final Sign-Off section

Completed: [x] Date: 2026-01-31 (checklist created, items pending verification)

5.2 External Security Review

Tracked in: docs/pre-launch-checklist.md Section 6

Before production launch:

Note: External security review is required before FedRAMP pursuit.

Completed: [ ] Date: ____ (requires vendor engagement)

Current Sprint

Active Work (update each session):

Task	Started	Assignee	Status
1.4 Fix failing tests	2026-01-31	Claude	COMPLETE (217 passing)
1.1 Storage route auth	2026-01-31	Claude	COMPLETE (with tests)
1.2 Audit resilience	2026-01-31	Claude	COMPLETE
1.3 Redis rate limiting	2026-01-31	Claude	COMPLETE (needs deploy)
2.1 API Route Wrapper	2026-01-31	Claude	COMPLETE
2.2 Semgrep scanning	2026-01-31	Claude	COMPLETE (CI config)
2.3 CI gates	2026-01-31	Claude	COMPLETE (coverage)
3.1 Test infrastructure	2026-01-31	Claude	COMPLETE (docs/config)
3.2 Critical path tests	2026-01-31	Claude	COMPLETE (140 tests)
3.3 E2E tests in CI	2026-01-31	Claude	COMPLETE (needs secrets)
4.1 CLAUDE.md updates	2026-01-31	Claude	COMPLETE
4.2 Production blockers	2026-01-31	Claude	COMPLETE
4.3 Security review sched	2026-01-31	Claude	COMPLETE
5.1 Pre-launch checklist	2026-01-31	Claude	COMPLETE (doc created)
5.2 External security	-	Joe	PENDING (vendor needed)
SEC-001 Encryption key	2026-02-04	Claude	COMPLETE (Secrets Mgr)
SEC-003 Audit alarms	2026-02-04	Claude	COMPLETE (CloudWatch)
SEC-004 CSRF protection	2026-02-04	Claude	COMPLETE (middleware)
SEC-005 Upload validation	2026-02-04	Claude	COMPLETE (MIME+magic)
SEC-006 Security headers	2026-02-04	Claude	COMPLETE (next.config.ts)
Security documentation	2026-02-04	Claude	COMPLETE (docs/security)
Bedrock Guardrails	2026-02-04	Claude	COMPLETE (lib/ai/guardrails.ts + terraform)
HIPAA risk assessment	2026-02-04	Claude	COMPLETE (docs/admin/security/)
Asset inventory	2026-02-04	Claude	COMPLETE (docs/admin/security/)
BAA template	2026-02-04	Claude	COMPLETE (docs/admin/legal/)
E2E happy path tests	2026-02-04	Claude	COMPLETE (tests/e2e/happy-path.spec.ts)
Load testing baseline	2026-02-04	Claude	COMPLETE (tests/load/)
Evidence route org auth	2026-02-06	Claude	COMPLETE (hasOrgRole + audit logging)
NQU-166 Tenant isolation	2026-02-13	Claude	COMPLETE (all 3 batches, 12 new tests)

Last Updated: 2026-02-14
Last Session Summary: NQU-179 Part 1: Enabled ECS Exec on the dev ECS service (enable_execute_command + SSM Messages IAM policy). This unblocks running diagnostic scripts against RDS from inside the VPC. Also updated tsconfig to exclude scripts/dump-prompts.ts, whose missing supabase dependency was breaking type-check. 1993 tests passing.

Verification Log

Record of completed verifications:

Date	Item	Verified By	Method	Result

Notes

This plan was generated from a comprehensive security audit on 2026-01-31
All blocking issues (Phase 1) must be complete before feature work resumes
Phases 2-4 can proceed in parallel, with careful prioritization
External security review is non-negotiable before FedRAMP pursuit
