Security Remediation Plan

Created: 2026-01-31
Status: In Progress
Target Completion: Before Production Launch

This document tracks security and reliability issues identified in the 2026-01-31 senior engineer audit. Claude Code must read this file at the start of every session and continue work on incomplete items.


Session Continuity Protocol

At the start of each session:

  1. Read this file
  2. Check the "Current Sprint" section for in-progress work
  3. Continue where the previous session left off
  4. Update checkboxes and dates as work completes

Phase 1: Blocking Issues (Week 1)

These issues must be fixed before any other feature work.

1.1 Storage Route Authorization

Risk: HIGH - Authenticated users can access files from other organizations

Files:

  • app/api/storage/download/route.ts
  • app/api/storage/download-url/route.ts
  • app/api/storage/delete/route.ts
  • app/api/storage/upload/route.ts

Required Changes:

  • Extract organization_id from file path
  • Verify user has org membership via hasOrgRole()
  • Return 403 if not authorized
  • Add audit logging for access attempts
  • Add tests for authorization checks
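
The checks listed above could look roughly like the following. This is a minimal sketch, not the code in the storage routes: it assumes storage keys start with the organization ID as their first path segment, and that hasOrgRole() takes a user ID, an organization ID, and a role. Neither assumption is stated in this plan, and `authorizeStorageAccess`, `extractOrgId`, the role names, and the 400 response for malformed paths are hypothetical.

```typescript
// Hypothetical types: the real hasOrgRole() signature and the storage path
// layout are not specified in this plan.
type OrgRole = "viewer" | "member" | "admin";
type RoleCheck = (userId: string, orgId: string, role: OrgRole) => Promise<boolean>;

interface AccessEvent {
  userId: string;
  filePath: string;
  allowed: boolean;
}

interface AuthzResult {
  status: 200 | 400 | 403;
  orgId?: string;
}

// Assumes keys look like "<organization_id>/<rest of path>".
function extractOrgId(filePath: string): string | null {
  const segments = filePath.split("/").filter((s) => s.length > 0);
  // Reject traversal attempts and paths with no file component.
  if (segments.length < 2 || segments.some((s) => s === "..")) return null;
  return segments[0] ?? null;
}

async function authorizeStorageAccess(
  userId: string,
  filePath: string,
  hasOrgRole: RoleCheck,
  audit: (event: AccessEvent) => void,
): Promise<AuthzResult> {
  const orgId = extractOrgId(filePath);
  if (orgId === null) {
    audit({ userId, filePath, allowed: false });
    return { status: 400 };
  }
  const allowed = await hasOrgRole(userId, orgId, "member");
  // Log every attempt, whether it was allowed or denied.
  audit({ userId, filePath, allowed });
  return allowed ? { status: 200, orgId } : { status: 403 };
}
```

Passing the role check and audit function in as parameters keeps the sketch testable without a database; a real route would import them instead.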

Verification:

# Run storage authorization tests
npm test -- storage-authorization.test
# Expected: 22 passed, 9 skipped (FormData tests skipped due to test env)

Completed: [x] Date: 2026-01-31


1.2 Audit Logging Resilience

Risk: HIGH - FedRAMP non-compliance if audit writes fail silently
File: lib/shared/audit.ts

Current Behavior: Returns false on failure, callers ignore return value
Required Behavior: Failures must be visible and recoverable

Required Changes:

  • Option A: Throw on audit failure (breaks user operation)
  • Option B: Queue failed audits to dead letter table for retry
  • Option C: Alert on failure but don't block operation
  • Decision: Option C with enhancements (tracking, metrics, recovery logging)
  • Implement chosen approach
  • Add in-memory failure tracking with getAuditHealth() function
  • Add CloudWatch alarm for audit failures (post-launch)
  • Add strict mode option for critical operations
  • Add admin endpoint /api/admin/audit-health for monitoring

Implementation Details:

  • Failures tracked in memory with count and recent failure history
  • Console logs prefixed with [AUDIT FAILURE] include full entry JSON for recovery
  • getAuditHealth() returns failure count and recent failures
  • Admin endpoint at /api/admin/audit-health for monitoring
  • Optional strict mode: logAudit(entry, { strict: true }) throws on failure
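
To illustrate the behavior described above, here is a minimal sketch of in-memory failure tracking with a strict mode. It is not the contents of lib/shared/audit.ts: the entry shape, the `setAuditWriter` helper, and the cap of 50 retained failures are assumptions made for the example. Only the names logAudit, getAuditHealth, the `strict` option, and the [AUDIT FAILURE] prefix come from this plan.

```typescript
// Hypothetical shapes; the real audit module may differ.
interface AuditEntry {
  action: string;
  userId: string;
  [key: string]: unknown;
}
interface AuditFailure {
  at: string;
  error: string;
  entry: AuditEntry;
}
type AuditWriter = (entry: AuditEntry) => Promise<void>;

const MAX_RECENT_FAILURES = 50; // assumed cap, not from the plan
let failureCount = 0;
const recentFailures: AuditFailure[] = [];

// Placeholder writer; a real one would insert into the audit table.
let writer: AuditWriter = async () => {};

// Hypothetical helper so the sketch can swap in a failing writer.
function setAuditWriter(next: AuditWriter): void {
  writer = next;
}

function getAuditHealth(): { failureCount: number; recentFailures: AuditFailure[] } {
  return { failureCount, recentFailures: recentFailures.slice() };
}

async function logAudit(entry: AuditEntry, options: { strict?: boolean } = {}): Promise<boolean> {
  try {
    await writer(entry);
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    failureCount += 1;
    recentFailures.push({ at: new Date().toISOString(), error: message, entry });
    if (recentFailures.length > MAX_RECENT_FAILURES) recentFailures.shift();
    // Include the full entry JSON so the record can be recovered from logs.
    console.error(`[AUDIT FAILURE] ${message} ${JSON.stringify(entry)}`);
    if (options.strict) throw new Error(`Audit write failed: ${message}`);
    return false;
  }
}
```

Because the counters live in process memory, they reset on restart; the console log line is what makes a failed entry recoverable afterwards.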

Verification:

  • Audit failures are tracked and logged with full details
  • Check CloudWatch alarm fires (requires CloudWatch setup)

Completed: [x] Date: 2026-01-31


1.3 Redis Rate Limiting

Risk: HIGH - Current in-memory rate limiting resets on cold start
File: lib/shared/rate-limit.ts

Status: CODE COMPLETE - AWAITING INFRASTRUCTURE

Infrastructure Required:

  1. AWS ElastiCache Redis cluster in same VPC as Amplify
  2. Security group allowing Amplify → ElastiCache access
  3. VPC configuration for Amplify to access private subnets
  4. Terraform updates for ElastiCache provisioning

Code Changes:

  • Install ioredis package
  • Create lib/shared/redis.ts client factory
  • Update checkRateLimit() to use Redis INCR + EXPIRE
  • Add checkRateLimitAsync() for Redis support
  • Add graceful fallback to in-memory if Redis unavailable
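
The INCR + EXPIRE approach with an in-memory fallback can be sketched as below. This is an illustration under assumptions, not the code in lib/shared/rate-limit.ts: the function signature, the `ratelimit:` key prefix, and the result shape are hypothetical. The `RedisLike` interface covers only the two ioredis commands the sketch needs, so it runs without a Redis connection.

```typescript
// Subset of the ioredis client used here; incr() and expire() both return promises.
interface RedisLike {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

const memoryCounters = new Map<string, { count: number; resetAt: number }>();

// Fixed-window counter kept in process memory (lost on cold start).
function checkInMemory(key: string, limit: number, windowSec: number, now: number): RateLimitResult {
  const current = memoryCounters.get(key);
  if (!current || current.resetAt <= now) {
    memoryCounters.set(key, { count: 1, resetAt: now + windowSec * 1000 });
    return { allowed: limit >= 1, remaining: Math.max(0, limit - 1) };
  }
  current.count += 1;
  return { allowed: current.count <= limit, remaining: Math.max(0, limit - current.count) };
}

// Hypothetical signature: the real checkRateLimitAsync() may take different arguments.
async function checkRateLimitAsync(
  redis: RedisLike | null,
  key: string,
  limit: number,
  windowSec: number,
): Promise<RateLimitResult> {
  if (redis) {
    try {
      const redisKey = `ratelimit:${key}`;
      const count = await redis.incr(redisKey);
      // The first hit in a window starts the expiry clock.
      if (count === 1) await redis.expire(redisKey, windowSec);
      return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
    } catch {
      // Redis unreachable: fall through to the in-memory limiter.
    }
  }
  return checkInMemory(key, limit, windowSec, Date.now());
}
```

One caveat with this pattern: INCR and EXPIRE are separate commands, so if the process stops between them the key never expires. Wrapping both in a Lua script or a MULTI transaction is a common way to close that gap.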

Terraform Tasks:

  • Add ElastiCache module (infrastructure/terraform/modules/elasticache/)
  • Add security group for ElastiCache
  • Add ElastiCache to dev environment

To Deploy:

cd infrastructure/terraform/environments/dev
terraform init
terraform plan -var="redis_auth_token=YOUR_SECURE_TOKEN"
terraform apply -var="redis_auth_token=YOUR_SECURE_TOKEN"

Environment Variables (after Terraform apply):

REDIS_URL=rediss://<endpoint>:6379
REDIS_AUTH_TOKEN=<your_auth_token>

Verification:

# Hit rate limit, restart server, verify limit still enforced

Completed: [x] Date: 2026-01-31 (code + Terraform, pending infrastructure deploy)


1.4 Fix Failing Tests

Risk: MEDIUM - Indicates test maintenance is not keeping pace with code changes
File: __tests__/integration/api/billing.test.ts

Issues:

  • createCheckoutSession mock expects 4 params, implementation passes 5
  • teamMembers undefined access in free plan test

Required Changes:

  • Update mock to include plan parameter
  • Fix undefined access in free plan test
  • Run full test suite, verify all passing
  • Mock server-only package in vitest.setup.ts

Verification:

npm test
# Result: 217 passed, 9 skipped, 166 todo

Completed: [x] Date: 2026-01-31


Phase 2: Technical Enforcement (Week 2)

Automated safeguards to prevent future issues.

2.1 API Route Wrapper

Purpose: Make secure patterns the default
File to create: lib/api/route-wrapper.ts

Features:

  • Automatic authentication check
  • Organization context extraction
  • Consistent error handling
  • Automatic audit logging for failures
  • Request timing/logging

Implementation:

// Pattern for all routes to use
export function createProtectedRoute(
  handler: ProtectedRouteHandler,
  options?: { requiredRole?: OrgRole; rateLimit?: RateLimitKey },
): RouteHandler;
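
A body for that signature might look like the following sketch. It is not the contents of lib/api/route-wrapper.ts. To stay runnable on its own it uses plain request and response objects instead of the Next.js types, it omits the rateLimit option, and the role names, role ordering, and error codes are all assumptions.

```typescript
// Hypothetical stand-ins for the framework and project types.
type OrgRole = "viewer" | "member" | "admin";
interface RouteRequest {
  userId?: string;
  orgId?: string;
  role?: OrgRole;
}
interface RouteResponse {
  status: number;
  body: unknown;
}
interface RouteContext {
  userId: string;
  orgId: string;
  role: OrgRole;
}
type ProtectedRouteHandler = (req: RouteRequest, ctx: RouteContext) => Promise<RouteResponse>;
type RouteHandler = (req: RouteRequest) => Promise<RouteResponse>;

// Assumed ordering: higher rank includes the permissions of lower ranks.
const ROLE_RANK: Record<OrgRole, number> = { viewer: 0, member: 1, admin: 2 };

function errorResponse(status: number, code: string, message: string): RouteResponse {
  return { status, body: { error: { code, message } } };
}

function createProtectedRoute(
  handler: ProtectedRouteHandler,
  options?: { requiredRole?: OrgRole },
): RouteHandler {
  return async (req) => {
    // 1. Authentication check.
    if (!req.userId) return errorResponse(401, "UNAUTHENTICATED", "Sign in required");
    // 2. Organization context extraction.
    if (!req.orgId || !req.role) return errorResponse(403, "FORBIDDEN", "No organization context");
    // 3. Role check against the route's requirement.
    const required = options?.requiredRole ?? "viewer";
    if (ROLE_RANK[req.role] < ROLE_RANK[required]) {
      return errorResponse(403, "FORBIDDEN", "Insufficient role");
    }
    // 4. Consistent error handling around the handler itself.
    try {
      return await handler(req, { userId: req.userId, orgId: req.orgId, role: req.role });
    } catch {
      return errorResponse(500, "INTERNAL_ERROR", "Unexpected error");
    }
  };
}
```

A route would then export the wrapped handler, for example `const GET = createProtectedRoute(async (_req, ctx) => ({ status: 200, body: { orgId: ctx.orgId } }), { requiredRole: "member" })`.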

Migration:

  • Create wrapper
  • Migrate 5 highest-risk routes first (storage/*)
  • Migrate remaining routes incrementally
  • Add CI check that all routes use wrapper (Phase 2.3)

Completed: [x] Date: 2026-01-31 (wrapper + storage routes)


2.2 Semgrep Security Scanning

Purpose: Catch security issues automatically in CI
File: .github/workflows/ci.yml

Required Changes:

  • Add Semgrep step to CI workflow
  • Configure rules for:
    • Missing auth checks
    • Unvalidated input
    • SQL injection patterns
    • Hardcoded secrets
  • Run initial scan, fix any findings (requires SEMGREP_APP_TOKEN)
  • Block PRs on security findings (job fails on findings)

Configuration:

- name: Security scan
  uses: returntocorp/semgrep-action@v1
  with:
    config: >-
      p/typescript
      p/security-audit
      p/owasp-top-ten
      p/secrets

Completed: [x] Date: 2026-01-31 (CI config added, needs token for full scan)


2.3 CI Enforcement Gates

Purpose: Automated quality gates
File: .github/workflows/ci.yml

Required Changes:

  • Test failure blocks merge (verified - test job fails CI)
  • Add coverage threshold (70% minimum - warns, enforcement pending)
  • Add route wrapper check (commented out, enable when more routes migrated)
  • Configure Amplify to only deploy on CI success (infrastructure task)
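
The route wrapper check mentioned above could be approached with a simple source scan. The sketch below is one possible shape for it, not the check currently commented out in ci.yml. It assumes routes export their HTTP method handlers as `export const GET = createProtectedRoute(...)`, and the function name `findUnwrappedHandlers` is hypothetical. A regex scan is a heuristic; it can miss re-exports or aliased imports.

```typescript
const HTTP_METHODS = ["GET", "POST", "PUT", "PATCH", "DELETE"];

// Returns the HTTP methods a route file exports without wrapping them
// in createProtectedRoute(). Takes the file contents as a string.
function findUnwrappedHandlers(source: string): string[] {
  const unwrapped: string[] = [];
  for (const method of HTTP_METHODS) {
    const declared = new RegExp(
      `export\\s+(?:(?:async\\s+)?function|const)\\s+${method}\\b`,
    ).test(source);
    const wrapped = new RegExp(
      `export\\s+const\\s+${method}\\s*=\\s*createProtectedRoute\\s*\\(`,
    ).test(source);
    if (declared && !wrapped) unwrapped.push(method);
  }
  return unwrapped;
}
```

A CI script could read every `route.ts` under `app/api/`, call this function on each, and fail the job if any file returns a non-empty list.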

Completed: [x] Date: 2026-01-31 (coverage check added with warning)


2.4 Error Handling Standardization

Purpose: Consistent error responses across all routes
File: lib/api/errors.ts

Standard Format: { error: { code: "ERROR_CODE", message: "Human readable message" } }
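
A small helper can keep every route on that format. The sketch below is illustrative and is not the contents of lib/api/errors.ts: the `apiError`, `isApiError`, and `toErrorResponse` names, the `kind` tag, and the specific error codes are assumptions.

```typescript
interface ApiErrorBody {
  error: { code: string; message: string };
}

// Tagged plain object rather than an Error subclass, to keep the sketch simple.
interface ApiError {
  kind: "ApiError";
  status: number;
  code: string;
  message: string;
}

function apiError(status: number, code: string, message: string): ApiError {
  return { kind: "ApiError", status, code, message };
}

function isApiError(value: unknown): value is ApiError {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { kind?: unknown }).kind === "ApiError"
  );
}

// Maps anything thrown by a handler to the standard response shape.
function toErrorResponse(err: unknown): { status: number; body: ApiErrorBody } {
  if (isApiError(err)) {
    return { status: err.status, body: { error: { code: err.code, message: err.message } } };
  }
  // Unknown errors get a generic message so internal details are not leaked.
  return {
    status: 500,
    body: { error: { code: "INTERNAL_ERROR", message: "An unexpected error occurred" } },
  };
}
```

For example, `toErrorResponse(apiError(404, "NOT_FOUND", "File not found"))` yields status 404 with the body `{ error: { code: "NOT_FOUND", message: "File not found" } }`.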

Required Changes:

  • Audit all routes for error handling patterns
  • Create migration checklist of routes to update
  • Update routes to use standardized errors
  • Add error codes to all responses for debugging

Routes Migrated:

  1. Storage routes (using route wrapper)
  2. Billing routes (checkout, portal, usage)
  3. Auth routes (mfa/challenge, webauthn/credentials)
  4. Analysis/AI routes (analysis/generate, ai-agent/chat, reports/generate-section)
  5. Search route

All routes migrated: Error library now used across all API routes (storage, billing, admin, evidence, db/query, db/mutate, etc.)

Completed: [x] Date: 2026-01-31 (initial), 2026-02-04 (all routes)


Phase 3: Testing Debt (Weeks 3-4)

Critical path coverage.

3.1 Integration Test Infrastructure

Purpose: Test real service interactions
Requirement: Separate test database, test S3 bucket

Required Changes:

  • Create test environment configuration
  • Set up test database (can be same RDS, different schema)
  • Set up test S3 bucket
  • Create test user in Cognito (test@nquiry.ai exists)
  • Add integration test npm script (npm run test:integration)
  • Document test environment setup (docs/test-environment-setup.md)

Completed: [x] Date: 2026-01-31 (partial - docs and config complete, infra pending)


3.2 Critical Path Integration Tests

One test per week until launch:

| Week | Critical Path | Status | File |
|------|---------------|--------|------|
| 1 | [x] User signup → org creation → first investigation | COMPLETE | signup-to-investigation.test.ts |
| 2 | [x] File upload → download → delete (with auth checks) | COMPLETE | file-lifecycle.test.ts |
| 3 | [x] Trial signup → checkout → subscription active | COMPLETE | trial-to-subscription.test.ts |
| 4 | [x] MFA setup → login with MFA challenge | COMPLETE | mfa-flow.test.ts |
| 5 | [x] AI analysis generation → quota tracking | COMPLETE | ai-analysis-quota.test.ts |
| 6 | [x] Account deletion (GDPR) with cascade verification | COMPLETE | gdpr-deletion.test.ts |
| 7 | [x] Data export (GDPR) completeness | COMPLETE | gdpr-export.test.ts |
| 8 | [x] Organization invitation → accept → access granted | COMPLETE | org-invitation.test.ts |

Test results: __tests__/integration/critical-paths/ (140 tests passing, 4 skipped)

Completed: [x] Date: 2026-01-31


3.3 E2E Tests in CI

Purpose: Run Playwright tests automatically
File: .github/workflows/ci.yml

Required Changes:

  • Add E2E test credentials to GitHub Secrets (E2E_TEST_EMAIL, E2E_TEST_PASSWORD)
  • Add E2E test stage to CI workflow
  • Configure to run on main branch merges (not every PR - too slow)

CI Job Details:

  • Runs after lint-and-build and test jobs pass
  • Only triggers on push to main branch
  • Installs Playwright browsers
  • Uploads Playwright report as artifact on failure

Completed: [x] Date: 2026-01-31 (CI config added, secrets needed)


Phase 4: Process & Documentation (Ongoing)

4.1 CLAUDE.md Updates

Required Additions:

  • Reference this remediation plan in session protocol
  • Add API route requirements checklist
  • Add non-negotiable security requirements
  • Add "before marking complete" verification steps

Completed: [x] Date: 2026-01-31


4.2 Production Blockers Tracking

File: docs/production-blockers.md

Purpose: Track known technical debt that must be resolved before launch

Current Status:

  • 0 open blockers
  • 8 resolved blockers (PB-001 through PB-008)

Rule: Any TODO comment or known limitation gets added here immediately.

Completed: [x] Date: 2026-01-31


4.3 Monthly Security Review Schedule

File: docs/admin/security/review-schedule.md

| Week | Focus Area | Checklist |
|------|------------|-----------|
| 1st of month | Auth & Authorization | All routes have auth, org checks enforced |
| 2nd of month | Error Handling | Failures logged, no silent swallowing |
| 3rd of month | Test Coverage | Coverage stable, critical paths tested |
| 4th of month | Compliance | Audit logs complete, GDPR flows working |

Also includes: Quarterly deep dive checklists, escalation procedures, review log

Completed: [x] Date: 2026-01-31


Phase 5: Pre-Launch Gate

5.1 Pre-Launch Checklist

File: docs/pre-launch-checklist.md

Comprehensive checklist covering:

Security (6 sections):

  • Authentication & Authorization checklist
  • Infrastructure Security checklist
  • Code Security checklist

Testing (2 sections):

  • Test Suite Health checklist
  • Critical Path Coverage tracking (8 paths)

Compliance (3 sections):

  • Audit Logging verification
  • GDPR compliance checklist
  • FedRAMP Readiness checklist

Operations (3 sections):

  • Monitoring & Alerting checklist
  • Backup & Recovery checklist
  • Deployment checklist

Documentation tracking + Final Sign-Off section

Completed: [x] Date: 2026-01-31 (checklist created, items pending verification)


5.2 External Security Review

Tracked in: docs/pre-launch-checklist.md Section 6

Before production launch:

  • Budget approved for penetration test
  • Vendor selected
  • Scope defined (Web app, API, AWS infrastructure)
  • Test scheduled
  • Test completed
  • Critical/high findings remediated
  • Re-test passed

Note: External security review is required before FedRAMP pursuit.

Completed: [ ] Date: ____ (requires vendor engagement)


Current Sprint

Active Work (update each session):

| Task | Started | Assignee | Status |
|------|---------|----------|--------|
| 1.4 Fix failing tests | 2026-01-31 | Claude | COMPLETE (217 passing) |
| 1.1 Storage route auth | 2026-01-31 | Claude | COMPLETE (with tests) |
| 1.2 Audit resilience | 2026-01-31 | Claude | COMPLETE |
| 1.3 Redis rate limiting | 2026-01-31 | Claude | COMPLETE (needs deploy) |
| 2.1 API Route Wrapper | 2026-01-31 | Claude | COMPLETE |
| 2.2 Semgrep scanning | 2026-01-31 | Claude | COMPLETE (CI config) |
| 2.3 CI gates | 2026-01-31 | Claude | COMPLETE (coverage) |
| 3.1 Test infrastructure | 2026-01-31 | Claude | COMPLETE (docs/config) |
| 3.2 Critical path tests | 2026-01-31 | Claude | COMPLETE (140 tests) |
| 3.3 E2E tests in CI | 2026-01-31 | Claude | COMPLETE (needs secrets) |
| 4.1 CLAUDE.md updates | 2026-01-31 | Claude | COMPLETE |
| 4.2 Production blockers | 2026-01-31 | Claude | COMPLETE |
| 4.3 Security review sched | 2026-01-31 | Claude | COMPLETE |
| 5.1 Pre-launch checklist | 2026-01-31 | Claude | COMPLETE (doc created) |
| 5.2 External security | - | Joe | PENDING (vendor needed) |
| SEC-001 Encryption key | 2026-02-04 | Claude | COMPLETE (Secrets Mgr) |
| SEC-003 Audit alarms | 2026-02-04 | Claude | COMPLETE (CloudWatch) |
| SEC-004 CSRF protection | 2026-02-04 | Claude | COMPLETE (middleware) |
| SEC-005 Upload validation | 2026-02-04 | Claude | COMPLETE (MIME+magic) |
| SEC-006 Security headers | 2026-02-04 | Claude | COMPLETE (next.config.ts) |
| Security documentation | 2026-02-04 | Claude | COMPLETE (docs/security) |
| Bedrock Guardrails | 2026-02-04 | Claude | COMPLETE (lib/ai/guardrails.ts + terraform) |
| HIPAA risk assessment | 2026-02-04 | Claude | COMPLETE (docs/admin/security/) |
| Asset inventory | 2026-02-04 | Claude | COMPLETE (docs/admin/security/) |
| BAA template | 2026-02-04 | Claude | COMPLETE (docs/admin/legal/) |
| E2E happy path tests | 2026-02-04 | Claude | COMPLETE (tests/e2e/happy-path.spec.ts) |
| Load testing baseline | 2026-02-04 | Claude | COMPLETE (tests/load/) |
| Evidence route org auth | 2026-02-06 | Claude | COMPLETE (hasOrgRole + audit logging) |
| NQU-166 Tenant isolation | 2026-02-13 | Claude | COMPLETE (all 3 batches, 12 new tests) |

Last Updated: 2026-02-14

Last Session Summary: NQU-179 Part 1: Enabled ECS Exec on dev ECS service (enable_execute_command + SSM Messages IAM policy). Unblocks running diagnostic scripts against RDS from inside VPC. Also fixed tsconfig excluding scripts/dump-prompts.ts (missing supabase dep breaking type-check). 1993 tests passing.


Verification Log

Record of completed verifications:

| Date | Item | Verified By | Method | Result |
|------|------|-------------|--------|--------|

Notes

  • This plan was generated from a comprehensive security audit on 2026-01-31
  • All blocking issues (Phase 1) must be complete before feature work resumes
  • Phases 2-4 can proceed in parallel with careful prioritization
  • External security review is non-negotiable before FedRAMP pursuit