
Deployment Flow

Last updated: 2026-03-27


Overview

Code flows from local development to production via:

git push main -> GitHub Actions CI -> Docker build -> ECR push -> ECS deploy -> smoke test

There is one environment (see docs/admin/ops/environment-strategy.md). Every push to main, including merges, triggers the full pipeline, although the deploy job skips itself when only docs, tests, or config changed.


CI/CD Pipeline (.github/workflows/ci.yml)

Trigger

  • Push to main: runs full pipeline including deploy
  • Pull request to main: runs lint, build, security scan, and tests (no E2E tests, no deploy)

Jobs

1. lint-and-build (all pushes and PRs)

Runs in parallel with security-scan. Concurrency group cancels superseded runs.

  1. Checkout code
  2. Setup Node.js (version from .nvmrc)
  3. npm ci (falls back to npm install)
  4. npm run lint
  5. npm run type-check
  6. npm run build

2. security-scan (all pushes and PRs)

Runs Semgrep with rulesets: p/typescript, p/security-audit, p/secrets, p/eslint-plugin-security. Fails on findings (--error).

3. test (all pushes and PRs)

Depends on lint-and-build. Runs npm test -- --coverage. Coverage threshold check (70%) is currently warn-only.
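The difference between a warn-only and an enforced threshold can be sketched as a small gate function. This is a hypothetical sketch: the `CoverageSummary` shape, the `evaluateCoverage` name, and the `warnOnly` flag are illustrative assumptions, not the workflow's actual script.

```typescript
// Hypothetical summary shape; real coverage reports carry more fields.
interface CoverageSummary {
  linesCovered: number;
  linesTotal: number;
}

type GateResult = { status: "pass" | "warn" | "fail"; percent: number };

// Compares line coverage with a threshold. Below the threshold, warn-only
// mode reports the shortfall without failing the job.
function evaluateCoverage(
  summary: CoverageSummary,
  thresholdPercent: number,
  warnOnly: boolean
): GateResult {
  // An empty report counts as 0% instead of dividing by zero.
  const percent =
    summary.linesTotal === 0
      ? 0
      : (summary.linesCovered * 100) / summary.linesTotal;
  if (percent >= thresholdPercent) {
    return { status: "pass", percent };
  }
  return { status: warnOnly ? "warn" : "fail", percent };
}
```

With the current setting, `evaluateCoverage({ linesCovered: 650, linesTotal: 1000 }, 70, true)` reports `"warn"`. Switching the flag off would turn the same run into a failure.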

4. e2e-tests (main branch push only)

Depends on lint-and-build + test. Spins up a pgvector/pgvector:pg16 service container, bootstraps the CI database (scripts/ci-bootstrap-db.sql), runs migrations (npm run db:migrate), builds the app, and runs Playwright E2E tests against the standalone build. Uses real Cognito credentials from GitHub secrets.

5. deploy (main branch push only)

Depends on lint-and-build + test + security-scan, but not on e2e-tests, so E2E results do not gate the deploy. Has its own concurrency group with cancel-in-progress: false (in-progress deploys are never cancelled).
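The job dependencies listed in this section form a small DAG. The sketch below encodes them (job names come from this page; the `Job` shape and the `waves` helper are illustrative, not part of ci.yml) and groups jobs into waves whose members can run in parallel. Waves are a simplification: GitHub Actions starts each job as soon as its own dependencies finish, not when a whole wave completes.

```typescript
// Job graph as described on this page. mainOnly jobs run only on pushes to main.
interface Job {
  name: string;
  needs: string[];
  mainOnly: boolean;
}

const jobs: Job[] = [
  { name: "lint-and-build", needs: [], mainOnly: false },
  { name: "security-scan", needs: [], mainOnly: false },
  { name: "test", needs: ["lint-and-build"], mainOnly: false },
  { name: "e2e-tests", needs: ["lint-and-build", "test"], mainOnly: true },
  { name: "deploy", needs: ["lint-and-build", "test", "security-scan"], mainOnly: true },
];

// Layers the active jobs: each wave holds jobs whose dependencies all
// appeared in earlier waves.
function waves(event: "push-main" | "pull-request"): string[][] {
  const active = jobs.filter((j) => event === "push-main" || !j.mainOnly);
  const done: string[] = [];
  const result: string[][] = [];
  let remaining = active.slice();
  while (remaining.length > 0) {
    const ready = remaining.filter((j) =>
      j.needs.every((n) => done.indexOf(n) !== -1)
    );
    if (ready.length === 0) {
      throw new Error("cycle or missing dependency in job graph");
    }
    const names = ready.map((j) => j.name);
    result.push(names);
    done.push(...names);
    remaining = remaining.filter((j) => names.indexOf(j.name) === -1);
  }
  return result;
}
```

For a push to main this yields `[["lint-and-build", "security-scan"], ["test"], ["e2e-tests", "deploy"]]`; for a pull request only the first two waves remain.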

Steps:

  1. Change detection: Compares HEAD~1..HEAD. If only docs/tests/config changed, skips the deploy entirely.

  2. AWS auth: OIDC federation via aws-actions/configure-aws-credentials@v4. Role ARN stored in AWS_ROLE_ARN GitHub secret. No long-lived IAM keys.

  3. ECR login: aws-actions/amazon-ecr-login@v2

  4. Docker build + push: Multi-stage build (see Dockerfile):

    • Stage 1 (deps): npm ci in Alpine Node 24.12
    • Stage 2 (builder): Copy deps, copy source, npm run build. NEXT_PUBLIC_* vars passed as build args (baked into client JS). Server-side secrets are NOT build args.
    • Stage 3 (runner): Alpine Node 24.12, copies standalone output only, runs as non-root nextjs user.
    • Image tagged with git SHA: <ecr-registry>/invapp-dev-app:<sha>

  5. Register task definition: Fetches current ECS task definition, updates container image to new SHA tag, upserts environment variables (quality model config), registers new revision.

  6. Update ECS service: Points service at new task definition revision, forces new deployment. If desired count is 0, sets it to 1.

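  7. Wait for stabilization: aws ecs wait services-stable blocks until the service reaches a steady state (the new task is running and healthy and the old deployment has drained). The waiter polls every 15 seconds and fails after 40 unsuccessful checks (about 10 minutes).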

  8. Smoke test: Curls https://app.nquir.ai/api/health and checks for HTTP 200.
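The change-detection decision in step 1 can be sketched as a path classifier. The concrete patterns below are assumptions for illustration; the real rules live in .github/workflows/ci.yml. The input would be a list of paths such as the output of `git diff --name-only HEAD~1 HEAD`. Note that a HEAD~1..HEAD comparison inspects only the newest commit: for a merge commit that covers the whole merged branch, but if several commits are pushed to main at once, only the last one is classified.

```typescript
// Hypothetical patterns for changes that should NOT trigger a deploy.
// Assumed for illustration; the workflow's actual rules may differ.
const NON_DEPLOY_PATTERNS: RegExp[] = [
  /^docs\//, // documentation tree
  /\.md$/, // markdown files anywhere
  /^tests?\//, // top-level test directories
  /\.(test|spec)\.tsx?$/, // colocated unit tests
  /^\.editorconfig$/, // editor config (assumed)
  /^\.prettierrc/, // formatter config (assumed)
];

// True when at least one changed path falls outside the non-deploy set.
// An empty change list yields false: nothing changed, nothing to deploy.
function shouldDeploy(changedPaths: string[]): boolean {
  return changedPaths.some(
    (path) => !NON_DEPLOY_PATTERNS.some((pattern) => pattern.test(path))
  );
}
```

A commit touching only `docs/admin/ops/deploy.md` would skip the deploy, while one that also changes `src/app/page.tsx` would run it.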

Key CI Environment Variables

| Variable | Source | Purpose |
| --- | --- | --- |
| AWS_ROLE_ARN | GitHub secret | OIDC role for AWS access |
| ECR_REPOSITORY | Hardcoded (invapp-dev-app) | ECR repo name |
| ECS_CLUSTER | Hardcoded (invapp-dev-cluster) | ECS cluster name |
| ECS_SERVICE | Hardcoded (invapp-dev-app-service) | ECS service name |
| COGNITO_USER_POOL_ID | GitHub secret | Auth config (baked into client build) |
| COGNITO_CLIENT_ID | GitHub secret | Auth config (baked into client build) |
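Using the ECR repository name from this table, the "Register task definition" step of the deploy job (swap the image, upsert environment variables) can be sketched as a pure function over a container definition. The `ContainerDefinition` shape mirrors the `image` and `environment` (name/value pairs) fields of an ECS container definition; the function name, the registry host, and the variable names in the example are hypothetical.

```typescript
// Subset of an ECS container definition relevant to this step.
interface KeyValuePair {
  name: string;
  value: string;
}

interface ContainerDefinition {
  name: string;
  image: string;
  environment: KeyValuePair[];
}

// Points the container at the image for `sha` and upserts env vars
// (replace if present, append otherwise). Returns a new object and leaves
// the fetched definition untouched.
function withNewRevision(
  def: ContainerDefinition,
  registry: string,
  repository: string,
  sha: string,
  envUpdates: KeyValuePair[]
): ContainerDefinition {
  const environment = def.environment.map((kv) => {
    const update = envUpdates.filter((u) => u.name === kv.name)[0];
    return update ? { name: kv.name, value: update.value } : kv;
  });
  for (const u of envUpdates) {
    if (!environment.some((kv) => kv.name === u.name)) {
      environment.push({ name: u.name, value: u.value });
    }
  }
  return { ...def, image: `${registry}/${repository}:${sha}`, environment };
}
```

Upserting rather than replacing the whole list keeps variables that the pipeline does not manage.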

Other Workflows

  • eval-check.yml: Evaluation/quality checks (separate from deploy)
  • retention-cron.yml: Scheduled retention/cleanup tasks

Infrastructure Architecture

Route53 (app.nquir.ai)
-> CloudFront (CDN + WAF + Basic Auth gate)
-> ALB (HTTPS termination, health checks)
-> ECS Fargate (private subnet, 1 task)
     -> RDS PostgreSQL 15 (private subnet, encrypted, pgvector)
     -> ElastiCache Redis (private subnet, TLS + auth token)
     -> S3 (evidence files, signed URLs)
     -> Bedrock (Claude Sonnet 4, Haiku 4.5, Titan embeddings, Cohere rerank)
     -> Cognito (auth, MFA enabled)

All compute and data resources are in private subnets. Outbound traffic (Bedrock, external APIs) goes through a NAT Gateway.


Database Access

Via SSM Port Forwarding (Bastion)

The bastion is a t3.micro EC2 instance in a private subnet with the SSM agent installed. There are no SSH keys and no inbound security group rules; access is via AWS Systems Manager only.

Prerequisites:

  • AWS CLI v2
  • Session Manager plugin installed (brew install --cask session-manager-plugin on macOS)
  • IAM permissions for ssm:StartSession

Get the bastion instance ID:

# From terraform output
cd infrastructure/terraform/environments/dev
terraform output bastion_instance_id

# Or find it in AWS console: EC2 -> Instances -> invapp-dev-bastion

Start a port forwarding session to RDS:

aws ssm start-session \
  --target <bastion-instance-id> \
  --document-name AWS-StartPortForwardingSessionToRemoteHost \
  --parameters '{
    "host": ["<rds-endpoint>"],
    "portNumber": ["5432"],
    "localPortNumber": ["5433"]
  }'

This forwards localhost:5433 to the RDS instance on port 5432. Keep this terminal open.

Connect with psql:

psql -h localhost -p 5433 -U app_admin -d investigation_app

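Get the RDS endpoint (the <rds-endpoint> value in the port forwarding command; pass the hostname only, without a :5432 suffix):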

cd infrastructure/terraform/environments/dev
terraform output database_endpoint
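Because the --parameters JSON is easy to mistype, it can be generated from the Terraform output instead. This sketch assumes the output may be either a bare hostname or host:port (the Terraform aws_db_instance `endpoint` attribute includes the port, while `address` does not); the function name and the example hostname are hypothetical.

```typescript
// Builds the JSON for `aws ssm start-session --parameters` from a database
// endpoint that may or may not carry a ":port" suffix.
function portForwardParameters(endpoint: string, localPort: number): string {
  // Lazy host match, then an optional ":<digits>" suffix at the very end.
  const match = /^(.+?)(?::(\d+))?$/.exec(endpoint.trim());
  if (!match) {
    throw new Error(`unrecognised endpoint: ${endpoint}`);
  }
  const host = match[1];
  const remotePort = match[2] ? match[2] : "5432"; // default PostgreSQL port
  // SSM document parameters are lists of strings, even for port numbers.
  return JSON.stringify({
    host: [host],
    portNumber: [remotePort],
    localPortNumber: [String(localPort)],
  });
}
```

For example, `portForwardParameters("invapp-dev-db.abc123.us-east-1.rds.amazonaws.com:5432", 5433)` produces the same three parameters as the hand-written command, with the port stripped from the host.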

Running Migrations

Migrations use the custom runner at scripts/run-migration.ts, NOT the Supabase CLI. The runner connects via pg and tracks applied migrations in a _migrations table.
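The bookkeeping described above amounts to a set difference ordered by filename. A minimal sketch, assuming `_migrations` records applied migrations by filename and that the fixed-width timestamp prefix makes string order match chronological order; the function name and inputs are hypothetical and not the actual interface of scripts/run-migration.ts.

```typescript
// Returns .sql files not yet recorded in _migrations, oldest first.
// Filenames start with a fixed-width YYYYMMDDHHMMSS timestamp, so a plain
// string sort is also a chronological sort.
function pendingMigrations(
  filesOnDisk: string[],
  appliedNames: string[]
): string[] {
  // Prototype-free lookup table of already-applied filenames.
  const applied: { [name: string]: boolean } = Object.create(null);
  for (const name of appliedNames) {
    applied[name] = true;
  }
  return filesOnDisk
    .filter((file) => /\.sql$/.test(file) && !applied[file])
    .sort();
}
```

Sorting by name only works while every file keeps the timestamp prefix; a hand-named file would sort out of order.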

Locally (against local DB):

npm run db:migrate # Run all pending
npm run db:migrate:run <file> # Run specific file

Against production RDS (via bastion tunnel):

  1. Start the SSM port forwarding session (see above)
  2. Set environment variables pointing to the tunnel:
     DB_HOST=localhost DB_PORT=5433 DB_NAME=investigation_app \
     DB_USER=app_admin DB_PASSWORD=<password> DB_SSL=true \
     npm run db:migrate

Or for a specific migration:

DB_HOST=localhost DB_PORT=5433 DB_NAME=investigation_app \
DB_USER=app_admin DB_PASSWORD=<password> DB_SSL=true \
npm run db:migrate:run supabase/migrations/20260327000000_example.sql
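The DB_* variables above map onto a database connection config. The sketch below shows one plausible mapping that refuses to run with missing variables rather than falling back to defaults; the DbConfig shape (loosely modelled on a node-postgres client config), the 5432 default port, and treating DB_SSL as the literal string "true" are all assumptions, not necessarily what scripts/run-migration.ts does.

```typescript
// Loose subset of a node-postgres style client config (assumed shape).
interface DbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
  ssl: boolean;
}

type Env = { [key: string]: string | undefined };

// Reads the DB_* variables used above and throws on anything missing, so a
// half-configured shell fails loudly instead of using defaults.
function dbConfigFromEnv(env: Env): DbConfig {
  const required = ["DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`missing environment variables: ${missing.join(", ")}`);
  }
  const portText = env["DB_PORT"] || "5432"; // assumed default
  if (!/^\d+$/.test(portText)) {
    throw new Error(`invalid DB_PORT: ${portText}`);
  }
  return {
    host: env["DB_HOST"] as string,
    port: parseInt(portText, 10),
    database: env["DB_NAME"] as string,
    user: env["DB_USER"] as string,
    password: env["DB_PASSWORD"] as string,
    ssl: env["DB_SSL"] === "true", // assumed interpretation of DB_SSL
  };
}
```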

Creating a new migration:

touch supabase/migrations/$(date +%Y%m%d%H%M%S)_migration_name.sql
# Edit the file, then run it
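The same naming scheme can be produced in TypeScript, for example from a hypothetical scaffolding script. Like `date +%Y%m%d%H%M%S`, it uses local time; the slug rule (lowercase, runs of other characters collapsed to underscores) is an assumption.

```typescript
// Zero-pads a number to two digits (e.g. 3 -> "03").
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Builds "supabase/migrations/<YYYYMMDDHHMMSS>_<slug>.sql" in local time,
// matching the `date +%Y%m%d%H%M%S` prefix used above.
function migrationFileName(name: string, now: Date): string {
  const stamp =
    String(now.getFullYear()) +
    pad2(now.getMonth() + 1) + // getMonth() is zero-based
    pad2(now.getDate()) +
    pad2(now.getHours()) +
    pad2(now.getMinutes()) +
    pad2(now.getSeconds());
  const slug = name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse runs of other characters
    .replace(/^_+|_+$/g, ""); // trim leading/trailing underscores
  if (slug === "") {
    throw new Error("migration name must contain letters or digits");
  }
  return `supabase/migrations/${stamp}_${slug}.sql`;
}
```

For a name like "Add evidence index" created at 09:05:07 on 2026-03-27 local time, this yields supabase/migrations/20260327090507_add_evidence_index.sql.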

ECS Exec (Container Shell)

For debugging the running container (requires the Session Manager plugin, as for the bastion):

aws ecs execute-command \
  --cluster invapp-dev-cluster \
  --task <task-id> \
  --container invapp-dev-app \
  --interactive \
  --command "/bin/sh"

ECS Exec requires enable_execute_command = true, which is set in the ECS module.


Secrets Management

Server-side secrets are stored in AWS Secrets Manager and injected into ECS tasks at runtime via the task definition's secrets block. They are NOT baked into the Docker image.

Secrets managed:

  • DB_PASSWORD
  • REDIS_AUTH_TOKEN
  • STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET
  • RESEND_API_KEY
  • NEXT_PUBLIC_SENTRY_DSN
  • ANTHROPIC_API_KEY
  • CRON_SECRET

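To update a secret, change its value in AWS Secrets Manager, then force a new ECS deployment (aws ecs update-service --cluster invapp-dev-cluster --service invapp-dev-app-service --force-new-deployment). The task definition references the secret by name, so the new value is read when each new task starts; tasks that are already running keep the old value until they are replaced.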
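For reference, each entry in a container definition's `secrets` list pairs the environment variable name seen inside the container with a `valueFrom` pointing at the Secrets Manager secret (its ARN, or its name when the secret is in the same Region). A minimal sketch that builds such a list from an explicit mapping; the function name and the secret names in the example are hypothetical.

```typescript
// One entry in a container definition's "secrets" list.
interface SecretRef {
  name: string; // environment variable name inside the container
  valueFrom: string; // Secrets Manager secret ARN, or name in the same Region
}

// Builds secrets entries from { ENV_VAR_NAME: secretArnOrName }, sorted by
// variable name so the generated task definition diffs cleanly.
function secretRefs(mapping: { [envName: string]: string }): SecretRef[] {
  return Object.keys(mapping)
    .sort()
    .map((name) => {
      // Conventional env var names: uppercase letters, digits, underscores.
      if (!/^[A-Z_][A-Z0-9_]*$/.test(name)) {
        throw new Error(`not a conventional env var name: ${name}`);
      }
      return { name, valueFrom: mapping[name] };
    });
}
```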


Rollback

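There is no automated rollback. To roll back, either:

  1. Redeploy a previous image: identify the last known-good git SHA, then update the ECS service to use the task definition revision whose image is tagged with that SHA.
  2. Revert the offending commit on main and let CI redeploy.

With option 1, main still contains the bad commit, so the next push to main that triggers a deploy ships it again unless it has been reverted or fixed.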

ECR retains all pushed images (tagged by git SHA), so any previous version can be deployed without rebuilding.
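Finding the revision for a known-good SHA can be sketched as a search over task definitions shaped like the `taskDefinition` objects returned by `aws ecs describe-task-definition` (only `family`, `revision`, and `containerDefinitions[].image` are used). The function name, the family name, and the sample SHAs are hypothetical. The result is in the family:revision form that `aws ecs update-service --task-definition` accepts.

```typescript
interface ContainerDef {
  name: string;
  image: string;
}

interface TaskDefinition {
  family: string;
  revision: number;
  containerDefinitions: ContainerDef[];
}

// Returns "family:revision" for the newest revision whose named container
// runs the image tagged exactly with `sha`, or undefined if none does.
// Pass the SHA exactly as it appears in the image tag (full or short).
function revisionForSha(
  defs: TaskDefinition[],
  containerName: string,
  sha: string
): string | undefined {
  const suffix = ":" + sha; // the colon prevents partial-tag matches
  const matches = defs.filter((def) =>
    def.containerDefinitions.some(
      (c) =>
        c.name === containerName &&
        c.image.length >= suffix.length &&
        c.image.slice(-suffix.length) === suffix
    )
  );
  if (matches.length === 0) {
    return undefined;
  }
  matches.sort((a, b) => b.revision - a.revision); // newest first
  return `${matches[0].family}:${matches[0].revision}`;
}
```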