security-mtls
mTLS for Core ↔︎ Runner (v0.1)
Decision: - Core↔︎Runner gRPC supports TLS/mTLS in deployment config. - Dev tooling can generate short-lived local certs for test environments. - Core is behind a L7 proxy/LB that terminates TLS for gRPC (HTTP/2).
Current repo status: - Core and Runner can run plaintext on a trusted internal network, or TLS/mTLS when cert paths are configured. - Runner can request a Core-signed certificate renewal by submitting a CSR over the authenticated internal management endpoint when issuer CA material is configured. - Runner can still execute a local renewal command before client-cert expiry and notify Core through the same authenticated internal endpoint as an operator-managed fallback. - Runner authentication now supports Core-issued per-runner tokens with enrollment and rotation; the legacy shared token remains a compatibility fallback. - Core can denylist runner IDs so revoked runners cannot reconnect, rotate tokens, or renew certificates until the denylist entry is cleared.
Goals
- Strongly authenticate each runner (device) to Core.
- Allow rapid credential revocation (compromise/employee offboarding).
- Avoid inbound firewall rules at customer sites (Runner uses outbound 443).
High-level design
Components
- Core (cloud): job orchestration + policy + audit + runner registry.
- Edge proxy (cloud): Envoy/ALB/NLB/NGINX that terminates TLS and forwards gRPC to Core.
- Runner (on-prem): keeps an outbound streaming gRPC connection to Core.
Certificates
- Root CA: controlled by the service operator.
- Server certificate: used by the edge proxy (public DNS name).
- Runner client certificates:
- 1 certificate per runner
- Subject/SAN contains
runner_idand optionaltenant_id - short-lived certs are recommended; local dev scripts generate sample certs
Current renewal model
- Runner has a current client cert + key.
- Runner reconnects periodically and presents its cert.
- Before expiry (e.g. T-7 days), runner can execute a configured local renewal command and then report the new cert expiry back to Core telemetry.
Current renewal options
- Core-signed CSR renewal
- Runner generates a CSR locally from its configured client key.
- Runner sends the CSR over the authenticated internal management path.
- Core signs and returns a new cert plus CA bundle when issuer config is present.
- Operator-managed local renewal
- Runner executes a local command and reports the new cert expiry back to Core.
Revocation / offboarding
Current pragmatic approach:
- Keep a Core-side denylist of runner_id (and/or cert fingerprint).
- Use short expiry (30 days) to cap compromise window.
Future options: - CRL/OCSP if needed for stricter compliance.
Operational notes
- The edge proxy must be configured for:
- gRPC over HTTP/2
require_client_certificate: true- trusted client CA bundle
- LB/proxy forwards client identity to Core (either via mTLS passthrough or validated headers; validated headers only).
Dev/test
See certs/dev/ and scripts/dev-mtls-certs.sh for local self-signed CA + server/client cert generation.