Reference Deployment Pattern: Triad-HA with Keepalived Failover
Pattern Name
Triad-HA — Three-node single-zone deployment pattern with active-passive BSFG controller failover and quorum-backed JetStream durability.
For host-level embodiment of this pattern, see Reference Physical Realization: Triad-HA Zone.
Classification
- Layer: Substrate
- Kind: Reference deployment pattern
- Scope: One BSFG zone
- Failure model: Any single node may fail without loss of acknowledged data, subject to the configured JetStream durability tier
- Availability model: Active-passive controller failover via Keepalived VIP
- Persistence model: Three-node JetStream quorum
- Operational model: Host-based services, no Kubernetes, no shared storage
Intent
Defines the minimal host-level deployment pattern for one BSFG zone that provides:
- Automatic controller failover between two service-bearing nodes
- Quorum-backed JetStream durability across three nodes
- No shared storage between nodes
- Bounded operational complexity for industrial IT/OT boundary deployments
Applicability
Use this pattern when:
- One BSFG zone must tolerate any single node failure
- Brief failover interruption is acceptable
- Host-based operations are preferred over cluster orchestration
- Cross-zone replication is acceptable for artifact recovery
- It is acceptable for dual-node failure to render the zone unavailable
Non-Goals
This pattern does not provide:
- Zone availability after simultaneous loss of both controller-bearing nodes
- Synchronous within-zone artifact replication
- Point-in-time recovery
- Dynamic orchestration or automatic workload rescheduling beyond host-level failover
Logical Node Roles
| Node | Role | Controller | JetStream |
|---|---|---|---|
| Alpha | Primary service-bearing | Active when VIP held | RAFT voter |
| Beta | Secondary service-bearing | Standby; promotable | RAFT voter |
| Gamma | Non-controller quorum | None | RAFT voter with reduced workload expectations |
Canonical host placement and physical embodiment are defined in the physical realization document.
Network Model
Logical Bindings
| Service | Bind Target | Notes |
|---|---|---|
| BSFG Connect RPC | Zone VIP | mTLS required |
| NATS client | Localhost | BSFG connects only to local JetStream |
| JetStream cluster | Node IPs | Full mesh among Alpha, Beta, Gamma |
| Keepalived VRRP | Alpha ↔ Beta only | Dedicated coordination path |
Communication Requirements
- VIP on active service-bearing node
- JetStream cluster communication among the three nodes
- VRRP coordination between Alpha and Beta
Physical NIC, segment, and bind details are defined in the physical realization document.
Security Profile
TLS
- Minimum version: TLS 1.2
- Preferred version: TLS 1.3
- Authentication mode: Mutual TLS
- Cipher policy: Enterprise-approved baseline; Mozilla Intermediate is acceptable if no stricter standard exists
- Certificate rotation window: Alert at 30 days before expiry; rotate via rolling restart
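The 30-day rotation alert can be implemented with a simple expiry probe; a minimal sketch using openssl, with the certificate path assumed for illustration:

```shell
# Alert when a certificate is within 30 days of expiry (path is an assumption)
cert_days_left_ok() {
  # -checkend succeeds when the certificate is still valid N seconds from now
  openssl x509 -in "$1" -noout -checkend $((30 * 24 * 3600))
}

if cert_days_left_ok /opt/bsfg/certs/controller.pem 2>/dev/null; then
  echo "certificate ok"
else
  echo "certificate expiring within 30 days: schedule rolling rotation"
fi
```

A check like this can run from cron or the monitoring stack; the actual path and threshold follow local policy.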
VRRP / Keepalived
- VRRP traffic is not treated as a security boundary
- Keepalived peers must communicate only across a dedicated, restricted segment
- Unicast peering is preferred over multicast
- VRRP exposure outside the zone-local control segment is prohibited
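The unicast-peering and dedicated-segment requirements above map directly onto a Keepalived VRRP instance; a sketch, with all addresses, interface names, and the router ID assumed for illustration:

```conf
# Example: /etc/keepalived/keepalived.conf fragment (addresses and names are assumptions)
vrrp_instance bsfg_zone {
    state BACKUP              # both peers start BACKUP; priority elects MASTER
    interface eth1            # dedicated, restricted coordination segment only
    virtual_router_id 51
    priority 150              # Beta uses a lower value, e.g. 100
    advert_int 1
    unicast_src_ip 10.0.99.1  # unicast peering preferred over multicast
    unicast_peer {
        10.0.99.2             # the other service-bearing node
    }
    virtual_ipaddress {
        10.0.40.10/24 dev eth0
    }
    notify /opt/bsfg/keepalived-notify.sh
}
```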
Storage Model
Artifact Semantics
| Aspect | Behavior |
|---|---|
| Storage location | Local storage on current active node |
| Intra-zone replication | None |
| Cross-zone replication | Via BSFG only |
| Failover behavior | Artifacts may be temporarily unavailable after controller failover |
| Recovery model | Rehydrate from peer zones or restore from backup |
Storage Principles
- Artifact storage is local to the active service-bearing node
- No shared storage is required between nodes
- JetStream durability is quorum-backed across all three nodes
- Artifact availability after failover depends on prior cross-zone replication
Mount paths, RAID configuration, and physical disk specifications are defined in the physical realization document.
Service Model
The canonical runtime substrate is host-based:
- Host OS with systemd for service lifecycle
- Docker Compose or equivalent host-local container runner
- Keepalived for VIP failover on service-bearing nodes
The examples below illustrate one canonical host-based realization; they are not the sole valid packaging, and alternatives are acceptable provided the pattern invariants are preserved.
Example Service Layout (Alpha / Beta)
```yaml
# Example: /opt/bsfg/docker-compose.yml
services:
  jetstream:
    image: nats:2.10-jetstream
    volumes:
      - /data/jetstream:/data/jetstream
      - ./jetstream.conf:/etc/nats/nats-server.conf:ro
    network_mode: host

  bsfg-controller:   # Started only by promotion flow
    image: bsfg:v1.x
    environment:
      - ZONE_ID=${ZONE_ID}
      - NODE_NAME=${NODE_NAME}
      - PEER_ENDPOINTS=${PEER_ZONE_ENDPOINTS}
      - JETSTREAM_URL=nats://localhost:4222
      - BIND_ADDRESS=${VIP}:9443
    volumes:
      - ./certs:/certs:ro
      - /artifacts:/artifacts:ro
    network_mode: host
```
Example Service Layout (Gamma)
```yaml
# Example: /opt/bsfg/docker-compose.yml
services:
  jetstream:
    image: nats:2.10-jetstream
    volumes:
      - /data/jetstream:/data/jetstream
      - ./jetstream.conf:/etc/nats/nats-server.conf:ro
    network_mode: host
  # No BSFG controller on Gamma
```
Detailed host placement, resource controls, and systemd configuration are defined in the physical realization document.
JetStream Durability Profile
Baseline Settings
| Parameter | Value | Rationale |
|---|---|---|
| Stream replicas | 3 | Survives any single node loss |
| Consumer replicas | 3 | Consumer state survives single node loss |
| Node count | 3 | Smallest practical quorum configuration |
| sync_interval | Tier-dependent | Controls the window between acknowledgment and durable flush |
Durability Tiers
Standard
- sync_interval: default
- Intended for general zone traffic
- Survives any single node loss
- May lose recently acknowledged messages under correlated crash or sudden power-loss scenarios before flush
Critical
- sync_interval: always
- Intended for streams whose acknowledged writes must survive correlated crash conditions
- Higher latency and lower throughput accepted as tradeoff for stronger durability
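The two tiers above select the sync_interval setting in the jetstream block of the NATS server configuration, alongside the three-node full-mesh cluster; a sketch, with server name, cluster name, hosts, and ports all assumed for illustration:

```conf
# Example: jetstream.conf fragment for Alpha (names and addresses are assumptions)
server_name: alpha
jetstream {
  store_dir: /data/jetstream
  # Standard tier: omit sync_interval and accept the default flush interval.
  # Critical tier: fsync every write, trading throughput for durability:
  # sync_interval: always
}
cluster {
  name: triad
  listen: 0.0.0.0:6222
  routes: [
    nats-route://beta:6222
    nats-route://gamma:6222
  ]
}
```

Stream and consumer replica counts of 3 are set per stream at creation time, not in the server configuration.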
Semantics
- Single-node failure tolerance is guaranteed only within the selected durability tier
- This pattern does not claim immunity to all correlated crash modes under the Standard tier
- Durability claims apply to acknowledged JetStream data, not to local artifact files stored only on the active node
Failover Contract
A node may run the BSFG controller in active mode only when all of the following are true:
- The node currently holds the zone VIP
- Local JetStream is reachable
- JetStream cluster quorum is present
- Required artifact storage is available
- Required certificates satisfy the local certificate-validity policy
- The controller binds successfully to the VIP address
Loss of any of the above must cause controller demotion or failed promotion.
Additionally:
- The BSFG controller must not be independently enabled for unconditional boot-time start
- Active startup is governed by VIP ownership and promotion gates, not by generic service auto-start
Promotion Gates
Promotion to active controller is gated by:
- Local JetStream health
- Cluster quorum presence
- Artifact storage availability
- Certificate validity satisfying local policy
- VIP ownership confirmation
The exact certificate-validity threshold is a local policy choice. Implementation thresholds belong in the physical realization, runbooks, or local operations policy—not in this pattern.
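The gate list above can be sketched as a single pre-start check; this is illustrative only, with the VIP, monitoring port, paths, and the 30-day certificate threshold all assumed rather than prescribed:

```shell
#!/usr/bin/env bash
# Illustrative promotion-gate sketch. VIP, ports, paths, and thresholds are
# assumptions; real values belong in the physical realization and runbooks.
VIP="${VIP:-10.0.40.10}"

promotion_gates() {
  # VIP ownership confirmation
  ip -o addr show 2>/dev/null | grep -q "inet ${VIP}/" \
    || { echo "denied: VIP not held"; return 1; }
  # Local JetStream health via the NATS monitoring endpoint
  curl -fsS "http://127.0.0.1:8222/healthz" >/dev/null 2>&1 \
    || { echo "denied: local JetStream unhealthy"; return 1; }
  # Artifact storage availability
  mountpoint -q /artifacts 2>/dev/null \
    || { echo "denied: artifact storage unavailable"; return 1; }
  # Certificate validity; the 30-day window is a local policy choice
  openssl x509 -in /opt/bsfg/certs/controller.pem -noout \
    -checkend $((30 * 24 * 3600)) >/dev/null 2>&1 \
    || { echo "denied: certificate below validity threshold"; return 1; }
}

if promotion_gates; then
  docker compose -f /opt/bsfg/docker-compose.yml up -d bsfg-controller
else
  echo "promotion denied"
fi
```

A quorum-presence probe would be added in the same style; failing any gate leaves the controller stopped, satisfying the failed-promotion requirement.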
Failover Mechanics
Keepalived Integration
- Keepalived manages VIP ownership between Alpha and Beta
- State transitions trigger controller promotion/demotion
- VRRP health tracking verifies local JetStream availability before VIP acquisition
Detailed Keepalived configuration, notification scripts, and health probes are defined in the physical realization document and deployment runbook.
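The transition-driven promotion and demotion described above is typically wired through a Keepalived notify script; a sketch, where the script path, compose file location, and gate-script name are hypothetical:

```shell
#!/usr/bin/env bash
# Example: /opt/bsfg/keepalived-notify.sh (sketch; all paths are assumptions)
# Keepalived invokes notify scripts as: <script> <type> <instance> <state>
handle_transition() {
  case "$1" in
    MASTER)
      # Attempt promotion; local gates may still deny activation
      echo "VIP acquired: attempting controller promotion"
      /opt/bsfg/promote.sh || echo "promotion gates denied activation"
      ;;
    BACKUP|FAULT)
      # Explicit stop on BACKUP/FAULT prevents dual-active operation
      echo "VIP lost: demoting controller"
      docker compose -f /opt/bsfg/docker-compose.yml stop bsfg-controller
      ;;
  esac
}

handle_transition "$3"
```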
Dual-Active Prevention
Dual-active operation is prevented by all of the following:
- Keepalived triggers explicit stop on BACKUP and FAULT transitions
- BSFG binds specifically to the VIP, not to 0.0.0.0
- Controller start is gated by local dependency checks
- A node lacking the VIP cannot successfully bind the service endpoint
- The controller is not independently auto-started outside promotion flow
Failure Semantics
| Scenario | Result | Recovery Action |
|---|---|---|
| Alpha fails | VIP moves to Beta; Beta promotes if all gates pass | Repair Alpha; rejoin as standby |
| Beta fails | No role change; Alpha remains active | Repair Beta; rejoin as standby |
| Gamma fails | JetStream remains available with 2-node quorum | Repair Gamma; rejoin cluster |
| Alpha and Beta fail | Zone unavailable; no controller and no quorum | Recover at least one service-bearing node |
| Alpha isolated from Beta+Gamma | Alpha loses effective quorum; Beta may promote if healthy | Restore network; Alpha rejoins as standby |
| Gamma isolated from Alpha+Beta | No controller change; Alpha+Beta retain quorum | Restore Gamma connectivity |
| Artifact storage unavailable on promoted node | Promotion denied | Restore storage or intervene manually |
| Certificate validity below policy threshold | Promotion denied or warned according to local policy | Rotate certificates or apply approved exception |
Bootstrap Semantics
- Static cluster configuration is assumed
- Startup order is not semantically significant
- Gamma-first startup is conventional, not required
- BSFG controller startup is driven by Keepalived state transitions, not by container orchestration
- Alpha is expected to acquire the VIP first under normal conditions
- Beta remains standby until promotion criteria are met
Step-by-step bootstrap procedures are defined in the deployment runbook.
Backup and Recovery
This pattern supports snapshot-based recovery only. It does not provide point-in-time recovery.
| Failure | Recovery |
|---|---|
| Single disk failure | Replace disk; rebuild RAID; verify JetStream rejoin |
| Single node loss | Rebuild node; restore from peers or snapshot as appropriate |
| Logical corruption / ransomware | Restore clean snapshot to new cluster; replay from peer zones if available |
| Complete zone loss | Rebuild zone and rehydrate from peer-zone replication and retained snapshots |
Snapshot procedures and recovery commands are defined in the deployment runbook.
Monitoring Requirements
| Check | Target | Alert Threshold |
|---|---|---|
| VIP held by expected active node | Alpha / Beta | VIP absent from both, or present on unexpected host |
| Keepalived role transition churn | Alpha / Beta | Excessive transition rate |
| BSFG controller health | Zone VIP | Non-200 health response |
| Controller bound to VIP only | Active node | Bound to non-VIP address |
| Unexpected controller on standby node | Beta / Gamma | BSFG process running unexpectedly |
| Local JetStream health | All nodes | Health probe failure |
| JetStream cluster membership | Any node | Fewer than 3 configured voters visible |
| JetStream quorum availability | Any node | Quorum not confirmed |
| Artifact storage availability | Active node | Storage unavailable |
| Certificate expiry | All nodes | Less than policy threshold remaining |
| Cross-zone replication lag | Peer zones | Excessive lag |
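The bind-scope check in the table above can be probed with standard socket tooling; a sketch, assuming the controller port 9443 used in the service layout example:

```shell
# Spot-check that the controller never listens on the wildcard address
# (port 9443 is an assumption taken from the example service layout)
check_bind_scope() {
  n=$(ss -ltn 2>/dev/null | awk '{print $4}' | grep -c '^0\.0\.0\.0:9443$')
  if [ "${n:-0}" -gt 0 ]; then
    echo "ALERT: controller listening on 0.0.0.0:9443"
  else
    echo "bind scope ok"
  fi
}
check_bind_scope
```

The same probe, inverted, also covers the unexpected-controller check on standby nodes: any listener on the controller port on Beta or Gamma is an alert condition.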
Detailed monitoring configuration, alert thresholds, and escalation tiers are defined in the physical realization document and deployment runbook.
Accepted Limitations
| Limitation | Rationale |
|---|---|
| Zone unavailable if both Alpha and Beta are lost | Higher-node-count patterns rejected for complexity |
| Artifacts may be temporarily unavailable after failover | Within-zone synchronous artifact replication rejected |
| Failover may take 5–30 seconds | Health gates plus service startup are an accepted operational cost |
| Standard durability tier may lose very recent acknowledged messages under correlated crash conditions | Flush interval tradeoff accepted for general traffic |
| No point-in-time recovery | Snapshot recovery sufficient for intended use |
| No automatic recovery of local artifacts absent peer-zone copy | Cross-zone replay accepted as recovery mechanism |
Invariants Preserved by This Pattern
- No shared storage is required for zone operation
- No Kubernetes or distributed scheduler dependency is introduced
- A single node failure does not require operator intervention to preserve message durability
- Controller leadership is coupled to VIP ownership and local health gates
- Durable message state remains quorum-backed
- Artifact handling remains outside durable middleware semantics at the zone boundary
References
- Reference Physical Realization: Triad-HA Zone
- Runbook: Triad-HA Zone Deployment
- Checklist: Triad-HA Commissioning
- BSFG Architecture Map: Three-layer ontology
- ADR-0001: Boundary Must Contain No Durable Middleware
- ADR-0002: Four-Buffer Topology Is the Minimal Partition-Tolerant Boundary
- ADR-0042: Four-Buffer Entities Are Boundary Roles Implemented by BSFG Nodes
- NATS JetStream clustering documentation
- NATS disaster-recovery documentation
- Keepalived configuration reference