Reference Interaction Pattern: Cross-Zone BSFG Federation

Pattern Name

Cross-Zone BSFG Federation — Reference interaction pattern for asynchronous, authenticated exchange among autonomous BSFG zones.

Classification

Attribute           Value
Layer               Inter-zone interaction
Kind                Reference interaction pattern
Scope               Two or more autonomous BSFG zones
Consistency model   Eventual, asynchronous, cursor-driven
Availability model  No zone depends on remote zone reachability to accept local durable work
Security model      mTLS-authenticated peer federation with explicit authorization
Recovery model      Replay and reconciliation from durable checkpoints

Intent

Defines the architectural contract by which autonomous BSFG zones exchange durable state, recover from partitions, and re-establish monotonic progress without introducing cross-zone availability dependencies or shared control-plane assumptions.

This pattern complements the Triad-HA deployment pattern, which specifies how one zone survives its own failures. Cross-Zone BSFG Federation specifies how autonomous zones interact when they cannot rely on each other being reachable.

Key Distinctions

Distinction                                        Meaning
Intra-zone quorum ≠ cross-zone consistency         RAFT-backed durability within a zone does not imply synchronous consistency across zones
Notification ≠ durable acceptance                  A peer's advisory signal does not constitute confirmed durable receipt
Connectivity ≠ authorization                       Network reachability does not imply permission to exchange data
Replay ≠ conflict-free merge                       Recovery replays from checkpoint; it does not reconcile divergent mutable state
Artifact recovery ≠ fact replication               Large binary artifacts use distinct fetch semantics from message stream replay
Peer federation ≠ cluster formation                Zones cooperate; they do not form a single distributed system with shared control plane
Durable append ≠ downstream processing completion  A fact may be durably replicated into a zone before local consumers have processed it

Applicability

Use this pattern when:

  • Zones must remain operationally autonomous during network partitions
  • Cross-zone exchange must not block local durable work
  • Peers may be slow, intermittently reachable, or offline for extended periods
  • Recovery from outage must not require destructive resynchronization
  • Hub-and-spoke, chain, or selective mesh federation variants are required
  • Enterprise/IDMZ/Plant boundaries must be traversed without shared middleware

Non-Goals

This pattern explicitly does not provide:

  • Intra-zone failover mechanics (see Triad-HA)
  • Local host sizing or JetStream clustering details
  • Kubernetes, service meshes, or distributed orchestration primitives
  • Cross-zone synchronous commit or two-phase transaction semantics
  • Globally consistent total ordering across all zones
  • Cross-zone consensus, leader election, or shared control plane
  • Automatic conflict resolution for divergent mutable state
  • Guaranteed artifact availability during initial cross-zone handshake

Invariants

The following invariants must hold in all deployments using this pattern:

  1. Local autonomy under partition — A zone accepts locally durable work without requiring remote zone availability.

  2. Asynchronous exchange — Cross-zone propagation is explicitly non-blocking; no zone waits for peer acknowledgment before local commit.

  3. Idempotent replay — Replayed or re-delivered facts must be harmless to idempotent consumers.

  4. Cursor-driven recovery — Post-partition reconciliation advances from last durable checkpoint, not from arbitrary state comparison.

  5. No global ordering — Zones do not assume or enforce total order across zone boundaries; ordering is scoped to an exported stream.

  6. Artifact/fact separation — Binary artifacts and fact messages use distinct durability and retrieval semantics.

  7. Autonomous mode persistence — Partitioned zones continue local operation using the ISB/IFB/ESB/EFB boundary roles (defined under Backpressure and Buffer Semantics) without cross-zone coordination.

  8. Non-destructive reconnection — Rejoining after partition must not require wiping local state or full re-initialization.

  9. Durable receipt precedes progress publication — A zone must not advance or advertise cursor progress beyond what is durably appended locally.

Interaction Model

Cursor Semantics

A cursor is a monotonically advancing, durable checkpoint representing the highest cross-zone fact position that a receiving zone has durably appended for a specific exported stream from a specific peer.

A cursor is:

  • scoped to one receiver, one sender, and one exported stream
  • advanced only after durable local append
  • not equivalent to consumer processing completion
  • not a global sequence number across zones
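
A minimal sketch of this scoping and advancement rule, in Go; the type and method names are illustrative, not part of the pattern's contract:

    package federation

    import "fmt"

    // Cursor is a durable checkpoint scoped to one receiver, one sending
    // peer, and one exported stream. Position is the highest cross-zone
    // fact position this zone has durably appended, not the highest
    // position its consumers have processed.
    type Cursor struct {
        ReceiverZone   string // e.g. "plant-a"
        SenderZone     string // e.g. "enterprise"
        ExportedStream string // e.g. "orders"
        Position       uint64
    }

    // Advance moves the checkpoint forward only. A regression is never
    // applied silently; it signals a bug or incompatible peer history.
    func (c *Cursor) Advance(pos uint64) error {
        if pos < c.Position {
            return fmt.Errorf("cursor regression: %d < %d", pos, c.Position)
        }
        c.Position = pos // caller persists durably before reporting progress
        return nil
    }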

Cursor Initialization

Each peer/exported-stream relationship must define an explicit cursor initialization policy at federation bring-up time.

Permitted initialization forms include:

  • start-now
  • bounded historical backfill
  • full backfill
  • operator-approved seed position or timestamp

This pattern requires initialization to be explicit. Operational selection, defaulting, and justification belong in federation bring-up procedures rather than this architectural reference.
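
A sketch of how the explicitness requirement can be enforced in configuration, with no default branch; all names here are illustrative assumptions:

    package federation

    import "time"

    // InitPolicy selects where a new peer/exported-stream cursor starts.
    // The zero value is deliberately invalid: bring-up must choose one.
    type InitPolicy int

    const (
        StartNow        InitPolicy = iota + 1 // begin at the peer's current head
        BoundedBackfill                       // replay a bounded historical window
        FullBackfill                          // replay the full exported history
        OperatorSeed                          // operator-approved position or timestamp
    )

    // CursorInit binds one relationship to its initialization parameters.
    type CursorInit struct {
        Policy   InitPolicy
        Window   time.Duration // BoundedBackfill: how far back to replay
        SeedPos  uint64        // OperatorSeed: explicit position form
        SeedTime time.Time     // OperatorSeed: explicit timestamp form
    }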

Zone Identity

Each zone possesses a stable, cryptographically bound identity:

  • Zone name — Deployment-scoped identifier (e.g., enterprise, plant-a)
  • Certificate identity — Peer certificate subject or SAN must match configured zone identity according to local policy
  • Identity scope — Authorization policy determines which peer zones may connect

Peer Relationship Model

Aspect          Semantics
Relationship    Explicitly configured, not auto-discovered
Directionality  Bidirectional capability; unidirectional data flow per exported stream
Cardinality     One-to-one, one-to-many, or many-to-many as configured
Lifecycle       Long-lived; reconnection resumes from checkpoint
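
Expressed as configuration, one relationship might look like the sketch below; the schema is an assumption for illustration, since the pattern does not prescribe one:

    package federation

    // PeerConfig declares one explicit peer relationship. Peers are never
    // auto-discovered: absence from configuration means no exchange at all.
    type PeerConfig struct {
        ZoneName      string   // stable peer identity, e.g. "plant-a"
        Endpoint      string   // fetch endpoint, e.g. "plant-a.example:7422"
        CAFile        string   // CA bundle used for mTLS verification
        ExportStreams []string // exported streams this peer may fetch from us
        FetchStreams  []string // exported streams we fetch from this peer
        ArtifactFetch bool     // artifact exchange is per-relationship policy
    }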

Sender vs Receiver Roles

Role                       Responsibility
Sender (originating zone)  Appends facts to local outbound boundary roles; makes them available for fetch; does not await remote confirmation
Receiver (target zone)     Polls or accepts advisory notification; fetches via cursor; durably appends before advancing progress

Default Exchange Mode: Receiver-Driven Cursor-Based Fetch

The canonical cross-zone interaction is receiver-driven:

  1. Receiving zone maintains durable cursor position per peer and exported stream
  2. Receiving zone periodically initiates fetch against peer endpoint, supplying last known durable cursor
  3. Peer responds with facts from that position, bounded by batch constraints
  4. Receiving zone durably appends to local inbound boundary roles, then advances cursor
  5. Optional: receiving zone may emit advisory progress confirmation as a non-authoritative optimization
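
A compressed sketch of steps 1 through 4, assuming hypothetical fetch, appendDurably, and persistCursor callbacks; transport failures simply retry with the same cursor:

    package federation

    import (
        "context"
        "time"
    )

    // Fact and FetchResult are illustrative shapes, not a normative schema.
    type Fact struct {
        MessageID string
        Position  uint64
        Payload   []byte
    }

    type FetchResult struct {
        Facts      []Fact
        NextCursor uint64
    }

    // runFetchLoop drives one peer/exported-stream relationship. Polling is
    // the correctness baseline; advisory notifications only shorten latency.
    func runFetchLoop(
        ctx context.Context,
        cursor uint64, // last durable cursor, loaded from the local checkpoint
        fetch func(ctx context.Context, from uint64, max int) (FetchResult, error),
        appendDurably func([]Fact) error, // durable local append (step 4)
        persistCursor func(uint64) error, // durable checkpoint write
    ) {
        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                res, err := fetch(ctx, cursor, 256) // step 2: supply last cursor
                if err != nil {
                    continue // transport failure: retry with the same cursor
                }
                // Durable local append MUST precede cursor advancement.
                if err := appendDurably(res.Facts); err != nil {
                    continue
                }
                if err := persistCursor(res.NextCursor); err != nil {
                    continue // progress is only reported after durable write
                }
                cursor = res.NextCursor
            }
        }
    }

The ordering constraint is the whole point: persistCursor runs only after appendDurably succeeds, so a lost fetch response or a crash mid-batch is always safe to replay.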

Advisory notification is permitted as a latency optimization:

  • Sender may notify that new facts are available
  • Receiver must not treat notification as durable acceptance
  • Receiver still performs explicit fetch and local durable append before progress advancement

No correctness property depends on advisory notification or advisory confirmation paths.

Acknowledgment Semantics

Signal                 Meaning                                   Reliability
Fact append (local)    Durable in local JetStream                Guaranteed by RAFT quorum
Fetch response         Facts transmitted over mTLS               At-least-once transport
Remote durable append  Facts durably appended in receiving zone  Reflected by cursor advancement
Advisory notification  Hint only; no durability claim            May be lost or reordered

Exchange Primitives

This document defines architectural primitives and semantics, not a normative wire format, CLI shape, or endpoint schema. Concrete operator interfaces belong in runbooks and implementation documentation.

Primitive        Direction          Durable?             Purpose
FetchFacts       Receiver → Sender  No (transport only)  Replay facts from supplied cursor
NotifyAvailable  Sender → Receiver  No                   Advisory latency optimization
QueryCursor      Either → Either    No                   Discover peer-reported durable cursor position
FetchArtifact    Receiver → Sender  No (fetch)           Retrieve binary payload by reference
HealthCheck      Either → Either    No                   Verify reachability and identity
BackfillRange    Receiver → Sender  No                   Replay bounded historical range or equivalent bounded recovery request on gap detection

Primitive Semantics

FetchFacts (canonical)

  • Request includes: receiver's last durable cursor and fetch bounds such as batch size and wait limit
  • Response includes: facts from that position, next cursor, and end-of-stream indicator
  • Retry: On transport failure, receiver retries with the same cursor

NotifyAvailable (advisory)

  • Sender indicates approximate availability of new facts
  • Receiver may choose to fetch immediately or continue its polling schedule
  • Lost notifications are harmless; polling remains the correctness baseline

FetchArtifact

  • Request uses an artifact reference from a fact, such as a URI or content address
  • Response returns binary payload or redirect to zone-local object store
  • Content-addressed verification is recommended where applicable
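
For content-addressed references, the recommended verification is a digest recomputation on the receiving side; a sketch assuming references of the form sha256:<hex>:

    package federation

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strings"
    )

    // verifyArtifact checks a fetched payload against its content address,
    // rejecting tampered or truncated transfers before local storage.
    func verifyArtifact(ref string, payload []byte) error {
        digest, ok := strings.CutPrefix(ref, "sha256:")
        if !ok {
            return fmt.Errorf("unsupported artifact reference: %q", ref)
        }
        sum := sha256.Sum256(payload)
        if hex.EncodeToString(sum[:]) != digest {
            return fmt.Errorf("artifact digest mismatch for %q", ref)
        }
        return nil
    }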

Consistency and Ordering Semantics

What Is Guaranteed

Property             Scope                                  Mechanism
Local durability     Within zone                            JetStream RAFT, configured sync policy
Monotonic cursor     Per peer/exported-stream relationship  Durable checkpoint in receiving zone
Idempotent append    Cross-zone                             putIfAbsent or equivalent at storage interface
Per-stream ordering  Within one exported stream             Stream semantics of originating zone

What Is Not Guaranteed

Property                         Why Absent
Global total order               No cross-zone clock synchronization or sequencing service
Synchronous replication          Design explicitly rejects blocking on remote durability
Cross-zone linearizability       Zones observe each other via replay, not shared memory
Immediate artifact availability  Artifacts may require separate fetch; not inlined in fact stream

Duplicate Handling

  • Facts carry a stable message_id derived from the business event
  • The receiving zone's storage interface enforces putIfAbsent
  • Replayed duplicates are discarded at storage layer
  • Consumers must also be idempotent as defense in depth
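
On a JetStream substrate, putIfAbsent is commonly realized with the Nats-Msg-Id deduplication header; a sketch using the nats.go client, with the subject and ID invented for illustration:

    package main

    import (
        "log"

        "github.com/nats-io/nats.go"
    )

    func main() {
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Drain()

        js, err := nc.JetStream()
        if err != nil {
            log.Fatal(err)
        }

        // The stable message_id doubles as the deduplication key: a replayed
        // fact with the same ID is discarded by the stream, not re-appended.
        _, err = js.Publish("isb.facts.orders",
            []byte(`{"event":"order-created"}`),
            nats.MsgId("order-7f3a-created")) // derived from the business event
        if err != nil {
            log.Fatal(err)
        }
    }

Because JetStream tracks duplicates only within a stream's configured duplicate window, this is not a complete putIfAbsent on its own; idempotent consumers remain the defense in depth.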

Cursor Advancement

  • Cursor represents durable local append position, not processed position
  • Consumer processing lag is separate from replication cursor
  • Cursor advancement is irreversible within a zone
  • Cross-zone progress is monotonic per peer/exported-stream relationship, not globally synchronized

Failure Model

Failure Class                           Scenario                                       Required Behavior
Peer unreachable                        Complete network partition to a specific peer  Continue local acceptance; accumulate backlog for that peer relationship; retry with backoff
Asymmetric reachability                 A→B reachable, B→A not                         Receiver on blocked side cannot fetch; sender may notify into void; no blocking; eventual retry when path restores
Stale/invalid certificates              mTLS handshake fails                           Reject connection; alert operator; do not bypass or degrade
High latency / intermittent             >5 s round-trip, packet loss                   Exponential backoff; batch size adaptation; alert on threshold breach
Long partition                          Hours to days of isolation from a peer         Autonomous mode persists for that peer relationship; buffers accumulate; operator alert on threshold
Complete peer zone loss                 Peer permanently destroyed or decommissioned   Local zone continues; treat peer as unavailable until explicit replacement or re-authorization
Local zone survives, peers lost         Network or peer outage                         Full local autonomy; no local durability degradation; outbound backlog growth monitored
Peer returns with stale cursor          Peer restored from older backup                Cursor comparison detects lag; automatic backfill or operator intervention
Peer returns with incompatible history  Non-prefix history or corrupted cursor         Halt automatic reconciliation; require operator investigation

Failure Outcome Summary

Outcome   Condition
Continue  Local zone always continues accepting work
Queue     Outbound facts accumulate for the affected peer relationship
Deny      Peer exchange stops; no remote blocking
Retry     Automatic with exponential backoff
Replay    On reconnection, resume from durable cursor
Operator  Only when invariants cannot be re-established automatically

Partition Behavior

Entry into Autonomous Mode

Triggered per peer relationship by:

  • Peer unreachable after retry threshold
  • mTLS authentication failure
  • Explicit administrative partition command

Local behavior:

  • Local producer append continues
  • Local consumer processing continues against already durable local state
  • Fetch to the affected peer stops
  • Cursor position for the affected peer/exported-stream relationship freezes
  • Alert generated: partition_detected

During Partition

  • Producers: Non-blocking append to local outbound boundary roles
  • Consumers: Continue from local inbound boundary roles and may become stale relative to remote peers
  • Buffers: Accumulate for the affected peer relationship
  • Artifacts: References remain valid locally; fetch from unreachable peer fails

Exit from Autonomous Mode

Triggered per peer relationship by:

  • Peer reachable and authenticated
  • Health check passes
  • Optional: operator explicit reconcile command

Recovery sequence:

  1. Health handshake verifies peer identity and liveness
  2. Cursor query compares positions
  3. If gap detected: backfill from lower cursor
  4. If incompatible history detected: operator intervention
  5. Normal fetch resumes
  6. Alert cleared: partition_resolved

Reconciliation and Recovery

Post-Partition Reconciliation

Step  Action                                               Actor
1     Verify peer identity and health                      Both zones
2     Query peer-reported cursor position                  Receiving zone
3     Compare with local durable cursor                    Receiving zone
4a    If local receiver is behind: fetch from peer cursor  Receiving zone
4b    If peer receiver is behind: peer fetches from local  Peer zone
4c    If bounded gap detected: request backfill range      Receiving zone
5     Resume normal fetch                                  Both zones

Gap Handling

  • Small gap (< batch size): Automatic backfill via extended fetch
  • Large gap: Explicit bounded range request or operator decision
  • Cursor invalid: Operator resets cursor or performs full re-initialization; destructive actions require explicit justification
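
The decision rule reduces to comparing the local durable cursor with the peer-reported head; a sketch using the normal batch size as the small-gap bound:

    package federation

    type GapAction int

    const (
        GapNone     GapAction = iota // cursors agree; resume normal fetch
        GapBackfill                  // small gap: automatic extended fetch
        GapRange                     // large gap: bounded range request or operator decision
        GapInvalid                   // non-prefix history: halt; operator investigation
    )

    // classifyGap compares the local durable cursor with the peer-reported
    // head for one exported stream. A peer head behind the local cursor is
    // never reconciled automatically.
    func classifyGap(localCursor, peerHead, batchSize uint64) GapAction {
        switch {
        case peerHead < localCursor:
            return GapInvalid
        case peerHead == localCursor:
            return GapNone
        case peerHead-localCursor <= batchSize:
            return GapBackfill
        default:
            return GapRange
        }
    }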

Artifact Rehydration

  • Facts replicate via cursor-based replay
  • Artifacts may be missing on receiving zone after failover or long partition
  • Artifact fetch is on-demand or background, not inline with fact replication
  • Missing artifact on fetch triggers retry with backoff, alerting, and optional background rehydrate job
  • Artifact exchange may be enabled or disabled per peer relationship and policy scope; fact replay does not imply unrestricted artifact access

When Replay Is Insufficient

Scenario                       Action
Peer zone replaced after loss  Restore from backup or initialize replacement zone; require explicit authorization
Complete logical corruption    Operator wipe and re-initialize; replay from surviving peers or backup
Invariant violation detected   Halt cross-zone exchange; operator investigation

Backpressure and Buffer Semantics

The four buffer names below refer to logical boundary roles. Implementations may realize them as one or more physical streams, stores, or queue views.

Buffer  Direction  Cross-Zone Role                                           Backpressure Trigger
ISB     Inbound    Logical ingress role for accepted peer facts              Ingress fill > threshold
IFB     Inbound    Logical handoff role for local consumers                  Consumer lag > threshold
ESB     Outbound   Logical egress staging for facts made available to peers  Egress fill > threshold
EFB     Outbound   Logical delivery-facing role used during peer transfer    Delivery-facing fill > threshold

Buffer Thresholds and Policy

Condition                        Threshold      Policy                                                                       Alert
Ingress fill high                >80%           Reject or defer additional peer intake to preserve local capacity           Tier 1
Outbound fill high               >80%           Apply producer backpressure, rejection, or defer policy per stream class    Tier 1
Delivery-facing fill high        >80%           Continue fetch where possible; escalate producer backpressure if sustained  Tier 1
Consumer lag high                >10,000 facts  Scale consumers or investigate downstream slowness; alert if sustained      Tier 2
Cross-zone replication lag high  >60 seconds    Investigate network, peer health, or policy mismatch                        Tier 1
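
The fill and lag policies are mechanical threshold checks; a sketch with the table's illustrative defaults hard-coded:

    package federation

    // Tier mirrors the alert tiers in the threshold table above.
    type Tier int

    const (
        TierNone Tier = iota
        Tier1
        Tier2
    )

    // bufferAlert raises Tier 1 when a boundary role exceeds 80% of capacity.
    func bufferAlert(used, capacity uint64) Tier {
        if capacity == 0 || float64(used)/float64(capacity) > 0.80 {
            return Tier1 // saturated or misconfigured: protect local capacity
        }
        return TierNone
    }

    // lagAlert raises Tier 2 when sustained consumer lag exceeds 10,000 facts.
    func lagAlert(pendingFacts uint64) Tier {
        if pendingFacts > 10_000 {
            return Tier2
        }
        return TierNone
    }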

Retry and Backoff

Operation                 Initial Interval  Backoff          Max Interval  Circuit Breaker
FetchFacts (normal)       1 second          Exponential 2x   30 seconds    After 5 consecutive failures
FetchFacts (partitioned)  5 seconds         Exponential 2x   5 minutes     Manual or health-check reset
Artifact fetch            1 second          Linear + jitter  60 seconds    Per-artifact failure tracking
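
The FetchFacts (normal) row translates into a small state machine; the values below are the table's defaults, not hard requirements:

    package federation

    import "time"

    // backoff implements a 1 s initial interval, 2x exponential growth,
    // a 30 s cap, and a circuit breaker after 5 consecutive failures.
    type backoff struct {
        interval time.Duration
        failures int
    }

    func newBackoff() *backoff { return &backoff{interval: time.Second} }

    // next reports how long to wait before the following attempt and whether
    // the circuit has opened (requiring a health-check or manual reset).
    func (b *backoff) next(failed bool) (wait time.Duration, open bool) {
        if !failed {
            b.interval, b.failures = time.Second, 0
            return b.interval, false
        }
        b.failures++
        wait = b.interval
        b.interval *= 2
        if b.interval > 30*time.Second {
            b.interval = 30 * time.Second
        }
        return wait, b.failures >= 5
    }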

Security Contract

Authentication

Layer          Mechanism                   Verification
Transport      Mutual TLS (TLS 1.2+)       Certificate chain to shared or cross-signed CA
Identity       Certificate subject or SAN  Must match configured zone identity per local policy
Authorization  Explicit allow-list         Zone A explicitly authorized to exchange with Zone B

Authorization Scope

  • Peer matrix: Configuration specifies which zones may connect
  • Stream scoping: Authorization may restrict which exported streams are visible per peer
  • Artifact scoping: Artifact references may be filtered or redirected based on peer authorization
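
A sketch of identity pinning layered on chain verification, assuming DNS-style SANs carry zone names and that the allow-list is the configured peer matrix:

    package federation

    import (
        "crypto/tls"
        "crypto/x509"
        "fmt"
        "os"
    )

    // peerTLSConfig verifies the peer chain against the federation CA and
    // then pins the presented identity to explicitly authorized zone names.
    // Reachability alone never grants access.
    func peerTLSConfig(caFile, certFile, keyFile string, allowedZones map[string]bool) (*tls.Config, error) {
        caPEM, err := os.ReadFile(caFile)
        if err != nil {
            return nil, err
        }
        pool := x509.NewCertPool()
        if !pool.AppendCertsFromPEM(caPEM) {
            return nil, fmt.Errorf("no CA certificates in %s", caFile)
        }
        cert, err := tls.LoadX509KeyPair(certFile, keyFile)
        if err != nil {
            return nil, err
        }
        return &tls.Config{
            MinVersion:   tls.VersionTLS12,
            Certificates: []tls.Certificate{cert},
            RootCAs:      pool, // verify peers when dialing
            ClientCAs:    pool, // verify peers when accepting
            ClientAuth:   tls.RequireAndVerifyClientCert,
            VerifyPeerCertificate: func(_ [][]byte, chains [][]*x509.Certificate) error {
                if len(chains) == 0 || len(chains[0]) == 0 {
                    return fmt.Errorf("no verified certificate chain")
                }
                // Chain verification already succeeded; enforce the allow-list
                // against the leaf certificate's SANs.
                for _, san := range chains[0][0].DNSNames {
                    if allowedZones[san] {
                        return nil
                    }
                }
                return fmt.Errorf("peer zone not in authorization allow-list")
            },
        }, nil
    }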

Certificate Lifecycle

Event                           Action
Rotation (planned)              Rolling restart across zone nodes; peer reconnection with new certificate
Expiry approaching (<30 days)   Alert Tier 2; schedule rotation
Expiry imminent (<7 days)       Alert Tier 1; prepare partition if rotation fails
Post-expiry connection attempt  Reject; alert; require operator intervention
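
Expiry tiers are a direct function of remaining validity; a short sketch:

    package federation

    import (
        "crypto/x509"
        "time"
    )

    // certAlertTier maps remaining certificate validity onto the lifecycle
    // table: under 7 days raises Tier 1, under 30 days Tier 2.
    func certAlertTier(cert *x509.Certificate, now time.Time) string {
        switch remaining := cert.NotAfter.Sub(now); {
        case remaining < 7*24*time.Hour:
            return "tier1" // imminent: prepare partition if rotation fails
        case remaining < 30*24*time.Hour:
            return "tier2" // approaching: schedule rotation
        default:
            return ""
        }
    }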

Trust Failure Behavior

Scenario                          Response
Unknown CA                        Reject; log; alert
Mismatched identity               Reject; log; alert
Revoked certificate               Reject; log; alert; check CRL/OCSP if configured
Clock skew (TLS validity window)  Reject; alert; investigate NTP

Federation Variants

Chain: Enterprise ↔ IDMZ ↔ Plant

[Enterprise] ←→ [IDMZ] ←→ [Plant A]
                       ←→ [Plant B]
  • Purpose: Mediated boundary with inspection or mapping zone
  • Preserves: IDMZ as non-transparent relay; no direct Enterprise–Plant connectivity
  • Complicates: Latency, additional cursor hop, IDMZ bottleneck risk

Hub-and-Spoke: Enterprise Center

       [Plant A]
          ↑
[Plant B] ← [Enterprise] → [Plant C]
          ↓
       [Plant D]
  • Purpose: Central aggregation and control-point federation
  • Preserves: Simple peer matrix; Enterprise as integration hub
  • Complicates: Central load concentration; broader blast radius of Enterprise partition

Selective Mesh: Plant-to-Plant

[Plant A] ←→ [Plant B]
    ↕           ↕
[Plant C] ←→ [Enterprise]
  • Purpose: Direct peer coordination where justified
  • Preserves: Autonomy without mandatory hub
  • Complicates: O(N²) peer matrix and multiplied partition paths

Bilateral: Two-Zone Partnership

[Zone A] ←→ [Zone B]
  • Purpose: Simplest direct federation
  • Preserves: All invariants with minimal complexity
  • Complicates: No structural indirection or traffic isolation layer

Assisted Transfer: Intermittently Connected

[Plant A] ←→ [Satellite Link] ←→ [Enterprise]
  • Purpose: High-latency or intermittently connected environments
  • Preserves: Local autonomy with large backlog tolerance
  • Complicates: Extended autonomous periods and large replay windows

Operational Procedures

Planned Maintenance Partition

  • Verify available buffer headroom before partition
  • Suspend fetch to the affected peer relationship while ensuring local autonomy remains intact
  • Require authenticated health handshake and cursor reconciliation before normal flow resumes

Unplanned Partition Recovery

  • Detect via health-check failure and raise alert
  • Verify local autonomous operation continues
  • Attempt reconnection with exponential backoff
  • On reachability restore: verify identity and compare cursors
  • Perform automatic backfill for small gaps; require operator decision for large gaps or incompatible history
  • Resume normal fetch and clear alert

Peer Certificate Rollover

  • Generate new certificate with overlapping validity
  • Deploy to zone nodes via rolling restart
  • Verify peer accepts new certificate
  • Monitor for rejection errors

Zone Rejoin After Outage

Scenario                                Procedure
Zone restored from backup               Verify cursor position; backfill from peers; explicit re-authorization if identity changed
Zone rebuilt as new identity            Treat as new peer; require explicit authorization; no automatic trust
Zone returns with incompatible history  Operator investigation; possible wipe and re-initialize

Backlog Drain

  • Monitor drain rate after partition
  • Investigate sustained high lag: network capacity, peer capacity, or policy mismatch
  • Check for persistent partition or peer rejection if backlog does not drain

Recovery Validation

Check                Method
Peer connectivity    Health check
Authentication       mTLS handshake success
Authorization        Authorization policy permits fetch for configured exported streams
Cursor monotonicity  Query returns expected durable cursor progression
Replication flow     Lag metric decreases toward zero
Idempotency          Duplicate replay test, optional

Validation Note

Validation of incompatible-history handling, large-gap recovery, and partition behavior may require controlled drills. Such drills are not part of ordinary steady-state operation.

Relationship to Triad-HA

Triad-HA specifies the recommended intra-zone substrate:

  • Three-node JetStream quorum
  • Keepalived-based controller failover
  • Host-level deployment with no Kubernetes

Cross-Zone BSFG Federation assumes each zone implements Triad-HA, or an equivalent autonomy-preserving substrate, and specifies:

  • How autonomous zones interact
  • What cross-zone contracts must hold
  • How partition and recovery behave

Critical separation:

  • Cross-zone correctness must not depend on the internal failover details of any peer
  • A zone's published interaction contract — endpoints, certificates, cursor behavior, and authorization policy — is the only assumption permitted
  • Peers treat each other as black boxes that honor the federation contract

References

  • BSFG Architecture Map: Three-layer ontology (principle, logical, substrate)
  • ADR-0001: Boundary Must Contain No Durable Middleware
  • ADR-0002: Four-Buffer Topology Is the Minimal Partition-Tolerant Boundary
  • ADR-0006: Boundary Communication Is Asynchronous Replay
  • ADR-0011: Boundary Identity Uses Mutual TLS
  • ADR-0029: Cross-Zone Synchronization Uses BSFG Peer Protocol, Not Native Stream Mirroring
  • ADR-0032: Cross-Zone Transfer Is Pull-Driven by the Receiving Zone
  • ADR-0042: Four-Buffer Entities Are Boundary Roles Implemented by BSFG Nodes
  • Triad-HA Deployment Pattern: Intra-zone substrate realization
  • NATS JetStream clustering and replication documentation