Architecture Decision Record

BSFG ADR-0019

Status: Accepted · Date: 2026-03-03

Context

BSFG nodes synchronize facts between zones using receiver-driven replay. The receiving node repeatedly fetches new facts from a peer BSFG node.

The transport protocol for this replication loop must satisfy several constraints:

  • simplicity of implementation
  • clear recovery semantics
  • bounded resource usage
  • deterministic replay behavior

Several RPC interaction models were considered for implementing the FetchFacts operation.

Considered Options

Option | Description             | Advantages                                                     | Disadvantages
-------|-------------------------|----------------------------------------------------------------|-----------------------------------------------------------
A      | Server streaming        | Continuous data flow                                           | Complex connection lifecycle and harder recovery semantics
B      | Bidirectional streaming | High throughput potential                                      | Complex protocol state and operational fragility
C      | Unary paged pull        | Simple protocol, bounded response size, clear replay semantics | Requires repeated requests for continuous replication

Decision

BSFG replication uses a unary paged pull model.

The receiving node periodically issues a FetchFacts request that specifies:

  • the durable consumer identity
  • a maximum number of facts to return

The sending node uses the durable consumer's confirmed position to determine the replay starting point and responds with a bounded page of facts.

FetchFactsRequest {
  consumer_name
  limit
}

FetchFactsResponse {
  facts[]
}

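The receiver appends the returned facts to its local durability substrate and later confirms receipt using a ConfirmReceipt request. Because the sender replays from the confirmed position, facts that were fetched but not yet confirmed are returned again by the next FetchFacts request.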
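The receiver's side of one fetch/append/confirm cycle can be sketched as follows. This is a minimal illustration, not the BSFG implementation: the `Fact.offset` field, the `Peer` interface, the `through_offset` argument to ConfirmReceipt, and the `LocalStore` class are all assumptions introduced here, since this ADR does not define the fact layout or the ConfirmReceipt message. The sketch also assumes facts arrive in order within a page.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Fact:
    offset: int      # hypothetical: position of the fact in the sender's log
    payload: bytes


class Peer(Protocol):
    """Hypothetical client interface for the two RPCs named in this ADR."""

    def fetch_facts(self, consumer_name: str, limit: int) -> list[Fact]: ...

    def confirm_receipt(self, consumer_name: str, through_offset: int) -> None: ...


@dataclass
class LocalStore:
    """In-memory stand-in for the local durability substrate."""

    facts: dict[int, Fact] = field(default_factory=dict)

    def append(self, page: list[Fact]) -> None:
        # Keyed by offset so that a replayed page does not create duplicates.
        for fact in page:
            self.facts[fact.offset] = fact


def replicate_once(peer: Peer, store: LocalStore, consumer_name: str, limit: int) -> int:
    """Run one fetch/append/confirm cycle and return the number of facts fetched."""
    page = peer.fetch_facts(consumer_name, limit)
    if not page:
        return 0
    # Append locally first; confirm only after the facts are stored.
    store.append(page)
    # Assumes the page is ordered, so the last fact has the highest offset.
    peer.confirm_receipt(consumer_name, page[-1].offset)
    return len(page)
```

Confirming only after the local append means that a crash between the two steps leads to a replayed page rather than a lost one, which is why the stand-in store tolerates duplicates.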

Unary paged pull defines the interaction shape of the replication loop, not a client-managed offset protocol. Fetch progress remains server-side durable state keyed by consumer_name, consistent with BSFG durable named consumers.
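To illustrate why the request carries no offset, here is a minimal in-memory sketch of the sending side. Everything in it is hypothetical: the `Sender` class, the use of list indices as fact offsets, and the `through_offset` argument to ConfirmReceipt are assumptions made for the example, and a real sender would keep the confirmed position in durable storage rather than in a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """In-memory stand-in for the sending node (all names are hypothetical)."""

    log: list[bytes] = field(default_factory=list)           # fact payloads; index = offset
    confirmed: dict[str, int] = field(default_factory=dict)  # consumer_name -> next offset to send

    def fetch_facts(self, consumer_name: str, limit: int) -> list[tuple[int, bytes]]:
        # The starting point comes from server-side state, not from the request.
        start = self.confirmed.get(consumer_name, 0)
        end = min(start + limit, len(self.log))
        return [(offset, self.log[offset]) for offset in range(start, end)]

    def confirm_receipt(self, consumer_name: str, through_offset: int) -> None:
        # Advance monotonically; a stale or duplicate confirmation is a no-op.
        current = self.confirmed.get(consumer_name, 0)
        proposed = min(through_offset + 1, len(self.log))
        self.confirmed[consumer_name] = max(current, proposed)
```

In this sketch, two FetchFacts calls with no confirmation in between return the same page, and the page only moves forward once ConfirmReceipt has advanced the position stored under `consumer_name`.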

Consequences

Benefits:

  • simple and robust protocol semantics
  • bounded response sizes
  • clear recovery and retry behavior
  • aligns with durable named consumer progress tracking
  • no long-lived streaming connections

Tradeoffs:

  • slightly higher request overhead compared to streaming
  • replication latency depends on polling interval
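One common way to soften the polling-interval tradeoff, shown here only as a sketch and not as something this ADR prescribes, is to fetch again immediately whenever a page comes back full and to wait for the polling interval only once a short or empty page shows the receiver has caught up. The function name and parameters below are hypothetical.

```python
def next_poll_delay(page_size: int, limit: int, poll_interval_s: float) -> float:
    """Return how long to wait before issuing the next FetchFacts request."""
    # A full page suggests more facts are waiting, so fetch again immediately.
    if limit > 0 and page_size >= limit:
        return 0.0
    # A short or empty page means the receiver has caught up; wait one interval.
    return poll_interval_s
```

With this policy the polling interval bounds latency only while the receiver is idle; during a backlog the loop is limited by request round-trip time and the page size.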