Runbook: Cross-Zone Federation Bring-Up

Purpose

Establish authenticated, authorized, cursor-driven interaction between autonomous BSFG zones that are already functional on their own.

This runbook defines how operators create federation relationships without breaking local autonomy or introducing cross-zone availability dependencies.

Scope

This runbook covers:

  • Peer identity and trust establishment
  • Authorization configuration
  • Exported stream configuration
  • Cursor initialization policy
  • Initial replay/fetch validation
  • Partition and recovery validation

This runbook does not cover:

  • Local zone deployment (see Runbook: Triad-HA Zone Deployment)
  • Intra-zone failover mechanics
  • Application-level integration

Reference

This runbook operationalizes the Reference Interaction Pattern: Cross-Zone BSFG Federation.


1. Preconditions

Verify all before proceeding. Halt if any precondition fails.

| Check | Method | Expected Result |
|---|---|---|
| Both zones healthy locally | Checklist: Triad-HA Commissioning passed | Both zones show "Passed" or "Passed with Exception" |
| Endpoints reachable and authenticatable | Approved health/TLS probe from each zone | Peer endpoint reachable; TLS handshake succeeds with configured trust; authenticated health response returned |
| Certificates issued for both zones | openssl x509 -in /opt/bsfg/certs/server.crt -noout -dates | Valid, not expired |
| Trust anchors distributed | CA certificates present and verify peer chain | ca.crt readable; peer certificate chains to installed trust anchor |
| Authorization policy approved | Ticket or document reference | Matrix of allowed streams per peer approved |
| Exported streams identified | Message catalog or stream list | facts.operational, facts.audit, etc. defined |
| Artifact access policy defined | Document reference | Artifact types accessible per peer are listed |
| Monitoring available on both sides | Dashboard verification | Metrics and alerts visible for both zones |
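The halt-on-failure rule for this table can be sketched as a small shell gate. This is a minimal sketch, not part of BSFG: the helper name `require` and the check labels are hypothetical, and only the certificate paths and the openssl invocation come from this runbook.

```shell
# Hypothetical helper: run one precondition check and report PASS or FAIL.
# Usage: require "<label>" <command> [args...]
require() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label" >&2
    return 1
  fi
}

# Example gate (paths taken from this runbook). The && chain stops at the
# first failing check, so no later step runs after a failed precondition.
# require "CA certificate readable" test -r /opt/bsfg/certs/ca.crt &&
# require "server certificate not expired" \
#   openssl x509 -in /opt/bsfg/certs/server.crt -noout -checkend 0
```

Checks that have no command-line form (ticket approvals, dashboard verification) still need manual confirmation before proceeding.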

2. Inputs Required

| Input | Description | Example |
|---|---|---|
| ZONE_A_IDENTITY | Initiating zone name and certificate identity | enterprise, identity enterprise-bsfg (subject or SAN per policy) |
| ZONE_B_IDENTITY | Target zone name and certificate identity | plant-a, identity plant-a-bsfg (subject or SAN per policy) |
| ZONE_A_ENDPOINT | Zone A VIP and port | 10.1.1.10:9443 |
| ZONE_B_ENDPOINT | Zone B VIP and port | 10.3.1.10:9443 |
| TRUST_CHAIN | CA certificate or cross-signed trust anchor | /opt/bsfg/certs/ca.crt |
| AUTHORIZATION_MATRIX | Approved stream and artifact permissions | Document or config reference |
| EXPORTED_STREAMS | List of streams Zone A exports to Zone B | facts.operational, facts.batch_completed |
| CURSOR_INIT_MODE | How to initialize the cursor for this relationship | bounded_backfill_24h (default), start_now, full_backfill, or explicit timestamp |
| ARTIFACT_ACCESS_RULES | Artifact types accessible to the peer | batch-files: read, documents: read |

3. Trust and Identity Setup

3.1 Certificate Deployment Verification

On Zone A (and symmetrically on Zone B):

# Verify local certificate
openssl x509 -in /opt/bsfg/certs/server.crt -noout -subject -dates
# Expected: subject=CN = enterprise-bsfg, notBefore valid, notAfter future

# Verify peer CA trust (conceptual)
# Trust anchor must be present and must anchor peer certificate chain
# Example verification approaches:
# - openssl verify -CAfile /opt/bsfg/certs/ca.crt /path/to/peer/cert.pem
# - Verify via configured TLS client during health probe
# - Check certificate chain in TLS handshake output

Expected result: Local certificate valid; trust anchor present and correctly anchors peer certificate chain; peer identity matches configured zone identity policy.

Halt if: Certificate expired, identity does not match policy, chain broken, or trust anchor missing.

3.2 Peer Identity Verification

From Zone A, verify Zone B identity:

# Extract and verify Zone B certificate via TLS handshake
echo | openssl s_client -connect 10.3.1.10:9443 -servername plant-a-bsfg 2>/dev/null | openssl x509 -noout -subject -ext subjectAltName
# Expected: subject or SAN contains identity matching zone policy (e.g., plant-a-bsfg)

From Zone B, verify Zone A identity (symmetric):

echo | openssl s_client -connect 10.1.1.10:9443 -servername enterprise-bsfg 2>/dev/null | openssl x509 -noout -subject -ext subjectAltName
# Expected: subject or SAN contains identity matching zone policy (e.g., enterprise-bsfg)

Expected result: Peer certificate identity (subject or SAN) matches configured zone identity policy.

Halt if: Identity mismatch, certificate chain broken, or hostname verification fails.
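The identity comparison above can be automated instead of read by eye. The sketch below assumes that the zone identity policy means an exact match against the subject CN or a DNS-type SAN; if your policy uses URI SANs or another field, the match patterns need to change. The function name `identity_matches` is hypothetical.

```shell
# Hypothetical helper: read `openssl x509 -noout -subject -ext subjectAltName`
# output on stdin and succeed only if the CN or a DNS SAN equals the expected
# identity exactly (substring matches such as "plant-a-bsfg-evil" are rejected).
identity_matches() {
  expected=$1
  # Put one name per line and drop whitespace, so "CN = x" and "CN=x" compare equal.
  tr ',/' '\n\n' | tr -d ' \t' |
    sed -e 's/^subject=//' |
    grep -Fqx -e "CN=$expected" -e "DNS:$expected"
}

# Example (Zone A verifying Zone B, endpoint and identity from this runbook):
# echo | openssl s_client -connect 10.3.1.10:9443 -servername plant-a-bsfg 2>/dev/null |
#   openssl x509 -noout -subject -ext subjectAltName |
#   identity_matches plant-a-bsfg || echo "HALT: identity mismatch"
```

This checks the presented name only. Chain validation against the trust anchor remains the separate check described in 3.1.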

3.3 Revocation and Validity Policy Check

# Verify certificate not revoked (if CRL/OCSP configured)
openssl x509 -in /opt/bsfg/certs/server.crt -noout -ocsp_uri  # Prints the OCSP responder URL, if the certificate carries one
# If a responder URL is printed: query it with the openssl ocsp command per your PKI procedure

# Verify certificate validity window
openssl x509 -checkend 2592000 -noout -in /opt/bsfg/certs/server.crt
# Expected: exit 0 (30+ days remaining)

Expected result: Certificate valid, not near expiry, not revoked.

Halt if: Expiry < 30 days or revocation detected.
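The value 2592000 passed to -checkend is 30 days expressed in seconds (30 × 86400). A minimal sketch that makes the threshold explicit; both function names are hypothetical, and the openssl invocation is the one used above.

```shell
# Convert a day count into the seconds value expected by `openssl x509 -checkend`.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# Hypothetical wrapper: succeed only if the certificate remains valid for at
# least <min_days> more days (openssl exits 0 when the cert will NOT expire).
# Usage: cert_valid_for_days <cert_file> <min_days>
cert_valid_for_days() {
  cert=$1
  min_days=$2
  openssl x509 -checkend "$(days_to_seconds "$min_days")" -noout -in "$cert" >/dev/null
}

# Example:
# cert_valid_for_days /opt/bsfg/certs/server.crt 30 || echo "HALT: expiry < 30 days"
```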


4. Authorization Setup

4.1 Peer Allow-List Configuration

On Zone A, install rendered peer authorization configuration:

# Install managed configuration for peer authorization
# /opt/bsfg/config/peers.yaml or equivalent managed config artifact

# Example structure (illustrative only):
# peers:
#   - id: plant-a
#     identity: plant-a-bsfg
#     endpoint: 10.3.1.10:9443
#     authorized: true
#     streams:
#       - facts.operational
#       - facts.batch_completed
#     artifacts:
#       - batch-files
#       - documents

# Apply using approved configuration management:
# - Configuration management system (Ansible, Puppet, etc.)
# - Kubernetes ConfigMap/Secret if applicable
# - Manual install with change control approval

On Zone B, configure Zone A as an authorized peer (symmetric or asymmetric as policy requires):

# Same approach: install rendered managed configuration
# Example structure (illustrative only):
# peers:
#   - id: enterprise
#     identity: enterprise-bsfg
#     endpoint: 10.1.1.10:9443
#     authorized: true
#     streams:
#       - facts.orders
#       - facts.shipments
#     artifacts:
#       - order-files

Expected result: Peer allow-list installed and active, streams and artifacts explicitly authorized, unauthorized peers denied.

Verification:

  • Query active configuration to confirm peer present in allow-list
  • Attempt connection from unauthorized peer and verify denial

Halt if: Authorization matrix undefined, peer not in allow-list, or config management error.
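Where no admin query is available, the installed file can be inspected directly. The sketch below assumes the illustrative peers.yaml layout shown above (uncommented, one key per line); it is a line-oriented awk filter, not a YAML parser, and will not cope with other layouts. The function name `peer_streams` is hypothetical.

```shell
# Hypothetical helper: print the streams authorized for one peer, one per line.
# Exits non-zero if the peer is absent or not marked "authorized: true".
# Usage: peer_streams <peers_file> <peer_id>
peer_streams() {
  awk -v peer="$2" '
    /^[ \t]*- id:/          { current = $3; section = ""; next }
    /^[ \t]*authorized:/    { if (current == peer) authorized = ($2 == "true"); next }
    /^[ \t]*[a-z_]+:[ \t]*$/ { section = $1; next }   # e.g. "streams:" or "artifacts:"
    /^[ \t]*- / && current == peer && section == "streams:" { streams = streams $2 "\n" }
    END { if (!authorized) exit 1; printf "%s", streams }
  ' "$1"
}

# Example:
# peer_streams /opt/bsfg/config/peers.yaml plant-a || echo "HALT: peer not in allow-list"
```

Compare the printed list against the approved authorization matrix. This reads the file on disk, so it confirms what was installed, not what the running process has loaded.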

4.2 Stream Export Permissions

Verify exported streams are correctly configured and BSFG authorization policy allows export:

# On Zone A: verify stream exists and durability is correct
nats stream info facts.operational --server nats://localhost:4222
# Expected: Stream exists, replicas: 3, durability confirmed

# Verify BSFG authorization policy (via config or admin query)
# Check that peer is authorized for this stream per peers.yaml or equivalent

Expected result: Streams exist, durability confirmed, BSFG authorization policy allows export to peer.

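Note: NATS stream configuration shows substrate durability only; federation authorization is enforced by the BSFG policy layer. JetStream stream names cannot contain ".", so if your substrate is JetStream, substitute the concrete stream name that carries the logical stream facts.operational.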

4.3 Denial Behavior Verification

Test that unauthorized access is rejected:

# Attempt connection from unauthorized source (or simulate wrong identity)
# Using wrong certificate or untrusted CA:
curl --cacert /opt/bsfg/certs/ca.crt \
  --cert /wrong/cert.pem --key /wrong/key.pem \
  https://10.3.1.10:9443/health
# Expected: TLS handshake failure or application-layer rejection

# Or attempt without client cert (if mutual TLS required):
curl --cacert /opt/bsfg/certs/ca.crt https://10.3.1.10:9443/health
# Expected: TLS handshake failure (client cert required)

Expected result: Unauthorized peers cannot establish TLS or are rejected at application layer.
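A denial test passes when the request fails, which is easy to misread in a script. The sketch below inverts the exit status so that a rejected request counts as success. The helper name `expect_denied` is hypothetical; `--fail` is the standard curl option that turns HTTP error statuses into a non-zero exit, which covers application-layer rejection.

```shell
# Hypothetical helper: succeed only when the wrapped command FAILS.
# Usage: expect_denied <command> [args...]
expect_denied() {
  if "$@" >/dev/null 2>&1; then
    echo "UNEXPECTED: request was accepted" >&2
    return 1
  fi
  echo "denied as expected"
}

# Example (no client certificate, mutual TLS required):
# expect_denied curl --fail --silent --cacert /opt/bsfg/certs/ca.crt \
#   https://10.3.1.10:9443/health
```

Any failure counts as a denial here, including an unreachable endpoint, so confirm reachability with an authorized probe first.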


5. Cursor Initialization Policy

5.1 Select Initialization Mode

| Mode | When to Use | Implication |
|---|---|---|
| bounded_backfill_24h (default) | Normal production bring-up | Replays only the configured lookback window; earlier history is not requested unless separately backfilled |
| bounded_backfill_Nh | Known recent start point | Replay N hours; operator specifies N |
| start_now | Greenfield streams, no history needed | No backfill; only facts from now forward |
| full_backfill | Disaster recovery, complete reconstruction | Replay all history; may be massive |
| explicit_timestamp | Specific recovery point | Operator provides ISO timestamp |

Default: bounded_backfill_24h unless explicitly overridden per-stream.
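To make the modes concrete, the sketch below maps each mode to the earliest fact time that would be requested, as Unix epoch seconds. This is an assumption made for illustration: your BSFG realization may express cursors as sequence numbers or opaque tokens instead of timestamps, and the function name `cursor_start_epoch` is hypothetical.

```shell
# Hypothetical mapping from init mode to the earliest fact time (epoch seconds).
# Usage: cursor_start_epoch <mode> <now_epoch> [explicit_epoch]
cursor_start_epoch() {
  mode=$1
  now=$2
  case $mode in
    start_now)     echo "$now" ;;          # no backfill
    full_backfill) echo 0 ;;               # replay from the beginning
    explicit_timestamp)
      [ -n "${3:-}" ] || { echo "explicit_timestamp requires an epoch value" >&2; return 1; }
      echo "$3" ;;
    bounded_backfill_*h)
      hours=${mode#bounded_backfill_}      # strip prefix -> "24h"
      hours=${hours%h}                     # strip suffix -> "24"
      case $hours in
        ''|*[!0-9]*) echo "invalid lookback: $mode" >&2; return 1 ;;
      esac
      echo $(( now - hours * 3600 )) ;;
    *) echo "unknown init mode: $mode" >&2; return 1 ;;
  esac
}

# Example: start of a 24-hour lookback window relative to the current time
# cursor_start_epoch bounded_backfill_24h "$(date +%s)"
```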

5.2 Configure Cursor Initialization

On receiving zone (Zone B for Zone A→B flow), install cursor initialization configuration:

# Install managed configuration for cursor initialization
# /opt/bsfg/config/cursors.yaml or equivalent managed config artifact

# Example structure (illustrative only):
# cursors:
#   - peer: enterprise
#     stream: facts.operational
#     init_mode: bounded_backfill_24h
#     # Alternative: explicit_timestamp with value
#     # init_timestamp: "2025-01-15T10:00:00Z"

# Apply using approved configuration management

Expected result: Cursor initialization policy documented, configured, and active.

Verification: Query active configuration to confirm policy applied.

Halt if: Policy undefined, contradicts business requirements (e.g., required history outside backfill window), or config management error.

5.3 Per-Stream Override Capability

Some streams may require different initialization. Document and install per-stream overrides:

# Example per-stream overrides (illustrative structure):
#
# Critical audit stream: full backfill
#   - peer: enterprise
#     stream: facts.audit
#     init_mode: full_backfill
#     justification: compliance requirement
#
# High-volume telemetry: start now
#   - peer: enterprise
#     stream: facts.telemetry
#     init_mode: start_now
#     justification: volume too high, only recent data valuable

# Install via approved configuration management with documented justification

Expected result: Per-stream overrides documented with justification and installed.
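The effective mode for a stream is its override if one exists, otherwise the default bounded_backfill_24h. The sketch below resolves that from a file in the illustrative cursors.yaml layout (uncommented, one key per line). It is a line-oriented awk filter, not a YAML parser, and the function name `resolve_init_mode` is hypothetical.

```shell
# Hypothetical helper: print the init mode that applies to one peer/stream pair,
# falling back to the runbook default when no override is present.
# Usage: resolve_init_mode <cursors_file> <peer> <stream>
resolve_init_mode() {
  awk -v peer="$2" -v stream="$3" '
    /^[ \t]*- peer:/    { p = $3; s = ""; next }   # start of a new entry
    /^[ \t]*stream:/    { s = $2; next }
    /^[ \t]*init_mode:/ { if (p == peer && s == stream) mode = $2 }
    END { print (mode == "" ? "bounded_backfill_24h" : mode) }
  ' "$1"
}

# Example:
# resolve_init_mode /opt/bsfg/config/cursors.yaml enterprise facts.audit
```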


6. Initial Federation Bring-Up

6.1 Health Handshake

From Zone B (receiving), verify Zone A health with mutual TLS:

# Use approved health probe with client certificate and CA trust
curl --cacert /opt/bsfg/certs/ca.crt \
  --cert /opt/bsfg/certs/server.crt --key /opt/bsfg/certs/server.key \
  https://10.1.1.10:9443/health
# Expected: 200 OK, JSON with zone identity and health status

From Zone A, verify Zone B health (symmetric):

curl --cacert /opt/bsfg/certs/ca.crt \
  --cert /opt/bsfg/certs/server.crt --key /opt/bsfg/certs/server.key \
  https://10.3.1.10:9443/health

Expected result: Both zones healthy, identities confirmed, TLS trust verified (not skipped).

Halt if: Health check fails, identity mismatch, TLS trust failure, or authentication error.

6.2 Authorization Verification

Test that authorized streams are accessible:

# Zone B queries Zone A for available streams (if query primitive available)
# Or: attempt first fetch and verify authorization succeeds

Expected result: Authorization allows configured streams, denies others.

6.3 First Fetch

Initiate first cursor-based fetch from Zone B to Zone A using the approved BSFG operator interface:

# Using the approved BSFG operator interface (CLI or API), initiate first fetch.
# The exact command depends on your deployed BSFG realization.
#
# Example (illustrative only):
# bsfg fetch --peer enterprise --stream facts.operational --cursor-init bounded_backfill_24h
#
# Or via API if available:
# curl --cacert /opt/bsfg/certs/ca.crt \
#   --cert /opt/bsfg/certs/server.crt --key /opt/bsfg/certs/server.key \
#   -X POST https://10.1.1.10:9443/v1/fetch \
#   -H "Content-Type: application/json" \
#   -d '{"stream":"facts.operational","cursor_policy":"bounded_backfill_24h"}'

Expected result: Fetch succeeds, facts returned, no authorization error.

Note: Use the interface (CLI, API, or control plane) defined by your BSFG realization. The examples above are illustrative.

Halt if: Authorization denied, stream not found, or cursor initialization rejected.

6.4 First Durable Append

Verify fetched facts are durably appended to Zone B's local inbound durable store (inward-facing boundary role):

# Check local durable store for new facts
# Example using NATS JetStream (if that's your substrate realization):
nats stream info facts.operational --server nats://localhost:4222
# Expected: Messages count increased, LastSeq advanced

# Verify cursor position advanced using approved interface
# Example (illustrative): bsfg cursor query --peer enterprise --stream facts.operational
# Expected: Cursor position > initial, matches last durable append

Expected result: Facts durably appended to configured inbound realization, cursor advanced, monotonic progress confirmed.

Note: IFB (inward-facing boundary) is a logical role; your substrate realization may use different concrete names.

6.5 Cursor Advancement Confirmation

Verify cursor semantics:

# Cursor represents durable local append, not just fetch
# Re-query cursor (illustrative; use your approved interface)
bsfg cursor query --peer enterprise --stream facts.operational

# Verify that it matches JetStream state
nats consumer info facts.operational enterprise-from-plant-a --server nats://localhost:4222
# Expected: Delivered matches cursor, AckFloor matches or lags (processing separate)

Expected result: Cursor monotonically advanced, durable position confirmed.
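Once the numbers have been read from the two queries above, the comparison itself is simple arithmetic. The sketch below assumes the cursor and the JetStream sequences are directly comparable integers within Zone B, as the expected result implies; the function name `cursor_consistent` is hypothetical.

```shell
# Hypothetical check of the cursor semantics described above.
# Usage: cursor_consistent <previous_cursor> <current_cursor> <delivered_seq> <ack_floor_seq>
cursor_consistent() {
  prev=$1 cur=$2 delivered=$3 ack_floor=$4
  # The cursor must never move backwards.
  [ "$cur" -ge "$prev" ] || { echo "cursor moved backwards" >&2; return 1; }
  # The cursor reflects the last durable local append.
  [ "$cur" -eq "$delivered" ] || { echo "cursor does not match delivered sequence" >&2; return 1; }
  # Processing is separate, so the ack floor may lag but cannot lead.
  [ "$ack_floor" -le "$delivered" ] || { echo "ack floor ahead of delivered sequence" >&2; return 1; }
}

# Example with values copied from the two queries:
# cursor_consistent 100 150 150 140 && echo "cursor OK"
```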

6.6 Advisory Notification Test (Optional)

If using push notifications for latency optimization:

# Zone A sends advisory (simulated or actual)
# Zone B receives notification and initiates early fetch

# Verify notification received (if monitoring available)
grep -r "notify_available" /var/log/bsfg/  # or metric

# Verify fetch initiated promptly after notification

Expected result: Notification received, fetch initiated, but correctness does not depend on notification (polling would also work).


7. Artifact Retrieval Validation

7.1 Fact References Artifact

Identify a fact that references an artifact:

# Inspect fetched facts for artifact references
nats consumer next facts.operational enterprise-from-plant-a --server nats://localhost:4222
# Look for artifact_uri field in fact body

Expected result: Fact JSON contains artifact_uri or equivalent reference.

7.2 Artifact Fetch

Retrieve referenced artifact from peer zone:

# Fetch artifact (via BSFG or direct object store if redirected)
bsfg artifact fetch --uri s3://enterprise-bsfg-artifacts/batch-files/2025/001/batch-123.json

# Or via API:
curl --cacert /opt/bsfg/certs/ca.crt --cert /opt/bsfg/certs/server.crt --key /opt/bsfg/certs/server.key \
  -X GET "https://10.1.1.10:9443/v1/artifacts?uri=s3://enterprise-bsfg-artifacts/..."

Expected result: Artifact retrieved, content matches reference, integrity verified (content-addressed or checksum).

7.3 Integrity and Identity Policy Validation

Verify artifact integrity:

# If content-addressed: verify hash matches
sha256sum downloaded-file  # Compare to reference in fact

# If redirected to object store: verify signature/checksum

Expected result: Artifact integrity confirmed, policy enforced.
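The manual hash comparison can be turned into a pass/fail check. A minimal sketch, assuming the fact carries a hex-encoded SHA-256 digest of the artifact; the function name `verify_sha256` is hypothetical.

```shell
# Hypothetical helper: succeed only if the file's SHA-256 digest equals the
# expected hex digest (compared case-insensitively).
# Usage: verify_sha256 <file> <expected_hex_digest>
verify_sha256() {
  file=$1
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha256sum "$file" | awk '{ print $1 }') || return 1
  [ "$actual" = "$expected" ]
}

# Example (digest value copied from the fact that references the artifact):
# verify_sha256 downloaded-file "$DIGEST_FROM_FACT" || echo "HALT: integrity mismatch"
```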

7.4 Missing Artifact Behavior Test

Test handling of missing artifact:

# Request non-existent artifact
bsfg artifact fetch --uri s3://enterprise-bsfg-artifacts/batch-files/invalid/nonexistent.json
# Expected: 404 or equivalent, retry scheduled, alert generated

Expected result: Graceful degradation, retry with backoff, operational alert.
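Retry with backoff is expected from the BSFG realization itself. For operator-side probes during this test, the same pattern can be sketched in shell. The function name `retry_with_backoff` is hypothetical, and doubling the delay after each failure is one common choice, not a documented BSFG behavior.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out,
# doubling the wait between attempts.
# Usage: retry_with_backoff <max_attempts> <base_delay_seconds> <command> [args...]
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))       # exponential backoff
    attempt=$(( attempt + 1 ))
  done
}

# Example: up to 4 attempts, waiting 2s, 4s, then 8s between them
# retry_with_backoff 4 2 bsfg artifact fetch --uri "$ARTIFACT_URI"
```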


8. Partition and Recovery Drill (Controlled / Maintenance-Window Only)

Warning: This drill simulates a network partition using firewall rules. Execute it only when all of the following hold:

  • The target is a lab/test environment, or the drill runs in a scheduled maintenance window with explicit change control
  • A rollback plan is documented and ready

8.1 Simulate Peer Unreachability

Block connectivity from Zone B to Zone A using approved network administration procedures:

# Example: block outbound to Zone A VIP (illustrative)
# Use your approved network administration interface:
# - iptables (example shown, use with caution)
# - Network ACLs
# - Administrative partition command if available

# Example (illustrative only):
# iptables -A OUTPUT -d 10.1.1.10 -j DROP

# Or if your BSFG realization provides partition simulation:
# bsfg admin partition --peer enterprise --reason "drill"

# Always have rollback ready:
# iptables -D OUTPUT -d 10.1.1.10 -j DROP  # to restore
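One way to keep the rollback from being forgotten is to pair the block and restore commands in a single wrapper, so that the restore runs whether the drill checks pass or fail. This is a minimal sketch under stated assumptions: the function name `drill_with_rollback` is hypothetical, and it does not cover an interrupted shell (Ctrl-C, lost session), so the manual rollback command must still be at hand.

```shell
# Hypothetical wrapper: apply a block, run the drill checks, always restore.
# Usage: drill_with_rollback "<block command>" "<restore command>" <check> [args...]
drill_with_rollback() {
  block_cmd=$1
  restore_cmd=$2
  shift 2
  sh -c "$block_cmd" || return 1     # abort if the block cannot be applied
  status=0
  "$@" || status=$?                  # run the checks, remembering their result
  sh -c "$restore_cmd" || echo "ROLLBACK FAILED: restore connectivity manually" >&2
  return "$status"
}

# Example (run_partition_checks is a placeholder for sections 8.2 and 8.3):
# drill_with_rollback \
#   "iptables -A OUTPUT -d 10.1.1.10 -j DROP" \
#   "iptables -D OUTPUT -d 10.1.1.10 -j DROP" \
#   run_partition_checks
```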

8.2 Verify Local Autonomy

On Zone B, verify local durable work continues:

# Producer append to local ESB must succeed
nats pub facts.operational.test '{"test": "partition-drill"}' --server nats://localhost:4222
# Expected: OK

# Local consumer from IFB must continue
nats consumer next facts.operational local-consumer --server nats://localhost:4222
# Expected: Facts available (may be stale if no local production)

Expected result: Local autonomy preserved, no blocking on remote unavailability.

8.3 Verify Backlog Accumulation

Monitor outbound buffer (ESB) growth:

nats stream info facts.operational --server nats://localhost:4222
# Expected: Messages count increasing (if producers active)
# Or: specific ESB/EFB metrics showing accumulation

Expected result: Backlog accumulates for affected peer relationship, no data loss.

8.4 Restore Connectivity

Remove block or end administrative partition:

iptables -D OUTPUT -d 10.1.1.10 -j DROP
# Or: bsfg admin reconcile --peer enterprise

8.5 Verify Cursor Reconciliation

Monitor automatic recovery:

# Watch logs for reconciliation
journalctl -u bsfg-controller -f | grep -E "(reconcile|cursor|replay)"

# Expected sequence:
# - "peer enterprise reachable"
# - "cursor comparison: local=X, peer=Y"
# - "backfill required: Y-X facts"
# - "replay initiated"
# - "cursor advanced to Y"

Expected result: Automatic cursor comparison, gap detection, backfill initiation.

8.6 Verify Replay/Backfill

Confirm facts replayed successfully:

# Check that missing facts were backfilled
nats stream info facts.operational --server nats://localhost:4222
# Expected: Messages count includes backfilled facts, no duplicates (idempotent)

# Verify cursor recovery (use approved interface)
# Example (illustrative): bsfg cursor query --peer enterprise --stream facts.operational
# Expected: Cursor has advanced monotonically from pre-recovery value;
#           backlog cleared or decreasing; no duplicate side effects observed

Expected result: Backfill complete, cursor monotonically advanced from pre-recovery position, no destructive re-init required.

Note: Cursor values may not be directly comparable across zones; verify monotonic recovery and completeness, not equality.

8.7 Verify No Destructive Re-Init

Confirm local state preserved using continuity-based checks:

# Verify stream/store identity persisted (not replaced)
nats stream info facts.operational --server nats://localhost:4222
# Expected: Stream identity stable (creation time unchanged); FirstSeq >= previous FirstSeq
#           (it advances only through retention and never decreases);
#           LastSeq > previous LastSeq, sequences monotonic

# Verify no operator-triggered reset occurred
# Check configuration management logs for unapproved changes

# Logs may provide supporting evidence but are not primary proof:
# grep -ri "wipe\|reset\|re-initialize" /var/log/bsfg/  # Supporting only

Expected result: Normal reconciliation, state continuity preserved, no state destruction.
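The continuity check compares a snapshot taken before the drill with one taken after recovery. The sketch below assumes the operator records the stream creation time and LastSeq from `nats stream info` at both points; the function name `state_continuous` is hypothetical, and treating an unchanged creation time as proof of stable stream identity is an assumption of this sketch.

```shell
# Hypothetical check: the stream was not recreated and its sequence did not reset.
# Usage: state_continuous <prev_created> <prev_last_seq> <cur_created> <cur_last_seq>
state_continuous() {
  prev_created=$1 prev_last=$2 cur_created=$3 cur_last=$4
  # A different creation time means the stream was deleted and recreated.
  [ "$cur_created" = "$prev_created" ] || { echo "stream was recreated" >&2; return 1; }
  # LastSeq must not go backwards across the partition.
  [ "$cur_last" -ge "$prev_last" ] || { echo "LastSeq went backwards" >&2; return 1; }
}

# Example with values recorded before the drill and after recovery:
# state_continuous "2025-01-10T08:00:00Z" 41200 "2025-01-10T08:00:00Z" 41950
```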


9. Handoff Criteria

The federation relationship is accepted only when all criteria pass:

| Criterion | Verification | Required | Status |
|---|---|---|---|
| Authentication works | mTLS handshake succeeds, peer identity verified per policy | Yes | [ ] |
| Authorization works | Allowed streams fetch successfully, denied streams rejected | Yes | [ ] |
| Replay works | Fetch returns facts, cursor advances | Yes | [ ] |
| Cursor advances correctly | Monotonic, durable, matches local append | Yes | [ ] |
| Duplicate replay harmless | Idempotent append confirmed (re-fetch same cursor, no duplicates) | Yes | [ ] |
| Partition recovery works | Simulated partition, autonomous operation, clean reconciliation | Yes | [ ] |
| Artifact retrieval works | Referenced artifacts fetchable, integrity verified | Where enabled by policy | [ ] |
| Alerts/metrics visible | Replication lag, backlog, auth failures visible in monitoring | Yes | [ ] |
| Cursor initialization policy documented | Per-stream init mode recorded with justifications | Yes | [ ] |
| Authorization matrix documented | Peer allow-list, stream permissions, artifact access recorded | Yes | [ ] |
| Duplicate replay drill performed | Explicit re-fetch test confirms idempotency | Where exercised here; may be deferred to validation checklist | [ ] |

Sign-off:

| Role | Name | Date | Signature |
|---|---|---|---|
| Federation Engineer | | | |
| Zone A Platform Lead | | | |
| Zone B Platform Lead | | | |
| Security/Compliance (if required) | | | |

10. Post-Bring-Up Reference

| Document | Purpose |
|---|---|
| Checklist: Cross-Zone Federation Validation | Formal acceptance and audit |
| Runbook: Triad-HA Zone Deployment | Add more zones to federation |
| Reference Interaction Pattern: Cross-Zone BSFG Federation | Architecture reference |
| Reference Deployment Pattern: Triad-HA | Intra-zone substrate reference |