Checklist: Cross-Zone Federation Validation
Purpose
Provide a concise validation and acceptance checklist for an operational cross-zone BSFG federation relationship.
It verifies that the relationship under review preserves the cross-zone guarantees defined by the Reference Interaction Pattern: Cross-Zone BSFG Federation.
Scope
This checklist validates:
- One federation relationship or peer pair
- Authentication and trust
- Authorization enforcement
- Replay semantics and cursor behavior
- Artifact retrieval semantics, where artifact exchange is enabled
- Partition tolerance
- Reconciliation behavior
- Operational visibility
This checklist does not validate:
- Intra-zone deployment
- Application-level correctness
- Business logic validation
Reference
This checklist validates conformance to the Reference Interaction Pattern: Cross-Zone BSFG Federation.
Usage Notes
- Complete only the controls applicable to the federation relationship under review.
- Mark controls as waived only with explicit justification and approval.
- Controls involving simulated certificate failure, incompatible history, large-gap replay, or induced partition should be treated as controlled drills and executed only in an approved test or maintenance window.
- CLI, API, and metric examples below are illustrative unless the implementation has already standardized them as canonical operator interfaces.
1. Identity and Trust Checks
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| Peer certificate identity matches policy | Inspect peer-presented certificate through approved TLS probe or operator interface | Certificate identity matches configured peer identity policy | [ ] | Subject or SAN per local policy |
| Trust chain valid | Verify peer certificate chains to configured trust anchor | Chain validates to trusted root or approved cross-signed anchor | [ ] | |
| Certificate validity acceptable | Check certificate validity horizon | Validity window meets policy threshold | [ ] | Example: 30+ days remaining |
| Unauthorized peer rejected | Attempt connection using certificate not authorized for this relationship | TLS handshake fails or application rejects peer | [ ] | Controlled drill |
| Expired certificate rejected | Use expired test certificate or controlled time-window simulation | Connection rejected; alert generated | [ ] | Controlled drill |
| Mismatched identity rejected | Present certificate with non-matching configured identity | Rejected by identity validation path | [ ] | Controlled drill |
| Revocation checked | Verify CRL/OCSP behavior if deployed | Revoked certificate rejected, or revocation control explicitly not in scope | [ ] | Optional if CRL/OCSP not deployed |
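The identity and validity-horizon controls above can be scripted against a live peer. The sketch below assumes Python's standard `ssl` module and mutual TLS. The host name, port, and file paths passed to the hypothetical `probe_peer_cert` helper are placeholders, and matching on a DNS SAN entry is only one of the identity policies the Notes column allows. The 30-day default mirrors the example threshold in the table.

```python
import socket
import ssl
from datetime import datetime, timezone

# The cert dict follows the shape returned by ssl.SSLSocket.getpeercert()
# for a verified connection: 'subjectAltName' is a tuple of (type, value) pairs
# and 'notAfter' uses the OpenSSL text form, e.g. 'Jun 15 12:00:00 2030 GMT'.


def san_dns_names(cert: dict) -> set:
    """Return the DNS entries from the certificate's subjectAltName."""
    return {value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"}


def days_remaining(cert: dict, now: datetime) -> float:
    """Days between `now` and the certificate's notAfter time (UTC)."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def check_peer_cert(cert: dict, expected_dns: str, min_days: int = 30, now=None) -> list:
    """Return a list of findings; an empty list means both checks passed."""
    now = now or datetime.now(timezone.utc)
    findings = []
    if expected_dns not in san_dns_names(cert):
        findings.append(f"identity mismatch: {expected_dns!r} not in SAN")
    remaining = days_remaining(cert, now)
    if remaining < min_days:
        findings.append(f"validity horizon {remaining:.1f} days is below {min_days} days")
    return findings


def probe_peer_cert(host: str, port: int, cafile: str, certfile: str, keyfile: str) -> dict:
    """Open a mutual-TLS connection and return the peer's verified certificate."""
    ctx = ssl.create_default_context(cafile=cafile)  # chain must validate to this anchor
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # our own client identity
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

A chain that does not validate to `cafile` fails inside `wrap_socket` with `ssl.SSLCertVerificationError` before any findings are produced, which is the expected outcome for the trust-chain control.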
2. Authorization Checks
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| Allowed exported streams fetch successfully | Use approved fetch interface against an authorized stream | Facts returned without authorization error | [ ] | |
| Disallowed exported streams denied | Use approved fetch interface against a non-authorized stream | Request denied by authorization policy | [ ] | |
| Artifact access respects policy | Request artifact of authorized type | Access succeeds where policy allows | [ ] | Only if artifact exchange enabled |
| Artifact access denied for unauthorized type | Request artifact outside authorized scope | Request denied by authorization policy | [ ] | Only if artifact exchange enabled |
| Connectivity without authorization does not grant access | Establish authenticated connectivity with peer not present in allow-list or not authorized for resource | Application-level denial despite transport success | [ ] | |
| Authorization matrix documented | Review approved configuration or policy artifact | Peer relationship, exported streams, and artifact permissions explicitly recorded | [ ] | |
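The authorization matrix is easiest to review when it is explicit and default-deny. A minimal sketch, assuming a hypothetical in-memory `AUTHZ_MATRIX` keyed by authenticated peer identity; the stream names, artifact types, and the `authorize` helper are illustrative, not a BSFG interface. Keeping artifact grants separate from stream grants is what allows the artifact-independence control in section 5 to pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerGrant:
    streams: frozenset                       # exported streams this peer may fetch
    artifact_types: frozenset = frozenset()  # granted separately from streams


# Hypothetical matrix; in practice this comes from the approved policy artifact.
AUTHZ_MATRIX = {
    "bsfg.zone-b.example": PeerGrant(
        streams=frozenset({"quality.events", "batch.records"}),
        artifact_types=frozenset({"batch-report"}),
    ),
}


def authorize(peer_identity: str, authenticated: bool, kind: str, name: str) -> bool:
    """Default-deny check applied after TLS authentication succeeds."""
    if not authenticated:
        return False
    grant = AUTHZ_MATRIX.get(peer_identity)
    if grant is None:
        return False  # authenticated connectivity alone grants nothing
    if kind == "stream":
        return name in grant.streams
    if kind == "artifact":
        return name in grant.artifact_types
    return False  # unknown resource kinds are denied
```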
3. Replication and Cursor Checks
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| Initial fetch succeeds | Execute first approved fetch after bring-up | Facts returned without replay error | [ ] | |
| Cursor advances only after durable append | Fetch facts, inspect durable local append state, then inspect cursor | Cursor reflects durable local append progress rather than transport receipt alone | [ ] | |
| Cursor monotonic | Record cursor, fetch additional facts, re-query cursor | Cursor does not regress | [ ] | |
| Duplicate replay harmless | Re-fetch overlapping cursor range and verify stable IDs / uniqueness behavior | Duplicate replay does not create duplicate durable effects | [ ] | Do not rely solely on raw count |
| Notification loss does not break correctness | Disable or ignore advisory notification path and rely on polling | Replication continues correctly using receiver-driven fetch | [ ] | Applicable only where advisory notifications are enabled |
| Replay resumes from durable cursor | Pause fetch, resume later, inspect replay start point | Replay resumes from previously durable cursor position | [ ] | |
| Per-stream cursor independence | Advance one exported stream while leaving another idle | Cursor movement for one stream does not alter another | [ ] | |
| Cursor initialization policy documented | Review cursor policy configuration or approved operator record | Initialization mode recorded per stream with justification where needed | [ ] | |
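The cursor rules above (advance only after durable append, never regress, tolerate duplicate replay, move independently per stream) can be modeled in a few lines. This is a sketch under stated assumptions: facts carry a stable `fact_id` and a per-stream `offset`, an in-memory dict stands in for the durable store, and `Fact`, `Receiver`, and `poll_once` are hypothetical names. It is meant for reasoning about expected results, not as a reference implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    stream: str
    offset: int    # position in the exporter's stream (assumed)
    fact_id: str   # stable identifier assigned at origin (assumed)


class Receiver:
    """In-memory stand-in for a receiving zone's durable store and cursors."""

    def __init__(self):
        self.store = {}    # fact_id -> Fact; stands in for durable storage
        self.cursors = {}  # stream -> next offset to fetch (independent per stream)

    def cursor(self, stream: str) -> int:
        return self.cursors.get(stream, 0)

    def append_durably(self, fact: Fact) -> None:
        # Idempotent on fact_id: replaying a known fact has no further effect.
        # A real store would commit or fsync before this returns.
        self.store.setdefault(fact.fact_id, fact)

    def apply_batch(self, stream: str, facts) -> None:
        for fact in sorted(facts, key=lambda f: f.offset):
            self.append_durably(fact)
            # Advance only after the append; max() keeps the cursor monotonic.
            self.cursors[stream] = max(self.cursor(stream), fact.offset + 1)


def poll_once(receiver: Receiver, exporter_log, stream: str, limit: int = 100) -> int:
    """Receiver-driven fetch from the durable cursor; needs no notification."""
    start = receiver.cursor(stream)
    batch = [f for f in exporter_log if f.stream == stream and f.offset >= start][:limit]
    receiver.apply_batch(stream, batch)
    return len(batch)
```

Applying an overlapping or stale batch leaves both the store size and the cursor unchanged, which is the property the duplicate-replay control should confirm through stable IDs rather than raw counts.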
4. Partition Checks
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| Local zone continues durable work while peer unavailable | Block peer connectivity in controlled drill and attempt local append | Local durable append succeeds without remote dependency | [ ] | Controlled drill |
| Backlog accumulates for affected peer relationship | Observe backlog metrics during partition | Backlog grows for affected peer relationship without unexpected data loss | [ ] | Controlled drill |
| Reconnect triggers cursor comparison | Restore connectivity and inspect logs or recovery telemetry | Recovery path performs cursor comparison or equivalent reconciliation step | [ ] | Controlled drill |
| Small-gap backfill automatic | Induce short partition and restore | Small recovery gap handled automatically without operator intervention | [ ] | Controlled drill |
| Large-gap handling matches policy | Induce longer partition or seeded gap | Behavior matches configured policy: extended replay, bounded backfill, or operator escalation | [ ] | Controlled drill |
| Incompatible history halts automatic reconciliation | Simulate incompatible cursor/history condition | Automatic reconciliation halts; operator alert raised | [ ] | Controlled drill, lab preferred |
| No destructive re-init in normal recovery | Review continuity of stream/store identity, cursor progression, and operator actions | Recovery preserves existing state; no unplanned reset or re-initialization occurs | [ ] | Logs may support but not prove this |
| Autonomous-mode alert fires | Trigger controlled partition | Partition alert visible | [ ] | Controlled drill |
| Partition-resolved alert clears | Restore connectivity after controlled partition | Resolution alert visible and active partition alert clears | [ ] | Controlled drill |
| Replication lag metric operationally consistent | Compare observed replay delay with lag metric during recovery | Lag signal is directionally and operationally consistent with observed delay | [ ] | Tolerance per local policy |
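Reconnect behavior in this section reduces to comparing the local durable cursor with the peer's stream state and choosing an action. The sketch below assumes the peer exposes a history (lineage) identifier, a head offset, and an earliest retained offset. Those fields, the `plan_recovery` function, and the 10,000-fact small-gap limit are hypothetical and should be replaced by the configured policy.

```python
from dataclasses import dataclass

LARGE_GAP_ACTIONS = {"extended_replay", "bounded_backfill", "escalate"}


@dataclass(frozen=True)
class PeerStreamState:
    history_id: str     # identity of the stream/store lineage (assumed)
    head: int           # next offset the exporter will assign
    retained_from: int  # earliest offset the exporter still retains


def plan_recovery(local_history_id: str, local_cursor: int, peer: PeerStreamState,
                  small_gap_limit: int = 10_000, large_gap_action: str = "escalate") -> str:
    """Classify a reconnect after comparing the local durable cursor with the peer."""
    if large_gap_action not in LARGE_GAP_ACTIONS:
        raise ValueError(f"unknown large-gap action: {large_gap_action!r}")
    # Incompatible history: different lineage, or a cursor the peer never issued.
    if local_history_id != peer.history_id or local_cursor > peer.head:
        return "halt_and_alert"
    # Facts needed to close the gap are no longer retained by the peer.
    if local_cursor < peer.retained_from:
        return "escalate"
    gap = peer.head - local_cursor
    if gap == 0:
        return "in_sync"
    if gap <= small_gap_limit:
        return "auto_backfill"
    return large_gap_action
```

No branch returns a reset or re-initialization: incompatible history stops automatic work and hands control to an operator, consistent with the expected results for the halt and no-destructive-re-init controls.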
5. Artifact Checks
Complete this section only if artifact exchange is enabled for the relationship.
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| Referenced artifact fetch succeeds when authorized | Fetch artifact using reference carried by a fact | Authorized artifact retrieved successfully | [ ] | |
| Artifact integrity verified | Check content hash, checksum, or content-address match | Retrieved artifact matches integrity reference | [ ] | |
| Missing artifact surfaces retry and alert path | Request known-missing artifact in controlled drill | Failure surfaced cleanly; retry and alert path observable | [ ] | Controlled drill |
| Artifact flow remains distinct from fact replay | Observe transfer behavior and operator telemetry | Artifact retrieval is separately observable from fact replay path | [ ] | Separate endpoint, control path, or telemetry acceptable |
| Large artifact does not block fact replay | Fetch large artifact while observing fact replay | Fact replay continues within expected operating envelope | [ ] | Controlled drill |
| Artifact authorization independent of fact authorization | Test relationship where fact access and artifact access differ | Artifact policy enforced independently where configured | [ ] | |
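For the integrity control, a content hash can be checked incrementally so that a large artifact is verified without holding it fully in memory. A minimal sketch, assuming the fact carries a reference of the form `sha256:<hex>`; that format and the `verify_artifact` name are assumptions, so substitute whatever integrity reference the fact actually carries.

```python
import hashlib
import hmac


def verify_artifact(chunks, reference: str) -> bool:
    """Hash the artifact incrementally and compare with a 'sha256:<hex>' reference."""
    algo, _, expected_hex = reference.partition(":")
    if algo != "sha256" or not expected_hex:
        raise ValueError(f"unsupported integrity reference: {reference!r}")
    digest = hashlib.sha256()
    for chunk in chunks:  # chunked, so large artifacts need not fit in memory
        digest.update(chunk)
    # Constant-time comparison of the hex digests.
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())
```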
6. Operational Visibility Checks
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| Replication lag visible | Dashboard, exporter, or approved operator interface | Lag visible per peer and exported stream | [ ] | |
| Authentication failures visible | Security dashboard, logs, or alert stream | Failed TLS, rejected identity, or denied access visible | [ ] | |
| Partition alerts visible | Alerting system or operations dashboard | Partition-detected and partition-resolved signals visible | [ ] | |
| Backlog metrics visible | Metrics dashboard or approved operator interface | Backlog signals visible per affected relationship | [ ] | |
| Recovery completion visible | Logs, dashboard, or recovery telemetry | Reconciliation start, progress, and completion observable | [ ] | |
| Cursor position queryable | Approved operator interface | Current durable cursor position queryable per peer and stream | [ ] | |
| Authenticated health endpoint responds | Health probe using configured trust and identity path | Health response returned successfully | [ ] | |
| Certificate expiry monitored | Dashboard, script output, or alert source | Certificate validity horizon visible and alertable | [ ] | |
| Fetch rate and error rate visible | Metrics dashboard or approved telemetry | Request rate and error rate visible by peer or stream | [ ] | |
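Per-peer, per-stream visibility can be spot-checked by scraping an exporter and confirming that every expected pair has a series. The metric name `bsfg_replication_lag_seconds` and its `peer` and `stream` labels are hypothetical. The parser is a simplified reading of the Prometheus text format: it handles plain `name{labels} value` samples and does not handle escaped quotes inside label values.

```python
import re

# Simplified Prometheus text-format sample: name{labels} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def series_labels(exposition: str, metric: str) -> list:
    """Return the label sets of every sample of `metric` in the exposition text."""
    found = []
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        m = SAMPLE_RE.match(line)
        if m and m.group(1) == metric:
            found.append(dict(LABEL_RE.findall(m.group(2))))
    return found


def missing_coverage(exposition: str, metric: str, expected_pairs) -> list:
    """List (peer, stream) pairs that have no series for `metric`."""
    seen = {(labels.get("peer"), labels.get("stream"))
            for labels in series_labels(exposition, metric)}
    return sorted(set(expected_pairs) - seen)
```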
7. Federation Variants (If Applicable)
Complete only the subsection(s) applicable to the relationship under review.
7.1 Chain: Enterprise ↔ IDMZ ↔ Plant
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| No direct Enterprise–Plant relationship where prohibited | Attempt direct connection or inspect policy/firewall path | Direct connectivity absent or rejected per design | [ ] | |
| IDMZ mediates correctly | Exercise end-to-end replay through IDMZ | End-to-end flow succeeds through mediated path | [ ] | |
| Latency within approved envelope | Measure hop or end-to-end delay | Observed delay within approved operating envelope | [ ] | |
7.2 Hub-and-Spoke
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| Hub handles configured number of spokes | Operate configured relationships concurrently | Behavior remains within approved load envelope | [ ] | |
| Spoke isolation preserved | Attempt or inspect direct spoke-to-spoke relationship where prohibited | Non-permitted spoke relationship absent or rejected | [ ] | |
7.3 Selective Mesh
| Control | Method | Expected Result | Status | Notes |
|---|---|---|---|---|
| Peer relationships remain independent | Pause one peer relationship while others operate | Unaffected relationships continue normally | [ ] | |
| Partition impact isolated per relationship | Partition one peer in controlled drill | Impact isolated to affected relationship | [ ] | Controlled drill |
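Where relationships are declared in configuration, the prohibited-path controls in 7.1 and 7.2 can be pre-checked statically before any connection attempt. The role names and the `prohibited_relationships` helper are illustrative. A clean result here complements, and does not replace, the connection attempt or firewall-path inspection named in the Method column.

```python
def prohibited_relationships(relationships, role_of, forbidden_role_pairs):
    """Return declared peer pairs whose role combination is not permitted."""
    violations = []
    for a, b in relationships:
        # frozenset makes the check direction-independent; a same-role pair
        # such as spoke/spoke collapses to frozenset({"spoke"}).
        if frozenset({role_of[a], role_of[b]}) in forbidden_role_pairs:
            violations.append((a, b))
    return violations


# 7.1 Chain: Enterprise and Plant may only meet through the IDMZ.
CHAIN_FORBIDDEN = {frozenset({"enterprise", "plant"})}
# 7.2 Hub-and-spoke: spokes peer only with the hub.
HUB_FORBIDDEN = {frozenset({"spoke"})}
```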
8. Acceptance Gate
8.1 Summary
| Category | Checks Passed | Checks Failed | Waived |
|---|---|---|---|
| Identity and Trust | | | |
| Authorization | | | |
| Replication and Cursor | | | |
| Partition | | | |
| Artifact | | | |
| Operational Visibility | | | |
| Federation Variants | | | |
8.2 Critical Failures
List any failed checks that block acceptance.
| Failed Check | Severity | Remediation Required | Owner | Due Date |
|---|---|---|---|---|
8.3 Waivers
List any waived checks with justification.
| Waived Check | Justification | Approved By | Date |
|---|---|---|---|
8.4 Overall Status
Select one:
- Passed — All critical checks passed; federation relationship accepted
- Passed with Exception — Minor issues documented and waived
- Failed — Critical checks failed; remediation required before acceptance
- Requires Escalation — Uncertainty or blocker requires architecture or platform review
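The summary counts and overall status can be derived mechanically from recorded results. The sketch assumes each control is recorded as `pass`, `fail`, `waived`, `not_applicable`, or `unresolved`, and that any unwaived failure blocks acceptance; the `Result` type and helper names are hypothetical. Adjust the precedence if local policy distinguishes critical from non-critical failures differently.

```python
from collections import Counter
from dataclasses import dataclass

STATUSES = {"pass", "fail", "waived", "not_applicable", "unresolved"}


@dataclass(frozen=True)
class Result:
    category: str  # e.g. "Partition", matching the 8.1 rows
    control: str
    status: str    # one of STATUSES

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def summarize(results) -> dict:
    """Per-category status counts for the 8.1 summary table."""
    table = {}
    for r in results:
        table.setdefault(r.category, Counter())[r.status] += 1
    return table


def overall_status(results) -> str:
    """Map recorded results onto the 8.4 options (assumed precedence)."""
    statuses = {r.status for r in results}
    if "fail" in statuses:
        return "Failed"
    if "unresolved" in statuses:
        return "Requires Escalation"
    if "waived" in statuses:
        return "Passed with Exception"
    return "Passed"
```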
8.5 Sign-off
| Role | Name | Date | Signature |
|---|---|---|---|
| Validation Engineer | | | |
| Zone A Platform Lead | | | |
| Zone B Platform Lead | | | |
| Security/Compliance (if required) | | | |
9. Post-Validation Reference
| Document | Purpose |
|---|---|
| Runbook: Cross-Zone Federation Bring-Up | Establish or extend peer relationships |
| Checklist: Triad-HA Commissioning | Validate zones before federation |
| Reference Interaction Pattern: Cross-Zone BSFG Federation | Architecture reference |
| Reference Deployment Pattern: Triad-HA with Keepalived Failover | Intra-zone substrate reference |