Concept

BSFG Artifact Handling

Managing large binaries in zone-local object stores

Out-of-Band Storage Pattern

Large binary artifacts — PDFs, images, batch files, archives, documents — are not embedded inline in BSFG facts. Instead, they are stored out-of-band in zone-local Object Store buckets, and facts carry references to those artifacts.

Why out-of-band?

  • Replay efficiency: Fact streams remain compact; replay latency does not balloon with artifact size.
  • Separate lifecycle: Artifacts can have different retention, access control, and redundancy policies than facts.
  • Zone-local durability: Each zone owns its own object store; no central repository at the boundary.
  • Provenance: Facts remain the authoritative record; artifacts are referenced, not embedded.

Write Sequence: PutObject Then AppendFact

The producer must follow a strict sequence:

  1. PutObject — upload the artifact to the zone-local object store
  2. Wait for acknowledgment — confirm durable storage
  3. AppendFact — create a fact that references the artifact

PutObject(bucket, key, blob) → {digest, size}

AppendFact({
  envelope: {...},
  fact: {
    subject: "document:ABC123",
    predicate: "has_artifact",
    object_json: {
      "bucket": "batch-files",
      "key": "order-2026-03-06-001.pdf",
      "digest": "sha256:abc...def",
      "size": 2048576,
      "media_type": "application/pdf"
    }
  }
})

Important: The artifact must exist before the fact is appended. An artifact that is still missing after the consumer has retried indicates a producer defect.
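The write sequence can be sketched in Python as follows. This is a minimal illustration, not the BSFG client API: InMemoryObjectStore, InMemoryFactLog, and publish_artifact are hypothetical names, the store is an in-memory dictionary, and the envelope is omitted. The point is only the ordering: the fact is built from the acknowledgment that PutObject returns, so it cannot be appended before the upload has completed.

```python
import hashlib


class InMemoryObjectStore:
    """Hypothetical stand-in for a zone-local object store."""

    def __init__(self):
        self._objects = {}

    def put_object(self, bucket, key, blob):
        # Store the bytes, then return the acknowledgment ({digest, size}).
        self._objects[(bucket, key)] = blob
        digest = "sha256:" + hashlib.sha256(blob).hexdigest()
        return {"digest": digest, "size": len(blob)}

    def exists(self, bucket, key):
        return (bucket, key) in self._objects


class InMemoryFactLog:
    """Hypothetical stand-in for the fact stream."""

    def __init__(self):
        self.facts = []

    def append_fact(self, fact):
        self.facts.append(fact)


def publish_artifact(store, log, subject, bucket, key, blob, media_type):
    # Steps 1 and 2: upload and wait for the acknowledgment (synchronous here).
    ack = store.put_object(bucket, key, blob)
    # Step 3: only now append the fact that references the stored artifact.
    fact = {
        "subject": subject,
        "predicate": "has_artifact",
        "object_json": {
            "bucket": bucket,
            "key": key,
            "digest": ack["digest"],
            "size": ack["size"],
            "media_type": media_type,
        },
    }
    log.append_fact(fact)
    return fact
```

If put_object raises, no fact is appended, which leaves at worst an orphaned object rather than a dangling reference.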

Artifact Reference Fields

A fact that references an artifact carries these metadata fields in object_json:

Field       Type               Purpose
bucket      string             Object store bucket name. Scoped to zone and subject kind.
key         string             Object key within the bucket. Example: order-2026-03-06-001.pdf
digest      string             Content hash for integrity verification. Format: sha256:<hex>
size        integer            Artifact size in bytes. Aids consumer planning (bandwidth, storage).
media_type  string             MIME type. Examples: application/pdf, image/png, application/zip.
file_name   string (optional)  Suggested filename for consumer UI or download. Not authoritative.
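One way a consumer might model and validate these fields is sketched below. ArtifactRef and parse_artifact_ref are hypothetical names, and the requirement that the digest be 64 lowercase hex characters is an assumption layered on top of the sha256:<hex> format given above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the hex part is the full 64-character lowercase SHA-256 digest.
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


@dataclass(frozen=True)
class ArtifactRef:
    bucket: str
    key: str
    digest: str
    size: int
    media_type: str
    file_name: Optional[str] = None  # optional, not authoritative

    def __post_init__(self):
        if not self.bucket or not self.key:
            raise ValueError("bucket and key must be non-empty")
        if not _DIGEST_RE.fullmatch(self.digest):
            raise ValueError("digest must look like sha256:<64 hex chars>")
        if self.size < 0:
            raise ValueError("size must be a non-negative byte count")


def parse_artifact_ref(object_json: dict) -> ArtifactRef:
    # A missing required field raises KeyError; unknown fields are ignored.
    return ArtifactRef(
        bucket=object_json["bucket"],
        key=object_json["key"],
        digest=object_json["digest"],
        size=object_json["size"],
        media_type=object_json["media_type"],
        file_name=object_json.get("file_name"),
    )
```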

Bucket Layout and Organization

Object store buckets are organized by subject kind, reflecting the domain model:

  • batch-files — batch orders, recipes, batch records
  • asset-files — CAD drawings, equipment manuals, specifications
  • alarm-files — historical alarm logs, trend captures
  • document-files — general documents, reports, certificates
  • lot-files — lot traceability records, genealogy
  • recipe-files — process recipes, instructions

This layout allows zone operators to:

  • Apply per-bucket retention policies (e.g., batch-files retained 10 years, alarm-files retained 3 months)
  • Manage access control by subject kind (e.g., recipe-files restricted to authorized personnel)
  • Monitor storage usage by domain
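As an illustration of per-bucket retention, the sketch below checks whether an object has outlived its bucket's policy. The RETENTION table and is_past_retention are hypothetical; the two durations mirror the retention example in the list above, approximated as 3650 and 90 days, and a real zone would configure one entry per bucket.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy table; durations are approximations in days.
RETENTION = {
    "batch-files": timedelta(days=365 * 10),  # roughly 10 years
    "alarm-files": timedelta(days=90),        # roughly 3 months
}


def is_past_retention(bucket, stored_at, now):
    """Return True if an object stored at stored_at has exceeded its bucket's retention."""
    retention = RETENTION.get(bucket)
    if retention is None:
        # Assumption: with no policy configured, keep the object.
        return False
    return now - stored_at > retention
```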

Immutability Rule

Once a fact references an artifact via its digest, the artifact becomes immutable. The producer must not:

  • Overwrite the artifact
  • Delete the artifact
  • Modify the artifact in place

If the producer needs to correct or update an artifact, it must:

  1. Upload the corrected artifact as a new object under a new key (the changed content yields a new digest)
  2. Create a new fact that references the new artifact
  3. Optionally emit a correction fact: predicate: "was_corrected_by", referencing the new artifact

This immutability constraint ensures that the fact-to-artifact link is durable and auditable.
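The correction flow can be sketched as follows. WriteOnceStore and correct_artifact are hypothetical names. The store shown here is stricter than the rule requires, since it rejects every overwrite rather than only overwrites of fact-referenced objects, and giving the correction fact the same subject as the original is an assumption.

```python
import hashlib


class WriteOnceStore:
    """Hypothetical store that rejects any overwrite of an existing key."""

    def __init__(self):
        self._objects = {}

    def put_object(self, bucket, key, blob):
        if (bucket, key) in self._objects:
            raise FileExistsError(f"{bucket}/{key} already exists")
        self._objects[(bucket, key)] = blob
        return {"digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
                "size": len(blob)}


def correct_artifact(store, facts, subject, bucket, new_key, blob, media_type):
    # 1. Upload the corrected bytes as a new object; the old one is untouched.
    ack = store.put_object(bucket, new_key, blob)
    ref = {"bucket": bucket, "key": new_key, "digest": ack["digest"],
           "size": ack["size"], "media_type": media_type}
    # 2. Append a new fact that references the new artifact.
    facts.append({"subject": subject, "predicate": "has_artifact",
                  "object_json": ref})
    # 3. Optionally record the correction itself.
    facts.append({"subject": subject, "predicate": "was_corrected_by",
                  "object_json": ref})
    return ref
```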

Consumer Artifact Retrieval

Consumers retrieve artifacts by:

  1. Read fact: extract the artifact reference fields from object_json
  2. Retrieve artifact: call GetObject(bucket, key) against the zone-local object store
  3. Handle unavailability: if the object is not found, retry (it may be in transit or temporarily unavailable)
  4. Verify digest (optional): hash the retrieved artifact and confirm it matches the recorded digest

Important: Treat artifact unavailability as transient, not permanent. Retry before discarding or alerting.
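A consumer-side retrieval loop might look like the sketch below. fetch_artifact and the two exception classes are hypothetical, and so is the contract that get_object raises KeyError when the object is not found. The attempt count and exponential backoff are illustrative defaults, not documented values.

```python
import hashlib
import time


class ArtifactNotFound(Exception):
    """Raised when the object is still missing after all attempts."""


class DigestMismatch(Exception):
    """Raised when the retrieved bytes do not match the recorded digest."""


def fetch_artifact(get_object, ref, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Fetch ref's object, retrying on not-found, then verify its digest.

    get_object(bucket, key) is assumed to return bytes or raise KeyError.
    """
    blob = None
    for attempt in range(attempts):
        try:
            blob = get_object(ref["bucket"], ref["key"])
            break
        except KeyError:
            # Treat not-found as transient: back off, then try again.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    if blob is None:
        raise ArtifactNotFound(f"{ref['bucket']}/{ref['key']}")
    # Hash the retrieved bytes and compare with the digest in the fact.
    actual = "sha256:" + hashlib.sha256(blob).hexdigest()
    if actual != ref["digest"]:
        raise DigestMismatch(f"expected {ref['digest']}, got {actual}")
    return blob
```

Passing sleep as a parameter keeps the retry loop testable without real delays.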

Artifact Lifecycle Responsibilities

Producer Responsibilities:

  • Upload the artifact before appending the fact
  • Ensure the artifact persists for the configured TTL (typically 7+ days)
  • Do not modify or delete the artifact once a fact references it

Consumer Responsibilities:

  • Retrieve artifacts using the reference metadata
  • Verify the digest when high integrity assurance is required
  • Tolerate transient unavailability (retry)
  • Store or process the artifact before the fact is aged out or truncated

Zone Operator Responsibilities:

  • Configure per-bucket TTL and retention policies
  • Monitor object store capacity and trigger cleanup when thresholds are reached
  • Ensure replication or backup for critical artifact buckets

Orphaned Artifacts and Cleanup

If a fact is never appended (for example, the producer crashes after PutObject, or an upload race occurs), the object remains in the store as an orphan. Zone operators should:

  • Implement a periodic scan to identify objects not referenced by any fact
  • Delete orphaned objects after a grace period (e.g., 1 day)
  • Alert if orphan volume exceeds a configured threshold (this may indicate a producer defect)
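A periodic orphan scan could be sketched as below. find_orphans is a hypothetical helper that assumes the operator can list objects with their upload time and can enumerate the facts in the zone; how those listings are obtained is outside this sketch.

```python
from datetime import datetime, timedelta, timezone


def find_orphans(objects, facts, now, grace=timedelta(days=1)):
    """Return (bucket, key) pairs that no fact references and that are older than grace.

    objects: iterable of (bucket, key, stored_at) tuples.
    facts: iterable of fact dicts whose object_json may hold an artifact reference.
    """
    referenced = set()
    for fact in facts:
        obj = fact.get("object_json")
        # Only facts that carry both bucket and key count as references.
        if isinstance(obj, dict) and "bucket" in obj and "key" in obj:
            referenced.add((obj["bucket"], obj["key"]))
    return [
        (bucket, key)
        for bucket, key, stored_at in objects
        # The grace period protects uploads whose fact has not been appended yet.
        if (bucket, key) not in referenced and now - stored_at > grace
    ]
```

The length of the returned list can feed the orphan-volume alert; deletion is left to the caller.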