FME — Failure Grid (LOCKED)
Message/asset loss = system failure. Every failure path degrades to "captured raw," never to "lost." Locked 2026-06-05 (ZW-ENGINE-V9).
The #1 rule — capture never blocks
Order is ALWAYS: Capture → Save → Acknowledge, THEN (async) Classify → File → Confirm.
- Nothing is ever rejected. A bad classifier can NEVER break intake.
- Worst case, an item's chapter = Unfiled.
- NEVER REQUIRE FILING — the human never picks a chapter. One message / voice / photo → done.
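A minimal sketch of the ordering above, assuming an in-memory list as a stand-in for durable storage and a pluggable classifier. The names `Intake`, `capture`, and `run_async_stage` are hypothetical and do not come from the engine itself. It only illustrates that the acknowledgement never waits on classification and that a broken classifier leaves the item Unfiled.

```python
from dataclasses import dataclass, field
from typing import Callable

UNFILED = "Unfiled"


@dataclass
class Item:
    raw: str
    chapter: str = UNFILED  # default until the async stage files it


@dataclass
class Intake:
    classify: Callable[[str], str]                # hypothetical classifier hook
    store: list = field(default_factory=list)     # stand-in for durable storage
    pending: list = field(default_factory=list)   # stand-in for the async queue

    def capture(self, raw: str) -> str:
        """Capture -> Save -> Acknowledge. No classification happens here."""
        item = Item(raw=raw)
        self.store.append(item)    # save the original first
        self.pending.append(item)  # schedule Classify -> File for later
        return "ack"               # acknowledge without waiting on enrichment

    def run_async_stage(self) -> None:
        """Classify -> File. A classifier error leaves the chapter as Unfiled."""
        while self.pending:
            item = self.pending.pop(0)
            try:
                item.chapter = self.classify(item.raw)
            except Exception:
                item.chapter = UNFILED  # degrade, never reject


def broken_classifier(_text: str) -> str:
    raise RuntimeError("classifier down")


intake = Intake(classify=broken_classifier)
print(intake.capture("one voice note"))  # capture succeeds regardless
intake.run_async_stage()
print(intake.store[0].chapter)
```

The caller sends one message and never supplies a chapter. Filing is entirely the async stage's job.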
Failure decision grid
| Failure | Behavior |
|---|---|
| Telegram down | save to thread, retry (queue the outbound) |
| Cloudflare KV unavailable | queue locally, replay when back; never drop |
| Claude (summary) down | store raw transcript / raw text only; skip the summary, mark AI_UNREVIEWED |
| Whisper (transcribe) down | forward the raw audio file; keep original_uri |
| Drive down | store a pending-attachment marker + retry; keep the Telegram file_id |
| Bot token revoked / deploy fail | alert Sam (tg.ps1 -Source System) |
| Worker restart mid-request | idempotent writes (dedup by update_id / FLB-id); replay |
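As one illustration of the grid, the "queue locally, replay when back; never drop" row could look roughly like the sketch below. `FlakyKV` is a hypothetical stand-in for the remote store, not the Cloudflare KV API, and `BufferedWriter` is an assumed helper name. The local queue here lives in memory only, so a real version would need a buffer that survives restarts.

```python
from collections import deque


class KVUnavailable(Exception):
    """Raised by the stand-in store while it is down."""


class FlakyKV:
    """Hypothetical stand-in for a remote key-value store that can go down."""

    def __init__(self):
        self.data = {}
        self.up = True

    def put(self, key, value):
        if not self.up:
            raise KVUnavailable(key)
        self.data[key] = value


class BufferedWriter:
    """Queue every write locally, then drain the queue in order."""

    def __init__(self, kv):
        self.kv = kv
        self.local = deque()

    def write(self, key, value):
        self.local.append((key, value))  # never drop: enqueue before trying
        return self.replay()

    def replay(self):
        """Drain in order; stop at the first failure and keep the rest.

        Returns the number of writes still waiting.
        """
        while self.local:
            key, value = self.local[0]
            try:
                self.kv.put(key, value)
            except KVUnavailable:
                return len(self.local)
            self.local.popleft()  # remove only after a successful put
        return 0


kv = FlakyKV()
writer = BufferedWriter(kv)
kv.up = False
writer.write("update:1", "raw text")
kv.up = True
print(writer.replay(), kv.data)
```

An entry is removed from the local queue only after the put succeeds, so an outage in the middle of a replay leaves the remaining items queued in their original order.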
Invariants
- Every inbound writes the original before any processing.
- Every write is idempotent (dedup by Telegram `update_id` and/or `FLB-id`).
- A failed enrichment (summary/classify/transcribe) NEVER fails the capture; it just leaves the enriched fields empty + `review_state: AI_UNREVIEWED`.
- All of the above is exercised in the Phase 1A test plan (kill KV / Telegram briefly → message still captured raw).
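The invariants can be sketched as follows, assuming a plain dict keyed by `update_id` in place of the real store. The function names and record layout are hypothetical. The sketch also assumes a record starts in `AI_UNREVIEWED` at capture time. The source does not name a state for successful enrichment, so the sketch leaves `review_state` untouched in that case.

```python
def capture_update(store: dict, update_id: int, raw: str) -> bool:
    """Write the original before any processing.

    Returns False when this update_id is already stored (a replayed
    delivery), which makes the write safe to repeat.
    """
    if update_id in store:
        return False
    store[update_id] = {
        "raw": raw,
        "summary": None,                   # enriched field, empty until filled
        "review_state": "AI_UNREVIEWED",   # assumed initial state
    }
    return True


def enrich(store: dict, update_id: int, summarize) -> bool:
    """Try to fill the enriched field; a failure never touches the raw capture."""
    record = store[update_id]
    try:
        record["summary"] = summarize(record["raw"])
    except Exception:
        record["summary"] = None
        record["review_state"] = "AI_UNREVIEWED"
        return False
    # The success state is not specified in the source, so it is left as-is.
    return True


def summary_down(_text: str) -> str:
    raise TimeoutError("summary service down")


store = {}
capture_update(store, 101, "voice note text")
capture_update(store, 101, "voice note text")  # replay after a restart: no-op
enrich(store, 101, summary_down)
print(len(store), store[101]["review_state"])
```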