FME — Storage Validation Report (hookstreet-memory / D1)
Demanded by ZW-ENGINE-V9 before any search layer. No retrieval is built until this passes. Verified by live queries against the production D1, not from memory. Generated 2026-06-07. KV is now correctly demoted to cache (see hierarchy).
Architecture hierarchy (LOCKED)
Original Artifact (Telegram file / text) ← source of truth, recoverable via original_ref
↓
D1 hookstreet-memory (PERMANENT memory) ← never rolls off
↓
Family Memory Index (memory table) ← searchable catalog (Phase 2 — not yet populated)
↓
KV PLAID_ITEMS:*:thread (CACHE / buffer) ← fast, last 50 msgs / 14 days. NOT the store.
1. Schema (verified live via sqlite_master)
- messages (the permanent log): id TEXT PRIMARY KEY · person TEXT NOT NULL (chanie|family|mildred) · sender TEXT NOT NULL · text TEXT NOT NULL · kind TEXT (text|voice|photo) · ts INTEGER NOT NULL (epoch ms, UTC) · created TEXT (ISO 8601, UTC) · search_text TEXT · original_ref TEXT.
- Indexes: idx_messages_person_ts (person, ts) · idx_messages_search (search_text).
- memory (the object-model Index: FLB ids, chapter, entities, tags, summary, original_uri, review_state, lifespan…): created, not yet populated (Phase 2 classification). Honest gap, by design.
- event_log (immutable audit): id INTEGER PK AUTOINCREMENT · ts · kind · ref · note.
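For reference, the messages columns above can be mirrored as a TypeScript row type with a small shape check. This is a sketch, not code from ops-api: MessageRow and checkMessageRow are hypothetical names, and whether the (chanie|family|mildred) and (text|voice|photo) value sets are enforced by CHECK constraints in D1 is not stated in this report, so the check below only documents them.

```typescript
type Person = "chanie" | "family" | "mildred";
type Kind = "text" | "voice" | "photo";

// Mirrors the messages columns listed in this section.
// Columns without NOT NULL in the schema are typed as nullable.
type MessageRow = {
  id: string;
  person: Person;
  sender: string;
  text: string;
  kind: Kind | null;
  ts: number; // epoch ms, UTC
  created: string | null; // ISO 8601, UTC
  search_text: string | null;
  original_ref: string | null;
};

const PERSONS: readonly string[] = ["chanie", "family", "mildred"];
const KINDS: readonly string[] = ["text", "voice", "photo"];

// Hypothetical helper: returns a list of problems, empty when the row fits.
function checkMessageRow(row: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const col of ["id", "person", "sender", "text"]) {
    if (typeof row[col] !== "string") {
      problems.push(`${col}: expected non-null TEXT`);
    }
  }
  const person = row["person"];
  if (typeof person === "string" && PERSONS.indexOf(person) === -1) {
    problems.push(`person: unexpected value "${person}"`);
  }
  const ts = row["ts"];
  if (typeof ts !== "number" || !Number.isInteger(ts)) {
    problems.push("ts: expected INTEGER epoch milliseconds");
  }
  const kind = row["kind"];
  const kindOk =
    kind === null ||
    kind === undefined ||
    (typeof kind === "string" && KINDS.indexOf(kind) !== -1);
  if (!kindOk) {
    problems.push("kind: expected text, voice, photo, or NULL");
  }
  return problems;
}
```

A check like this could run over rows read back from D1 during validation; it does not replace the sqlite_master verification.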
2. Write path (single, audited)
Every family/Mildred message → saveToMemory() in ops-api/src/index.ts — one function, called from /chanie/send, /family/send, /mildred/send after the KV write and after the dedup gate.
- Stores id, person, sender, original text, kind, UTC ts+created, search_text (normalized lowercase), original_ref (the Telegram file_id for voice/photo → original audio re-downloadable via the bot).
- Also writes a capture row to event_log per message. ✅ verified: event_log incremented +2 across the two-message test.
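The row-building step described above can be sketched as a pure function. This is an illustration under assumptions, not the actual saveToMemory() body: buildRows, normalizeForSearch, the whitespace collapsing, and the event_log values (kind "capture", ref set to the message id, the note format) are all hypothetical; the report only states that search_text is normalized lowercase and that one capture row is written per message.

```typescript
type Person = "chanie" | "family" | "mildred";
type Kind = "text" | "voice" | "photo";

interface Incoming {
  person: Person;
  sender: string;
  text: string;
  kind: Kind;
  originalRef: string | null; // e.g. "tg-voice:<file_id>" for voice
}

interface MessageRow {
  id: string;
  person: Person;
  sender: string;
  text: string;
  kind: Kind;
  ts: number;
  created: string;
  search_text: string;
  original_ref: string | null;
}

interface EventRow {
  ts: number;
  kind: string;
  ref: string;
  note: string;
}

// Assumption: lowercase plus whitespace collapsing. The report only says
// "normalized lowercase".
function normalizeForSearch(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Builds the messages row and its event_log capture row for one message.
function buildRows(
  msg: Incoming,
  id: string,
  nowMs: number
): { message: MessageRow; event: EventRow } {
  const message: MessageRow = {
    id,
    person: msg.person,
    sender: msg.sender,
    text: msg.text, // original text is stored unmodified
    kind: msg.kind,
    ts: nowMs,
    created: new Date(nowMs).toISOString(), // always UTC
    search_text: normalizeForSearch(msg.text),
    original_ref: msg.originalRef,
  };
  // Assumed event_log values; the report does not list them.
  const event: EventRow = {
    ts: nowMs,
    kind: "capture",
    ref: id,
    note: `${msg.person}/${msg.kind}`,
  };
  return { message, event };
}
```

Keeping the original text and the normalized search_text in separate columns means the search form can change later without touching what was actually sent.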
3. Failure behavior (verified by design + code)
- The D1 write is wrapped in try/catch and runs after the message has already landed in KV (and the portal/thread). If D1 is down, the message still delivers; it is never lost, and only the permanent copy is skipped (re-syncable). ✅ fail-open for delivery.
- KV ↔ D1 independence: the KV write happens first and is awaited; D1 is best-effort. KV-success/D1-fail → message delivered, permanent copy missed (acceptable). D1-success/KV-fail → the KV failure returns earlier (kv_unbound 500), before D1 is reached, so the two never half-commit silently.
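The ordering above (KV first and awaited, D1 best-effort) can be sketched as follows. This is a minimal model under assumptions, not the ops-api handler: ThreadCache and PermanentStore are hypothetical stand-ins for the KV and D1 bindings, sendSketch and SendResult are invented names, and how a KV write that throws is reported is not described in this report, so here it simply propagates.

```typescript
interface Msg {
  id: string;
  text: string;
}

// Hypothetical stand-ins for the KV thread cache and the D1 store.
interface ThreadCache {
  append(msg: Msg): Promise<void>;
}
interface PermanentStore {
  save(msg: Msg): Promise<void>;
}

interface SendResult {
  status: number;
  delivered: boolean;
  persisted: boolean;
  error?: string;
}

async function sendSketch(
  msg: Msg,
  kv: ThreadCache | undefined,
  d1: PermanentStore
): Promise<SendResult> {
  // KV missing: return early, D1 is never reached.
  if (!kv) {
    return { status: 500, delivered: false, persisted: false, error: "kv_unbound" };
  }

  // Awaited: a throw here propagates, so D1 is still not reached.
  await kv.append(msg);

  // Best-effort permanent copy: a D1 failure must not undo delivery.
  try {
    await d1.save(msg);
    return { status: 200, delivered: true, persisted: true };
  } catch (err) {
    return { status: 200, delivered: true, persisted: false, error: String(err) };
  }
}
```

Surfacing persisted: false (rather than swallowing the error silently) is a design choice of this sketch; it would give a later re-sync job something to look for.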
4. Duplicate prevention (verified)
- Dedup gate before the D1 write: identical (normalized from+text) within 12s of the immediately preceding thread message → returns {deduped:true}, never reaches D1. ✅ tested green earlier (back-to-back identical dropped).
- Limitation (honest): it compares only the last message, so an interleaved duplicate (A, B, A) isn't caught. The real-world bug (webhook retry / iOS double-send) is always back-to-back, so this covers it. No D1 unique-constraint yet (the gate is the guard).
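The gate and its limitation can be illustrated with a small sketch. Assumptions: the normalization is trim plus lowercase, the 12s window is inclusive at the boundary, and ThreadMsg, isDuplicate, and dedupGate are hypothetical names; the report states only that normalized from+text is compared against the immediately preceding thread message.

```typescript
interface ThreadMsg {
  from: string;
  text: string;
  ts: number; // epoch ms
}

const DEDUP_WINDOW_MS = 12 * 1000;

// Assumed normalization; the report does not spell it out.
function normalize(s: string): string {
  return s.trim().toLowerCase();
}

// True when `next` repeats `last` (same normalized from+text) within the window.
function isDuplicate(last: ThreadMsg | undefined, next: ThreadMsg): boolean {
  if (last === undefined) return false;
  const sameContent =
    normalize(last.from) === normalize(next.from) &&
    normalize(last.text) === normalize(next.text);
  return sameContent && Math.abs(next.ts - last.ts) <= DEDUP_WINDOW_MS;
}

// Compares only against the last thread message, as the gate does.
function dedupGate(
  thread: readonly ThreadMsg[],
  next: ThreadMsg
): { deduped: boolean } {
  const last = thread.length > 0 ? thread[thread.length - 1] : undefined;
  return { deduped: isDuplicate(last, next) };
}
```

With a thread of [A] and an incoming copy of A three seconds later, the gate reports deduped: true. With a thread of [A, B], the same copy of A passes, because only B is compared. That is the interleaved case the limitation describes.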
5. IDs / timestamps / original preservation
- IDs: every row has id = msg-<ts>-<rand> (unique). ✅ (The memory Index uses FLB-… ids in Phase 2.)
- Timestamps: ts epoch-ms + created ISO, both UTC. ✅
- Original artifact: text → the text field is the original. Voice/photo → original_ref = tg-voice:<file_id>, so the original audio is recoverable by re-fetching from Telegram via the bot. ✅ verified original_ref stored. Caveat: recoverability depends on Telegram retaining the file (bots can re-fetch; retention is long but not contractually forever). For true forever, a future step copies the audio to Drive/R2.
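The id, timestamp, and original_ref conventions above can be sketched as small helpers. These are illustrative only: the function names are hypothetical, the length and alphabet of the random suffix are assumptions (the report gives only the msg-<ts>-<rand> pattern), and only the tg-voice: prefix is handled because no other prefix is stated.

```typescript
const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Assumed suffix format: `length` characters drawn from a-z0-9.
function randomSuffix(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += ALPHABET.charAt(Math.floor(Math.random() * ALPHABET.length));
  }
  return out;
}

// id = msg-<ts>-<rand>
function makeMessageId(tsMs: number): string {
  return `msg-${tsMs}-${randomSuffix(6)}`;
}

// ts (epoch ms) and created (ISO 8601) from the same instant, both UTC.
function stamp(nowMs: number): { ts: number; created: string } {
  return { ts: nowMs, created: new Date(nowMs).toISOString() };
}

// original_ref for a voice message.
function voiceRef(fileId: string): string {
  return `tg-voice:${fileId}`;
}

// Recovers the Telegram file_id from an original_ref, or null if the
// value is absent or uses a different prefix.
function fileIdFromRef(ref: string | null): string | null {
  const prefix = "tg-voice:";
  if (ref === null || ref.indexOf(prefix) !== 0) return null;
  return ref.slice(prefix.length);
}
```

Deriving ts and created from one clock reading keeps the two columns consistent. The file_id recovered by fileIdFromRef is what a re-fetch through the Telegram Bot API getFile method would need.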
6. Backup
- Cloudflare D1 Time Travel = automatic point-in-time restore (≈30 days) at the platform level. No manual backup wired yet; a periodic export to R2/Drive is the future belt-and-suspenders.
Verdict
The storage foundation is REAL and durable for the raw message log: verified schema, fail-open writes, dedup, UTC timestamps, search_text, original_ref, and an append-only audit. Open (by design, not bugs): the memory Index (chapters/entities) is unpopulated until Phase 2; kind=voice detection is correct but only production-verified (CLI can't transmit the emoji); original-audio recoverability leans on Telegram retention. Search is NOT built and will not be until classification populates the Index.