בס״ד

CLOUD MEMORY LAYER — Flagship Spec

docs/CLOUD_MEMORY_LAYER.md · last changed (pre-VM history) · rendered from GitHub master

Last updated: 2026-06-25 · LIVING doc (overwrite in place; never rename).
Status: APPROVED-TO-BUILD design. Phase 0 is a HARD GATE — nothing ships through ops-api until it passes.
Owns: the root fix for "my brain is trapped on my PC."
Reconciles 9 facet designs (store/schema · retrieval · API · sync · auto-skills · migration · build-vs-buy · security/privacy · phasing) into ONE buildable plan.
One-line read: the cloud brain already exists in D1 — make it COMPLETE, REACHABLE, and SAFE-to-deploy, in that order.


1. VISION (one paragraph)

Sam's brain — the ~130 .claude topic files + MEMORY.md index + the curated rules in CLAUDE.md — is the most valuable asset in the workspace, and today it is PC-resident: when the laptop is off, the depth is gone, the bot knows titles but not contents, and a new session on another machine starts blind. The fix is not a new store — it is finishing the one Sam already runs. The live D1 database hookstreet-memory (binding MEMORY, id 103ccb68-793a-48b3-97b7-f276f7877a96) already holds a bi-temporal memory catalog (384 rows / 284 current), a working idempotent write path (POST /memory/ingest), a keyword reader (GET /memory/search), and a nightly self-sharpening cron (runNightlyDreaming). The Cloud Memory Layer completes that spine: push the full file BODIES (not just the one-line hooks) into the cloud, add a semantic recall channel beside the exact-token keyword channel, enforce the family/business/Mildred privacy wall in SQL instead of by convention, and harden ops-api so deploying it stops being scary. The result: the PC becomes a writer, not a runtime — Sam's brain answers identically from his phone, the portal, the Telegram bot, and any parallel Claude Code session, with the laptop closed, current, and walled.


2. ARCHITECTURE (diagram-in-words)

ONE store of record, THREE retrieval channels, FOUR assembled tiers, reachable from EVERY surface through ONE Worker.

                         ┌─────────────────── SURFACES (read/write the SAME brain) ──────────────────┐
   Claude Code  ──write/read──┐                                                                       │
   (this + parallel sessions) │   Telegram bot ──read/write──┐   Portal (CF Access) ──read──┐   Mildred ──read(scoped)──┐
                              ▼                               ▼                              ▼                          ▼
                    ╔══════════════════════════════════════ ops-api Worker (Cloudflare) ══════════════════════════════════╗
                    ║  WRITE:  POST /memory/write  (alias: /memory/ingest)   ·   POST /memory/sync   ·   POST /memory/skill ║
                    ║  READ:   GET  /memory/search (keyword | &mode=hybrid)  ·   GET /recall  ·  GET /memory/body  · /skill ║
                    ║  MAINT:  POST /memory/consolidate (manual + nightly cron)                                            ║
                    ║  GATE:   authOf(req,env) → { scopes[], writer? }  — ONE chokepoint, appended to EVERY query          ║
                    ╚══════╤═══════════════════════════╤════════════════════════════╤══════════════════════════╤══════════╝
                           │                           │                            │                          │
                  ┌────────▼─────────┐      ┌──────────▼──────────┐       ┌─────────▼─────────┐      ┌─────────▼─────────┐
                  │  D1  MEMORY      │      │  Vectorize          │       │  R2               │      │  KV  PLAID_ITEMS  │
                  │  (SYSTEM OF      │      │  hookstreet-mem-v1  │       │  hookstreet-      │      │  + 14-day hot pad │
                  │   RECORD)        │      │  768-dim cosine     │       │  memory-bodies    │      │  (threads)        │
                  │  table `memory`  │◄────►│  1 vector / current │◄────► │  COLD full file   │      │                   │
                  │  bi-temporal     │ 1:1  │  row; scope mirrored│ body_ │  bodies (depth)   │      │                   │
                  │  + scope + body_ │ id   │  into metadata      │ uri   │                   │      │                   │
                  │  uri + embed_id  │      │                     │       │                   │      │                   │
                  └──────────────────┘      └─────────────────────┘       └───────────────────┘      └───────────────────┘
                           ▲                                                          ▲
                   git / .claude files (authoring source; PUSHED up, never deleted)   Drive originals (COLD)

2a. STORE (D1 memory — extend in place, never fork)

The memory table is the single fact catalog (the rows ARE the facts — there is no separate physical FACTS table). It already has id, type, chapter, source, visibility, content, status, valid_from, superseded_at + idx_memory_current(status, superseded_at). We add columns additively, NULL-safe, idempotent (WHERE col IS NULL backfill), so all 384 rows and the messages/event_log tables are untouched:

| New column | Purpose | Backfill default |
|---|---|---|
| entry_type | user / feedback / project / reference / fact / skill / loop; mirrors Sam's filename-prefix taxonomy | derived from source/content prefix; profile/context rows → fact |
| scope | business / family / shared / mildred; the cross-domain hard wall (distinct from existing intra-scope visibility) | business (fail-CLOSED, not the schema's legacy both) |
| confidence | REAL; matches the evidence-discipline rule | 0.8 |
| source_tier | fill / confirmation / broker_digest / web_snippet / human / ai | human for curated rows |
| superseded_by | id of the successor → full supersession CHAIN (completes the flat current/retired flag) | NULL |
| body_uri | R2 ref to the full file body (the DEPTH fix) | NULL until M3 |
| embed_id | 1:1 Vectorize vector id | NULL until vectors exist |
| content_hash | re-embed trigger | md5(content) |
| embedded_at | nightly re-embed bookkeeping | NULL |
| updated_at | freshness | now |

New indexes: (entry_type, superseded_at), (scope, superseded_at), (superseded_by). M2 adds loop/skill columns (loop_status, owner, bumper, proof_uri) so loops + skills become first-class entry_types in the SAME table — but Action_Queue (the Sheet) stays system-of-record for ACTIVE tasks per #042 §9; memory only mirrors loops ON CLOSE (no dual-write race).
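
The additive, idempotent migration described above can be sketched as a small planner. This is a minimal sketch, not the shipped migration: the column subset, the SQLite types other than REAL for confidence, and the function name planMigration are assumptions for illustration. SQLite has no ADD COLUMN IF NOT EXISTS, so the sketch takes the list of existing columns (for example, read from PRAGMA table_info(memory)) and emits only what is still missing.

```typescript
// Hypothetical column specs. Names and backfill defaults come from the table
// above; the SQL types (except REAL for confidence) are assumptions.
interface ColumnSpec {
  name: string;
  sqlType: string;
  backfill: string | null; // SQL expression, or null to leave the column NULL
}

const NEW_COLUMNS: ColumnSpec[] = [
  { name: "scope", sqlType: "TEXT", backfill: "'business'" }, // fail-closed default
  { name: "confidence", sqlType: "REAL", backfill: "0.8" },
  { name: "superseded_by", sqlType: "TEXT", backfill: null },
  { name: "body_uri", sqlType: "TEXT", backfill: null },
  { name: "embed_id", sqlType: "TEXT", backfill: null },
];

// Returns the statements still needed, given the columns that already exist.
function planMigration(existing: string[]): string[] {
  const statements: string[] = [];
  for (const col of NEW_COLUMNS) {
    if (existing.indexOf(col.name) === -1) {
      statements.push(`ALTER TABLE memory ADD COLUMN ${col.name} ${col.sqlType}`);
    }
    if (col.backfill !== null) {
      // Touches only rows never backfilled, so a re-run changes nothing.
      statements.push(
        `UPDATE memory SET ${col.name} = ${col.backfill} WHERE ${col.name} IS NULL`
      );
    }
  }
  return statements;
}
```

Running the planner a second time, after the columns exist, yields only the WHERE col IS NULL backfills, which match no rows once the first pass has completed.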

2b. RETRIEVAL (three channels, one assembler)

The GET /recall assembler packs a BOUNDED envelope from four tiers in one round-trip (Hermes always-on core + Letta recall/archival + semantic + temporal), to a hard token ceiling (~6K: core 2K / recent 1K / archival 3K) so retrieval never becomes the new context-bloat:
1. Tier 1 CORE (always-on, ≤2K, never searched — just loads): MEMORY.md index + CLAUDE.md hard-rules, versioned core:v<git-sha>, refuses to serve a core older than the last memory edit.
2. Tier 2 RECALL (recent, ~1K): last N messages + last-touched facts (valid_from DESC).
3. Tier 3 ARCHIVAL (query-driven, ~3K): the ~130 files chunked, hybrid keyword+vector.
4. Tier 4 TEMPORAL (on asof): valid_from <= asof AND (superseded_at IS NULL OR superseded_at > asof).

Returns { core, recent[], facts[], asof, budget_used, sources[] }; budget_used makes the bound VISIBLE.
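
A minimal sketch of the bounded assembler, assuming timestamps are ISO-8601 UTC strings (so string comparison orders them correctly) and a rough estimate of 4 characters per token. The function names, the token estimate, and the choice to hard-trim the core are illustrative assumptions; only the Tier 4 predicate, the tier budgets, and the envelope fields come from the text above.

```typescript
interface MemoryRow {
  id: string;
  content: string;
  valid_from: string;            // ISO-8601 UTC (assumed format)
  superseded_at: string | null;  // null while the row is current
}

// Tier 4 predicate: valid_from <= asof AND (superseded_at IS NULL OR superseded_at > asof).
function validAsOf(row: MemoryRow, asof: string): boolean {
  return row.valid_from <= asof &&
    (row.superseded_at === null || row.superseded_at > asof);
}

// Rough estimate only; ~4 characters per token is an assumption.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Takes rows in priority order and stops before the tier budget is exceeded.
function packTier(rows: MemoryRow[], budget: number): { kept: MemoryRow[]; used: number } {
  const kept: MemoryRow[] = [];
  let used = 0;
  for (const row of rows) {
    const cost = estimateTokens(row.content);
    if (used + cost > budget) break;
    kept.push(row);
    used += cost;
  }
  return { kept, used };
}

const BUDGET = { core: 2000, recent: 1000, archival: 3000 };

function assembleRecall(core: string, recent: MemoryRow[], archival: MemoryRow[], asof: string) {
  const coreText = core.slice(0, BUDGET.core * 4); // hard-trim core to its ceiling
  const r = packTier(recent.filter((row) => validAsOf(row, asof)), BUDGET.recent);
  const a = packTier(archival.filter((row) => validAsOf(row, asof)), BUDGET.archival);
  return {
    core: coreText,
    recent: r.kept,
    facts: a.kept,
    asof,
    budget_used: estimateTokens(coreText) + r.used + a.used,
    sources: r.kept.concat(a.kept).map((row) => row.id),
  };
}
```

Because each tier is packed against its own ceiling, a large archival hit list cannot crowd out the core or recent tiers.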

2c. SYNC (git = prose source of truth; D1 = always-on read mirror; direction-scoped, never 3-way merge)

2d. API (thin additive extend of what ships today)

| Endpoint | Method | Auth | Role |
|---|---|---|---|
| /memory/write (alias /memory/ingest) | POST | writer (INBOX_SECRET/ops-key) | single-fact or batch; scope-stamped; idempotent upsert + event_log row |
| /memory/search | GET | tri-auth (ops-key OR token OR portal referer), but NOT referer-only for family rows | keyword floor + &mode=hybrid |
| /recall | GET | same as search | bounded 4-tier envelope |
| /memory/body | GET | reader | COLD R2 full-body fetch by id |
| /memory/consolidate | POST | master ops-key OR internal cron | server-side merge/dedup/supersede; dryRun returns proposals |
| /memory/skill | POST/GET | writer to register; reader to list/get | auto-skill register (D1 + KV mirror) |
| /memory/sync | POST | master OR INBOX_SECRET | push/pull convergence; {direction, source, since?, items?} |
| /memory/export | GET | token-gated | file-level rehydration for pull |

Every read forces WHERE scope IN (caller.scopes) via the single authOf() chokepoint. /health extends to report secret presence (boolean only, never values) for the deploy gate.
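
One way to make the forced WHERE scope IN (caller.scopes) hard to forget is a single helper that every read query passes through. The sketch below is illustrative: the name withScopeWall and the ScopedQuery shape are assumptions, and it assumes the base SQL already contains a WHERE clause. Scopes are bound as parameters rather than interpolated, and an empty scope list matches nothing.

```typescript
interface ScopedQuery {
  sql: string;
  params: string[];
}

// Appends the scope wall to a query that already has a WHERE clause (assumption).
function withScopeWall(baseSql: string, baseParams: string[], scopes: string[]): ScopedQuery {
  if (scopes.length === 0) {
    // Fail closed: no scopes means no rows, never "all rows".
    return { sql: `${baseSql} AND 1 = 0`, params: baseParams.slice() };
  }
  const placeholders = scopes.map(() => "?").join(", ");
  return {
    sql: `${baseSql} AND scope IN (${placeholders})`,
    params: baseParams.concat(scopes),
  };
}
```

For example, a business-and-shared caller searching current rows would get `... WHERE superseded_at IS NULL AND scope IN (?, ?)` with the two scopes appended to the bound parameters.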


3. BUILD-vs-BUY DECISION

BUILD — extend the D1 layer Sam already runs. Do NOT adopt Mem0 / Letta / Honcho / Zep. The decision is not close. Reasoning, settled by codebase facts not preference:

  1. The layer already exists and works. /memory/ingest (idempotent ids, source-scoped reconcile, append-only event_log), /memory/search (keyword ranker over non-superseded ACTIVE rows), the bi-temporal valid_from/superseded_at columns, and the nightly Dreaming consolidation are already in Sam's code, live, with 284 current rows. That is Mem0's tiers + Zep's temporal supersession + Letta's archival paging — already built. Adopting a SaaS means ripping out a working system to re-solve a solved problem.
  2. Privacy is a HARD blocker, not a preference. Sam's memory is a 3-way family/business/Mildred wall enforced in HIS columns. No managed SaaS models that wall; adopting one ships Chanie's messages, kids' info, card last-4s, and account masks to a vendor DB — tripping the stop-and-ask trigger in feedback_privacy_guardrails.md. Self-hosting their OSS to dodge that = running Postgres+vector = MORE ops, off-Cloudflare.
  3. Cloudflare-native, already wired. D1 bound, Workers AI bound (free embeddings), KV is the hot pad, CF Access gates the portal, the bot already calls /memory/search. Vectorize is one wrangler.toml binding away in the same stack. A SaaS adds a second runtime, bill, auth surface, and outage domain to a stack whose whole virtue is "one provider, one deploy."
  4. Cost. D1 + Vectorize + Workers AI = ~$0–5/mo at single-user scale (~400 vectors is far under any tier). Mem0 Pro / Letta Cloud / Zep = $20–100+/mo for a plane Sam would still have to privacy-wrap.

BUY exactly ONE narrow thing: the embedding model — Workers AI bge-base (free, on-Cloudflare, data never leaves), with OpenAI text-embedding-3-small as a paid fallback ONLY if recall proves weak and ONLY for non-family/non-finance content. That is the only "buy" that survives the privacy wall.

Override clause: if Sam still wants Mem0, the migration is one POST loop into its API — but it ships family/finance data off-Cloudflare and needs self-hosting to stay compliant, which costs MORE ops than the D1 path. Name it; don't bury it.


4. AUTO-SKILLS + THE LOOP (the Hermes move)

When a hard/novel task completes, the lesson becomes a reusable skill so it sticks — written to BOTH hookstreet-skills/<name>/SKILL.md (HOT reload next Claude Code session) AND a D1 row (entry_type='skill', source='skill:<name>') so the bot/portal can answer "do we have a skill for X" with the PC off. Skills become a tier of the cloud memory layer, not a PC folder.

Ritual (gated, never silent autonomy):
1. Trigger when ≥2 hold: novel non-runbooked work · >3 tool-calls/a dead-end before it worked · a costly regression · reusable next month · Sam says "remember how to do this."
2. Dedup FIRST: grep hookstreet-skills/ AND GET /memory/search?q=topic → if a skill covers it, EDIT it (bump Current-State date), don't spawn a near-dupe.
3. Template: copy an existing SKILL.md (YAML name + ≥8-phrase trigger description, then Architecture / Current-State(dated) / Runbook / Verify / Gotchas / Source-trail).
4. Register in 3 places: one-line MEMORY.md index entry · D1 via /memory/skill (idempotent by slug; re-register bumps version + supersedes) · rebuild the .skill bundle via build.ps1.
5. Gate: Claude proposes → Sam gives a one-line confirm → THEN commit (Rule 9 proof artifact + no-auto-send).

The self-sharpening loop (Phase 5): the existing 03:00 ET Dreaming cron (runNightlyDreaming) extends to (a) re-embed rows whose content_hash changed, (b) walk supersession chains, (c) propose merges/stale-retirements (read-only /rethink-style proposals), (d) consolidate entry_type='skill' rows. It PROPOSES; Sam confirms; nothing mutates silently. The first skill written is cloudflare-deploy-safe (Phase 0's lesson) — so the freeze can be lifted permanently and the next session inherits the discipline.


5. SECURITY / PRIVACY ENFORCEMENT (enforce in SQL, don't trust the browser)

Three holes are open in the code TODAY and MUST close before more brain is pushed in:
1. /memory/search accepts a spoofable REFERER as sufficient auth. A forged Referer header returns Sam's facts to anyone with the URL. → Memory reads require a TOKEN; referer alone is drive-by deflection only, never sole auth for memory.
2. The visibility column exists but NO read path filters on it. It is decorative until the WHERE clause enforces it. → add the scope wall (business/family/shared/mildred) and force AND scope IN (caller.scopes) on every read via one authOf() helper.
3. /memory/ingest stores content verbatim with no inspection. A card number / Schwab client-id / sk-/pk- API key would persist into the brain. → secret-redaction write gate: regex-detect (16-digit card, SSN shape, API-key prefixes, Schwab client-id shape) → REFUSE to store the value, write a redacted breadcrumb (the value stays in PropertiesService / wrangler secret / CONTROL tab per the never-commit-secrets rule).
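
The write gate in item 3 could look roughly like the sketch below. The regular expressions are illustrative assumptions, not vetted detectors: a 16-digit card with optional space or dash separators, the SSN shape, and sk-/pk- prefixed keys with an assumed minimum length of 8 characters. The Schwab client-id shape is not specified in this document, so it is left out rather than guessed. The names DETECTORS and redact are hypothetical.

```typescript
interface Detector {
  name: string;
  pattern: RegExp; // global flag so every occurrence is replaced
}

// Assumed patterns for illustration; real detectors would need tuning and tests.
const DETECTORS: Detector[] = [
  { name: "card", pattern: /\b(?:\d[ -]?){15}\d\b/g },
  { name: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { name: "api_key", pattern: /\b(?:sk|pk)-[A-Za-z0-9_-]{8,}/g },
];

interface GateResult {
  redacted: string; // breadcrumb text that is safe to store
  hits: string[];   // which detectors fired
  refused: boolean; // true when the original value must not be stored
}

function redact(content: string): GateResult {
  let redacted = content;
  const hits: string[] = [];
  for (const d of DETECTORS) {
    const next = redacted.replace(d.pattern, `[REDACTED:${d.name}]`);
    if (next !== redacted) hits.push(d.name);
    redacted = next;
  }
  return { redacted, hits, refused: hits.length > 0 };
}
```

When refused is true, the ingest path would store only the redacted breadcrumb, leaving the actual value in its existing secret store.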

Per-surface scope matrix (decided ONCE in authOf):

| Credential | Scopes | Mode |
|---|---|---|
| master ops-key / CF Access (Sam) | business, family, shared, mildred | writer |
| INBOX_SECRET (bot/Claude Code via Apps Script) | business, family, shared | writer; NEVER private/secret |
| MILDRED_READ_TOKEN | business ONLY | read-only; structurally cannot receive family/shared/secret rows even with a forged referer |
| portal referer (Sam-only via CF Access) | business, shared | read-only |
| family page ?as=family | family, shared | read-only |
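
The matrix above can be expressed as a single lookup so the decision really is made once. This is a sketch under assumptions: the credential labels (master, inbox_secret, and so on) are hypothetical names for the five rows, and it starts after the credential has already been identified, whereas the real authOf(req, env) would also do that identification. Unknown credentials get no scopes.

```typescript
type Scope = "business" | "family" | "shared" | "mildred";
type CredentialKind =
  | "master" | "inbox_secret" | "mildred_token" | "portal_referer" | "family_page";

interface Grant {
  scopes: Scope[];
  writer: boolean;
}

// One row per line of the scope matrix; labels are illustrative.
const MATRIX: Record<CredentialKind, Grant> = {
  master: { scopes: ["business", "family", "shared", "mildred"], writer: true },
  inbox_secret: { scopes: ["business", "family", "shared"], writer: true },
  mildred_token: { scopes: ["business"], writer: false },
  portal_referer: { scopes: ["business", "shared"], writer: false },
  family_page: { scopes: ["family", "shared"], writer: false },
};

// Fail closed: anything not in the matrix reads nothing and writes nothing.
function grantFor(kind: string): Grant {
  if (Object.prototype.hasOwnProperty.call(MATRIX, kind)) {
    return MATRIX[kind as CredentialKind];
  }
  return { scopes: [], writer: false };
}

function canRead(kind: string, rowScope: Scope): boolean {
  return grantFor(kind).scopes.indexOf(rowScope) !== -1;
}
```

With this shape, the Phase 2 acceptance check (a Mildred-token read of a family row returns empty) reduces to canRead("mildred_token", "family") being false.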

Wall guarantees: user_private_dates.md is NEVER ingested. Family-walled rows carry scope='family'; the Mildred token's SQL filter makes them unreturnable. Vectorize mirrors scope into vector metadata AND the scope is re-checked in D1, so a cosine match can't leak family→business as a side-channel. CF Access path-scoping for Mildred must land too: hs-core.js currently carries the master OPS_READ_TOKEN (MILDRED_SERVER_SCOPING.md L33-45), so without path-scoping she can lift it, and memory privacy is undone if that gate is skipped.

Blast radius if a key leaks: INBOX_SECRET leak is worst (a WRITE key that can poison the brain) → split a dedicated MEMORY_WRITE_TOKEN distinct from the queue INBOX_SECRET so a queue-bot leak can't rewrite memory. OPS_READ_TOKEN leak = read of the whole brain except secret-tier; it's the widest-exposed key (lives in portal assets) → keep it rotation-ready, move it out of static assets in Phase 3. Mildred token leak = only her business-scoped cards. /health exposes booleans only, never values; nothing logs secrets.


6. PHASED ROLLOUT (smallest-reversible-first; Phase 0 = deploy-safety, the HARD gate)

Anti-80% rule: each phase ships, Sam tests, THEN the next begins. STOP after any phase and the system is strictly better, never half-broken. Net loop count trends DOWN.

PHASE 0 — DEPLOY-SAFETY (gates EVERYTHING; ~1 session)

ops-api is FROZEN because a bare wrangler deploy is believed to strand the ~30 write-only secrets (broke prod twice; recovered via rollback 8523bfaf). Prove a safe path before ANY code ships:
- 0a: Capture the manifest. Run wrangler secret list → ops-api/SECRETS_MANIFEST.md (NAMES only, gitignored, values never read). This is the recovery sheet.
- 0b — Prove persistence with a no-op: change one comment → wrangler deploy (or wrangler versions upload then versions deploy) → wrangler secret list again → confirm count unchanged. This converts the fear into a tested fact. (Modern wrangler preserves secrets across deploys; the real risk is deploying from a clean checkout lacking vars/bindings, or a service-token reauth.) If any secret IS missing, re-put from a gitignored .dev.vars mirror.
- 0c — Build the rail: scripts/deploy-ops-api-safe.ps1 (Worker analog of tools/svc-deploy): wrangler versions upload (preview, NO traffic) → curl the preview /health → ABORT promotion if any secret-presence boolean or the 5-cron count regressed → only then wrangler versions deploy. Uses CLOUDFLARE_API_TOKEN (the no-reauth path from project_kill_reauth_loop.md).
- 0d — Codify the skill: write cloudflare-deploy-safe (the auto-skill pattern) so the lesson sticks and the freeze lifts permanently.

DONE = no-op deploy verified secret-count-stable + crons stable (5→5) + /health 200, AND the rail exists and refuses a regressing deploy. Reversible: Phase 0 changes ZERO behavior — it only adds a guard.
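
The promotion check in step 0c reduces to comparing two /health snapshots. The sketch below assumes a hypothetical /health shape of { secrets: { NAME: boolean }, crons: number }; the actual response format is not specified in this document. It returns the list of reasons to abort, so an empty list means the preview is safe to promote.

```typescript
// Assumed /health shape for illustration; booleans only, never secret values.
interface Health {
  secrets: Record<string, boolean>;
  crons: number;
}

// Returns the reasons promotion must be aborted; empty means safe to promote.
function promotionBlockers(baseline: Health, preview: Health): string[] {
  const blockers: string[] = [];
  for (const name of Object.keys(baseline.secrets)) {
    // A secret that was present before must still be present in the preview.
    if (baseline.secrets[name] === true && preview.secrets[name] !== true) {
      blockers.push(`secret missing: ${name}`);
    }
  }
  if (preview.crons < baseline.crons) {
    blockers.push(`cron count regressed: ${baseline.crons} -> ${preview.crons}`);
  }
  return blockers;
}
```

A deploy script would call this with the live /health as the baseline and the uploaded preview's /health as the candidate, and run wrangler versions deploy only when the result is empty.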

PHASE 1 — WRITE-THROUGH + DEPTH-OF-HOOKS (the "PC off" win; ~1 session, NO code deploy)

/memory/ingest already exists and already holds 167 claude-code hook rows — Phase 1 makes that push COMPLETE + AUTOMATIC, no new bindings, no Worker change.
- scripts/brain-sync.ps1 push reads MEMORY.md + every memory/*.md, POSTs with source='claude-code:<file>', reconcile:true. Added to session-CLOSE ritual + the Dreaming cron.
- Mirror the 3 living docs the brain needs to answer "where were we": CONTEXT.md latest-2 sessions, SCOPE_BACKLOG.md open loops, CLAUDE.md current-status (source='context-tab'/'scope-tab').

DONE = with the laptop OFF, Sam asks the Telegram bot a fact that lives only in a topic file and the bot returns it from /memory/search. Reversible: pure additive writes; stop the script, nothing breaks.

PHASE 2 — EVERY SURFACE READS THE SAME BRAIN + the scope wall (~1 session)

DONE = one canary fact written from the phone is retrievable identically from bot + portal + a fresh Claude Code session; AND a Mildred-token read of a family row returns empty. Reversible: surfaces are read-only consumers; the wall is additive.

PHASE 3 — DEPTH via R2 (full file BODIES; ~1 session; first BINDING → through the Phase 0 rail)

Add R2 bucket hookstreet-memory-bodies; ingest full bodies keyed body_uri; D1 keeps hook + body_uri + content_hash; GET /memory/body?id= streams COLD. The load-bearing depth fix — the brain stops being shallow.

DONE = bot/portal fetch the FULL text of any topic file (e.g. all of MIS_FSE_ARCHITECTURE reasoning, not its hook) with the PC off; secret + cron counts verified stable through the deploy. Reversible: R2 is a sibling store; drop the binding, hooks still answer.

PHASE 4 — SEMANTIC RECALL (Vectorize hybrid; ~1 session; second binding → the rail)

Bind vectorize (768-dim cosine); embed each chunk via env.AI bge-base; upsert 1:1 by id with scope in metadata. /memory/search gains &mode=hybrid (RRF merge, dedup, scope-filtered in BOTH Vectorize metadata and D1). Falls back to pure keyword if VECTORIZE unbound.

DONE = a paraphrased question ("how do I handle losing positions for taxes") returns the wash-sale fact keyword-only missed, AND a family-scoped fact is provably absent from a business query. Reversible: hybrid is opt-in via &mode; default stays keyword.
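
The RRF merge named above can be sketched as follows. Reciprocal rank fusion scores each id as the sum of 1 / (k + rank) over the lists it appears in; k = 60 is a commonly used default and is assumed here. The function name rrfMerge is hypothetical, each input list is assumed to hold unique ids already filtered by scope, and passing null for the vector list models the keyword-only fallback when VECTORIZE is unbound.

```typescript
// Merges two ranked id lists with reciprocal rank fusion; dedup falls out of
// keying scores by id. Ties are broken by id so the order is deterministic.
function rrfMerge(keywordIds: string[], vectorIds: string[] | null, k: number = 60): string[] {
  if (vectorIds === null) {
    return keywordIds.slice(); // pure keyword fallback
  }
  const scores: Record<string, number> = {};
  const lists: string[][] = [keywordIds, vectorIds];
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] || 0) + 1 / (k + rank);
    });
  }
  return Object.keys(scores).sort((x, y) => {
    const diff = scores[y] - scores[x];
    if (diff !== 0) return diff;
    return x < y ? -1 : x > y ? 1 : 0;
  });
}
```

An id that ranks well in both channels outranks one that tops only a single channel, which is the behavior the paraphrase test in the DONE criterion relies on.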

PHASE 5 — SELF-SHARPENING LOOP + AUTO-SKILL (ongoing)

Extend the existing Dreaming cron (MULTIPLEX onto 0 7 * * *; never add a 6th cron, free-plan cap is 5) to re-embed changed rows, walk supersession chains, propose merges/retirements (read-only), consolidate skills. Auto-skill-generation per §4.

DONE = a 30-day-old changed fact is found superseded (not stale) without Sam typing anything, and ≥1 auto-generated skill exists. Reversible: Dreaming proposes; Sam confirms.
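
Walking a supersession chain, as the cron is meant to do, amounts to following superseded_by until it is NULL. The sketch below is illustrative only: the name resolveCurrent and the choice to return null on a cycle or a dangling pointer are assumptions, made so that a broken chain surfaces as a proposal for review instead of looping forever.

```typescript
interface ChainRow {
  id: string;
  superseded_by: string | null; // id of the successor, or null when current
}

// Follows superseded_by links from startId to the current row's id.
// Returns null if the chain is broken (missing row) or loops back on itself.
function resolveCurrent(startId: string, rows: ChainRow[]): string | null {
  const byId: Record<string, ChainRow | undefined> = {};
  for (const row of rows) {
    byId[row.id] = row;
  }
  const seen: Record<string, boolean> = {};
  let cursor: string | null = startId;
  let last: string | null = null;
  while (cursor !== null) {
    if (seen[cursor] === true) return null; // cycle guard
    const row: ChainRow | undefined = byId[cursor];
    if (row === undefined) return null;     // dangling pointer
    seen[cursor] = true;
    last = row.id;
    cursor = row.superseded_by;
  }
  return last;
}
```

Since supersession is soft, every retired row stays in the table, so the walk can always be replayed from any historical id.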

Migration/lossless guarantees across ALL phases: all 384 D1 rows preserved · every ALTER additive + NULL-safe · backfill idempotent (WHERE col IS NULL) · messages/event_log untouched · the ~130 PC files are PUSHED never moved (authoring source stays on disk + git) · supersession is SOFT (superseded_at, never hard-delete) · git + D1 Time Travel are the dual backstops · nightly wrangler d1 export to R2 = off-region backup.


7. WHAT SHIPS FIRST — "brain works from my phone, PC off"

Phase 0 → Phase 1, in two sessions, is the whole headline deliverable. The unlock is an inversion: the surface that already runs without the PC (the Telegram bot, calling /memory/search on Cloudflare's edge) becomes the read path; the PC's only remaining job is write-through (push the brain up at session-close). Once pushed, the brain is reachable with the laptop closed — no new bindings, no risky deploy (Phase 1 uses the endpoints that already exist).

Concrete first-session sequence:
1. Phase 0b no-op deploy test — prove secrets survive (the single cheapest, highest-leverage action; it may show the freeze was a misdiagnosis and lift it on the spot).
2. scripts/brain-sync.ps1 push — backfill all ~130 files + MEMORY.md + the 3 living docs into D1 via /memory/ingest (idempotent; safe to re-run).
3. Acceptance test (Rule 9 proof artifact): close the laptop → ask the bot 5 known facts whose answers live only in topic files → all 5 return. Capture the result to outputs/ as a dated audit.

That is the moment the root pain — "my brain is trapped on my PC" — retires. Everything after (the scope wall, R2 depth, Vectorize semantics, the self-sharpening loop) deepens and hardens a brain that is, by end of session two, already cloud-resident and phone-reachable.


Source trail

Source trail · docs/CLOUD_MEMORY_LAYER.md @ master · rendered 2026-07-02 7:23 PM EDT by scripts/build-docs.py · the .md in the repo is the truth; this page is the phone-readable view