CLOUD MEMORY LAYER — Flagship Spec
Last updated: 2026-06-25 · LIVING doc (overwrite in place; never rename).
Status: APPROVED-TO-BUILD design. Phase 0 is a HARD GATE — nothing ships through ops-api until it passes.
Owns: the root fix for "my brain is trapped on my PC."
Reconciles 9 facet designs (store/schema · retrieval · API · sync · auto-skills · migration · build-vs-buy · security/privacy · phasing) into ONE buildable plan.
One-line read: the cloud brain already exists in D1 — make it COMPLETE, REACHABLE, and SAFE-to-deploy, in that order.
1. VISION (one paragraph)
Sam's brain — the ~130 .claude topic files + MEMORY.md index + the curated rules in CLAUDE.md — is the most valuable asset in the workspace, and today it is PC-resident: when the laptop is off, the depth is gone, the bot knows titles but not contents, and a new session on another machine starts blind. The fix is not a new store — it is finishing the one Sam already runs. The live D1 database hookstreet-memory (binding MEMORY, id 103ccb68-793a-48b3-97b7-f276f7877a96) already holds a bi-temporal memory catalog (384 rows / 284 current), a working idempotent write path (POST /memory/ingest), a keyword reader (GET /memory/search), and a nightly self-sharpening cron (runNightlyDreaming). The Cloud Memory Layer completes that spine: push the full file BODIES (not just the one-line hooks) into the cloud, add a semantic recall channel beside the exact-token keyword channel, enforce the family/business/Mildred privacy wall in SQL instead of by convention, and harden ops-api so deploying it stops being scary. The result: the PC becomes a writer, not a runtime — Sam's brain answers identically from his phone, the portal, the Telegram bot, and any parallel Claude Code session, with the laptop closed, current, and walled.
2. ARCHITECTURE (diagram-in-words)
ONE store of record, THREE retrieval channels, FOUR assembled tiers, reachable from EVERY surface through ONE Worker.
┌────────────────────────────── SURFACES (read/write the SAME brain) ──────────────────────────────┐
│ Claude Code                Telegram bot                 Portal                   Mildred         │
│ (this + parallel sessions)                              (CF Access)                              │
│ write/read                 read/write                   read                     read (scoped)   │
└─────────┬───────────────────────────┬───────────────────────────┬──────────────────────┬─────────┘
          ▼                           ▼                           ▼                      ▼
╔══════════════════════════════════ ops-api Worker (Cloudflare) ═══════════════════════════════════╗
║ WRITE: POST /memory/write (alias: /memory/ingest) · POST /memory/sync · POST /memory/skill       ║
║ READ:  GET /memory/search (keyword | &mode=hybrid) · GET /recall · GET /memory/body · /skill     ║
║ MAINT: POST /memory/consolidate (manual + nightly cron)                                          ║
║ GATE:  authOf(req,env) → { scopes[], writer? } · ONE chokepoint, appended to EVERY query         ║
╚═════════╤═══════════════════════════╤═══════════════════════════╤══════════════════════╤═════════╝
          │                           │                           │                      │
┌─────────▼─────────┐       ┌─────────▼─────────┐       ┌─────────▼─────────┐  ┌─────────▼─────────┐
│ D1 MEMORY         │       │ Vectorize         │       │ R2                │  │ KV PLAID_ITEMS    │
│ (SYSTEM OF        │       │ hookstreet-mem-v1 │       │ hookstreet-       │  │ + 14-day hot pad  │
│ RECORD)           │       │ 768-dim cosine    │       │ memory-bodies     │  │ (threads)         │
│ table `memory`    │◄─────►│ 1 vector /        │◄─────►│ COLD full file    │  │                   │
│ bi-temporal       │  1:1  │ current row;      │ body_ │ bodies (depth)    │  │                   │
│ + scope + body_   │  id   │ scope mirrored    │  uri  │                   │  │                   │
│ uri + embed_id    │       │ into metadata     │       │                   │  │                   │
└───────────────────┘       └───────────────────┘       └───────────────────┘  └───────────────────┘
          ▲                                                       ▲
git / .claude files                                     Drive originals (COLD)
(authoring source; PUSHED up, never deleted)
2a. STORE (D1 memory — extend in place, never fork)
The memory table is the single fact catalog (the rows ARE the facts — there is no separate physical FACTS table). It already has id, type, chapter, source, visibility, content, status, valid_from, superseded_at + idx_memory_current(status, superseded_at). We add columns additively, NULL-safe, idempotent (WHERE col IS NULL backfill), so all 384 rows and the messages/event_log tables are untouched:
| New column | Purpose | Backfill default |
|---|---|---|
| `entry_type` | user / feedback / project / reference / fact / skill / loop; mirrors Sam's filename-prefix taxonomy | derived from source/content prefix; profile/context rows → fact |
| `scope` | business / family / shared / mildred; the cross-domain hard wall (distinct from existing intra-scope visibility) | `business` (fail-CLOSED, not the schema's legacy `both`) |
| `confidence REAL` | matches the evidence-discipline rule | 0.8 |
| `source_tier` | fill / confirmation / broker_digest / web_snippet / human / ai | human for curated rows |
| `superseded_by` | id of the successor → full supersession CHAIN (completes the flat current/retired flag) | NULL |
| `body_uri` | R2 ref to the full file body (the DEPTH fix) | NULL until M3 |
| `embed_id` | 1:1 Vectorize vector id | NULL until vectors exist |
| `content_hash` | re-embed trigger | `md5(content)` |
| `embedded_at` | nightly re-embed bookkeeping | NULL |
| `updated_at` | freshness | now |
New indexes: (entry_type, superseded_at), (scope, superseded_at), (superseded_by). M2 adds loop/skill columns (loop_status, owner, bumper, proof_uri) so loops + skills become first-class entry_types in the SAME table — but Action_Queue (the Sheet) stays system-of-record for ACTIVE tasks per #042 §9; memory only mirrors loops ON CLOSE (no dual-write race).
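The `entry_type` backfill rule in the column table can be sketched as below. This is a minimal sketch under stated assumptions: that the prefix is the text before the first underscore in the file's basename, and that sources look like `claude-code:<relpath>`. The helper name `deriveEntryType` is hypothetical and is not taken from the codebase.

```typescript
// Hypothetical backfill helper: derive entry_type from a row's source string.
type EntryType = "user" | "feedback" | "project" | "reference" | "fact" | "skill" | "loop";

const ENTRY_TYPES: EntryType[] = ["user", "feedback", "project", "reference", "fact", "skill", "loop"];

function deriveEntryType(source: string): EntryType {
  // Assumed source shape: "claude-code:<relpath>"; anything before ":" is dropped.
  const colon = source.indexOf(":");
  const relpath = colon >= 0 ? source.slice(colon + 1) : source;
  const segments = relpath.split("/");
  const basename = segments[segments.length - 1];
  // Assumed taxonomy: the filename prefix is the text before the first underscore.
  const prefix = basename.split("_")[0].toLowerCase();
  for (const candidate of ENTRY_TYPES) {
    if (candidate === prefix) return candidate;
  }
  // Rows with no recognised prefix (profile/context rows) fall back to "fact".
  return "fact";
}
```

Run once per row `WHERE entry_type IS NULL`, a function like this keeps the backfill idempotent: re-running it never touches rows that already carry a value.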
2b. RETRIEVAL (three channels, one assembler)
- Channel 1, keyword (exists, `GET /memory/search`): exact-token LIKE ranker + exact-phrase bonus + recency tiebreak. Non-negotiable floor: Sam's facts are exact tokens (tickers, card last-4, invoice #20028, "Di Masi"); pure vector misses these (the code comment already flags it).
- Channel 2, semantic (new, Vectorize): `@cf/baai/bge-base-en-v1.5` (free Workers AI, 768-dim, data stays on Cloudflare). Adds conceptual recall ("how do I collect what I'm owed" → Eden/Asher rows sharing no keyword).
- Channel 3, recency/temporal: existing tiebreak + `as_of=` time-travel on `valid_from`/`superseded_at` (Zep-style, no graph DB).
- Hybrid merge: keyword ∪ vector, reciprocal-rank-fused, dedup by id, drop superseded. `/memory/search` contract UNCHANGED: it gains `&mode=hybrid`; default stays keyword so no caller breaks.
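The hybrid merge can be illustrated with a small reciprocal-rank-fusion sketch. The type and function names (`Hit`, `rrfMerge`) are hypothetical, and `k = 60` is a commonly used RRF constant, not a value the spec sets. The sketch assumes each input list is already ranked best-first and contains each id at most once.

```typescript
// Minimal reciprocal-rank fusion (RRF) sketch; names and k are assumptions.
interface Hit {
  id: string;
  superseded: boolean; // true when the D1 row has superseded_at set
}

interface FusedHit {
  id: string;
  score: number;
}

function rrfMerge(keywordHits: Hit[], vectorHits: Hit[], k: number = 60): FusedHit[] {
  const scores: { [id: string]: number } = {};
  for (const list of [keywordHits, vectorHits]) {
    let rank = 0;
    for (const hit of list) {
      if (hit.superseded) continue; // drop superseded rows before ranking
      rank += 1;
      // Each channel contributes 1 / (k + rank); summing by id also dedups.
      scores[hit.id] = (scores[hit.id] || 0) + 1 / (k + rank);
    }
  }
  const fused: FusedHit[] = Object.keys(scores).map((id) => ({ id, score: scores[id] }));
  // Highest fused score first; ties broken by id so the order is stable.
  fused.sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return fused;
}
```

A row found by both channels outranks a row found by only one, which is the property that lets exact-token hits and conceptual hits share a single result list.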
The GET /recall assembler packs a BOUNDED envelope from four tiers in one round-trip (Hermes always-on core + Letta recall/archival + semantic + temporal), to a hard token ceiling (~6K: core 2K / recent 1K / archival 3K) so retrieval never becomes the new context-bloat:
1. Tier 1 CORE (always-on, ≤2K, never searched — just loads): MEMORY.md index + CLAUDE.md hard-rules, versioned core:v<git-sha>, refuses to serve a core older than the last memory edit.
2. Tier 2 RECALL (recent, ~1K): last N messages + last-touched facts (valid_from DESC).
3. Tier 3 ARCHIVAL (query-driven, ~3K): the ~130 files chunked, hybrid keyword+vector.
4. Tier 4 TEMPORAL (on asof): valid_from <= asof AND (superseded_at IS NULL OR superseded_at > asof).
Returns { core, recent[], facts[], asof, budget_used, sources[] } — budget_used makes the bound VISIBLE.
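The bounded envelope can be sketched as a per-tier packer. This is an illustration under stated assumptions, not the Worker's implementation: token cost is approximated as `ceil(chars / 4)`, the names `packTier` and `assembleRecall` are hypothetical, and the `asof` and `sources[]` fields of the real envelope are left out for brevity.

```typescript
// Sketch only: token cost is approximated as ceil(chars / 4), an assumption
// standing in for whatever tokenizer the Worker would really use.
interface TierBudget {
  core: number;
  recent: number;
  archival: number;
}

const DEFAULT_BUDGET: TierBudget = { core: 2000, recent: 1000, archival: 3000 };

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep items in ranked order until the next one would break the tier ceiling.
function packTier(items: string[], ceiling: number): { kept: string[]; used: number } {
  const kept: string[] = [];
  let used = 0;
  for (const item of items) {
    const cost = estimateTokens(item);
    if (used + cost > ceiling) break;
    kept.push(item);
    used += cost;
  }
  return { kept, used };
}

function assembleRecall(
  core: string,
  recent: string[],
  facts: string[],
  budget: TierBudget = DEFAULT_BUDGET
) {
  // Core always loads, so it is truncated to its ceiling rather than dropped.
  const coreText = core.slice(0, budget.core * 4);
  const recentTier = packTier(recent, budget.recent);
  const factTier = packTier(facts, budget.archival);
  return {
    core: coreText,
    recent: recentTier.kept,
    facts: factTier.kept,
    budget_used: estimateTokens(coreText) + recentTier.used + factTier.used,
  };
}
```

Because each tier is capped independently, a flood of archival matches cannot crowd out the always-on core, and `budget_used` can never exceed the sum of the three ceilings.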
2c. SYNC (git = prose source of truth; D1 = always-on read mirror; direction-scoped, never 3-way merge)
- LOCAL→CLOUD (push, common): `scripts/brain-sync.ps1 push` globs the files, diffs content hashes vs a gitignored manifest, POSTs only CHANGED files to `/memory/ingest` with `source='claude-code:<relpath>'`, `reconcile:true` scoped to that source-prefix (so it can NEVER retire a `profile-tab` fact). Idempotent ids (`mem-md5(source|fact)[:16]`) → re-runs stay flat. Token = `INBOX_SECRET` from gitignored `command-inbox/.claude-notify.json` (no new secret). Trigger = session-close hook + the 03:00 ET Dreaming cron as belt-and-braces.
- CLOUD→LOCAL (pull, the un-PC-tie guarantee): `GET /memory/export?source_prefix=claude-code` → `brain-sync.ps1 pull` reconstitutes the files on a fresh PC. Pull is reconstitute-on-empty or explicit `--force`; local wins on conflict (never auto-overwrite dirty/newer local files). Two PCs editing the same file = git rebase, the way the workspace already handles parallel sessions.
- Decision: move the memory files OUT of the blanket `.claude/` gitignore into a TRACKED private path (e.g. `brain/memory/`) so git is the real cross-PC prose-merge layer and D1 is purely the read mirror. (Confirm the path can't be caught by a public-repo workflow.)
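The push-side diff against the manifest can be sketched as follows. The real script is PowerShell; this TypeScript version only illustrates the logic. It assumes content hashes are already computed and keyed by relative path, and the names `Manifest` and `changedSince` are hypothetical.

```typescript
// Hypothetical manifest diff for the push direction. Both maps are assumed to
// hold precomputed content hashes keyed by relative path.
type Manifest = { [relpath: string]: string };

interface PushItem {
  source: string;
  relpath: string;
  hash: string;
}

function changedSince(manifest: Manifest, current: Manifest): PushItem[] {
  const items: PushItem[] = [];
  for (const relpath of Object.keys(current).sort()) {
    const hash = current[relpath];
    if (manifest[relpath] !== hash) {
      // New or edited file: queue it with the source tag the spec describes.
      items.push({ source: "claude-code:" + relpath, relpath, hash });
    }
  }
  return items;
}
```

Unchanged files produce no request at all, so a second push with no edits sends nothing. Files deleted locally are deliberately ignored here; retiring their rows is left to the source-scoped `reconcile:true` on the server.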
2d. API (thin additive extend of what ships today)
| Endpoint | Method | Auth | Role |
|---|---|---|---|
| `/memory/write` (alias `/memory/ingest`) | POST | writer (INBOX_SECRET/ops-key) | single-fact or batch; scope-stamped; idempotent upsert + event_log row |
| `/memory/search` | GET | tri-auth (ops-key OR token OR portal referer), but NOT referer-only for family rows | keyword floor + &mode=hybrid |
| `/recall` | GET | same as search | bounded 4-tier envelope |
| `/memory/body` | GET | reader | COLD R2 full-body fetch by id |
| `/memory/consolidate` | POST | master ops-key OR internal cron | server-side merge/dedup/supersede; dryRun returns proposals |
| `/memory/skill` | POST/GET | writer to register; reader to list/get | auto-skill register (D1 + KV mirror) |
| `/memory/sync` | POST | master OR INBOX_SECRET | push/pull convergence; {direction, source, since?, items?} |
| `/memory/export` | GET | token-gated | file-level rehydration for pull |
Every read forces WHERE scope IN (caller.scopes) via the single authOf() chokepoint. /health extends to report secret presence (boolean only, never values) for the deploy gate.
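One way to make the scope wall impossible to forget is to route every read query through a single helper that appends the clause. The sketch below assumes the base SQL already contains a `WHERE` clause and uses `?` placeholders in the style of D1 prepared statements. `withScopeWall` is a hypothetical name.

```typescript
// Hypothetical helper: append the scope wall to a query as bound parameters.
interface ScopedQuery {
  sql: string;
  params: string[];
}

function withScopeWall(baseSql: string, baseParams: string[], scopes: string[]): ScopedQuery {
  if (scopes.length === 0) {
    // Fail closed: a caller with no scopes can match no rows.
    return { sql: baseSql + " AND 1 = 0", params: baseParams.slice() };
  }
  // One placeholder per scope; values are bound, never string-interpolated.
  const placeholders = scopes.map(() => "?").join(", ");
  return {
    sql: baseSql + " AND scope IN (" + placeholders + ")",
    params: baseParams.concat(scopes),
  };
}
```

Binding the scopes as parameters keeps caller-derived values out of the SQL text, and the empty-scope branch means an unrecognised credential reads nothing rather than everything.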
3. BUILD-vs-BUY DECISION
BUILD — extend the D1 layer Sam already runs. Do NOT adopt Mem0 / Letta / Honcho / Zep. The decision is not close. Reasoning, settled by codebase facts not preference:
- The layer already exists and works. `/memory/ingest` (idempotent ids, source-scoped reconcile, append-only `event_log`), `/memory/search` (keyword ranker over non-superseded ACTIVE rows), the bi-temporal `valid_from`/`superseded_at` columns, and the nightly Dreaming consolidation are already in Sam's code, live, with 284 current rows. That is Mem0's tiers + Zep's temporal supersession + Letta's archival paging, already built. Adopting a SaaS means ripping out a working system to re-solve a solved problem.
- Privacy is a HARD blocker, not a preference. Sam's memory is a 3-way family/business/Mildred wall enforced in HIS columns. No managed SaaS models that wall; adopting one ships Chanie's messages, kids' info, card last-4s, and account masks to a vendor DB, tripping the stop-and-ask trigger in `feedback_privacy_guardrails.md`. Self-hosting their OSS to dodge that = running Postgres+vector = MORE ops, off-Cloudflare.
- Cloudflare-native, already wired. D1 bound, Workers AI bound (free embeddings), KV is the hot pad, CF Access gates the portal, the bot already calls `/memory/search`. Vectorize is one `wrangler.toml` binding away in the same stack. A SaaS adds a second runtime, bill, auth surface, and outage domain to a stack whose whole virtue is "one provider, one deploy."
- Cost. D1 + Vectorize + Workers AI = ~$0–5/mo at single-user scale (~400 vectors is far under any tier). Mem0 Pro / Letta Cloud / Zep = $20–100+/mo for a plane Sam would still have to privacy-wrap.
BUY exactly ONE narrow thing: the embedding model — Workers AI bge-base (free, on-Cloudflare, data never leaves), with OpenAI text-embedding-3-small as a paid fallback ONLY if recall proves weak and ONLY for non-family/non-finance content. That is the only "buy" that survives the privacy wall.
Override clause: if Sam still wants Mem0, the migration is one POST loop into its API — but it ships family/finance data off-Cloudflare and needs self-hosting to stay compliant, which costs MORE ops than the D1 path. Name it; don't bury it.
4. AUTO-SKILLS + THE LOOP (the Hermes move)
When a hard/novel task completes, the lesson becomes a reusable skill so it sticks — written to BOTH hookstreet-skills/<name>/SKILL.md (HOT reload next Claude Code session) AND a D1 row (entry_type='skill', source='skill:<name>') so the bot/portal can answer "do we have a skill for X" with the PC off. Skills become a tier of the cloud memory layer, not a PC folder.
Ritual (gated, never silent autonomy):
1. Trigger when ≥2 hold: novel non-runbooked work · >3 tool-calls/a dead-end before it worked · a costly regression · reusable next month · Sam says "remember how to do this."
2. Dedup FIRST: grep hookstreet-skills/ AND GET /memory/search?q=topic → if a skill covers it, EDIT it (bump Current-State date), don't spawn a near-dupe.
3. Template: copy an existing SKILL.md (YAML name + ≥8-phrase trigger description, then Architecture / Current-State(dated) / Runbook / Verify / Gotchas / Source-trail).
4. Register in 3 places: one-line MEMORY.md index entry · D1 via /memory/skill (idempotent by slug; re-register bumps version + supersedes) · rebuild the .skill bundle via build.ps1.
5. Gate: Claude proposes → Sam gives a one-line confirm → THEN commit (Rule 9 proof artifact + no-auto-send).
The self-sharpening loop (Phase 5): the existing 03:00 ET Dreaming cron (runNightlyDreaming) extends to (a) re-embed rows whose content_hash changed, (b) walk supersession chains, (c) propose merges/stale-retirements (read-only /rethink-style proposals), (d) consolidate entry_type='skill' rows. It PROPOSES; Sam confirms; nothing mutates silently. The first skill written is cloudflare-deploy-safe (Phase 0's lesson) — so the freeze can be lifted permanently and the next session inherits the discipline.
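Walking a supersession chain, step (b) of the loop, can be sketched as a pointer walk with a cycle guard. The row shape is reduced to the two columns that matter here, and `resolveChain` is a hypothetical name.

```typescript
// Hypothetical chain walk over the superseded_by pointers.
interface MemoryRow {
  id: string;
  superseded_by: string | null;
}

function resolveChain(rows: { [id: string]: MemoryRow }, startId: string): string[] {
  const chain: string[] = [];
  let cursor: string | null = startId;
  while (cursor !== null && rows[cursor] !== undefined) {
    if (chain.indexOf(cursor) >= 0) {
      // A loop in the pointers is data corruption; surface it, do not spin.
      throw new Error("supersession cycle at " + cursor);
    }
    chain.push(cursor);
    cursor = rows[cursor].superseded_by;
  }
  // The last element is the current head of the chain.
  return chain;
}
```

The returned list gives the nightly job both the full history of a fact and, in its last element, the row that should currently answer. A cycle raises an error so that it ends up in the proposal report instead of being silently followed.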
5. SECURITY / PRIVACY ENFORCEMENT (enforce in SQL, don't trust the browser)
Three holes are open in the code TODAY and MUST close before more brain is pushed in:
1. /memory/search accepts a spoofable REFERER as sufficient auth. A forged Referer header returns Sam's facts to anyone with the URL. → Memory reads require a TOKEN; referer alone is drive-by deflection only, never sole auth for memory.
2. The visibility column exists but NO read path filters on it. It is decorative until the WHERE clause enforces it. → add the scope wall (business/family/shared/mildred) and force AND scope IN (caller.scopes) on every read via one authOf() helper.
3. /memory/ingest stores content verbatim with no inspection. A card number / Schwab client-id / sk-/pk- API key would persist into the brain. → secret-redaction write gate: regex-detect (16-digit card, SSN shape, API-key prefixes, Schwab client-id shape) → REFUSE to store the value, write a redacted breadcrumb (the value stays in PropertiesService / wrangler secret / CONTROL tab per the never-commit-secrets rule).
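The write gate in item 3 can be sketched as a redaction pass that runs before anything is stored. The patterns below are illustrative assumptions only: a 16-digit card with optional space or hyphen separators, the `NNN-NN-NNNN` SSN shape, and `sk-`/`pk-` key prefixes. The Schwab client-id shape is not given in this document, so it is left out rather than guessed. `redactSecrets` is a hypothetical name.

```typescript
// Hypothetical redaction pass; the patterns are illustrative, not exhaustive.
interface RedactionResult {
  content: string;
  redacted: string[];
}

const SECRET_PATTERNS: { label: string; pattern: RegExp }[] = [
  // 16 digits, optionally separated by spaces or hyphens.
  { label: "card", pattern: /\b(?:\d[ -]?){15}\d\b/g },
  // SSN shape: 3-2-4 digits.
  { label: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  // API-key prefixes named in the spec (sk- / pk-).
  { label: "api-key", pattern: /\b(?:sk|pk)-[A-Za-z0-9_-]{8,}/g },
];

function redactSecrets(content: string): RedactionResult {
  const redacted: string[] = [];
  let out = content;
  for (const { label, pattern } of SECRET_PATTERNS) {
    out = out.replace(pattern, () => {
      redacted.push(label);
      // The breadcrumb records that a value existed without storing it.
      return "[REDACTED:" + label + "]";
    });
  }
  return { content: out, redacted };
}
```

The caller would store `content` and could log `redacted` to the event log, so that a refused secret leaves an auditable trace while the value itself never reaches D1.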
Per-surface scope matrix (decided ONCE in authOf):
| Credential | scopes | mode |
|---|---|---|
| master ops-key / CF Access (Sam) | business, family, shared, mildred | writer |
| `INBOX_SECRET` (bot/Claude Code via Apps Script) | business, family, shared | writer; NEVER private/secret |
| `MILDRED_READ_TOKEN` | business ONLY | read-only; structurally cannot receive family/shared/secret rows even with a forged referer |
| portal referer (Sam-only via CF Access) | business, shared | read-only |
| family page `?as=family` | family, shared | read-only |
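The token rows of the matrix can be sketched as a single resolver. This is an illustration under stated assumptions: credentials arrive already extracted from the request (the header and query-parameter names are not specified here), `OPS_KEY` is a hypothetical name for the master ops-key secret, and the referer-based rows are omitted because referer alone must not grant memory reads. A production version would also want a constant-time comparison.

```typescript
// Hypothetical resolver covering only the token-based rows of the matrix.
type Scope = "business" | "family" | "shared" | "mildred";

interface Caller {
  scopes: Scope[];
  writer: boolean;
}

interface Credentials {
  opsKey?: string;
  inboxSecret?: string;
  mildredToken?: string;
}

interface Secrets {
  OPS_KEY: string; // assumed name for the master ops-key
  INBOX_SECRET: string;
  MILDRED_READ_TOKEN: string;
}

// An unset (empty) secret never matches, so a missing binding fails closed.
function matches(candidate: string | undefined, secret: string): boolean {
  return candidate !== undefined && secret.length > 0 && candidate === secret;
}

function authOfSketch(creds: Credentials, env: Secrets): Caller {
  if (matches(creds.opsKey, env.OPS_KEY)) {
    return { scopes: ["business", "family", "shared", "mildred"], writer: true };
  }
  if (matches(creds.inboxSecret, env.INBOX_SECRET)) {
    return { scopes: ["business", "family", "shared"], writer: true };
  }
  if (matches(creds.mildredToken, env.MILDRED_READ_TOKEN)) {
    return { scopes: ["business"], writer: false };
  }
  // Unknown or missing credential: no scopes, no write access.
  return { scopes: [], writer: false };
}
```

Deciding scopes in one place means every endpoint inherits the same answer, and the final branch makes "no credential" equivalent to "no rows".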
Wall guarantees: user_private_dates.md is NEVER ingested. Family-walled rows carry scope='family'; the Mildred token's SQL filter makes them unreturnable. Vectorize mirrors scope into vector metadata AND the scope is re-checked in D1, so a cosine match can't leak family→business as a side-channel. CF Access path-scoping for Mildred must land too: hs-core.js currently carries the master OPS_READ_TOKEN (MILDRED_SERVER_SCOPING.md L33-45), so without path-scoping she can lift it, and memory privacy is undone if that gate is skipped.
Blast radius if a key leaks: INBOX_SECRET leak is worst (a WRITE key that can poison the brain) → split a dedicated MEMORY_WRITE_TOKEN distinct from the queue INBOX_SECRET so a queue-bot leak can't rewrite memory. OPS_READ_TOKEN leak = read of the whole brain except secret-tier; it's the widest-exposed key (lives in portal assets) → keep it rotation-ready, move it out of static assets in Phase 3. Mildred token leak = only her business-scoped cards. /health exposes booleans only, never values; nothing logs secrets.
6. PHASED ROLLOUT (smallest-reversible-first; Phase 0 = deploy-safety, the HARD gate)
Anti-80% rule: each phase ships, Sam tests, THEN the next begins. STOP after any phase and the system is strictly better, never half-broken. Net loop count trends DOWN.
PHASE 0 — DEPLOY-SAFETY (gates EVERYTHING; ~1 session)
ops-api is FROZEN because a bare wrangler deploy is believed to strand the ~30 write-only secrets (broke prod twice; recovered via rollback 8523bfaf). Prove a safe path before ANY code ships:
- 0a — Capture the manifest: wrangler secret list → ops-api/SECRETS_MANIFEST.md (NAMES only, gitignored, values never read). The recovery sheet.
- 0b — Prove persistence with a no-op: change one comment → wrangler deploy (or wrangler versions upload then versions deploy) → wrangler secret list again → confirm count unchanged. This converts the fear into a tested fact. (Modern wrangler preserves secrets across deploys; the real risk is deploying from a clean checkout lacking vars/bindings, or a service-token reauth.) If any secret IS missing, re-put from a gitignored .dev.vars mirror.
- 0c — Build the rail: scripts/deploy-ops-api-safe.ps1 (Worker analog of tools/svc-deploy): wrangler versions upload (preview, NO traffic) → curl the preview /health → ABORT promotion if any secret-presence boolean or the 5-cron count regressed → only then wrangler versions deploy. Uses CLOUDFLARE_API_TOKEN (the no-reauth path from project_kill_reauth_loop.md).
- 0d — Codify the skill: write cloudflare-deploy-safe (the auto-skill pattern) so the lesson sticks and the freeze lifts permanently.
DONE = no-op deploy verified secret-count-stable + crons stable (5→5) + /health 200, AND the rail exists and refuses a regressing deploy. Reversible: Phase 0 changes ZERO behavior — it only adds a guard.
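The promotion check in step 0c can be sketched as a comparison between a baseline `/health` snapshot and the preview's. The response shape used here (`secrets` as a name-to-boolean map plus a `cronCount`) is an assumption for illustration, and `findRegressions` is a hypothetical name. The real rail is a PowerShell script.

```typescript
// Hypothetical /health shape: booleans per secret name plus a cron count.
interface HealthSnapshot {
  secrets: { [name: string]: boolean };
  cronCount: number;
}

function findRegressions(baseline: HealthSnapshot, preview: HealthSnapshot): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(baseline.secrets).sort()) {
    // A secret present in the baseline must still be present in the preview.
    if (baseline.secrets[name] && !preview.secrets[name]) {
      problems.push("secret missing: " + name);
    }
  }
  if (preview.cronCount < baseline.cronCount) {
    problems.push("cron count regressed: " + baseline.cronCount + " -> " + preview.cronCount);
  }
  return problems;
}
```

The rail would promote only when the returned list is empty. Anything else aborts before `wrangler versions deploy` and prints the list as the reason.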
PHASE 1 — WRITE-THROUGH + DEPTH-OF-HOOKS (the "PC off" win; ~1 session, NO code deploy)
/memory/ingest already exists and already holds 167 claude-code hook rows — Phase 1 makes that push COMPLETE + AUTOMATIC, no new bindings, no Worker change.
- scripts/brain-sync.ps1 push reads MEMORY.md + every memory/*.md, POSTs with source='claude-code:<file>', reconcile:true. Added to session-CLOSE ritual + the Dreaming cron.
- Mirror the 3 living docs the brain needs to answer "where were we": CONTEXT.md latest-2 sessions, SCOPE_BACKLOG.md open loops, CLAUDE.md current-status (source='context-tab'/'scope-tab').
DONE = with the laptop OFF, Sam asks the Telegram bot a fact that lives only in a topic file and the bot returns it from /memory/search. Reversible: pure additive writes; stop the script, nothing breaks.
PHASE 2 — EVERY SURFACE READS THE SAME BRAIN + the scope wall (~1 session)
- Add `scope` column (additive `ALTER`, default `business`, fail-closed) + `authOf()` chokepoint + the referer-only fix + secret-redaction write gate (the §5 work).
- Portal: search box on `home.html` → `/recall`. Parallel sessions: session-OPEN pulls `/memory/search?q=<topic>` BEFORE answering (complements local-file reads → a fresh/reimaged PC still has the brain). Bot's "remember this" writes back (`source='bot-remember'`).
DONE = one canary fact written from the phone is retrievable identically from bot + portal + a fresh Claude Code session; AND a Mildred-token read of a family row returns empty. Reversible: surfaces are read-only consumers; the wall is additive.
PHASE 3 — DEPTH via R2 (full file BODIES; ~1 session; first BINDING → through the Phase 0 rail)
Add R2 bucket hookstreet-memory-bodies; ingest full bodies keyed body_uri; D1 keeps hook + body_uri + content_hash; GET /memory/body?id= streams COLD. The load-bearing depth fix — the brain stops being shallow.
DONE = bot/portal fetch the FULL text of any topic file (e.g. all of MIS_FSE_ARCHITECTURE reasoning, not its hook) with the PC off; secret + cron counts verified stable through the deploy. Reversible: R2 is a sibling store; drop the binding, hooks still answer.
PHASE 4 — SEMANTIC RECALL (Vectorize hybrid; ~1 session; second binding → the rail)
Bind vectorize (768-dim cosine); embed each chunk via env.AI bge-base; upsert 1:1 by id with scope in metadata. /memory/search gains &mode=hybrid (RRF merge, dedup, scope-filtered in BOTH Vectorize metadata and D1). Falls back to pure keyword if VECTORIZE unbound.
DONE = a paraphrased question ("how do I handle losing positions for taxes") returns the wash-sale fact keyword-only missed, AND a family-scoped fact is provably absent from a business query. Reversible: hybrid is opt-in via &mode; default stays keyword.
PHASE 5 — SELF-SHARPENING LOOP + AUTO-SKILL (ongoing)
Extend the existing Dreaming cron (MULTIPLEX onto 0 7 * * * — never add a 6th cron, free-plan cap is 5) to re-embed changed rows, walk supersession chains, propose merges/retirements (read-only), consolidate skills. Auto-skill-generation per §4.
DONE = a 30-day-old changed fact is found superseded (not stale) without Sam typing anything, and ≥1 auto-generated skill exists. Reversible: Dreaming proposes; Sam confirms.
Migration/lossless guarantees across ALL phases: all 384 D1 rows preserved · every ALTER additive + NULL-safe · backfill idempotent (WHERE col IS NULL) · messages/event_log untouched · the ~130 PC files are PUSHED never moved (authoring source stays on disk + git) · supersession is SOFT (superseded_at, never hard-delete) · git + D1 Time Travel are the dual backstops · nightly wrangler d1 export to R2 = off-region backup.
7. WHAT SHIPS FIRST — "brain works from my phone, PC off"
Phase 0 → Phase 1, in two sessions, is the whole headline deliverable. The unlock is an inversion: the surface that already runs without the PC (the Telegram bot, calling /memory/search on Cloudflare's edge) becomes the read path; the PC's only remaining job is write-through (push the brain up at session-close). Once pushed, the brain is reachable with the laptop closed — no new bindings, no risky deploy (Phase 1 uses the endpoints that already exist).
Concrete first-session sequence:
1. Phase 0b no-op deploy test — prove secrets survive (the single cheapest, highest-leverage action; it may show the freeze was a misdiagnosis and lift it on the spot).
2. scripts/brain-sync.ps1 push — backfill all ~130 files + MEMORY.md + the 3 living docs into D1 via /memory/ingest (idempotent; safe to re-run).
3. Acceptance test (Rule 9 proof artifact): close the laptop → ask the bot 5 known facts whose answers live only in topic files → all 5 return. Capture the result to outputs/ as a dated audit.
That is the moment the root pain — "my brain is trapped on my PC" — retires. Everything after (the scope wall, R2 depth, Vectorize semantics, the self-sharpening loop) deepens and hardens a brain that is, by end of session two, already cloud-resident and phone-reachable.
Source trail
- Spec file: `C:\Users\ztrei\OneDrive\2. Hook Street\05. 2026 BH\docs\CLOUD_MEMORY_LAYER.md`
- Verified live ground truth (read this session): `ops-api/d1-schema.sql` (memory table + bi-temporal `valid_from`/`superseded_at` + `idx_memory_current`); `ops-api/wrangler.toml` (binding `MEMORY` db `103ccb68-793a-48b3-97b7-f276f7877a96`, AI bound, KV `PLAID_ITEMS`, 5-cron cap + day-of-week trap, NO Vectorize/R2 yet); `ops-api/src/index.ts` (`/memory/ingest` L2081 idempotent + source-scoped reconcile · `/memory/search` keyword ranker · `runNightlyDreaming` L387 · `OPS_READ_TOKEN` L621 · `MILDRED_READ_TOKEN` L38 · referer `okRef` gate L632).
- Reconciled docs: `docs/PROJECT_042_BRAIN_UNIFICATION.md` · `docs/FME_OBJECT_MODEL.md` · `docs/AGENT_STATE_OF_ART_2026.md` · `docs/MILDRED_SERVER_SCOPING.md` · `docs/SESSION_HANDOFF.md` · `outputs/tmp-memory-store-schema-design.md` (full DDL).
- Memories in scope: `feedback_privacy_guardrails` · `feedback_build_vs_buy` · `project_kill_reauth_loop` · `project_svc_deploy_workflow` · `feedback_clasp_redeploy_breaks_webapp` · `feedback_capture_is_not_closure`.