Research — Frontier Personal-AI Brief (2026)

Recovered 2026-06-07 from the session-42 9-agent web-research sweep, so the #042 build agent has the RAW findings, not just the spec's distillation. Confidence flags: [LIVE] verified 2026 · [VENDOR] self-reported, directional · [BETA] real, not GA · [ROADMAP] reported, not confirmed shipped.

The 5 things that matter

Cloudflare "Agent Memory" [BETA, ~Apr 17 2026] — a managed memory engine: Durable-Object-per-profile + Vectorize-per-profile, 5-channel retrieval (full-text + exact-key + raw-message + vector + HyDE for worded-differently matches), automatic Facts/Events/Instructions/Tasks classification, and supersession chains (versioned fact pointers that retire stale values). Runs on Workers + D1 + Vectorize + Workers AI — Sam's exact stack. This is a managed version of #042; get on the waitlist, copy the architecture regardless.
Cloudflare AI Search [LIVE, ~Apr 16 2026] — SOTA hybrid retrieval as a managed primitive: BM25 + vector fused in one query, RRF fusion, cross-encoder reranking, timestamp-relevance boosting, on Workers/Vectorize/Workers AI. → This is the #042 retrieval layer. Do NOT hand-wire BM25+vector across D1+Vectorize — point an AI Search instance at the memory corpus and get <15s hybrid retrieval with recency-boost (recent facts win — kills "stale beats current").
Nightly "Dreaming"/sleep-time consolidation [LIVE pattern]: a background pass that rewrites memory by merging duplicates, replacing stale facts with fresh ones, resolving contradictions, and rewriting future-tense facts to past tense once their date passes. Implementations: ChatGPT "Dreaming V3" [LIVE, Jun 4]; Claude "Dreams" [ROADMAP]; Letta's dual-agent design (cheap online reader + strong offline consolidator). Key number: with append-only memory, accuracy rots from ~93% to 49% in 30 days [VENDOR, mem0], so the consolidator MUST delete + supersede, not just append. (Sam already built this: consolidateProfile_.)
"AI chief of staff that acts first" is a shipping category (Lindy/Vellum/Arahi — cron-fired, send morning briefs before you open the inbox, initiate over Telegram/Slack). Don't buy one — they can't see Schwab/Plaid/STR/LevSMS; Sam owns the hard part (breadth of sources). The only missing piece is the proactive trigger (CF Cron → Worker → Telegram) — ~a day on top of tg.ps1.
"Context rot" [LIVE, Chroma study] — across 18 frontier models, performance degrades as input tokens grow even on trivial tasks, and worse when the query is worded differently from the target fact. Long windows are capacity, not strategy. → tiny hot context (date + top loops + 3–5 retrieved memories), reach for specifics just-in-time. This is the #042 hard rule.

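A minimal sketch of the hot-context rule above, assuming hypothetical `OpenLoop` and `Memory` shapes and a cap of three loops (the brief fixes only the 3–5 memory range, not the loop count). It shows the budget enforced in code rather than by convention; anything outside the budget stays in the store until it is retrieved just-in-time.

```typescript
// Hypothetical shapes: none of these names come from the brief.
interface OpenLoop { title: string; priority: number }
interface Memory { text: string; score: number }

const MAX_LOOPS = 3;    // assumed cap on "top loops"
const MAX_MEMORIES = 5; // upper end of the brief's 3-5 range

function buildHotContext(today: string, loops: OpenLoop[], retrieved: Memory[]): string {
  // Copy before sorting so the caller's arrays are left untouched.
  const topLoops = loops
    .slice()
    .sort((a, b) => b.priority - a.priority)
    .slice(0, MAX_LOOPS);

  const topMemories = retrieved
    .slice()
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_MEMORIES);

  // The prompt prefix is just: date, top loops, a handful of memories.
  return [
    `Date: ${today}`,
    "Open loops:",
    ...topLoops.map((l) => `- ${l.title}`),
    "Relevant memories:",
    ...topMemories.map((m) => `- ${m.text}`),
  ].join("\n");
}
```

The caps are constants on purpose: a retrieval call that returns 50 hits still contributes at most five lines to the window.
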
Where Sam is vs the frontier

Capability	Frontier 2026	Sam now	Gap
Memory structure	topic "Memory Files" (Anthropic/Letta)	126 topic files + index + budget rule	Ahead of most — gap = hand-maintained, not self-writing
Self-updating nightly	Dreaming/sleep-time consolidation	manual `.remember` + KB health-check	The real gap — he runs it; it should run on a cron
Retrieval	hybrid BM25+vector+entity, RRF, rerank, recency, <300ms	vector-only + front-loaded context	Behind — pure vector misses exact tokens (tickers, card last-4s, invoice #s); needs the keyword channel
Staleness	bi-temporal validity + supersession chains	SUPERSEDED headers (manual, doc-level)	Behind — needs to be a data field (`valid_from`/`superseded_at`), not a header
Proactive "acts first"	cron-fired agents that initiate	`tg.ps1` (fires on command)	~one day — channel built, trigger missing
Durability	replayable workflows + checkpointing	stateless Apps Script triggers / crons	Behind on reliability — a crash mid-run can double-send or lose state
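
The staleness row can be sketched as data rather than as a header. This is an illustration only: `key`, `value`, and `id` are assumed fields (the brief names just `valid_from`, `superseded_at`, and `source`), and it models a single validity timeline, not full bi-temporal tracking.

```typescript
// Hypothetical row shape; only valid_from, superseded_at and source come from the brief.
interface MemoryRow {
  id: number;
  key: string;                  // assumed: what the fact is about
  value: string;
  source: string;
  valid_from: string;           // ISO 8601 UTC timestamp, same precision everywhere
  superseded_at: string | null; // null while the fact is current
}

// Add a new value for `key` and retire the current one instead of overwriting it.
function supersede(
  rows: MemoryRow[], key: string, value: string, source: string, now: string,
): MemoryRow[] {
  const retired = rows.map((r) =>
    r.key === key && r.superseded_at === null ? { ...r, superseded_at: now } : r,
  );
  const nextId = rows.reduce((max, r) => Math.max(max, r.id), 0) + 1;
  return [...retired, { id: nextId, key, value, source, valid_from: now, superseded_at: null }];
}

// Default retrieval filter: only rows that have not been superseded.
function currentFacts(rows: MemoryRow[]): MemoryRow[] {
  return rows.filter((r) => r.superseded_at === null);
}

// Point-in-time lookup. String comparison is valid only because all
// timestamps are assumed to share one ISO 8601 UTC format.
function factAt(rows: MemoryRow[], key: string, at: string): MemoryRow | undefined {
  return rows.find(
    (r) => r.key === key && r.valid_from <= at && (r.superseded_at === null || at < r.superseded_at),
  );
}
```

Old values are never deleted here, so the chain of retired rows doubles as an audit trail; a consolidator that also deletes would prune rows whose `superseded_at` is older than some retention window.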

Honest summary: Sam's architecture instincts are frontier-grade (topic-scoped memory, budget caps, "capture is not closure," never-overwrite-dated-snapshots, privacy partitioning). The gap is mechanization — the patterns he enforces by hand now exist as managed primitives on his exact stack. He's not behind on design; he's behind on automation.

Ranked builds (the research's own order)

The nightly "Dreaming" cron Worker [Effort M] ← START HERE. CF Cron (`0 6 * * *`, or a DO alarm) → reads the day's command-inbox rows + transcript + sheet diffs from D1 → Opus 4.8 (latency-insensitive, 1M context: stuff the whole day into one window) → runs the 4 consolidation ops (merge duplicates, replace stale, resolve contradictions, rewrite past-due future-tense facts) against the memory store. Wrap in a Planner-Generator-Evaluator loop where the Evaluator enforces the receipts/verify rules before any write. Single highest-leverage build.
Temporal columns on every memory row [Effort S]: `valid_from`, `superseded_at`, `source`. Retrieval prefers non-superseded rows. (This is #042 Step 1's addition.)
Cloudflare AI Search as the retrieval layer (vs hand-rolling). This is the #042 Step 3 decision: use AI Search rather than D1 FTS if access allows, because a bare D1 LIKE query can't match its keyword + vector + recency fusion.
Proactive trigger (CF Cron → Worker → Telegram) for the morning spine.
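
A rough sketch of the proactive trigger, under stated assumptions: the binding names `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID`, the `makeWorker` factory, and the placeholder brief text are all hypothetical. The only external surface used is the Telegram Bot API `sendMessage` method; `fetch` is injected so the logic can be exercised without a network.

```typescript
// Assumed environment bindings; the names are placeholders, not from the brief.
interface Env {
  TELEGRAM_BOT_TOKEN: string;
  TELEGRAM_CHAT_ID: string;
}

// Minimal fetch signature so the sender can be driven by a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Build the Telegram Bot API sendMessage call as plain data.
function buildSendMessage(env: Env, text: string) {
  return {
    url: `https://api.telegram.org/bot${env.TELEGRAM_BOT_TOKEN}/sendMessage`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ chat_id: env.TELEGRAM_CHAT_ID, text }),
    },
  };
}

async function sendBrief(env: Env, text: string, doFetch: FetchLike): Promise<void> {
  const { url, init } = buildSendMessage(env, text);
  const res = await doFetch(url, init);
  // Surface failures so the cron run is recorded as failed, not silently dropped.
  if (!res.ok) throw new Error(`Telegram sendMessage failed: HTTP ${res.status}`);
}

// In a real Worker the returned object would be the module's default export,
// built with the global fetch, and the cron expression would live in the
// Wrangler config (Cloudflare evaluates cron triggers in UTC).
function makeWorker(doFetch: FetchLike) {
  return {
    async scheduled(_event: unknown, env: Env): Promise<void> {
      const brief = "Morning brief placeholder"; // assumed: real text comes from the memory store
      await sendBrief(env, brief, doFetch);
    },
  };
}
```

This sketch does nothing about the double-send risk flagged in the durability row; a retry after a crash would need an idempotency marker (for example, a per-date "sent" row) checked before the call.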

The retrieval decision for #042 Step 3 (the agent's flag)

Recommendation: Cloudflare AI Search (managed hybrid) over hand-rolled D1 FTS. Reason: the research found that pure vector and pure keyword retrieval each miss cases (vector misses exact tokens like tickers and card last-4s; keyword misses paraphrases). AI Search fuses both, reranks, and recency-boosts as a managed primitive on the existing stack. Start with D1 LIKE only if AI Search access isn't available yet, and treat it as a stopgap.
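
To make the "each channel misses cases" point concrete, here is a small sketch of Reciprocal Rank Fusion over a keyword ranking and a vector ranking. It is illustrative only: `rrfFuse` is a hypothetical helper, `k = 60` is the commonly used default rather than anything the brief specifies, and it says nothing about how AI Search implements fusion internally. Reranking and recency boosting are left out.

```typescript
// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) per
// document, where rank is 1-based. Documents found by several channels
// accumulate score; documents found by only one channel still appear.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Illustrative document ids (made up for this example).
const keywordHits = ["ticker-note", "shared-note"];    // exact-token match ranks first
const vectorHits = ["paraphrase-note", "shared-note"]; // paraphrase match ranks first
const fused = rrfFuse([keywordHits, vectorHits]);
```

Here `shared-note` ranks first because both channels found it (2/62 beats 1/61), while `ticker-note` and `paraphrase-note` each survive from the one channel that could see them. Dropping either channel loses one of the two.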

Source trail: session-42 Workflow agi-frontier-sweep-2026 (9 research agents + synthesis, ~1.2M tokens, web-search-verified). Pairs with docs/PROJECT_042_BRAIN_UNIFICATION.md.