RALPH LOOP PROMPT — Memory Subsystem Evolution (MemGPT + Mem0 + Mem0g-light)
How to run:
/ralph-loop "Implement the memory evolution exactly as specified in docs/PROMPT-memory-evolution.md. ALWAYS start each iteration by invoking the /caveman:caveman ultra skill at intensity 'full'. Output <promise>MEMORY EVOLUTION COMPLETE</promise> when all phases pass lint + tests." --max-iterations 40 --completion-promise "MEMORY EVOLUTION COMPLETE"
MANDATORY PER-ITERATION PREAMBLE
Every iteration MUST begin with these two actions, in order:
- Activate caveman mode. Invoke the caveman:caveman ultra skill at intensity full before any other tool call. All prose you emit during the iteration must follow caveman rules (drop articles, fragments OK, no filler, no pleasantries). Code/commits/PRs stay normal per caveman plugin rules.
- Read this file in full (docs/PROMPT-memory-evolution.md) to re-anchor on the plan.
If caveman already active from prior iteration, re-assert it anyway — ralph loop restarts cold each time.
After preamble:
- Inspect repo state: check which tasks already done by reading target files / running grep.
- Pick next incomplete task in phase order (Phase 1 → 2 → 3 → 4 → 5). No skipping, no out-of-order.
- Implement task.
- Run relevant lint + tests for that phase before exit.
- When ALL phases complete AND lints + tests green → output <promise>MEMORY EVOLUTION COMPLETE</promise>.
DO NOT implement multiple phases in one iteration unless they are tiny edits in the same file.
LINT + TEST COMMANDS
Run after each phase:
- Backend lint: cd api && ruff check . --fix
- Backend tests: cd api && pytest -q
- Frontend lint: cd adiuvAI && npx eslint . --fix
- Frontend typecheck: cd adiuvAI && npx tsc --noEmit
SOURCE OF TRUTH
Architectural rationale lives in docs/memory-evolution-strategy.md. This file is the execution plan derived from it. If a conflict appears, the strategy doc wins on why, this doc wins on how.
Zero-trust invariant: all user-content writes/reads go through per-user Fernet in api/app/core/memory_middleware.py. Backend never stores plaintext user content. Embeddings may leak text to OpenAI — already accepted trade-off, documented in privacy policy.
Tier gates live in api/app/billing/tier_manager.py. New capabilities MUST be gated there, not ad-hoc in routes.
WHAT THIS FEATURE DOES
Five goals from the strategy doc, executed in order:
- Activate real pgvector on associative tier (replace keyword fallback). Pro+ only.
- Mem0-style Extract/Update pipeline post-store_episode. Batch for Free, realtime for Pro+.
- relational tier (Mem0g-light): new table memory_relations, a person/project/topic graph in Postgres.
- Settings > Memory UI in Electron renderer: view/edit core + relational, GDPR forget.
- Proactive mining (Power tier only, optional last): scheduled job promotes episodic patterns to proactive.
Architectural anchors already in place (do NOT re-create):
- MemoryMiddleware.enrich_context injects 4 tiers into orchestrator: extend, not replace.
- MemoryAssociative.embedding column exists (JSON fallback); swap to pgvector.Vector(1536) in migration.
- get_llm("gpt-4o-mini", ...) in api/app/core/llm.py is canonical LLM factory.
- Tier-gating helper: TierManager.has_feature(user, feature). Add new feature enums.
PHASE 1 — pgvector on associative tier (Pro+ gated)
TASK 1.1: Alembic migration — switch memory_associative.embedding to vector(1536)
File: api/alembic/versions/XXX_associative_pgvector.py (new)
Contents:
- CREATE EXTENSION IF NOT EXISTS vector; (idempotent).
- ALTER TABLE memory_associative ALTER COLUMN embedding TYPE vector(1536) USING embedding::text::vector; (must handle existing JSON rows). If conversion risky, drop column and re-add: DROP COLUMN embedding; ADD COLUMN embedding vector(1536); (data loss acceptable, keyword fallback still works).
- Create IVFFlat index: CREATE INDEX memory_associative_embedding_idx ON memory_associative USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
- downgrade() reverses: drop index, ALTER COLUMN ... TYPE jsonb.
Revision id: increment from latest in api/alembic/versions/. Check 004_add_memory_tables.py for style.
Done signal: Migration applies cleanly on a fresh DB: alembic upgrade head exits 0.
TASK 1.2: Update MemoryAssociative.embedding SQLAlchemy column
File: api/app/models.py
Replace:
embedding: Mapped[list | None] = mapped_column(JSON, nullable=True)
with:
from pgvector.sqlalchemy import Vector
...
embedding: Mapped[list | None] = mapped_column(Vector(1536), nullable=True)
Add pgvector>=0.2.5 to api/requirements.txt (or pyproject.toml — check which is authoritative).
Done signal: pgvector import resolves, pytest -q still green on model import.
TASK 1.3: Add TierFeature.REAL_EMBEDDINGS feature flag
File: api/app/billing/tier_manager.py
Add to the feature enum / matrix:
- REAL_EMBEDDINGS = "real_embeddings" → granted for pro, power, team. Free = False.
Done signal: TierManager.has_feature(user, "real_embeddings") returns correct bool per tier.
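Minimal sketch of the flag plus a lookup matrix, to make the expected booleans concrete. Assumptions, not taken from the repo: the matrix name FEATURE_MATRIX, the lowercase tier keys, and a has_feature that takes a tier string instead of a user object. The real TierManager may be shaped differently.

```python
from __future__ import annotations

from enum import Enum


class TierFeature(str, Enum):
    """Feature flags checked by the tier gate (only this member comes from Task 1.3)."""

    REAL_EMBEDDINGS = "real_embeddings"


# Hypothetical matrix: tier name -> features granted to that tier.
FEATURE_MATRIX: dict[str, set[TierFeature]] = {
    "free": set(),
    "pro": {TierFeature.REAL_EMBEDDINGS},
    "power": {TierFeature.REAL_EMBEDDINGS},
    "team": {TierFeature.REAL_EMBEDDINGS},
}


def has_feature(tier: str, feature: TierFeature | str) -> bool:
    """Return True when `tier` grants `feature`; unknown tiers or features get False."""
    try:
        wanted = TierFeature(feature)  # accepts the enum member or its string value
    except ValueError:
        return False
    return wanted in FEATURE_MATRIX.get(tier, set())
```

Failing closed on unknown tiers and unknown feature names keeps a typo from granting a paid capability.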
TASK 1.4: Embedding helper
File: api/app/core/embeddings.py (new)
async def embed_text(text: str) -> list[float] | None:
    """Call OpenAI text-embedding-3-small. Return None on failure (caller falls back to keyword)."""
Use AsyncOpenAI client (already a dep via LiteLLM). Truncate input to 8000 chars. On any exception log warning + return None — MUST not raise.
Done signal: Unit test test_embed_text_returns_1536_floats passes with mocked client.
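One possible shape for the helper and its mocked-client test, as a sketch only. The optional client parameter is an assumption added here so the test can inject a fake (the spec signature takes text only). FakeClient is a hypothetical stand-in, not part of the codebase.

```python
from __future__ import annotations

import asyncio
import logging
from types import SimpleNamespace
from typing import Any

logger = logging.getLogger(__name__)

EMBEDDING_MODEL = "text-embedding-3-small"
MAX_INPUT_CHARS = 8000  # truncate before sending, per the task spec


async def embed_text(text: str, client: Any = None) -> list[float] | None:
    """Embed `text`; return None on any failure so the caller falls back to keyword search."""
    try:
        if client is None:
            # Lazy import: only needed when no client is injected.
            from openai import AsyncOpenAI

            client = AsyncOpenAI()
        response = await client.embeddings.create(
            model=EMBEDDING_MODEL,
            input=text[:MAX_INPUT_CHARS],
        )
        return list(response.data[0].embedding)
    except Exception as exc:  # MUST not raise into the caller
        # Log the exception type only, so user text never reaches the logs.
        logger.warning("embed_text failed: %s", type(exc).__name__)
        return None


class _FakeEmbeddings:
    """Hypothetical test double mimicking client.embeddings."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.last_input = None

    async def create(self, *, model: str, input: str) -> Any:
        if self.fail:
            raise RuntimeError("simulated outage")
        self.last_input = input
        return SimpleNamespace(data=[SimpleNamespace(embedding=[0.0] * 1536)])


class FakeClient:
    """Hypothetical stand-in for AsyncOpenAI in unit tests."""

    def __init__(self, fail: bool = False) -> None:
        self.embeddings = _FakeEmbeddings(fail)
```

Usage in a test: asyncio.run(embed_text("hello", client=FakeClient())) yields a 1536-float list; with FakeClient(fail=True) it yields None instead of raising.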
TASK 1.5: Wire embeddings into _load_associative + store_associative
File: api/app/core/memory_middleware.py
In _load_associative:
- Check user tier via TierManager.has_feature(user, "real_embeddings").
- If True → embed_text(message) → if vector not None run: SELECT * FROM memory_associative WHERE user_id = :uid ORDER BY embedding <=> :qvec LIMIT :k; Use SQLAlchemy embedding.cosine_distance(qvec) (pgvector).
- Fallback (False or None): keep current keyword-order path.
Add new store_associative(user_id, content) method:
- Encrypt content with user Fernet.
- If tier has real_embeddings → compute embedding, store alongside.
- Else → store with embedding=NULL (still useful for future upgrade).
Done signal: Associative search returns semantically-closer results on a pro test user, keyword-ordered for free user.
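For reference, a pure-Python sketch of the path selection and of the ordering that pgvector's <=> (cosine distance) operator produces. It does not touch the database. Function names and the dict row shape are assumptions for illustration; zero-vector handling is also an assumption.

```python
from __future__ import annotations

import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity ORDER BY embedding <=> :qvec sorts on."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # assumption: treat zero vectors as unrelated
    return 1.0 - dot / norm


def search_path(has_real_embeddings: bool, qvec: list[float] | None) -> str:
    """Pick 'vector' only when the tier allows it AND an embedding was produced."""
    return "vector" if has_real_embeddings and qvec is not None else "keyword"


def rank_by_embedding(rows: list[dict], qvec: list[float], k: int) -> list[dict]:
    """Closest rows first; rows stored with embedding=NULL sort last."""

    def sort_key(row: dict) -> tuple[bool, float]:
        emb = row.get("embedding")
        if emb is None:
            return (True, 0.0)
        return (False, cosine_distance(emb, qvec))

    return sorted(rows, key=sort_key)[:k]
```

A small fixture like this is handy for the done-signal test: seed two orthogonal vectors, query near one of them, assert the order.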
TASK 1.6: Phase 1 checks
- cd api && ruff check . --fix
- cd api && pytest -q tests/test_memory_middleware.py (create minimal test if absent).
- Manual smoke: spin up docker compose, insert two associative memories via pro user, query → verify cosine ordering.
Done signal: All three green.
PHASE 2 — Mem0-style Extract/Update pipeline
TASK 2.1: Extraction prompt + schema
File: api/app/core/memory_extraction.py (new)
Define Pydantic models:
from typing import Literal

from pydantic import BaseModel


class MemoryCandidate(BaseModel):
    type: Literal["fact", "preference", "relation", "routine"]
    content: str  # short canonical statement
    target_tier: Literal["core", "associative", "relational", "proactive"]
    subject: str | None = None  # only for relation
    predicate: str | None = None  # only for relation
    object: str | None = None  # only for relation
    confidence: float = 0.7


class ExtractionResult(BaseModel):
    candidates: list[MemoryCandidate]
Prompt template (system): "You are a memory extractor for a personal AI secretary. Given the last turn + core memory + recent episodes, identify durable facts, preferences, routines, and person/project relations. Output JSON matching the schema. Skip small talk. Max 5 candidates per turn."
Use gpt-4o-mini, temperature=0, response_format={"type": "json_object"}.
Done signal: Calling extract_candidates(last_turn, core, recent) on a fixture returns a valid ExtractionResult.
TASK 2.2: Update decision (ADD / UPDATE / DELETE / NOOP)
File: api/app/core/memory_extraction.py (same file)
async def decide_action(
    candidate: MemoryCandidate,
    existing: list[str],  # plaintext neighbours (top-3 by similarity in target tier)
) -> Literal["ADD", "UPDATE", "DELETE", "NOOP"]:
    ...
Uses a second gpt-4o-mini call with small prompt: "Given candidate and existing memories, decide ADD / UPDATE / DELETE / NOOP. Return only the verb."
Heuristic short-circuit: if existing empty → ADD without LLM (save cost).
Done signal: Unit tests for all 4 branches pass with mocked LLM.
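Sketch of the decision step with the empty-neighbours short-circuit. Assumptions: the LLM call is injected as ask_llm (a hypothetical async callable) so tests can mock it, the candidate is passed as plain text to keep the example dependency-free, and any reply that is not one of the four verbs maps to NOOP.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Literal, get_args

Action = Literal["ADD", "UPDATE", "DELETE", "NOOP"]
VALID_ACTIONS = set(get_args(Action))


async def decide_action(
    candidate_content: str,
    existing: list[str],
    ask_llm: Callable[[str], Awaitable[str]],
) -> Action:
    """Return the verb for one candidate; skip the LLM when there is nothing to compare."""
    if not existing:
        return "ADD"  # heuristic short-circuit, saves one LLM call
    prompt = (
        "Given candidate and existing memories, decide ADD / UPDATE / DELETE / NOOP. "
        "Return only the verb.\n"
        f"Candidate: {candidate_content}\n"
        "Existing:\n" + "\n".join(f"- {item}" for item in existing)
    )
    verb = (await ask_llm(prompt)).strip().upper()
    # Assumption: unparseable replies are treated as NOOP, the least destructive verb.
    return verb if verb in VALID_ACTIONS else "NOOP"


def fixed_reply(reply: str) -> Callable[[str], Awaitable[str]]:
    """Hypothetical mock for the gpt-4o-mini call: always answers `reply`."""

    async def _ask(prompt: str) -> str:
        return reply

    return _ask
```

With fixed_reply, each of the four branches can be covered without network access.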
TASK 2.3: Pipeline orchestrator
File: api/app/core/memory_extraction.py (same file)
async def run_extraction(
    db: AsyncSession,
    user_id: str,
    last_user_msg: str,
    last_assistant_msg: str,
    session_id: str | None,
) -> None:
    ...
Steps:
- Load small context: core_memory + last 5 episodes (via middleware helpers).
- extract_candidates(...).
- For each candidate: similarity-search target tier → top-3 neighbours → decide_action → apply via MemoryMiddleware.update_core / store_associative / (new) upsert_relation / store_proactive.
- Log Langfuse trace with trace_id.
- MUST not raise: wrap in try/except, log warning.
Done signal: Calling run_extraction on a fake "user said my CFO is Giulia" produces a relation candidate and a core candidate, and writes them.
TASK 2.4: Tier-gated dispatch
File: api/app/core/memory_middleware.py
After store_episode success, dispatch extraction:
- Pro / Power / Team → schedule realtime task (asyncio.create_task(run_extraction(...)), fire-and-forget, exceptions swallowed).
- Free → enqueue a daily-batch marker row (new table extraction_queue(user_id, episode_id, created_at)). A separate cron (Phase 5 stub OK) drains it.
Add TierFeature.REALTIME_EXTRACTION to tier_manager (Free=False).
Done signal: Pro user triggers realtime task (verified via log line); Free user gets queue row.
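Sketch of the dispatch branch. The event loop holds only weak references to tasks, so a fire-and-forget task needs a strong reference until it finishes. Names here (dispatch_extraction, make_coro, enqueue, demo) are hypothetical; the real code would pass run_extraction and a DB insert.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Any, Callable, Coroutine

logger = logging.getLogger(__name__)

# Strong references keep background tasks alive until they complete.
_background_tasks: set[asyncio.Task] = set()


async def _swallow(coro: Coroutine[Any, Any, None]) -> None:
    """Await `coro`, logging instead of raising, so nothing reaches the request path."""
    try:
        await coro
    except Exception as exc:
        logger.warning("extraction failed: %s", type(exc).__name__)


def dispatch_extraction(
    realtime: bool,
    make_coro: Callable[[], Coroutine[Any, Any, None]],
    enqueue: Callable[[], None],
) -> str:
    """Realtime tiers get a background task; Free gets a queue marker."""
    if realtime:
        task = asyncio.create_task(_swallow(make_coro()))
        _background_tasks.add(task)
        task.add_done_callback(_background_tasks.discard)
        return "realtime"
    enqueue()
    return "queued"


async def demo() -> dict:
    """Exercise both branches; the Pro extraction fails on purpose."""
    ran: list[str] = []
    queued: list[str] = []

    async def failing_extraction() -> None:
        ran.append("pro")
        raise RuntimeError("LLM down")

    pro = dispatch_extraction(True, failing_extraction, lambda: queued.append("pro"))
    free = dispatch_extraction(False, failing_extraction, lambda: queued.append("free"))
    await asyncio.gather(*list(_background_tasks))
    return {"pro": pro, "free": free, "ran": ran, "queued": queued}
```

Passing a coroutine factory instead of a coroutine avoids creating a never-awaited coroutine on the Free path.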
TASK 2.5: Phase 2 checks
- cd api && ruff check . --fix
- cd api && pytest -q tests/test_memory_extraction.py
PHASE 3 — relational tier (Mem0g-light)
TASK 3.1: Alembic migration — memory_relations table
File: api/alembic/versions/XXX_memory_relations.py (new)
CREATE TABLE memory_relations (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES users(id),
    subject_label VARCHAR(128) NOT NULL,  -- canonical label (e.g. "Giulia")
    subject_type VARCHAR(32) NOT NULL,    -- 'person' | 'company' | 'project' | 'topic'
    predicate VARCHAR(64) NOT NULL,       -- 'works_at' | 'reports_to' | 'stakeholder_of' | 'last_contacted_on' | 'owes_followup' | custom
    object_label VARCHAR(128) NOT NULL,
    object_type VARCHAR(32) NOT NULL,
    confidence FLOAT NOT NULL DEFAULT 0.7,
    source_episode_id UUID NULL REFERENCES memory_episodic(id),
    notes_encrypted BYTEA NULL,           -- Fernet, optional per-user commentary
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_confirmed_at TIMESTAMPTZ NULL    -- used by TTL decay
);
CREATE INDEX memory_relations_user_subject_idx ON memory_relations(user_id, subject_label);
CREATE INDEX memory_relations_user_predicate_idx ON memory_relations(user_id, predicate);
Done signal: alembic upgrade head clean.
TASK 3.2: MemoryRelation ORM model
File: api/app/models.py
Mirror the table above. subject_label / object_label are plaintext (entity names — treated as identifiers, not content). notes_encrypted uses Fernet like other tiers.
Done signal: Import of MemoryRelation resolves.
TASK 3.3: Relational middleware methods
File: api/app/core/memory_middleware.py
Add:
- async def upsert_relation(user_id, subject, subject_type, predicate, object_, object_type, *, confidence=0.7, source_episode_id=None, notes=None) -> None
- async def query_relations(user_id, subject=None, predicate=None, object_=None, limit=20) -> list[MemoryRelation]
- Extend enrich_context return dict with key relational_memory: list of short strings "{subject} --{predicate}--> {object}" filtered by recent/confident (top 10).
- Tier-gate: Free tier → skip (empty list). Pro = base (person/project predicates only). Power = all predicates incl. custom. Use new TierFeature.RELATIONAL_MEMORY.
Done signal: Unit tests: upsert then query returns row; tier gating enforces limits.
TASK 3.4: Orchestrator prompt injection
File: api/app/core/deep_agent.py
Where core_memory / episodic already injected into system prompt, add a new paragraph labelled "Known people & projects:" listing the relational_memory strings. Keep under 800 chars (truncate if longer).
Done signal: Running a turn with seeded relations — agent uses the info (verified via Langfuse trace + test).
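Sketch of how the relational_memory strings and the prompt paragraph could be built. Assumptions: the Relation dataclass stands in for the ORM row, ranking uses confidence only (the spec also mentions recency), and truncation drops whole lines so no relation is cut mid-string.

```python
from __future__ import annotations

from dataclasses import dataclass

HEADER = "Known people & projects:"
MAX_CHARS = 800  # keep the paragraph under this length
TOP_N = 10


@dataclass
class Relation:
    """Hypothetical stand-in for a memory_relations row."""

    subject: str
    predicate: str
    object: str
    confidence: float


def format_relations(relations: list[Relation]) -> list[str]:
    """Top-N relations by confidence, rendered as 'subject --predicate--> object'."""
    top = sorted(relations, key=lambda r: r.confidence, reverse=True)[:TOP_N]
    return [f"{r.subject} --{r.predicate}--> {r.object}" for r in top]


def relational_paragraph(lines: list[str]) -> str:
    """Header plus as many whole lines as fit under MAX_CHARS; empty string if none fit."""
    out = HEADER
    for line in lines:
        if len(out) + 1 + len(line) >= MAX_CHARS:
            break
        out += "\n" + line
    return "" if out == HEADER else out
```

Returning an empty string when there are no relations lets the caller skip the paragraph entirely for Free-tier users.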
TASK 3.5: Hook into extraction pipeline
File: api/app/core/memory_extraction.py
When candidate.type == "relation" → call upsert_relation(...) instead of update_core / store_associative.
Done signal: End-to-end test: turn saying "Marco is the PM on Project Acme" produces a person --stakeholder_of--> project row.
TASK 3.6: TTL + decay job
File: api/app/core/memory_extraction.py (or new memory_maintenance.py)
async def decay_relations(db, user_id) -> None:
    # confidence *= 0.95 every 30 days since last_confirmed_at
    # delete rows with confidence < 0.2
    ...
Wire into the same daily batch cron as Free extraction (Phase 5 introduces scheduler — OK to define function now and call it from a stub).
Done signal: Function exists + has unit test on a seeded fixture.
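The decay rule, written as a pure function that the unit test can target without a database. One reading of the spec is assumed: the factor applies once per full 30-day period elapsed since last_confirmed_at, computed from the confidence at confirmation time. Handling of a NULL last_confirmed_at is left to the caller.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

DECAY_FACTOR = 0.95
DECAY_PERIOD_DAYS = 30
DELETE_BELOW = 0.2


def decayed_confidence(confidence: float, last_confirmed_at: datetime, now: datetime) -> float:
    """Apply DECAY_FACTOR once per full 30-day period since the last confirmation."""
    periods = max(0, (now - last_confirmed_at).days // DECAY_PERIOD_DAYS)
    return confidence * DECAY_FACTOR**periods


def should_delete(confidence: float) -> bool:
    """Rows strictly below the threshold are removed."""
    return confidence < DELETE_BELOW


def example_now() -> datetime:
    """Fixed timestamp for deterministic tests (arbitrary value)."""
    return datetime(2025, 1, 1, tzinfo=timezone.utc)
```

Under this reading, a relation at the default 0.7 confidence that is never re-confirmed crosses the 0.2 threshold after 25 periods (750 days), since 0.7 × 0.95^24 ≈ 0.204 and 0.7 × 0.95^25 ≈ 0.194.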
TASK 3.7: Phase 3 checks
- cd api && ruff check . --fix
- cd api && pytest -q tests/test_memory_relations.py
PHASE 4 — Settings > Memory UI (Electron renderer)
TASK 4.1: Backend endpoints for UI
File: api/app/api/routes/auth.py (memory sub-section) or new api/app/api/routes/memory.py
Routes (all @require_auth, return user-scoped data only):
- GET /auth/me/memory/core → dict[str, str] (plaintext, decrypted).
- GET /auth/me/memory/relational → list[RelationOut] (subject/pred/obj/confidence/last_confirmed_at).
- PATCH /auth/me/memory/relational/{id} → edit label/confidence; body validates predicate ∈ allowed set.
- DELETE /auth/me/memory/relational/{id} → hard delete (GDPR Art. 17).
- DELETE /auth/me/memory/core/{key} → remove a core k/v.
- POST /auth/me/memory/forget-all → wipe all 4 tiers for user; audit log entry. Requires X-Confirm: true header, reject 400 otherwise. Do NOT delete the User row.
Done signal: OpenAPI schema shows all 6 routes; pytest green.
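Framework-agnostic sketch of the forget-all confirmation guard, useful as a unit-test target. The function name is hypothetical, and accepting any letter case for the header value is an assumption; HTTP header names themselves are case-insensitive.

```python
from __future__ import annotations

from typing import Mapping


def forget_all_status(headers: Mapping[str, str]) -> int:
    """Return 200 when X-Confirm: true is present, else 400 (request rejected)."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    confirmed = normalized.get("x-confirm", "").strip().lower() == "true"
    return 200 if confirmed else 400
```

In the actual route the web framework's header handling would replace the manual normalization; the point is that a missing or non-true header never reaches the delete logic.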
TASK 4.2: tRPC + auth-manager wrappers
File: adiuvAI/src/main/auth/auth-manager.ts + adiuvAI/src/main/router/index.ts
Add auth-manager methods (6) wrapping each HTTP endpoint. Add tRPC procedures in a new memoryRouter merged into app router.
Done signal: trpc.memory.listRelational.useQuery() resolves from renderer.
TASK 4.3: MemorySection settings page
File: adiuvAI/src/renderer/components/settings/MemorySection.tsx (new)
Sections in order:
- Core preferences: table of k/v from trpc.memory.getCore. Each row: key, value, edit pencil (inline input), trash icon (deleteCore). Add-row form at bottom.
- People & relationships: table of relations. Columns: subject, predicate (select), object, confidence (progress bar), last confirmed (formatted via formatRow). Pencil → edit in drawer. Trash → deleteRelation.
- Danger zone: red Card with "Forget everything" button. Confirm dialog (typed "forget" to enable) → calls forgetAll with X-Confirm: true.
Wire into SECTIONS in adiuvAI/src/renderer/components/settings/types.ts as { id: 'memory', label: 'Memory', icon: Brain }. Use Brain from lucide-react.
Free tier gating: if profile.tier === 'free' → relational table hidden with upgrade CTA instead. Use usePlatform() + profile tier check.
Done signal: /settings → Memory tab renders all three sections, edits/deletes round-trip to backend.
TASK 4.4: i18n keys
Add translation keys to all 5 JSON files under namespace settings.memory.*:
- corePreferences, peopleRelationships, dangerZone, forgetEverything, forgetConfirm, addEntry, noEntries, upgradeToSeePeople.
Keep common.* reuse for save/cancel/delete/edit (already present).
Done signal: All 5 locale files include the new keys.
TASK 4.5: Phase 4 checks
- cd adiuvAI && npx eslint . --fix
- cd adiuvAI && npx tsc --noEmit
- Manual: run npm run start, log in, open Settings > Memory, edit a core key, verify persisted via GET /auth/me memory echo.
PHASE 5 — Proactive mining (Power tier only)
TASK 5.1: Scheduler skeleton
File: api/app/core/memory_maintenance.py
Two entrypoints, callable from a cron runner (APScheduler already a dep — if not, add):
- drain_extraction_queue(): processes extraction_queue rows (Phase 2.4) for Free tier users, batched.
- mine_proactive_patterns(user_id): for Power tier users only. Reads last 30 days episodic, runs a single gpt-4o-mini call: "Identify recurring temporal/behavioral patterns". Writes results to memory_proactive with confidence. Applies decay (conf *= 0.9 per 7 days since last sighting).
Register jobs in app/main.py startup (only if settings.SCHEDULER_ENABLED=True, default True; false in tests).
Done signal: pytest -q green (scheduler disabled). Manual: setting SCHEDULER_ENABLED=True + dev run logs "memory cron tick" every 1h.
TASK 5.2: Surfacing proactive hints
File: api/app/core/deep_agent.py + adiuvAI/src/renderer/components/home/DailyBrief.tsx (if exists)
Backend already injects proactive_hints into prompt (middleware). Confirm still works after changes; add unit test with seeded proactive row → assert string present in final system prompt.
On renderer, if daily brief component exists, show proactive hints as chips under "I noticed…" header. If not, skip — not a regression.
Done signal: System prompt includes proactive line when row exists + confidence ≥ threshold.
TASK 5.3: Tier gate
Add TierFeature.PROACTIVE_MINING to tier_manager — Power + Team only.
Done signal: Free/Pro user → no cron row for them; Power user → mining runs.
TASK 5.4: Phase 5 checks
- cd api && ruff check . --fix
- cd api && pytest -q
PHASE 6 — Completion
TASK 6.1: Verify all files exist / modified
New files:
- api/alembic/versions/*_associative_pgvector.py
- api/alembic/versions/*_memory_relations.py
- api/app/core/embeddings.py
- api/app/core/memory_extraction.py
- api/app/core/memory_maintenance.py
- api/app/api/routes/memory.py (or new routes appended in auth.py)
- adiuvAI/src/renderer/components/settings/MemorySection.tsx
Modified files:
- api/app/models.py (MemoryAssociative.embedding Vector(1536), MemoryRelation class)
- api/app/core/memory_middleware.py (real pgvector path, relational methods, enrich_context extended, dispatch extraction after store_episode)
- api/app/billing/tier_manager.py (REAL_EMBEDDINGS, REALTIME_EXTRACTION, RELATIONAL_MEMORY, PROACTIVE_MINING features)
- api/app/core/deep_agent.py (relational injection)
- api/app/main.py (scheduler startup)
- api/requirements.txt (pgvector, APScheduler)
- adiuvAI/src/main/auth/auth-manager.ts (6 memory methods)
- adiuvAI/src/main/router/index.ts (memoryRouter merged)
- adiuvAI/src/renderer/components/settings/types.ts (memory section entry)
- adiuvAI/src/renderer/locales/{en,it,es,fr,de}/translation.json (settings.memory.* keys)
TASK 6.2: Full gauntlet
Run all four commands, expect exit 0:
cd api && ruff check . --fix
cd api && pytest -q
cd adiuvAI && npx eslint . --fix
cd adiuvAI && npx tsc --noEmit
TASK 6.3: Output completion promise
If gauntlet green and file checklist complete:
<promise>MEMORY EVOLUTION COMPLETE</promise>
DO NOT
- Skip the per-iteration caveman preamble — it is part of the contract of this loop.
- Break zero-trust: never log / return plaintext user content in error paths. Relation subject_label / object_label ARE treated as identifiers, so logging them is OK. notes_encrypted never logged.
- Introduce A-Mem-style retroactive memory rewrites. Explicitly out of scope (strategy doc §3.3).
- Introduce AutoGPT-style reflective loops. Out of scope.
- Store format prefs or device-specific UI data in core memory — that's electron-store territory (see PROMPT-onboarding.md for precedent).
- Use Neo4j or any external graph DB — plain Postgres table is the spec.
- Call OpenAI embeddings for Free-tier users.
- Ship proactive mining (Phase 5) before Phase 3 (relational) is green — order matters.
- Delete user rows in forget-all. Only memory rows.
- Let extraction pipeline or LLM normalization raise into the request path. Always try/except, log, swallow.
REFERENCE — Existing patterns to reuse
| Pattern | Source | Reuse for |
|---|---|---|
| Fernet per-user enc/dec | api/app/core/memory_middleware.py _get_fernet, _safe_decrypt | New relational notes_encrypted, extraction writes |
| LLM factory | api/app/core/llm.py get_llm | Extraction + normalization + proactive mining |
| Tier check | api/app/billing/tier_manager.py has_feature | All tier gates in this plan |
| Alembic async URL split | api/alembic/env.py | New migrations |
| tRPC procedure + authManager wrap | adiuvAI/src/main/router/index.ts, auth-manager.ts | 6 memory routes |
| Settings section pattern | adiuvAI/src/renderer/components/settings/ProfileSection.tsx | MemorySection shape |
| shadcn table + drawer + confirm | Existing Settings sections | Memory tables + forget confirm |
| i18n labelKey pattern | See CLAUDE.md i18n section | All new strings |
CAVEMAN MODE REMINDER
This document's plan is executed under caveman:caveman ultra. Every iteration: activate the skill first, then work. Terse prose in all user-facing text emitted during the loop. Code + commit messages + migration SQL stay normal per caveman plugin boundaries.
If caveman plugin unavailable for any reason, STOP the iteration and report instead of proceeding in default mode — the loop contract requires it.