# RALPH LOOP PROMPT: Memory Subsystem Evolution (MemGPT + Mem0 + Mem0g-light)

> **How to run:**
> ```
> /ralph-loop "Implement the memory evolution exactly as specified in docs/PROMPT-memory-evolution.md. ALWAYS start each iteration by invoking the /caveman:caveman ultra skill at intensity 'full'. Output MEMORY EVOLUTION COMPLETE when all phases pass lint + tests." --max-iterations 40 --completion-promise "MEMORY EVOLUTION COMPLETE"
> ```

---

## MANDATORY PER-ITERATION PREAMBLE

**Every iteration MUST begin with these two actions, in order:**

1. **Activate caveman mode.** Invoke the `caveman:caveman ultra` skill at intensity `full` before any other tool call. All prose you emit during the iteration must follow caveman rules (drop articles, fragments OK, no filler, no pleasantries). Code/commits/PRs stay normal per caveman plugin rules.
2. **Read this file in full** (`docs/PROMPT-memory-evolution.md`) to re-anchor on the plan.

If caveman already active from prior iteration, re-assert it anyway: ralph loop restarts cold each time.

After preamble:

3. Inspect repo state: check which tasks already done by reading target files / running grep.
4. Pick next incomplete task in phase order (Phase 1 → 2 → 3 → 4 → 5). No skipping, no out-of-order.
5. Implement task.
6. Run relevant lint + tests for that phase before exit.
7. When ALL phases complete AND lints + tests green → output `MEMORY EVOLUTION COMPLETE`.

**DO NOT** implement multiple phases in one iteration unless they are tiny edits in the same file.

---

## LINT + TEST COMMANDS

Run after each phase:

- Backend lint: `cd api && ruff check . --fix`
- Backend tests: `cd api && pytest -q`
- Frontend lint: `cd adiuvAI && npx eslint . --fix`
- Frontend typecheck: `cd adiuvAI && npx tsc --noEmit`

---

## SOURCE OF TRUTH

Architectural rationale lives in [docs/memory-evolution-strategy.md](docs/memory-evolution-strategy.md). This file is the execution plan derived from it.
If a conflict appears, the strategy doc wins on *why*, this doc wins on *how*.

**Zero-trust invariant:** all user-content writes/reads go through per-user Fernet in [api/app/core/memory_middleware.py](api/app/core/memory_middleware.py). Backend never stores plaintext user content. Embeddings may leak text to OpenAI: already accepted trade-off, documented in privacy policy.

**Tier gates** live in [api/app/billing/tier_manager.py](api/app/billing/tier_manager.py). New capabilities MUST be gated there, not ad-hoc in routes.

---

## WHAT THIS FEATURE DOES

Five goals from the strategy doc, executed in order:

1. **Activate real pgvector** on `associative` tier (replace keyword fallback). Pro+ only.
2. **Mem0-style Extract/Update pipeline** post-`store_episode`. Batch for Free, realtime for Pro+.
3. **`relational` tier (Mem0g-light)**: new table `memory_relations`, a person/project/topic graph in Postgres.
4. **Settings > Memory UI** in Electron renderer: view/edit `core` + `relational`, GDPR forget.
5. **Proactive mining** (Power tier only, optional last): scheduled job promotes episodic patterns to `proactive`.

**Architectural anchors already in place** (do NOT re-create):

- `MemoryMiddleware.enrich_context` injects 4 tiers into orchestrator: extend, not replace.
- `MemoryAssociative.embedding` column exists (JSON fallback); swap to `pgvector.sqlalchemy.Vector(1536)` in migration.
- `get_llm("gpt-4o-mini", ...)` in [api/app/core/llm.py](api/app/core/llm.py) is canonical LLM factory.
- Tier-gating helper: `TierManager.has_feature(user, feature)`. Add new feature enums there.

---

## PHASE 1: pgvector on associative tier (Pro+ gated)

### TASK 1.1: Alembic migration to switch `memory_associative.embedding` to `vector(1536)`

**File:** `api/alembic/versions/XXX_associative_pgvector.py` (new)

Contents:

- `CREATE EXTENSION IF NOT EXISTS vector;` (idempotent).
- `ALTER TABLE memory_associative ALTER COLUMN embedding TYPE vector(1536) USING embedding::text::vector;` must handle existing JSON rows. If conversion risky, drop column and re-add: `DROP COLUMN embedding; ADD COLUMN embedding vector(1536);` (data loss acceptable: keyword fallback still works).
- Create IVFFlat index: `CREATE INDEX memory_associative_embedding_idx ON memory_associative USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);`
- `downgrade()` reverses: drop index, then `ALTER TABLE memory_associative ALTER COLUMN embedding TYPE jsonb` (needs a `USING` cast, mirroring the upgrade; match the JSON type the earlier migration created).

Revision id: increment from latest in `api/alembic/versions/`. Check `004_add_memory_tables.py` for style.

**Done signal:** Migration applies cleanly on a fresh DB: `alembic upgrade head` exits 0.

---

### TASK 1.2: Update `MemoryAssociative.embedding` SQLAlchemy column

**File:** [api/app/models.py](api/app/models.py)

Replace:

```python
embedding: Mapped[list | None] = mapped_column(JSON, nullable=True)
```

with:

```python
from pgvector.sqlalchemy import Vector
...
embedding: Mapped[list | None] = mapped_column(Vector(1536), nullable=True)
```

Add `pgvector>=0.2.5` to `api/requirements.txt` (or `pyproject.toml`; check which is authoritative).

**Done signal:** `pgvector` import resolves, `pytest -q` still green on model import.

---

### TASK 1.3: Add `TierFeature.REAL_EMBEDDINGS` feature flag

**File:** `api/app/billing/tier_manager.py`

Add to the feature enum / matrix:

- `REAL_EMBEDDINGS = "real_embeddings"` → granted for `pro`, `power`, `team`. Free = False.

**Done signal:** `TierManager.has_feature(user, "real_embeddings")` returns correct bool per tier.

---

### TASK 1.4: Embedding helper

**File:** `api/app/core/embeddings.py` (new)

```python
async def embed_text(text: str) -> list[float] | None:
    """Call OpenAI text-embedding-3-small. Return None on failure (caller falls back to keyword)."""
```

Use `AsyncOpenAI` client (already a dep via LiteLLM). Truncate input to 8000 chars. On any exception log warning + return None. MUST NOT raise.

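
A minimal sketch of the helper described in Task 1.4, not the final implementation. The optional `client` parameter and the constant names `EMBEDDING_MODEL` / `MAX_INPUT_CHARS` are assumptions added here so the function can be unit-tested with a fake client; the plan itself only fixes the `embed_text(text)` signature. The warning logs the exception type only, to stay inside the zero-trust rule about plaintext in error paths.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger(__name__)

EMBEDDING_MODEL = "text-embedding-3-small"  # model named in the plan
MAX_INPUT_CHARS = 8000                      # truncation limit from the plan


async def embed_text(text: str, client: Any | None = None) -> list[float] | None:
    """Return the embedding for `text`, or None on any failure (never raises)."""
    try:
        if client is None:
            # Lazy import: a missing or misconfigured SDK also ends in None.
            from openai import AsyncOpenAI

            client = AsyncOpenAI()
        response = await client.embeddings.create(
            model=EMBEDDING_MODEL,
            input=text[:MAX_INPUT_CHARS],
        )
        return list(response.data[0].embedding)
    except Exception as exc:
        # Log the exception type only; the message could echo user content.
        logger.warning("embed_text failed: %s", type(exc).__name__)
        return None
```

A test along the lines of `test_embed_text_returns_1536_floats` can pass an object exposing an async `embeddings.create(...)` method in place of the real client.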

**Done signal:** Unit test `test_embed_text_returns_1536_floats` passes with mocked client.

---

### TASK 1.5: Wire embeddings into `_load_associative` + `store_associative`

**File:** [api/app/core/memory_middleware.py](api/app/core/memory_middleware.py)

In `_load_associative`:

1. Check user tier via `TierManager.has_feature(user, "real_embeddings")`.
2. If True → `embed_text(message)` → if vector not None run:

   ```sql
   SELECT * FROM memory_associative
   WHERE user_id = :uid
   ORDER BY embedding <=> :qvec
   LIMIT :k;
   ```

   Use SQLAlchemy `embedding.cosine_distance(qvec)` (pgvector).
3. Fallback (False or None): keep current keyword-order path.

Add new `store_associative(user_id, content)` method:

- Encrypt content with user Fernet.
- If tier has real_embeddings → compute embedding, store alongside.
- Else → store with `embedding=NULL` (still useful for future upgrade).

**Done signal:** Associative search returns semantically-closer results on a pro test user, keyword-ordered for free user.

---

### TASK 1.6: Phase 1 checks

- `cd api && ruff check . --fix`
- `cd api && pytest -q tests/test_memory_middleware.py` (create minimal test if absent).
- Manual smoke: spin up docker compose, insert two associative memories via pro user, query → verify cosine ordering.

**Done signal:** All three green.

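
For the manual smoke check, it can help to know what ordering to expect. pgvector's `<=>` operator is cosine distance (1 minus cosine similarity), so the expected ranking can be reproduced in plain Python. The sketch below is only a reference for checking results by hand; the helper names are hypothetical and it uses toy 2-D vectors instead of 1536-dimensional embeddings.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance, the quantity pgvector's `<=>` orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_cosine(
    query: list[float], rows: list[tuple[str, list[float]]], k: int
) -> list[str]:
    """Return the ids of the k rows closest to `query`, nearest first."""
    ordered = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ordered[:k]]


# Toy data: "b" points almost the same way as the query, "c" the opposite way.
rows = [("a", [0.0, 1.0]), ("b", [1.0, 0.1]), ("c", [-1.0, 0.0])]
nearest = rank_by_cosine([1.0, 0.0], rows, k=2)
```

Here `nearest` is `["b", "a"]`: the near-parallel vector ranks first and the opposite-facing one is cut by `k`.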

---

## PHASE 2: Mem0-style Extract/Update pipeline

### TASK 2.1: Extraction prompt + schema

**File:** `api/app/core/memory_extraction.py` (new)

Define Pydantic models:

```python
from typing import Literal

from pydantic import BaseModel


class MemoryCandidate(BaseModel):
    type: Literal["fact", "preference", "relation", "routine"]
    content: str  # short canonical statement
    target_tier: Literal["core", "associative", "relational", "proactive"]
    subject: str | None = None  # only for relation
    predicate: str | None = None  # only for relation
    object: str | None = None  # only for relation
    confidence: float = 0.7


class ExtractionResult(BaseModel):
    candidates: list[MemoryCandidate]
```

Prompt template (system): "You are a memory extractor for a personal AI secretary. Given the last turn + core memory + recent episodes, identify durable facts, preferences, routines, and person/project relations. Output JSON matching the schema. Skip small talk. Max 5 candidates per turn."

Use `gpt-4o-mini`, `temperature=0`, `response_format={"type": "json_object"}`.

**Done signal:** Calling `extract_candidates(last_turn, core, recent)` on a fixture returns a valid `ExtractionResult`.

---

### TASK 2.2: Update decision (ADD / UPDATE / DELETE / NOOP)

**File:** `api/app/core/memory_extraction.py` (same file)

```python
async def decide_action(
    candidate: MemoryCandidate,
    existing: list[str],  # plaintext neighbours (top-3 by similarity in target tier)
) -> Literal["ADD", "UPDATE", "DELETE", "NOOP"]:
    ...  # signature only; body specified below
```

Uses a second `gpt-4o-mini` call with small prompt: "Given candidate and existing memories, decide ADD / UPDATE / DELETE / NOOP. Return only the verb."

Heuristic short-circuit: if `existing` empty → ADD without LLM (save cost).

**Done signal:** Unit tests for all 4 branches pass with mocked LLM.

---

### TASK 2.3: Pipeline orchestrator

**File:** `api/app/core/memory_extraction.py` (same file)

```python
from sqlalchemy.ext.asyncio import AsyncSession


async def run_extraction(
    db: AsyncSession,
    user_id: str,
    last_user_msg: str,
    last_assistant_msg: str,
    session_id: str | None,
) -> None:
    ...  # signature only; steps listed below
```

Steps:

1. Load small context: `core_memory` + last 5 episodes (via middleware helpers).
2. `extract_candidates(...)`.
3. For each candidate: similarity-search target tier → top-3 neighbours → `decide_action` → apply via `MemoryMiddleware.update_core` / `store_associative` / (new) `upsert_relation` / `store_proactive`.
4. Log Langfuse trace with `trace_id`.
5. MUST NOT raise: wrap in try/except, log warning.

**Done signal:** Calling `run_extraction` on a fake "user said my CFO is Giulia" produces a relation candidate and a core candidate, and writes them.

---

### TASK 2.4: Tier-gated dispatch

**File:** [api/app/core/memory_middleware.py](api/app/core/memory_middleware.py)

After `store_episode` success, dispatch extraction:

- Pro / Power / Team → schedule realtime task (`asyncio.create_task(run_extraction(...))`: fire-and-forget, exceptions swallowed).
- Free → enqueue a daily-batch marker row (new table `extraction_queue(user_id, episode_id, created_at)`). A separate cron (Phase 5 stub OK) drains it.

Add `TierFeature.REALTIME_EXTRACTION` to tier_manager (Free=False).

**Done signal:** Pro user triggers realtime task (verified via log line); Free user gets queue row.

---

### TASK 2.5: Phase 2 checks

- `cd api && ruff check . --fix`
- `cd api && pytest -q tests/test_memory_extraction.py`

---

## PHASE 3: `relational` tier (Mem0g-light)

### TASK 3.1: Alembic migration for `memory_relations` table

**File:** `api/alembic/versions/XXX_memory_relations.py` (new)

```sql
CREATE TABLE memory_relations (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES users(id),
    subject_label VARCHAR(128) NOT NULL,  -- canonical label (e.g. "Giulia")
    subject_type VARCHAR(32) NOT NULL,    -- 'person' | 'company' | 'project' | 'topic'
    predicate VARCHAR(64) NOT NULL,       -- 'works_at' | 'reports_to' | 'stakeholder_of' | 'last_contacted_on' | 'owes_followup' | custom
    object_label VARCHAR(128) NOT NULL,
    object_type VARCHAR(32) NOT NULL,
    confidence FLOAT NOT NULL DEFAULT 0.7,
    source_episode_id UUID NULL REFERENCES memory_episodic(id),
    notes_encrypted BYTEA NULL,           -- Fernet, optional per-user commentary
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_confirmed_at TIMESTAMPTZ NULL    -- used by TTL decay
);

CREATE INDEX memory_relations_user_subject_idx ON memory_relations(user_id, subject_label);
CREATE INDEX memory_relations_user_predicate_idx ON memory_relations(user_id, predicate);
```

**Done signal:** `alembic upgrade head` clean.

---

### TASK 3.2: `MemoryRelation` ORM model

**File:** [api/app/models.py](api/app/models.py)

Mirror the table above. `subject_label` / `object_label` are **plaintext** (entity names, treated as identifiers, not content). `notes_encrypted` uses Fernet like other tiers.

**Done signal:** Import of `MemoryRelation` resolves.

---

### TASK 3.3: Relational middleware methods

**File:** [api/app/core/memory_middleware.py](api/app/core/memory_middleware.py)

Add:

- `async def upsert_relation(user_id, subject, subject_type, predicate, object_, object_type, *, confidence=0.7, source_episode_id=None, notes=None) -> None`
- `async def query_relations(user_id, subject=None, predicate=None, object_=None, limit=20) -> list[MemoryRelation]`
- Extend `enrich_context` return dict with key `relational_memory`: list of short strings `"{subject} --{predicate}--> {object}"` filtered by recent/confident (top 10).
- Tier-gate: Free tier → skip (empty list). Pro = base (person/project predicates only). Power = all predicates incl. custom. Use new `TierFeature.RELATIONAL_MEMORY`.


**Done signal:** Unit tests: upsert then query returns row; tier gating enforces limits.

---

### TASK 3.4: Orchestrator prompt injection

**File:** `api/app/core/deep_agent.py`

Where `core_memory` / `episodic` already injected into system prompt, add a new paragraph labelled **"Known people & projects:"** listing the `relational_memory` strings. Keep under 800 chars (truncate if longer).

**Done signal:** Running a turn with seeded relations: agent uses the info (verified via Langfuse trace + test).

---

### TASK 3.5: Hook into extraction pipeline

**File:** `api/app/core/memory_extraction.py`

When `candidate.type == "relation"` → call `upsert_relation(...)` instead of `update_core` / `store_associative`.

**Done signal:** End-to-end test: turn saying "Marco is the PM on Project Acme" produces a `person --stakeholder_of--> project` row.

---

### TASK 3.6: TTL + decay job

**File:** `api/app/core/memory_extraction.py` (or new `memory_maintenance.py`)

```python
async def decay_relations(db, user_id) -> None:
    # confidence *= 0.95 every 30 days since last_confirmed_at
    # delete rows with confidence < 0.2
    ...
```

Wire into the same daily batch cron as Free extraction (Phase 5 introduces scheduler; OK to define function now and call it from a stub).

**Done signal:** Function exists + has unit test on a seeded fixture.

---

### TASK 3.7: Phase 3 checks

- `cd api && ruff check . --fix`
- `cd api && pytest -q tests/test_memory_relations.py`

---

## PHASE 4: Settings > Memory UI (Electron renderer)

### TASK 4.1: Backend endpoints for UI

**File:** `api/app/api/routes/auth.py` (memory sub-section) or new `api/app/api/routes/memory.py`

Routes (all `@require_auth`, return user-scoped data only):

- `GET /auth/me/memory/core` → `dict[str, str]` (plaintext, decrypted).
- `GET /auth/me/memory/relational` → `list[RelationOut]` (subject/pred/obj/confidence/last_confirmed_at).
- `PATCH /auth/me/memory/relational/{id}` → edit label/confidence; body validates predicate ∈ allowed set.
- `DELETE /auth/me/memory/relational/{id}` → hard delete (GDPR Art. 17).
- `DELETE /auth/me/memory/core/{key}` → remove a core k/v.
- `POST /auth/me/memory/forget-all` → wipe all 4 tiers for user; audit log entry. Requires `X-Confirm: true` header, reject 400 otherwise. Do NOT delete the User row.

**Done signal:** OpenAPI schema shows all 6 routes; pytest green.

---

### TASK 4.2: tRPC + auth-manager wrappers

**File:** [adiuvAI/src/main/auth/auth-manager.ts](adiuvAI/src/main/auth/auth-manager.ts) + [adiuvAI/src/main/router/index.ts](adiuvAI/src/main/router/index.ts)

Add auth-manager methods (6) wrapping each HTTP endpoint. Add tRPC procedures in a new `memoryRouter` merged into app router.

**Done signal:** `trpc.memory.listRelational.useQuery()` resolves from renderer.

---

### TASK 4.3: `MemorySection` settings page

**File:** `adiuvAI/src/renderer/components/settings/MemorySection.tsx` (new)

Sections in order:

1. **Core preferences**: table of k/v from `trpc.memory.getCore`. Each row: key, value, edit pencil (inline input), trash icon (`deleteCore`). Add-row form at bottom.
2. **People & relationships**: table of relations. Columns: subject, predicate (select), object, confidence (progress bar), last confirmed (formatted via `formatRow`). Pencil → edit in drawer. Trash → `deleteRelation`.
3. **Danger zone**: red Card with "Forget everything" button. Confirm dialog (typed "forget" to enable) → calls `forgetAll` with `X-Confirm: true`.

Wire into `SECTIONS` in [adiuvAI/src/renderer/components/settings/types.ts](adiuvAI/src/renderer/components/settings/types.ts) as `{ id: 'memory', label: 'Memory', icon: Brain }`. Use `Brain` from `lucide-react`.

**Free tier gating:** if `profile.tier === 'free'` → relational table hidden with upgrade CTA instead. Use `usePlatform()` + profile tier check.

**Done signal:** `/settings` → Memory tab renders all three sections, edits/deletes round-trip to backend.


---

### TASK 4.4: i18n keys

Add translation keys to all 5 JSON files under namespace `settings.memory.*`:

- `corePreferences`, `peopleRelationships`, `dangerZone`, `forgetEverything`, `forgetConfirm`, `addEntry`, `noEntries`, `upgradeToSeePeople`.

Keep `common.*` reuse for `save`/`cancel`/`delete`/`edit` (already present).

**Done signal:** All 5 locale files include the new keys.

---

### TASK 4.5: Phase 4 checks

- `cd adiuvAI && npx eslint . --fix`
- `cd adiuvAI && npx tsc --noEmit`
- Manual: run `npm run start`, log in, open Settings > Memory, edit a core key, verify persisted via `GET /auth/me` memory echo.

---

## PHASE 5: Proactive mining (Power tier only)

### TASK 5.1: Scheduler skeleton

**File:** `api/app/core/memory_maintenance.py`

Two entrypoints, callable from a cron runner (APScheduler already a dep; if not, add):

- `drain_extraction_queue()`: processes `extraction_queue` rows (Phase 2.4) for Free tier users, batched.
- `mine_proactive_patterns(user_id)`: for Power tier users only. Reads last 30 days episodic, runs a single `gpt-4o-mini` call: "Identify recurring temporal/behavioral patterns". Writes results to `memory_proactive` with `confidence`. Applies decay (conf *= 0.9 per 7 days since last sighting).

Register jobs in `app/main.py` startup (only if `settings.SCHEDULER_ENABLED=True`, default True; false in tests).

**Done signal:** `pytest -q` green (scheduler disabled). Manual: setting `SCHEDULER_ENABLED=True` + dev run logs "memory cron tick" every 1h.

---

### TASK 5.2: Surfacing proactive hints

**File:** `api/app/core/deep_agent.py` + `adiuvAI/src/renderer/components/home/DailyBrief.tsx` (if exists)

Backend already injects `proactive_hints` into prompt (middleware). Confirm still works after changes; add unit test with seeded proactive row → assert string present in final system prompt.

On renderer, if daily brief component exists, show proactive hints as chips under "I noticed…" header. If not, skip; not a regression.

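
The two decay rules in this plan (relations in Task 3.6, proactive rows in Task 5.1) share one shape, sketched below as a pure function. This is one possible reading, assuming decay is applied once per *full* period elapsed and computed from a baseline confidence; all names are hypothetical, and the caller is assumed to pass a non-null timestamp.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

RELATION_FACTOR, RELATION_PERIOD_DAYS = 0.95, 30  # Task 3.6
PROACTIVE_FACTOR, PROACTIVE_PERIOD_DAYS = 0.9, 7  # Task 5.1
RELATION_DELETE_BELOW = 0.2                       # Task 3.6 deletion floor


def decayed_confidence(
    baseline: float,
    last_seen: datetime,
    now: datetime,
    *,
    factor: float,
    period_days: int,
) -> float:
    """Multiply `baseline` by `factor` once per full period since `last_seen`."""
    elapsed_days = max((now - last_seen).days, 0)  # clock skew counts as zero
    full_periods = elapsed_days // period_days
    return baseline * factor**full_periods


def should_delete_relation(confidence: float) -> bool:
    """Relations below the floor are removed by the maintenance job."""
    return confidence < RELATION_DELETE_BELOW


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
relation_conf = decayed_confidence(
    0.7,
    now - timedelta(days=65),  # two full 30-day periods
    now,
    factor=RELATION_FACTOR,
    period_days=RELATION_PERIOD_DAYS,
)
```

One design point worth checking during implementation: if a daily job writes the decayed value back and recomputes from `last_confirmed_at` the next day, the same periods are applied again. Either keep the baseline separate from the stored value or track when decay was last applied.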

**Done signal:** System prompt includes proactive line when row exists + confidence ≥ threshold.

---

### TASK 5.3: Tier gate

Add `TierFeature.PROACTIVE_MINING` to tier_manager: Power + Team only.

**Done signal:** Free/Pro user → no cron row for them; Power user → mining runs.

---

### TASK 5.4: Phase 5 checks

- `cd api && ruff check . --fix`
- `cd api && pytest -q`

---

## PHASE 6: Completion

### TASK 6.1: Verify all files exist / modified

New files:

- [ ] `api/alembic/versions/*_associative_pgvector.py`
- [ ] `api/alembic/versions/*_memory_relations.py`
- [ ] `api/app/core/embeddings.py`
- [ ] `api/app/core/memory_extraction.py`
- [ ] `api/app/core/memory_maintenance.py`
- [ ] `api/app/api/routes/memory.py` (or new routes appended in `auth.py`)
- [ ] `adiuvAI/src/renderer/components/settings/MemorySection.tsx`

Modified files:

- [ ] `api/app/models.py` (MemoryAssociative.embedding Vector(1536), MemoryRelation class)
- [ ] `api/app/core/memory_middleware.py` (real pgvector path, relational methods, enrich_context extended, dispatch extraction after store_episode)
- [ ] `api/app/billing/tier_manager.py` (REAL_EMBEDDINGS, REALTIME_EXTRACTION, RELATIONAL_MEMORY, PROACTIVE_MINING features)
- [ ] `api/app/core/deep_agent.py` (relational injection)
- [ ] `api/app/main.py` (scheduler startup)
- [ ] `api/requirements.txt` (pgvector, APScheduler)
- [ ] `adiuvAI/src/main/auth/auth-manager.ts` (6 memory methods)
- [ ] `adiuvAI/src/main/router/index.ts` (memoryRouter merged)
- [ ] `adiuvAI/src/renderer/components/settings/types.ts` (memory section entry)
- [ ] `adiuvAI/src/renderer/locales/{en,it,es,fr,de}/translation.json` (settings.memory.* keys)

### TASK 6.2: Full gauntlet

Run all four commands, expect exit 0:

```bash
cd api && ruff check . --fix
cd api && pytest -q
cd adiuvAI && npx eslint . --fix
cd adiuvAI && npx tsc --noEmit
```

### TASK 6.3: Output completion promise

If gauntlet green and file checklist complete:

```
MEMORY EVOLUTION COMPLETE
```

---

## DO NOT

- Skip the per-iteration caveman preamble: it is part of the contract of this loop.
- Break zero-trust: never log / return plaintext user content in error paths. Relation `subject_label`/`object_label` ARE treated as identifiers, log OK. `notes_encrypted` never logged.
- Introduce A-Mem-style retroactive memory rewrites. Explicitly out of scope (strategy doc §3.3).
- Introduce AutoGPT-style reflective loops. Out of scope.
- Store format prefs or device-specific UI data in core memory; that's electron-store territory (see PROMPT-onboarding.md for precedent).
- Use Neo4j or any external graph DB; plain Postgres table is the spec.
- Call OpenAI embeddings for Free-tier users.
- Ship proactive mining (Phase 5) before Phase 3 (relational) is green; order matters.
- Delete user rows in `forget-all`; only memory rows.
- Let extraction pipeline or LLM normalization raise into the request path; always try/except, log, swallow.


---

## REFERENCE: Existing patterns to reuse

| Pattern | Source | Reuse for |
|---------|--------|-----------|
| Fernet per-user enc/dec | [api/app/core/memory_middleware.py](api/app/core/memory_middleware.py) `_get_fernet`, `_safe_decrypt` | New relational `notes_encrypted`, extraction writes |
| LLM factory | [api/app/core/llm.py](api/app/core/llm.py) `get_llm` | Extraction + normalization + proactive mining |
| Tier check | `api/app/billing/tier_manager.py` `has_feature` | All tier gates in this plan |
| Alembic async URL split | [api/alembic/env.py](api/alembic/env.py) | New migrations |
| tRPC procedure + authManager wrap | [adiuvAI/src/main/router/index.ts](adiuvAI/src/main/router/index.ts), [auth-manager.ts](adiuvAI/src/main/auth/auth-manager.ts) | 6 memory routes |
| Settings section pattern | [adiuvAI/src/renderer/components/settings/ProfileSection.tsx](adiuvAI/src/renderer/components/settings/ProfileSection.tsx) | MemorySection shape |
| shadcn table + drawer + confirm | Existing Settings sections | Memory tables + forget confirm |
| i18n labelKey pattern | See CLAUDE.md i18n section | All new strings |

---

## CAVEMAN MODE REMINDER

This document's plan is executed **under caveman:caveman ultra**. Every iteration: activate the skill first, then work. Terse prose in all user-facing text emitted during the loop. Code + commit messages + migration SQL stay normal per caveman plugin boundaries.

If caveman plugin unavailable for any reason, STOP the iteration and report instead of proceeding in default mode; the loop contract requires it.