workspace/docs/PROMPT-memory-evolution.md
Roberto Musso 3538050e75 memory
2026-04-17 22:48:19 +02:00

RALPH LOOP PROMPT — Memory Subsystem Evolution (MemGPT + Mem0 + Mem0g-light)

How to run:

/ralph-loop "Implement the memory evolution exactly as specified in docs/PROMPT-memory-evolution.md. ALWAYS start each iteration by invoking the /caveman:caveman ultra skill at intensity 'full'. Output <promise>MEMORY EVOLUTION COMPLETE</promise> when all phases pass lint + tests." --max-iterations 40 --completion-promise "MEMORY EVOLUTION COMPLETE"

MANDATORY PER-ITERATION PREAMBLE

Every iteration MUST begin with these two actions, in order:

  1. Activate caveman mode. Invoke the caveman:caveman ultra skill at intensity full before any other tool call. All prose you emit during the iteration must follow caveman rules (drop articles, fragments OK, no filler, no pleasantries). Code/commits/PRs stay normal per caveman plugin rules.
  2. Read this file in full (docs/PROMPT-memory-evolution.md) to re-anchor on the plan.

If caveman already active from prior iteration, re-assert it anyway — ralph loop restarts cold each time.

After preamble:

  1. Inspect repo state: check which tasks already done by reading target files / running grep.
  2. Pick next incomplete task in phase order (Phase 1 → 2 → 3 → 4 → 5, then the Phase 6 completion checks). No skipping, no out-of-order.
  3. Implement task.
  4. Run relevant lint + tests for that phase before exit.
  5. When ALL phases complete AND lints + tests green → output <promise>MEMORY EVOLUTION COMPLETE</promise>.

DO NOT implement multiple phases in one iteration unless they are tiny edits in the same file.


LINT + TEST COMMANDS

Run after each phase:

  • Backend lint: cd api && ruff check . --fix
  • Backend tests: cd api && pytest -q
  • Frontend lint: cd adiuvAI && npx eslint . --fix
  • Frontend typecheck: cd adiuvAI && npx tsc --noEmit

SOURCE OF TRUTH

Architectural rationale lives in docs/memory-evolution-strategy.md. This file is the execution plan derived from it. If a conflict appears, the strategy doc wins on why, this doc wins on how.

Zero-trust invariant: all user-content writes/reads go through per-user Fernet in api/app/core/memory_middleware.py. Backend never stores plaintext user content. Embeddings may leak text to OpenAI — already accepted trade-off, documented in privacy policy.

Tier gates live in api/app/billing/tier_manager.py. New capabilities MUST be gated there, not ad-hoc in routes.


WHAT THIS FEATURE DOES

Five goals from the strategy doc, executed in order:

  1. Activate real pgvector on associative tier (replace keyword fallback). Pro+ only.
  2. Mem0-style Extract/Update pipeline post-store_episode. Batch for Free, realtime for Pro+.
  3. relational tier (Mem0g-light): new table memory_relations — person/project/topic graph in Postgres.
  4. Settings > Memory UI in Electron renderer — view/edit core + relational, GDPR forget.
  5. Proactive mining (Power tier only, optional last): scheduled job promotes episodic patterns to proactive.

Architectural anchors already in place (do NOT re-create):

  • MemoryMiddleware.enrich_context injects 4 tiers into orchestrator — extend, not replace.
  • MemoryAssociative.embedding column exists (JSON fallback); swap to pgvector.Vector(1536) in migration.
  • get_llm("gpt-4o-mini", ...) in api/app/core/llm.py is canonical LLM factory.
  • Tier-gating helper: TierManager.has_feature(user, feature) — add new feature enums.

PHASE 1 — pgvector on associative tier (Pro+ gated)

TASK 1.1: Alembic migration — switch memory_associative.embedding to vector(1536)

File: api/alembic/versions/XXX_associative_pgvector.py (new)

Contents:

  • CREATE EXTENSION IF NOT EXISTS vector; (idempotent).
  • ALTER TABLE memory_associative ALTER COLUMN embedding TYPE vector(1536) USING embedding::text::vector; — must handle existing JSON rows. If conversion risky, drop column and re-add: DROP COLUMN embedding; ADD COLUMN embedding vector(1536); (data loss acceptable — keyword fallback still works).
  • Create IVFFlat index: CREATE INDEX memory_associative_embedding_idx ON memory_associative USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
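  • downgrade() reverses: drop index, then ALTER TABLE memory_associative ALTER COLUMN embedding TYPE <original type> USING embedding::text::<original type>. The ORM declares the column as JSON, so confirm whether 004_add_memory_tables.py created it as json or jsonb and restore that exact type.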

Revision id: increment from latest in api/alembic/versions/. Check 004_add_memory_tables.py for style.

Done signal: Migration applies cleanly on a fresh DB: alembic upgrade head exits 0.


TASK 1.2: Update MemoryAssociative.embedding SQLAlchemy column

File: api/app/models.py

Replace:

embedding: Mapped[list | None] = mapped_column(JSON, nullable=True)

with:

from pgvector.sqlalchemy import Vector
...
embedding: Mapped[list | None] = mapped_column(Vector(1536), nullable=True)

Add pgvector>=0.2.5 to api/requirements.txt (or pyproject.toml — check which is authoritative).

Done signal: pgvector import resolves, pytest -q still green on model import.


TASK 1.3: Add TierFeature.REAL_EMBEDDINGS feature flag

File: api/app/billing/tier_manager.py

Add to the feature enum / matrix:

  • REAL_EMBEDDINGS = "real_embeddings" → granted for pro, power, team. Free = False.

Done signal: TierManager.has_feature(user, "real_embeddings") returns correct bool per tier.
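
A minimal sketch of how the flag could sit in a feature matrix. The enum layout, the `_MATRIX` dict, and passing the tier as a plain string are assumptions for illustration only; the real `TierManager.has_feature` takes a user and may be shaped differently.

```python
from enum import Enum


class TierFeature(str, Enum):
    # Hypothetical enum; the real one lives in api/app/billing/tier_manager.py.
    REAL_EMBEDDINGS = "real_embeddings"


# Assumed matrix: tier name -> set of granted features.
_MATRIX = {
    "free": set(),
    "pro": {TierFeature.REAL_EMBEDDINGS},
    "power": {TierFeature.REAL_EMBEDDINGS},
    "team": {TierFeature.REAL_EMBEDDINGS},
}


def has_feature(tier: str, feature: str) -> bool:
    """Return True if the tier grants the feature; unknown tiers or features get False."""
    try:
        resolved = TierFeature(feature)
    except ValueError:
        return False
    return resolved in _MATRIX.get(tier, set())
```

Defaulting unknown tiers and unknown feature names to False keeps the gate closed when a new tier or a typo appears.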


TASK 1.4: Embedding helper

File: api/app/core/embeddings.py (new)

async def embed_text(text: str) -> list[float] | None:
    """Call OpenAI text-embedding-3-small. Return None on failure (caller falls back to keyword)."""

Use AsyncOpenAI client (already a dep via LiteLLM). Truncate input to 8000 chars. On any exception log warning + return None — MUST not raise.

Done signal: Unit test test_embed_text_returns_1536_floats passes with mocked client.


TASK 1.5: Wire embeddings into _load_associative + store_associative

File: api/app/core/memory_middleware.py

In _load_associative:

  1. Check user tier via TierManager.has_feature(user, "real_embeddings").
  2. If True → embed_text(message) → if vector not None run:
    SELECT * FROM memory_associative
    WHERE user_id = :uid
    ORDER BY embedding <=> :qvec
    LIMIT :k;
    
    Use SQLAlchemy embedding.cosine_distance(qvec) (pgvector).
  3. Fallback (False or None): keep current keyword-order path.

Add new store_associative(user_id, content) method:

  • Encrypt content with user Fernet.
  • If tier has real_embeddings → compute embedding, store alongside.
  • Else → store with embedding=NULL (still useful for future upgrade).

Done signal: Associative search returns semantically-closer results on a pro test user, keyword-ordered for free user.
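
To make the ordering concrete, here is a pure-Python sketch of what the `<=>` query computes (cosine distance, i.e. 1 minus cosine similarity) and of the fallback branch. It is an in-memory stand-in, not the SQLAlchemy query. `rank_memories` and its tuple-based rows are hypothetical, and mapping zero vectors to distance 1.0 is an assumption of this sketch.

```python
import math
from typing import Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity the <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # sketch-only assumption: treat zero vectors as unrelated
    return 1.0 - dot / norm


def rank_memories(rows: list, qvec: Optional[Sequence[float]],
                  has_real_embeddings: bool, k: int = 5) -> list:
    """rows: (content, embedding_or_None) tuples, already in keyword order."""
    if not has_real_embeddings or qvec is None:
        return [content for content, _ in rows][:k]  # fallback: keep keyword order
    with_vec = [r for r in rows if r[1] is not None]
    without_vec = [r for r in rows if r[1] is None]
    with_vec.sort(key=lambda r: cosine_distance(r[1], qvec))
    # Rows stored with embedding=NULL sort last, as NULLs do in an ascending ORDER BY.
    return [content for content, _ in with_vec + without_vec][:k]
```

Rows written while the user was on Free carry no embedding, so after an upgrade they still surface, just behind the rows that can be ranked semantically.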


TASK 1.6: Phase 1 checks

  • cd api && ruff check . --fix
  • cd api && pytest -q tests/test_memory_middleware.py (create minimal test if absent).
  • Manual smoke: spin up docker compose, insert two associative memories via pro user, query → verify cosine ordering.

Done signal: All three green.


PHASE 2 — Mem0-style Extract/Update pipeline

TASK 2.1: Extraction prompt + schema

File: api/app/core/memory_extraction.py (new)

Define Pydantic models:

class MemoryCandidate(BaseModel):
    type: Literal["fact", "preference", "relation", "routine"]
    content: str              # short canonical statement
    target_tier: Literal["core", "associative", "relational", "proactive"]
    subject: str | None = None     # only for relation
    predicate: str | None = None   # only for relation
    object: str | None = None      # only for relation
    confidence: float = 0.7

class ExtractionResult(BaseModel):
    candidates: list[MemoryCandidate]

Prompt template (system): "You are a memory extractor for a personal AI secretary. Given the last turn + core memory + recent episodes, identify durable facts, preferences, routines, and person/project relations. Output JSON matching the schema. Skip small talk. Max 5 candidates per turn."

Use gpt-4o-mini, temperature=0, response_format={"type": "json_object"}.

Done signal: Calling extract_candidates(last_turn, core, recent) on a fixture returns a valid ExtractionResult.
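
The checks the schema implies can be illustrated with a stdlib-only parser. This is a stand-in to make the rules concrete, not a replacement for the Pydantic models. `parse_candidates` is a hypothetical name, and enforcing the 5-candidate cap and the complete relation triple on the client side are assumptions beyond what the prompt text alone guarantees.

```python
import json

ALLOWED_TYPES = {"fact", "preference", "relation", "routine"}
ALLOWED_TIERS = {"core", "associative", "relational", "proactive"}
MAX_CANDIDATES = 5


def parse_candidates(raw: str) -> list:
    """Parse the model's JSON reply into candidate dicts; return [] on malformed input."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return []
    items = payload.get("candidates", []) if isinstance(payload, dict) else []
    if not isinstance(items, list):
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        kind, tier, content = item.get("type"), item.get("target_tier"), item.get("content")
        if not isinstance(kind, str) or kind not in ALLOWED_TYPES:
            continue
        if not isinstance(tier, str) or tier not in ALLOWED_TIERS:
            continue
        if not isinstance(content, str) or not content.strip():
            continue
        if kind == "relation" and not all(item.get(k) for k in ("subject", "predicate", "object")):
            continue  # a relation without its full triple cannot be stored
        item.setdefault("confidence", 0.7)
        valid.append(item)
    return valid[:MAX_CANDIDATES]
```

Returning an empty list on bad input matches the pipeline's rule that extraction must never raise into the request path.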


TASK 2.2: Update decision (ADD / UPDATE / DELETE / NOOP)

File: api/app/core/memory_extraction.py (same file)

async def decide_action(
    candidate: MemoryCandidate,
    existing: list[str],   # plaintext neighbours (top-3 by similarity in target tier)
) -> Literal["ADD", "UPDATE", "DELETE", "NOOP"]:
    ...

Uses a second gpt-4o-mini call with small prompt: "Given candidate and existing memories, decide ADD / UPDATE / DELETE / NOOP. Return only the verb."

Heuristic short-circuit: if existing empty → ADD without LLM (save cost).

Done signal: Unit tests for all 4 branches pass with mocked LLM.
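
A sketch of the decision step with the short-circuit. To stay self-contained it takes the candidate text as a plain string and an injected `ask_llm` callable, a hypothetical wrapper around the second gpt-4o-mini call. Falling back to NOOP on an LLM error or an unrecognised reply is an assumption, chosen because changing nothing is the least destructive outcome.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

ACTIONS = ("ADD", "UPDATE", "DELETE", "NOOP")


async def decide_action(
    candidate_content: str,
    existing: Sequence[str],
    ask_llm: Callable[[str], Awaitable[str]],
) -> str:
    """Return one of ACTIONS for the candidate given its nearest neighbours."""
    if not existing:
        return "ADD"  # short-circuit from the spec: no neighbours, no LLM call
    prompt = (
        "Given candidate and existing memories, decide ADD / UPDATE / DELETE / NOOP. "
        "Return only the verb.\n"
        f"Candidate: {candidate_content}\n"
        "Existing:\n" + "\n".join(f"- {m}" for m in existing)
    )
    try:
        reply = await ask_llm(prompt)
        verb = reply.strip().upper().strip(".")
    except Exception:
        return "NOOP"  # assumption: on failure, leave memory untouched
    return verb if verb in ACTIONS else "NOOP"
```

Normalising case and trailing punctuation before matching tolerates replies such as "update." without accepting free-form text.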


TASK 2.3: Pipeline orchestrator

File: api/app/core/memory_extraction.py (same file)

async def run_extraction(
    db: AsyncSession,
    user_id: str,
    last_user_msg: str,
    last_assistant_msg: str,
    session_id: str | None,
) -> None:
    ...

Steps:

  1. Load small context: core_memory + last 5 episodes (via middleware helpers).
  2. extract_candidates(...).
  3. For each candidate: similarity-search target tier → top-3 neighbours → decide_action → apply via MemoryMiddleware.update_core / store_associative / (new) upsert_relation / store_proactive.
  4. Log Langfuse trace with trace_id.
  5. MUST not raise — wrap in try/except, log warning.

Done signal: Calling run_extraction on a fake "user said my CFO is Giulia" produces a relation candidate and a core candidate, and writes them.


TASK 2.4: Tier-gated dispatch

File: api/app/core/memory_middleware.py

After store_episode success, dispatch extraction:

  • Pro / Power / Team → schedule realtime task (asyncio.create_task(run_extraction(...)) — fire-and-forget, exceptions swallowed).
  • Free → enqueue a daily-batch marker row (new table extraction_queue(user_id, episode_id, created_at)). A separate cron (Phase 5 stub OK) drains it.

Add TierFeature.REALTIME_EXTRACTION to tier_manager (Free=False).

Done signal: Pro user triggers realtime task (verified via log line); Free user gets queue row.


TASK 2.5: Phase 2 checks

  • cd api && ruff check . --fix
  • cd api && pytest -q tests/test_memory_extraction.py

PHASE 3 — relational tier (Mem0g-light)

TASK 3.1: Alembic migration — memory_relations table

File: api/alembic/versions/XXX_memory_relations.py (new)

CREATE TABLE memory_relations (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id),
  subject_label VARCHAR(128) NOT NULL,        -- canonical label (e.g. "Giulia")
  subject_type VARCHAR(32) NOT NULL,          -- 'person' | 'company' | 'project' | 'topic'
  predicate VARCHAR(64) NOT NULL,             -- 'works_at' | 'reports_to' | 'stakeholder_of' | 'last_contacted_on' | 'owes_followup' | custom
  object_label VARCHAR(128) NOT NULL,
  object_type VARCHAR(32) NOT NULL,
  confidence FLOAT NOT NULL DEFAULT 0.7,
  source_episode_id UUID NULL REFERENCES memory_episodic(id),
  notes_encrypted BYTEA NULL,                 -- Fernet, optional per-user commentary
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  last_confirmed_at TIMESTAMPTZ NULL          -- used by TTL decay
);
CREATE INDEX memory_relations_user_subject_idx ON memory_relations(user_id, subject_label);
CREATE INDEX memory_relations_user_predicate_idx ON memory_relations(user_id, predicate);

Done signal: alembic upgrade head clean.


TASK 3.2: MemoryRelation ORM model

File: api/app/models.py

Mirror the table above. subject_label / object_label are plaintext (entity names — treated as identifiers, not content). notes_encrypted uses Fernet like other tiers.

Done signal: Import of MemoryRelation resolves.


TASK 3.3: Relational middleware methods

File: api/app/core/memory_middleware.py

Add:

  • async def upsert_relation(user_id, subject, subject_type, predicate, object_, object_type, *, confidence=0.7, source_episode_id=None, notes=None) -> None
  • async def query_relations(user_id, subject=None, predicate=None, object_=None, limit=20) -> list[MemoryRelation]
  • Extend enrich_context return dict with key relational_memory — list of short strings "{subject} --{predicate}--> {object}" filtered by recent/confident (top 10).
  • Tier-gate: Free tier → skip (empty list). Pro = base (person/project predicates only). Power = all predicates incl. custom. Use new TierFeature.RELATIONAL_MEMORY.

Done signal: Unit tests: upsert then query returns row; tier gating enforces limits.
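
A sketch of the `relational_memory` formatting step. The `Relation` dataclass stands in for the ORM row, and ranking by confidence first and recency second is an assumption, since the plan only says "recent/confident". The Pro-tier predicate filter is left out for brevity, and timestamps are assumed to be timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Relation:
    # Stand-in for the MemoryRelation ORM row; only the fields used here.
    subject_label: str
    predicate: str
    object_label: str
    confidence: float
    last_confirmed_at: Optional[datetime] = None


def relational_memory_lines(rows: list, tier: str, limit: int = 10) -> list:
    """Format the top relations as '{subject} --{predicate}--> {object}' strings."""
    if tier == "free":
        return []  # Free tier: relational memory is skipped entirely
    oldest = datetime.min.replace(tzinfo=timezone.utc)  # sort key for never-confirmed rows
    ranked = sorted(
        rows,
        key=lambda r: (r.confidence, r.last_confirmed_at or oldest),
        reverse=True,
    )
    return [f"{r.subject_label} --{r.predicate}--> {r.object_label}" for r in ranked[:limit]]
```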


TASK 3.4: Orchestrator prompt injection

File: api/app/core/deep_agent.py

Where core_memory / episodic already injected into system prompt, add a new paragraph labelled "Known people & projects:" listing the relational_memory strings. Keep under 800 chars (truncate if longer).

Done signal: Running a turn with seeded relations — agent uses the info (verified via Langfuse trace + test).
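
One way to honour the 800-character cap is to drop whole entries instead of cutting a relation mid-string. That choice is an assumption, as are the function name and the bullet formatting; the plan only says to truncate.

```python
MAX_CHARS = 800
HEADER = "Known people & projects:"


def build_relational_paragraph(lines: list, max_chars: int = MAX_CHARS) -> str:
    """Join relation strings under the header, keeping the result within max_chars."""
    out = HEADER
    for line in lines:
        candidate = f"{out}\n- {line}"
        if len(candidate) > max_chars:
            break  # stop before the entry that would exceed the cap
        out = candidate
    # No entries fit (or none were given): inject nothing instead of a bare header.
    return out if out != HEADER else ""
```

Because the input list is already ordered by relevance, stopping at the first overflow keeps the most useful relations.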


TASK 3.5: Hook into extraction pipeline

File: api/app/core/memory_extraction.py

When candidate.type == "relation" → call upsert_relation(...) instead of update_core / store_associative.

Done signal: End-to-end test: turn saying "Marco is the PM on Project Acme" produces a person --stakeholder_of--> project row.


TASK 3.6: TTL + decay job

File: api/app/core/memory_extraction.py (or new memory_maintenance.py)

async def decay_relations(db, user_id) -> None:
    # confidence *= 0.95 for every 30 days elapsed since last_confirmed_at
    # delete rows with confidence < 0.2
    ...

Wire into the same daily batch cron as Free extraction (Phase 5 introduces scheduler — OK to define function now and call it from a stub).

Done signal: Function exists + has unit test on a seeded fixture.
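
The decay arithmetic can be isolated as a pure function, which makes the seeded-fixture test straightforward. Counting only full 30-day periods is an assumption, as are the helper names.

```python
from datetime import datetime, timedelta, timezone

DECAY_FACTOR = 0.95
PERIOD_DAYS = 30
DELETE_BELOW = 0.2


def decayed_confidence(confidence: float, last_confirmed_at: datetime, now: datetime) -> float:
    """Apply one 0.95 multiplier per full 30-day period since last confirmation."""
    periods = max(0, (now - last_confirmed_at).days // PERIOD_DAYS)
    return confidence * DECAY_FACTOR ** periods


def should_delete(confidence: float) -> bool:
    """Rows whose decayed confidence falls below 0.2 are removed."""
    return confidence < DELETE_BELOW
```

With the default confidence of 0.7, a relation that is never reconfirmed stays above the threshold for 24 periods and drops below it at the 25th (750 days). Note that this function computes the decayed value from the confidence recorded at last confirmation. A daily job that writes the result back must not reapply the full factor to an already decayed value on the next run.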


TASK 3.7: Phase 3 checks

  • cd api && ruff check . --fix
  • cd api && pytest -q tests/test_memory_relations.py

PHASE 4 — Settings > Memory UI (Electron renderer)

TASK 4.1: Backend endpoints for UI

File: api/app/api/routes/auth.py (memory sub-section) or new api/app/api/routes/memory.py

Routes (all @require_auth, return user-scoped data only):

  • GET /auth/me/memory/core → dict[str, str] (plaintext, decrypted).
  • GET /auth/me/memory/relational → list[RelationOut] (subject/pred/obj/confidence/last_confirmed_at).
  • PATCH /auth/me/memory/relational/{id} → edit label/confidence; body validates predicate ∈ allowed set.
  • DELETE /auth/me/memory/relational/{id} → hard delete (GDPR Art. 17).
  • DELETE /auth/me/memory/core/{key} → remove a core k/v.
  • POST /auth/me/memory/forget-all → wipe every memory tier for the user (the 4 existing tiers plus the new relational tier); write an audit log entry. Requires X-Confirm: true header; reject with 400 otherwise. Do NOT delete the User row.

Done signal: OpenAPI schema shows all 6 routes; pytest green.


TASK 4.2: tRPC + auth-manager wrappers

File: adiuvAI/src/main/auth/auth-manager.ts + adiuvAI/src/main/router/index.ts

Add auth-manager methods (6) wrapping each HTTP endpoint. Add tRPC procedures in a new memoryRouter merged into app router.

Done signal: trpc.memory.listRelational.useQuery() resolves from renderer.


TASK 4.3: MemorySection settings page

File: adiuvAI/src/renderer/components/settings/MemorySection.tsx (new)

Sections in order:

  1. Core preferences — table of k/v from trpc.memory.getCore. Each row: key, value, edit pencil (inline input), trash icon (deleteCore). Add-row form at bottom.
  2. People & relationships — table of relations. Columns: subject, predicate (select), object, confidence (progress bar), last confirmed (formatted via formatRow). Pencil → edit in drawer. Trash → deleteRelation.
  3. Danger zone — red Card with "Forget everything" button. Confirm dialog (typed "forget" to enable) → calls forgetAll with X-Confirm: true.

Wire into SECTIONS in adiuvAI/src/renderer/components/settings/types.ts as { id: 'memory', label: 'Memory', icon: Brain }. Use Brain from lucide-react.

Free tier gating: if profile.tier === 'free' → relational table hidden with upgrade CTA instead. Use usePlatform() + profile tier check.

Done signal: /settings → Memory tab renders all three sections, edits/deletes round-trip to backend.


TASK 4.4: i18n keys

Add translation keys to all 5 JSON files under namespace settings.memory.*:

  • corePreferences, peopleRelationships, dangerZone, forgetEverything, forgetConfirm, addEntry, noEntries, upgradeToSeePeople.

Keep common.* reuse for save/cancel/delete/edit (already present).

Done signal: All 5 locale files include the new keys.


TASK 4.5: Phase 4 checks

  • cd adiuvAI && npx eslint . --fix
  • cd adiuvAI && npx tsc --noEmit
  • Manual: run npm run start, log in, open Settings > Memory, edit a core key, then confirm the change persisted by re-reading it with GET /auth/me/memory/core.

PHASE 5 — Proactive mining (Power tier only)

TASK 5.1: Scheduler skeleton

File: api/app/core/memory_maintenance.py

Two entrypoints, callable from a cron runner (APScheduler already a dep — if not, add):

  • drain_extraction_queue() — processes extraction_queue rows (Phase 2.4) for Free tier users, batched.
  • mine_proactive_patterns(user_id) — for Power tier users only. Reads last 30 days episodic, runs a single gpt-4o-mini call: "Identify recurring temporal/behavioral patterns". Writes results to memory_proactive with confidence. Applies decay (conf *= 0.9 per 7 days since last sighting).

Register jobs in app/main.py startup (only if settings.SCHEDULER_ENABLED=True, default True; false in tests).

Done signal: pytest -q green (scheduler disabled). Manual: setting SCHEDULER_ENABLED=True + dev run logs "memory cron tick" every 1h.


TASK 5.2: Surfacing proactive hints

File: api/app/core/deep_agent.py + adiuvAI/src/renderer/components/home/DailyBrief.tsx (if exists)

Backend already injects proactive_hints into prompt (middleware). Confirm still works after changes; add unit test with seeded proactive row → assert string present in final system prompt.

On renderer, if daily brief component exists, show proactive hints as chips under "I noticed…" header. If not, skip — not a regression.

Done signal: System prompt includes proactive line when row exists + confidence ≥ threshold.


TASK 5.3: Tier gate

Add TierFeature.PROACTIVE_MINING to tier_manager — Power + Team only.

Done signal: Free/Pro user → no cron row for them; Power user → mining runs.


TASK 5.4: Phase 5 checks

  • cd api && ruff check . --fix
  • cd api && pytest -q

PHASE 6 — Completion

TASK 6.1: Verify all files exist / modified

New files:

  • api/alembic/versions/*_associative_pgvector.py
  • api/alembic/versions/*_memory_relations.py
  • api/app/core/embeddings.py
  • api/app/core/memory_extraction.py
  • api/app/core/memory_maintenance.py
  • api/app/api/routes/memory.py (or new routes appended in auth.py)
  • adiuvAI/src/renderer/components/settings/MemorySection.tsx

Modified files:

  • api/app/models.py (MemoryAssociative.embedding Vector(1536), MemoryRelation class)
  • api/app/core/memory_middleware.py (real pgvector path, relational methods, enrich_context extended, dispatch extraction after store_episode)
  • api/app/billing/tier_manager.py (REAL_EMBEDDINGS, REALTIME_EXTRACTION, RELATIONAL_MEMORY, PROACTIVE_MINING features)
  • api/app/core/deep_agent.py (relational injection)
  • api/app/main.py (scheduler startup)
  • api/requirements.txt (pgvector, APScheduler)
  • adiuvAI/src/main/auth/auth-manager.ts (6 memory methods)
  • adiuvAI/src/main/router/index.ts (memoryRouter merged)
  • adiuvAI/src/renderer/components/settings/types.ts (memory section entry)
  • adiuvAI/src/renderer/locales/{en,it,es,fr,de}/translation.json (settings.memory.* keys)

TASK 6.2: Full gauntlet

Run all four commands, expect exit 0:

cd api && ruff check . --fix
cd api && pytest -q
cd adiuvAI && npx eslint . --fix
cd adiuvAI && npx tsc --noEmit

TASK 6.3: Output completion promise

If gauntlet green and file checklist complete:

<promise>MEMORY EVOLUTION COMPLETE</promise>

DO NOT

  • Skip the per-iteration caveman preamble — it is part of the contract of this loop.
  • Break zero-trust: never log / return plaintext user content in error paths. Relation subject_label/object_label ARE treated as identifiers — log OK. notes_encrypted never logged.
  • Introduce A-Mem-style retroactive memory rewrites. Explicitly out of scope (strategy doc §3.3).
  • Introduce AutoGPT-style reflective loops. Out of scope.
  • Store format prefs or device-specific UI data in core memory — that's electron-store territory (see PROMPT-onboarding.md for precedent).
  • Use Neo4j or any external graph DB — plain Postgres table is the spec.
  • Call OpenAI embeddings for Free-tier users.
  • Ship proactive mining (Phase 5) before Phase 3 (relational) is green — order matters.
  • Delete user rows in forget-all — only memory rows.
  • Let extraction pipeline or LLM normalization raise into the request path — always try/except, log, swallow.

REFERENCE — Existing patterns to reuse

| Pattern | Source | Reuse for |
|---|---|---|
| Fernet per-user enc/dec | api/app/core/memory_middleware.py _get_fernet, _safe_decrypt | New relational notes_encrypted, extraction writes |
| LLM factory | api/app/core/llm.py get_llm | Extraction + normalization + proactive mining |
| Tier check | api/app/billing/tier_manager.py has_feature | All tier gates in this plan |
| Alembic async URL split | api/alembic/env.py | New migrations |
| tRPC procedure + authManager wrap | adiuvAI/src/main/router/index.ts, auth-manager.ts | 6 memory routes |
| Settings section pattern | adiuvAI/src/renderer/components/settings/ProfileSection.tsx | MemorySection shape |
| shadcn table + drawer + confirm | Existing Settings sections | Memory tables + forget confirm |
| i18n labelKey pattern | See CLAUDE.md i18n section | All new strings |

CAVEMAN MODE REMINDER

This document's plan is executed under caveman:caveman ultra. Every iteration: activate the skill first, then work. Terse prose in all user-facing text emitted during the loop. Code + commit messages + migration SQL stay normal per caveman plugin boundaries.

If caveman plugin unavailable for any reason, STOP the iteration and report instead of proceeding in default mode — the loop contract requires it.