first commit

Roberto Musso
2026-04-08 22:55:08 +02:00
commit 1f1ce7d40e
20 changed files with 2531 additions and 0 deletions

.claude/CLAUDE.md Normal file

@@ -0,0 +1,185 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Keeping This File Up to Date
Update this file whenever a lesson is learned during development. Specifically, update CLAUDE.md when:
- A non-obvious architectural decision is made or discovered
- A gotcha, footgun, or surprising behavior is encountered (and the fix/workaround)
- A new command, workflow, or tool is added to the project
- A convention is established that isn't obvious from reading the code
- An integration detail is clarified (e.g., how the WebSocket protocol actually behaves, edge cases in the agent tool call cycle)
Do **not** add things already derivable from reading the code, generic best practices, or ephemeral task notes — only durable, reusable knowledge.
## Repository Layout
This repo contains two independent projects:
- **`adiuva/`** — Electron desktop app (TypeScript/React)
- **`adiuva-api/`** — FastAPI backend (Python)
---
## adiuva (Electron App)
### Commands
```bash
npm run start # Start dev server (Electron + Vite)
npm run lint # ESLint
npm run knip # Dead code analysis
npm run make # Build installers (Windows/Linux/macOS)
npm run package # Package without creating installers
```
Database schema changes require running Drizzle Kit — check `package.json` for db commands.
### Architecture
```
Renderer (React 19 + TanStack Router)
↓ tRPC over contextBridge
Main Process (Electron)
├── SQLite (better-sqlite3 + Drizzle ORM) — local app data
├── LanceDB — local vector embeddings
└── WebSocket client → adiuva-api backend
```
**IPC model**: The renderer calls tRPC procedures defined in `src/main/router/`. The preload script (`src/preload/`) bridges them with `contextIsolation: true`.
**Backend integration**: The Electron main process connects to the FastAPI backend via WebSocket. The backend sends tool calls (e.g., `insert`, `vector_search`); the main process executes them against local SQLite via Drizzle and returns the results. All AI logic lives on the backend; the app acts as a smart terminal.
**Key source directories**:
- `src/main/agents/` — Agent scheduler
- `src/main/ai/` — Orchestrator, token management
- `src/main/db/` — SQLite schema (Drizzle) + LanceDB
- `src/main/router/` — tRPC router (all IPC procedures)
- `src/renderer/components/` — UI components (tasks, notes, projects, timeline, auth)
- `src/renderer/routes/` — TanStack Router pages
- `src/shared/` — Zod schemas shared between main/renderer (WebSocket frame types, casing utils)
**Path aliases** (tsconfig): `@/*` → `src/renderer/`, `@shared/*` → `src/shared/`
**WebSocket frame types** are defined in `src/shared/api-types.ts` using Zod. Client sends: `chat_request`, `floating_request`, `tool_result`. Server sends: `text_chunk`, `tool_call`, `final`, `ping`.
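As a rough illustration of the frame vocabulary above, seen from the backend side, the following Python sketch checks an incoming frame's type against the set the client is expected to send. The authoritative definitions are the Zod schemas in `src/shared/api-types.ts`; the `type` key, the `call_id` field in the usage line, and the function name are assumptions made for this example.

```python
import json

# Frame type names as listed above. The "type" key is an assumed field name.
CLIENT_FRAME_TYPES = frozenset({"chat_request", "floating_request", "tool_result"})
SERVER_FRAME_TYPES = frozenset({"text_chunk", "tool_call", "final", "ping"})


def parse_client_frame(raw: str) -> dict:
    """Decode a JSON frame received from the client and check its type."""
    frame = json.loads(raw)
    if not isinstance(frame, dict):
        raise ValueError("frame must be a JSON object")
    frame_type = frame.get("type")
    if frame_type not in CLIENT_FRAME_TYPES:
        raise ValueError(f"unexpected client frame type: {frame_type!r}")
    return frame
```

For instance, `parse_client_frame('{"type": "tool_result", "call_id": "c1"}')` returns the decoded dict, while a frame typed `ping` is rejected because only the server sends it.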
---
## adiuva-api (FastAPI Backend)
### Commands
```bash
# Development
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Production
gunicorn app.main:app -k uvicorn.workers.UvicornWorker -w 4 --timeout 120
# Database migrations
alembic upgrade head
# Testing
pytest
pytest -v
pytest tests/test_agents.py # single test file
# Linting/formatting
ruff check .
ruff format .
# Docker (full stack)
docker compose up --build
```
### Architecture
```
FastAPI app (app/main.py)
├── Middleware: RateLimiter → Sanitizer → CORS
├── HTTP Routes (app/api/routes/)
│ ├── auth.py — register, login, token refresh
│ ├── chat.py — POST /chat, POST /chat/embed, WS /chat/stream
│ └── billing.py — Stripe subscriptions
├── Agent System (app/agents/)
│ ├── task_agent.py — 8 tools
│ ├── project_agent.py — 6 tools
│ ├── timeline_agent.py — 4 tools
│ └── note_agent.py — 5 tools
└── Orchestration (app/core/)
    ├── agent_registry.py
    ├── agent_runner.py
    ├── llm.py — LiteLLM factory (100+ providers)
    └── memory_middleware.py
```
**LLM routing**: GPT-4o-mini classifies incoming intent → routes to appropriate domain agent → agent uses GPT-4o with its tool set → sends tool calls back to Electron client for local execution.
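The classify-then-route flow can be pictured with a small, self-contained sketch. Nothing here calls a model: `classify_intent` is a keyword stub standing in for the small-model classification step, the agents are placeholders, and the fallback to `task` is an assumption, since this file does not say how unclassifiable messages are handled.

```python
from typing import Callable, Dict

Agent = Callable[[str], str]


def make_stub_agent(domain: str) -> Agent:
    """Return a placeholder agent; a real one would call the larger model with its tools."""
    def run(message: str) -> str:
        return f"[{domain}] handling: {message}"
    return run


# Domain names mirror the agent files listed above; the registry shape is hypothetical.
AGENTS: Dict[str, Agent] = {
    domain: make_stub_agent(domain)
    for domain in ("task", "project", "timeline", "note")
}


def classify_intent(message: str) -> str:
    """Keyword stub standing in for the small-model classification call."""
    lowered = message.lower()
    for domain in AGENTS:
        if domain in lowered:
            return domain
    return "task"  # assumed fallback, not taken from the source


def route(message: str) -> str:
    """Classify the message, then hand it to the matching domain agent."""
    return AGENTS[classify_intent(message)](message)
```

Keeping classification and dispatch as separate functions means the stub can be swapped for a real model call without touching the registry.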
**Zero-trust data model**: The backend never decrypts user data. PostgreSQL stores only auth, billing, and metadata. All user content stays local on the Electron client.
**Tier system**: Free / Pro / Power / Team — enforced in `app/api/middleware/rate_limit.py` (20 to 200 req/min, sliding window) and `app/billing/tier_manager.py`.
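For readers unfamiliar with sliding-window limiting, here is a minimal in-memory sketch of the idea. It is not the code in `rate_limit.py`: the per-tier numbers are placeholders, the class name is invented, and a multi-worker deployment would need shared storage rather than a process-local dict.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

# Placeholder per-tier limits (requests per minute), chosen for illustration only.
TIER_LIMITS: Dict[str, int] = {"free": 20, "pro": 60, "power": 120, "team": 200}
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """Allow at most the tier's limit of requests per user in the trailing window."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, tier: str) -> bool:
        now = self._clock()
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= TIER_LIMITS[tier]:
            return False
        hits.append(now)
        return True
```

The injectable `clock` exists so tests can advance time deterministically instead of sleeping.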
**Key config**: `app/config/settings.py` — all env vars via Pydantic Settings. Copy `.env.example` to `.env` for local dev.
**Database**: PostgreSQL with async SQLAlchemy 2.0 + asyncpg. Migrations in `alembic/versions/`. Models in `app/models.py`, Pydantic schemas in `app/schemas.py`.
**Testing**: pytest with pytest-asyncio. Fixtures in `tests/conftest.py`. Use in-memory SQLite for DB tests.
---
## Microservices Migration (In Progress)
The monolith (`adiuva-api/app/`) is being split into independent services under `adiuva-api/services/`. Architectural decisions are tracked in repo memory (`/memories/repo/microservices-architecture.md`).
### Target Services (MVP)
| Service | Owns | Scaling |
|---------|------|---------|
| **Auth** | JWT RS256 issuance, users, refresh_tokens, subscriptions | Stateless |
| **WS Gateway** | WebSocket connections, Redis frame routing, device registry | Sticky (user_id) |
| **Chat** | deep_agent, memory, domain agents (task/note/project/timeline), LLM | Stateless |
| **Batch Agent** | agent_runner, journey builder, filesystem_agent, integrations (+ Langfuse tracing TODO) | Stateless |
| **Billing** | Stripe, tier_manager | Stateless |
**API Gateway**: Traefik with ForwardAuth → Auth `/verify`. Injects `X-User-Id`, `X-User-Email`, `X-User-Tier` headers. Downstream services trust these headers.
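A downstream service might read those headers along the lines of the sketch below. It is framework-agnostic and only assumes the three header names given above; the `Identity` dataclass and the choice to raise `PermissionError` on a missing header are illustrative, not taken from the codebase.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Identity:
    user_id: str
    email: str
    tier: str


def identity_from_headers(headers: Mapping[str, str]) -> Identity:
    """Build an Identity from the gateway-injected headers.

    HTTP header names are case-insensitive, so keys are lower-cased first.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return Identity(
            user_id=lowered["x-user-id"],
            email=lowered["x-user-email"],
            tier=lowered["x-user-tier"],
        )
    except KeyError as exc:
        # A missing header suggests the request did not come through the gateway.
        raise PermissionError(f"missing identity header: {exc.args[0]}") from exc
```

Because the headers are trusted without further checks, this only holds up if the services are unreachable except through Traefik.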
### Monorepo Structure
```
adiuva-api/
├── shared/ ← SQLAlchemy models, Pydantic schemas, config, redis utils
├── services/
│ ├── auth/
│ ├── ws-gateway/
│ ├── chat/
│ ├── batch-agent/
│ └── billing/
├── alembic/ ← Centralized migrations (shared DB)
├── docker-compose.yml
└── traefik/
```
### Key Conventions
- **shared/ module**: Imported by all services. Contains models, schemas, config, DB session factory, Redis client. Changes here affect all services — be careful.
- **Redis is the glue**: WS Gateway ↔ Chat/Batch communication is entirely via Redis pub/sub and lists. See `/memories/repo/microservices-architecture.md` for channel naming.
- **No JWT validation in downstream services**: Only Auth Service has the private key. Other services receive pre-validated identity via Traefik headers.
- **Tool call round-trip**: Chat/Batch → publish `tool_call` to `ws:out:{user_id}` → WS Gateway forwards to Electron → Electron replies `tool_result` → WS Gateway LPUSH to `tool:result:{call_id}` → Chat/Batch BRPOP with 30s timeout.
- **Migration strategy**: Strangler fig. Extract one service at a time. The monolith `app/` continues to work until all services are extracted. Don't delete monolith code until the service replacement is tested.
- **Storage & Plugin services**: Removed from codebase. Will be re-evaluated in future feature planning.
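The tool call round-trip described in the conventions above can be sketched from the Chat/Batch side as follows. The function accepts any client exposing redis-py style `publish` and `brpop` methods; the frame fields (`call_id`, `tool`, `args`) and the function name are assumptions, and only the channel and key patterns come from this file.

```python
import json
import uuid
from typing import Any, Dict, Optional, Protocol, Tuple


class RedisLike(Protocol):
    """The two redis-py style calls this sketch relies on."""

    def publish(self, channel: str, message: str) -> int: ...

    def brpop(self, keys: str, timeout: int = 0) -> Optional[Tuple[str, str]]: ...


def call_tool(client: RedisLike, user_id: str, tool: str, args: Dict[str, Any],
              timeout_s: int = 30) -> Dict[str, Any]:
    """Publish a tool_call frame and block until its tool_result arrives."""
    call_id = uuid.uuid4().hex
    frame = {"type": "tool_call", "call_id": call_id, "tool": tool, "args": args}
    client.publish(f"ws:out:{user_id}", json.dumps(frame))
    # BRPOP returns (key, value), or None when the timeout expires.
    reply = client.brpop(f"tool:result:{call_id}", timeout=timeout_s)
    if reply is None:
        raise TimeoutError(f"no tool_result for call {call_id} within {timeout_s}s")
    _key, payload = reply
    return json.loads(payload)
```

Using a per-call result list keyed by `call_id` keeps concurrent calls for the same user from reading each other's replies.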
### Commands (Microservices)
```bash
# Full stack
docker compose up --build
# Single service dev (example: chat)
docker compose up -d redis postgres auth # detached, so the shell stays free for the next command
cd services/chat && uvicorn app.main:app --reload --port 8002
# Migrations (still centralized)
alembic upgrade head
```

.claude/settings.json Normal file

@@ -0,0 +1,9 @@
{
  "mcpServers": {
    "langfuse-docs": {
      "transportType": "http",
      "url": "https://langfuse.com/api/mcp",
      "verifySsl": true
    }
  }
}


@@ -0,0 +1,6 @@
{
  "enabledMcpjsonServers": [
    "langfuse-docs"
  ],
  "enableAllProjectMcpServers": true
}


@@ -0,0 +1,25 @@
---
name: boost-prompt
description: 'Interactive prompt refinement workflow: interrogates scope, deliverables, constraints; copies final markdown to clipboard; never writes code. Requires the Joyride extension.'
---
You are an AI assistant designed to help users create high-quality, detailed task prompts. DO NOT WRITE ANY CODE.
Your goal is to iteratively refine the user's prompt by:
- Understanding the task scope and objectives
- Asking the user specific questions with the `joyride_request_human_input` tool whenever you need clarification on details
- Defining expected deliverables and success criteria
- Exploring the project with the available tools to further your understanding of the task
- Clarifying technical and procedural requirements
- Organizing the prompt into clear sections or steps
- Ensuring the prompt is easy to understand and follow
After gathering sufficient information, produce the improved prompt as markdown, then use Joyride to place it on the system clipboard and also type it out in the chat. Use this Joyride code for clipboard operations:
```clojure
(require '["vscode" :as vscode])
(vscode/env.clipboard.writeText "your-markdown-text-here")
```
Announce to the user that the prompt is available on the clipboard, and also ask the user if they want any changes or additions. Repeat the copy + chat + ask after any revisions of the prompt.


@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,73 @@
---
name: brand-guidelines
description: Applies Anthropic's official brand colors and typography to any sort of artifact that may benefit from having Anthropic's look-and-feel. Use it when brand colors or style guidelines, visual formatting, or company design standards apply.
license: Complete terms in LICENSE.txt
---
# Anthropic Brand Styling
## Overview
To access Anthropic's official brand identity and style resources, use this skill.
**Keywords**: branding, corporate identity, visual identity, post-processing, styling, brand colors, typography, Anthropic brand, visual formatting, visual design
## Brand Guidelines
### Colors
**Main Colors:**
- Dark: `#141413` - Primary text and dark backgrounds
- Light: `#faf9f5` - Light backgrounds and text on dark
- Mid Gray: `#b0aea5` - Secondary elements
- Light Gray: `#e8e6dc` - Subtle backgrounds
**Accent Colors:**
- Orange: `#d97757` - Primary accent
- Blue: `#6a9bcc` - Secondary accent
- Green: `#788c5d` - Tertiary accent
### Typography
- **Headings**: Poppins (with Arial fallback)
- **Body Text**: Lora (with Georgia fallback)
- **Note**: Fonts should be pre-installed in your environment for best results
## Features
### Smart Font Application
- Applies Poppins font to headings (24pt and larger)
- Applies Lora font to body text
- Automatically falls back to Arial/Georgia if custom fonts unavailable
- Preserves readability across all systems
### Text Styling
- Headings (24pt+): Poppins font
- Body text: Lora font
- Smart color selection based on background
- Preserves text hierarchy and formatting
### Shape and Accent Colors
- Non-text shapes use accent colors
- Cycles through orange, blue, and green accents
- Maintains visual interest while staying on-brand
## Technical Details
### Font Management
- Uses system-installed Poppins and Lora fonts when available
- Provides automatic fallback to Arial (headings) and Georgia (body)
- No font installation required - works with existing system fonts
- For best results, pre-install Poppins and Lora fonts in your environment
### Color Application
- Uses RGB color values for precise brand matching
- Applied via python-pptx's RGBColor class
- Maintains color fidelity across different systems
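To make the color handling concrete, the sketch below converts the accent hex codes listed under Colors into RGB triples and cycles through them in the stated order. It uses only the standard library; the helper names are invented for this example and the skill's actual implementation may differ.

```python
from itertools import cycle
from typing import Dict, Iterator, Tuple

RGB = Tuple[int, int, int]

# Hex values copied from the Colors section of this file.
ACCENT_HEX: Dict[str, str] = {
    "orange": "#d97757",
    "blue": "#6a9bcc",
    "green": "#788c5d",
}


def hex_to_rgb(value: str) -> RGB:
    """Convert '#rrggbb' into an (r, g, b) tuple of 0-255 integers."""
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (red, green, blue)


def accent_cycle() -> Iterator[RGB]:
    """Yield orange, blue, green, then repeat, one color per shape."""
    return cycle([hex_to_rgb(code) for code in ACCENT_HEX.values()])
```

With python-pptx installed, each triple can be unpacked into `RGBColor(r, g, b)` from `pptx.dml.color` before being assigned to a shape's fill.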


@@ -0,0 +1,177 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS


@@ -0,0 +1,42 @@
---
name: frontend-design
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
license: Complete terms in LICENSE.txt
---
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
## Design Thinking
Before coding, understand the context and commit to a BOLD aesthetic direction:
- **Purpose**: What problem does this interface solve? Who uses it?
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
- **Constraints**: Technical requirements (framework, performance, accessibility).
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
- Production-grade and functional
- Visually striking and memorable
- Cohesive with a clear aesthetic point-of-view
- Meticulously refined in every detail
## Frontend Aesthetics Guidelines
Focus on:
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.


@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,96 @@
---
name: webapp-testing
description: Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
license: Complete terms in LICENSE.txt
---
# Web Application Testing
To test local web applications, write native Python Playwright scripts.
**Helper Scripts Available**:
- `scripts/with_server.py` - Manages server lifecycle (supports multiple servers)
**Always run scripts with `--help` first** to see usage. DO NOT read the source until you have tried running the script and found that a customized solution is absolutely necessary. These scripts can be very large and thus pollute your context window. They exist to be called directly as black-box scripts rather than ingested into your context window.
## Decision Tree: Choosing Your Approach
```
User task → Is it static HTML?
    ├─ Yes → Read HTML file directly to identify selectors
    │         ├─ Success → Write Playwright script using selectors
    │         └─ Fails/Incomplete → Treat as dynamic (below)
    └─ No (dynamic webapp) → Is the server already running?
        ├─ No → Run: python scripts/with_server.py --help
        │        Then use the helper + write simplified Playwright script
        └─ Yes → Reconnaissance-then-action:
            1. Navigate and wait for networkidle
            2. Take screenshot or inspect DOM
            3. Identify selectors from rendered state
            4. Execute actions with discovered selectors
```
## Example: Using with_server.py
To start a server, run `--help` first, then use the helper:
**Single server:**
```bash
python scripts/with_server.py --server "npm run dev" --port 5173 -- python your_automation.py
```
**Multiple servers (e.g., backend + frontend):**
```bash
python scripts/with_server.py \
--server "cd backend && python server.py" --port 3000 \
--server "cd frontend && npm run dev" --port 5173 \
-- python your_automation.py
```
To create an automation script, include only Playwright logic (servers are managed automatically):
```python
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # Always launch chromium in headless mode
    page = browser.new_page()
    page.goto('http://localhost:5173')  # Server already running and ready
    page.wait_for_load_state('networkidle')  # CRITICAL: Wait for JS to execute
    # ... your automation logic
    browser.close()
```
## Reconnaissance-Then-Action Pattern
1. **Inspect rendered DOM**:
```python
page.screenshot(path='/tmp/inspect.png', full_page=True)
content = page.content()
page.locator('button').all()
```
2. **Identify selectors** from inspection results
3. **Execute actions** using discovered selectors
## Common Pitfall
❌ **Don't** inspect the DOM before waiting for `networkidle` on dynamic apps
✅ **Do** wait for `page.wait_for_load_state('networkidle')` before inspection
## Best Practices
- **Use bundled scripts as black boxes** - To accomplish a task, consider whether one of the scripts available in `scripts/` can help. These scripts handle common, complex workflows reliably without cluttering the context window. Use `--help` to see usage, then invoke directly.
- Use `sync_playwright()` for synchronous scripts
- Always close the browser when done
- Use descriptive selectors: `text=`, `role=`, CSS selectors, or IDs
- Add appropriate waits: `page.wait_for_selector()` or `page.wait_for_timeout()`
## Reference Files
- **examples/** - Examples showing common patterns:
  - `element_discovery.py` - Discovering buttons, links, and inputs on a page
  - `static_html_automation.py` - Using file:// URLs for local HTML
  - `console_logging.py` - Capturing console logs during automation


@@ -0,0 +1,35 @@
from playwright.sync_api import sync_playwright
# Example: Capturing console logs during browser automation
url = 'http://localhost:5173' # Replace with your URL
console_logs = []
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={'width': 1920, 'height': 1080})
    # Set up console log capture
    def handle_console_message(msg):
        console_logs.append(f"[{msg.type}] {msg.text}")
        print(f"Console: [{msg.type}] {msg.text}")
    page.on("console", handle_console_message)
    # Navigate to page
    page.goto(url)
    page.wait_for_load_state('networkidle')
    # Interact with the page (triggers console logs)
    page.click('text=Dashboard')
    page.wait_for_timeout(1000)
    browser.close()
# Save console logs to file
with open('/mnt/user-data/outputs/console.log', 'w') as f:
    f.write('\n'.join(console_logs))
print(f"\nCaptured {len(console_logs)} console messages")
print(f"Logs saved to: /mnt/user-data/outputs/console.log")


@@ -0,0 +1,40 @@
from playwright.sync_api import sync_playwright
# Example: Discovering buttons and other elements on a page
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Navigate to page and wait for it to fully load
    page.goto('http://localhost:5173')
    page.wait_for_load_state('networkidle')
    # Discover all buttons on the page
    buttons = page.locator('button').all()
    print(f"Found {len(buttons)} buttons:")
    for i, button in enumerate(buttons):
        text = button.inner_text() if button.is_visible() else "[hidden]"
        print(f"  [{i}] {text}")
    # Discover links
    links = page.locator('a[href]').all()
    print(f"\nFound {len(links)} links:")
    for link in links[:5]:  # Show first 5
        text = link.inner_text().strip()
        href = link.get_attribute('href')
        print(f"  - {text} -> {href}")
    # Discover input fields
    inputs = page.locator('input, textarea, select').all()
    print(f"\nFound {len(inputs)} input fields:")
    for input_elem in inputs:
        name = input_elem.get_attribute('name') or input_elem.get_attribute('id') or "[unnamed]"
        input_type = input_elem.get_attribute('type') or 'text'
        print(f"  - {name} ({input_type})")
    # Take screenshot for visual reference
    page.screenshot(path='/tmp/page_discovery.png', full_page=True)
    print("\nScreenshot saved to /tmp/page_discovery.png")
    browser.close()


@@ -0,0 +1,33 @@
from playwright.sync_api import sync_playwright
import os
# Example: Automating interaction with static HTML files using file:// URLs
html_file_path = os.path.abspath('path/to/your/file.html')
file_url = f'file://{html_file_path}'
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={'width': 1920, 'height': 1080})
    # Navigate to local HTML file
    page.goto(file_url)
    # Take screenshot
    page.screenshot(path='/mnt/user-data/outputs/static_page.png', full_page=True)
    # Interact with elements
    page.click('text=Click Me')
    page.fill('#name', 'John Doe')
    page.fill('#email', 'john@example.com')
    # Submit form
    page.click('button[type="submit"]')
    page.wait_for_timeout(500)
    # Take final screenshot
    page.screenshot(path='/mnt/user-data/outputs/after_submit.png', full_page=True)
    browser.close()
print("Static HTML automation completed!")


@@ -0,0 +1,106 @@
#!/usr/bin/env python3
"""
Start one or more servers, wait for them to be ready, run a command, then clean up.
Usage:
    # Single server
    python scripts/with_server.py --server "npm run dev" --port 5173 -- python automation.py
    python scripts/with_server.py --server "npm start" --port 3000 -- python test.py
    # Multiple servers
    python scripts/with_server.py \
      --server "cd backend && python server.py" --port 3000 \
      --server "cd frontend && npm run dev" --port 5173 \
      -- python test.py
"""
import subprocess
import socket
import time
import sys
import argparse
def is_server_ready(port, timeout=30):
    """Wait for server to be ready by polling the port."""
    start_time = time.time()
    while time.time() - start_time < timeout:
        try:
            with socket.create_connection(('localhost', port), timeout=1):
                return True
        except (socket.error, ConnectionRefusedError):
            time.sleep(0.5)
    return False
def main():
    parser = argparse.ArgumentParser(description='Run command with one or more servers')
    parser.add_argument('--server', action='append', dest='servers', required=True, help='Server command (can be repeated)')
    parser.add_argument('--port', action='append', dest='ports', type=int, required=True, help='Port for each server (must match --server count)')
    parser.add_argument('--timeout', type=int, default=30, help='Timeout in seconds per server (default: 30)')
    parser.add_argument('command', nargs=argparse.REMAINDER, help='Command to run after server(s) ready')
    args = parser.parse_args()
    # Remove the '--' separator if present
    if args.command and args.command[0] == '--':
        args.command = args.command[1:]
    if not args.command:
        print("Error: No command specified to run")
        sys.exit(1)
    # Parse server configurations
    if len(args.servers) != len(args.ports):
        print("Error: Number of --server and --port arguments must match")
        sys.exit(1)
    servers = []
    for cmd, port in zip(args.servers, args.ports):
        servers.append({'cmd': cmd, 'port': port})
    server_processes = []
    try:
        # Start all servers
        for i, server in enumerate(servers):
            print(f"Starting server {i+1}/{len(servers)}: {server['cmd']}")
            # Use shell=True to support commands with cd and &&
            process = subprocess.Popen(
                server['cmd'],
                shell=True,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE
            )
            server_processes.append(process)
            # Wait for this server to be ready
            print(f"Waiting for server on port {server['port']}...")
            if not is_server_ready(server['port'], timeout=args.timeout):
                raise RuntimeError(f"Server failed to start on port {server['port']} within {args.timeout}s")
            print(f"Server ready on port {server['port']}")
        print(f"\nAll {len(servers)} server(s) ready")
        # Run the command
        print(f"Running: {' '.join(args.command)}\n")
        result = subprocess.run(args.command)
        sys.exit(result.returncode)
    finally:
        # Clean up all servers
        print(f"\nStopping {len(server_processes)} server(s)...")
        for i, process in enumerate(server_processes):
            try:
                process.terminate()
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                process.kill()
                process.wait()
            print(f"Server {i+1} stopped")
        print("All servers stopped")
if __name__ == '__main__':
    main()

.mcp.json Normal file

@@ -0,0 +1,8 @@
{
  "mcpServers": {
    "langfuse-docs": {
      "type": "http",
      "url": "https://langfuse.com/api/mcp"
    }
  }
}

.vscode/mcp.json vendored Normal file

@@ -0,0 +1,9 @@
{
  "servers": {
    "langfuse-docs": {
      "url": "https://langfuse.com/api/mcp",
      "type": "http"
    }
  },
  "inputs": []
}

docs/LOCAL_AGENT_V2_PLAN.md Normal file

@@ -0,0 +1,551 @@
# Local Agent V2 — Implementation Plan
> Architectural reference: [`local_agent_v2_mem.md`](local_agent_v2_mem.md)
---
## Overview
The Local Agent V2 replaces the 3-call LLM flow (separate classification and processing calls)
with a two-phase architecture:
1. **Detect + Preprocess** (pure Python, zero LLM) — identifies the content type and cleans it
2. **Single LLM call** (classify + extract + create) — a single agentic call with tool calling
### Langfuse: Scoring + Prompt Management (hot-swap)
Every step includes a test set whose evals send scores to Langfuse.
**Prompts are managed by Langfuse Prompt Management** and can be edited from the UI
without touching code. Every score is linked to the **exact prompt version**
that produced it, which allows A/B comparison between versions.
**Iterative workflow:**
1. Write or edit the prompt in the Langfuse UI (e.g. `unified_processing` v3)
2. Run the evals: `pytest tests/test_agent_runner_v2.py -k eval`
3. Check Langfuse: prompt v3 → score 0.6
4. Edit the prompt → v4
5. Re-run the evals → prompt v4 → score 0.9
6. Promote v4 to the `production` label
**Langfuse prompts to create (with hardcoded fallbacks in the code):**
| Langfuse name | Used in | Description |
|---|---|---|
| `unified_processing` | Step 2 (runner) | Single prompt: classify + extract + create |
| `journey_system_v2` | Step 4 (journey) | Journey chatbot → produces AgentConfig JSON |
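The hardcoded-fallback behavior can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's `get_prompt_or_fallback` from `app/core/langfuse_client.py`, whose implementation is not shown in this plan. It assumes a client object with a `get_prompt(name)` method that returns an object carrying the prompt text in a `.prompt` attribute; `fetch_prompt_or_fallback`, `FakeClient`, and `FakePrompt` are hypothetical names used only for this example.

```python
from typing import Any, Optional, Tuple


def fetch_prompt_or_fallback(
    client: Optional[Any], name: str, fallback: str
) -> Tuple[str, Optional[Any]]:
    """Return (template, prompt_obj); prompt_obj is None when the fallback is used."""
    if client is None:
        return fallback, None
    try:
        prompt_obj = client.get_prompt(name)  # assumed client method
        return prompt_obj.prompt, prompt_obj  # assumed text attribute
    except Exception:
        # Network error, missing prompt, etc.: keep working with the local text
        return fallback, None


class FakePrompt:
    """Hypothetical stand-in for a managed prompt object."""

    def __init__(self, prompt: str, version: int):
        self.prompt = prompt
        self.version = version


class FakeClient:
    """Hypothetical stand-in for a prompt client, used only to exercise the sketch."""

    def __init__(self, prompts):
        self._prompts = prompts

    def get_prompt(self, name):
        if name not in self._prompts:
            raise KeyError(name)
        return self._prompts[name]
```

Returning `None` as the prompt object when the fallback is used lets callers skip prompt-version linking without a separate flag.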
**Scoring pattern with prompt version linking:**
```python
from app.core.langfuse_client import get_langfuse, get_prompt_or_fallback
def run_eval_with_prompt(prompt_name: str, fallback: str, eval_name: str, run_fn):
    """Run an eval, linking score ↔ prompt version."""
    lf = get_langfuse()
    template, prompt_obj = get_prompt_or_fallback(prompt_name, fallback)
    # Create a trace for the eval
    trace = lf.trace(name=f"eval-{eval_name}") if lf else None
    # Run the LLM call inside a generation linked to the prompt
    if lf and trace:
        with lf.start_as_current_observation(
            as_type="generation",
            name=eval_name,
            prompt=prompt_obj,  # ← links to the exact prompt version
            trace_id=trace.id,
        ) as gen:
            result, score = run_fn(template)
            gen.update(output=str(result))
    else:
        result, score = run_fn(template)
    # Score attached to the trace → visible per prompt version in Langfuse
    if lf and trace:
        lf.score(
            trace_id=trace.id,
            name=eval_name,
            value=score,
            data_type="NUMERIC",
        )
        lf.flush()
    return result, score
```
**In Langfuse you will see:**
```
Prompt: unified_processing
├── v3 (2026-04-05) → avg score: 0.62 (12 evals)
├── v4 (2026-04-07) → avg score: 0.85 (12 evals) ← production
└── v5 (2026-04-08) → avg score: 0.91 (12 evals) ← candidate
```
---
## Step 3 — Model and schema: `prompt_template` → `agent_config` ✅ DONE
Added in parallel with Step 2 as a prerequisite:
- `app/schemas.py`: `ContentTypeConfig`, `AgentConfig`
- `app/models.py`: `agent_config: JSON` (nullable, alongside `prompt_template`)
- `alembic/versions/a3b9c0d1e2f3_add_agent_config_to_local_agents.py`
## Step 1 — Preprocessor: email HTML handler ✅ DONE
**Files to create:**
- `app/core/preprocessors/__init__.py` — registry, detect, dispatch
- `app/core/preprocessors/base.py` — `PreprocessResult` dataclass, base class
- `app/core/preprocessors/email_html.py` — BeautifulSoup handler
**What it does:**
- `detect_content_type(filename, raw_content) -> str` — heuristic based on the file extension plus patterns in the content
- `preprocess(content_type, raw_content) -> PreprocessResult` — dispatches to the matching handler
- `PreprocessResult`: `{ content_type, clean_text, metadata: {subject, from, to, date, ...} }`
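As a rough sketch of the detection step, the snippet below combines the file extension with a header-pattern check. The content type names come from the Step 1 test table; the two-header threshold and the regular expression are assumptions made for illustration, not the rules used in `app/core/preprocessors/`.

```python
import os
import re
from dataclasses import dataclass, field
from typing import Dict

# Email header labels such as "From:" hint at an exported email (assumed pattern)
_EMAIL_HEADER_RE = re.compile(r"\b(From|To|Subject):", re.IGNORECASE)


@dataclass
class PreprocessResult:
    content_type: str
    clean_text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def detect_content_type(filename: str, raw_content: str) -> str:
    """Guess the content type from the extension, then refine it with content patterns."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in (".html", ".htm"):
        headers = {m.group(1).lower() for m in _EMAIL_HEADER_RE.finditer(raw_content)}
        # Assumed rule: at least two distinct email headers means "email_html"
        return "email_html" if len(headers) >= 2 else "generic_html"
    if ext == ".txt":
        return "plain_text"
    return "unknown"
```

Requiring two distinct headers rather than one is meant to reduce false positives on ordinary pages that happen to contain a single "To:" label.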
**`email_html` handler:**
- Strip `<style>`, `<script>`, and HTML tags → clean text (BeautifulSoup)
- Extract metadata: Subject, From, To, Date (from `<meta>`, header patterns, or content heuristics)
- Split the thread: identify quote markers (`>`, `On ... wrote:`, `---Original Message---`) → isolate the latest message
- Fallback: if the thread cannot be split, return all of the clean text
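The strip-and-split steps can be sketched as below. To stay dependency-free, this example swaps BeautifulSoup for the standard library's `html.parser`, so it is an illustration of the idea rather than the planned handler. The quote-marker patterns and the helper names `strip_html` and `latest_message` are assumptions.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <style> and <script>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div"):
            self.parts.append("\n")  # keep block boundaries as line breaks

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(raw_html: str) -> str:
    """Return the visible text, one non-empty line per block."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


# Quote markers named in the plan; the exact patterns are assumptions
_QUOTE_MARKER_RE = re.compile(
    r"^(>|On .+ wrote:|-+\s*Original Message\s*-+)", re.IGNORECASE
)


def latest_message(clean_text: str) -> str:
    """Return the text before the first quote marker, or everything if none is found."""
    kept = []
    for line in clean_text.splitlines():
        if _QUOTE_MARKER_RE.match(line):
            break
        kept.append(line)
    result = "\n".join(kept).strip()
    return result or clean_text  # fallback: keep all text if the split leaves nothing
```

Cutting at the first marker assumes the newest message sits at the top of the thread, which is the common layout for replies but not for bottom-posted mail.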
**Fallback handler (`generic`):**
- Strip HTML tags if present
- Return the text as-is with minimal metadata (filename, extension)
**Dependencies to add:**
- `beautifulsoup4` (probably already installed; verify)
- `lxml` (fast parser for BS4, optional)
### Test set — Step 1
**File:** `tests/test_preprocessors.py`
| # | Test case | Input | Expected | Score name |
|---|-----------|-------|----------|------------|
| 1.1 | Detect email HTML | `.html` with `From:`, `To:`, `Subject:` | `content_type == "email_html"` | `preprocess.detect_email` |
| 1.2 | Detect generic HTML | `.html` with `<nav>`, `<main>` | `content_type == "generic_html"` | `preprocess.detect_generic` |
| 1.3 | Detect plain text | `.txt` | `content_type == "plain_text"` | `preprocess.detect_text` |
| 1.4 | Detect unknown | binary `.xyz` | `content_type == "unknown"` | `preprocess.detect_unknown` |
| 1.5 | Email: strip HTML | Email with `<style>`, inline CSS | `clean_text` without HTML tags | `preprocess.email_strip` |
| 1.6 | Email: extract metadata | Email with Subject/From/Date | correct metadata | `preprocess.email_metadata` |
| 1.7 | Email: split thread | Email with 3 nested replies | `clean_text` = latest message only | `preprocess.email_thread` |
| 1.8 | Email: single message | Email without a thread | `clean_text` = entire body | `preprocess.email_single` |
| 1.9 | Email: heavy HTML | Email with a lot of CSS/table layout | readable text extracted | `preprocess.email_heavy_html` |
| 1.10 | Fallback: unknown file | Binary file | `clean_text` via fallback | `preprocess.fallback` |
**Eval with Langfuse:**
```python
@pytest.mark.asyncio
async def test_email_html_strip(sample_email_html):
    lf = get_langfuse()
    trace = lf.trace(name="eval-preprocess-email-strip") if lf else None
    result = preprocess("email_html", sample_email_html)
    # Assertions
    has_no_tags = "<" not in result.clean_text
    has_content = len(result.clean_text) > 50
    ratio = len(result.clean_text) / len(sample_email_html)  # compression ratio
    score = 1.0 if (has_no_tags and has_content and ratio < 0.5) else 0.0
    if trace:
        lf.score(trace_id=trace.id, name="preprocess.email_strip", value=score,
                 comment=f"ratio={ratio:.2f}, len={len(result.clean_text)}")
        lf.flush()
    assert has_no_tags
    assert has_content
```
**Success criteria:** all 10 tests pass, mean score ≥ 0.9
---
## Step 2 — Refactor `agent_runner.py`: new per-file flow ✅ DONE
**Files to modify:**
- `app/core/agent_runner.py`
**What changes:**
- Remove `_classify_file()` (the separate Step 1 LLM call)
- Remove `_BATCH_FILE_CLASSIFIER_PROMPT`
- Add the preprocessor import
- New flow in `run_local_agent()`:
```python
for file_path in file_paths:
    # 1. Read the raw file
    raw_content = await execute_on_client(action="read_file_content", ...)
    # 2. Detect + Preprocess (Python, zero LLM)
    content_type = detect_content_type(file_path, raw_content)
    preprocessed = preprocess(content_type, raw_content)
    # 3. Fetch the prompt from Langfuse (hot-swappable from the UI) with a local fallback
    template, prompt_obj = get_prompt_or_fallback(
        "unified_processing", _UNIFIED_PROCESSING_PROMPT
    )
    extraction_rules = _get_extraction_rules(config.agent_config, content_type)
    system_prompt = template.format(
        extraction_rules=extraction_rules,
        global_rules="\n".join(config.agent_config.get("global_rules", [])),
        projects_list=_format_projects(projects),
        data_types=", ".join(config.data_types),
        filename=os.path.basename(file_path),
        metadata_section=_format_metadata(preprocessed.metadata),
        no_match_behavior=_get_no_match_behavior(config.agent_config),
    )
    # 4. Single LLM call with tools (classify + extract + create)
    # The generation is linked to prompt_obj → scores are visible per version
    user_message = _build_user_message(file_path, preprocessed)
    result = await _run_agent_with_tools(
        system_prompt=system_prompt,
        user_message=user_message,
        tools=processing_tools,
        max_steps=_MAX_PROCESSING_STEPS,
        langfuse_prompt=prompt_obj,  # ← links to the prompt version
    )
```
**`unified_processing` prompt (local fallback, editable from the Langfuse UI):**
```
You are a data extraction assistant for a freelance project management tool.
## Your process (follow this exact order)
### 1. Identify the project
File: {filename}
{metadata_section}
Existing projects:
{projects_list}
Match this file to an existing project using the filename and content.
If no project matches, {no_match_behavior}.
### 2. Check existing records
Once you identify the project, use list_tasks/list_notes/list_timelines
to see what already exists. NEVER create duplicates.
### 3. Extract and create/update
{extraction_rules}
### Rules
- Set isAiSuggested=1 on every new record
- Set projectId on every record
- Update existing records when a match is found by title/topic
{global_rules}
```
**Fix `items_created`:** count the `create_*` tool calls in the results.
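The counting rule can be sketched as follows. This is a minimal illustration, assuming each recorded tool call is a dict with a `name` key (the shape used by the test example in this plan); `count_items_created` is a hypothetical helper name, not the runner's actual function.

```python
from typing import Dict, List


def count_items_created(tool_calls: List[Dict]) -> int:
    """Count tool calls whose name starts with ``create_``.

    Assumption: every call is a dict with a "name" key. Updates and
    list queries are deliberately not counted.
    """
    return sum(1 for call in tool_calls if call.get("name", "").startswith("create_"))


# Hypothetical run: one query, two creates, one update.
calls = [
    {"name": "list_tasks"},
    {"name": "create_task"},
    {"name": "create_note"},
    {"name": "update_task"},
]
print(count_items_created(calls))
```

Whether a `create_*` call that failed on the client should still be counted is not specified in the plan; the sketch counts every call by name only.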
### Test set — Step 2
**File:** `tests/test_agent_runner_v2.py`
| # | Test case | Input | Expected | Score name |
|---|-----------|-------|----------|------------|
| 2.1 | Happy path: email → task | Preprocessed email containing an action request | `create_task` tool called | `runner.email_to_task` |
| 2.2 | Happy path: email → note | Informational email | `create_note` tool called | `runner.email_to_note` |
| 2.3 | Happy path: email → timeline | Email with an event date | `create_timeline` tool called | `runner.email_to_timeline` |
| 2.4 | Project matching: filename | File `ProjectX_report.html` | ProjectX project selected | `runner.project_filename` |
| 2.5 | Project matching: content | File mentioning the project in the body | correct project | `runner.project_content` |
| 2.6 | No project match → global rule | File with no project match | behavior taken from global_rules | `runner.no_project` |
| 2.7 | Deduplication | Existing task + similar email | `update_task`, not `create_task` | `runner.dedup` |
| 2.8 | items_created count | 2 create + 1 update | `items_created == 2` | `runner.items_count` |
| 2.9 | Device offline | No device | status=error | `runner.offline` |
| 2.10 | Empty file | Empty content | skipped without errors | `runner.empty_file` |
**Eval with Langfuse (prompt hot-swap + per-version scores):**
```python
@pytest.mark.asyncio
async def test_email_to_task_e2e(mock_ws_executor):
    lf = get_langfuse()
    # The prompt comes from Langfuse → you can change it from the UI and re-run the test
    template, prompt_obj = get_prompt_or_fallback(
        "unified_processing", _UNIFIED_PROCESSING_PROMPT
    )
    trace = lf.trace(
        name="eval-runner-email-to-task",
        metadata={"step": "2", "prompt_version": getattr(prompt_obj, "version", "fallback")},
    ) if lf else None
    config = _make_config(agent_config={
        "content_types": [{
            "id": "email_html",
            "extraction_prompt": "Direct action → task. Informational → note."
        }],
        "global_rules": [],
        "data_types": ["tasks", "notes"]
    })
    # Mock preprocessed email with action request
    mock_file_content = "Subject: Fix the bug\nFrom: boss@co.com\n\nPlease fix the login bug by Friday."
    tool_calls_made = []
    # ... setup mock that captures tool calls ...
    await run_local_agent(user_id, config, run_log, device_mgr)
    created_tasks = [c for c in tool_calls_made if c["name"] == "create_task"]
    score = 1.0 if len(created_tasks) == 1 else 0.0
    title_match = 1.0 if any("bug" in c["args"].get("title", "").lower() for c in created_tasks) else 0.0
    if trace:
        # Score attached to the trace → Langfuse links it to the prompt version automatically
        lf.score(trace_id=trace.id, name="runner.email_to_task", value=score,
                 comment=f"tasks_created={len(created_tasks)}")
        lf.score(trace_id=trace.id, name="runner.email_to_task.title", value=title_match)
        lf.flush()
    assert score == 1.0
    assert title_match == 1.0
```
**Success criteria:** all 10 tests pass, average score ≥ 0.8
---
## Step 3: Model and schema, `prompt_template` → `agent_config` ✅ DONE (see above)
**Files to modify:**
- `app/models.py`: `LocalAgentConfig.prompt_template: Text` → `agent_config: JSON`
- `app/schemas.py`: Pydantic schema for `AgentConfig`
- `alembic/versions/`: new migration
- `app/api/routes/agents.py`: update `trigger_agent_run` to read `agent_config`
**Pydantic schema:**
```python
from pydantic import BaseModel


class ContentTypeConfig(BaseModel):
    id: str
    label: str = ""
    detection_hint: str = ""
    preprocessing: str = "generic"  # handler name: "email_html", "generic", ...
    extraction_prompt: str


class AgentConfig(BaseModel):
    content_types: list[ContentTypeConfig] = []
    global_rules: list[str] = []
    data_types: list[str] = []
```
### Test set — Step 3
**File:** `tests/test_agent_config_schema.py`
| # | Test case | Input | Expected | Score name |
|---|-----------|-------|----------|------------|
| 3.1 | Valid schema | Complete JSON | parsing OK | `schema.valid` |
| 3.2 | Minimal schema | Only `data_types` | defaults applied | `schema.minimal` |
| 3.3 | Unknown content type | `preprocessing: "pdf"` | accepted (future) | `schema.unknown_type` |
| 3.4 | Migration up/down | Alembic migrate | no errors | `schema.migration` |
| 3.5 | Trigger with agent_config | POST /agents/trigger | config parsed | `schema.trigger` |
**Success criteria:** all 5 tests pass
---
## Step 4: Journey setup with structured output ✅ DONE
**Files to modify:**
- `app/api/routes/agent_setup.py`: `_JOURNEY_SYSTEM_PROMPT` rewritten
- `app/api/routes/agent_setup.py`: parse JSON output instead of text markers
**What changes:**
- The journey produces an `AgentConfig` JSON, not a prose `prompt_template`
- The system prompt comes from Langfuse (`journey_system_v2`) with a local fallback, so it is **editable from the UI without touching code** to iterate on journey quality
- The system prompt instructs the LLM to:
1. Explore the directory
2. Identify the content types present
3. For each type, ask the user for the extraction rules
4. Produce a structured JSON that conforms to the `AgentConfig` schema
- The `PROMPT_TEMPLATE_START/END` markers become `AGENT_CONFIG_START/END`
- Parsing extracts the JSON and validates it with Pydantic
- Every journey LLM call is linked to the `prompt_obj` → per-version scores
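A minimal sketch of the marker-based parsing, using only the standard library. The marker names come from this plan; the regex, the helper name `extract_agent_config`, and the "return `None` on failure" behavior are assumptions for illustration. The real code is described as validating with the Pydantic `AgentConfig` model, which this sketch replaces with a plain type check.

```python
import json
import re
from typing import Optional

# Marker names are from the plan; the pattern itself is illustrative.
_CONFIG_RE = re.compile(r"AGENT_CONFIG_START\s*(.*?)\s*AGENT_CONFIG_END", re.DOTALL)


def extract_agent_config(reply: str) -> Optional[dict]:
    """Return the JSON object found between the markers, or None."""
    match = _CONFIG_RE.search(reply)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    # Stand-in for Pydantic validation: only accept a JSON object.
    return data if isinstance(data, dict) else None


# Hypothetical LLM reply containing the config block.
reply = (
    "Here is the configuration.\n"
    "AGENT_CONFIG_START\n"
    '{"content_types": [], "global_rules": [], "data_types": ["tasks"]}\n'
    "AGENT_CONFIG_END"
)
print(extract_agent_config(reply))
```

Returning `None` instead of raising lets the caller decide whether to nudge the LLM for another attempt, which matches the "nudge after max turns" test case only loosely; the actual retry policy is not described here.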
### Test set — Step 4
**File:** `tests/test_journey_v2.py`
| # | Test case | Input | Expected | Score name |
|---|-----------|-------|----------|------------|
| 4.1 | Journey start: explores directory | Directory with HTML emails | relevant first question | `journey.start` |
| 4.2 | Journey: produces valid JSON | 3-5 conversation turns | valid `AgentConfig` | `journey.valid_json` |
| 4.3 | Journey: detects HTML email | Directory with `.html` emails | content_type `email_html` present | `journey.detect_email` |
| 4.4 | Journey: custom user rules | "create only notes, no tasks" | `extraction_prompt` reflects the rule | `journey.custom_rules` |
| 4.5 | Journey: global rules | "no project = no entity" | present in `global_rules` | `journey.global_rules` |
| 4.6 | Journey: nudge after max turns | Turn limit reached | JSON produced anyway | `journey.nudge` |
**Eval with Langfuse (LLM-as-judge example):**
```python
@pytest.mark.asyncio
async def test_journey_produces_valid_config(mock_ws_executor):
    lf = get_langfuse()
    trace = lf.trace(name="eval-journey-valid-config") if lf else None
    # Simulate a full journey: start + 3 messages
    reply = await handle_journey_start(user_id, {
        "agent_type": "local",
        "directory": "/test/emails",
        "data_types": ["tasks", "notes"],
    })
    # Simulate user replies
    for msg in ["They are email exports from Outlook", "Extract tasks from action items", "Yes, that looks correct"]:
        reply = await handle_journey_message(user_id, {
            "session_id": reply["session_id"],
            "message": msg,
        })
        if reply.get("done"):
            break
    config_json = reply.get("agent_config")
    is_valid = False
    try:
        parsed = AgentConfig.model_validate_json(config_json)
        is_valid = len(parsed.content_types) > 0
    except Exception:
        pass
    if trace:
        lf.score(trace_id=trace.id, name="journey.valid_json", value=1.0 if is_valid else 0.0,
                 comment=f"config={config_json[:200] if config_json else 'None'}")
        lf.flush()
    assert is_valid
```
**Success criteria:** all 6 tests pass, LLM score ≥ 0.8
---
## Step 5: Frontend (Electron store + scheduler + UI) ✅ DONE
**Files to modify:**
- `src/main/store.ts`: field `promptTemplate` → `agentConfig`
- `src/main/agents/agent-scheduler.ts`: passes `agentConfig` to the trigger
- `src/renderer/components/settings/JourneyDialog.tsx`: parses JSON from the reply
- `src/renderer/components/settings/LocalAgentConfigPanel.tsx`: shows the config
- `src/renderer/components/settings/types.ts`: `LocalAgentConfig` type updated
- `src/shared/api-types.ts`: frame type updated (if it affects the WS)
**What changes:**
- The store saves `agentConfig: AgentConfig` (an object) instead of `promptTemplate: string`
- The scheduler sends `agent_config` in the trigger body (not `custom_agent_prompt`)
- The JourneyDialog receives JSON and displays it in a human-readable form
- The config panel shows the configured content types and the rules
### Test set — Step 5
| # | Test case | Check | Score name |
|---|-----------|----------|------------|
| 5.1 | Store: saves/reads agentConfig | JSON round-trip | `fe.store` |
| 5.2 | Scheduler: passes config to the trigger | correct POST body | `fe.scheduler` |
| 5.3 | Journey: parses reply JSON | `agentConfig` populated | `fe.journey_parse` |
**Note:** frontend tests are manual/Playwright. Scores are sent only for backend tests.
**Success criteria:** complete round-trip working
---
## Step 6: End-to-end tests with real files
**Files to create:**
- `tests/test_local_agent_e2e.py`
- `tests/fixtures/emails/`: 5-10 sample HTML emails (anonymized)
**E2E scenarios:**
| # | Scenario | Input | Expected | Score name |
|---|----------|-------|----------|------------|
| 6.1 | Email with action → task | "Please review the PR by Friday" | task created with dueDate | `e2e.action_email` |
| 6.2 | Informational email → note | "FYI: new policy effective May 1" | note + timeline created | `e2e.info_email` |
| 6.3 | Nested email thread | 4 levels of replies | only the last message processed | `e2e.thread` |
| 6.4 | Newsletter → skip | Marketing newsletter | no entity created | `e2e.newsletter_skip` |
| 6.5 | Project from filename | `ProjectX_update.html` | assigned to ProjectX | `e2e.project_filename` |
| 6.6 | Project from content | Email mentions "Project Alpha" | assigned to Project Alpha | `e2e.project_content` |
| 6.7 | No project + rule | No match + "no project = no entity" | no entity created | `e2e.no_project_rule` |
| 6.8 | Deduplication update | Task exists + similar email | update, not create | `e2e.dedup` |
| 6.9 | Multiple entities from 1 email | Email with task + meeting date | task + timeline created | `e2e.multi_entity` |
| 6.10 | Batch of 5 mixed files | 3 emails + 1 newsletter + 1 info | 3 processed, 1 skipped, 1 note | `e2e.batch_mixed` |
**Eval with Langfuse (multi-score example):**
```python
@pytest.mark.asyncio
async def test_e2e_action_email(real_email_fixtures):
    lf = get_langfuse()
    trace = lf.trace(name="eval-e2e-action-email", metadata={"step": "6"}) if lf else None
    # Full setup: config → preprocess → LLM → tool calls
    tool_calls = await run_full_pipeline(
        file_path="fixtures/emails/action_request.html",
        agent_config=STANDARD_EMAIL_CONFIG,
        existing_projects=[{"id": "p1", "name": "Project Alpha"}],
    )
    # One score per aspect
    scores = {
        "task_created": 1.0 if any(c["name"] == "create_task" for c in tool_calls) else 0.0,
        "correct_project": 1.0 if any(c["args"].get("project_id") == "p1" for c in tool_calls) else 0.0,
        "has_due_date": 1.0 if any(c["args"].get("due_date", 0) > 0 for c in tool_calls) else 0.0,
        "is_ai_suggested": 1.0 if any(c["args"].get("is_ai_suggested") == 1 for c in tool_calls) else 0.0,
    }
    if trace:
        for name, value in scores.items():
            lf.score(trace_id=trace.id, name=f"e2e.action_email.{name}", value=value)
        lf.flush()
    assert all(v == 1.0 for v in scores.values())
```
**Success criteria:** ≥ 8/10 tests pass, average score ≥ 0.8
---
## Implementation order
```
Step 1 (preprocessor) ← no dependencies, start here
Step 3 (model/schema) ← in parallel with Step 1
Step 2 (agent_runner) ← depends on Step 1 + Step 3
Step 4 (journey setup) ← depends on Step 3 (AgentConfig schema)
Step 5 (frontend) ← depends on Step 3 + Step 4
Step 6 (E2E) ← depends on everything
```
**Steps 1 and 3 can be developed in parallel.**
---
## Langfuse score summary
| Step | Score prefix | # tests | Minimum threshold |
|------|-------------|--------|---------------|
| 1 | `preprocess.*` | 10 | ≥ 0.9 |
| 2 | `runner.*` | 10 | ≥ 0.8 |
| 3 | `schema.*` | 5 | 1.0 (deterministic) |
| 4 | `journey.*` | 6 | ≥ 0.8 |
| 5 | `fe.*` | 3 | 1.0 (deterministic) |
| 6 | `e2e.*` | 10 | ≥ 0.8 |
Total: **44 tests**, of which ~26 use LLM scoring on Langfuse.


@@ -0,0 +1,419 @@
# Enhanced Memory V2 - Analysis and Design
## Status: PHASE 2 - Analysis and proposal completed
---
## 1. DECISIONS MADE
| Question | Answer |
|---------|----------|
| **Privacy** | The backend MAY process plaintext in memory for extraction. NO plaintext persistence |
| **SaaS vs In-House** | **In-house only**. No dependency on the Supermemory API |
| **Target features** | Graph Memory, Contradiction Resolution, Forgetting/Decay, User Profiles, LLM Episode Summarization |
| **User scale** | < 100 (early stage) |
| **LLM budget** | **Minimize**. Prefer small models/heuristics where possible |
| **Semantic search** | NOT required in this phase (keyword fallback accepted for now) |
---
## 2. CURRENT IMPLEMENTATION (Adiuva)
### Memory architecture (MemGPT-style, 4 tiers)
| Tier | Table | Status |
|---------|---------|-------|
| **Core** | `memory_core` | Working - key/value preferences |
| **Associative** | `memory_associative` | Partial - keyword fallback, no semantic search |
| **Episodic** | `memory_episodic` | Working - 200-char truncation |
| **Proactive** | `memory_proactive` | Empty schema, no logic |
### Strengths
- Per-user E2E encryption (Fernet)
- 9 agent tools for memory
- Automatic context injection before the LLM call
- Episodes saved automatically after each conversation
### Gaps vs Supermemory (what is missing)
1. No automatic fact extraction from conversations
2. No relations between memories (graph UPDATE/EXTEND/DERIVE)
3. No temporal forgetting/decay
4. No contradiction resolution
5. No auto-generated user profile
6. Episodes truncated to 200 chars with no summarization
7. Proactive memory not implemented
### THE CORE PROBLEM: blind context selection
The `_load_associative()` method runs:
```sql
SELECT * FROM memory_associative
WHERE user_id = ?
ORDER BY updated_at DESC -- orders by DATE, not by relevance
LIMIT 5
```
**The user's message is NOT used for filtering.** The query returns the 5 most recent facts,
even when they are completely irrelevant to the question. The same applies to episodic memory (last 10 by date).
---
## 3. HOW SUPERMEMORY STORES DATA (it does not use MD!)
### Clarification: Supermemory does NOT store MD files
Supermemory accepts MD as an input format (along with PDF, JSON, code, etc.),
but the data is transformed and stored in **PostgreSQL + vector embeddings (Cloudflare AI)**.
### Supermemory storage schema (from source code)
```
Documents table (Drizzle ORM + Postgres)
├── id, customId, orgId, userId
├── content ← original raw content
├── title, summary ← LLM-generated
├── type ← 'text' | 'web' | 'pdf' | 'md' | etc.
├── status ← 'queued' | 'extracting' | 'chunking' | 'embedding' | 'done'
├── metadata ← filterable JSON key/value
├── tokenCount, wordCount, chunkCount
├── summaryEmbedding (vector)
├── containerTags[] ← namespace isolation (user_id, project_id)
└── createdAt, updatedAt
MemoryEntries table (extracted facts)
├── id, memory ← the extracted fact ("Mario prefers concise answers")
├── version ← version number of the fact
├── context
│ ├── parents[] ← [{relation: 'updates'|'extends'|'derives', memory, version}]
│ └── children[] ← [{relation, memory, version}]
├── similarity ← search score
├── metadata
├── sourceDocumentId ← the document it was extracted from
└── updatedAt
```
### How Adiuva stores data (comparison)
```
memory_core (PostgreSQL)
├── key, value_encrypted ← Fernet AES-128
└── user_id
memory_associative (PostgreSQL + pgvector)
├── content_encrypted ← encrypted
├── embedding (1536-dim) ← column present but NEVER used for search
├── entity_type, entity_id
└── user_id
memory_episodic (PostgreSQL)
├── summary_encrypted ← "User: [200 char]\nAssistant: [200 char]"
├── session_id
└── user_id
memory_proactive (PostgreSQL)
├── pattern_encrypted, confidence ← empty schema, no data
└── user_id
```
### Key difference in storage
| | Supermemory | Adiuva |
|---|-----------|--------|
| **What it stores** | Structured facts extracted by an LLM | Encrypted raw text |
| **Relations** | Graph with UPDATE/EXTEND/DERIVE + versioning | No relations |
| **Embeddings** | Generated and actively used for search | Column present but unused |
| **Encryption** | None (plaintext) | Per-user Fernet |
| **Processing** | Pipeline: Extract → Chunk → Embed → Index | Direct store with no processing |
---
## 4. SUPERMEMORY - What to Take as Inspiration
> **We are NOT integrating Supermemory SaaS.** We take inspiration from its design and implement in-house.
### Concepts to adopt
1. **Relations between memories**: UPDATE (replaces), EXTEND (enriches), DERIVE (infers)
2. **isLatest flag**: tracks the current version of a fact
3. **Automatic forgetting**: temporal facts with `expires_at`, episodes that decay
4. **Dual user profile**: `static` (stable facts) + `dynamic` (recent activity)
5. **Post-conversation fact extraction**: an LLM extracts structured facts after every chat
### Concepts NOT to adopt (not relevant)
- Connectors (Google Drive, Gmail, etc.): Adiuva is a desktop app, not an aggregator
- Multi-modal extraction (PDF, video): out of scope
- Hybrid RAG+Memory search: not required now
---
## 4. COST/BENEFIT ANALYSIS - OPTIONS
### DISCARDED OPTION: Supermemory SaaS Integration
| | Detail |
|---|----------|
| **Cost** | $0-19/mo for <100 users (Free/Pro) |
| **Pros** | Fast implementation, SOTA benchmarks |
| **Fatal cons** | Violates privacy (plaintext required), vendor lock-in, external API latency, Cloudflare Workers architecture not easily self-hostable |
| **Verdict** | **DISCARDED**: incompatible with zero-trust and the in-house preference |
### CHOSEN OPTION: In-House Enhancement Inspired by Supermemory
**Approach**: incremental evolution of the existing architecture in 4 phases.
---
## 5. PROPOSED IMPLEMENTATION PLAN
### PHASE 1: Memory Graph + Contradiction Resolution
**Effort**: ~3-5 days | **Extra LLM cost**: ~$0.002/conversation (GPT-4o-mini)
**What changes in the DB:**
- New table `memory_fact` (progressively replaces `memory_associative`)
```
memory_fact:
  id, user_id
  content_encrypted -- the extracted fact, encrypted
  category -- 'preference' | 'fact' | 'episode' | 'goal' | 'relationship'
  entity_type -- what it refers to: 'user' | 'project' | 'task' | 'person'
  entity_id -- optional, FK
  is_latest -- boolean, as in Supermemory
  superseded_by_id -- FK → memory_fact (UPDATE relation)
  extends_id -- FK → memory_fact (EXTEND relation)
  derived_from_ids -- JSON array of FKs (DERIVE relation)
  confidence -- 0.0-1.0
  source -- 'extracted' | 'explicit' | 'inferred'
  expires_at -- nullable, for temporal facts
  last_accessed_at -- for decay scoring
  created_at, updated_at
```
**How it works:**
1. After the conversation, GPT-4o-mini receives the transcript (last 2000 chars) + a structured prompt
2. The LLM extracts a JSON array of facts: `[{content, category, entity_type, is_temporal, expires_at}]`
3. For each extracted fact, the system checks it against existing facts (keyword match on content decrypted in memory)
4. If a contradiction is found → the old fact gets `is_latest=false`, `superseded_by_id=new`
5. If it is an enrichment → the new fact gets `extends_id=old`
6. Everything is encrypted before persistence
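Steps 4 and 5 can be sketched on in-memory objects as below. The `Fact` dataclass mirrors only a few `memory_fact` columns, and `apply_update` / `apply_extend` are hypothetical helper names. How a contradiction or enrichment is detected (the keyword match in step 3) and the encryption in step 6 are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    # Subset of the memory_fact columns, held decrypted in memory.
    id: int
    content: str
    is_latest: bool = True
    superseded_by_id: Optional[int] = None
    extends_id: Optional[int] = None


def apply_update(old: Fact, new: Fact) -> None:
    """UPDATE relation: the new fact replaces the old one."""
    old.is_latest = False
    old.superseded_by_id = new.id


def apply_extend(old: Fact, new: Fact) -> None:
    """EXTEND relation: the new fact points at the one it enriches.

    The old fact is left untouched, since the plan only sets extends_id.
    """
    new.extends_id = old.id


# Hypothetical contradiction: the deadline changed.
old_deadline = Fact(id=1, content="Deadline is Friday")
new_deadline = Fact(id=2, content="Deadline moved to Monday")
apply_update(old_deadline, new_deadline)

# Hypothetical enrichment: extra detail about the same topic.
detail = Fact(id=3, content="Monday deadline is end of day CET")
apply_extend(new_deadline, detail)
```

Keeping the superseded row instead of deleting it preserves the version history, which is what makes the `is_latest` filter meaningful at read time.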
**LLM cost estimate (GPT-4o-mini @ $0.15/1M input, $0.60/1M output):**
- Input: ~2500 tokens/conversation (transcript + system prompt)
- Output: ~300 tokens (JSON of extracted facts)
- Cost: ~$0.0006/conversation → **~$0.06 per 100 conversations/day**
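The arithmetic behind this estimate, as a small worked example. Prices and token counts are the ones quoted above; `cost_per_call` is just an illustrative helper name.

```python
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (figure quoted in this plan)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (figure quoted in this plan)


def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one LLM call at the prices above."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


per_conversation = cost_per_call(2500, 300)  # 375 + 180 = 555 micro-dollars
per_day = per_conversation * 100             # 100 conversations/day
print(f"{per_conversation:.6f} USD/conversation, {per_day:.4f} USD/day")
```

The exact figure is $0.000555 per conversation, which the plan rounds up to ~$0.0006.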
---
### PHASE 2: Automatic Forgetting + Decay
**Effort**: ~1-2 days | **Extra LLM cost**: $0 (purely heuristic)
**Mechanisms:**
1. **TTL-based expiry**: facts with `expires_at` are ignored after that date
2. **Access decay**: `last_accessed_at` + scoring formula: `score = confidence * (1 / (1 + days_since_access * 0.05))`
3. **Background cleanup** (cron/periodic): soft-delete facts whose score has been < 0.1 for more than 30 days
4. **Episodic consolidation**: after N episodes per session, consolidate them into a single summary
**No LLM cost**: pure temporal logic and mathematical scoring.
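The access-decay formula from mechanism 2 translates directly into code. The function name and the use of fractional days are assumptions; the formula itself is the one stated above.

```python
from datetime import datetime, timedelta


def decay_score(confidence: float, last_accessed_at: datetime, now: datetime) -> float:
    """score = confidence * (1 / (1 + days_since_access * 0.05))"""
    days = max((now - last_accessed_at).total_seconds() / 86400, 0.0)
    return confidence * (1 / (1 + days * 0.05))


now = datetime(2026, 4, 8)
fresh = decay_score(0.9, now, now)                        # accessed just now
stale = decay_score(0.9, now - timedelta(days=200), now)  # 0.9 / 11
print(round(fresh, 3), round(stale, 3))
```

With this decay rate a fact with confidence 0.9 only drops below the 0.1 cleanup threshold after 160 days without access, so cleanup is conservative by construction.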
---
### PHASE 3: Auto-Generated User Profile
**Effort**: ~2-3 days | **Extra LLM cost**: ~$0.001/update
**How it works:**
1. New small table `user_profile`:
```
user_profile:
  user_id (PK)
  static_encrypted -- JSON: stable facts (name, role, lasting preferences)
  dynamic_encrypted -- JSON: recent activity (last 3-5 topics, tasks in progress)
  updated_at
```
2. After every fact extraction (Phase 1), the profile is updated:
- `static`: facts with `category IN ('preference', 'fact', 'relationship')` and `confidence > 0.7`
- `dynamic`: last 5 facts with `category = 'goal'` or recent episodes
3. The profile is injected into the system prompt in EVERY conversation (before the current context)
4. Update trigger: post-extraction, batch of new facts → GPT-4o-mini "update profile"
**Cost estimate:**
- Input: ~1000 tokens (current profile + new facts)
- Output: ~500 tokens (updated profile)
- Cost: ~$0.0005/update → **negligible**
**Zero-LLM-cost alternative:** the `static` profile is computed as a direct aggregation of the facts with `is_latest=true` + high confidence. The `dynamic` profile is the last N episodes. No LLM call, only SQL queries. Less elegant but $0.
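A sketch of the zero-LLM-cost alternative, operating on facts already decrypted in memory. The dict shape, the `build_profile` name, and the assumption that `episodes` is ordered oldest to newest are illustrative; the category list and the 0.7 threshold are the ones given for the `static` profile above.

```python
STATIC_CATEGORIES = {"preference", "fact", "relationship"}


def build_profile(facts, episodes, min_confidence=0.7, n_dynamic=5):
    """Aggregate the profile without any LLM call.

    static  = current, high-confidence facts in the stable categories
    dynamic = the last n_dynamic episodes (episodes assumed oldest → newest)
    """
    static = [
        f["content"]
        for f in facts
        if f["is_latest"]
        and f["category"] in STATIC_CATEGORIES
        and f["confidence"] > min_confidence
    ]
    dynamic = list(episodes[-n_dynamic:])
    return {"static": static, "dynamic": dynamic}


# Hypothetical decrypted facts and episode summaries.
facts = [
    {"content": "Prefers concise answers", "category": "preference", "confidence": 0.9, "is_latest": True},
    {"content": "Works at Acme", "category": "fact", "confidence": 0.9, "is_latest": False},
    {"content": "Likes tea", "category": "preference", "confidence": 0.5, "is_latest": True},
]
episodes = [f"episode {i}" for i in range(1, 8)]
profile = build_profile(facts, episodes)
print(profile)
```

In the real system the filtering would more likely live in the SQL query, with only the selected rows decrypted; the sketch keeps everything in Python to stay self-contained.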
---
### PHASE 4: LLM Episode Summarization
**Effort**: ~1-2 days | **Extra LLM cost**: ~$0.001/episode
**What changes:**
- The `summary_encrypted` field in `memory_episodic` goes from 200-char truncation to an LLM summary
- GPT-4o-mini generates a structured summary: `{topic, user_intent, outcome, key_facts_mentioned}`
- Async: it does not block the reply. It runs after the reply has already been sent to the client
**Cost estimate:**
- Input: ~1500 tokens (full conversation)
- Output: ~200 tokens (structured summary)
- Cost: ~$0.0004/episode → **~$0.04 per 100 conversations/day**
---
## 6. TOTAL COST SUMMARY
### Estimated additional LLM costs (< 100 users, ~100 conversations/day)
| Phase | Cost/conv | Cost/day | Cost/month |
|------|------------|--------------|------------|
| Phase 1 (Fact Extraction) | $0.0006 | $0.06 | ~$1.80 |
| Phase 2 (Forgetting) | $0 | $0 | $0 |
| Phase 3 (User Profile) | $0-0.0005 | $0-0.05 | $0-1.50 |
| Phase 4 (Episode Summary) | $0.0004 | $0.04 | ~$1.20 |
| **TOTAL** | **~$0.001-0.002** | **~$0.10-0.15** | **~$3-4.50** |
### Comparison with Supermemory SaaS
| | In-House | Supermemory Free | Supermemory Pro |
|---|---------|------------------|-----------------|
| Cost/month | ~$3-4.50 (LLM) | $0 (but limited) | $19/mo |
| Privacy | E2E preserved | Plaintext required | Plaintext required |
| Limits | LLM rate limits only | 1M tokens, 10K search | 3M tokens, 100K search |
| Customization | Full | None | None |
| Vendor lock-in | Zero | High | High |
**Verdict**: the in-house implementation costs less than $5/month, preserves privacy, and offers full customization.
---
## 7. BENEFIT MATRIX
| Feature | UX impact | Effort | Priority |
|---------|-----------|--------|----------|
| Fact Extraction + Graph | HIGH - the AI remembers everything automatically | Medium | P0 |
| Contradiction Resolution | HIGH - no stale information | Low (included in P0) | P0 |
| Automatic Forgetting | MEDIUM - less noise in the context | Low | P1 |
| User Profile | HIGH - immediate personalization | Medium | P1 |
| Episode Summarization | MEDIUM - better recall | Low | P2 |
---
## 8. CRITICAL ANALYSIS: HOW SUPERMEMORY INJECTS CONTEXT
### The Supermemory flow (Python SDK)
```python
from supermemory import Supermemory
client = Supermemory()  # requires SUPERMEMORY_API_KEY
# ── PRE-LLM: fetch profile + relevant memories ──
profile = client.profile(
    container_tag="user_123",         # = Adiuva's user_id
    q="What sneakers should I buy?"   # = the user's message
)
# profile.profile.static → ["Senior engineer at Acme", "Prefers dark mode"]
# profile.profile.dynamic → ["Working on auth migration"]
# profile.search_results → memories relevant to the query
# ── System prompt assembly ──
context = f"""Static profile:
{chr(10).join(profile.profile.static)}
Dynamic profile:
{chr(10).join(profile.profile.dynamic)}
Relevant memories:
{chr(10).join(r.get("memory","") for r in profile.search_results.results)}"""
messages = [{"role": "system", "content": f"User context:\n{context}"}, *conversation]
# → pass to your LLM
# ── POST-LLM: save the conversation (Supermemory extracts facts automatically) ──
client.add(
    content="\n".join(f"{m['role']}: {m['content']}" for m in conversation),
    container_tag="user_123",
)
```
### What happens under the hood: `client.add()` → HTTPS → Supermemory cloud
1. Supermemory receives the **full plaintext** of the conversation
2. **Their LLM** extracts facts, preferences, and entities
3. It builds graph relations (UPDATE/EXTEND/DERIVE) with existing facts
4. It updates the user profile (static + dynamic)
5. It applies forgetting to expired temporal facts
### How it compares with your `enrich_context()`:
| Aspect | Adiuva (current) | Supermemory |
|---------|-----------------|-------------|
| **Where the logic lives** | `memory_middleware.py` in your backend | Third-party cloud |
| **How context is retrieved** | 4 SQL queries → in-memory decrypt | 1 HTTPS call `client.profile()` |
| **Context quality** | Raw key/value + 200-char truncations | Structured facts + curated profile |
| **Who extracts facts** | Nobody (or the user via tool) | Automatic LLM on every `add()` |
| **Retrieval latency** | ~5-15ms (local DB) | ~50-200ms (HTTPS) |
| **Storage latency** | ~2ms (INSERT SQL) | ~200-500ms (HTTPS + LLM extraction) |
| **Privacy** | Plaintext only in memory, encrypted at rest | **Permanent plaintext on third-party servers** |
---
## 9. HONEST CRITICAL ASSESSMENT
### REAL BENEFITS of a Supermemory integration
1. **Automatic extraction**: call `client.add(conversation)` and the facts are extracted, with nothing else to do. Saves the ~3-5 dev days of Phase 1.
2. **SOTA contradiction resolution**: their graph engine is #1 on benchmarks. Implementing it in-house requires a well-engineered LLM prompt + matching logic.
3. **Ready-made user profiles**: `client.profile()` returns static+dynamic in ~50ms. In-house, the aggregation logic has to be built.
4. **Temporal forgetting**: they handle expiry and noise filtering. In-house it is simple (TTL + cron), but they do it better with an LLM.
### CRITICAL PROBLEMS with the integration
1. **PRIVACY DESTROYED**: the most serious point. Adiuva's whole E2E model rests on "the backend never persists plaintext". Supermemory receives and **keeps** all user data in the clear. For an app that sells privacy, this is a dealbreaker.
2. **ADDED LATENCY**: every conversation adds:
- +50-200ms PRE-LLM (profile fetch over HTTPS)
- +200-500ms POST-LLM (add + extraction)
- vs. ~5-15ms total with the local DB
- On an unstable connection: timeout → memory lost
3. **SINGLE POINT OF FAILURE**: if `supermemory.ai` is down, your app loses ALL memory. There is no local fallback. Your 4 current PostgreSQL tables are resilient.
4. **VENDOR LOCK-IN**: the extracted facts live in their cloud. If they shut down, change pricing, or limit the free tier → painful migration. With the in-house solution you have full ownership.
5. **COSTS THAT SCALE BADLY**: Free tier: 1M tokens/month = ~250 average conversations. With 100 active users:
- ~30 conv/user/month = 3000 conv = ~12M tokens → **Scale plan $399/mo**
- In-house: $3-5/mo for the same 3000 conv with GPT-4o-mini
6. **ARCHITECTURE OVERHAUL**: you would have to:
- Remove/replace the 4 memory tables
- Rewrite the agent's 9 tools to use the SDK
- Remove the encryption logic
- Change the WebSocket contract (if the memory tools change)
- **Paradoxical effort**: more work to integrate than to improve in-house
7. **NOT SELF-HOSTABLE**: the GitHub repo is MIT, but the core is Cloudflare Workers + KV + Postgres. Self-hosting requires Cloudflare infrastructure or a significant rewrite.
### FINAL BALANCE
| Pros | Weight |
|-----|------|
| Extraction + Graph + Forgetting for free | ★★★★ |
| Automatic user profiles | ★★★ |
| Zero dev effort for the memory features | ★★★ |
| **Total Pros** | **10/15** |

| Cons | Weight |
|--------|------|
| Privacy destroyed (dealbreaker for the brand) | ★★★★★ |
| Vendor lock-in on core functionality | ★★★★ |
| Costs of $399/mo at scale vs $5/mo in-house | ★★★★ |
| Added latency of +250-700ms per conversation | ★★★ |
| Single point of failure | ★★★ |
| **Total Cons** | **19/25** |
> **Verdict: the Supermemory SaaS integration is net-negative for Adiuva.**
> The benefits (extraction, graph, profiles) can be replicated in-house at lower cost,
> without sacrificing privacy, ownership, and resilience.
---
## 10. NEXT STEPS
- [ ] Approve the in-house improvement plan (4 phases)
- [ ] Design the schema migration for `memory_fact` and `user_profile`
- [ ] Implement Phase 1 (fact extraction + graph)
- [ ] Test extraction with real conversations
- [ ] Implement Phases 2-4 incrementally

303
docs/local_agent_v2_mem.md Normal file

@@ -0,0 +1,303 @@
# Local Agent V2 — Working Memory
## Confirmed decisions
- **Breaking change**: no backward compatibility with prompt_template
- **Preprocessing**: on the Python backend, approach (c): predefined handlers + future LLM fallback
- **First handler**: HTML email. Other types later.
- **Journey**: produces a structured agent_config (JSON), not a monolithic prompt
- **The user wants customization**: e.g. "summarize documents into notes per project"
- **File types**: any type, possibly mixed in the same directory
- **Projects**: variable number, must scale
---
## V2 architecture: per-file flow
### [A] Detect + Preprocess (pure Python, zero LLM)
```
Raw file from Electron
detect_content_type(filename, raw_content)
→ heuristic: extension + content patterns
→ match against a content_type from the agent_config
preprocess(content_type, raw_content)
→ specific handler (e.g. email_html → BeautifulSoup)
→ Output: { content_type, clean_text, metadata: {subject, from, date, ...} }
```
Predefined handlers (MVP: email_html only):
- `email_html`: strip tags, extract subject/from/to/date, split the thread → last message
- `generic_html`: extract main content, strip nav/footer (future)
- `plain_text`: pass-through (future)
- `csv`: parse + summary (future)
- `pdf`: extract text (future)
- Fallback: raw text with a length limit
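The detect-then-dispatch shape can be sketched with the standard library alone. This is not the project's implementation: the real `email_html` handler is described as BeautifulSoup-based and also extracts metadata and splits threads, while this stand-in only strips tags with a regex. The header heuristic, the dict output, and the 4000-character fallback limit are assumptions.

```python
import re
from typing import Callable, Dict

# Illustrative patterns, not the project's actual heuristics.
_TAG_RE = re.compile(r"<[^>]+>")
_HEADER_RE = re.compile(r"^\s*(From|To|Subject):", re.IGNORECASE | re.MULTILINE)


def detect_content_type(filename: str, raw_content: str) -> str:
    """Heuristic: file extension plus email-header patterns in the content."""
    if filename.lower().endswith((".html", ".htm")):
        text = _TAG_RE.sub("\n", raw_content)
        if len(_HEADER_RE.findall(text)) >= 2:
            return "email_html"
    return "generic"


def _email_html(raw: str) -> dict:
    # Strip tags and collapse whitespace; no metadata or thread splitting here.
    text = re.sub(r"\s+", " ", _TAG_RE.sub(" ", raw)).strip()
    return {"content_type": "email_html", "clean_text": text, "metadata": {}}


def _fallback(raw: str, limit: int = 4000) -> dict:
    # Fallback: raw text with a length limit (the limit value is assumed).
    return {"content_type": "generic", "clean_text": raw[:limit], "metadata": {}}


# Registry of handlers; unknown content types go to the fallback.
HANDLERS: Dict[str, Callable[[str], dict]] = {"email_html": _email_html}


def preprocess(content_type: str, raw_content: str) -> dict:
    """Dispatch to the registered handler, or to the fallback."""
    return HANDLERS.get(content_type, _fallback)(raw_content)


sample = (
    "<html><body><p><b>From:</b> boss@co.com</p>"
    "<p><b>Subject:</b> Fix the bug</p><p>Please fix it.</p></body></html>"
)
kind = detect_content_type("mail.html", sample)
print(kind, "|", preprocess(kind, sample)["clean_text"])
```

A registry keyed by handler name keeps new content types additive: a future `pdf` handler would only need one more entry in `HANDLERS`, and any type without an entry still falls through to the fallback.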
### [B] Single LLM call: classify + extract + create
A single LLM call with tool calling does everything:
**System prompt built from:**
1. Base instructions (update-first, isAiSuggested=1, etc.)
2. Extraction rules for the content_type (from the agent_config) ← PROMINENT position
3. Global rules (from the agent_config)
4. Compact project list
5. Procedural instructions: identify project → query entities → extract → create/update
**User message:**
- Filename + metadata
- Clean text
**Tools available:**
- list_tasks, list_notes, list_timelines (query)
- create_task, create_note, create_timeline
- update_task, update_note, update_timeline
**Max steps:** 12 (tool-calling loop)
### Journey → agent_config (structured JSON)
```json
{
  "content_types": [
    {
      "id": "email_html",
      "label": "Email HTML",
      "detection_hint": "HTML with an email structure (From/To/Subject)",
      "preprocessing": "email_html",
      "extraction_prompt": "For each email: direct action → task..."
    }
  ],
  "global_rules": [
    "If the file cannot be traced back to any project, do not create entities."
  ],
  "data_types": ["tasks", "notes", "timelines"]
}
```
---
## V1 problems and how V2 solves them
| # | V1 problem | V2 solution |
|---|---|---|
| P1 | Raw HTML sent to the LLM | Python preprocessing → clean text |
| P2 | Truncation at 4000 chars | Preprocessed text, much denser |
| P3 | No thread handling | Email handler splits the thread and keeps the last message |
| P4 | Weak project matching | Filename as primary signal + clean text |
| P5 | custom_prompt at the end | Extraction rules in a prominent position |
| P6 | No preprocessing | Predefined handlers per type |
| P7 | items_created always 0 | Fix in the runner (count tool call results) |
---
## Required code changes
### Backend (adiuva-api)
1. **New module**: `app/core/preprocessors/` with one handler per type
   - `__init__.py`: registry + detect + dispatch
   - `email_html.py`: BeautifulSoup (strip, metadata, thread split)
   - `base.py`: base interface + fallback
2. **`agent_setup.py`**: the Journey produces an agent_config JSON, not a prompt_template
   - System prompt updated to generate structured JSON
   - Output validated with a Pydantic schema
3. **`agent_runner.py`**: revised flow
   - Remove `_classify_file()` (the separate Step 1)
   - Add a preprocess step before the LLM call
   - Single LLM call with a type-specific prompt
   - Count items_created from the tool call results
4. **`models.py`**: `prompt_template: Text` → `agent_config: JSON`
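
Counting `items_created` from the tool call results (item 3 above) might look like the sketch below. The shape of a tool call record, a dict with `tool` and `ok` keys, is an assumption made for illustration; the runner's actual record format is not shown in this document.

```python
def count_items_created(tool_calls: list[dict]) -> int:
    """Count successful create_* tool calls.

    Assumes each record looks like {"tool": "create_task", "ok": True};
    queries (list_*) and updates (update_*) are not counted.
    """
    return sum(
        1
        for call in tool_calls
        if call.get("tool", "").startswith("create_") and call.get("ok")
    )
```

For instance, a run that lists tasks, creates one task, fails to create a note, and updates a timeline would yield a count of 1 under these assumptions.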
### Frontend (adiuva)
5. **`store.ts`**: field `promptTemplate` → `agentConfig`
6. **`JourneyDialog.tsx`**: parse JSON from the journey reply
7. **`agent-scheduler.ts`**: pass `agentConfig` to the trigger
8. **Pydantic/Zod schemas**: update for the new format
---
## Implementation status
| Step | Status | Branch |
|------|--------|--------|
| Step 1: Preprocessors | ✅ DONE | `feature/batch-agent-v2` |
| Step 2: agent_runner.py refactor | ✅ DONE | `feature/batch-agent-v2` |
| Step 3: Model/schema agent_config | ✅ DONE | `feature/batch-agent-v2` |
| Step 4: Journey setup structured output | ✅ DONE | `feature/batch-agent-v2` |
| Step 5: Frontend | ✅ DONE | main |
| Step 6: E2E with real files | ⏳ TODO | - |
---
## Test conventions (updated after implementing steps 1-2)
### Fixture structure
```
tests/fixtures/<step_name>/
  cases.yaml   ← case definitions
  data/        ← input files (HTML, txt, ...)
```
CLI option to override the folder:
```bash
pytest tests/test_<step>.py -v --<step>-dir /path/to/folder
```
Registered in `conftest.py` via `pytest_addoption`. A custom folder must have the same structure (`cases.yaml` + `data/`).
Options registered so far:
- `--preprocess-dir` → step 1
- `--runner-dir` → step 2 (add `--journey-dir` for step 4, `--e2e-dir` for step 6)
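
A minimal sketch of how such options could be registered and resolved in `conftest.py`. `parser.addoption` and `config.getoption` are the standard pytest hooks; the default root `tests/fixtures`, the folder names in `_STEP_DIRS`, and the helper name `fixtures_dir` are assumptions, not the repository's actual values.

```python
from pathlib import Path

_DEFAULT_FIXTURES = Path("tests/fixtures")  # assumed default root

# CLI option -> fixture folder name (folder names are assumptions).
_STEP_DIRS = {"--preprocess-dir": "preprocess", "--runner-dir": "runner"}


def pytest_addoption(parser):
    """Register one override option per step."""
    for option, step in _STEP_DIRS.items():
        parser.addoption(
            option,
            action="store",
            default=None,
            help=f"Override the fixture folder for {step} tests",
        )


def fixtures_dir(config, option: str) -> Path:
    """Return the custom folder if given, else the default one for the step."""
    custom = config.getoption(option)
    root = Path(custom) if custom else _DEFAULT_FIXTURES / _STEP_DIRS[option]
    # A custom folder must mirror the default layout.
    if not (root / "cases.yaml").is_file() or not (root / "data").is_dir():
        raise FileNotFoundError(f"{root} must contain cases.yaml and data/")
    return root
```

Failing fast on a malformed folder gives a clearer error than a missing-file exception raised later, in the middle of test collection.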
### YAML schema: principles (step 1 vs step 2)
**Step 1 (preprocessors), deterministic tests, no LLM:**
- Flat keys: `detect:`, `process:`, `no_html:`, `min_chars:`, etc.
- No `description` or `score_name` (Langfuse is not used)
- `file:` serves both as the name on disk and as the filename passed to the function
- `generate: binary_noise` for synthetic content
**Step 2+ (runner, journey, e2e), LLM eval tests:**
- `file:` = name on disk in `data/`
- `file_path:` = path as seen by the agent (kept separate because several cases reuse the same file with different paths, e.g. to test project matching by filename vs content)
- `description:` present in the YAML (useful in the pytest report)
- `score_name:` present in the YAML (the name under which the score is sent to Langfuse)
- `projects:` list of symbolic names (`alpha`, `beta`) or inline dicts `{id, name, status}`, resolved by `_resolve_projects()`
- Flat assertion keys: `expect_insert`, `expect_no_insert`, `expect_project_id`, `expect_dedup`
### Parametrizing from YAML
Use `pytest_generate_tests` to access the custom CLI option:
```python
def pytest_generate_tests(metafunc):
    # Only parametrize tests that request the runner_case fixture.
    if "runner_case" not in metafunc.fixturenames:
        return
    # metafunc.config exposes the custom CLI option to the loader.
    cases = _load_cases(metafunc.config)
    metafunc.parametrize("runner_case", cases, ids=[c["id"] for c in cases])
```
Tests access the directory via `pytestconfig`:
```python
async def test_eval_runner(runner_case, pytestconfig):
    data_dir = _fixtures_dir(pytestconfig) / "data"
```
### Langfuse V3: correct pattern
**Problems encountered with the V2 API (do not use):**
- `lf.trace()` → does not exist in V3
- `lf.score(trace_id=...)` → does not exist in V3
- `lf.start_as_current_observation(user_id=..., session_id=...)` → kwargs not accepted
**Correct V3 pattern in eval tests:**
```python
from contextlib import nullcontext

lf = get_langfuse()
obs_ctx = lf.start_as_current_observation(
    name="eval-runner-2.1",
    metadata={"step": "2", "case_id": "2.1"},
) if lf else nullcontext()
with obs_ctx as obs:
    # ... run the code under test ...
    if obs is not None:
        obs.score(name="runner.email_to_task", value=1.0, comment="...")
if lf:
    lf.flush()
```
**Correct V3 pattern in production code (`agent_runner.py`, `deep_agent.py`, `agent_setup.py`):**
```python
# user_id and session_id go in metadata, NOT as direct kwargs
lf.start_as_current_observation(
    as_type="span",
    name="my-span",
    metadata={"user_id": user_id, "session_id": session_id},
    input=...,
)
```
### compile_prompt: do not call template.format() directly
`get_prompt_or_fallback()` returns the raw template. Langfuse uses `{{variable}}`, the fallback uses `{variable}`. Always use `compile_prompt()`, which dispatches correctly:
```python
from app.core.langfuse_client import compile_prompt, get_prompt_or_fallback

template, prompt_obj = get_prompt_or_fallback("my_prompt", FALLBACK_PROMPT)
compiled = compile_prompt(template, prompt_obj, var1=val1, var2=val2)
# ↑ uses prompt_obj.compile() for Langfuse, template.format() for the fallback
```
**Never do this:**
```python
compiled = template.format(var1=val1)  # ❌ breaks with Langfuse (which uses {{var1}})
```
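
To make the dispatch concrete, here is a self-contained sketch of the idea. It is not the code in `app.core.langfuse_client`: `compile_prompt_sketch` is a hypothetical name, and `FakeLangfusePrompt` is a stand-in that only imitates the `{{variable}}` substitution a Langfuse prompt object performs in `compile()`.

```python
import re


class FakeLangfusePrompt:
    """Stand-in for a Langfuse prompt object: fills {{variable}} placeholders."""

    def __init__(self, template: str) -> None:
        self.template = template

    def compile(self, **variables) -> str:
        # Unknown placeholders are left untouched.
        return re.sub(
            r"\{\{\s*(\w+)\s*\}\}",
            lambda m: str(variables.get(m.group(1), m.group(0))),
            self.template,
        )


def compile_prompt_sketch(template: str, prompt_obj, **variables) -> str:
    """Use the prompt object's own compile() when present, else str.format()."""
    if prompt_obj is not None:
        return prompt_obj.compile(**variables)
    return template.format(**variables)


langfuse_template = "Project: {{project}}"
fallback_template = "Project: {project}"

via_langfuse = compile_prompt_sketch(
    langfuse_template, FakeLangfusePrompt(langfuse_template), project="alpha"
)
via_fallback = compile_prompt_sketch(fallback_template, None, project="alpha")

# Calling format() on the Langfuse-style template only unescapes the braces;
# the value is never substituted.
broken = langfuse_template.format(project="alpha")
```

With these inputs both dispatch paths produce the same text, while `broken` still contains a literal `{project}` placeholder.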
### Test file structure for LLM eval steps
Pattern consolidated from `test_agent_runner_v2.py`:
```
tests/test_<step>.py
├── Constants (_USER_ID, _DEFAULT_FIXTURE_DIR, _AGENT_CONFIG, project symbols)
├── _fixtures_dir(config) + _load_cases(config) + _read_case_file(case, data_dir)
├── _resolve_projects(entries): handles both symbolic strings and inline dicts
├── pytest_generate_tests: parametrizes eval tests from YAML
├── Helper builders (_make_config, _make_run_log, _make_manager, _make_executor)
├── Static unit tests (no YAML, no LLM)
└── test_eval_<step>(runner_case, pytestconfig): the single parametrized function
      ↓ reads the file, resolves projects, creates the executor, calls the runner
      ↓ _evaluate_case(case, calls, kwargs) → (score, comment)
      ↓ obs.score(...) if Langfuse is active
```
`_evaluate_case()` centralizes all assertion logic mapped from the YAML keys, so no assert logic is scattered through the test.
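
A centralized evaluator of this kind might be sketched as below. The semantics given to each YAML key, the record shape `{"tool": ..., "args": {...}}`, and the `project_id` argument name are assumptions; the sketch also omits the `kwargs` parameter and the `expect_dedup` key, whose meaning is not described here.

```python
def evaluate_case(case: dict, calls: list[dict]) -> tuple[float, str]:
    """Map flat YAML assertion keys to checks and return (score, comment).

    Assumes each recorded call looks like
    {"tool": "insert", "args": {"project_id": "..."}}.
    """
    inserts = [call for call in calls if call.get("tool") == "insert"]
    failures = []

    if case.get("expect_insert") and not inserts:
        failures.append("expected at least one insert, got none")
    if case.get("expect_no_insert") and inserts:
        failures.append(f"expected no insert, got {len(inserts)}")

    expected_project = case.get("expect_project_id")
    if expected_project is not None:
        wrong = [
            call for call in inserts
            if call.get("args", {}).get("project_id") != expected_project
        ]
        if wrong:
            failures.append(f"{len(wrong)} insert(s) target the wrong project")

    if failures:
        return 0.0, "; ".join(failures)
    return 1.0, "all expectations met"
```

Returning a score and a comment instead of asserting inline lets the same result feed both the pytest assertion and the Langfuse score.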
### Step 4 (Journey V2): specific patterns
**Sentinels:** `AGENT_CONFIG_START` / `AGENT_CONFIG_END` (replacing `PROMPT_TEMPLATE_START/END`)
**Langfuse prompt:** `journey_system_v2` (not V1's `journey_system`)
**Frame key:** `existing_config` (JSON string, replacing the prose string `existing_template`)
**Handler return value:** key `agent_config` (JSON string validated by Pydantic) instead of `prompt_template`
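
Extracting the config between the two sentinels can be sketched like this. The function name `extract_agent_config` is hypothetical, and the check here is only `json.loads`; the backend validates against a Pydantic schema instead.

```python
import json
from typing import Optional

START, END = "AGENT_CONFIG_START", "AGENT_CONFIG_END"


def extract_agent_config(reply: str) -> Optional[str]:
    """Return the JSON string between the sentinels, or None if they are absent.

    Raises json.JSONDecodeError when the payload is not valid JSON.
    """
    start = reply.find(START)
    if start == -1:
        return None
    end = reply.find(END, start + len(START))
    if end == -1:
        return None
    payload = reply[start + len(START):end].strip()
    json.loads(payload)  # syntax check only; schema validation happens elsewhere
    return payload
```

Returning the payload as a string, rather than a parsed object, matches the handler contract above, where `agent_config` travels as a JSON string.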
**Executor for journey tests:** use `set_client_executor(executor)` / `clear_client_executor()` directly in the test helper `_run_journey`, mimicking `device_ws._handle_journey_start`. Set it again before every call (start + each message).
**Journey YAML fixture:** `directory_files: [{path, content_file}]` + `user_messages: [...]` + flat assertion keys (`expect_question`, `expect_done`, `expect_valid_config`, `expect_content_type_id`, `expect_extraction_contains`, `expect_global_rules`)
**Nudge test (unit):** populate `_sessions` with a fake `JourneySession` holding `_MAX_TURNS` turns, patch `_call_llm_with_tools`, and verify that the second call receives the nudge with the new markers in `history`.
**JSON in the system prompt:** literal `{` and `}` in the example JSON must be written as `{{` and `}}` for the `str.format()` fallback. Template variables use `{var}` (single braces). `compile_prompt()` handles the correct dispatch for Langfuse vs fallback.
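
The brace-escaping rule for the `str.format()` fallback is easy to check in isolation. The prompt text and the `directory` variable below are invented for the example.

```python
# Fallback template: the example JSON uses doubled braces, the variable single braces.
FALLBACK_PROMPT = (
    "Inspect the files in {directory} and reply with JSON shaped like:\n"
    '{{"content_types": [], "global_rules": []}}'
)

# format() substitutes {directory} and collapses {{ }} into literal braces.
rendered = FALLBACK_PROMPT.format(directory="inbox/")

# Same template with unescaped JSON braces: format() treats the JSON object
# as a replacement field and fails instead of rendering it.
UNESCAPED = 'Reply with JSON shaped like: {"content_types": []} for {directory}'
try:
    UNESCAPED.format(directory="inbox/")
    unescaped_failed = False
except (KeyError, ValueError, IndexError):
    unescaped_failed = True
```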
### Step 5 (Frontend V2): specific patterns
**Store (`LocalAgentLocalConfig`):** field `agentConfig: Record<string, unknown> | null` replaces `promptTemplate: string`. Stored in electron-store as a JSON object.
**Trigger body:** the scheduler and `runNow` send `agentConfig` (an object, not the `customAgentPrompt` string).
**WS frame `journey_start`:** field `existingConfig` (JSON string) replaces the `existingTemplate` string. The backend expects `existing_config` (snake_case via `toSnakeCase()`).
**WS frame `journey_reply`:** field `agentConfig` (JSON string) replaces `promptTemplate`. The frontend receives it as a string and parses it with `JSON.parse()` into an object.
**tRPC journey router:** returns `{ ..., agentConfig: string | undefined }`. React components parse it locally.
**Cloud agents:** not migrated; they keep `promptTemplate: string` in `CloudAgentConfigSchema`, `agentCloudRouter`, `PromptBuilderChat.onPromptUpdate`. `PromptBuilderChat` now also has `onConfigUpdate` for the local path.
**`JourneyDialog`:** props `currentConfig: Record<string, unknown> | null` + `onSaved(agentConfig: Record<string, unknown>)`. Shows a human-readable summary (`AgentConfigSummary`) instead of the raw prompt string.
**`InlineAgentCreationStepper`:** keeps the `promptTemplate` state for cloud and adds an `agentConfig` state for local. `PromptBuilderChat` calls `onConfigUpdate` for local and `onPromptUpdate` for cloud (backward-compat).

10
skills-lock.json Normal file

@@ -0,0 +1,10 @@
{
  "version": 1,
  "skills": {
    "boost-prompt": {
      "source": "github/awesome-copilot",
      "sourceType": "github",
      "computedHash": "2621a44fbd9fc2636953d1e6e39e5faeed995f7fb958ec12cc98a2f0576f6fa7"
    }
  }
}