first commit
185
.claude/CLAUDE.md
Normal file
@@ -0,0 +1,185 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Keeping This File Up to Date

Update this file whenever a lesson is learned during development. Specifically, update CLAUDE.md when:

- A non-obvious architectural decision is made or discovered
- A gotcha, footgun, or surprising behavior is encountered (and the fix/workaround)
- A new command, workflow, or tool is added to the project
- A convention is established that isn't obvious from reading the code
- An integration detail is clarified (e.g., how the WebSocket protocol actually behaves, edge cases in the agent tool call cycle)

Do **not** add things already derivable from reading the code, generic best practices, or ephemeral task notes — only durable, reusable knowledge.

## Repository Layout

This repo contains two independent projects:

- **`adiuva/`** — Electron desktop app (TypeScript/React)
- **`adiuva-api/`** — FastAPI backend (Python)

---

## adiuva (Electron App)

### Commands

```bash
npm run start # Start dev server (Electron + Vite)
npm run lint # ESLint
npm run knip # Dead code analysis
npm run make # Build installers (Windows/Linux/macOS)
npm run package # Package without creating installers
```

Database schema changes require running Drizzle Kit — check `package.json` for db commands.

### Architecture

```
Renderer (React 19 + TanStack Router)
    ↓ tRPC over contextBridge
Main Process (Electron)
    ├── SQLite (better-sqlite3 + Drizzle ORM) — local app data
    ├── LanceDB — local vector embeddings
    └── WebSocket client → adiuva-api backend
```

**IPC model**: The renderer calls tRPC procedures defined in `src/main/router/`. The preload script (`src/preload/`) bridges them with `contextIsolation: true`.

**Backend integration**: The Electron main process connects to the FastAPI backend via WebSocket. The backend sends tool calls (e.g., `insert`, `vector_search`) which the main process executes against local SQLite via Drizzle and returns results. All AI intelligence lives on the backend — the app is a smart terminal.

**Key source directories**:
- `src/main/agents/` — Agent scheduler
- `src/main/ai/` — Orchestrator, token management
- `src/main/db/` — SQLite schema (Drizzle) + LanceDB
- `src/main/router/` — tRPC router (all IPC procedures)
- `src/renderer/components/` — UI components (tasks, notes, projects, timeline, auth)
- `src/renderer/routes/` — TanStack Router pages
- `src/shared/` — Zod schemas shared between main/renderer (WebSocket frame types, casing utils)

**Path aliases** (tsconfig): `@/*` → `src/renderer/`, `@shared/*` → `src/shared/`

**WebSocket frame types** are defined in `src/shared/api-types.ts` using Zod. Client sends: `chat_request`, `floating_request`, `tool_result`. Server sends: `text_chunk`, `tool_call`, `final`, `ping`.
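
As a rough illustration of the frame vocabulary above, the following Python sketch checks that a decoded frame's type is legal for the direction it travels. The authoritative definitions are the Zod schemas in `src/shared/api-types.ts`; the JSON encoding, the `type` field name, and the `check_frame` helper are assumptions made only for this sketch.

```python
import json

# Frame type names come from the list above; everything else is illustrative.
CLIENT_FRAMES = {"chat_request", "floating_request", "tool_result"}
SERVER_FRAMES = {"text_chunk", "tool_call", "final", "ping"}
ALLOWED = {"client": CLIENT_FRAMES, "server": SERVER_FRAMES}


def check_frame(raw: str, sender: str) -> dict:
    """Decode a JSON frame and verify its type is allowed for `sender`.

    Assumes every frame is a JSON object carrying a string `type` field.
    """
    if sender not in ALLOWED:
        raise ValueError(f"unknown sender {sender!r}")
    frame = json.loads(raw)
    if not isinstance(frame, dict):
        raise ValueError("frame must be a JSON object")
    frame_type = frame.get("type")
    if frame_type not in ALLOWED[sender]:
        raise ValueError(f"{sender} may not send frame type {frame_type!r}")
    return frame
```

A check like this would reject, for example, a `tool_call` arriving from the client side, since only the server sends that frame.
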

---

## adiuva-api (FastAPI Backend)

### Commands

```bash
# Development
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Production
gunicorn app.main:app -k uvicorn.workers.UvicornWorker -w 4 --timeout 120

# Database migrations
alembic upgrade head

# Testing
pytest
pytest -v
pytest tests/test_agents.py # single test file

# Linting/formatting
ruff check .
ruff format .

# Docker (full stack)
docker compose up --build
```

### Architecture

```
FastAPI app (app/main.py)
├── Middleware: RateLimiter → Sanitizer → CORS
├── HTTP Routes (app/api/routes/)
│   ├── auth.py — register, login, token refresh
│   ├── chat.py — POST /chat, POST /chat/embed, WS /chat/stream
│   └── billing.py — Stripe subscriptions
├── Agent System (app/agents/)
│   ├── task_agent.py — 8 tools
│   ├── project_agent.py — 6 tools
│   ├── timeline_agent.py — 4 tools
│   └── note_agent.py — 5 tools
└── Orchestration (app/core/)
    ├── agent_registry.py
    ├── agent_runner.py
    ├── llm.py — LiteLLM factory (100+ providers)
    └── memory_middleware.py
```

**LLM routing**: GPT-4o-mini classifies incoming intent → routes to appropriate domain agent → agent uses GPT-4o with its tool set → sends tool calls back to Electron client for local execution.
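
The classify-then-route step can be pictured with a small sketch. This is not the code in `app/core/`: the keyword table below is a hypothetical stand-in for the GPT-4o-mini intent classifier, the `task` fallback is arbitrary, and the agents are plain callables rather than real LLM-backed agents.

```python
from typing import Callable, Dict, Tuple

# Domain names mirror the agent modules listed above. The keywords are a
# hypothetical placeholder for the LLM-based intent classifier.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "task": ("task", "todo", "deadline"),
    "project": ("project", "milestone"),
    "timeline": ("timeline", "schedule"),
    "note": ("note", "memo"),
}


def classify_intent(message: str) -> str:
    """Placeholder classifier: return the first domain whose keyword appears."""
    lowered = message.lower()
    for domain, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return domain
    return "task"  # arbitrary fallback chosen for this sketch


def route(message: str, agents: Dict[str, Callable[[str], str]]) -> str:
    """Classify the message, then hand it to the matching domain agent."""
    return agents[classify_intent(message)](message)


if __name__ == "__main__":
    # Stub agents that only report which domain handled the message.
    stubs = {name: (lambda msg, name=name: f"{name}_agent: {msg}") for name in KEYWORDS}
    print(route("Add a note about the demo", stubs))
```

Keeping the classifier separate from the agents means a cheaper model can decide where a request goes before the more capable model and its tool set are invoked.
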

**Zero-trust data model**: The backend never decrypts user data. PostgreSQL stores only auth, billing, and metadata. All user content stays local on the Electron client.

**Tier system**: Free / Pro / Power / Team — enforced in `app/api/middleware/rate_limit.py` (20–200 req/min sliding window) and `app/billing/tier_manager.py`.
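
For readers unfamiliar with sliding-window limiting, here is a minimal in-memory sketch of the idea. It is not the contents of `rate_limit.py`. Only the 20–200 req/min range comes from the text above: the per-tier numbers, the assumption that Free is lowest and Team highest, and the `SlidingWindowLimiter` class are all hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Hypothetical per-tier limits; only the 20 and 200 endpoints are documented.
TIER_LIMITS = {"free": 20, "pro": 60, "power": 120, "team": 200}
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """Tracks request timestamps per user and allows at most N per window."""

    def __init__(self) -> None:
        self._hits = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id: str, tier: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= TIER_LIMITS[tier]:
            return False
        hits.append(now)
        return True
```

A single-process dictionary like this would not be shared across several workers; a shared store would be needed for that, which this sketch does not attempt.
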

**Key config**: `app/config/settings.py` — all env vars via Pydantic Settings. Copy `.env.example` to `.env` for local dev.

**Database**: PostgreSQL with async SQLAlchemy 2.0 + asyncpg. Migrations in `alembic/versions/`. Models in `app/models.py`, Pydantic schemas in `app/schemas.py`.

**Testing**: pytest with pytest-asyncio. Fixtures in `tests/conftest.py`. Use in-memory SQLite for DB tests.

---

## Microservices Migration (In Progress)

The monolith (`adiuva-api/app/`) is being split into independent services under `adiuva-api/services/`. Architectural decisions are tracked in repo memory (`/memories/repo/microservices-architecture.md`).

### Target Services (MVP)

| Service | Owns | Scaling |
|---------|------|---------|
| **Auth** | JWT RS256 issuance, users, refresh_tokens, subscriptions | Stateless |
| **WS Gateway** | WebSocket connections, Redis frame routing, device registry | Sticky (user_id) |
| **Chat** | deep_agent, memory, domain agents (task/note/project/timeline), LLM | Stateless |
| **Batch Agent** | agent_runner, journey builder, filesystem_agent, integrations (+ Langfuse tracing TODO) | Stateless |
| **Billing** | Stripe, tier_manager | Stateless |

**API Gateway**: Traefik with ForwardAuth → Auth `/verify`. Injects `X-User-Id`, `X-User-Email`, `X-User-Tier` headers. Downstream services trust these headers.
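
A downstream service might read the injected identity roughly as follows. This is a framework-free sketch: the header names are the ones listed above, while the `Identity` dataclass, the `identity_from_headers` helper, and the choice of `PermissionError` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Identity:
    user_id: str
    email: str
    tier: str


def identity_from_headers(headers: Mapping[str, str]) -> Identity:
    """Build an Identity from the gateway-injected headers.

    HTTP header names are case-insensitive, so keys are normalised first.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return Identity(
            user_id=lowered["x-user-id"],
            email=lowered["x-user-email"],
            tier=lowered["x-user-tier"],
        )
    except KeyError as missing:
        # A request without these headers did not come through ForwardAuth.
        raise PermissionError(f"missing gateway header: {missing}") from None
```

Trusting these headers is only sound when services cannot be reached except through the gateway, since any caller with direct access could set them by hand.
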

### Monorepo Structure

```
adiuva-api/
├── shared/ ← SQLAlchemy models, Pydantic schemas, config, redis utils
├── services/
│   ├── auth/
│   ├── ws-gateway/
│   ├── chat/
│   ├── batch-agent/
│   └── billing/
├── alembic/ ← Centralized migrations (shared DB)
├── docker-compose.yml
└── traefik/
```

### Key Conventions

- **shared/ module**: Imported by all services. Contains models, schemas, config, DB session factory, Redis client. Changes here affect all services — be careful.
- **Redis is the glue**: WS Gateway ↔ Chat/Batch communication is entirely via Redis pub/sub and lists. See `/memories/repo/microservices-architecture.md` for channel naming.
- **No JWT validation in downstream services**: Only Auth Service has the private key. Other services receive pre-validated identity via Traefik headers.
- **Tool call round-trip**: Chat/Batch → publish `tool_call` to `ws:out:{user_id}` → WS Gateway forwards to Electron → Electron replies `tool_result` → WS Gateway LPUSH to `tool:result:{call_id}` → Chat/Batch BRPOP with 30s timeout.
- **Migration strategy**: Strangler fig. Extract one service at a time. The monolith `app/` continues to work until all services are extracted. Don't delete monolith code until the service replacement is tested.
- **Storage & Plugin services**: Removed from codebase. Will be re-evaluated in future feature planning.
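
The tool call round-trip in the list above can be traced with an in-memory sketch. `FakeRedis` is a stand-in written for this example and is not a real Redis client. The key patterns `ws:out:{user_id}` and `tool:result:{call_id}` and the 30s timeout come from the text; the frame fields (`call_id`, `name`, `args`) and the helper names are assumptions.

```python
import json
import uuid
from collections import defaultdict, deque
from typing import Optional


class FakeRedis:
    """In-memory stand-in for the pub/sub and list operations used here."""

    def __init__(self) -> None:
        self.lists = defaultdict(deque)
        self.subscribers = []  # callables receiving (channel, message)

    def publish(self, channel: str, message: str) -> None:
        for handler in self.subscribers:
            handler(channel, message)

    def lpush(self, key: str, value: str) -> None:
        self.lists[key].appendleft(value)  # push at the head, like LPUSH

    def brpop(self, key: str, timeout: float) -> Optional[str]:
        # The fake delivers synchronously, so an empty list stands for a timeout.
        items = self.lists[key]
        return items.pop() if items else None  # pop from the tail, like BRPOP


def call_tool(r: FakeRedis, user_id: str, name: str, args: dict,
              timeout: float = 30.0) -> dict:
    """Chat/Batch side: publish a tool_call, then wait for its result."""
    call_id = str(uuid.uuid4())
    frame = {"type": "tool_call", "call_id": call_id, "name": name, "args": args}
    r.publish(f"ws:out:{user_id}", json.dumps(frame))
    raw = r.brpop(f"tool:result:{call_id}", timeout)
    if raw is None:
        raise TimeoutError(f"no tool_result for {call_id} within {timeout}s")
    return json.loads(raw)


def attach_fake_gateway(r: FakeRedis) -> None:
    """Pretend to be WS Gateway plus Electron: answer each tool_call at once."""

    def handler(channel: str, message: str) -> None:
        frame = json.loads(message)
        result = {"type": "tool_result", "call_id": frame["call_id"], "rows": []}
        r.lpush(f"tool:result:{frame['call_id']}", json.dumps(result))

    r.subscribers.append(handler)
```

Using one result list per `call_id` lets each waiting caller block on its own key, so concurrent tool calls for the same user cannot pick up each other's results.
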

### Commands (Microservices)

```bash
# Full stack
docker compose up --build

# Single service dev (example: chat)
docker compose up redis postgres auth
cd services/chat && uvicorn app.main:app --reload --port 8002

# Migrations (still centralized)
alembic upgrade head
```
9
.claude/settings.json
Normal file
@@ -0,0 +1,9 @@
{
  "mcpServers": {
    "langfuse-docs": {
      "transportType": "http",
      "url": "https://langfuse.com/api/mcp",
      "verifySsl": true
    }
  }
}
6
.claude/settings.local.json
Normal file
@@ -0,0 +1,6 @@
{
  "enabledMcpjsonServers": [
    "langfuse-docs"
  ],
  "enableAllProjectMcpServers": true
}
25
.claude/skills/boost-prompt/SKILL.md
Normal file
@@ -0,0 +1,25 @@
---
name: boost-prompt
description: 'Interactive prompt refinement workflow: interrogates scope, deliverables, constraints; copies final markdown to clipboard; never writes code. Requires the Joyride extension.'
---

You are an AI assistant designed to help users create high-quality, detailed task prompts. DO NOT WRITE ANY CODE.

Your goal is to iteratively refine the user’s prompt by:

- Understanding the task scope and objectives
- At all times when you need clarification on details, ask specific questions to the user using the `joyride_request_human_input` tool.
- Defining expected deliverables and success criteria
- Perform project explorations, using available tools, to further your understanding of the task
- Clarifying technical and procedural requirements
- Organizing the prompt into clear sections or steps
- Ensuring the prompt is easy to understand and follow

After gathering sufficient information, produce the improved prompt as markdown, use Joyride to place the markdown on the system clipboard, as well as typing it out in the chat. Use this Joyride code for clipboard operations:

```clojure
(require '["vscode" :as vscode])
(vscode/env.clipboard.writeText "your-markdown-text-here")
```

Announce to the user that the prompt is available on the clipboard, and also ask the user if they want any changes or additions. Repeat the copy + chat + ask after any revisions of the prompt.
202
.claude/skills/brand-guidelines/LICENSE.txt
Normal file
@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
73
.claude/skills/brand-guidelines/SKILL.md
Normal file
@@ -0,0 +1,73 @@
---
name: brand-guidelines
description: Applies Anthropic's official brand colors and typography to any sort of artifact that may benefit from having Anthropic's look-and-feel. Use it when brand colors or style guidelines, visual formatting, or company design standards apply.
license: Complete terms in LICENSE.txt
---

# Anthropic Brand Styling

## Overview

To access Anthropic's official brand identity and style resources, use this skill.

**Keywords**: branding, corporate identity, visual identity, post-processing, styling, brand colors, typography, Anthropic brand, visual formatting, visual design

## Brand Guidelines

### Colors

**Main Colors:**

- Dark: `#141413` - Primary text and dark backgrounds
- Light: `#faf9f5` - Light backgrounds and text on dark
- Mid Gray: `#b0aea5` - Secondary elements
- Light Gray: `#e8e6dc` - Subtle backgrounds

**Accent Colors:**

- Orange: `#d97757` - Primary accent
- Blue: `#6a9bcc` - Secondary accent
- Green: `#788c5d` - Tertiary accent

### Typography

- **Headings**: Poppins (with Arial fallback)
- **Body Text**: Lora (with Georgia fallback)
- **Note**: Fonts should be pre-installed in your environment for best results

## Features

### Smart Font Application

- Applies Poppins font to headings (24pt and larger)
- Applies Lora font to body text
- Automatically falls back to Arial/Georgia if custom fonts unavailable
- Preserves readability across all systems

### Text Styling

- Headings (24pt+): Poppins font
- Body text: Lora font
- Smart color selection based on background
- Preserves text hierarchy and formatting

### Shape and Accent Colors

- Non-text shapes use accent colors
- Cycles through orange, blue, and green accents
- Maintains visual interest while staying on-brand
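
The font and accent rules above could be expressed along these lines. This is a sketch under stated assumptions, not the skill's actual implementation: `pick_font` and `assign_accents` are hypothetical helper names, and the font fallback behavior is left out.

```python
from itertools import cycle

HEADING_MIN_PT = 24  # heading threshold stated above
ACCENTS = ["#d97757", "#6a9bcc", "#788c5d"]  # orange, blue, green


def pick_font(size_pt: float) -> str:
    """Return the brand font for a text run of the given point size."""
    return "Poppins" if size_pt >= HEADING_MIN_PT else "Lora"


def assign_accents(shape_names):
    """Pair each shape with the next accent color, wrapping after green."""
    return list(zip(shape_names, cycle(ACCENTS)))
```

With four shapes, the fourth wraps around to orange again, which is what "cycles through" implies here.
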

## Technical Details

### Font Management

- Uses system-installed Poppins and Lora fonts when available
- Provides automatic fallback to Arial (headings) and Georgia (body)
- No font installation required - works with existing system fonts
- For best results, pre-install Poppins and Lora fonts in your environment

### Color Application

- Uses RGB color values for precise brand matching
- Applied via python-pptx's RGBColor class
- Maintains color fidelity across different systems
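
To make the RGB handling concrete, the sketch below converts the brand hex codes into integer triples and picks a text color for a given background. It uses only the standard library. The brightness heuristic and its cutoff of 128 are assumptions, not documented brand rules, and both helper names are hypothetical.

```python
DARK = "#141413"   # primary text and dark backgrounds
LIGHT = "#faf9f5"  # light backgrounds and text on dark


def hex_to_rgb(hex_color: str) -> tuple:
    """Convert '#rrggbb' to an (r, g, b) tuple of 0-255 integers."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #rrggbb, got {hex_color!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def text_color_for(background_hex: str) -> str:
    """Pick light text on dark backgrounds and dark text otherwise.

    Uses a simple perceived-brightness weighting; the 128 cutoff is an
    assumption made for this sketch.
    """
    r, g, b = hex_to_rgb(background_hex)
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    return LIGHT if brightness < 128 else DARK
```

When python-pptx is installed, a triple from `hex_to_rgb` can be unpacked into its `RGBColor(r, g, b)` constructor.
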
177
.claude/skills/frontend-design/LICENSE.txt
Normal file
@@ -0,0 +1,177 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
42
.claude/skills/frontend-design/SKILL.md
Normal file
@@ -0,0 +1,42 @@
|
||||
---
|
||||
name: frontend-design
|
||||
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
|
||||
license: Complete terms in LICENSE.txt
|
||||
---
|
||||
|
||||
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
|
||||
|
||||
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
|
||||
|
||||
## Design Thinking
|
||||
|
||||
Before coding, understand the context and commit to a BOLD aesthetic direction:
|
||||
- **Purpose**: What problem does this interface solve? Who uses it?
|
||||
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
|
||||
- **Constraints**: Technical requirements (framework, performance, accessibility).
|
||||
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
|
||||
|
||||
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
|
||||
|
||||
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
|
||||
- Production-grade and functional
|
||||
- Visually striking and memorable
|
||||
- Cohesive with a clear aesthetic point-of-view
|
||||
- Meticulously refined in every detail
|
||||
|
||||
## Frontend Aesthetics Guidelines
|
||||
|
||||
Focus on:
|
||||
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
|
||||
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
|
||||
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
|
||||
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
|
||||
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
|
||||
|
||||
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
|
||||
|
||||
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
|
||||
|
||||
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
|
||||
|
||||
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.
|
||||
202
.claude/skills/webapp-testing/LICENSE.txt
Normal file
@@ -0,0 +1,202 @@
|
||||
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. We also recommend that a
|
||||
file or class name and description of purpose be included on the
|
||||
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright [yyyy] [name of copyright owner]
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
96
.claude/skills/webapp-testing/SKILL.md
Normal file
@@ -0,0 +1,96 @@
|
||||
---
|
||||
name: webapp-testing
|
||||
description: Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
|
||||
license: Complete terms in LICENSE.txt
|
||||
---
|
||||
|
||||
# Web Application Testing
|
||||
|
||||
To test local web applications, write native Python Playwright scripts.
|
||||
|
||||
**Helper Scripts Available**:
|
||||
- `scripts/with_server.py` - Manages server lifecycle (supports multiple servers)
|
||||
|
||||
**Always run scripts with `--help` first** to see usage. DO NOT read the source until you have tried running the script and found that a customized solution is absolutely necessary. These scripts can be very large and thus pollute your context window. They exist to be called directly as black-box scripts rather than ingested into your context window.
|
||||
|
||||
## Decision Tree: Choosing Your Approach
|
||||
|
||||
```
User task → Is it static HTML?
    ├─ Yes → Read HTML file directly to identify selectors
    │         ├─ Success → Write Playwright script using selectors
    │         └─ Fails/Incomplete → Treat as dynamic (below)
    │
    └─ No (dynamic webapp) → Is the server already running?
        ├─ No → Run: python scripts/with_server.py --help
        │        Then use the helper + write simplified Playwright script
        │
        └─ Yes → Reconnaissance-then-action:
            1. Navigate and wait for networkidle
            2. Take screenshot or inspect DOM
            3. Identify selectors from rendered state
            4. Execute actions with discovered selectors
```
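
The "Is the server already running?" branch can be answered with a quick TCP probe before deciding whether to reach for `scripts/with_server.py`. A minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, not part of this skill, and it assumes the app accepts connections on `localhost`:

```python
import socket


def port_is_open(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; not shipped with this skill.
    """
    try:
        # create_connection raises an OSError subclass (for example
        # ConnectionRefusedError or a timeout) when nothing is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_is_open(5173)` comes back `False`, that suggests taking the "No" branch and letting the helper start the server; if it is `True`, go straight to reconnaissance-then-action.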
|
||||
|
||||
## Example: Using with_server.py
|
||||
|
||||
To start a server, run `--help` first, then use the helper:
|
||||
|
||||
**Single server:**
|
||||
```bash
python scripts/with_server.py --server "npm run dev" --port 5173 -- python your_automation.py
```

**Multiple servers (e.g., backend + frontend):**
```bash
python scripts/with_server.py \
  --server "cd backend && python server.py" --port 3000 \
  --server "cd frontend && npm run dev" --port 5173 \
  -- python your_automation.py
```
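
Each `--server` is paired with the `--port` given in the same position, and everything after `--` is the command to run once the servers are ready. The sketch below illustrates that pairing for intuition only. It is not the helper's source: `parse_invocation` is a hypothetical name, and splitting on `--` by hand is a simplification, so keep calling the real script as a black box.

```python
import argparse


def parse_invocation(argv):
    """Pair each --server with its --port and return (pairs, command).

    Hypothetical illustration of the flag layout; the real helper may
    parse its arguments differently.
    """
    # Everything after the first '--' is the command to run afterwards.
    if "--" in argv:
        split = argv.index("--")
        helper_args, command = argv[:split], argv[split + 1:]
    else:
        helper_args, command = argv, []

    parser = argparse.ArgumentParser()
    parser.add_argument("--server", action="append", dest="servers", required=True)
    parser.add_argument("--port", action="append", dest="ports", type=int, required=True)
    args = parser.parse_args(helper_args)

    # Flags are matched by position, so the counts have to agree.
    if len(args.servers) != len(args.ports):
        raise ValueError("each --server needs a matching --port")

    return list(zip(args.servers, args.ports)), command
```

Passing the multiple-server invocation above through this function would yield two `(command, port)` pairs plus the trailing `python your_automation.py` command.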
|
||||
|
||||
To create an automation script, include only Playwright logic (servers are managed automatically):
|
||||
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True) # Always launch chromium in headless mode
    page = browser.new_page()
    page.goto('http://localhost:5173') # Server already running and ready
    page.wait_for_load_state('networkidle') # CRITICAL: Wait for JS to execute
    # ... your automation logic
    browser.close()
```
|
||||
|
||||
## Reconnaissance-Then-Action Pattern
|
||||
|
||||
1. **Inspect rendered DOM**:
   ```python
   page.screenshot(path='/tmp/inspect.png', full_page=True)
   content = page.content()
   page.locator('button').all()
   ```
|
||||
|
||||
2. **Identify selectors** from inspection results
|
||||
|
||||
3. **Execute actions** using discovered selectors
|
||||
|
||||
## Common Pitfall
|
||||
|
||||
❌ **Don't** inspect the DOM before waiting for `networkidle` on dynamic apps
|
||||
✅ **Do** wait for `page.wait_for_load_state('networkidle')` before inspection
|
||||
|
||||
## Best Practices
|
||||
|
||||
- **Use bundled scripts as black boxes** - To accomplish a task, consider whether one of the scripts available in `scripts/` can help. These scripts handle common, complex workflows reliably without cluttering the context window. Use `--help` to see usage, then invoke directly.
|
||||
- Use `sync_playwright()` for synchronous scripts
|
||||
- Always close the browser when done
|
||||
- Use descriptive selectors: `text=`, `role=`, CSS selectors, or IDs
|
||||
- Add appropriate waits: `page.wait_for_selector()` or `page.wait_for_timeout()`
|
||||
|
||||
## Reference Files
|
||||
|
||||
- **examples/** - Examples showing common patterns:
  - `element_discovery.py` - Discovering buttons, links, and inputs on a page
  - `static_html_automation.py` - Using file:// URLs for local HTML
  - `console_logging.py` - Capturing console logs during automation
|
||||
35
.claude/skills/webapp-testing/examples/console_logging.py
Normal file
@@ -0,0 +1,35 @@
|
||||
from playwright.sync_api import sync_playwright

# Example: Capturing console logs during browser automation

url = 'http://localhost:5173'  # Replace with your URL

console_logs = []

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={'width': 1920, 'height': 1080})

    # Set up console log capture
    def handle_console_message(msg):
        console_logs.append(f"[{msg.type}] {msg.text}")
        print(f"Console: [{msg.type}] {msg.text}")

    page.on("console", handle_console_message)

    # Navigate to page
    page.goto(url)
    page.wait_for_load_state('networkidle')

    # Interact with the page (triggers console logs)
    page.click('text=Dashboard')
    page.wait_for_timeout(1000)

    browser.close()

# Save console logs to file
with open('/mnt/user-data/outputs/console.log', 'w') as f:
    f.write('\n'.join(console_logs))

print(f"\nCaptured {len(console_logs)} console messages")
print(f"Logs saved to: /mnt/user-data/outputs/console.log")
|
||||
40
.claude/skills/webapp-testing/examples/element_discovery.py
Normal file
@@ -0,0 +1,40 @@
|
||||
from playwright.sync_api import sync_playwright

# Example: Discovering buttons and other elements on a page

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Navigate to page and wait for it to fully load
    page.goto('http://localhost:5173')
    page.wait_for_load_state('networkidle')

    # Discover all buttons on the page
    buttons = page.locator('button').all()
    print(f"Found {len(buttons)} buttons:")
    for i, button in enumerate(buttons):
        text = button.inner_text() if button.is_visible() else "[hidden]"
        print(f"  [{i}] {text}")

    # Discover links
    links = page.locator('a[href]').all()
    print(f"\nFound {len(links)} links:")
    for link in links[:5]:  # Show first 5
        text = link.inner_text().strip()
        href = link.get_attribute('href')
        print(f"  - {text} -> {href}")

    # Discover input fields
    inputs = page.locator('input, textarea, select').all()
    print(f"\nFound {len(inputs)} input fields:")
    for input_elem in inputs:
        name = input_elem.get_attribute('name') or input_elem.get_attribute('id') or "[unnamed]"
        input_type = input_elem.get_attribute('type') or 'text'
        print(f"  - {name} ({input_type})")

    # Take screenshot for visual reference
    page.screenshot(path='/tmp/page_discovery.png', full_page=True)
    print("\nScreenshot saved to /tmp/page_discovery.png")

    browser.close()
|
||||
@@ -0,0 +1,33 @@
|
||||
from playwright.sync_api import sync_playwright
import os

# Example: Automating interaction with static HTML files using file:// URLs

html_file_path = os.path.abspath('path/to/your/file.html')
file_url = f'file://{html_file_path}'

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={'width': 1920, 'height': 1080})

    # Navigate to local HTML file
    page.goto(file_url)

    # Take screenshot
    page.screenshot(path='/mnt/user-data/outputs/static_page.png', full_page=True)

    # Interact with elements
    page.click('text=Click Me')
    page.fill('#name', 'John Doe')
    page.fill('#email', 'john@example.com')

    # Submit form
    page.click('button[type="submit"]')
    page.wait_for_timeout(500)

    # Take final screenshot
    page.screenshot(path='/mnt/user-data/outputs/after_submit.png', full_page=True)

    browser.close()

print("Static HTML automation completed!")
|
||||
106
.claude/skills/webapp-testing/scripts/with_server.py
Normal file
@@ -0,0 +1,106 @@
|
||||
#!/usr/bin/env python3
"""
Start one or more servers, wait for them to be ready, run a command, then clean up.

Usage:
    # Single server
    python scripts/with_server.py --server "npm run dev" --port 5173 -- python automation.py
    python scripts/with_server.py --server "npm start" --port 3000 -- python test.py

    # Multiple servers
    python scripts/with_server.py \
      --server "cd backend && python server.py" --port 3000 \
      --server "cd frontend && npm run dev" --port 5173 \
      -- python test.py
"""

import subprocess
import socket
import time
import sys
import argparse

def is_server_ready(port, timeout=30):
    """Wait for server to be ready by polling the port."""
    start_time = time.time()
    while time.time() - start_time < timeout:
        try:
            with socket.create_connection(('localhost', port), timeout=1):
                return True
        except (socket.error, ConnectionRefusedError):
            time.sleep(0.5)
    return False


def main():
    parser = argparse.ArgumentParser(description='Run command with one or more servers')
    parser.add_argument('--server', action='append', dest='servers', required=True, help='Server command (can be repeated)')
    parser.add_argument('--port', action='append', dest='ports', type=int, required=True, help='Port for each server (must match --server count)')
    parser.add_argument('--timeout', type=int, default=30, help='Timeout in seconds per server (default: 30)')
    parser.add_argument('command', nargs=argparse.REMAINDER, help='Command to run after server(s) ready')

    args = parser.parse_args()

    # Remove the '--' separator if present
    if args.command and args.command[0] == '--':
        args.command = args.command[1:]

    if not args.command:
        print("Error: No command specified to run")
        sys.exit(1)

    # Parse server configurations
    if len(args.servers) != len(args.ports):
        print("Error: Number of --server and --port arguments must match")
        sys.exit(1)

    servers = []
    for cmd, port in zip(args.servers, args.ports):
        servers.append({'cmd': cmd, 'port': port})

    server_processes = []

    try:
        # Start all servers
        for i, server in enumerate(servers):
            print(f"Starting server {i+1}/{len(servers)}: {server['cmd']}")

            # Use shell=True to support commands with cd and &&
            process = subprocess.Popen(
                server['cmd'],
                shell=True,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE
            )
            server_processes.append(process)

            # Wait for this server to be ready
            print(f"Waiting for server on port {server['port']}...")
            if not is_server_ready(server['port'], timeout=args.timeout):
                raise RuntimeError(f"Server failed to start on port {server['port']} within {args.timeout}s")

            print(f"Server ready on port {server['port']}")

        print(f"\nAll {len(servers)} server(s) ready")

        # Run the command
        print(f"Running: {' '.join(args.command)}\n")
        result = subprocess.run(args.command)
        sys.exit(result.returncode)

    finally:
        # Clean up all servers
        print(f"\nStopping {len(server_processes)} server(s)...")
        for i, process in enumerate(server_processes):
            try:
                process.terminate()
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                process.kill()
                process.wait()
            print(f"Server {i+1} stopped")
        print("All servers stopped")


if __name__ == '__main__':
    main()
|
||||
8
.mcp.json
Normal file
@@ -0,0 +1,8 @@
|
||||
{
  "mcpServers": {
    "langfuse-docs": {
      "type": "http",
      "url": "https://langfuse.com/api/mcp"
    }
  }
}
9
.vscode/mcp.json
vendored
Normal file
@@ -0,0 +1,9 @@
|
||||
{
  "servers": {
    "langfuse-docs": {
      "url": "https://langfuse.com/api/mcp",
      "type": "http"
    }
  },
  "inputs": []
}
551
docs/LOCAL_AGENT_V2_PLAN.md
Normal file
@@ -0,0 +1,551 @@
|
||||
# Local Agent V2 - Implementation Plan

> Architecture reference: [`local_agent_v2_mem.md`](local_agent_v2_mem.md)

---

## Overview

Local Agent V2 replaces the 3-call LLM flow (separate classification and processing)
with a two-phase architecture:

1. **Detect + Preprocess** (pure Python, zero LLM): identifies the content type and cleans it
2. **Single LLM call** (classify + extract + create): one agentic call with tool calling

### Langfuse: Scoring + Prompt Management (hot-swap)

Every step includes a test set with evals that send scores to Langfuse.
**Prompts are managed by Langfuse Prompt Management**, so they can be edited from the UI
without touching code. Each score is linked to the **exact prompt version**
that produced it, which allows A/B comparison between versions.

**Iterative workflow:**
1. Write or edit the prompt in the Langfuse UI (e.g. `unified_processing` v3)
2. Run the evals: `pytest tests/test_agent_runner_v2.py -k eval`
3. Check Langfuse: prompt v3 → score 0.6
4. Edit the prompt → v4
5. Re-run the evals → prompt v4 → score 0.9
6. Promote v4 to the `production` label

**Langfuse prompts to create (with a hardcoded fallback in the code):**

| Langfuse name | Used in | Description |
|---|---|---|
| `unified_processing` | Step 2 (runner) | Single prompt: classify + extract + create |
| `journey_system_v2` | Step 4 (journey) | Journey chatbot → produces AgentConfig JSON |

**Scoring pattern with prompt version linking:**

```python
from app.core.langfuse_client import get_langfuse, get_prompt_or_fallback

def run_eval_with_prompt(prompt_name: str, fallback: str, eval_name: str, run_fn):
    """Run an eval, linking the score to the prompt version."""
    lf = get_langfuse()
    template, prompt_obj = get_prompt_or_fallback(prompt_name, fallback)

    # Create a trace for the eval
    trace = lf.trace(name=f"eval-{eval_name}") if lf else None

    # Run the LLM call inside a generation linked to the prompt
    if lf and trace:
        with lf.start_as_current_observation(
            as_type="generation",
            name=eval_name,
            prompt=prompt_obj,  # ← links to the exact prompt version
            trace_id=trace.id,
        ) as gen:
            result, score = run_fn(template)
            gen.update(output=str(result))
    else:
        result, score = run_fn(template)

    # Score attached to the trace → visible per prompt version in Langfuse
    if lf and trace:
        lf.score(
            trace_id=trace.id,
            name=eval_name,
            value=score,
            data_type="NUMERIC",
        )
        lf.flush()

    return result, score
```

**In Langfuse you will see:**
```
Prompt: unified_processing
  ├── v3 (2026-04-05) → avg score: 0.62 (12 evals)
  ├── v4 (2026-04-07) → avg score: 0.85 (12 evals)  ← production
  └── v5 (2026-04-08) → avg score: 0.91 (12 evals)  ← candidate
```
|
||||
---
|
||||
|
||||
## Step 3 - Model and schema: `prompt_template` → `agent_config` ✅ DONE

Added in parallel with Step 2 as a prerequisite:
- `app/schemas.py`: `ContentTypeConfig`, `AgentConfig`
- `app/models.py`: `agent_config: JSON` (nullable, alongside `prompt_template`)
- `alembic/versions/a3b9c0d1e2f3_add_agent_config_to_local_agents.py`

## Step 1 - Preprocessor: email HTML handler ✅ DONE

**Files to create:**
- `app/core/preprocessors/__init__.py`: registry, detect, dispatch
- `app/core/preprocessors/base.py`: `PreprocessResult` dataclass, base class
- `app/core/preprocessors/email_html.py`: BeautifulSoup handler

**What it does:**
- `detect_content_type(filename, raw_content) -> str`: heuristic based on the extension plus patterns in the content
- `preprocess(content_type, raw_content) -> PreprocessResult`: dispatches to the matching handler
- `PreprocessResult`: `{ content_type, clean_text, metadata: {subject, from, to, date, ...} }`

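A minimal sketch of how detection and the result container could look, assuming the four content types named in the Step 1 test set (`email_html`, `generic_html`, `plain_text`, `unknown`). The header regex, the "at least two distinct headers" rule, and the NUL-byte check for binary content are illustrative assumptions, not the project's actual heuristics.

```python
import os
import re
from dataclasses import dataclass, field

# Hypothetical pattern: email-style header names followed by a colon.
_EMAIL_HEADER_RE = re.compile(r"\b(from|to|subject)\s*:", re.IGNORECASE)


@dataclass
class PreprocessResult:
    content_type: str
    clean_text: str
    metadata: dict = field(default_factory=dict)


def detect_content_type(filename: str, raw_content: str) -> str:
    """Guess the content type from the extension plus patterns in the content."""
    ext = os.path.splitext(filename)[1].lower()
    # Assumption: a NUL byte means the file is binary and cannot be handled.
    if "\x00" in raw_content:
        return "unknown"
    if ext in (".html", ".htm"):
        # Require two distinct header names to limit false positives.
        headers = {name.lower() for name in _EMAIL_HEADER_RE.findall(raw_content)}
        return "email_html" if len(headers) >= 2 else "generic_html"
    if ext == ".txt":
        return "plain_text"
    return "unknown"
```

Keeping detection as a pure function of `(filename, raw_content)` makes test cases 1.1 to 1.4 deterministic and independent of any LLM call.
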
**`email_html` handler:**
- Strip `<style>`, `<script>`, and HTML tags → clean text (BeautifulSoup)
- Extract metadata: Subject, From, To, Date (from `<meta>`, header patterns, or content heuristics)
- Split the thread: identify quote markers (`>`, `On ... wrote:`, `---Original Message---`) → isolate the latest message
- Fallback: if the thread cannot be split, return all the clean text

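The thread-splitting step could be sketched as follows. This assumes top-posted replies (the latest message comes first) and uses regex approximations of the three quote markers listed above; the function name `split_latest_message` and the exact patterns are hypothetical.

```python
import re

# Hypothetical patterns approximating the quote markers listed in the plan.
_QUOTE_MARKERS = [
    re.compile(r"^On .+ wrote:\s*$", re.MULTILINE),
    re.compile(r"^-{2,}\s*Original Message\s*-{2,}\s*$", re.MULTILINE | re.IGNORECASE),
    re.compile(r"^>", re.MULTILINE),
]


def split_latest_message(clean_text: str) -> str:
    """Return the text before the earliest quote marker, or all text if none is found."""
    cut = len(clean_text)
    for pattern in _QUOTE_MARKERS:
        match = pattern.search(clean_text)
        if match:
            cut = min(cut, match.start())
    latest = clean_text[:cut].strip()
    # Fallback: if the split leaves nothing, keep the whole cleaned text.
    return latest if latest else clean_text.strip()
```

The fallback branch mirrors the last bullet: when no usable split exists, the handler still returns the full clean text rather than an empty string.
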
**Fallback handler (`generic`):**
- Strips HTML tags if present
- Returns the text as-is with minimal metadata (filename, extension)

**Dependencies to add:**
- `beautifulsoup4` (probably already installed; verify)
- `lxml` (fast parser for BS4, optional)

### Test set - Step 1

**File:** `tests/test_preprocessors.py`

| # | Test case | Input | Expected | Score name |
|---|-----------|-------|----------|------------|
| 1.1 | Detect email HTML | `.html` with `From:`, `To:`, `Subject:` | `content_type == "email_html"` | `preprocess.detect_email` |
| 1.2 | Detect generic HTML | `.html` with `<nav>`, `<main>` | `content_type == "generic_html"` | `preprocess.detect_generic` |
| 1.3 | Detect plain text | `.txt` | `content_type == "plain_text"` | `preprocess.detect_text` |
| 1.4 | Detect unknown | binary `.xyz` | `content_type == "unknown"` | `preprocess.detect_unknown` |
| 1.5 | Email: strip HTML | Email with `<style>`, inline CSS | `clean_text` without HTML tags | `preprocess.email_strip` |
| 1.6 | Email: extract metadata | Email with Subject/From/Date | correct metadata | `preprocess.email_metadata` |
| 1.7 | Email: split thread | Email with 3 nested replies | `clean_text` = latest message only | `preprocess.email_thread` |
| 1.8 | Email: single message | Email without a thread | `clean_text` = whole body | `preprocess.email_single` |
| 1.9 | Email: heavy HTML | Email with heavy CSS/table layout | readable text extracted | `preprocess.email_heavy_html` |
| 1.10 | Fallback: unknown file | Binary file | `clean_text` via fallback | `preprocess.fallback` |

**Eval with Langfuse:**

```python
@pytest.mark.asyncio
async def test_email_html_strip(sample_email_html):
    lf = get_langfuse()
    trace = lf.trace(name="eval-preprocess-email-strip") if lf else None

    result = preprocess("email_html", sample_email_html)

    # Assertions
    has_no_tags = "<" not in result.clean_text
    has_content = len(result.clean_text) > 50
    ratio = len(result.clean_text) / len(sample_email_html)  # compression ratio

    score = 1.0 if (has_no_tags and has_content and ratio < 0.5) else 0.0

    if trace:
        lf.score(trace_id=trace.id, name="preprocess.email_strip", value=score,
                 comment=f"ratio={ratio:.2f}, len={len(result.clean_text)}")
        lf.flush()

    assert has_no_tags
    assert has_content
```

**Success criteria:** all 10 tests pass, average score ≥ 0.9
|
||||
---
|
||||
|
||||
## Step 2 - Refactor `agent_runner.py`: new per-file flow ✅ DONE

**Files to modify:**
- `app/core/agent_runner.py`

**What changes:**
- Remove `_classify_file()` (the separate Step 1 LLM call)
- Remove `_BATCH_FILE_CLASSIFIER_PROMPT`
- Add the preprocessor import
- New flow in `run_local_agent()`:

```python
for file_path in file_paths:
    # 1. Read the raw file
    raw_content = await execute_on_client(action="read_file_content", ...)

    # 2. Detect + Preprocess (Python, zero LLM)
    content_type = detect_content_type(file_path, raw_content)
    preprocessed = preprocess(content_type, raw_content)

    # 3. Fetch the prompt from Langfuse (hot-swappable from the UI) with a local fallback
    template, prompt_obj = get_prompt_or_fallback(
        "unified_processing", _UNIFIED_PROCESSING_PROMPT
    )
    extraction_rules = _get_extraction_rules(config.agent_config, content_type)
    system_prompt = template.format(
        extraction_rules=extraction_rules,
        global_rules="\n".join(config.agent_config.get("global_rules", [])),
        projects_list=_format_projects(projects),
        data_types=", ".join(config.data_types),
        filename=os.path.basename(file_path),
        metadata_section=_format_metadata(preprocessed.metadata),
        no_match_behavior=_get_no_match_behavior(config.agent_config),
    )

    # 4. Single LLM call with tools (classify + extract + create)
    # The generation is linked to prompt_obj → scores visible per version
    user_message = _build_user_message(file_path, preprocessed)
    result = await _run_agent_with_tools(
        system_prompt=system_prompt,
        user_message=user_message,
        tools=processing_tools,
        max_steps=_MAX_PROCESSING_STEPS,
        langfuse_prompt=prompt_obj,  # ← links to the prompt version
    )
```

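The flow above calls `_get_extraction_rules(config.agent_config, content_type)` without showing it. One plausible shape, assuming `agent_config` is a plain dict whose `content_types` entries carry `id` and `extraction_prompt` (as in the `AgentConfig` schema of Step 3), is sketched below. The public name `get_extraction_rules` and the empty-string default are assumptions, not the actual helper.

```python
def get_extraction_rules(agent_config: dict, content_type: str) -> str:
    """Return the extraction prompt configured for a content type.

    Falls back to an empty string when the content type is not configured,
    so the prompt template can still be formatted.
    """
    for entry in agent_config.get("content_types", []):
        if entry.get("id") == content_type:
            return entry.get("extraction_prompt", "")
    return ""
```

For example, with a config holding a single `email_html` entry, a `plain_text` file would get no type-specific rules and rely only on `global_rules`.
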
**`unified_processing` prompt (local fallback, editable from the Langfuse UI):**

```
You are a data extraction assistant for a freelance project management tool.

## Your process (follow this exact order)

### 1. Identify the project
File: {filename}
{metadata_section}

Existing projects:
{projects_list}

Match this file to an existing project using the filename and content.
If no project matches, {no_match_behavior}.

### 2. Check existing records
Once you identify the project, use list_tasks/list_notes/list_timelines
to see what already exists. NEVER create duplicates.

### 3. Extract and create/update
{extraction_rules}

### Rules
- Set isAiSuggested=1 on every new record
- Set projectId on every record
- Update existing records when a match is found by title/topic
{global_rules}
```

**Fix `items_created`:** count the `create_*` tool calls in the results.

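A minimal sketch of that count, assuming each recorded tool call is a dict with a `name` key (the same shape the eval examples in this plan use). The function name `count_items_created` is hypothetical.

```python
def count_items_created(tool_calls: list[dict]) -> int:
    """Count tool calls whose name starts with 'create_' (updates and lists are ignored)."""
    return sum(1 for call in tool_calls if call.get("name", "").startswith("create_"))
```

This matches test case 2.8: two `create_*` calls plus one `update_task` should yield `items_created == 2`.
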
### Test set - Step 2

**File:** `tests/test_agent_runner_v2.py`

| # | Test case | Input | Expected | Score name |
|---|-----------|-------|----------|------------|
| 2.1 | Happy path: email → task | Preprocessed email with an action | `create_task` tool called | `runner.email_to_task` |
| 2.2 | Happy path: email → note | Informational email | `create_note` tool called | `runner.email_to_note` |
| 2.3 | Happy path: email → timeline | Email with an event date | `create_timeline` tool called | `runner.email_to_timeline` |
| 2.4 | Project matching: filename | File `ProjectX_report.html` | ProjectX project selected | `runner.project_filename` |
| 2.5 | Project matching: content | File mentioning the project in the body | correct project | `runner.project_content` |
| 2.6 | No project match → global rule | File with no project match | behavior from global_rules | `runner.no_project` |
| 2.7 | Deduplication | Existing task + similar email | `update_task`, not `create_task` | `runner.dedup` |
| 2.8 | items_created count | 2 creates + 1 update | `items_created == 2` | `runner.items_count` |
| 2.9 | Device offline | No device | status=error | `runner.offline` |
| 2.10 | Empty file | Empty content | skipped without errors | `runner.empty_file` |

**Eval with Langfuse (prompt hot-swap + score per version):**

```python
@pytest.mark.asyncio
async def test_email_to_task_e2e(mock_ws_executor):
    lf = get_langfuse()

    # The prompt comes from Langfuse → you can change it in the UI and re-run the test
    template, prompt_obj = get_prompt_or_fallback(
        "unified_processing", _UNIFIED_PROCESSING_PROMPT
    )

    trace = lf.trace(
        name="eval-runner-email-to-task",
        metadata={"step": "2", "prompt_version": getattr(prompt_obj, "version", "fallback")},
    ) if lf else None

    config = _make_config(agent_config={
        "content_types": [{
            "id": "email_html",
            "extraction_prompt": "Direct action → task. Informational → note."
        }],
        "global_rules": [],
        "data_types": ["tasks", "notes"]
    })

    # Mock preprocessed email with action request
    mock_file_content = "Subject: Fix the bug\nFrom: boss@co.com\n\nPlease fix the login bug by Friday."

    tool_calls_made = []
    # ... setup mock that captures tool calls ...

    await run_local_agent(user_id, config, run_log, device_mgr)

    created_tasks = [c for c in tool_calls_made if c["name"] == "create_task"]
    score = 1.0 if len(created_tasks) == 1 else 0.0
    title_match = 1.0 if any("bug" in c["args"].get("title", "").lower() for c in created_tasks) else 0.0

    if trace:
        # Score attached to the trace → Langfuse links it to the prompt version automatically
        lf.score(trace_id=trace.id, name="runner.email_to_task", value=score,
                 comment=f"tasks_created={len(created_tasks)}")
        lf.score(trace_id=trace.id, name="runner.email_to_task.title", value=title_match)
        lf.flush()

    assert score == 1.0
    assert title_match == 1.0
```

**Success criteria:** all 10 tests pass, average score ≥ 0.8
|
||||
---
|
||||
|
||||
## Step 3 - Model and schema: `prompt_template` → `agent_config` ✅ DONE (see above)

**Files to modify:**
- `app/models.py`: `LocalAgentConfig.prompt_template: Text` → `agent_config: JSON`
- `app/schemas.py`: Pydantic schema for `AgentConfig`
- `alembic/versions/`: new migration
- `app/api/routes/agents.py`: update `trigger_agent_run` to read `agent_config`

**Pydantic schema:**

```python
from pydantic import BaseModel


class ContentTypeConfig(BaseModel):
    id: str
    label: str = ""
    detection_hint: str = ""
    preprocessing: str = "generic"  # handler name: "email_html", "generic", ...
    extraction_prompt: str


class AgentConfig(BaseModel):
    content_types: list[ContentTypeConfig] = []
    global_rules: list[str] = []
    data_types: list[str] = []
```
|
||||
### Test set - Step 3

**File:** `tests/test_agent_config_schema.py`

| # | Test case | Input | Expected | Score name |
|---|-----------|-------|----------|------------|
| 3.1 | Valid schema | Complete JSON | parsing OK | `schema.valid` |
| 3.2 | Minimal schema | Only `data_types` | defaults applied | `schema.minimal` |
| 3.3 | Unknown content type | `preprocessing: "pdf"` | accepted (future) | `schema.unknown_type` |
| 3.4 | Migration up/down | Alembic migrate | no errors | `schema.migration` |
| 3.5 | Trigger with agent_config | POST /agents/trigger | config parsed | `schema.trigger` |

**Success criteria:** all 5 tests pass
|
||||
---
|
||||
|
||||
## Step 4 - Journey setup: structured output ✅ DONE

**Files to modify:**
- `app/api/routes/agent_setup.py`: `_JOURNEY_SYSTEM_PROMPT` rewritten
- `app/api/routes/agent_setup.py`: parse JSON output instead of text markers

**What changes:**
- The journey produces an `AgentConfig` JSON, not a prose `prompt_template`
- The system prompt comes from Langfuse (`journey_system_v2`) with a local fallback
  → **editable from the UI without touching code** to iterate on journey quality
- The system prompt instructs the LLM to:
  1. Explore the directory
  2. Identify the content types present
  3. For each type, ask the user for the extraction rules
  4. Produce structured JSON that conforms to the `AgentConfig` schema
- The `PROMPT_TEMPLATE_START/END` markers become `AGENT_CONFIG_START/END`
- Parsing extracts the JSON and validates it with Pydantic
- Every journey LLM call is linked to `prompt_obj` → score per version

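The marker-based extraction could be sketched as below. It assumes the LLM reply wraps the JSON between the literal strings `AGENT_CONFIG_START` and `AGENT_CONFIG_END`; the function name `extract_agent_config` and the regex are hypothetical. The sketch stops at JSON decoding so it stays dependency-free; in the plan, the resulting dict would then be validated against the Pydantic `AgentConfig` model.

```python
import json
import re

# Hypothetical pattern: capture everything between the two markers.
_CONFIG_RE = re.compile(r"AGENT_CONFIG_START\s*(.*?)\s*AGENT_CONFIG_END", re.DOTALL)


def extract_agent_config(reply_text: str):
    """Return the JSON object found between the markers, or None if absent or invalid."""
    match = _CONFIG_RE.search(reply_text)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    # Only a JSON object can map onto the AgentConfig schema.
    return data if isinstance(data, dict) else None
```

Returning `None` instead of raising lets the journey handler decide whether to nudge the LLM for another attempt (test case 4.6) or surface an error.
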
### Test set - Step 4

**File:** `tests/test_journey_v2.py`

| # | Test case | Input | Expected | Score name |
|---|-----------|-------|----------|------------|
| 4.1 | Journey start: explores directory | Directory with HTML emails | relevant first question | `journey.start` |
| 4.2 | Journey: produces valid JSON | 3-5 conversation turns | valid `AgentConfig` | `journey.valid_json` |
| 4.3 | Journey: detects HTML email | Directory with `.html` emails | content_type `email_html` present | `journey.detect_email` |
| 4.4 | Journey: custom user rules | "create only notes, no tasks" | `extraction_prompt` reflects the rule | `journey.custom_rules` |
| 4.5 | Journey: global rules | "no project = no entity" | present in `global_rules` | `journey.global_rules` |
| 4.6 | Journey: nudge after max turns | Turn limit reached | JSON produced anyway | `journey.nudge` |

**Eval with Langfuse (LLM-as-judge example):**

```python
@pytest.mark.asyncio
async def test_journey_produces_valid_config(mock_ws_executor):
    lf = get_langfuse()
    trace = lf.trace(name="eval-journey-valid-config") if lf else None

    # Simulate a full journey: start + 3 messages
    reply = await handle_journey_start(user_id, {
        "agent_type": "local",
        "directory": "/test/emails",
        "data_types": ["tasks", "notes"],
    })

    # Simulate user replies
    for msg in ["They are email exports from Outlook", "Extract tasks from action items", "Yes, that looks correct"]:
        reply = await handle_journey_message(user_id, {
            "session_id": reply["session_id"],
            "message": msg,
        })
        if reply.get("done"):
            break

    config_json = reply.get("agent_config")
    is_valid = False
    try:
        parsed = AgentConfig.model_validate_json(config_json)
        is_valid = len(parsed.content_types) > 0
    except Exception:
        pass

    if trace:
        lf.score(trace_id=trace.id, name="journey.valid_json", value=1.0 if is_valid else 0.0,
                 comment=f"config={config_json[:200] if config_json else 'None'}")
        lf.flush()

    assert is_valid
```

**Success criteria:** all 6 tests pass, LLM score ≥ 0.8
|
||||
---
|
||||
|
||||
## Step 5 - Frontend: Electron store + scheduler + UI ✅ DONE

**Files to modify:**
- `src/main/store.ts`: field `promptTemplate` → `agentConfig`
- `src/main/agents/agent-scheduler.ts`: passes `agentConfig` to the trigger
- `src/renderer/components/settings/JourneyDialog.tsx`: parses JSON from the reply
- `src/renderer/components/settings/LocalAgentConfigPanel.tsx`: shows the config
- `src/renderer/components/settings/types.ts`: updated `LocalAgentConfig` type
- `src/shared/api-types.ts`: updated frame type (if it affects WS)

**What changes:**
- The store saves `agentConfig: AgentConfig` (an object) instead of `promptTemplate: string`
- The scheduler sends `agent_config` in the trigger body (not `custom_agent_prompt`)
- The JourneyDialog receives JSON and displays it in human-readable form
- The config panel shows the configured content types and rules

### Test set - Step 5

| # | Test case | Check | Score name |
|---|-----------|-------|------------|
| 5.1 | Store: saves/reads agentConfig | JSON round-trip | `fe.store` |
| 5.2 | Scheduler: passes config to the trigger | correct POST body | `fe.scheduler` |
| 5.3 | Journey: parses reply JSON | `agentConfig` populated | `fe.journey_parse` |

**Note:** frontend tests are manual/Playwright. Scores are sent only for backend tests.

**Success criteria:** full round-trip working
|
||||
---
|
||||
|
||||
## Step 6 - End-to-end tests with real files

**Files to create:**
- `tests/test_local_agent_e2e.py`
- `tests/fixtures/emails/`: 5-10 sample HTML emails (anonymized)

**E2E scenarios:**

| # | Scenario | Input | Expected | Score name |
|---|----------|-------|----------|------------|
| 6.1 | Email with action → task | "Please review the PR by Friday" | task created with dueDate | `e2e.action_email` |
| 6.2 | Informational email → note | "FYI: new policy effective May 1" | note + timeline created | `e2e.info_email` |
| 6.3 | Nested email thread | 4 reply levels | only the latest message processed | `e2e.thread` |
| 6.4 | Newsletter → skip | Marketing newsletter | no entity created | `e2e.newsletter_skip` |
| 6.5 | Project from filename | `ProjectX_update.html` | assigned to ProjectX | `e2e.project_filename` |
| 6.6 | Project from content | Email mentions "Project Alpha" | assigned to Project Alpha | `e2e.project_content` |
| 6.7 | No project + rule | No match + "no project = no entity" | no entity created | `e2e.no_project_rule` |
| 6.8 | Deduplication update | Task exists + similar email | update, not create | `e2e.dedup` |
| 6.9 | Multiple entities from 1 email | Email with task + meeting date | task + timeline created | `e2e.multi_entity` |
| 6.10 | Batch of 5 mixed files | 3 emails + 1 newsletter + 1 info | 3 processed, 1 skipped, 1 note | `e2e.batch_mixed` |

**Eval with Langfuse (multi-score example):**

```python
@pytest.mark.asyncio
async def test_e2e_action_email(real_email_fixtures):
    lf = get_langfuse()
    trace = lf.trace(name="eval-e2e-action-email", metadata={"step": "6"}) if lf else None

    # Full setup: config → preprocess → LLM → tool calls
    tool_calls = await run_full_pipeline(
        file_path="fixtures/emails/action_request.html",
        agent_config=STANDARD_EMAIL_CONFIG,
        existing_projects=[{"id": "p1", "name": "Project Alpha"}],
    )

    # One score per aspect
    scores = {
        "task_created": 1.0 if any(c["name"] == "create_task" for c in tool_calls) else 0.0,
        "correct_project": 1.0 if any(c["args"].get("project_id") == "p1" for c in tool_calls) else 0.0,
        "has_due_date": 1.0 if any(c["args"].get("due_date", 0) > 0 for c in tool_calls) else 0.0,
        "is_ai_suggested": 1.0 if any(c["args"].get("is_ai_suggested") == 1 for c in tool_calls) else 0.0,
    }

    if trace:
        for name, value in scores.items():
            lf.score(trace_id=trace.id, name=f"e2e.action_email.{name}", value=value)
        lf.flush()

    assert all(v == 1.0 for v in scores.values())
```

**Success criteria:** ≥ 8/10 tests pass, average score ≥ 0.8
|
||||
---
|
||||
|
||||
## Implementation order

```
Step 1 (preprocessor)     ← no dependencies, start here
    ↓
Step 3 (model/schema)     ← in parallel with Step 1
    ↓
Step 2 (agent_runner)     ← depends on Step 1 + Step 3
    ↓
Step 4 (journey setup)    ← depends on Step 3 (AgentConfig schema)
    ↓
Step 5 (frontend)         ← depends on Step 3 + Step 4
    ↓
Step 6 (E2E)              ← depends on everything
```

**Steps 1 and 3 can be developed in parallel.**

---

## Langfuse score summary

| Step | Score prefix | # tests | Minimum threshold |
|------|-------------|---------|-------------------|
| 1 | `preprocess.*` | 10 | ≥ 0.9 |
| 2 | `runner.*` | 10 | ≥ 0.8 |
| 3 | `schema.*` | 5 | 1.0 (deterministic) |
| 4 | `journey.*` | 6 | ≥ 0.8 |
| 5 | `fe.*` | 3 | 1.0 (deterministic) |
| 6 | `e2e.*` | 10 | ≥ 0.8 |

Total: **44 tests**, of which ~26 use LLM scoring on Langfuse.
419
docs/enanch_memorie_v2_mem.md
Normal file
@@ -0,0 +1,419 @@
|
||||
# Enhanced Memory V2 - Analysis and Design

## Status: PHASE 2 - Analysis and Proposal completed

---

## 1. DECISIONS MADE

| Question | Answer |
|----------|--------|
| **Privacy** | The backend MAY process plaintext in memory for extraction. NO plaintext persistence |
| **SaaS vs In-House** | **In-house only**. No dependency on the Supermemory API |
| **Target features** | Graph Memory, Contradiction Resolution, Forgetting/Decay, User Profiles, LLM Episode Summarization |
| **User scale** | < 100 (early stage) |
| **LLM budget** | **Minimize**. Prefer small models/heuristics where possible |
| **Semantic search** | NOT required in this phase (keyword fallback accepted for now) |
|
||||
---
|
||||
|
||||
## 2. CURRENT IMPLEMENTATION (Adiuva)

### Memory Architecture (MemGPT-style, 4 levels)

| Level | Table | Status |
|-------|-------|--------|
| **Core** | `memory_core` | Working - key/value preferences |
| **Associative** | `memory_associative` | Partial - keyword fallback, no semantic search |
| **Episodic** | `memory_episodic` | Working - truncated to 200 chars |
| **Proactive** | `memory_proactive` | Empty schema, no logic |

### Strengths
- Per-user E2E encryption (Fernet)
- 9 agent tools for memory
- Automatic context injection before the LLM call
- Episodes saved automatically after each conversation

### Gaps vs Supermemory (what is missing)
1. No automatic extraction of facts from conversations
2. No relations between memories (graph UPDATE/EXTEND/DERIVE)
3. No temporal forgetting/decay
4. No contradiction resolution
5. No auto-generated user profile
6. Episodes truncated to 200 chars without summarization
7. Proactive memory not implemented

### THE CORE PROBLEM: blind context selection
The `_load_associative()` method runs:
```sql
SELECT * FROM memory_associative
WHERE user_id = ?
ORDER BY updated_at DESC  -- sorts by DATE, not by relevance
LIMIT 5
```
**The user's message is NOT used to filter.** The query returns the 5 most recent facts,
even if they are completely irrelevant to the question. The same applies to episodic memory (the latest 10 by date).

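To illustrate what using the user's message for selection could look like, here is a minimal keyword-overlap ranking. It is a sketch only, consistent with the "keyword fallback accepted for now" decision, and not the project's implementation. It assumes the facts have already been decrypted in memory (the stored content is encrypted, so SQL cannot filter on it) and arrive ordered most recent first; the names `rank_facts_by_keyword_overlap` and `_tokens` are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def rank_facts_by_keyword_overlap(message: str, facts: list[str], limit: int = 5) -> list[str]:
    """Rank decrypted facts by shared keywords with the message.

    Ties keep the input order, so more recent facts win when `facts`
    is ordered most recent first.
    """
    query = _tokens(message)
    scored = []
    for position, fact in enumerate(facts):
        overlap = len(query & _tokens(fact))
        scored.append((-overlap, position, fact))
    scored.sort()
    return [fact for _, _, fact in scored[:limit]]
```

With no overlapping keywords at all, the function degrades to the current behavior (the `limit` most recent facts), so it would never return less context than today.
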
---
|
||||
|
||||
## 3. HOW SUPERMEMORY STORES DATA (it does not use MD!)

### Clarification: Supermemory does NOT store MD files
Supermemory accepts MD as an input format (along with PDF, JSON, code, etc.),
but the data is transformed and stored in **PostgreSQL + vector embeddings (Cloudflare AI)**.

### Supermemory storage schema (from source code)
```
Documents table (Drizzle ORM + Postgres)
├── id, customId, orgId, userId
├── content                  ← original raw content
├── title, summary           ← generated by the LLM
├── type                     ← 'text' | 'web' | 'pdf' | 'md' | etc.
├── status                   ← 'queued' | 'extracting' | 'chunking' | 'embedding' | 'done'
├── metadata                 ← filterable JSON key/value
├── tokenCount, wordCount, chunkCount
├── summaryEmbedding (vector)
├── containerTags[]          ← namespace isolation (user_id, project_id)
└── createdAt, updatedAt

MemoryEntries table (extracted facts)
├── id, memory               ← the extracted fact ("Mario prefers concise answers")
├── version                  ← version number of the fact
├── context
│   ├── parents[]            ← [{relation: 'updates'|'extends'|'derives', memory, version}]
│   └── children[]           ← [{relation, memory, version}]
├── similarity               ← search score
├── metadata
├── sourceDocumentId         ← the document the fact was extracted from
└── updatedAt
```

### How Adiuva stores data (comparison)
```
memory_core (PostgreSQL)
├── key, value_encrypted     ← Fernet AES-128
└── user_id

memory_associative (PostgreSQL + pgvector)
├── content_encrypted        ← encrypted
├── embedding (1536-dim)     ← column present but NEVER used for search
├── entity_type, entity_id
└── user_id

memory_episodic (PostgreSQL)
├── summary_encrypted        ← "User: [200 char]\nAssistant: [200 char]"
├── session_id
└── user_id

memory_proactive (PostgreSQL)
├── pattern_encrypted, confidence  ← empty schema, no data
└── user_id
```

### Key difference in storage
| | Supermemory | Adiuva |
|---|-----------|--------|
| **What is stored** | Structured facts extracted by an LLM | Encrypted raw text |
| **Relations** | Graph with UPDATE/EXTEND/DERIVE + versioning | No relations |
| **Embeddings** | Generated and actively used for search | Column present but unused |
| **Encryption** | None (plaintext) | Per-user Fernet |
| **Processing** | Pipeline: Extract → Chunk → Embed → Index | Direct store without processing |
|
||||
---
|
||||
|
||||
## 4. SUPERMEMORY - What to Take as Inspiration

> **We are NOT integrating Supermemory SaaS.** We draw on its design to build in-house.

### Concepts to adopt
1. **Relations between memories**: UPDATE (replaces), EXTEND (enriches), DERIVE (infers)
2. **isLatest flag**: tracks the current version of a fact
3. **Automatic forgetting**: temporal facts with `expires_at`, episodes that decay
4. **Dual User Profile**: `static` (stable facts) + `dynamic` (recent activity)
5. **Post-conversation fact extraction**: an LLM extracts structured facts after each chat

### Concetti da NON adottare (non rilevanti)
|
||||
- Connectors (Google Drive, Gmail, etc.) — Adiuva è un'app desktop, non un aggregatore
|
||||
- Multi-modal extraction (PDF, video) — fuori scope
|
||||
- Hybrid RAG+Memory search — non richiesto ora
|
||||
|
||||
---
|
||||
|
||||
## 4. COST/BENEFIT ANALYSIS - OPTIONS

### REJECTED OPTION: Supermemory SaaS Integration

| | Detail |
|---|----------|
| **Cost** | $0-19/mo for <100 users (Free/Pro) |
| **Pros** | Fast implementation, SOTA benchmarks |
| **Fatal cons** | Violates privacy (plaintext is mandatory), vendor lock-in, external API latency, Cloudflare Workers architecture that is not easy to self-host |
| **Verdict** | **REJECTED**: incompatible with zero-trust and with the in-house preference |

### CHOSEN OPTION: In-House Enhancement Inspired by Supermemory

**Approach**: incremental evolution of the existing architecture in 4 phases.

---

## 5. PROPOSED IMPLEMENTATION PLAN

### PHASE 1 — Memory Graph + Contradiction Resolution
**Effort**: ~3-5 days | **Extra LLM cost**: ~$0.0006/conversation (GPT-4o-mini)

**What changes in the DB:**
- New `memory_fact` table (progressively replaces `memory_associative`)
```
memory_fact:
  id, user_id
  content_encrypted    -- the extracted fact, encrypted
  category             -- 'preference' | 'fact' | 'episode' | 'goal' | 'relationship'
  entity_type          -- what it refers to: 'user' | 'project' | 'task' | 'person'
  entity_id            -- optional, FK
  is_latest            -- boolean, as in Supermemory
  superseded_by_id     -- FK → memory_fact (UPDATE relation)
  extends_id           -- FK → memory_fact (EXTEND relation)
  derived_from_ids     -- JSON array of FKs (DERIVE relation)
  confidence           -- 0.0-1.0
  source               -- 'extracted' | 'explicit' | 'inferred'
  expires_at           -- nullable, for temporal facts
  last_accessed_at     -- for decay scoring
  created_at, updated_at
```

**How it works:**
1. After the conversation, GPT-4o-mini receives the transcript (last 2000 chars) + a structured prompt
2. The LLM extracts a JSON array of facts: `[{content, category, entity_type, is_temporal, expires_at}]`
3. Each extracted fact is checked against the existing facts (keyword match on content decrypted in memory)
4. If a contradiction is found → the old fact gets `is_latest=false` and `superseded_by_id=<new fact id>`
5. If it is an enrichment → the new fact gets `extends_id=<old fact id>`
6. Everything is encrypted before being persisted

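Steps 3 to 5 can be sketched in plain Python. This is a minimal illustration under assumptions, not the planned implementation: the `Fact` dataclass mirrors only a few `memory_fact` columns, `keywords()` and the `min_overlap` threshold are hypothetical stand-ins for the keyword match, the `relation` label is assumed to come from the extraction LLM, and encryption is left out.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional, Set

_next_id = count(1)  # stand-in for a database sequence


@dataclass
class Fact:
    content: str                      # plaintext here; the plan encrypts before persisting
    category: str
    id: int = field(default_factory=lambda: next(_next_id))
    is_latest: bool = True
    superseded_by_id: Optional[int] = None
    extends_id: Optional[int] = None


def keywords(text: str) -> Set[str]:
    """Lowercased words longer than 3 characters (a crude matching heuristic)."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def find_related(new: Fact, existing: List[Fact], min_overlap: int = 2) -> Optional[Fact]:
    """Return the current fact of the same category sharing the most keywords."""
    best, best_overlap = None, 0
    new_kw = keywords(new.content)
    for fact in existing:
        if not fact.is_latest or fact.category != new.category:
            continue
        overlap = len(new_kw & keywords(fact.content))
        if overlap >= min_overlap and overlap > best_overlap:
            best, best_overlap = fact, overlap
    return best


def apply_fact(new: Fact, existing: List[Fact], relation: Optional[str]) -> None:
    """Link `new` to a related fact; `relation` is 'updates', 'extends' or None."""
    old = find_related(new, existing)
    if old is not None and relation == "updates":
        old.is_latest = False          # step 4: the old fact is superseded
        old.superseded_by_id = new.id
    elif old is not None and relation == "extends":
        new.extends_id = old.id        # step 5: the new fact enriches the old one
    existing.append(new)
```
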
**LLM cost estimate (GPT-4o-mini @ $0.15/1M input, $0.60/1M output):**
- Input: ~2500 tokens/conversation (transcript + system prompt)
- Output: ~300 tokens (extracted-facts JSON)
- Cost: ~$0.0006/conversation → **~$0.06 per 100 conversations/day**

---

### PHASE 2 — Automatic Forgetting + Decay
**Effort**: ~1-2 days | **Extra LLM cost**: $0 (pure heuristics)

**Mechanisms:**
1. **TTL-based expiry**: facts with `expires_at` are ignored after that date
2. **Access decay**: `last_accessed_at` + scoring formula: `score = confidence * (1 / (1 + days_since_access * 0.05))`
3. **Background cleanup** (cron/periodic): soft-delete facts whose score has been < 0.1 for > 30 days
4. **Episodic consolidation**: after N episodes per session, consolidate them into a single summary

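The access-decay formula and the cleanup rule can be written out as follows. This is a sketch under assumptions: the function names are hypothetical, and the cleanup rule is simplified to "score below 0.1 and not accessed for more than 30 days", because tracking how long a score has stayed below the threshold would need extra state.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DECAY_RATE = 0.05    # per-day coefficient from the scoring formula
SCORE_FLOOR = 0.1    # cleanup threshold
GRACE_DAYS = 30      # minimum idle time before soft-delete


def decay_score(confidence: float, last_accessed_at: datetime, now: datetime) -> float:
    """score = confidence * (1 / (1 + days_since_access * 0.05))"""
    days = max((now - last_accessed_at).total_seconds() / 86400.0, 0.0)
    return confidence * (1.0 / (1.0 + days * DECAY_RATE))


def is_expired(expires_at: Optional[datetime], now: datetime) -> bool:
    """TTL-based expiry: facts without expires_at never expire."""
    return expires_at is not None and now >= expires_at


def should_soft_delete(confidence: float, last_accessed_at: datetime, now: datetime) -> bool:
    """Simplified cleanup rule: low score and idle for more than GRACE_DAYS."""
    idle_days = (now - last_accessed_at).days
    return idle_days > GRACE_DAYS and decay_score(confidence, last_accessed_at, now) < SCORE_FLOOR
```
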
**No LLM cost**: pure time-based logic and arithmetic scoring.

---

### PHASE 3 — Auto-Generated User Profile
**Effort**: ~2-3 days | **Extra LLM cost**: ~$0.0005/update

**How it works:**
1. New small `user_profile` table:
   ```
   user_profile:
     user_id (PK)
     static_encrypted   -- JSON: stable facts (name, role, long-lasting preferences)
     dynamic_encrypted  -- JSON: recent activity (last 3-5 topics, tasks in progress)
     updated_at
   ```
2. After every fact extraction (Phase 1), the profile is updated:
   - `static`: facts with `category IN ('preference', 'fact', 'relationship')` and `confidence > 0.7`
   - `dynamic`: the last 5 facts with `category = 'goal'`, or recent episodes
3. The profile is injected into the system prompt of EVERY conversation (before the current context)
4. Update trigger: post-extraction, batch of new facts → GPT-4o-mini "update profile"

**Cost estimate:**
- Input: ~1000 tokens (current profile + new facts)
- Output: ~500 tokens (updated profile)
- Cost: ~$0.0005/update → **negligible**

**Zero-LLM-cost alternative:** the `static` profile is computed as a direct aggregation of the facts with `is_latest=true` and high confidence. The `dynamic` profile is the last N episodes. No LLM call, only SQL queries. Less elegant, but $0.

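The zero-LLM-cost alternative amounts to a filter and a sort. The sketch below does it over an in-memory list of already-decrypted fact dicts, as a stand-in for the SQL queries; `build_profile` and its parameters are hypothetical names, and treating both `goal` and `episode` facts as "dynamic" is an assumption.

```python
from typing import Any, Dict, List

STATIC_CATEGORIES = {"preference", "fact", "relationship"}
DYNAMIC_CATEGORIES = {"goal", "episode"}


def build_profile(
    facts: List[Dict[str, Any]],
    min_confidence: float = 0.7,
    dynamic_limit: int = 5,
) -> Dict[str, List[str]]:
    """Aggregate current facts into a static/dynamic profile without any LLM call."""
    latest = [f for f in facts if f["is_latest"]]

    # static: stable, high-confidence facts
    static = [
        f["content"]
        for f in latest
        if f["category"] in STATIC_CATEGORIES and f["confidence"] > min_confidence
    ]

    # dynamic: most recently updated goals/episodes, newest first
    recent = sorted(
        (f for f in latest if f["category"] in DYNAMIC_CATEGORIES),
        key=lambda f: f["updated_at"],
        reverse=True,
    )
    dynamic = [f["content"] for f in recent[:dynamic_limit]]

    return {"static": static, "dynamic": dynamic}
```
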
---

### PHASE 4 — LLM Episode Summarization
**Effort**: ~1-2 days | **Extra LLM cost**: ~$0.0004/episode

**What changes:**
- The `summary_encrypted` field in `memory_episodic` goes from a 200-char truncation to an LLM summary
- GPT-4o-mini generates a structured summary: `{topic, user_intent, outcome, key_facts_mentioned}`
- Async: it does not block the response; it runs after the response has already been sent to the client

**Cost estimate:**
- Input: ~1500 tokens (full conversation)
- Output: ~200 tokens (structured summary)
- Cost: ~$0.0004/episode → **~$0.04 per 100 conversations/day**

---

## 6. TOTAL COST SUMMARY

### Estimated additional LLM costs (< 100 users, ~100 conversations/day)

| Phase | Cost/conv | Cost/day | Cost/month |
|------|------------|--------------|------------|
| Phase 1 (Fact Extraction) | $0.0006 | $0.06 | ~$1.80 |
| Phase 2 (Forgetting) | $0 | $0 | $0 |
| Phase 3 (User Profile) | $0-0.0005 | $0-0.05 | $0-1.50 |
| Phase 4 (Episode Summary) | $0.0004 | $0.04 | ~$1.20 |
| **TOTAL** | **~$0.001-0.0015** | **~$0.10-0.15** | **~$3-4.50** |

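The figures in the table follow directly from the token estimates and the GPT-4o-mini prices quoted in the plan. A short sketch of the arithmetic, assuming the profile update (Phase 3) runs once per conversation and a 30-day month:

```python
INPUT_PRICE = 0.15 / 1_000_000    # USD per input token
OUTPUT_PRICE = 0.60 / 1_000_000   # USD per output token

# (input tokens, output tokens) per call, taken from the per-phase estimates
PHASES = {
    "phase_1_fact_extraction": (2500, 300),
    "phase_3_user_profile": (1000, 500),
    "phase_4_episode_summary": (1500, 200),
}


def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single LLM call."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE


def monthly_cost(conversations_per_day: int = 100, days: int = 30) -> float:
    """Upper-bound monthly cost when every phase runs on every conversation."""
    per_conversation = sum(cost_per_call(i, o) for i, o in PHASES.values())
    return per_conversation * conversations_per_day * days
```

With these inputs, Phase 1 comes to about $0.00056 per conversation, and the three LLM phases together stay within the $3-4.50 per month range of the table.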
### Comparison with Supermemory SaaS
| | In-House | Supermemory Free | Supermemory Pro |
|---|---------|------------------|-----------------|
| Cost/month | ~$3-4.50 (LLM) | $0 (with limits) | $19/mo |
| Privacy | E2E preserved | Plaintext mandatory | Plaintext mandatory |
| Limits | LLM rate limits only | 1M tokens, 10K searches | 3M tokens, 100K searches |
| Customization | Full | None | None |
| Vendor lock-in | None | High | High |

**Verdict**: the in-house implementation costs less than $5/month, preserves privacy, and offers full customization.

---

## 7. BENEFITS MATRIX

| Feature | UX impact | Effort | Priority |
|---------|-----------|--------|----------|
| Fact Extraction + Graph | HIGH - the AI remembers everything automatically | Medium | P0 |
| Contradiction Resolution | HIGH - no stale information | Low (included in P0) | P0 |
| Automatic Forgetting | MEDIUM - less noise in the context | Low | P1 |
| User Profile | HIGH - immediate personalization | Medium | P1 |
| Episode Summarization | MEDIUM - better recall | Low | P2 |

---

## 8. CRITICAL ANALYSIS: HOW SUPERMEMORY INJECTS CONTEXT

### The Supermemory flow (Python SDK)

```python
from supermemory import Supermemory

client = Supermemory()  # requires SUPERMEMORY_API_KEY

# Example chat history; in the app this is the current conversation
conversation = [{"role": "user", "content": "What sneakers should I buy?"}]

# ── PRE-LLM: fetch profile + relevant memories ──
profile = client.profile(
    container_tag="user_123",          # = Adiuva's user_id
    q="What sneakers should I buy?"    # = the user's message
)

# profile.profile.static   → ["Senior engineer at Acme", "Prefers dark mode"]
# profile.profile.dynamic  → ["Working on auth migration"]
# profile.search_results   → memories relevant to the query

# ── System prompt assembly ──
context = f"""Static profile:
{chr(10).join(profile.profile.static)}

Dynamic profile:
{chr(10).join(profile.profile.dynamic)}

Relevant memories:
{chr(10).join(r.get("memory","") for r in profile.search_results.results)}"""

messages = [{"role": "system", "content": f"User context:\n{context}"}, *conversation]
# → pass to your LLM

# ── POST-LLM: save the conversation (Supermemory extracts facts automatically) ──
client.add(
    content="\n".join(f"{m['role']}: {m['content']}" for m in conversation),
    container_tag="user_123",
)
```

### What happens underneath: `client.add()` → HTTPS → Supermemory cloud

1. Supermemory receives the **full plaintext** of the conversation
2. **Their LLM** extracts facts, preferences, and entities
3. It builds graph relations (UPDATE/EXTEND/DERIVE) with the existing facts
4. It updates the user profile (static + dynamic)
5. It applies forgetting to expired temporal facts

### How it compares with your `enrich_context()`:

| Aspect | Adiuva (current) | Supermemory |
|---------|-----------------|-------------|
| **Where the logic lives** | `memory_middleware.py` in your backend | Third-party cloud |
| **How context is retrieved** | 4 SQL queries → in-memory decrypt | 1 HTTPS call `client.profile()` |
| **Context quality** | Raw key/value + 200-char truncations | Structured facts + curated profile |
| **Who extracts facts** | Nobody (or the user via a tool) | Automatic LLM on every `add()` |
| **Retrieval latency** | ~5-15ms (local DB) | ~50-200ms (HTTPS) |
| **Storage latency** | ~2ms (SQL INSERT) | ~200-500ms (HTTPS + LLM extraction) |
| **Privacy** | Plaintext only in memory, encrypted at rest | **Permanent plaintext on third-party servers** |

---

## 9. HONEST CRITICAL ASSESSMENT

### REAL BENEFITS of the Supermemory integration

1. **Automatic extraction**: nothing to build; call `client.add(conversation)` and the facts are extracted. This saves the ~3-5 dev days of Phase 1.

2. **SOTA contradiction resolution**: their graph engine is #1 on the benchmarks. Implementing it in-house requires a well-engineered LLM prompt plus matching logic.

3. **Ready-made user profiles**: `client.profile()` returns static+dynamic in ~50ms. In-house you have to build the aggregation logic.

4. **Temporal forgetting**: they handle expiry and noise filtering. In-house it is simple (TTL + cron), but they do it better with an LLM.

### CRITICAL PROBLEMS with the integration

1. **PRIVACY DESTROYED**: the most serious point. Adiuva's entire E2E model rests on "the backend never persists plaintext". Supermemory receives and **keeps** all user data in the clear. For an app that sells privacy, that is a dealbreaker.

2. **ADDED LATENCY**: every conversation adds:
   - +50-200ms PRE-LLM (profile fetch over HTTPS)
   - +200-500ms POST-LLM (add + extraction)
   - vs. ~5-15ms total with a local DB
   - On an unstable connection: timeout → memory lost

3. **SINGLE POINT OF FAILURE**: if `supermemory.ai` is down, your app loses ALL of its memory. There is no local fallback. Your 4 current PostgreSQL tables are resilient.

4. **VENDOR LOCK-IN**: the extracted facts live in their cloud. If they shut down, change pricing, or restrict the free tier → painful migration. With the in-house solution you have full ownership.

5. **COSTS THAT SCALE BADLY**: Free tier: 1M tokens/month = ~250 average conversations. With 100 active users:
   - ~30 conv/user/month = 3000 conv = ~12M tokens → **Scale plan $399/mo**
   - In-house: $3-5/mo for the same 3000 conv with GPT-4o-mini

6. **ARCHITECTURE OVERHAUL**: you have to:
   - Remove/replace the 4 memory tables
   - Rewrite the agent's 9 tools to use the SDK
   - Remove the encryption logic
   - Change the WebSocket contract (if the memory tools change)
   - **Paradoxical effort**: more work to integrate than to improve in-house

7. **NOT SELF-HOSTABLE**: the GitHub repo is MIT, but the core is Cloudflare Workers + KV + Postgres. Self-hosting requires Cloudflare infrastructure or a significant rewrite.

### FINAL BALANCE

| Pro | Weight |
|-----|------|
| Extraction + Graph + Forgetting for free | ★★★★ |
| Automatic user profiles | ★★★ |
| Zero dev effort for the memory features | ★★★ |
| **Total Pros** | **10/15** |

| Con | Weight |
|--------|------|
| Privacy destroyed (dealbreaker for the brand) | ★★★★★ |
| Vendor lock-in on core functionality | ★★★★ |
| Costs of $399/mo at steady state vs $5/mo in-house | ★★★★ |
| Latency +250-700ms per conversation | ★★★ |
| Single point of failure | ★★★ |
| **Total Cons** | **19/25** |

> **Verdict: the Supermemory SaaS integration is a net negative for Adiuva.**
> The benefits (extraction, graph, profiles) can be replicated in-house at lower cost,
> without sacrificing privacy, ownership, and resilience.

---

## 10. NEXT STEPS
- [ ] Approve the in-house improvement plan (4 phases)
- [ ] Design the schema migration for `memory_fact` and `user_profile`
- [ ] Implement Phase 1 (fact extraction + graph)
- [ ] Test extraction with real conversations
- [ ] Implement Phases 2-4 incrementally

303
docs/local_agent_v2_mem.md
Normal file
@@ -0,0 +1,303 @@
|
||||
# Local Agent V2 — Working Memory

## Confirmed decisions

- **Breaking change**: no backward compatibility with prompt_template
- **Preprocessing**: on the Python backend, approach (c): predefined handlers + future LLM fallback
- **First handler**: HTML email. Other types later.
- **Journey**: produces a structured agent_config (JSON), not a monolithic prompt
- **The user wants customization**: e.g. "summarize documents into notes per project"
- **File types**: any type, even mixed within the same directory
- **Projects**: variable number, must scale

---

## V2 Architecture — Per-file flow

### [A] Detect + Preprocess (pure Python, zero LLM)

```
Raw file from Electron
    ↓
detect_content_type(filename, raw_content)
    → heuristic: extension + content patterns
    → match against a content_type from the agent_config
    ↓
preprocess(content_type, raw_content)
    → specific handler (e.g. email_html → BeautifulSoup)
    → Output: { content_type, clean_text, metadata: {subject, from, date, ...} }
```

Predefined handlers (MVP: email_html only):
- `email_html`: strip tags, extract subject/from/to/date, split the thread → last message
- `generic_html`: extract main content, strip nav/footer (future)
- `plain_text`: pass-through (future)
- `csv`: parse + summary (future)
- `pdf`: extract text (future)
- Fallback: raw text with a length limit

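The detect → dispatch → handler flow could look roughly like the sketch below. It is illustrative only and makes several assumptions: it uses the standard-library `html.parser` instead of BeautifulSoup to stay dependency-free, the detection heuristic and the 4000-character fallback limit are invented for the example, and thread splitting is omitted.

```python
import re
from html.parser import HTMLParser
from typing import Any, Callable, Dict, List


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def detect_content_type(filename: str, raw_content: str) -> str:
    """Heuristic: file extension plus content patterns (assumed rules)."""
    looks_html = filename.lower().endswith((".html", ".htm")) or "<html" in raw_content.lower()
    has_headers = re.search(r"\b(From|Subject):", raw_content) is not None
    return "email_html" if looks_html and has_headers else "fallback"


def _email_html(raw: str) -> Dict[str, Any]:
    parser = _TextExtractor()
    parser.feed(raw)
    text = "\n".join(parser.parts)
    metadata = {}
    for key in ("subject", "from", "to", "date"):
        match = re.search(rf"^{key}:\s*(.+)$", text, re.IGNORECASE | re.MULTILINE)
        if match:
            metadata[key] = match.group(1).strip()
    return {"content_type": "email_html", "clean_text": text, "metadata": metadata}


def _fallback(raw: str, limit: int = 4000) -> Dict[str, Any]:
    """Raw text truncated to `limit` characters (the limit is an assumption)."""
    return {"content_type": "fallback", "clean_text": raw[:limit], "metadata": {}}


HANDLERS: Dict[str, Callable[[str], Dict[str, Any]]] = {"email_html": _email_html}


def preprocess(content_type: str, raw_content: str) -> Dict[str, Any]:
    """Dispatch to the registered handler, or to the fallback."""
    return HANDLERS.get(content_type, _fallback)(raw_content)
```

New content types would be added by registering another function in `HANDLERS`, which keeps detection and cleaning free of LLM calls.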
### [B] Single LLM call — classify + extract + create

A single LLM call with tool calling does everything:

**System prompt built from:**
1. Base instructions (update-first, isAiSuggested=1, etc.)
2. Extraction rules for the content_type (from the agent_config) ← PROMINENT position
3. Global rules (from the agent_config)
4. Compact project list
5. Procedural instructions: identify the project → query entities → extract → create/update

**User message:**
- Filename + metadata
- Clean text

**Available tools:**
- list_tasks, list_notes, list_timelines (query)
- create_task, create_note, create_timeline
- update_task, update_note, update_timeline

**Max steps:** 12 (tool-calling loop)

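The five-part system prompt can be assembled with a small helper. The sketch below is an assumption about shape only: `build_system_prompt`, the section headings, and the project dict keys are hypothetical; the ordering is the one listed above, with the content-type extraction rules placed right after the base instructions.

```python
from typing import Any, Dict, List


def build_system_prompt(
    agent_config: Dict[str, Any],
    content_type_id: str,
    projects: List[Dict[str, Any]],
    base_instructions: str,
) -> str:
    """Join the prompt sections in the documented order."""
    content_type = next(
        ct for ct in agent_config["content_types"] if ct["id"] == content_type_id
    )
    rules = "\n".join(f"- {rule}" for rule in agent_config.get("global_rules", []))
    project_lines = "\n".join(f"- {p['id']}: {p['name']}" for p in projects)

    sections = [
        base_instructions,
        f"## Extraction rules for {content_type['label']}\n{content_type['extraction_prompt']}",
        f"## Global rules\n{rules}",
        f"## Projects\n{project_lines}",
        "## Procedure\n1. Identify the project\n2. Query existing entities\n"
        "3. Extract\n4. Create or update",
    ]
    return "\n\n".join(sections)
```

Keeping the extraction rules near the top reflects the "PROMINENT position" requirement, as opposed to appending them after everything else.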
### Journey → agent_config (structured JSON)

```json
{
  "content_types": [
    {
      "id": "email_html",
      "label": "Email HTML",
      "detection_hint": "HTML with an email structure (From/To/Subject)",
      "preprocessing": "email_html",
      "extraction_prompt": "For each email: direct action → task..."
    }
  ],
  "global_rules": [
    "If the file cannot be traced back to any project, do not create entities."
  ],
  "data_types": ["tasks", "notes", "timelines"]
}
```

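A structural check for this config could look like the following. The notes plan Pydantic validation; this sketch uses only the standard library so it runs without dependencies, and the specific rules (required keys, allowed `data_types`) are assumptions inferred from the example above rather than the project's actual schema.

```python
import json
from typing import Any, Dict

ALLOWED_DATA_TYPES = {"tasks", "notes", "timelines"}
REQUIRED_CT_KEYS = {"id", "label", "detection_hint", "preprocessing", "extraction_prompt"}


def validate_agent_config(raw: str) -> Dict[str, Any]:
    """Parse a JSON string and return the config, raising ValueError on problems."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent_config is not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError("agent_config must be a JSON object")

    content_types = config.get("content_types")
    if not isinstance(content_types, list) or not content_types:
        raise ValueError("content_types must be a non-empty list")
    for entry in content_types:
        if not isinstance(entry, dict):
            raise ValueError("each content type must be an object")
        missing = REQUIRED_CT_KEYS - set(entry)
        if missing:
            raise ValueError(f"content type is missing keys: {sorted(missing)}")

    if not all(isinstance(rule, str) for rule in config.get("global_rules", [])):
        raise ValueError("global_rules must be a list of strings")

    unknown = set(config.get("data_types", [])) - ALLOWED_DATA_TYPES
    if unknown:
        raise ValueError(f"unknown data_types: {sorted(unknown)}")
    return config
```
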
---

## V1 problems and how V2 solves them

| # | V1 problem | V2 solution |
|---|---|---|
| P1 | Raw HTML sent to the LLM | Python preprocessing → clean text |
| P2 | Truncation at 4000 chars | Preprocessed text, much denser |
| P3 | No thread handling | Email handler splits the thread, keeps the last message |
| P4 | Weak project matching | Filename as the primary signal + clean text |
| P5 | custom_prompt appended at the end | Extraction rules in a prominent position |
| P6 | No preprocessing | Predefined handlers per type |
| P7 | items_created always 0 | Fix in the runner (count tool call results) |

---

## Required code changes

### Backend (adiuva-api)

1. **New module**: `app/core/preprocessors/` with one handler per type
   - `__init__.py`: registry + detect + dispatch
   - `email_html.py`: BeautifulSoup: strip, metadata, thread split
   - `base.py`: base interface + fallback

2. **`agent_setup.py`**: Journey produces an agent_config JSON, not a prompt_template
   - System prompt updated to generate structured JSON
   - Output validated with a Pydantic schema

3. **`agent_runner.py`**: revised flow
   - Remove `_classify_file()` (the separate Step 1)
   - Add a preprocess step before the LLM call
   - Single LLM call with a type-specific prompt
   - Count items_created from the tool call results

4. **`models.py`**: `prompt_template: Text` → `agent_config: JSON`

### Frontend (adiuva)

5. **`store.ts`**: field `promptTemplate` → `agentConfig`
6. **`JourneyDialog.tsx`**: parse JSON from the journey reply
7. **`agent-scheduler.ts`**: pass `agentConfig` to the trigger
8. **Pydantic/Zod schema**: update for the new format

---

## Implementation status

| Step | Status | Branch |
|------|-------|--------|
| Step 1 — Preprocessors | ✅ DONE | `feature/batch-agent-v2` |
| Step 2 — agent_runner.py refactor | ✅ DONE | `feature/batch-agent-v2` |
| Step 3 — Model/schema agent_config | ✅ DONE | `feature/batch-agent-v2` |
| Step 4 — Journey setup, structured output | ✅ DONE | `feature/batch-agent-v2` |
| Step 5 — Frontend | ✅ DONE | main |
| Step 6 — E2E with real files | ⏳ TODO | — |

---

## Test conventions (updated after implementing steps 1–2)

### Fixture structure

```
tests/fixtures/<step_name>/
  cases.yaml   ← case definitions
  data/        ← input files (HTML, txt, ...)
```

CLI option to override the folder:
```bash
pytest tests/test_<step>.py -v --<step>-dir /path/to/folder
```
Registered in `conftest.py` via `pytest_addoption`. The custom folder must have the same structure (`cases.yaml` + `data/`).

Options registered so far:
- `--preprocess-dir` → step 1
- `--runner-dir` → step 2 (add `--journey-dir` for step 4, `--e2e-dir` for step 6)

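A `conftest.py` registering these options might look like the sketch below. `parser.addoption` and `config.getoption` are the standard pytest hooks; the `STEP_OPTIONS` mapping, the `fixtures_dir` helper, and the default folder names per step are assumptions made for illustration.

```python
from pathlib import Path

DEFAULT_FIXTURES = Path("tests/fixtures")

# step name → CLI flag (only the two options registered so far)
STEP_OPTIONS = {
    "preprocess": "--preprocess-dir",
    "runner": "--runner-dir",
}


def pytest_addoption(parser):
    """Register one folder-override option per step."""
    for step, flag in STEP_OPTIONS.items():
        parser.addoption(
            flag,
            action="store",
            default=None,
            help=f"Override the fixture folder for the {step} step",
        )


def fixtures_dir(config, step: str) -> Path:
    """Return the overridden folder if given, else the default for the step."""
    override = config.getoption(STEP_OPTIONS[step])
    return Path(override) if override else DEFAULT_FIXTURES / step
```

Adding `--journey-dir` or `--e2e-dir` later would then be a one-line change to the mapping.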
### YAML schema — principles (step 1 vs step 2)

**Step 1 (preprocessors): deterministic tests, no LLM:**
- Flat keys: `detect:`, `process:`, `no_html:`, `min_chars:`, etc.
- No `description` or `score_name` (Langfuse is not used)
- `file:` serves both as the name on disk and as the filename passed to the function
- `generate: binary_noise` for synthetic content

**Step 2+ (runner, journey, e2e): LLM eval tests:**
- `file:` = name on disk in `data/`
- `file_path:` = path as seen by the agent (kept separate because several cases reuse the same file with different paths, e.g. to test project matching from filename vs content)
- `description:` present in the YAML (useful in the pytest report)
- `score_name:` present in the YAML (the name under which the score is sent to Langfuse)
- `projects:` list of symbolic names (`alpha`, `beta`) or inline dicts `{id, name, status}`, resolved by `_resolve_projects()`
- Flat assertion keys: `expect_insert`, `expect_no_insert`, `expect_project_id`, `expect_dedup`

### Parametrizing from YAML

Use `pytest_generate_tests` to access the custom CLI option:

```python
def pytest_generate_tests(metafunc):
    if "runner_case" not in metafunc.fixturenames:
        return
    cases = _load_cases(metafunc.config)
    metafunc.parametrize("runner_case", cases, ids=[c["id"] for c in cases])
```

Tests access the directory via `pytestconfig`:
```python
async def test_eval_runner(runner_case, pytestconfig):
    data_dir = _fixtures_dir(pytestconfig) / "data"
```

### Langfuse V3 — correct pattern

**Problems found with the V2 API (do not use):**
- `lf.trace()` → does not exist in V3
- `lf.score(trace_id=...)` → does not exist in V3
- `lf.start_as_current_observation(user_id=..., session_id=...)` → kwargs not accepted

**Correct V3 pattern in eval tests:**
```python
from contextlib import nullcontext
lf = get_langfuse()
obs_ctx = lf.start_as_current_observation(
    name="eval-runner-2.1",
    metadata={"step": "2", "case_id": "2.1"},
) if lf else nullcontext()

with obs_ctx as obs:
    # ... run the code ...
    if obs is not None:
        obs.score(name="runner.email_to_task", value=1.0, comment="...")

if lf:
    lf.flush()
```

**Correct V3 pattern in production code (`agent_runner.py`, `deep_agent.py`, `agent_setup.py`):**
```python
# user_id and session_id go in metadata, NOT as direct kwargs
lf.start_as_current_observation(
    as_type="span",
    name="my-span",
    metadata={"user_id": user_id, "session_id": session_id},
    input=...,
)
```

### compile_prompt — do not call template.format() directly

`get_prompt_or_fallback()` returns the raw template. Langfuse uses `{{variable}}`, the fallback uses `{variable}`. Always use `compile_prompt()`, which dispatches correctly:

```python
from app.core.langfuse_client import compile_prompt, get_prompt_or_fallback

template, prompt_obj = get_prompt_or_fallback("my_prompt", FALLBACK_PROMPT)
compiled = compile_prompt(template, prompt_obj, var1=val1, var2=val2)
# ↑ uses prompt_obj.compile() for Langfuse, template.format() for the fallback
```

**Never do this:**
```python
compiled = template.format(var1=val1)  # ❌ breaks with Langfuse (which uses {{var1}})
```

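Why the dispatch matters can be shown with a toy version. Everything here is a hypothetical stand-in: `compile_prompt_sketch` only mirrors the behaviour described above (it is not the project's `compile_prompt`), and `FakeLangfusePrompt` imitates a prompt object whose `compile()` substitutes `{{name}}` placeholders.

```python
from typing import Any, Optional


class FakeLangfusePrompt:
    """Stand-in for a managed prompt that substitutes {{name}} placeholders."""

    def __init__(self, text: str) -> None:
        self.text = text

    def compile(self, **variables: Any) -> str:
        out = self.text
        for key, value in variables.items():
            out = out.replace("{{" + key + "}}", str(value))
        return out


def compile_prompt_sketch(template: str, prompt_obj: Optional[Any] = None, **variables: Any) -> str:
    """Use prompt_obj.compile() when a managed prompt exists, else str.format()."""
    if prompt_obj is not None:
        return prompt_obj.compile(**variables)
    return template.format(**variables)


# Fallback template: single braces for variables, doubled braces for literal JSON
FALLBACK = 'Reply with {{"project": "{project}"}}'
# Managed template: double braces mark the variables
MANAGED = "Project: {{project}}"

fallback_result = compile_prompt_sketch(FALLBACK, None, project="alpha")
managed_result = compile_prompt_sketch(MANAGED, FakeLangfusePrompt(MANAGED), project="alpha")
# Calling str.format() on the managed template leaves the placeholder unresolved
broken_result = MANAGED.format(project="alpha")
```

In the last line `str.format()` reads `{{` and `}}` as escaped literal braces, so the variable is never substituted, which is the failure the rule above guards against.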
### Test file structure for LLM eval steps

Pattern consolidated from `test_agent_runner_v2.py`:

```
tests/test_<step>.py
├── Constants (_USER_ID, _DEFAULT_FIXTURE_DIR, _AGENT_CONFIG, project symbols)
├── _fixtures_dir(config) + _load_cases(config) + _read_case_file(case, data_dir)
├── _resolve_projects(entries) — handles both symbolic strings and inline dicts
├── pytest_generate_tests — parametrizes eval tests from YAML
├── Helper builders (_make_config, _make_run_log, _make_manager, _make_executor)
├── Static unit tests (no YAML, no LLM)
└── test_eval_<step>(runner_case, pytestconfig) — single parametrized function
      ↓ reads the file, resolves projects, creates the executor, calls the runner
      ↓ _evaluate_case(case, calls, kwargs) → (score, comment)
      ↓ obs.score(...) if Langfuse is active
```

`_evaluate_case()` centralizes all assertion logic mapped from the YAML keys; no assert logic is scattered through the test.

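A centralized evaluator of this kind could be shaped like the sketch below. It is a simplified, hypothetical version: the real `_evaluate_case(case, calls, kwargs)` takes an extra argument, and the meaning given here to each `expect_*` key (for example, `expect_dedup` meaning "an update and no create") is an assumption, as is the `project_id` kwarg name.

```python
from typing import Any, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]  # (tool name, keyword arguments)


def evaluate_case(case: Dict[str, Any], calls: List[ToolCall]) -> Tuple[float, str]:
    """Map the flat YAML assertion keys to a (score, comment) pair."""
    inserts = [(name, kw) for name, kw in calls if name.startswith("create_")]
    updates = [(name, kw) for name, kw in calls if name.startswith("update_")]
    failures: List[str] = []

    if case.get("expect_insert") and not inserts:
        failures.append("expected at least one create_* call")
    if case.get("expect_no_insert") and inserts:
        failures.append(f"unexpected create_* calls: {len(inserts)}")

    expected_project = case.get("expect_project_id")
    if expected_project is not None:
        wrong = [kw for _, kw in inserts if kw.get("project_id") != expected_project]
        if wrong:
            failures.append(f"{len(wrong)} insert(s) targeted the wrong project")

    if case.get("expect_dedup") and (inserts or not updates):
        failures.append("expected an update of the existing entity, not a new insert")

    if failures:
        return 0.0, "; ".join(failures)
    return 1.0, "all expectations met"
```

The test body then only asserts on the returned score and forwards the comment to the observation, which keeps YAML keys and checks in one place.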
### Step 4 — Journey V2: specific patterns

**Sentinels:** `AGENT_CONFIG_START` / `AGENT_CONFIG_END` (replace `PROMPT_TEMPLATE_START/END`)

**Langfuse prompt:** `journey_system_v2` (not V1's `journey_system`)

**Frame key:** `existing_config` (JSON string, replaces the prose string `existing_template`)

**Handler return:** key `agent_config` (JSON string validated by Pydantic) instead of `prompt_template`

**Executor for journey tests:** use `set_client_executor(executor)` / `clear_client_executor()` directly in the `_run_journey` test helper, mimicking `device_ws._handle_journey_start`. Re-set it before every call (start + each message).

**Journey YAML fixture:** `directory_files: [{path, content_file}]` + `user_messages: [...]` + flat assertion keys (`expect_question`, `expect_done`, `expect_valid_config`, `expect_content_type_id`, `expect_extraction_contains`, `expect_global_rules`)

**Nudge test (unit):** populate `_sessions` with a fake `JourneySession` that has `_MAX_TURNS` turns, patch `_call_llm_with_tools`, and verify that the second call receives the nudge with the new markers in `history`.

**JSON in the system prompt:** literal `{` and `}` in the example JSON must be written as `{{` and `}}` for the `str.format()` fallback. Template variables use `{var}` (single braces). `compile_prompt()` handles the correct dispatch for Langfuse vs fallback.

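Pulling the config out of an LLM reply delimited by the sentinels could be done as in the sketch below. The function name and the choice to return `None` when the markers are missing are assumptions; the notes do not show the actual parsing code.

```python
import json
from typing import Any, Dict, Optional

START = "AGENT_CONFIG_START"
END = "AGENT_CONFIG_END"


def extract_agent_config(reply: str) -> Optional[Dict[str, Any]]:
    """Return the JSON object between the sentinels, or None if they are absent."""
    start = reply.find(START)
    if start == -1:
        return None
    end = reply.find(END, start + len(START))
    if end == -1:
        return None
    payload = reply[start + len(START):end].strip()
    return json.loads(payload)  # raises json.JSONDecodeError on malformed JSON
```

A caller would typically pass the result on to schema validation before storing it as `agent_config`.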
### Step 5 — Frontend V2: specific patterns

**Store (`LocalAgentLocalConfig`):** field `agentConfig: Record<string, unknown> | null` replaces `promptTemplate: string`. Stored in electron-store as a JSON object.

**Trigger body:** the scheduler and `runNow` send `agentConfig` (an object, not the `customAgentPrompt` string).

**WS frame `journey_start`:** field `existingConfig` (JSON string) replaces the `existingTemplate` string. The backend expects `existing_config` (snake_case via `toSnakeCase()`).

**WS frame `journey_reply`:** field `agentConfig` (JSON string) replaces `promptTemplate`. The FE receives it as a string and parses it with `JSON.parse()` into an object.

**tRPC journey router:** returns `{ ..., agentConfig: string | undefined }`. React components parse it locally.

**Cloud agents:** not migrated; they keep `promptTemplate: string` in `CloudAgentConfigSchema`, `agentCloudRouter`, and `PromptBuilderChat.onPromptUpdate`. `PromptBuilderChat` now also has `onConfigUpdate` for the local path.

**`JourneyDialog`:** props `currentConfig: Record<string, unknown> | null` + `onSaved(agentConfig: Record<string, unknown>)`. Shows a human-readable summary (`AgentConfigSummary`) instead of the raw prompt string.

**`InlineAgentCreationStepper`:** keeps `promptTemplate` state for cloud; adds `agentConfig` state for local. `PromptBuilderChat` calls `onConfigUpdate` for local and `onPromptUpdate` for cloud (backward-compat).

10
skills-lock.json
Normal file
@@ -0,0 +1,10 @@
|
||||
{
  "version": 1,
  "skills": {
    "boost-prompt": {
      "source": "github/awesome-copilot",
      "sourceType": "github",
      "computedHash": "2621a44fbd9fc2636953d1e6e39e5faeed995f7fb958ec12cc98a2f0576f6fa7"
    }
  }
}