update migration plan

2026-03-20 23:48:36 +01:00
parent e7cdce8287
commit 1a8bf11f90
1 changed files with 271 additions and 209 deletions
--- a/docs/MICROSERVICES_ARCHITECTURE.md
+++ b/docs/MICROSERVICES_ARCHITECTURE.md
@@ -1,8 +1,10 @@
-# Adiuva: Microservices Architecture
+# Adiuva: Microservices Architecture (MVP)

 ## Overview

-The current monolith is split into **5 services** plus an **API Gateway**, orchestrated with Docker Compose and reachable through a domain on Cloudflare.
+The monolith is split into **4 MVP services** plus an **API Gateway (Traefik)**, orchestrated with Docker Compose on a single VPS reachable via Cloudflare.
+
+> **Out of MVP scope**: Storage Service (S3/backup CRUD) and Plugin Service (marketplace). They will be added as independent services in a later phase.

 ```
                          ┌──────────────┐
@@ -14,20 +16,21 @@ Il monolite attuale viene suddiviso in **5 servizi** + un **API Gateway**, orche
                          │   Traefik    │
                          │ API Gateway  │
                          │  (routing,   │
-                          │   TLS term.) │
+                          │   TLS, rate  │
+                          │   limiting)  │
                          └──────┬───────┘
                                 │
-          ┌──────────┬───────────┼───────────┬──────────┐
-          │          │           │           │          │
-    ┌─────▼────┐ ┌───▼───┐ ┌────▼────┐ ┌────▼───┐ ┌───▼─────┐
-    │  Auth    │ │  Chat │ │ Storage │ │Billing │ │ Plugins │
-    │ Service  │ │Service│ │ Service │ │Service │ │ Service │
-    └─────┬────┘ └───┬───┘ └────┬────┘ └────┬───┘ └───┬─────┘
-          │          │          │           │          │
-    ┌─────▼──────────▼──────────▼───────────▼──────────▼─────┐
+          ┌──────────┬───────────┼───────────┐
+          │          │           │           │
+    ┌─────▼────┐ ┌───▼───┐ ┌────▼────┐ ┌────▼───┐
+    │  Auth    │ │  Chat │ │  Agent  │ │Billing │
+    │ Service  │ │Service│ │ Service │ │Service │
+    └─────┬────┘ └───┬───┘ └────┬────┘ └────┬───┘
+          │          │          │           │
+    ┌─────▼──────────▼──────────▼───────────▼────┐
    │              Infrastruttura                 │
-    │  PostgreSQL │ Redis │ MinIO (S3) │ Qdrant │ (Pinecone) │
-    └────────────────────────────────────────────────────────┘
+    │  PostgreSQL  │  Redis  │  Qdrant            │
+    └─────────────────────────────────────────────┘
 ```

 ---
@@ -83,46 +86,68 @@ def verify_token(token: str) -> dict:

 ---

-### 1.2 Chat Service (`chat-service`) ⭐ Core
+### 1.2 Chat Service (`chat-service`) ⭐ Real-time

-**Responsibilities**: device WebSocket, home chat, floating chat, agent runner, memory middleware, agent setup journeys.
+**Responsibilities**: device WebSocket connection, home chat, floating chat, memory middleware, streaming LLM responses to the client.

-| Original endpoint | Type |
+This service manages the **persistent connection** to the Electron app and the user's **real-time** interactions (home chat, floating chat). It owns the WebSocket.
+
+| Endpoint | Type |
 |---|---|
-| `/api/v1/ws/device` | WebSocket |
+| `/api/v1/ws/device` | WebSocket (persistent connection) |
 | `/api/v1/chat` | POST (REST fallback) |
-| `/api/v1/agents/catalog` | GET |
-| `/api/v1/agents/can-create` | POST |
-| `/api/v1/agents/trigger` | POST |

-**Modules included**: `deep_agent`, `agent_runner`, `agent_registry`, `memory_middleware`, `ws_context`, `device_manager`, all agent tools (`task_agent`, `project_agent`, `note_agent`, `timeline_agent`, `filesystem_agent`).
+**Modules included**: `deep_agent`, `memory_middleware`, `ws_context`, `device_manager` (Redis-backed), `output_formatter`, `llm`, all agent tools (`task_agent`, `project_agent`, `note_agent`, `timeline_agent`).

-**This is the beast that has to scale horizontally**: it is the most CPU- and memory-intensive service (LLM calls, tool loops, persistent WebSockets).
+**Why it is separate from the Agent Service**: The Chat Service keeps the WebSocket open and responds in real time (streaming). Scaling out by adding replicas is straightforward with sticky sessions plus Redis pub/sub for cross-instance routing of `tool_call` messages.
+
+**Scaling**: 2–N replicas. Sticky cookies for the WebSockets plus Redis for cross-instance routing.

 ---

-### 1.3 Storage Service (`storage-service`)
+### 1.3 Agent Service (`agent-service`) ⭐ Batch

-**Responsibilities**: CRUD on encrypted records in S3, vector operations, backup.
+**Responsibilities**: Batch agent processing (directory scanning, file classification, entity extraction), agent setup journeys, agent configuration CRUD.

-| Original endpoint | Method |
+This service handles **long-running**, **CPU-intensive** work: filesystem scanning, LLM-based file classification, batch entity extraction. It does not own the WebSocket; it reaches the user's device through **Redis pub/sub**, relayed by the Chat Service.
+
+| Endpoint | Type |
 |---|---|
-| `/api/v1/storage/records` | POST / GET |
-| `/api/v1/storage/records/{id}` | GET / PUT / DELETE |
-| `/api/v1/vectors/upsert` | POST |
-| `/api/v1/vectors/search` | POST |
-| `/api/v1/vectors/embed` | POST |
-| `/api/v1/vectors` | DELETE |
-| `/api/v1/backup` | PUT / GET / DELETE |
-| `/api/v1/backup/history` | GET |
+| `/api/v1/agents/catalog` | GET |
+| `/api/v1/agents/can-create` | POST |
+| `/api/v1/agents/trigger` | POST |
+| `/api/v1/agents/journey/start` | POST (or WS relay) |
+| `/api/v1/agents/journey/message` | POST (or WS relay) |

-**Scaling**: 2–3 replicas. I/O bound (S3, Qdrant). Stateless.
+**Modules included**: `agent_runner`, `agent_registry`, `filesystem_agent`, `llm`.
+
+**Cross-service tool-call flow** (the Agent Service has no WebSocket):
+
+```
+┌──────────────┐            ┌──────────────┐            ┌──────────┐
+│ Agent Service│            │    Redis     │            │  Chat    │
+│ (batch run)  │            │              │            │ Service  │
+│              │            │              │            │ (has WS) │
+│ 1. Needs to  │  PUBLISH   │              │ SUBSCRIBE  │          │
+│    read file ├───────────►│tool_call:u123├───────────►│ 2. Send  │
+│    from      │            │              │            │    to    │
+│    device    │            │              │            │    device│
+│              │            │              │            │    via WS│
+│              │  SUBSCRIBE │              │  PUBLISH   │          │
+│ 4. Receives ◄────────────┤tool_result:id│◄───────────┤ 3. Device│
+│    result    │            │              │            │    reply │
+└──────────────┘            └──────────────┘            └──────────┘
+```
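
The round trip in the diagram can be sketched with an in-memory stand-in for Redis pub/sub. This is only an illustration, not the project's `redis_executor.py`: `InMemoryBus`, `execute_on_client` (as written here), and `chat_service_relay` are hypothetical, the message format is an assumption, and the device reply is simulated. Only the channel names (`tool_call:{user_id}`, `tool_result:{call_id}`) come from the diagram.

```python
import asyncio
import json
import uuid
from collections import defaultdict


class InMemoryBus:
    """Hypothetical stand-in for Redis pub/sub: one asyncio.Queue per subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str) -> asyncio.Queue:
        queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    async def publish(self, channel: str, message: str) -> None:
        for queue in self._subscribers[channel]:
            await queue.put(message)


async def execute_on_client(bus, user_id, tool, args, timeout=5.0):
    """Agent Service side: publish a tool call, then wait for the matching result."""
    call_id = uuid.uuid4().hex
    # Subscribe before publishing so the reply cannot be missed.
    results = bus.subscribe(f"tool_result:{call_id}")
    request = {"call_id": call_id, "tool": tool, "args": args}
    await bus.publish(f"tool_call:{user_id}", json.dumps(request))
    reply = await asyncio.wait_for(results.get(), timeout)
    return json.loads(reply)


async def chat_service_relay(bus, calls: asyncio.Queue):
    """Chat Service side: forward one tool call to the device and publish its reply."""
    request = json.loads(await calls.get())
    # The real service would send this over the WebSocket; the reply is simulated here.
    device_reply = {"call_id": request["call_id"], "content": f"ran {request['tool']}"}
    await bus.publish(f"tool_result:{request['call_id']}", json.dumps(device_reply))


async def demo():
    bus = InMemoryBus()
    calls = bus.subscribe("tool_call:u123")
    relay = asyncio.create_task(chat_service_relay(bus, calls))
    result = await execute_on_client(bus, "u123", "read_file", {"path": "notes.txt"})
    await relay
    return result
```

Using a per-call result channel keeps correlation simple: the caller only ever receives the reply for its own `call_id`, and the timeout bounds how long a batch job waits for a device that has gone offline.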
+
+**Scaling**: 1–N replicas. Fully stateless; scales independently of the chat. Each replica processes different batch jobs. It can be scaled to 0 when no agents are active, to save resources.
+
+**Benefit of the split**: If 50 users trigger batch agents at the same time, the Chat Service is unaffected and real-time responses stay fast.

 ---

 ### 1.4 Billing Service (`billing-service`)

-**Responsibilities**: Stripe checkout, webhook, subscription management, tier enforcement.
+**Responsibilities**: Stripe checkout, webhook, subscription management.

 | Original endpoint | Method |
 |---|---|
@@ -132,31 +157,125 @@ def verify_token(token: str) -> dict:

 **Database**: `subscriptions` tables (schema `billing`).

-**Inter-service communication**: When Stripe sends a webhook and the tier changes, the Billing Service publishes an event on the **Redis pub/sub** channel `tier_changed:{user_id}`. The Auth Service updates the `tier` field in the users table (or services read the tier directly from the JWT, which is updated at the next refresh).
+**Inter-service communication**: When Stripe sends a webhook and the tier changes, the Billing Service publishes an event on the **Redis pub/sub** channel `tier_changed:{user_id}`. The Auth Service updates the `tier` field in the users table. The JWT carries the updated tier from the next token refresh onward.

 **Scaling**: 1 replica is enough. Low traffic.

 ---

-### 1.5 Plugin Service (`plugin-service`)
+### 1.5 Services excluded from the MVP

-**Responsibilities**: Marketplace, plugin installation, revenue split.
+The following services will be added after the MVP as independent services:

-| Original endpoint | Method |
-|---|---|
-| `/api/v1/plugins` | GET |
-| `/api/v1/plugins/{id}` | GET |
-| `/api/v1/plugins/{id}/install` | POST / DELETE |
-
-**Database**: Tables `plugins`, `plugin_installations`, `revenue_events`.
-
-**Scaling**: 1 replica. Low traffic.
+| Service | Responsibilities | Notes |
+|---|---|---|
+| **Storage Service** | S3 blobs CRUD, vector ops, backup | Vector/embed functionality can stay in the Chat Service for the MVP |
+| **Plugin Service** | Marketplace, install, revenue split | Not a critical feature for launch |

 ---

-## 2. WebSocket with Horizontal Scaling: The Key Problem
+## 2. Tier Check: Where and How

-### The current problem
+The user's tier (free/pro/power/team) determines rate limits, quotas, and feature access. With microservices, **each service checks the tier on its own** without calling the Auth Service.
+
+### Strategy: Tier in the JWT
+
+The Auth Service includes `tier` as a claim in the JWT at login/refresh:
+
+```json
+{
+  "sub": "user_123",
+  "tier": "pro",
+  "exp": 1742515200,
+  "iat": 1742511600
+}
+```
+
+Each service:
+1. Decodes the JWT with the public key (it already does this for auth)
+2. Reads `payload["tier"]`, with **no extra calls**
+3. Applies its own enforcement rules locally
+
+```python
+# shared/auth.py: shared FastAPI dependency
+from fastapi import Depends, HTTPException, Request
+from jose import JWTError, jwt
+
+PUBLIC_KEY = ...
+
+class CurrentUser:
+    def __init__(self, user_id: str, tier: str):
+        self.user_id = user_id
+        self.tier = tier
+
+async def get_current_user(request: Request) -> CurrentUser:
+    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
+    try:
+        payload = jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
+    except JWTError:
+        # Missing, malformed, expired, or badly signed token
+        raise HTTPException(401, "Invalid or expired token")
+    return CurrentUser(user_id=payload["sub"], tier=payload["tier"])
+
+def require_tier(*allowed_tiers: str):
+    """Dependency that rejects the request if the tier is not among the allowed ones."""
+    async def check(user: CurrentUser = Depends(get_current_user)):
+        if user.tier not in allowed_tiers:
+            raise HTTPException(403, "Tier insufficient")
+        return user
+    return check
+```
+
+### What happens when the tier changes (upgrade/downgrade)?
+
+```
+┌──────────┐  Stripe webhook   ┌──────────┐  tier_changed   ┌──────────┐
+│  Stripe  │ ─────────────────►│ Billing  │ ───────────────►│   Auth   │
+│          │                    │ Service  │  (Redis pub/sub) │ Service  │
+└──────────┘                    └──────────┘                  └────┬─────┘
+                                                                   │
+                                                          UPDATE users
+                                                          SET tier = 'power'
+                                                                   │
+                                                    At the next /refresh
+                                                    the JWT will carry tier='power'
+```
+
+**Change latency**: The tier propagates at the next token refresh (typically 15–30 min, or the client can force an immediate refresh after checkout). For a downgrade triggered by a billing webhook, the refresh token can be invalidated in Redis so that the client has to re-authenticate; an access token that has already been issued stays valid until it expires.
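
A minimal sketch of the event on both sides of the channel, under stated assumptions: the JSON payload shape, the helper names `build_tier_event` and `apply_tier_event`, and the `users` dictionary standing in for the users table are all hypothetical. Only the channel name `tier_changed:{user_id}` and the four tier names come from this document.

```python
import json

VALID_TIERS = {"free", "pro", "power", "team"}
CHANNEL_PREFIX = "tier_changed:"


def build_tier_event(user_id: str, new_tier: str) -> tuple[str, str]:
    """Billing side: return the (channel, payload) pair to publish."""
    if new_tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {new_tier}")
    return f"{CHANNEL_PREFIX}{user_id}", json.dumps({"tier": new_tier})


def apply_tier_event(channel: str, payload: str, users: dict) -> None:
    """Auth side: update the stored tier; `users` stands in for the users table."""
    if not channel.startswith(CHANNEL_PREFIX):
        return  # not a tier event
    user_id = channel[len(CHANNEL_PREFIX):]
    tier = json.loads(payload)["tier"]
    if tier in VALID_TIERS and user_id in users:
        users[user_id]["tier"] = tier
```

Validating the tier on both sides means a malformed event cannot write an unknown value into the users table, whichever service produced it.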
+
+### Where it applies in each service
+
+| Service | Enforcement |
+|---|---|
+| **Auth Service** | None (it is the one that writes the tier) |
+| **Chat Service** | Per-tier rate limit (req/min), message quota |
+| **Agent Service** | Max agent configs, max runs/day, max concurrent batches |
+| **Billing Service** | None (it manages tiers, it does not consume them) |
+
+### Distributed rate limit via Redis
+
+Because each service runs several replicas, rate limiting must be **shared** through Redis:
+
+```python
+# shared/middleware/rate_limit.py
+import redis.asyncio as aioredis
+
+class DistributedRateLimiter:
+    def __init__(self, redis: aioredis.Redis):
+        self._redis = redis
+
+    async def check(self, user_id: str, tier: str, service: str) -> bool:
+        limits = {"free": 20, "pro": 60, "power": 120, "team": 200}
+        max_req = limits.get(tier, 20)
+        key = f"rate:{service}:{user_id}"
+
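+        # INCR is atomic. Set the TTL only when the key is created, so that
+        # later requests do not keep extending the 60-second window.
+        count = await self._redis.incr(key)
+        if count == 1:
+            await self._redis.expire(key, 60)
+
+        return count <= max_req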
+```
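
To make the counting semantics concrete, here is an in-memory model of a per-user, per-service fixed window in which the expiry is set only when the counter is first created. `FixedWindowCounter` and `allowed` are hypothetical names used for illustration; the limits and the `rate:{service}:{user_id}` key format are taken from the class above. In a real deployment the dictionary would be Redis, which is what lets all replicas share one count.

```python
import time


class FixedWindowCounter:
    """In-memory model of INCR plus an expiry set on the first hit (illustration only)."""

    def __init__(self, window_seconds: int = 60, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._entries = {}  # key -> (count, expires_at)

    def hit(self, key: str) -> int:
        now = self._clock()
        count, expires_at = self._entries.get(key, (0, 0.0))
        if count == 0 or now >= expires_at:
            # Key missing or expired: start a new window.
            count, expires_at = 0, now + self._window
        count += 1
        self._entries[key] = (count, expires_at)
        return count


def allowed(counter, user_id, tier, service, limits=None):
    """Return True while the user is within the per-tier limit for this window."""
    limits = limits or {"free": 20, "pro": 60, "power": 120, "team": 200}
    return counter.hit(f"rate:{service}:{user_id}") <= limits.get(tier, 20)
```

Because the expiry is fixed at the start of the window, a client that keeps sending requests is still released once the window ends, rather than being locked out for as long as it stays active.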
+
+---
+
+## 3. WebSocket with Horizontal Scaling: The Key Problem

 `DeviceConnectionManager` is an **in-memory singleton**:

@@ -354,7 +473,7 @@ class RedisDeviceManager:

 ---

-## 3. Proposed Directory Structure
+## 4. Proposed Directory Structure (MVP)

 ```
 adiuva-api/
@@ -364,7 +483,7 @@ adiuva-api/
 │   ├── auth.py                 # JWT verification (public key)
 │   ├── schemas.py              # Shared Pydantic schemas
 │   ├── middleware/
-│   │   ├── rate_limit.py
+│   │   ├── rate_limit.py       # DistributedRateLimiter (Redis)
 │   │   └── sanitizer.py
 │   └── models/
 │       └── base.py             # SQLAlchemy base condivisa
@@ -390,42 +509,45 @@ adiuva-api/
 │       ├── main.py
 │       ├── config.py
 │       ├── db.py
-│       ├── models.py           # agent_run_logs, memory_*
+│       ├── models.py           # memory_*
 │       ├── routes/
-│       │   ├── device_ws.py
-│       │   ├── chat.py
-│       │   └── agents.py
+│       │   ├── device_ws.py    # WS connection owner
+│       │   └── chat.py         # REST fallback
 │       ├── core/
 │       │   ├── device_manager.py   # RedisDeviceManager
-│       │   ├── deep_agent.py
-│       │   ├── agent_runner.py
-│       │   ├── agent_registry.py
+│       │   ├── deep_agent.py       # Home + floating chat
 │       │   ├── memory_middleware.py
 │       │   ├── ws_context.py
 │       │   ├── output_formatter.py
 │       │   └── llm.py
-│       └── agents/
+│       └── agents/                 # Tool definitions (used by deep_agent)
 │           ├── task_agent.py
 │           ├── project_agent.py
 │           ├── note_agent.py
-│           ├── timeline_agent.py
-│           └── filesystem_agent.py
+│           └── timeline_agent.py
 │
-├── storage-service/
+├── agent-service/
 │   ├── Dockerfile
 │   ├── requirements.txt
 │   └── app/
 │       ├── main.py
 │       ├── config.py
 │       ├── db.py
-│       ├── models.py           # storage_records, backup_metadata
+│       ├── models.py           # agent_run_logs, local/cloud_agent_configs
 │       ├── routes/
-│       │   ├── storage.py
-│       │   ├── vectors.py
-│       │   └── backup.py
-│       └── services/
-│           ├── blob_store.py
-│           └── vector_store.py
+│       │   ├── agents.py       # catalog, can-create, trigger
+│       │   └── agent_setup.py  # journey start/message
+│       ├── core/
+│       │   ├── agent_runner.py     # Batch classify → process
+│       │   ├── agent_registry.py
+│       │   ├── redis_executor.py   # execute_on_client via Redis pub/sub
+│       │   └── llm.py
+│       └── agents/
+│           ├── task_agent.py       # Tool definitions (batch context)
+│           ├── project_agent.py
+│           ├── note_agent.py
+│           ├── timeline_agent.py
+│           └── filesystem_agent.py
 │
 ├── billing-service/
 │   ├── Dockerfile
@@ -441,26 +563,18 @@ adiuva-api/
 │           ├── stripe_service.py
 │           └── tier_manager.py
 │
-├── plugin-service/
-│   ├── Dockerfile
-│   ├── requirements.txt
-│   └── app/
-│       ├── main.py
-│       ├── config.py
-│       ├── db.py
-│       ├── models.py           # plugins, installations, revenue
-│       └── routes/
-│           └── plugins.py
-│
 └── infra/
    ├── traefik/
    │   └── traefik.yml
+    ├── keys/
+    │   ├── jwt_private.pem     # auth-service only
+    │   └── jwt_public.pem      # All services
    └── alembic/                # Migrazioni condivise o per-servizio
 ```

 ---

-## 4. Docker Compose: Full Configuration
+## 5. Docker Compose: MVP Configuration

 ```yaml
 # docker-compose.yml
@@ -478,14 +592,14 @@ services:
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
-      # Cloudflare handles TLS; Traefik receives HTTP from the proxy
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
    ports:
      - "80:80"
      - "443:443"
-      - "8080:8080"   # Dashboard Traefik
+      - "8080:8080"   # Traefik dashboard (disable in prod)
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
+      - ./infra/certs:/certs:ro
    restart: unless-stopped

  # ══════════════════════════════════════════════════════════
@@ -498,10 +612,12 @@ services:
    env_file: .env
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      REDIS_URL: redis://redis:6379
      JWT_PRIVATE_KEY_FILE: /run/secrets/jwt_private_key
      SERVICE_NAME: auth
    secrets:
      - jwt_private_key
+      - jwt_public_key
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.auth.rule=PathPrefix(`/api/v1/auth`)"
@@ -509,14 +625,16 @@ services:
    depends_on:
      db:
        condition: service_healthy
+      redis:
+        condition: service_healthy

  # ══════════════════════════════════════════════════════════
-  # Chat Service (scalable, N replicas)
+  # Chat Service: real-time WS + chat (scalable)
  # ══════════════════════════════════════════════════════════
  chat-service:
    build: ./chat-service
    deploy:
-      replicas: 3
+      replicas: 2
    env_file: .env
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
@@ -527,8 +645,8 @@ services:
      - jwt_public_key
    labels:
      - "traefik.enable=true"
-      # REST routes
-      - "traefik.http.routers.chat.rule=PathPrefix(`/api/v1/chat`) || PathPrefix(`/api/v1/agents`)"
+      # REST chat endpoint
+      - "traefik.http.routers.chat.rule=PathPrefix(`/api/v1/chat`)"
      - "traefik.http.services.chat.loadbalancer.server.port=8000"
      # WebSocket route con sticky session
      - "traefik.http.routers.ws.rule=PathPrefix(`/api/v1/ws`)"
@@ -543,26 +661,29 @@ services:
        condition: service_healthy

  # ══════════════════════════════════════════════════════════
-  # Storage Service (2 replicas)
+  # Agent Service: batch processing (scales independently)
  # ══════════════════════════════════════════════════════════
-  storage-service:
-    build: ./storage-service
+  agent-service:
+    build: ./agent-service
    deploy:
      replicas: 2
    env_file: .env
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      REDIS_URL: redis://redis:6379
      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
-      SERVICE_NAME: storage
+      SERVICE_NAME: agent
    secrets:
      - jwt_public_key
    labels:
      - "traefik.enable=true"
-      - "traefik.http.routers.storage.rule=PathPrefix(`/api/v1/storage`) || PathPrefix(`/api/v1/vectors`) || PathPrefix(`/api/v1/backup`)"
-      - "traefik.http.services.storage.loadbalancer.server.port=8000"
+      - "traefik.http.routers.agents.rule=PathPrefix(`/api/v1/agents`)"
+      - "traefik.http.services.agents.loadbalancer.server.port=8000"
    depends_on:
      db:
        condition: service_healthy
+      redis:
+        condition: service_healthy

  # ══════════════════════════════════════════════════════════
  # Billing Service (1 replica)
@@ -589,28 +710,6 @@ services:
      redis:
        condition: service_healthy

-  # ══════════════════════════════════════════════════════════
-  # Plugin Service (1 replica)
-  # ══════════════════════════════════════════════════════════
-  plugin-service:
-    build: ./plugin-service
-    deploy:
-      replicas: 1
-    env_file: .env
-    environment:
-      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
-      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
-      SERVICE_NAME: plugins
-    secrets:
-      - jwt_public_key
-    labels:
-      - "traefik.enable=true"
-      - "traefik.http.routers.plugins.rule=PathPrefix(`/api/v1/plugins`)"
-      - "traefik.http.services.plugins.loadbalancer.server.port=8000"
-    depends_on:
-      db:
-        condition: service_healthy
-
  # ══════════════════════════════════════════════════════════
  # Infrastruttura
  # ══════════════════════════════════════════════════════════
@@ -641,19 +740,6 @@ services:
      retries: 5
    restart: unless-stopped

-  minio:
-    image: minio/minio:latest
-    command: server /data --console-address ":9001"
-    ports:
-      - "9000:9000"
-      - "9001:9001"
-    environment:
-      MINIO_ROOT_USER: minioadmin
-      MINIO_ROOT_PASSWORD: minioadmin
-    volumes:
-      - minio_data:/data
-    restart: unless-stopped
-
  qdrant:
    image: qdrant/qdrant:latest
    volumes:
@@ -669,22 +755,21 @@ secrets:
 volumes:
  postgres_data:
  redis_data:
-  minio_data:
  qdrant_data:
 ```

 ---

-## 5. Cloudflare + VPS Configuration
+## 6. Cloudflare + VPS Configuration

-### 5.1 DNS
+### 6.1 DNS

 ```
 api.tuodominio.com  →  A record  →  IP del VPS
                    →  Proxy: ON (orange cloud)
 ```

-### 5.2 Cloudflare Settings
+### 6.2 Cloudflare Settings

 | Setting | Value | Reason |
 |---------|--------|--------|
@@ -693,7 +778,7 @@ api.tuodominio.com  →  A record  →  IP del VPS
 | Proxy timeout | **100s** (Enterprise) or default | LLM calls can take 30s+ |
 | Under Attack Mode | Off (enable if needed) | |

-### 5.3 TLS on the VPS
+### 6.3 TLS on the VPS

 Two options:
 - **Option A (recommended)**: Cloudflare Origin Certificate, mounted in Traefik
@@ -711,7 +796,7 @@ tls:
      keyFile: /certs/origin-key.pem
 ```

-### 5.4 VPS Network
+### 6.4 VPS Network

 ```bash
 # UFW firewall: only Cloudflare can reach ports 80/443
@@ -726,9 +811,9 @@ ufw enable

 ---

-## 6. Inter-Service Communication
+## 7. Inter-Service Communication

-### 6.1 Pattern: Event Bus via Redis Pub/Sub
+### 7.1 Redis Pub/Sub — Event Bus

 ```
 ┌──────────┐  tier_changed:user_123   ┌──────────┐
@@ -736,87 +821,55 @@ ufw enable
 │ Service  │                           │ Service  │
 └──────────┘                           └──────────┘

-┌──────────┐  agent_triggered:user_123 ┌──────────┐
-│  Chat    │ ◄──────────────────────── │  Any     │
+┌──────────┐  tool_call:user_123      ┌──────────┐
+│  Agent   │ ────────────────────────► │   Chat   │
 │ Service  │                           │ Service  │
-└──────────┘                           └──────────┘
+│ (batch)  │ ◄────────────────────────│ (has WS) │
+└──────────┘  tool_result:{call_id}    └──────────┘
 ```

-### 6.2 Pattern: Synchronous HTTP (for simple queries)
-
-The Chat Service may need the user's tier to rate-limit agents. Two strategies:
-
- **Strategy A (preferred)**: The tier is in the JWT. When it changes, the Billing Service forces a token refresh by invalidating the old tokens in Redis.
- **Strategy B**: The Chat Service calls `http://auth-service:8000/internal/user/{id}/tier` (internal Docker network, not exposed).
-
-### 6.3 Health Checks and Service Discovery
+### 7.2 Health Checks and Service Discovery

 Traefik handles service discovery automatically through Docker labels. Services do not need to know about each other; they communicate only via:
- **Redis pub/sub** (asynchronous events)
- **Redis hash** (shared state, e.g. `ws:connections`)
+- **Redis pub/sub** (cross-instance tool calls, tier events)
+- **Redis keys** (shared state: the `ws:connections` hash, rate-limit counters)
- **PostgreSQL** (shared persistent data)

 ---

-## 7. Incremental Migration Plan
+## 8. Incremental Migration Plan (MVP)

-### Phase 1: Preparation (without breaking anything)
+### Phase 1: Preparation (in the current monolith)
 1. Add Redis to the current `docker-compose.yml`
-2. Migrate JWT from HS256 → RS256 (backward-compatible: accept both)
-3. Implement `RedisDeviceManager` as a drop-in replacement
+2. Migrate JWT from HS256 → RS256 (backward-compatible: accept both for a transition period)
+3. Implement `RedisDeviceManager` as a drop-in replacement for the in-memory singleton
 4. Extract `shared/` with auth verification, schemas, middleware

-### Phase 2: First split (Auth Service)
+### Phase 2: Auth Service (first split)
 1. Extract `auth.py` routes + models into `auth-service/`
 2. Verify that JWTs signed by `auth-service` are validated by the monolith
-3. Update Traefik to route `/api/v1/auth/*` to the new service
+3. Add Traefik and route `/api/v1/auth/*` to the new service
 4. The monolith keeps serving everything else

-### Phase 3: Storage + Billing + Plugins
-1. Stateless services with no WebSocket, so easy to extract
-2. Extract one at a time, test, route via Traefik
-3. The monolith gets leaner and leaner
+### Phase 3: Billing Service
+1. Extract billing routes, Stripe service, tier manager
+2. Configure Redis pub/sub for `tier_changed` events
+3. Route via Traefik

-### Phase 4: Chat Service (the most delicate)
-1. The remaining monolith **becomes** the Chat Service
-2. Remove the migrated routes, keep only WS + chat + agents
-3. Test scaling to 2+ instances with `RedisDeviceManager`
-4. Verify cross-instance tool calls
+### Phase 4: Chat + Agent split (the most delicate)
+1. The remaining monolith contains WS + chat + agents
+2. Split out the Agent Service: extract `agent_runner`, `agent_registry`, `agent_setup`, and the `/agents/*` routes
+3. Implement `redis_executor.py` in the Agent Service for tool calls via Redis
+4. The Chat Service remains the owner of the WS and subscribes to the `tool_call:{user_id}` channels
+5. Test: trigger an agent from the Agent Service → tool_call via Redis → Chat Service → WS → device → response

-### Phase 5: Cleanup
-1. Remove the original monolith
-2. CI/CD pipeline for separate builds and pushes
+### Phase 5: Scaling test
+1. Scale the Chat Service to 2 replicas, verify sticky sessions
+2. Scale the Agent Service to 2 replicas, verify distributed batch processing
 3. Monitoring (Prometheus + Grafana) for each service

 ---

-## 8. Distributed Rate Limiting
-
-The current middleware uses an in-memory counter for rate limiting. With microservices:
-
-```python
-# shared/middleware/rate_limit.py
-import redis.asyncio as aioredis
-
-class DistributedRateLimiter:
-    def __init__(self, redis: aioredis.Redis):
-        self._redis = redis
-
-    async def check(self, user_id: str, tier: str) -> bool:
-        limits = {"free": 20, "pro": 60, "power": 120, "team": 200}
-        max_req = limits.get(tier, 20)
-        key = f"rate:{user_id}"
-
-        pipe = self._redis.pipeline()
-        pipe.incr(key)
-        pipe.expire(key, 60)  # 60-second window
-        count, _ = await pipe.execute()
-
-        return count <= max_req
-```
-
---
-
 ## 9. Monitoring and Logging

 ```yaml
@@ -845,35 +898,44 @@ Ogni servizio espone `/metrics` (Prometheus) e scrive log strutturati (JSON) rac

 ---

-## 10. Recommended Minimum VPS Sizing
+## 10. Recommended Minimum VPS Sizing (MVP)

 | Component | CPU | RAM | Notes |
 |---|---|---|---|
 | Traefik | 0.25 | 128MB | |
-| Auth Service ×2 | 0.25 ×2 | 128MB ×2 | |
-| Chat Service ×2 | 1.0 ×2 | 1GB ×2 | The heaviest (LLM calls) |
-| Storage Service ×2 | 0.5 ×2 | 256MB ×2 | I/O bound |
+| Auth Service ×2 | 0.25 ×2 | 128MB ×2 | Stateless, lightweight |
+| Chat Service ×2 | 1.0 ×2 | 1GB ×2 | WS + streaming LLM |
+| Agent Service ×2 | 0.75 ×2 | 512MB ×2 | Batch LLM, CPU-bound |
 | Billing Service | 0.25 | 128MB | |
-| Plugin Service | 0.25 | 128MB | |
 | PostgreSQL | 1.0 | 1GB | |
 | Redis | 0.25 | 256MB | |
 | Qdrant | 0.5 | 512MB | |
-| MinIO | 0.25 | 256MB | |
-| **Total** | **~6 vCPU** | **~5.5 GB** | |
+| **MVP total** | **~6.25 vCPU** | **~5.25 GB** | |

-**Recommendation**: A VPS with **8 vCPU / 16 GB RAM** to leave headroom. Hetzner CPX41 (~€30/month) or equivalent.
+**Recommendation**: A VPS with **8 vCPU / 16 GB RAM** to leave headroom. Hetzner CPX41 (~€30/month) or equivalent. Leaving out Storage ×2 and Plugin saves about 1.25 vCPU and 640 MB compared with the full version (about 1.5 vCPU and 900 MB when MinIO is counted as well).
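
As a sanity check on the totals, the per-component figures in the MVP table can be summed directly. The tuples below are copied from the table rows (replica count, vCPU per replica, RAM in MB per replica, with 1GB taken as 1024 MB); the `totals` helper is just an illustrative name.

```python
# (component, replicas, vCPU per replica, RAM in MB per replica), from the MVP table
COMPONENTS = [
    ("Traefik", 1, 0.25, 128),
    ("Auth Service", 2, 0.25, 128),
    ("Chat Service", 2, 1.0, 1024),
    ("Agent Service", 2, 0.75, 512),
    ("Billing Service", 1, 0.25, 128),
    ("PostgreSQL", 1, 1.0, 1024),
    ("Redis", 1, 0.25, 256),
    ("Qdrant", 1, 0.5, 512),
]


def totals(components):
    """Return (total vCPU, total RAM in GB) across all replicas."""
    cpu = sum(replicas * vcpu for _, replicas, vcpu, _ in components)
    ram_mb = sum(replicas * ram for _, replicas, _, ram in components)
    return cpu, ram_mb / 1024


total_cpu, total_ram_gb = totals(COMPONENTS)
print(f"{total_cpu:.2f} vCPU, {total_ram_gb:.2f} GB")
```

Keeping the table as data makes it easy to re-run the sum when a replica count changes, for example when the Chat Service grows beyond two replicas.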

 ---

-## Summary of Architectural Decisions
+## MVP Architecture Summary
+
+| Service | Replicas | Owns |
+|---|---|---|
+| **Traefik** | 1 | Routing, TLS, sticky sessions |
+| **Auth Service** | 2 | JWT RS256, registration, login, profile |
+| **Chat Service** | 2–N | WebSocket, home/floating chat, streaming |
+| **Agent Service** | 1–N (2 in Compose) | Batch processing, directory scan, agent setup |
+| **Billing Service** | 1 | Stripe, subscriptions, tier management |

 | Decision | Choice | Rationale |
 |---|---|---|
 | API Gateway | Traefik | Docker-native, WebSocket support, automatic service discovery |
 | JWT | RS256 (asymmetric) | Distributed verification without contacting the Auth Service |
-| WebSocket scaling | Redis pub/sub + registry | Cross-instance tool-call routing |
-| Rate limiting | Redis counters | Distributed, sliding window |
-| Service communication | Redis pub/sub + internal HTTP | Asynchronous for events, synchronous for queries |
-| Database | Shared PostgreSQL (one DB, optional schema separation) | Simplicity; easy to split the DB later |
-| TLS | Cloudflare Origin Certificate | Zero maintenance, trust Cloudflare |
+| Tier check | Claim in the JWT | Each service verifies locally, zero round trips |
+| WebSocket scaling | Redis pub/sub + sticky cookies | Cross-instance tool-call routing |
+| Chat ↔ Agent split | Separate services | CPU-bound batch work does not affect real-time chat |
+| Agent → Device comms | Redis pub/sub via Chat Service | The Agent Service does not own the WS and uses a relay |
+| Rate limiting | Distributed Redis counters | Fixed 60-second window shared across replicas |
+| Database | Shared PostgreSQL | Simple for the MVP; easy to split the DB later |
+| TLS | Cloudflare Origin Certificate | Zero maintenance |
 | Orchestration | Docker Compose | Sufficient for a single VPS |
+| Storage / Plugin | Post-MVP | Not critical for launch |