diff --git a/docs/MICROSERVICES_ARCHITECTURE.md b/docs/MICROSERVICES_ARCHITECTURE.md
new file mode 100644
index 0000000..ba21156
--- /dev/null
+++ b/docs/MICROSERVICES_ARCHITECTURE.md
@@ -0,0 +1,879 @@
+# Adiuva — Architettura Microservizi
+
+## Panoramica
+
+Il monolite attuale viene suddiviso in **5 servizi** + un **API Gateway**, orchestrati con Docker Compose e raggiungibili tramite dominio su Cloudflare.
+
+```
+                          ┌──────────────┐
+                          │  Cloudflare  │
+                          │  (DNS + CDN) │
+                          └──────┬───────┘
+                                 │ HTTPS / WSS
+                          ┌──────▼───────┐
+                          │   Traefik    │
+                          │ API Gateway  │
+                          │  (routing,   │
+                          │   TLS term.) │
+                          └──────┬───────┘
+                                 │
+          ┌──────────┬───────────┼───────────┬──────────┐
+          │          │           │           │          │
+    ┌─────▼────┐ ┌───▼───┐ ┌────▼────┐ ┌────▼───┐ ┌───▼─────┐
+    │  Auth    │ │  Chat │ │ Storage │ │Billing │ │ Plugins │
+    │ Service  │ │Service│ │ Service │ │Service │ │ Service │
+    └─────┬────┘ └───┬───┘ └────┬────┘ └────┬───┘ └───┬─────┘
+          │          │          │           │          │
+    ┌─────▼──────────▼──────────▼───────────▼──────────▼─────┐
+    │                   Infrastruttura                       │
+    │  PostgreSQL │ Redis │ MinIO (S3) │ Qdrant │ (Pinecone) │
+    └────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 1. Suddivisione dei Servizi
+
+### 1.1 Auth Service (`auth-service`)
+
+**Responsabilità**: Registrazione, login, refresh token, profilo utente, encryption key.
+
+| Endpoint originale | Metodo |
+|---|---|
+| `/api/v1/auth/register` | POST |
+| `/api/v1/auth/login` | POST |
+| `/api/v1/auth/refresh` | POST |
+| `/api/v1/auth/me` | GET / PUT |
+
+**Database**: Tabelle `users`, `refresh_tokens` (PostgreSQL condiviso, schema `auth`).
+
+**Modifica chiave — JWT con RS256**:
+Il monolite usa un `SECRET_KEY` simmetrico (HS256). Con i microservizi, passare a **RS256** (asimmetrico):
+- L'Auth Service firma i JWT con la **chiave privata**.
+- Tutti gli altri servizi verificano i JWT con la **chiave pubblica** senza mai contattare l'Auth Service.
+- La chiave pubblica viene esposta via `GET /api/v1/auth/.well-known/jwks.json` oppure montata come volume condiviso.
+
+```python
+# auth-service/app/auth/jwt.py
+from cryptography.hazmat.primitives.asymmetric import rsa
+from jose import jwt
+
+PRIVATE_KEY = ...  # Da env/secret
+PUBLIC_KEY = ...   # Derivata o da env
+
+def create_access_token(user_id: str, tier: str) -> str:
+    return jwt.encode(
+        {"sub": user_id, "tier": tier, "exp": ...},
+        PRIVATE_KEY,
+        algorithm="RS256",
+    )
+```
+
+```python
+# shared/auth.py  (usato da tutti gli altri servizi)
+from jose import jwt
+
+PUBLIC_KEY = ...  # Volume montato o fetched da JWKS endpoint
+
+def verify_token(token: str) -> dict:
+    return jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
+```
+
+**Scaling**: 2 repliche sufficienti, stateless. Rate-limit dedicato su `/login` e `/register`.
+
+---
+
+### 1.2 Chat Service (`chat-service`) ⭐ Core
+
+**Responsabilità**: WebSocket device, home chat, floating chat, agent runner, memory middleware, agent setup journeys.
+
+| Endpoint originale | Tipo |
+|---|---|
+| `/api/v1/ws/device` | WebSocket |
+| `/api/v1/chat` | POST (REST fallback) |
+| `/api/v1/agents/catalog` | GET |
+| `/api/v1/agents/can-create` | POST |
+| `/api/v1/agents/trigger` | POST |
+
+**Moduli inclusi**: `deep_agent`, `agent_runner`, `agent_registry`, `memory_middleware`, `ws_context`, `device_manager`, tutti gli agent tools (`task_agent`, `project_agent`, `note_agent`, `timeline_agent`, `filesystem_agent`).
+
+**Questa è la bestia che deve scalare orizzontalmente** — è il servizio più CPU/memory intensive (LLM calls, tool loops, WebSocket persistenti).
+
+---
+
+### 1.3 Storage Service (`storage-service`)
+
+**Responsabilità**: CRUD record crittografati su S3, vector operations, backup.
+
+| Endpoint originale | Metodo |
+|---|---|
+| `/api/v1/storage/records` | POST / GET |
+| `/api/v1/storage/records/{id}` | GET / PUT / DELETE |
+| `/api/v1/vectors/upsert` | POST |
+| `/api/v1/vectors/search` | POST |
+| `/api/v1/vectors/embed` | POST |
+| `/api/v1/vectors` | DELETE |
+| `/api/v1/backup` | PUT / GET / DELETE |
+| `/api/v1/backup/history` | GET |
+
+**Scaling**: 2–3 repliche. I/O bound (S3, Qdrant). Stateless.
+
+---
+
+### 1.4 Billing Service (`billing-service`)
+
+**Responsabilità**: Stripe checkout, webhook, subscription management, tier enforcement.
+
+| Endpoint originale | Metodo |
+|---|---|
+| `/api/v1/billing/checkout` | POST |
+| `/api/v1/billing/webhook` | POST |
+| `/api/v1/billing/subscription` | GET / DELETE |
+
+**Database**: Tabelle `subscriptions` (schema `billing`).
+
+**Comunicazione inter-servizio**: Quando Stripe invia un webhook e il tier cambia, il Billing Service pubblica un evento su **Redis pub/sub** channel `tier_changed:{user_id}`. L'Auth Service aggiorna il campo `tier` nella tabella users (oppure i servizi leggono il tier direttamente dal JWT, aggiornato al prossimo refresh).
+
+**Scaling**: 1 replica sufficiente. Basso traffico.
+
+---
+
+### 1.5 Plugin Service (`plugin-service`)
+
+**Responsabilità**: Marketplace, installazione plugin, revenue split.
+
+| Endpoint originale | Metodo |
+|---|---|
+| `/api/v1/plugins` | GET |
+| `/api/v1/plugins/{id}` | GET |
+| `/api/v1/plugins/{id}/install` | POST / DELETE |
+
+**Database**: Tabelle `plugins`, `plugin_installations`, `revenue_events`.
+
+**Scaling**: 1 replica. Basso traffico.
+
+---
+
+## 2. WebSocket con Scaling Orizzontale — Il Problema Chiave
+
+### Il problema attuale
+
+`DeviceConnectionManager` è un **singleton in-memory**:
+
+```python
+class DeviceConnectionManager:
+    def __init__(self):
+        self._connections: dict[str, DeviceConnection] = {}  # ← In-memory!
+```
+
+Con N istanze del Chat Service, il device si connette a **una sola** istanza. Quando un'altra istanza deve inviare un `tool_call` a quel device (es. un agent trigger da un'API call), non trova la connessione.
+
+### La soluzione: Redis Pub/Sub + Registry
+
+```
+┌──────────────────────────────────────────────────────────────┐
+│                     Redis                                    │
+│                                                              │
+│  Hash: ws:connections                                        │
+│    user_123 → instance_A                                     │
+│    user_456 → instance_B                                     │
+│                                                              │
+│  Pub/Sub channels:                                           │
+│    tool_call:{user_id}  → tool call payloads                 │
+│    tool_result:{call_id} → tool result payloads              │
+│    stream:{user_id}     → text_chunk streaming               │
+└──────────────────────────────────────────────────────────────┘
+
+ Instance A (ha WS di user_123)     Instance B (deve chiamare tool su user_123)
+ ┌───────────────────────┐          ┌───────────────────────┐
+ │  1. Sottoscrive a     │          │  1. Lookup Redis Hash │
+ │     tool_call:user_123│          │     → user_123 è su A │
+ │                       │          │                       │
+ │  2. Riceve tool_call  │◄─────────│  2. PUBLISH           │
+ │     da Redis channel  │          │    tool_call:user_123 │
+ │                       │          │    {id, action, ...}  │
+ │  3. Invia al device   │          │                       │
+ │     via WS            │          │  4. SUBSCRIBE         │
+ │                       │          │    tool_result:{id}   │
+ │  4. Device risponde   │          │                       │
+ │     tool_result       │──────────│► 5. Riceve risultato  │
+ │                       │          │                       │
+ │  5. PUBLISH           │          │                       │
+ │    tool_result:{id}   │          │                       │
+ └───────────────────────┘          └───────────────────────┘
+```
+
+### Implementazione: `RedisDeviceManager`
+
+```python
+# chat-service/app/core/device_manager.py
+
+import asyncio
+import json
+import os
+import redis.asyncio as aioredis
+from dataclasses import dataclass, field
+from fastapi import WebSocket
+
+INSTANCE_ID = os.environ.get("INSTANCE_ID", os.urandom(8).hex())
+
+@dataclass
+class LocalConnection:
+    ws: WebSocket
+    device_id: str
+    pending_calls: dict[str, asyncio.Future[dict]] = field(default_factory=dict)
+
+
+class RedisDeviceManager:
+    """Device manager backed by Redis for cross-instance communication."""
+
+    def __init__(self, redis_url: str = "redis://redis:6379"):
+        self._redis = aioredis.from_url(redis_url)
+        self._pubsub = self._redis.pubsub()
+        self._local: dict[str, LocalConnection] = {}  # Solo connessioni locali
+        self._remote_futures: dict[str, asyncio.Future[dict]] = {}
+
+    async def start(self):
+        """Avvia il listener Redis per tool_call in arrivo."""
+        asyncio.create_task(self._listen_tool_calls())
+
+    # ── Registrazione ──
+
+    async def register(self, user_id: str, device_id: str, ws: WebSocket):
+        # Registra localmente
+        self._local[user_id] = LocalConnection(ws=ws, device_id=device_id)
+        # Registra in Redis quale istanza ha la connessione
+        await self._redis.hset("ws:connections", user_id, INSTANCE_ID)
+        # Sottoscrivi ai tool_call per questo utente
+        await self._pubsub.subscribe(f"tool_call:{user_id}")
+
+    async def unregister(self, user_id: str):
+        conn = self._local.pop(user_id, None)
+        if conn:
+            for fut in conn.pending_calls.values():
+                if not fut.done():
+                    fut.cancel()
+        await self._redis.hdel("ws:connections", user_id)
+        await self._pubsub.unsubscribe(f"tool_call:{user_id}")
+
+    # ── Presenza ──
+
+    async def is_online(self, user_id: str) -> bool:
+        return await self._redis.hexists("ws:connections", user_id)
+
+    # ── Tool-call round-trip (cross-instance) ──
+
+    async def execute_tool_call(self, user_id: str, payload: dict) -> dict:
+        """
+        Invia un tool_call al device dell'utente.
+        Funziona sia che la WS sia locale che su un'altra istanza.
+        """
+        call_id = payload["id"]
+
+        # Caso 1: connessione locale → invio diretto
+        if user_id in self._local:
+            conn = self._local[user_id]
+            loop = asyncio.get_event_loop()
+            fut: asyncio.Future[dict] = loop.create_future()
+            conn.pending_calls[call_id] = fut
+            await conn.ws.send_text(json.dumps({"type": "tool_call", **payload}))
+            return await asyncio.wait_for(fut, timeout=30.0)
+
+        # Caso 2: connessione remota → Redis pub/sub
+        loop = asyncio.get_event_loop()
+        fut = loop.create_future()
+        self._remote_futures[call_id] = fut
+
+        # Sottoscrivi al canale di risposta
+        result_channel = f"tool_result:{call_id}"
+        await self._pubsub.subscribe(result_channel)
+
+        # Pubblica il tool_call
+        await self._redis.publish(
+            f"tool_call:{user_id}",
+            json.dumps(payload),
+        )
+
+        try:
+            return await asyncio.wait_for(fut, timeout=30.0)
+        finally:
+            self._remote_futures.pop(call_id, None)
+            await self._pubsub.unsubscribe(result_channel)
+
+    # ── Risoluzione tool_result (da WS locale) ──
+
+    def resolve_local(self, user_id: str, call_id: str, result: dict):
+        conn = self._local.get(user_id)
+        if conn:
+            fut = conn.pending_calls.pop(call_id, None)
+            if fut and not fut.done():
+                fut.set_result(result)
+
+    async def resolve_and_publish(self, user_id: str, call_id: str, result: dict):
+        """Chiamato quando il device locale invia un tool_result."""
+        self.resolve_local(user_id, call_id, result)
+        # Pubblica anche su Redis per l'istanza remota che aspetta
+        await self._redis.publish(
+            f"tool_result:{call_id}",
+            json.dumps(result),
+        )
+
+    # ── Listener Redis ──
+
+    async def _listen_tool_calls(self):
+        """Loop che ascolta i tool_call in arrivo da altre istanze."""
+        async for message in self._pubsub.listen():
+            if message["type"] != "message":
+                continue
+            channel = message["channel"]
+            if isinstance(channel, bytes):
+                channel = channel.decode()
+
+            data = json.loads(message["data"])
+
+            if channel.startswith("tool_call:"):
+                # Un'altra istanza vuole che inviamo un tool_call al nostro device
+                user_id = channel.split(":", 1)[1]
+                conn = self._local.get(user_id)
+                if conn:
+                    await conn.ws.send_text(json.dumps({"type": "tool_call", **data}))
+
+            elif channel.startswith("tool_result:"):
+                # Risposta a un tool_call che abbiamo inviato tramite Redis
+                call_id = channel.split(":", 1)[1]
+                fut = self._remote_futures.pop(call_id, None)
+                if fut and not fut.done():
+                    fut.set_result(data)
+
+    # ── Stream cross-instance ──
+
+    async def publish_stream_chunk(self, user_id: str, chunk: dict):
+        """Pubblica un chunk di streaming su Redis (per REST→WS relay)."""
+        await self._redis.publish(f"stream:{user_id}", json.dumps(chunk))
+```
+
+---
+
+## 3. Struttura Directory Proposta
+
+```
+adiuva-api/
+├── docker-compose.yml          # Orchestrazione completa
+├── docker-compose.dev.yml      # Override per sviluppo locale
+├── shared/                     # Codice condiviso (montato come volume)
+│   ├── auth.py                 # JWT verification (chiave pubblica)
+│   ├── schemas.py              # Pydantic schemas condivisi
+│   ├── middleware/
+│   │   ├── rate_limit.py
+│   │   └── sanitizer.py
+│   └── models/
+│       └── base.py             # SQLAlchemy base condivisa
+│
+├── auth-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # users, refresh_tokens
+│       ├── routes/
+│       │   └── auth.py
+│       └── services/
+│           ├── jwt_service.py  # RS256 signing
+│           └── user_service.py
+│
+├── chat-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # agent_run_logs, memory_*
+│       ├── routes/
+│       │   ├── device_ws.py
+│       │   ├── chat.py
+│       │   └── agents.py
+│       ├── core/
+│       │   ├── device_manager.py   # RedisDeviceManager
+│       │   ├── deep_agent.py
+│       │   ├── agent_runner.py
+│       │   ├── agent_registry.py
+│       │   ├── memory_middleware.py
+│       │   ├── ws_context.py
+│       │   ├── output_formatter.py
+│       │   └── llm.py
+│       └── agents/
+│           ├── task_agent.py
+│           ├── project_agent.py
+│           ├── note_agent.py
+│           ├── timeline_agent.py
+│           └── filesystem_agent.py
+│
+├── storage-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # storage_records, backup_metadata
+│       ├── routes/
+│       │   ├── storage.py
+│       │   ├── vectors.py
+│       │   └── backup.py
+│       └── services/
+│           ├── blob_store.py
+│           └── vector_store.py
+│
+├── billing-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # subscriptions
+│       ├── routes/
+│       │   └── billing.py
+│       └── services/
+│           ├── stripe_service.py
+│           └── tier_manager.py
+│
+├── plugin-service/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   └── app/
+│       ├── main.py
+│       ├── config.py
+│       ├── db.py
+│       ├── models.py           # plugins, installations, revenue
+│       └── routes/
+│           └── plugins.py
+│
+└── infra/
+    ├── traefik/
+    │   └── traefik.yml
+    └── alembic/                # Migrazioni condivise o per-servizio
+```
+
+---
+
+## 4. Docker Compose — Configurazione Completa
+
+```yaml
+# docker-compose.yml
+
+services:
+
+  # ══════════════════════════════════════════════════════════
+  # API Gateway
+  # ══════════════════════════════════════════════════════════
+  traefik:
+    image: traefik:v3.2
+    command:
+      - "--api.insecure=true"
+      - "--providers.docker=true"
+      - "--providers.docker.exposedbydefault=false"
+      - "--entrypoints.web.address=:80"
+      - "--entrypoints.websecure.address=:443"
+      # Cloudflare gestisce TLS, Traefik riceve HTTP dal proxy
+      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
+    ports:
+      - "80:80"
+      - "443:443"
+      - "8080:8080"   # Dashboard Traefik
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock:ro
+    restart: unless-stopped
+
+  # ══════════════════════════════════════════════════════════
+  # Auth Service (2 repliche)
+  # ══════════════════════════════════════════════════════════
+  auth-service:
+    build: ./auth-service
+    deploy:
+      replicas: 2
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      JWT_PRIVATE_KEY_FILE: /run/secrets/jwt_private_key
+      SERVICE_NAME: auth
+    secrets:
+      - jwt_private_key
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.auth.rule=PathPrefix(`/api/v1/auth`)"
+      - "traefik.http.services.auth.loadbalancer.server.port=8000"
+    depends_on:
+      db:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Chat Service (scalabile, N repliche)
+  # ══════════════════════════════════════════════════════════
+  chat-service:
+    build: ./chat-service
+    deploy:
+      replicas: 3
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      REDIS_URL: redis://redis:6379
+      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
+      SERVICE_NAME: chat
+    secrets:
+      - jwt_public_key
+    labels:
+      - "traefik.enable=true"
+      # REST routes
+      - "traefik.http.routers.chat.rule=PathPrefix(`/api/v1/chat`) || PathPrefix(`/api/v1/agents`)"
+      - "traefik.http.services.chat.loadbalancer.server.port=8000"
+      # WebSocket route con sticky session
+      - "traefik.http.routers.ws.rule=PathPrefix(`/api/v1/ws`)"
+      - "traefik.http.routers.ws.service=chat-ws"
+      - "traefik.http.services.chat-ws.loadbalancer.server.port=8000"
+      - "traefik.http.services.chat-ws.loadbalancer.sticky.cookie.name=ws_affinity"
+      - "traefik.http.services.chat-ws.loadbalancer.sticky.cookie.httpOnly=true"
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Storage Service (2 repliche)
+  # ══════════════════════════════════════════════════════════
+  storage-service:
+    build: ./storage-service
+    deploy:
+      replicas: 2
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
+      SERVICE_NAME: storage
+    secrets:
+      - jwt_public_key
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.storage.rule=PathPrefix(`/api/v1/storage`) || PathPrefix(`/api/v1/vectors`) || PathPrefix(`/api/v1/backup`)"
+      - "traefik.http.services.storage.loadbalancer.server.port=8000"
+    depends_on:
+      db:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Billing Service (1 replica)
+  # ══════════════════════════════════════════════════════════
+  billing-service:
+    build: ./billing-service
+    deploy:
+      replicas: 1
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      REDIS_URL: redis://redis:6379
+      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
+      SERVICE_NAME: billing
+    secrets:
+      - jwt_public_key
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.billing.rule=PathPrefix(`/api/v1/billing`)"
+      - "traefik.http.services.billing.loadbalancer.server.port=8000"
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Plugin Service (1 replica)
+  # ══════════════════════════════════════════════════════════
+  plugin-service:
+    build: ./plugin-service
+    deploy:
+      replicas: 1
+    env_file: .env
+    environment:
+      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
+      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
+      SERVICE_NAME: plugins
+    secrets:
+      - jwt_public_key
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.plugins.rule=PathPrefix(`/api/v1/plugins`)"
+      - "traefik.http.services.plugins.loadbalancer.server.port=8000"
+    depends_on:
+      db:
+        condition: service_healthy
+
+  # ══════════════════════════════════════════════════════════
+  # Infrastruttura
+  # ══════════════════════════════════════════════════════════
+  db:
+    image: pgvector/pgvector:pg16
+    environment:
+      POSTGRES_USER: postgres
+      POSTGRES_PASSWORD: postgres
+      POSTGRES_DB: adiuva
+    volumes:
+      - postgres_data:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U postgres"]
+      interval: 5s
+      timeout: 5s
+      retries: 5
+    restart: unless-stopped
+
+  redis:
+    image: redis:7-alpine
+    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
+    volumes:
+      - redis_data:/data
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 5s
+      timeout: 3s
+      retries: 5
+    restart: unless-stopped
+
+  minio:
+    image: minio/minio:latest
+    command: server /data --console-address ":9001"
+    ports:
+      - "9000:9000"
+      - "9001:9001"
+    environment:
+      MINIO_ROOT_USER: minioadmin
+      MINIO_ROOT_PASSWORD: minioadmin
+    volumes:
+      - minio_data:/data
+    restart: unless-stopped
+
+  qdrant:
+    image: qdrant/qdrant:latest
+    volumes:
+      - qdrant_data:/qdrant/storage
+    restart: unless-stopped
+
+secrets:
+  jwt_private_key:
+    file: ./infra/keys/jwt_private.pem
+  jwt_public_key:
+    file: ./infra/keys/jwt_public.pem
+
+volumes:
+  postgres_data:
+  redis_data:
+  minio_data:
+  qdrant_data:
+```
+
+---
+
+## 5. Configurazione Cloudflare + VPS
+
+### 5.1 DNS
+
+```
+api.tuodominio.com  →  A record  →  IP del VPS
+                    →  Proxy: ON (orange cloud)
+```
+
+### 5.2 Cloudflare Settings
+
+| Setting | Valore | Motivo |
+|---------|--------|--------|
+| SSL/TLS mode | **Full (Strict)** | Cloudflare ↔ VPS con certificato valido |
+| WebSocket | **ON** | Necessario per `/api/v1/ws/device` |
+| Proxy timeout | **100s** (Enterprise) o default | Le LLM calls possono durare 30s+ |
+| Under Attack Mode | Off (attivare se necessario) | |
+
+### 5.3 TLS sul VPS
+
+Due opzioni:
+- **Opzione A (consigliata)**: Cloudflare Origin Certificate → montato in Traefik
+- **Opzione B**: Let's Encrypt via Traefik (con DNS challenge Cloudflare)
+
+```yaml
+# traefik.yml — con Cloudflare Origin Certificate
+entryPoints:
+  websecure:
+    address: ":443"
+
+tls:
+  certificates:
+    - certFile: /certs/origin.pem
+      keyFile: /certs/origin-key.pem
+```
+
+### 5.4 Rete VPS
+
+```bash
+# UFW firewall — solo Cloudflare può raggiungere le porte 80/443
+# https://www.cloudflare.com/ips/
+ufw default deny incoming
+ufw allow from 173.245.48.0/20 to any port 443
+ufw allow from 103.21.244.0/22 to any port 443
+# ... (tutti gli IP range di Cloudflare)
+ufw allow ssh
+ufw enable
+```
+
+---
+
+## 6. Comunicazione Inter-Servizio
+
+### 6.1 Pattern: Event Bus via Redis Pub/Sub
+
+```
+┌──────────┐  tier_changed:user_123   ┌──────────┐
+│ Billing  │ ────────────────────────► │   Auth   │
+│ Service  │                           │ Service  │
+└──────────┘                           └──────────┘
+
+┌──────────┐  agent_triggered:user_123 ┌──────────┐
+│  Chat    │ ◄──────────────────────── │  Any     │
+│ Service  │                           │ Service  │
+└──────────┘                           └──────────┘
+```
+
+### 6.2 Pattern: HTTP Sincrono (per query semplici)
+
+Il Chat Service può avere bisogno del tier dell'utente per il rate-limiting degli agent. Due strategie:
+
+- **Strategia A (preferita)**: Il tier è nel JWT. All'aggiornamento, il Billing Service forza token refresh invalidando i vecchi token su Redis.
+- **Strategia B**: Il Chat Service chiama `http://auth-service:8000/internal/user/{id}/tier` (rete Docker interna, non esposta).
+
+### 6.3 Health Checks e Service Discovery
+
+Traefik gestisce automaticamente il service discovery via Docker labels. I servizi non devono conoscersi tra loro — comunicano solo via:
+- **Redis pub/sub** (eventi asincroni)
+- **Redis hash** (stato condiviso, es. `ws:connections`)
+- **PostgreSQL** (dati persistenti condivisi)
+
+---
+
+## 7. Piano di Migrazione Incrementale
+
+### Fase 1 — Preparazione (senza rompere nulla)
+1. Aggiungere Redis al `docker-compose.yml` attuale
+2. Migrare JWT da HS256 → RS256 (backward-compatible: accetta entrambi)
+3. Implementare `RedisDeviceManager` come drop-in replacement
+4. Estrarre `shared/` con auth verification, schemas, middleware
+
+### Fase 2 — Primo split: Auth Service
+1. Estrarre `auth.py` routes + models in `auth-service/`
+2. Verificare che i JWT firmati da `auth-service` vengano validati dal monolite
+3. Aggiornare Traefik per routare `/api/v1/auth/*` al nuovo servizio
+4. Il monolite continua a servire tutto il resto
+
+### Fase 3 — Storage + Billing + Plugins
+1. Servizi stateless e senza WebSocket → facili da estrarre
+2. Estrarre uno alla volta, testare, routare via Traefik
+3. Il monolite diventa sempre più magro
+
+### Fase 4 — Chat Service (il più delicato)
+1. Il monolite residuo **diventa** il Chat Service
+2. Rimuovere i route migrati, tenere solo WS + chat + agents
+3. Testare lo scaling a 2+ istanze con `RedisDeviceManager`
+4. Verificare tool-call cross-instance
+
+### Fase 5 — Cleanup
+1. Rimuovere il monolite originale
+2. CI/CD pipeline per build/push separati
+3. Monitoring (Prometheus + Grafana) per ogni servizio
+
+---
+
+## 8. Rate Limiting Distribuito
+
+Il middleware attuale usa un contatore in-memory per il rate-limiting. Con i microservizi:
+
+```python
+# shared/middleware/rate_limit.py
+import redis.asyncio as aioredis
+
+class DistributedRateLimiter:
+    def __init__(self, redis: aioredis.Redis):
+        self._redis = redis
+
+    async def check(self, user_id: str, tier: str) -> bool:
+        limits = {"free": 20, "pro": 60, "power": 120, "team": 200}
+        max_req = limits.get(tier, 20)
+        key = f"rate:{user_id}"
+
+        pipe = self._redis.pipeline()
+        pipe.incr(key)
+        pipe.expire(key, 60)  # Finestra di 60 secondi
+        count, _ = await pipe.execute()
+
+        return count <= max_req
+```
+
+---
+
+## 9. Monitoraggio e Logging
+
+```yaml
+# Aggiungere al docker-compose.yml
+
+  prometheus:
+    image: prom/prometheus:latest
+    volumes:
+      - ./infra/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
+    restart: unless-stopped
+
+  grafana:
+    image: grafana/grafana:latest
+    ports:
+      - "3000:3000"
+    volumes:
+      - grafana_data:/var/lib/grafana
+    restart: unless-stopped
+
+  loki:
+    image: grafana/loki:latest
+    restart: unless-stopped
+```
+
+Ogni servizio espone `/metrics` (Prometheus) e scrive log strutturati (JSON) raccolti da Loki.
+
+---
+
+## 10. Sizing VPS Minimo Consigliato
+
+| Componente | CPU | RAM | Note |
+|---|---|---|---|
+| Traefik | 0.25 | 128MB | |
+| Auth Service ×2 | 0.25 ×2 | 128MB ×2 | |
+| Chat Service ×2 | 1.0 ×2 | 1GB ×2 | Il più pesante (LLM calls) |
+| Storage Service ×2 | 0.5 ×2 | 256MB ×2 | I/O bound |
+| Billing Service | 0.25 | 128MB | |
+| Plugin Service | 0.25 | 128MB | |
+| PostgreSQL | 1.0 | 1GB | |
+| Redis | 0.25 | 256MB | |
+| Qdrant | 0.5 | 512MB | |
+| MinIO | 0.25 | 256MB | |
+| **Totale** | **~6 vCPU** | **~5.5 GB** | |
+
+**Raccomandazione**: VPS con **8 vCPU / 16 GB RAM** per avere margine. Hetzner CPX41 (~€30/mese) o equivalente.
+
+---
+
+## Riepilogo Decisioni Architetturali
+
+| Decisione | Scelta | Motivazione |
+|---|---|---|
+| API Gateway | Traefik | Nativo Docker, WebSocket support, service discovery automatico |
+| JWT | RS256 (asimmetrico) | Verifica distribuita senza contattare Auth Service |
+| WebSocket scaling | Redis pub/sub + registry | Cross-instance tool-call routing |
+| Rate limiting | Redis contatori | Distribuito, sliding window |
+| Service communication | Redis pub/sub + HTTP interno | Asincrono per eventi, sincrono per query |
+| Database | PostgreSQL condiviso (un DB, schema separation opzionale) | Semplicità; split DB futuro facile |
+| TLS | Cloudflare Origin Certificate | Zero maintenance, trust Cloudflare |
+| Orchestrazione | Docker Compose | Sufficiente per un singolo VPS |