api/docs/MICROSERVICES_ARCHITECTURE.md

# Adiuva — Architettura Microservizi (MVP)

## Panoramica

Il monolite viene suddiviso in **4 servizi MVP** + un **API Gateway (Traefik)**, orchestrati con Docker Compose su un singolo VPS raggiungibile via Cloudflare.

> **Fuori dall'MVP**: Storage Service (S3/backup CRUD) e Plugin Service (marketplace). Verranno aggiunti come servizi indipendenti in una fase successiva.

```
                          ┌──────────────┐
                          │  Cloudflare  │
                          │  (DNS + CDN) │
                          └──────┬───────┘
                                 │ HTTPS / WSS
                          ┌──────▼───────┐
                          │   Traefik    │
                          │ API Gateway  │
                          │  (routing,   │
                          │   TLS, rate  │
                          │   limiting)  │
                          └──────┬───────┘
                                 │
          ┌──────────┬───────────┼───────────┐
          │          │           │           │
    ┌─────▼────┐ ┌───▼───┐ ┌────▼────┐ ┌────▼───┐
    │  Auth    │ │  Chat │ │  Agent  │ │Billing │
    │ Service  │ │Service│ │ Service │ │Service │
    └─────┬────┘ └───┬───┘ └────┬────┘ └────┬───┘
          │          │          │           │
    ┌─────▼──────────▼──────────▼───────────▼────┐
    │              Infrastruttura                 │
    │  PostgreSQL  │  Redis  │  Qdrant            │
    └─────────────────────────────────────────────┘
```

---

## 1. Suddivisione dei Servizi

### 1.1 Auth Service (`auth-service`)

**Responsabilità**: Registrazione, login, refresh token, profilo utente, encryption key.

| Endpoint originale | Metodo |
|---|---|
| `/api/v1/auth/register` | POST |
| `/api/v1/auth/login` | POST |
| `/api/v1/auth/refresh` | POST |
| `/api/v1/auth/me` | GET / PUT |

**Database**: Tabelle `users`, `refresh_tokens` (PostgreSQL condiviso, schema `auth`).

**Modifica chiave — JWT con RS256**:
Il monolite usa un `SECRET_KEY` simmetrico (HS256). Con i microservizi, passare a **RS256** (asimmetrico):
- L'Auth Service firma i JWT con la **chiave privata**.
- Tutti gli altri servizi verificano i JWT con la **chiave pubblica** senza mai contattare l'Auth Service.
- La chiave pubblica viene esposta via `GET /api/v1/auth/.well-known/jwks.json` oppure montata come volume condiviso.

```python
# auth-service/app/auth/jwt.py
from cryptography.hazmat.primitives.asymmetric import rsa
from jose import jwt

PRIVATE_KEY = ...  # Da env/secret
PUBLIC_KEY = ...   # Derivata o da env

def create_access_token(user_id: str, tier: str) -> str:
    return jwt.encode(
        {"sub": user_id, "tier": tier, "exp": ...},
        PRIVATE_KEY,
        algorithm="RS256",
    )
```

```python
# shared/auth.py  (usato da tutti gli altri servizi)
from jose import jwt

PUBLIC_KEY = ...  # Volume montato o fetched da JWKS endpoint

def verify_token(token: str) -> dict:
    return jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
```

**Scaling**: 2 repliche sufficienti, stateless. Rate-limit dedicato su `/login` e `/register`.

---

### 1.2 Chat Service (`chat-service`) ⭐ Real-time

**Responsabilità**: WebSocket device connection, home chat, floating chat, memory middleware, streaming LLM responses verso il client.

Questo servizio gestisce la **connessione persistente** con l'app Electron e le interazioni **real-time** dell'utente (chat home, floating chat). È il proprietario della WebSocket.

| Endpoint | Tipo |
|---|---|
| `/api/v1/ws/device` | WebSocket (connessione persistente) |
| `/api/v1/chat` | POST (REST fallback) |

**Moduli inclusi**: `deep_agent`, `memory_middleware`, `ws_context`, `device_manager` (Redis-backed), `output_formatter`, `llm`, tutti gli agent tools (`task_agent`, `project_agent`, `note_agent`, `timeline_agent`).

**Perché separato dall'Agent Service**: Il Chat Service tiene la WebSocket aperta e risponde in tempo reale (streaming). Scalare aggiungendo repliche è semplice con sticky sessions + Redis pub/sub per il cross-instance routing dei tool_call.

**Scaling**: 2–N repliche. Sticky cookies per le WS + Redis per cross-instance.

---

### 1.3 Agent Service (`agent-service`) ⭐ Batch

**Responsabilità**: Batch agent processing (directory scanning, file classification, entity extraction), agent setup journeys, agent configuration CRUD.

Questo servizio gestisce i processi **long-running** e **CPU-intensive**: scansione filesystem, classificazione file con LLM, estrazione entità in batch. Non possiede la WebSocket — comunica con il device dell'utente tramite **Redis pub/sub** passando per il Chat Service.

| Endpoint | Tipo |
|---|---|
| `/api/v1/agents/catalog` | GET |
| `/api/v1/agents/can-create` | POST |
| `/api/v1/agents/trigger` | POST |
| `/api/v1/agents/journey/start` | POST (o WS relay) |
| `/api/v1/agents/journey/message` | POST (o WS relay) |

**Moduli inclusi**: `agent_runner`, `agent_registry`, `filesystem_agent`, `llm`.

**Flusso tool-call cross-service** (l'Agent Service non ha la WS):

```
┌──────────────┐            ┌──────────────┐            ┌──────────┐
│ Agent Service│            │    Redis     │            │  Chat    │
│ (batch run)  │            │              │            │ Service  │
│              │            │              │            │ (ha WS)  │
│ 1. Needs to  │  PUBLISH   │              │ SUBSCRIBE  │          │
│    read file ├───────────►│tool_call:u123├───────────►│ 2. Invia │
│    from      │            │              │            │    al    │
│    device    │            │              │            │    device│
│              │            │              │            │    via WS│
│              │  SUBSCRIBE │              │  PUBLISH   │          │
│ 4. Riceve   ◄────────────┤tool_result:id│◄───────────┤ 3. Device│
│    risultato │            │              │            │    reply │
└──────────────┘            └──────────────┘            └──────────┘
```

**Scaling**: 1–N repliche. Completamente stateless, scala indipendentemente dalla chat. Ogni replica processa batch job diversi. Può essere scalato a 0 se non ci sono agent attivi (risparmio risorse).

**Vantaggio dello split**: Se 50 utenti triggerano agenti batch contemporaneamente, il Chat Service non ne risente — le risposte real-time rimangono veloci.

---

### 1.4 Billing Service (`billing-service`)

**Responsabilità**: Stripe checkout, webhook, subscription management.

| Endpoint originale | Metodo |
|---|---|
| `/api/v1/billing/checkout` | POST |
| `/api/v1/billing/webhook` | POST |
| `/api/v1/billing/subscription` | GET / DELETE |

**Database**: Tabelle `subscriptions` (schema `billing`).

**Comunicazione inter-servizio**: Quando Stripe invia un webhook e il tier cambia, il Billing Service pubblica un evento su **Redis pub/sub** channel `tier_changed:{user_id}`. L'Auth Service aggiorna il campo `tier` nella tabella users. Al prossimo token refresh il JWT conterrà il tier aggiornato.

**Scaling**: 1 replica sufficiente. Basso traffico.

---

### 1.5 Servizi esclusi dall'MVP

I seguenti servizi verranno aggiunti post-MVP come servizi indipendenti:

| Servizio | Responsabilità | Note |
|---|---|---|
| **Storage Service** | S3 blobs CRUD, vector ops, backup | Le funzionalità vector/embed possono restare nel Chat Service per il MVP |
| **Plugin Service** | Marketplace, install, revenue split | Feature non critica per il lancio |

---

## 2. Tier Check — Dove e Come

Il tier dell'utente (free/pro/power/team) determina rate-limiting, quote e accesso a funzionalità. Con i microservizi, **ogni servizio controlla il tier autonomamente** senza chiamare l'Auth Service.

### Strategia: Tier nel JWT

L'Auth Service include il `tier` come claim nel JWT al momento del login/refresh:

```json
{
  "sub": "user_123",
  "tier": "pro",
  "exp": 1742515200,
  "iat": 1742511600
}
```

Ogni servizio:
1. Decodifica il JWT con la chiave pubblica (già lo fa per l'auth)
2. Legge `payload["tier"]` — **zero chiamate extra**
3. Applica le sue regole di enforcement localmente

```python
# shared/auth.py — dependency FastAPI condivisa
from fastapi import Depends, HTTPException, Request
from jose import jwt

PUBLIC_KEY = ...

class CurrentUser:
    def __init__(self, user_id: str, tier: str):
        self.user_id = user_id
        self.tier = tier

async def get_current_user(request: Request) -> CurrentUser:
    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
    payload = jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
    return CurrentUser(user_id=payload["sub"], tier=payload["tier"])

def require_tier(*allowed_tiers: str):
    """Dependency che blocca se il tier non è tra quelli ammessi."""
    async def check(user: CurrentUser = Depends(get_current_user)):
        if user.tier not in allowed_tiers:
            raise HTTPException(403, "Tier insufficient")
        return user
    return check
```

### Cosa succede quando il tier cambia (upgrade/downgrade)?

```
┌──────────┐  Stripe webhook   ┌──────────┐  tier_changed   ┌──────────┐
│  Stripe  │ ─────────────────►│ Billing  │ ───────────────►│   Auth   │
│          │                    │ Service  │  (Redis pub/sub) │ Service  │
└──────────┘                    └──────────┘                  └────┬─────┘
                                                                   │
                                                          UPDATE users
                                                          SET tier = 'power'
                                                                   │
                                                    Al prossimo /refresh
                                                    il JWT conterrà tier='power'
```

**Latenza del cambio**: Il tier si propaga al prossimo token refresh (tipicamente 15–30 min, o il client può forzare un refresh immediato dopo il checkout). Per il billing webhook, il downgrade può essere forzato invalidando il refresh token su Redis → il client è obbligato a ri-autenticarsi.

### Dove si applica in ciascun servizio

| Servizio | Enforcement |
|---|---|
| **Auth Service** | Nessuno (è lui che scrive il tier) |
| **Chat Service** | Rate-limit per tier (req/min), quota messaggi |
| **Agent Service** | Max agent configs, max runs/day, max concurrent batches |
| **Billing Service** | Nessuno (gestisce i tier, non li consuma) |

### Rate-limit distribuito via Redis

Poiché ogni servizio ha le sue repliche, il rate-limiting deve essere **condiviso** via Redis:

```python
# shared/middleware/rate_limit.py
import redis.asyncio as aioredis

class DistributedRateLimiter:
    def __init__(self, redis: aioredis.Redis):
        self._redis = redis

    async def check(self, user_id: str, tier: str, service: str) -> bool:
        limits = {"free": 20, "pro": 60, "power": 120, "team": 200}
        max_req = limits.get(tier, 20)
        key = f"rate:{service}:{user_id}"

        pipe = self._redis.pipeline()
        pipe.incr(key)
        pipe.expire(key, 60)
        count, _ = await pipe.execute()

        return count <= max_req
```

---

## 3. WebSocket con Scaling Orizzontale — Il Problema Chiave

`DeviceConnectionManager` è un **singleton in-memory**:

```python
class DeviceConnectionManager:
    def __init__(self):
        self._connections: dict[str, DeviceConnection] = {}  # ← In-memory!
```

Con N istanze del Chat Service, il device si connette a **una sola** istanza. Quando un'altra istanza deve inviare un `tool_call` a quel device (es. un agent trigger da un'API call), non trova la connessione.

### La soluzione: Redis Pub/Sub + Registry

```
┌──────────────────────────────────────────────────────────────┐
│                     Redis                                    │
│                                                              │
│  Hash: ws:connections                                        │
│    user_123 → instance_A                                     │
│    user_456 → instance_B                                     │
│                                                              │
│  Pub/Sub channels:                                           │
│    tool_call:{user_id}  → tool call payloads                 │
│    tool_result:{call_id} → tool result payloads              │
│    stream:{user_id}     → text_chunk streaming               │
└──────────────────────────────────────────────────────────────┘

 Instance A (ha WS di user_123)     Instance B (deve chiamare tool su user_123)
 ┌───────────────────────┐          ┌───────────────────────┐
 │  1. Sottoscrive a     │          │  1. Lookup Redis Hash │
 │     tool_call:user_123│          │     → user_123 è su A │
 │                       │          │                       │
 │  2. Riceve tool_call  │◄─────────│  2. PUBLISH           │
 │     da Redis channel  │          │    tool_call:user_123 │
 │                       │          │    {id, action, ...}  │
 │  3. Invia al device   │          │                       │
 │     via WS            │          │  4. SUBSCRIBE         │
 │                       │          │    tool_result:{id}   │
 │  4. Device risponde   │          │                       │
 │     tool_result       │──────────│► 5. Riceve risultato  │
 │                       │          │                       │
 │  5. PUBLISH           │          │                       │
 │    tool_result:{id}   │          │                       │
 └───────────────────────┘          └───────────────────────┘
```

### Implementazione: `RedisDeviceManager`

```python
# chat-service/app/core/device_manager.py

import asyncio
import json
import os
import redis.asyncio as aioredis
from dataclasses import dataclass, field
from fastapi import WebSocket

INSTANCE_ID = os.environ.get("INSTANCE_ID", os.urandom(8).hex())

@dataclass
class LocalConnection:
    ws: WebSocket
    device_id: str
    pending_calls: dict[str, asyncio.Future[dict]] = field(default_factory=dict)


class RedisDeviceManager:
    """Device manager backed by Redis for cross-instance communication."""

    def __init__(self, redis_url: str = "redis://redis:6379"):
        self._redis = aioredis.from_url(redis_url)
        self._pubsub = self._redis.pubsub()
        self._local: dict[str, LocalConnection] = {}  # Solo connessioni locali
        self._remote_futures: dict[str, asyncio.Future[dict]] = {}

    async def start(self):
        """Avvia il listener Redis per tool_call in arrivo."""
        asyncio.create_task(self._listen_tool_calls())

    # ── Registrazione ──

    async def register(self, user_id: str, device_id: str, ws: WebSocket):
        # Registra localmente
        self._local[user_id] = LocalConnection(ws=ws, device_id=device_id)
        # Registra in Redis quale istanza ha la connessione
        await self._redis.hset("ws:connections", user_id, INSTANCE_ID)
        # Sottoscrivi ai tool_call per questo utente
        await self._pubsub.subscribe(f"tool_call:{user_id}")

    async def unregister(self, user_id: str):
        conn = self._local.pop(user_id, None)
        if conn:
            for fut in conn.pending_calls.values():
                if not fut.done():
                    fut.cancel()
        await self._redis.hdel("ws:connections", user_id)
        await self._pubsub.unsubscribe(f"tool_call:{user_id}")

    # ── Presenza ──

    async def is_online(self, user_id: str) -> bool:
        return await self._redis.hexists("ws:connections", user_id)

    # ── Tool-call round-trip (cross-instance) ──

    async def execute_tool_call(self, user_id: str, payload: dict) -> dict:
        """
        Invia un tool_call al device dell'utente.
        Funziona sia che la WS sia locale che su un'altra istanza.
        """
        call_id = payload["id"]

        # Caso 1: connessione locale → invio diretto
        if user_id in self._local:
            conn = self._local[user_id]
            loop = asyncio.get_event_loop()
            fut: asyncio.Future[dict] = loop.create_future()
            conn.pending_calls[call_id] = fut
            await conn.ws.send_text(json.dumps({"type": "tool_call", **payload}))
            return await asyncio.wait_for(fut, timeout=30.0)

        # Caso 2: connessione remota → Redis pub/sub
        loop = asyncio.get_event_loop()
        fut = loop.create_future()
        self._remote_futures[call_id] = fut

        # Sottoscrivi al canale di risposta
        result_channel = f"tool_result:{call_id}"
        await self._pubsub.subscribe(result_channel)

        # Pubblica il tool_call
        await self._redis.publish(
            f"tool_call:{user_id}",
            json.dumps(payload),
        )

        try:
            return await asyncio.wait_for(fut, timeout=30.0)
        finally:
            self._remote_futures.pop(call_id, None)
            await self._pubsub.unsubscribe(result_channel)

    # ── Risoluzione tool_result (da WS locale) ──

    def resolve_local(self, user_id: str, call_id: str, result: dict):
        conn = self._local.get(user_id)
        if conn:
            fut = conn.pending_calls.pop(call_id, None)
            if fut and not fut.done():
                fut.set_result(result)

    async def resolve_and_publish(self, user_id: str, call_id: str, result: dict):
        """Chiamato quando il device locale invia un tool_result."""
        self.resolve_local(user_id, call_id, result)
        # Pubblica anche su Redis per l'istanza remota che aspetta
        await self._redis.publish(
            f"tool_result:{call_id}",
            json.dumps(result),
        )

    # ── Listener Redis ──

    async def _listen_tool_calls(self):
        """Loop che ascolta i tool_call in arrivo da altre istanze."""
        async for message in self._pubsub.listen():
            if message["type"] != "message":
                continue
            channel = message["channel"]
            if isinstance(channel, bytes):
                channel = channel.decode()

            data = json.loads(message["data"])

            if channel.startswith("tool_call:"):
                # Un'altra istanza vuole che inviamo un tool_call al nostro device
                user_id = channel.split(":", 1)[1]
                conn = self._local.get(user_id)
                if conn:
                    await conn.ws.send_text(json.dumps({"type": "tool_call", **data}))

            elif channel.startswith("tool_result:"):
                # Risposta a un tool_call che abbiamo inviato tramite Redis
                call_id = channel.split(":", 1)[1]
                fut = self._remote_futures.pop(call_id, None)
                if fut and not fut.done():
                    fut.set_result(data)

    # ── Stream cross-instance ──

    async def publish_stream_chunk(self, user_id: str, chunk: dict):
        """Pubblica un chunk di streaming su Redis (per REST→WS relay)."""
        await self._redis.publish(f"stream:{user_id}", json.dumps(chunk))
```

---

## 4. Struttura Directory Proposta (MVP)

```
adiuva-api/
├── docker-compose.yml          # Orchestrazione completa
├── docker-compose.dev.yml      # Override per sviluppo locale
├── shared/                     # Codice condiviso (montato come volume)
│   ├── auth.py                 # JWT verification (chiave pubblica)
│   ├── schemas.py              # Pydantic schemas condivisi
│   ├── middleware/
│   │   ├── rate_limit.py       # DistributedRateLimiter (Redis)
│   │   └── sanitizer.py
│   └── models/
│       └── base.py             # SQLAlchemy base condivisa
│
├── auth-service/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py
│       ├── config.py
│       ├── db.py
│       ├── models.py           # users, refresh_tokens
│       ├── routes/
│       │   └── auth.py
│       └── services/
│           ├── jwt_service.py  # RS256 signing
│           └── user_service.py
│
├── chat-service/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py
│       ├── config.py
│       ├── db.py
│       ├── models.py           # memory_*
│       ├── routes/
│       │   ├── device_ws.py    # WS connection owner
│       │   └── chat.py         # REST fallback
│       ├── core/
│       │   ├── device_manager.py   # RedisDeviceManager
│       │   ├── deep_agent.py       # Home + floating chat
│       │   ├── memory_middleware.py
│       │   ├── ws_context.py
│       │   ├── output_formatter.py
│       │   └── llm.py
│       └── agents/                 # Tool definitions (used by deep_agent)
│           ├── task_agent.py
│           ├── project_agent.py
│           ├── note_agent.py
│           └── timeline_agent.py
│
├── agent-service/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py
│       ├── config.py
│       ├── db.py
│       ├── models.py           # agent_run_logs, local/cloud_agent_configs
│       ├── routes/
│       │   ├── agents.py       # catalog, can-create, trigger
│       │   └── agent_setup.py  # journey start/message
│       ├── core/
│       │   ├── agent_runner.py     # Batch classify → process
│       │   ├── agent_registry.py
│       │   ├── redis_executor.py   # execute_on_client via Redis pub/sub
│       │   └── llm.py
│       └── agents/
│           ├── task_agent.py       # Tool definitions (batch context)
│           ├── project_agent.py
│           ├── note_agent.py
│           ├── timeline_agent.py
│           └── filesystem_agent.py
│
├── billing-service/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py
│       ├── config.py
│       ├── db.py
│       ├── models.py           # subscriptions
│       ├── routes/
│       │   └── billing.py
│       └── services/
│           ├── stripe_service.py
│           └── tier_manager.py
│
└── infra/
    ├── traefik/
    │   └── traefik.yml
    ├── keys/
    │   ├── jwt_private.pem     # Solo auth-service
    │   └── jwt_public.pem      # Tutti i servizi
    └── alembic/                # Migrazioni condivise o per-servizio
```

---

## 5. Docker Compose — Configurazione MVP

```yaml
# docker-compose.yml

services:

  # ══════════════════════════════════════════════════════════
  # API Gateway
  # ══════════════════════════════════════════════════════════
  traefik:
    image: traefik:v3.2
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"   # Dashboard Traefik (disabilitare in prod)
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./infra/certs:/certs:ro
    restart: unless-stopped

  # ══════════════════════════════════════════════════════════
  # Auth Service (2 repliche)
  # ══════════════════════════════════════════════════════════
  auth-service:
    build: ./auth-service
    deploy:
      replicas: 2
    env_file: .env
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
      REDIS_URL: redis://redis:6379
      JWT_PRIVATE_KEY_FILE: /run/secrets/jwt_private_key
      SERVICE_NAME: auth
    secrets:
      - jwt_private_key
      - jwt_public_key
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.auth.rule=PathPrefix(`/api/v1/auth`)"
      - "traefik.http.services.auth.loadbalancer.server.port=8000"
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  # ══════════════════════════════════════════════════════════
  # Chat Service — Real-time WS + Chat (scalabile)
  # ══════════════════════════════════════════════════════════
  chat-service:
    build: ./chat-service
    deploy:
      replicas: 2
    env_file: .env
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
      REDIS_URL: redis://redis:6379
      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
      SERVICE_NAME: chat
    secrets:
      - jwt_public_key
    labels:
      - "traefik.enable=true"
      # REST chat endpoint
      - "traefik.http.routers.chat.rule=PathPrefix(`/api/v1/chat`)"
      - "traefik.http.services.chat.loadbalancer.server.port=8000"
      # WebSocket route con sticky session
      - "traefik.http.routers.ws.rule=PathPrefix(`/api/v1/ws`)"
      - "traefik.http.routers.ws.service=chat-ws"
      - "traefik.http.services.chat-ws.loadbalancer.server.port=8000"
      - "traefik.http.services.chat-ws.loadbalancer.sticky.cookie.name=ws_affinity"
      - "traefik.http.services.chat-ws.loadbalancer.sticky.cookie.httpOnly=true"
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  # ══════════════════════════════════════════════════════════
  # Agent Service — Batch processing (scalabile indipendentemente)
  # ══════════════════════════════════════════════════════════
  agent-service:
    build: ./agent-service
    deploy:
      replicas: 2
    env_file: .env
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
      REDIS_URL: redis://redis:6379
      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
      SERVICE_NAME: agent
    secrets:
      - jwt_public_key
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.agents.rule=PathPrefix(`/api/v1/agents`)"
      - "traefik.http.services.agents.loadbalancer.server.port=8000"
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  # ══════════════════════════════════════════════════════════
  # Billing Service (1 replica)
  # ══════════════════════════════════════════════════════════
  billing-service:
    build: ./billing-service
    deploy:
      replicas: 1
    env_file: .env
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/adiuva
      REDIS_URL: redis://redis:6379
      JWT_PUBLIC_KEY_FILE: /run/secrets/jwt_public_key
      SERVICE_NAME: billing
    secrets:
      - jwt_public_key
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.billing.rule=PathPrefix(`/api/v1/billing`)"
      - "traefik.http.services.billing.loadbalancer.server.port=8000"
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  # ══════════════════════════════════════════════════════════
  # Infrastruttura
  # ══════════════════════════════════════════════════════════
  db:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: adiuva
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
    restart: unless-stopped

  qdrant:
    image: qdrant/qdrant:latest
    volumes:
      - qdrant_data:/qdrant/storage
    restart: unless-stopped

secrets:
  jwt_private_key:
    file: ./infra/keys/jwt_private.pem
  jwt_public_key:
    file: ./infra/keys/jwt_public.pem

volumes:
  postgres_data:
  redis_data:
  qdrant_data:
```

---

## 6. Configurazione Cloudflare + VPS

### 6.1 DNS

```
api.tuodominio.com  →  A record  →  IP del VPS
                    →  Proxy: ON (orange cloud)
```

### 6.2 Cloudflare Settings

| Setting | Valore | Motivo |
|---------|--------|--------|
| SSL/TLS mode | **Full (Strict)** | Cloudflare ↔ VPS con certificato valido |
| WebSocket | **ON** | Necessario per `/api/v1/ws/device` |
| Proxy timeout | **100s** (Enterprise) o default | Le LLM calls possono durare 30s+ |
| Under Attack Mode | Off (attivare se necessario) | |

### 6.3 TLS sul VPS

Due opzioni:
- **Opzione A (consigliata)**: Cloudflare Origin Certificate → montato in Traefik
- **Opzione B**: Let's Encrypt via Traefik (con DNS challenge Cloudflare)

```yaml
# traefik.yml — con Cloudflare Origin Certificate
entryPoints:
  websecure:
    address: ":443"

tls:
  certificates:
    - certFile: /certs/origin.pem
      keyFile: /certs/origin-key.pem
```

### 6.4 Rete VPS

```bash
# UFW firewall — solo Cloudflare può raggiungere le porte 80/443
# https://www.cloudflare.com/ips/
ufw default deny incoming
ufw allow from 173.245.48.0/20 to any port 443
ufw allow from 103.21.244.0/22 to any port 443
# ... (tutti gli IP range di Cloudflare)
ufw allow ssh
ufw enable
```

---

## 7. Comunicazione Inter-Servizio

### 7.1 Redis Pub/Sub — Event Bus

```
┌──────────┐  tier_changed:user_123   ┌──────────┐
│ Billing  │ ────────────────────────► │   Auth   │
│ Service  │                           │ Service  │
└──────────┘                           └──────────┘

┌──────────┐  tool_call:user_123      ┌──────────┐
│  Agent   │ ────────────────────────► │   Chat   │
│ Service  │                           │ Service  │
│ (batch)  │ ◄────────────────────────│ (ha WS)  │
└──────────┘  tool_result:{call_id}    └──────────┘
```

### 7.2 Health Checks e Service Discovery

Traefik gestisce automaticamente il service discovery via Docker labels. I servizi non devono conoscersi tra loro — comunicano solo via:
- **Redis pub/sub** (tool-call cross-instance, tier events)
- **Redis hash** (stato condiviso: `ws:connections`, rate-limit counters)
- **PostgreSQL** (dati persistenti condivisi)

---

## 8. Piano di Migrazione Incrementale (MVP)

### Fase 1 — Preparazione (nel monolite attuale)
1. Aggiungere Redis al `docker-compose.yml` attuale
2. Migrare JWT da HS256 → RS256 (backward-compatible: accetta entrambi per un periodo)
3. Implementare `RedisDeviceManager` come drop-in replacement del singleton in-memory
4. Estrarre `shared/` con auth verification, schemas, middleware

### Fase 2 — Auth Service (primo split)
1. Estrarre `auth.py` routes + models in `auth-service/`
2. Verificare che i JWT firmati da `auth-service` vengano validati dal monolite
3. Aggiungere Traefik e routare `/api/v1/auth/*` al nuovo servizio
4. Il monolite continua a servire tutto il resto

### Fase 3 — Billing Service
1. Estrarre billing routes, Stripe service, tier manager
2. Configurare Redis pub/sub per `tier_changed` events
3. Routare via Traefik

### Fase 4 — Split Chat + Agent (il più delicato)
1. Il monolite residuo contiene WS + chat + agents
2. Separare Agent Service: estrarre `agent_runner`, `agent_registry`, `agent_setup`, route `/agents/*`
3. Implementare `redis_executor.py` nell'Agent Service per tool-call via Redis
4. Il Chat Service resta proprietario della WS e sottoscrive i canali `tool_call:{user_id}`
5. Testare: trigger agent dall'Agent Service → tool_call via Redis → Chat Service → WS → device → risposta

### Fase 5 — Scaling test
1. Scalare Chat Service a 2 repliche, verificare sticky sessions
2. Scalare Agent Service a 2 repliche, verificare batch processing distribuito
3. Monitoring (Prometheus + Grafana) per ogni servizio

---

## 9. Monitoraggio e Logging

```yaml
# Aggiungere al docker-compose.yml

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./infra/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped

  loki:
    image: grafana/loki:latest
    restart: unless-stopped
```

Ogni servizio espone `/metrics` (Prometheus) e scrive log strutturati (JSON) raccolti da Loki.

---

## 10. Sizing VPS Minimo Consigliato (MVP)

| Componente | CPU | RAM | Note |
|---|---|---|---|
| Traefik | 0.25 | 128MB | |
| Auth Service ×2 | 0.25 ×2 | 128MB ×2 | Stateless, leggero |
| Chat Service ×2 | 1.0 ×2 | 1GB ×2 | WS + streaming LLM |
| Agent Service ×2 | 0.75 ×2 | 512MB ×2 | Batch LLM, CPU-bound |
| Billing Service | 0.25 | 128MB | |
| PostgreSQL | 1.0 | 1GB | |
| Redis | 0.25 | 256MB | |
| Qdrant | 0.5 | 512MB | |
| **Totale MVP** | **~5.5 vCPU** | **~5 GB** | |

**Raccomandazione**: VPS con **8 vCPU / 16 GB RAM** per avere margine. Hetzner CPX41 (~€30/mese) o equivalente. Senza Storage/Plugin si risparmia ~1 vCPU e 512MB rispetto alla versione completa.

---

## Riepilogo Architettura MVP

| Servizio | Repliche | Proprietario di |
|---|---|---|
| **Traefik** | 1 | Routing, TLS, sticky sessions |
| **Auth Service** | 2 | JWT RS256, registrazione, login, profilo |
| **Chat Service** | 2–N | WebSocket, home/floating chat, streaming |
| **Agent Service** | 2–N | Batch processing, directory scan, agent setup |
| **Billing Service** | 1 | Stripe, subscriptions, tier management |

| Decisione | Scelta | Motivazione |
|---|---|---|
| API Gateway | Traefik | Nativo Docker, WebSocket support, service discovery automatico |
| JWT | RS256 (asimmetrico) | Verifica distribuita senza contattare Auth Service |
| Tier check | Claim nel JWT | Ogni servizio verifica localmente, zero roundtrip |
| WebSocket scaling | Redis pub/sub + sticky cookies | Cross-instance tool-call routing |
| Chat ↔ Agent split | Servizi separati | Batch CPU-bound non impatta real-time chat |
| Agent → Device comms | Redis pub/sub via Chat Service | Agent non possiede la WS, usa un relay |
| Rate limiting | Redis contatori distribuiti | Sliding window condivisa tra repliche |
| Database | PostgreSQL condiviso | Semplicità MVP; split DB futuro facile |
| TLS | Cloudflare Origin Certificate | Zero maintenance |
| Orchestrazione | Docker Compose | Sufficiente per un singolo VPS |
| Storage / Plugin | Post-MVP | Non critici per il lancio |