diff --git a/README.md b/README.md new file mode 100644 index 0000000..164794c --- /dev/null +++ b/README.md @@ -0,0 +1,713 @@ +# Adiuva Cloud API + +**AI-powered project management backend with E2E encrypted cloud storage, LLM orchestration, and a plugin marketplace.** + +Built with FastAPI · Python 3.12 · PostgreSQL · LangChain · Stripe · AWS S3 + +--- + +## Table of Contents + +- [Overview](#overview) +- [Architecture](#architecture) +- [Key Features](#key-features) +- [Tech Stack](#tech-stack) +- [Getting Started](#getting-started) +- [Docker Deployment](#docker-deployment) +- [Environment Variables](#environment-variables) +- [API Reference](#api-reference) +- [Data Model](#data-model) +- [AI Agent System](#ai-agent-system) +- [Orchestration & Execution Plans](#orchestration--execution-plans) +- [Middleware](#middleware) +- [Storage Layer](#storage-layer) +- [Billing & Tiers](#billing--tiers) +- [Plugin Marketplace](#plugin-marketplace) +- [Testing](#testing) +- [Project Structure](#project-structure) +- [License](#license) + +--- + +## Overview + +Adiuva Cloud API is the FastAPI backend that powers the **Adiuva Electron desktop app**. It provides LLM-powered chat orchestration, end-to-end encrypted cloud storage, a vector search engine, an encrypted backup system, a plugin marketplace with revenue sharing, and Stripe-based subscription billing across four tiers. + +### Design Principles + +1. **Never persist user data in plaintext** — the database stores only auth, billing, storage metadata, and marketplace data. All user content is E2E encrypted by the client before reaching the server. +2. **Never expose prompts** — system prompts stay server-side; responses are sanitized to strip any leaked prompt fragments. +3. **Never decrypt user blobs** — the backend performs only checksum verification; no decryption keys ever reach the server. +4. **Stateless request handling** — all context comes from the client and JWT; no server-side session state. +5. **Tier gates enforced server-side** — the server always reads the current tier from the database, never trusting client-reported values. + +--- + +## Architecture + +``` +┌──────────────┐ ┌────────────────────────────────────────────────────────┐ +│ Electron │ │ FastAPI (Uvicorn / Gunicorn) │ +│ Desktop App │────▶│ │ +│ (Client) │◀────│ Middleware: RateLimit → Sanitizer → CORS → Router │ +└──────────────┘ │ │ + │ ┌──────────────────┐ ┌────────────────────────────┐ │ + │ │ Auth Routes │ │ Chat Routes │ │ + │ │ Billing Routes │ │ ↓ │ │ + │ │ Storage Routes │ │ Orchestrator (GPT-4o-mini)│ │ + │ │ Backup Routes │ │ ↓ classify intent │ │ + │ │ Plugin Routes │ │ Agent Registry │ │ + │ │ Vector Routes │ │ ↓ │ │ + │ │ Plans Routes │ │ TaskAgent | ProjectAgent │ │ + │ └──────────────────┘ │ NoteAgent | CheckptAgent │ │ + │ │ (GPT-4o + LangChain) │ │ + │ └────────────────────────────┘ │ + └────────────────────────────────────────────────────────┘ + │ │ │ + ┌────────▼───┐ ┌───────▼───────┐ ┌──▼─────────────┐ + │ PostgreSQL │ │ AWS S3 │ │ Pinecone / │ + │ (Auth, │ │ (E2E blobs, │ │ Qdrant │ + │ Billing, │ │ backups) │ │ (Vectors) │ + │ Metadata) │ └───────────────┘ └────────────────┘ + └────────────┘ + │ + ┌────────▼───┐ + │ Stripe │ + │ (Billing, │ + │ Connect) │ + └────────────┘ +``` + +--- + +## Key Features + +1. **LLM-powered orchestration** — GPT-4o-mini classifies user intent and routes to the appropriate domain agent. +2. **4 specialized AI agents** — Tasks (8 tools), Projects (6 tools), Checkpoints (4 tools), Notes (5 tools), all powered by GPT-4o via LangChain. +3. **Execution plans & playbooks** — Server-side prompt template registry; clients receive only opaque template IDs, never raw prompts. +4. **E2E encrypted cloud storage** — The backend never decrypts user data; SHA-256 checksum verification uses constant-time comparison to prevent timing attacks. +5. **Cloud vector store** — Pinecone or Qdrant with user-isolated namespaces and encrypted blob payloads. +6. **Encrypted backup system** — Tiered storage limits with `If-Modified-Since` support for efficient syncing. +7. **Plugin marketplace** — Catalog, admin review/approval workflow, security checklist, and 70/30 revenue sharing via Stripe Connect. +8. **Stripe billing** — Four-tier subscription model (Free / Pro / Power / Team) with checkout sessions and full webhook lifecycle handling. +9. **JWT authentication** — Access + refresh tokens with bcrypt password hashing, SHA-256 token hashing, and automatic rotation. +10. **Prompt IP protection** — Sanitizer middleware strips system prompts, reasoning markers, tool schemas, and agent routing metadata from all chat responses. +11. **Tier-based rate limiting** — Sliding-window per-user limiter scaling from 20 to 200 requests/min by subscription tier. +12. **Zero-trust data model** — User content is never stored in plaintext; the database holds only authentication, billing, and metadata records. +13. **WebSocket streaming** — Real-time chat with 30-second heartbeat keep-alive and chunked text delivery. +14. **Alembic migrations** — Versioned schema management with seed data for the plugin marketplace. +15. **Comprehensive test suite** — In-memory SQLite + moto S3 mocks, per-tier test fixtures, and full API coverage without external dependencies. + +--- + +## Tech Stack + +| Package | Version | Purpose | +|---|---|---| +| `fastapi` | ≥ 0.115.0 | Web framework | +| `uvicorn[standard]` | ≥ 0.34.0 | ASGI development server | +| `gunicorn` | ≥ 22.0.0 | Production process manager | +| `langchain` | ≥ 0.3.0 | LLM orchestration framework | +| `langchain-openai` | ≥ 0.3.0 | OpenAI LLM provider integration | +| `litellm` | ≥ 1.50.0 | Universal LLM gateway (100+ providers) | +| `pydantic` | ≥ 2.10.0 | Data validation and serialization | +| `pydantic-settings` | ≥ 2.7.0 | Environment-based configuration | +| `python-jose[cryptography]` | ≥ 3.3.0 | JWT encoding and decoding | +| `stripe` | ≥ 11.0.0 | Billing and payment integration | +| `boto3` | ≥ 1.35.0 | AWS S3 client | +| `slowapi` | ≥ 0.1.9 | Rate limiting utilities | +| `sqlalchemy` | ≥ 2.0.0 | Async ORM and query builder | +| `asyncpg` | ≥ 0.30.0 | PostgreSQL async driver | +| `alembic` | ≥ 1.14.0 | Database migration management | +| `bcrypt` | ≥ 4.2.0 | Password hashing | +| `python-dotenv` | ≥ 1.0.0 | `.env` file loading | +| `httpx` | ≥ 0.28.0 | Async HTTP client (used in tests) | +| `websockets` | ≥ 14.0 | WebSocket protocol support | +| `psycopg2-binary` | ≥ 2.9.0 | Synchronous PostgreSQL driver (Alembic) | +| `pinecone` | ≥ 5.0.0 | Pinecone vector store client | +| `qdrant-client` | ≥ 1.7.0 | Qdrant vector store client | +| `pytest` | ≥ 8.0.0 | Test framework | +| `pytest-asyncio` | ≥ 0.24.0 | Async test support | +| `aiosqlite` | ≥ 0.20.0 | In-memory SQLite for tests | +| `moto[s3]` | ≥ 5.0.0 | AWS S3 mock for tests | +| `ruff` | ≥ 0.8.0 | Linter and formatter | + +--- + +## Getting Started + +### Prerequisites + +- Python 3.12+ +- PostgreSQL 16+ +- An OpenAI API key (for LLM features) +- Stripe API keys (optional — billing stubs gracefully when unconfigured) +- AWS credentials (optional — needed for S3 storage in production) + +### Installation + +```bash +# Clone the repository +git clone && cd adiuva-api + +# Create a virtual environment +python -m venv .venv && source .venv/bin/activate + +# Install dependencies +pip install -r requirements.txt + +# Configure environment +cp .env.example .env +# Edit .env with your DATABASE_URL, OPENAI_API_KEY, etc. +``` + +### Database Setup + +```bash +# Start PostgreSQL (or use the Docker Compose database) +docker compose up db -d + +# Run migrations +alembic upgrade head +``` + +### Run the Development Server + +```bash +uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 +``` + +Interactive API docs are available at [http://localhost:8000/docs](http://localhost:8000/docs) in development mode (`ENV=dev`). The `/docs` endpoint is disabled in production. + +--- + +## Docker Deployment + +### Quick Start + +```bash +docker compose up --build +``` + +This starts two services: + +- **app** — FastAPI server on port `8000` +- **db** — PostgreSQL 16 (Alpine) on port `5432` with a persistent volume and health checks + +### Dockerfile Details + +The Dockerfile uses a multi-stage build: + +1. **Builder stage** — Installs Python dependencies into a virtual environment. +2. **Runtime stage** — Copies only the venv, app source, and Alembic migrations. Runs as a non-root user (`appuser`). +3. **Production server** — Gunicorn with 4 Uvicorn workers, 120-second timeout, listening on port 8000. + +```bash +# Production command (run by the container) +gunicorn app.main:app -k uvicorn.workers.UvicornWorker -w 4 --timeout 120 -b 0.0.0.0:8000 +``` + +--- + +## Environment Variables + +All variables are loaded from a `.env` file via Pydantic Settings. Source: `app/config/settings.py` + +| Variable | Type | Default | Description | +|---|---|---|---| +| `DATABASE_URL` | `str` | `postgresql+asyncpg://postgres:postgres@localhost:5432/adiuva` | Async SQLAlchemy connection string | +| `JWT_SECRET` | `str` | `change-me-in-production` | HMAC secret for JWT signing | +| `JWT_ALGORITHM` | `str` | `HS256` | JWT signing algorithm | +| `JWT_ACCESS_TOKEN_EXPIRE_MINUTES` | `int` | `30` | Access token time-to-live | +| `JWT_REFRESH_TOKEN_EXPIRE_DAYS` | `int` | `30` | Refresh token time-to-live | +| `STRIPE_SECRET_KEY` | `str` | `""` | Stripe API key (empty = stub mode) | +| `STRIPE_WEBHOOK_SECRET` | `str` | `""` | Stripe webhook signature secret | +| `S3_BUCKET` | `str` | `""` | S3 bucket for encrypted blobs and backups | +| `S3_REGION` | `str` | `us-east-1` | AWS region | +| `AWS_ACCESS_KEY_ID` | `str` | `""` | AWS credentials | +| `AWS_SECRET_ACCESS_KEY` | `str` | `""` | AWS credentials | +| `PINECONE_API_KEY` | `str` | `""` | Pinecone API key (if set, Pinecone is used for vectors) | +| `PINECONE_INDEX` | `str` | `adiuva` | Pinecone index name | +| `QDRANT_URL` | `str` | `""` | Qdrant URL (used when Pinecone is not configured) | +| `QDRANT_API_KEY` | `str` | `""` | Qdrant API key | +| `OPENAI_API_KEY` | `str` | `""` | OpenAI key for LLM agent calls | +| `LLM_MODEL` | `str` | `gpt-4o` | LiteLLM model identifier for agents (e.g. `anthropic/claude-3.5-sonnet`, `gemini/gemini-pro`, `ollama/llama3`) | +| `LLM_ROUTER_MODEL` | `str` | `gpt-4o-mini` | Lighter model used for intent classification / routing | +| `CORS_ORIGINS` | `list[str]` | `["app://.", "http://localhost:3000", "http://localhost:5173"]` | Allowed CORS origins | +| `ENV` | `Literal` | `dev` | `dev` or `prod` — controls `/docs` visibility and SQL echo | + +--- + +## API Reference + +All routes are prefixed with `/api/v1`. **27 endpoints** total (25 REST + 1 WebSocket + 1 health check). + +### Health + +| Method | Path | Auth | Description | +|---|---|---|---| +| `GET` | `/api/v1/health` | No | Returns `{"status": "ok", "version": "0.1.0"}` | + +### Auth + +| Method | Path | Auth | Description | +|---|---|---|---| +| `POST` | `/api/v1/auth/register` | No | Create account with bcrypt-hashed password, returns `AuthTokens` | +| `POST` | `/api/v1/auth/login` | No | Validate credentials, returns `AuthTokens` | +| `POST` | `/api/v1/auth/refresh` | No | Rotate refresh token, returns new `AuthTokens` | +| `GET` | `/api/v1/auth/me` | JWT | Returns `UserProfile` for the authenticated user | + +### Chat + +| Method | Path | Auth | Description | +|---|---|---|---| +| `POST` | `/api/v1/chat` | JWT | Route message through the orchestrator; returns `ChatResponse` or `ExecutionPlan` depending on execution mode | +| `WS` | `/api/v1/chat/stream` | JWT (query param `?token=`) | Streaming chat — first frame is a `ChatRequest`, server yields text chunks, final frame is `{"done": true, "response": "...", "actions": [...]}`. 30-second heartbeat ping. | + +### Plans + +| Method | Path | Auth | Description | +|---|---|---|---| +| `GET` | `/api/v1/plans/playbook` | JWT | List all cached execution plan playbooks | +| `GET` | `/api/v1/plans/playbook/{plan_id}` | JWT | Retrieve a specific playbook by ID | + +### Storage (Cloud Records) + +| Method | Path | Auth | Description | +|---|---|---|---| +| `POST` | `/api/v1/storage/records` | JWT | Upload an E2E encrypted record (verifies checksum, enforces storage quota) | +| `GET` | `/api/v1/storage/records` | JWT | List record metadata with pagination (`?table`, `?page`, `?limit`); no blob bytes returned | +| `GET` | `/api/v1/storage/records/{id}` | JWT | Download encrypted blob with `X-Checksum` response header | +| `PUT` | `/api/v1/storage/records/{id}` | JWT | Replace an existing blob (verifies checksum, enforces quota) | +| `DELETE` | `/api/v1/storage/records/{id}` | JWT | Delete a record and its S3 blob | + +### Vectors (Cloud Vector Store) + +| Method | Path | Auth | Description | +|---|---|---|---| +| `POST` | `/api/v1/storage/vectors/upsert` | JWT | Verify checksums and upsert encrypted vectors | +| `POST` | `/api/v1/storage/vectors/search` | JWT | Search user-scoped vector namespace | +| `DELETE` | `/api/v1/storage/vectors` | JWT | Delete vectors by ID list | + +### Backup + +| Method | Path | Auth | Description | +|---|---|---|---| +| `PUT` | `/api/v1/backup` | JWT | Upload encrypted backup blob with custom headers (`X-Backup-Version`, `X-Backup-Timestamp`, `X-Backup-Checksum`). Tier quota enforced. | +| `GET` | `/api/v1/backup` | JWT | Download latest backup blob. Supports `If-Modified-Since`. | +| `GET` | `/api/v1/backup/history` | JWT | List backup metadata (no blob content) | +| `DELETE` | `/api/v1/backup/{backup_id}` | JWT | Delete a specific backup | + +### Plugins (Marketplace) + +| Method | Path | Auth | Description | +|---|---|---|---| +| `GET` | `/api/v1/plugins` | JWT (Power+) | Browse the marketplace (`?category`, `?q`, `?page`, `?sort=rating\|installs\|newest`) | +| `GET` | `/api/v1/plugins/{id}` | JWT (Power+) | Plugin detail with install count and ratings | +| `POST` | `/api/v1/plugins/{id}/install` | JWT (Power+) | Install plugin; triggers Stripe Connect revenue split for paid plugins | +| `DELETE` | `/api/v1/plugins/{id}/install` | JWT | Uninstall plugin | + +### Billing + +| Method | Path | Auth | Description | +|---|---|---|---| +| `POST` | `/api/v1/billing/checkout` | JWT | Create a Stripe checkout session, returns `{"checkout_url": "..."}` | +| `POST` | `/api/v1/billing/webhook` | Stripe signature | Handle Stripe events: `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, `invoice.payment_failed` | +| `GET` | `/api/v1/billing/subscription` | JWT | Get current subscription information | +| `DELETE` | `/api/v1/billing/subscription` | JWT | Cancel subscription and revert to free tier | + +--- + +## Data Model + +9 tables managed by Alembic migrations. Source: `app/models.py` + +### Tables + +| Table | Primary Key | Key Columns | Purpose | +|---|---|---|---| +| `users` | `id` (UUID) | `email` (unique), `password_hash`, `tier`, `stripe_customer_id`, timestamps | User accounts | +| `refresh_tokens` | `id` (UUID) | `user_id` (FK), `token_hash` (SHA-256, unique), `expires_at` | Hashed refresh tokens for rotation | +| `subscriptions` | `id` (UUID) | `user_id` (FK, unique), `stripe_subscription_id`, `tier`, `status`, `current_period_end` | Stripe subscription records | +| `storage_records` | `id` (UUID) | `user_id` (FK), `table_name`, `s3_key`, `checksum`, `size_bytes`, timestamps | S3 blob metadata (no plaintext content) | +| `backup_metadata` | `id` (UUID) | `user_id` (FK), `s3_key`, `version`, `timestamp`, `checksum`, `size_bytes` | Backup manifests | +| `plugins` | `id` (String) | `name`, `description`, `version`, `author_id` (FK), `category`, `price_cents`, `permissions` (JSON), `status`, `s3_package_key`, `install_count`, `avg_rating` | Marketplace plugin catalog | +| `plugin_installations` | `id` (UUID) | `plugin_id` (FK), `user_id` (FK), unique constraint on (`plugin_id`, `user_id`) | Per-user install tracking | +| `plugin_reviews` | `id` (UUID) | `plugin_id` (FK), `reviewer_id` (FK), `decision`, `notes`, `reviewed_at` | Admin review decisions | +| `revenue_events` | `id` (UUID) | `plugin_id` (FK), `user_id` (FK), `amount_cents`, `developer_share_cents`, `stripe_transfer_id` | 70/30 revenue split ledger | + +### Enum Types + +| Enum | Values | +|---|---| +| `billing_tier` | `free`, `pro`, `power`, `team` | +| `plugin_status` | `pending_review`, `approved`, `rejected` | +| `review_decision` | `approved`, `rejected` | + +### Migrations + +| Version | Description | +|---|---| +| `001_initial_schema` | Creates all 9 tables with indexes and foreign key constraints | +| `002_seed_plugins` | Seeds 3 approved plugins: GitHub Sync (free), Slack Notifier (€4.99), Time Tracker (€9.99) | + +--- + +## AI Agent System + +The agent system uses a registry pattern with LangChain tool-calling agents powered by GPT-4o. Source: `app/agents/`, `app/core/agent_registry.py` + +### Architecture + +- **`BaseAgent`** — Abstract base with `user_id`, `shared_memory`, and `vector_store_context`. +- **`ChatAgent(BaseAgent)`** — Abstract `handle(query, context)` and `get_tools()` methods, plus a shared `_tool_loop(llm, messages, tools, max_iter=5)` for iterative tool calling. +- **`AgentRegistry`** — Singleton registry with `@register` decorator, `get(name)`, `list_agents()`, and `call_agent(name, query, context)`. + +### Registered Agents + +| Agent | Registry Name | Tools | Description | +|---|---|---|---| +| **TaskAgent** | `task_agent` | 8 | Full task and comment CRUD. Status: `todo` / `in_progress` / `done`. Priority: `high` / `medium` / `low`. Tools: `list_tasks`, `create_task`, `update_task`, `delete_task`, `list_tasks_due_today`, `list_task_comments`, `add_task_comment`, `delete_task_comment` | +| **ProjectAgent** | `project_agent` | 6 | Project lifecycle management. Status: `active` / `archived`. Prefers archiving over deletion. Tools: `list_projects`, `list_all_projects`, `get_project`, `create_project`, `update_project`, `delete_project` | +| **CheckpointAgent** | `checkpoint_agent` | 4 | Project milestones. Requires `project_id` for creation. Supports AI-suggestion and approval workflows. Tools: `list_checkpoints`, `create_checkpoint`, `update_checkpoint`, `delete_checkpoint` | +| **NoteAgent** | `note_agent` | 5 | Markdown note management. Optionally linked to projects. Tools: `list_notes`, `get_note`, `create_note`, `update_note`, `delete_note` | + +All agents use the model configured by `LLM_MODEL` (default: GPT-4o) with `temperature=0` via LiteLLM. Tools return JSON action descriptors that the Electron client interprets and applies locally. + +### Switching LLM Providers + +The backend uses **LiteLLM** as a universal LLM gateway. All agents and the orchestrator instantiate models through a centralized factory in `app/core/llm.py`. To switch providers, change environment variables — no code changes required: + +```bash +# OpenAI (default) +LLM_MODEL=gpt-4o +LLM_ROUTER_MODEL=gpt-4o-mini + +# Anthropic +LLM_MODEL=anthropic/claude-3.5-sonnet +LLM_ROUTER_MODEL=anthropic/claude-3-haiku + +# Google Gemini +LLM_MODEL=gemini/gemini-pro +LLM_ROUTER_MODEL=gemini/gemini-flash + +# Local Ollama +LLM_MODEL=ollama/llama3 +LLM_ROUTER_MODEL=ollama/llama3 + +# AWS Bedrock +LLM_MODEL=bedrock/anthropic.claude-v2 +LLM_ROUTER_MODEL=bedrock/anthropic.claude-instant-v1 +``` + +See the [LiteLLM provider docs](https://docs.litellm.ai/docs/providers) for the full list of 100+ supported providers and model naming conventions. + +--- + +## Orchestration & Execution Plans + +Source: `app/core/orchestrator.py`, `app/core/execution_plan.py` + +### Orchestrator + +1. **`classify_intent(message, context, registry)`** — Uses the router model (`LLM_ROUTER_MODEL`, default: GPT-4o-mini) to determine which agent should handle a message. Falls back to `task_agent` when classification is ambiguous. +2. **`route_single(agent_name, message, context)`** — Routes to a single agent and returns a `ChatResponse`. +3. **`route_pipeline(agent_names, message, context)`** — Executes agents sequentially; each receives `previous_results` from earlier agents. A final LLM synthesis step merges all results. +4. **`orchestrate(request)`** — Main entry point. In `direct` mode, returns a `ChatResponse`. In `plan` mode, returns an `ExecutionPlan`. +5. **`orchestrate_stream(request)`** — Streaming variant that yields 50-character text chunks with a final JSON frame. + +### Execution Plans + +- **`PromptTemplateRegistry`** — Maps template IDs to server-side prompt text. Clients only ever see opaque IDs, never raw prompts. +- **`ExecutionPlanBuilder`** — Fluent builder API: `add_step()`, `add_llm_step(template_id, vars)`, `add_data_step(action, data_from_step)`. Validates step references on `build()`. +- **`PlanCache`** — LRU cache (maxsize 1000) for storing plans as reusable playbooks. + +### Built-in Templates (6) + +`tpl_task_agent_default`, `tpl_checkpoint_agent_default`, `tpl_project_agent_default`, `tpl_note_agent_default`, `tpl_task_extract_from_project`, `tpl_note_weekly_summary` + +### Built-in Playbooks (2) + +| Playbook | Description | +|---|---| +| `create_tasks_from_project` | LLM extracts actionable tasks from project context, then creates task records | +| `generate_weekly_note` | LLM generates a weekly summary, then creates a note record | + +--- + +## Middleware + +Middleware executes in this order on each request: **TierRateLimit → Sanitizer → CORS → Router** + +### JWT Authentication + +Source: `app/api/middleware/auth.py` + +- FastAPI dependency `get_current_user` validates the `Bearer` JWT and extracts `user_id` and `email`. +- **Live tier lookup** — The current tier is fetched from the `subscriptions` table on every request (not cached in the JWT), so upgrades and downgrades take immediate effect. +- Falls back to `free` when no subscription row exists. +- Raises `401 Unauthorized` on invalid or expired tokens. +- **Exempt paths:** `/api/v1/auth/register`, `/api/v1/auth/login`, `/api/v1/billing/webhook` + +### Tier-Based Rate Limiter + +Source: `app/api/middleware/rate_limit.py` + +- `TierRateLimitMiddleware` — Sliding-window in-process rate limiter (no Redis dependency). +- Per-user 60-second window sized by subscription tier: + +| Tier | Requests / Minute | +|---|---| +| Free | 20 | +| Pro | 60 | +| Power | 120 | +| Team | 200 | + +- Returns `429 Too Many Requests` with a `Retry-After` header when the limit is exceeded. +- **Exempt paths:** register, login, webhook, health + +### Response Sanitizer + +Source: `app/api/middleware/sanitizer.py` + +- Runs only on `/api/v1/chat` endpoints. +- Scans JSON response bodies and replaces leaked prompt IP fragments with `[REDACTED]`. +- Detects: system prompt openers, agent routing metadata, LangChain tool schemas, internal reasoning markers (``, `[INST]`), and known prompt fingerprints. +- Logs sanitization events as `WARNING`. +- Binary responses (storage, backup) are never touched. + +--- + +## Storage Layer + +### Blob Store + +Source: `app/storage/blob_store.py` + +- S3-backed storage for E2E encrypted blobs. +- Object keys follow the pattern: `{user_id}/{table}/{record_id}` +- Server-side SSE-S3 encryption at rest (additional layer on top of client-side E2E encryption). +- Methods: `upload()`, `download()`, `delete()` (idempotent), `list_keys()` +- The backend **never inspects or decrypts blob content**. + +### Vector Store + +Source: `app/storage/vector_store.py` + +- Runtime-configurable: **Pinecone** (when `PINECONE_API_KEY` is set) or **Qdrant** (fallback). +- User isolation: Pinecone uses `namespace=user_id`; Qdrant filters by `user_id` payload field. +- 32-dimensional SHA-256-derived float vectors (deterministic, not semantically meaningful on encrypted data — a documented trade-off for privacy). +- Encrypted blobs are stored as base64 in metadata/payload for verbatim retrieval. +- Methods: `upsert()`, `search()`, `delete()` + +### Encryption Utilities + +Source: `app/storage/encryption.py` + +- `verify_checksum(blob, checksum)` — SHA-256 hash comparison using `hmac.compare_digest` (constant-time to prevent timing attacks). +- `reject_if_tampered(blob, checksum)` — Raises HTTP 400 on checksum mismatch. +- **No decryption key ever reaches the backend.** + +--- + +## Billing & Tiers + +Source: `app/billing/stripe_service.py`, `app/billing/tier_manager.py` + +### Feature Matrix + +| Feature | Free | Pro | Power | Team | +|---|---|---|---|---| +| AI Agents | 3 | Unlimited | Unlimited | Unlimited | +| Batch Active | 2 | 10 | Unlimited | Unlimited | +| Cloud Storage | 0 GB | 5 GB | 25 GB | Unlimited | +| Backup Storage | 0 GB | 5 GB | 25 GB | Unlimited | +| LLM Providers | 1 | Unlimited | Unlimited | Unlimited | +| Batch Builder | — | — | ✓ | ✓ | +| Plugin Marketplace | — | — | ✓ | ✓ | +| SSO | — | — | — | ✓ | +| Rate Limit | 20 req/min | 60 req/min | 120 req/min | 200 req/min | + +### Stripe Integration + +- **Checkout** — `create_checkout_session(user_id, tier)` creates a Stripe Checkout session. Returns a stub URL when Stripe is not configured. +- **Webhooks** — Handles `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, and `invoice.payment_failed`. +- **Subscription management** — `get_subscription()` returns the current subscription record; `cancel_subscription()` cancels via the Stripe API and reverts the user to the free tier. +- **Price IDs:** `price_pro_monthly`, `price_power_monthly`, `price_team_monthly` + +### Tier Manager + +- `get_tier(user_id)` — Returns the user's current billing tier. +- `check_feature(tier, feature)` — Boolean feature gate check. +- `require_feature(tier, feature)` — Raises HTTP 403 if the feature is not available. +- `enforce_quota(user_id, tier)` / `enforce_backup_quota(user_id, tier)` — Raises HTTP 402 if storage limits are exceeded. + +--- + +## Plugin Marketplace + +Source: `app/marketplace/` + +### Plugin Registry + +- PostgreSQL-backed catalog of submitted and approved plugins. +- `list_plugins(db, category, query, page, sort)` — Paginated listing (page size: 20) with optional filtering by category, text search, and sorting by `rating`, `installs`, or `newest`. +- `get_plugin(db, plugin_id)` — Full manifest with install count and ratings. +- `submit_plugin(db, manifest, s3_key)` — Submits a plugin with `pending_review` status. +- `approve_plugin()` / `reject_plugin(reason)` — Admin workflow for plugin approval. +- `record_install()` / `record_uninstall()` — Tracks per-user installations and updates install counts. + +### Review Queue + +- Automated security checklist before human review: + - Plugin ID must match `^[a-z0-9-]+$` + - Permissions must be from the allowed set only + - No binary blobs in the manifest +- **Allowed permissions:** `read:tasks`, `write:tasks`, `read:projects`, `write:projects`, `read:notes`, `write:notes`, `read:checkpoints`, `write:checkpoints`, `read:calendar`, `write:calendar` +- `get_pending(db)` — Lists plugins awaiting review. +- `submit_review(db, plugin_id, reviewer_id, decision, notes)` — Records the review decision. + +### Revenue Sharing + +- **70% developer / 30% platform** split on all paid plugin sales. +- `record_install(db, plugin_id, user_id, amount_cents)` — Records the revenue event and triggers a Stripe Connect transfer for the developer share. +- `get_earnings(db, developer_id, period)` — Aggregated earnings report for plugin developers. +- Gracefully stubs transfers when Stripe is not configured. + +### Seed Plugins + +| Plugin | Category | Price | +|---|---|---| +| GitHub Sync | Productivity | Free | +| Slack Notifier | Communication | €4.99 | +| Time Tracker | Productivity | €9.99 | + +--- + +## Testing + +### Running Tests + +```bash +# Run all tests +pytest + +# Run a specific test file +pytest tests/test_auth.py + +# Run with verbose output +pytest -v +``` + +### Test Infrastructure + +- **Database:** Async SQLite in-memory via `aiosqlite` + `StaticPool` — fast, no PostgreSQL needed. +- **S3 mock:** `moto[s3]` with a fixture that patches `BlobStore` settings. +- **Auth helpers:** `make_jwt(tier)` and `auth_header(tier)` generate per-tier test tokens. +- **Seed data:** Auto-creates one `User` + `Subscription` per tier (free/pro/power/team) before each test. +- **Plugin seeds:** Fixture adds 3 approved plugins for marketplace tests. +- **FK enforcement:** SQLite `PRAGMA foreign_keys=ON`. +- **No external dependencies** — all tests run fully offline. + +### Test Coverage + +| File | Coverage | +|---|---| +| `test_auth.py` | Register, login, token access, refresh, expiration | +| `test_orchestrator.py` | Intent classification, single agent routing, pipeline, plan mode | +| `test_agents.py` | Each agent with mocked LLM: registration, tools, handle method | +| `test_storage.py` | Create, list, download, update, delete records; checksum rejection; quota enforcement | +| `test_backup.py` | Upload, download, history, delete; tier-based storage limits | +| `test_plugins.py` | List, install, uninstall, revenue events, tier gate enforcement | +| `test_agent_registry.py` | Registry singleton, registration, lookup, listing | +| `test_execution_plan.py` | Plan builder, template registry, plan cache | +| `test_middleware.py` | Rate limiting by tier, sanitizer prompt leak detection | + +--- + +## Project Structure + +``` +adiuva-api/ +├── alembic.ini # Alembic configuration +├── BACKEND_PLAN.md # Architecture & design decisions +├── docker-compose.yml # Docker Compose (app + PostgreSQL) +├── Dockerfile # Multi-stage production build +├── requirements.txt # Python dependencies +│ +├── alembic/ # Database migrations +│ ├── env.py # Alembic environment config +│ ├── script.py.mako # Migration template +│ └── versions/ +│ ├── 001_initial_schema.py # Tables, indexes, FKs +│ └── 002_seed_plugins.py # Seed marketplace plugins +│ +├── app/ # Application source +│ ├── main.py # FastAPI app factory, middleware, routes +│ ├── db.py # Async SQLAlchemy engine & session +│ ├── models.py # SQLAlchemy ORM models (9 tables) +│ ├── schemas.py # Pydantic request/response schemas +│ │ +│ ├── config/ +│ │ └── settings.py # Pydantic Settings (env vars) +│ │ +│ ├── agents/ # LLM-powered domain agents +│ │ ├── task_agent.py # Task & comment CRUD (8 tools) +│ │ ├── project_agent.py # Project lifecycle (6 tools) +│ │ ├── checkpoint_agent.py # Milestones (4 tools) +│ │ └── note_agent.py # Markdown notes (5 tools) +│ │ +│ ├── core/ # Orchestration engine +│ │ ├── agent_registry.py # BaseAgent, ChatAgent, AgentRegistry +│ │ ├── llm.py # LiteLLM factory (get_llm, get_router_llm) +│ │ ├── orchestrator.py # Intent classification & routing +│ │ └── execution_plan.py # Plan builder, templates, cache +│ │ +│ ├── api/ # HTTP layer +│ │ ├── deps.py # Shared FastAPI dependencies +│ │ ├── middleware/ +│ │ │ ├── auth.py # JWT validation, live tier lookup +│ │ │ ├── rate_limit.py # Sliding-window tier rate limiter +│ │ │ └── sanitizer.py # Prompt IP leak protection +│ │ └── routes/ +│ │ ├── auth.py # Register, login, refresh, me +│ │ ├── chat.py # Chat + WebSocket streaming +│ │ ├── plans.py # Execution plan playbooks +│ │ ├── storage.py # E2E encrypted record CRUD +│ │ ├── vectors.py # Vector upsert, search, delete +│ │ ├── backup.py # Encrypted backup management +│ │ ├── plugins.py # Marketplace browse & install +│ │ └── billing.py # Stripe checkout & webhooks +│ │ +│ ├── storage/ # Storage backends +│ │ ├── blob_store.py # S3 blob storage +│ │ ├── vector_store.py # Pinecone / Qdrant vector store +│ │ └── encryption.py # Checksum verification utilities +│ │ +│ ├── billing/ # Subscription management +│ │ ├── stripe_service.py # Stripe API integration +│ │ └── tier_manager.py # Feature matrix & quota enforcement +│ │ +│ └── marketplace/ # Plugin ecosystem +│ ├── plugin_registry.py # Catalog CRUD & search +│ ├── plugin_review.py # Security checklist & review queue +│ └── revenue_share.py # 70/30 split & Stripe Connect +│ +└── tests/ # Test suite + ├── conftest.py # Fixtures: DB, S3, auth, seeds + ├── test_auth.py + ├── test_orchestrator.py + ├── test_agents.py + ├── test_storage.py + ├── test_backup.py + ├── test_plugins.py + ├── test_agent_registry.py + ├── test_execution_plan.py + └── test_middleware.py +``` + +--- + +## License + +*To be determined.* diff --git a/app/agents/checkpoint_agent.py b/app/agents/checkpoint_agent.py index 9410aab..a42f865 100644 --- a/app/agents/checkpoint_agent.py +++ b/app/agents/checkpoint_agent.py @@ -7,10 +7,9 @@ from typing import Any from langchain_core.messages import HumanMessage, SystemMessage from langchain_core.tools import tool -from langchain_openai import ChatOpenAI -from app.config.settings import settings from app.core.agent_registry import ChatAgent, registry +from app.core.llm import get_llm _SYSTEM_PROMPT = ( "You are a project checkpoint assistant. Checkpoints are milestone dates that\n" @@ -112,7 +111,7 @@ class CheckpointAgent(ChatAgent): return [list_checkpoints, create_checkpoint, update_checkpoint, delete_checkpoint] async def handle(self, query: str, context: dict[str, Any]) -> str: - llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=settings.OPENAI_API_KEY) + llm = get_llm() messages = [ SystemMessage(content=_SYSTEM_PROMPT), HumanMessage( diff --git a/app/agents/note_agent.py b/app/agents/note_agent.py index 65898cc..905820e 100644 --- a/app/agents/note_agent.py +++ b/app/agents/note_agent.py @@ -7,10 +7,9 @@ from typing import Any from langchain_core.messages import HumanMessage, SystemMessage from langchain_core.tools import tool -from langchain_openai import ChatOpenAI -from app.config.settings import settings from app.core.agent_registry import ChatAgent, registry +from app.core.llm import get_llm _SYSTEM_PROMPT = ( "You are a note-taking assistant. You help users create, retrieve, update,\n" @@ -113,7 +112,7 @@ class NoteAgent(ChatAgent): return [list_notes, get_note, create_note, update_note, delete_note] async def handle(self, query: str, context: dict[str, Any]) -> str: - llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=settings.OPENAI_API_KEY) + llm = get_llm() messages = [ SystemMessage(content=_SYSTEM_PROMPT), HumanMessage( diff --git a/app/agents/project_agent.py b/app/agents/project_agent.py index 1054386..b8bc14f 100644 --- a/app/agents/project_agent.py +++ b/app/agents/project_agent.py @@ -7,10 +7,9 @@ from typing import Any from langchain_core.messages import HumanMessage, SystemMessage from langchain_core.tools import tool -from langchain_openai import ChatOpenAI -from app.config.settings import settings from app.core.agent_registry import ChatAgent, registry +from app.core.llm import get_llm _SYSTEM_PROMPT = ( "You are a project management assistant. You help users create, find,\n" @@ -148,7 +147,7 @@ class ProjectAgent(ChatAgent): ] async def handle(self, query: str, context: dict[str, Any]) -> str: - llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=settings.OPENAI_API_KEY) + llm = get_llm() messages = [ SystemMessage(content=_SYSTEM_PROMPT), HumanMessage( diff --git a/app/agents/task_agent.py b/app/agents/task_agent.py index df1d3c0..07ac619 100644 --- a/app/agents/task_agent.py +++ b/app/agents/task_agent.py @@ -7,10 +7,9 @@ from typing import Any from langchain_core.messages import HumanMessage, SystemMessage from langchain_core.tools import tool -from langchain_openai import ChatOpenAI -from app.config.settings import settings from app.core.agent_registry import ChatAgent, registry +from app.core.llm import get_llm _SYSTEM_PROMPT = ( "You are a task management assistant for a project workspace.\n" @@ -219,7 +218,7 @@ class TaskAgent(ChatAgent): ] async def handle(self, query: str, context: dict[str, Any]) -> str: - llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=settings.OPENAI_API_KEY) + llm = get_llm() messages = [ SystemMessage(content=_SYSTEM_PROMPT), HumanMessage( diff --git a/app/config/settings.py b/app/config/settings.py index c9d7042..ec522c2 100644 --- a/app/config/settings.py +++ b/app/config/settings.py @@ -24,6 +24,9 @@ class Settings(BaseSettings): OPENAI_API_KEY: str = "" + LLM_MODEL: str = "gpt-4o" + LLM_ROUTER_MODEL: str = "gpt-4o-mini" + CORS_ORIGINS: list[str] = ["app://.", "http://localhost:3000", "http://localhost:5173"] ENV: Literal["dev", "prod"] = "dev" diff --git a/app/core/llm.py b/app/core/llm.py new file mode 100644 index 0000000..2787d00 --- /dev/null +++ b/app/core/llm.py @@ -0,0 +1,68 @@ +"""LLM factory — centralised model instantiation via LiteLLM. + +Every agent and the orchestrator call ``get_llm()`` or ``get_router_llm()`` +instead of directly constructing a provider-specific class. The model string +follows the `LiteLLM model naming convention +`_: + +* OpenAI: ``gpt-4o``, ``gpt-4o-mini`` +* Anthropic: ``anthropic/claude-3.5-sonnet`` +* Google: ``gemini/gemini-pro`` +* Ollama: ``ollama/llama3`` +* Bedrock: ``bedrock/anthropic.claude-v2`` + +Switch providers by changing **LLM_MODEL** / **LLM_ROUTER_MODEL** in ``.env`` +— no code changes required. +""" + +from __future__ import annotations + +from langchain_openai import ChatOpenAI +from litellm import get_supported_openai_params # noqa: F401 – validates install + +from app.config.settings import settings + + +def _api_key_for_model(model: str) -> str | None: + """Return the most appropriate API key for the given LiteLLM model string.""" + if model.startswith("anthropic/"): + return getattr(settings, "ANTHROPIC_API_KEY", None) or None + if model.startswith("gemini/") or model.startswith("google/"): + return getattr(settings, "GOOGLE_API_KEY", None) or None + # Default: OpenAI-compatible (covers plain model names like "gpt-4o") + return settings.OPENAI_API_KEY or None + + +def get_llm( + *, + model: str | None = None, + temperature: float = 0, +) -> ChatOpenAI: + """Return a LangChain chat model backed by LiteLLM. + + LiteLLM exposes an OpenAI-compatible API, so we use ``ChatOpenAI`` pointed + at the LiteLLM proxy endpoint. In practice, ``litellm`` patches the + ``openai`` client transparently when the model string contains a provider + prefix (``anthropic/…``, ``gemini/…``, etc.). + + Parameters + ---------- + model: + LiteLLM model identifier. Defaults to ``settings.LLM_MODEL``. + temperature: + Sampling temperature. ``0`` = deterministic. + """ + model = model or settings.LLM_MODEL + return ChatOpenAI( + model=model, + temperature=temperature, + api_key=_api_key_for_model(model), + ) + + +def get_router_llm( + *, + temperature: float = 0, +) -> ChatOpenAI: + """Return the lighter model used for intent classification / routing.""" + return get_llm(model=settings.LLM_ROUTER_MODEL, temperature=temperature) diff --git a/app/core/orchestrator.py b/app/core/orchestrator.py index 77d7d9f..4b5afac 100644 --- a/app/core/orchestrator.py +++ b/app/core/orchestrator.py @@ -6,10 +6,9 @@ import json from typing import Any, AsyncGenerator from langchain_core.messages import HumanMessage, SystemMessage -from langchain_openai import ChatOpenAI -from app.config.settings import settings from app.core.agent_registry import AgentRegistry +from app.core.llm import get_router_llm from app.core.agent_registry import registry as _default_registry from app.schemas import ChatRequest, ChatResponse, ExecutionPlan @@ -29,8 +28,8 @@ _SYNTHESIZE_HUMAN = ( ) -def _make_llm(model: str = "gpt-4o-mini") -> ChatOpenAI: - return ChatOpenAI(model=model, temperature=0, api_key=settings.OPENAI_API_KEY) +def _make_llm(): + return get_router_llm() async def classify_intent( diff --git a/requirements.txt b/requirements.txt index 8436567..b7409ab 100644 --- a/requirements.txt +++ b/requirements.txt @@ -3,6 +3,7 @@ uvicorn[standard]>=0.34.0 gunicorn>=22.0.0 langchain>=0.3.0 langchain-openai>=0.3.0 +litellm>=1.50.0 pydantic>=2.10.0 pydantic-settings>=2.7.0 python-jose[cryptography]>=3.3.0 diff --git a/tests/test_agents.py b/tests/test_agents.py index ebbcf86..33c17b9 100644 --- a/tests/test_agents.py +++ b/tests/test_agents.py @@ -102,21 +102,21 @@ class TestTaskAgent: @pytest.mark.asyncio async def test_handle_returns_string(self) -> None: - with patch("app.agents.task_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.task_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Task created.") result = await TaskAgent().handle("create a task", {}) assert isinstance(result, str) @pytest.mark.asyncio async def test_handle_no_tool_calls(self) -> None: - with patch("app.agents.task_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.task_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Here are your tasks.") result = await TaskAgent().handle("list my tasks", {}) assert result == "Here are your tasks." @pytest.mark.asyncio async def test_handle_with_create_task_tool_call(self) -> None: - with patch("app.agents.task_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.task_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm_with_tool_call( "create_task", {"title": "Buy groceries", "priority": "low"}, @@ -127,7 +127,7 @@ class TestTaskAgent: @pytest.mark.asyncio async def test_handle_accepts_empty_context(self) -> None: - with patch("app.agents.task_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.task_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Done.") result = await TaskAgent().handle("help", {}) assert isinstance(result, str) @@ -138,7 +138,7 @@ class TestTaskAgent: "user_profile": {"id": "u1", "tier": "pro"}, "recent_tasks": [{"id": "t1", "title": "Old task"}], } - with patch("app.agents.task_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.task_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Tasks listed.") result = await TaskAgent().handle("show tasks", context) assert isinstance(result, str) @@ -273,14 +273,14 @@ class TestCheckpointAgent: @pytest.mark.asyncio async def test_handle_no_tool_calls(self) -> None: - with patch("app.agents.checkpoint_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.checkpoint_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("No checkpoints found.") result = await CheckpointAgent().handle("list checkpoints", {}) assert result == "No checkpoints found." @pytest.mark.asyncio async def test_handle_with_create_tool_call(self) -> None: - with patch("app.agents.checkpoint_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.checkpoint_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm_with_tool_call( "create_checkpoint", {"project_id": "p1", "title": "MVP Launch", "date": 1700000000000}, @@ -291,7 +291,7 @@ class TestCheckpointAgent: @pytest.mark.asyncio async def test_handle_accepts_empty_context(self) -> None: - with patch("app.agents.checkpoint_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.checkpoint_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Done.") result = await CheckpointAgent().handle("show milestones", {}) assert isinstance(result, str) @@ -397,14 +397,14 @@ class TestProjectAgent: @pytest.mark.asyncio async def test_handle_no_tool_calls(self) -> None: - with patch("app.agents.project_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.project_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Project Alpha is active.") result = await ProjectAgent().handle("show my projects", {}) assert result == "Project Alpha is active." @pytest.mark.asyncio async def test_handle_with_create_project_tool_call(self) -> None: - with patch("app.agents.project_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.project_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm_with_tool_call( "create_project", {"name": "Pippo"}, @@ -415,7 +415,7 @@ class TestProjectAgent: @pytest.mark.asyncio async def test_handle_accepts_empty_context(self) -> None: - with patch("app.agents.project_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.project_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Done.") result = await ProjectAgent().handle("archive old project", {}) assert isinstance(result, str) @@ -515,14 +515,14 @@ class TestNoteAgent: @pytest.mark.asyncio async def test_handle_no_tool_calls(self) -> None: - with patch("app.agents.note_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.note_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Note created.") result = await NoteAgent().handle("create a note", {}) assert result == "Note created." @pytest.mark.asyncio async def test_handle_with_create_note_tool_call(self) -> None: - with patch("app.agents.note_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.note_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm_with_tool_call( "create_note", {"title": "Daily log", "content": "# Today\nAll good."}, @@ -533,7 +533,7 @@ class TestNoteAgent: @pytest.mark.asyncio async def test_handle_accepts_empty_context(self) -> None: - with patch("app.agents.note_agent.ChatOpenAI") as mock_cls: + with patch("app.agents.note_agent.get_llm") as mock_cls: mock_cls.return_value = _mock_llm("Done.") result = await NoteAgent().handle("show notes", {}) assert isinstance(result, str) diff --git a/tests/test_orchestrator.py b/tests/test_orchestrator.py index 4432e33..e157e13 100644 --- a/tests/test_orchestrator.py +++ b/tests/test_orchestrator.py @@ -87,21 +87,21 @@ def reg() -> AgentRegistry: class TestClassifyIntent: @pytest.mark.asyncio async def test_routes_to_known_agent(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") result = await classify_intent("add a task", {}, reg) assert result == "task_agent" @pytest.mark.asyncio async def test_routes_to_calendar_agent(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("calendar_agent") result = await classify_intent("schedule a meeting", {}, reg) assert result == "calendar_agent" @pytest.mark.asyncio async def test_falls_back_on_unknown_name(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("nonexistent_agent") result = await classify_intent("do something", {}, reg) assert result == "task_agent" @@ -110,14 +110,14 @@ class TestClassifyIntent: async def test_empty_registry_returns_fallback_without_llm_call(self) -> None: empty_reg = AgentRegistry() # No LLM should be instantiated — early return path - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: result = await classify_intent("anything", {}, empty_reg) mock_cls.assert_not_called() assert result == "task_agent" @pytest.mark.asyncio async def test_whitespace_stripped_from_response(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm(" task_agent \n") result = await classify_intent("create task", {}, reg) assert result == "task_agent" @@ -154,7 +154,7 @@ class TestRouteSingle: class TestRoutePipeline: @pytest.mark.asyncio async def test_returns_chat_response(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("synthesized result") result = await route_pipeline( ["task_agent", "calendar_agent"], "plan my week", {}, reg @@ -163,7 +163,7 @@ class TestRoutePipeline: @pytest.mark.asyncio async def test_response_is_synthesis_output(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("synthesized result") result = await route_pipeline( ["task_agent", "calendar_agent"], "plan my week", {}, reg @@ -193,7 +193,7 @@ class TestRoutePipeline: reg.register(_CapturingAgent) - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("done") await route_pipeline(["task_agent", "capture"], "hi", {}, reg) @@ -204,7 +204,7 @@ class TestRoutePipeline: @pytest.mark.asyncio async def test_single_agent_pipeline(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("single result") result = await route_pipeline(["task_agent"], "one agent", {}, reg) assert result.response == "single result" @@ -218,7 +218,7 @@ class TestOrchestrate: async def test_direct_mode_returns_chat_response( self, reg: AgentRegistry ) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest(message="add a task", execution_mode="direct") result = await orchestrate(request, reg) @@ -226,7 +226,7 @@ class TestOrchestrate: @pytest.mark.asyncio async def test_direct_mode_response_content(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest(message="add a task", execution_mode="direct") result = await orchestrate(request, reg) @@ -237,7 +237,7 @@ class TestOrchestrate: async def test_plan_mode_returns_execution_plan( self, reg: AgentRegistry ) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest(message="plan my tasks", execution_mode="plan") result = await orchestrate(request, reg) @@ -247,7 +247,7 @@ class TestOrchestrate: async def test_plan_mode_agent_matches_classified( self, reg: AgentRegistry ) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("calendar_agent") request = ChatRequest( message="schedule something", execution_mode="plan" @@ -258,7 +258,7 @@ class TestOrchestrate: @pytest.mark.asyncio async def test_plan_mode_has_steps(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest(message="plan tasks", execution_mode="plan") result = await orchestrate(request, reg) @@ -269,7 +269,7 @@ class TestOrchestrate: async def test_plan_mode_template_id_contains_agent_name( self, reg: AgentRegistry ) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest(message="plan tasks", execution_mode="plan") result = await orchestrate(request, reg) @@ -281,7 +281,7 @@ class TestOrchestrate: async def test_default_execution_mode_is_direct( self, reg: AgentRegistry ) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") # execution_mode defaults to "direct" request = ChatRequest(message="help me") @@ -295,7 +295,7 @@ class TestOrchestrate: class TestOrchestrateStream: @pytest.mark.asyncio async def test_yields_at_least_one_chunk(self, reg: AgentRegistry) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest(message="add a task", execution_mode="direct") chunks = [chunk async for chunk in orchestrate_stream(request, reg)] @@ -305,7 +305,7 @@ class TestOrchestrateStream: async def test_last_chunk_is_final_json_frame( self, reg: AgentRegistry ) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest(message="add a task", execution_mode="direct") chunks = [chunk async for chunk in orchestrate_stream(request, reg)] @@ -319,7 +319,7 @@ class TestOrchestrateStream: async def test_final_frame_response_matches_agent_output( self, reg: AgentRegistry ) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest(message="create a task", execution_mode="direct") chunks = [chunk async for chunk in orchestrate_stream(request, reg)] @@ -331,7 +331,7 @@ class TestOrchestrateStream: async def test_text_chunks_before_final_frame( self, reg: AgentRegistry ) -> None: - with patch("app.core.orchestrator.ChatOpenAI") as mock_cls: + with patch("app.core.orchestrator._make_llm") as mock_cls: mock_cls.return_value = _mock_llm("task_agent") request = ChatRequest( message="x" * 200, execution_mode="direct"