Keep only 4.1 (first reply contains question) as automated eval. Multi-turn cases (4.2–4.5) are non-deterministic and tested manually with results tracked in Langfuse. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adiuva Cloud API
AI-powered project management backend with LLM orchestration and subscription billing.
Built with FastAPI · Python 3.12 · PostgreSQL · LangChain · Stripe
Table of Contents
- Overview
- Architecture
- Key Features
- Tech Stack
- Getting Started
- Docker Deployment
- Environment Variables
- API Reference
- Data Model
- AI Agent System
- Orchestration & Execution Plans
- Middleware
- Billing & Tiers
- Testing
- Project Structure
- License
Overview
Adiuva Cloud API is the FastAPI backend that powers the Adiuva Electron desktop app. It provides LLM-powered chat orchestration, text embedding generation, and Stripe-based subscription billing across four tiers.
Design Principles
- Never expose prompts — system prompts stay server-side; responses are sanitized to strip any leaked prompt fragments.
- Stateless request handling — all context comes from the client and JWT; no server-side session state.
- Tier gates enforced server-side — the server always reads the current tier from the database, never trusting client-reported values.
Architecture
┌──────────────┐ ┌────────────────────────────────────────────────────────┐
│ Electron │ │ FastAPI (Uvicorn / Gunicorn) │
│ Desktop App │────▶│ │
│ (Client) │◀────│ Middleware: RateLimit → Sanitizer → CORS → Router │
└──────────────┘ │ │
│ ┌──────────────────┐ ┌────────────────────────────┐ │
│ │ Auth Routes │ │ Chat Routes │ │
│ │ Billing Routes │ │ ↓ │ │
│ │ Agent Routes │ │ Orchestrator (GPT-4o-mini)│ │
│ │ Device WS │ │ ↓ classify intent │ │
│ └──────────────────┘ │ Agent Registry │ │
│ │ ↓ │ │
│ │ TaskAgent | ProjectAgent │ │
│ │ NoteAgent | CheckptAgent │ │
│ │ (GPT-4o + LangChain) │ │
│ └────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
│
┌────────▼───┐
│ PostgreSQL │
│ (Auth, │
│ Billing, │
│ Agents) │
└────────────┘
│
┌────────▼───┐
│ Stripe │
│ (Billing) │
└────────────┘
Key Features
- LLM-powered orchestration — GPT-4o-mini classifies user intent and routes to the appropriate domain agent.
- 4 specialized AI agents — Tasks (8 tools), Projects (6 tools), Timelines (4 tools), Notes (5 tools), all powered by GPT-4o via LangChain.
- Execution plans & playbooks — Server-side prompt template registry; clients receive only opaque template IDs, never raw prompts.
- Text embeddings — Generates text-embedding-3-small vectors for local client-side note search.
- Stripe billing — Four-tier subscription model (Free / Pro / Power / Team) with checkout sessions and full webhook lifecycle handling.
- JWT authentication — Access + refresh tokens with bcrypt password hashing, SHA-256 token hashing, and automatic rotation.
- Prompt IP protection — Sanitizer middleware strips system prompts, reasoning markers, tool schemas, and agent routing metadata from all chat responses.
- Tier-based rate limiting — Sliding-window per-user limiter scaling from 20 to 200 requests/min by subscription tier.
- WebSocket streaming — Real-time chat with 30-second heartbeat keep-alive and chunked text delivery.
- Alembic migrations — Versioned schema management.
- Comprehensive test suite — In-memory SQLite, per-tier test fixtures, and full API coverage without external dependencies.
Tech Stack
| Package | Version | Purpose |
|---|---|---|
fastapi |
≥ 0.115.0 | Web framework |
uvicorn[standard] |
≥ 0.34.0 | ASGI development server |
gunicorn |
≥ 22.0.0 | Production process manager |
langchain |
≥ 0.3.0 | LLM orchestration framework |
langchain-openai |
≥ 0.3.0 | OpenAI LLM provider integration |
litellm |
≥ 1.50.0 | Universal LLM gateway (100+ providers) |
pydantic |
≥ 2.10.0 | Data validation and serialization |
pydantic-settings |
≥ 2.7.0 | Environment-based configuration |
python-jose[cryptography] |
≥ 3.3.0 | JWT encoding and decoding |
stripe |
≥ 11.0.0 | Billing and payment integration |
slowapi |
≥ 0.1.9 | Rate limiting utilities |
sqlalchemy |
≥ 2.0.0 | Async ORM and query builder |
asyncpg |
≥ 0.30.0 | PostgreSQL async driver |
alembic |
≥ 1.14.0 | Database migration management |
bcrypt |
≥ 4.2.0 | Password hashing |
python-dotenv |
≥ 1.0.0 | .env file loading |
httpx |
≥ 0.28.0 | Async HTTP client (used in tests) |
websockets |
≥ 14.0 | WebSocket protocol support |
psycopg2-binary |
≥ 2.9.0 | Synchronous PostgreSQL driver (Alembic) |
pytest |
≥ 8.0.0 | Test framework |
pytest-asyncio |
≥ 0.24.0 | Async test support |
aiosqlite |
≥ 0.20.0 | In-memory SQLite for tests |
ruff |
≥ 0.8.0 | Linter and formatter |
Getting Started
Prerequisites
- Python 3.12+
- PostgreSQL 16+
- An OpenAI API key (for LLM features)
- Stripe API keys (optional — billing stubs gracefully when unconfigured)
Installation
# Clone the repository
git clone <repo-url> && cd adiuva-api
# Create a virtual environment
python -m venv .venv && source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your DATABASE_URL, OPENAI_API_KEY, etc.
Database Setup
# Start PostgreSQL (or use the Docker Compose database)
docker compose up db -d
# Run migrations
alembic upgrade head
Run the Development Server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Interactive API docs are available at http://localhost:8000/docs in development mode (ENV=dev). The /docs endpoint is disabled in production.
Docker Deployment
Quick Start
docker compose up --build
This starts two services:
- app — FastAPI server on port
8000 - db — PostgreSQL 16 (Alpine) on port
5432with a persistent volume and health checks
Dockerfile Details
The Dockerfile uses a multi-stage build:
- Builder stage — Installs Python dependencies into a virtual environment.
- Runtime stage — Copies only the venv, app source, and Alembic migrations. Runs as a non-root user (
appuser). - Production server — Gunicorn with 4 Uvicorn workers, 120-second timeout, listening on port 8000.
# Production command (run by the container)
gunicorn app.main:app -k uvicorn.workers.UvicornWorker -w 4 --timeout 120 -b 0.0.0.0:8000
Homelab / Self-Hosted Deployment
You can run the entire stack locally on a homelab with no cloud dependencies except the LLM provider.
1. Start all services
docker compose up -d
This starts PostgreSQL alongside the app.
2. Configure your .env
# Database (uses the compose PostgreSQL)
DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/adiuva
# Billing — leave empty to stub (no Stripe needed)
STRIPE_SECRET_KEY=
STRIPE_WEBHOOK_SECRET=
# LLM — the only external service
OPENAI_API_KEY=sk-...
LLM_MODEL=gpt-4o
LLM_ROUTER_MODEL=gpt-4o-mini
# Auth
JWT_SECRET=your-secret-here
ENV=dev
3. Run migrations
docker compose exec app alembic upgrade head
What runs where
| Service | Runs on | Port | Notes |
|---|---|---|---|
| FastAPI app | Docker | 8000 | API server |
| PostgreSQL | Docker | 5432 | Auth, billing, agents |
| Stripe | — | — | Stubbed when keys are empty |
| OpenAI / LLM | Cloud | — | Only external dependency |
Want fully offline AI too? Set
LLM_MODEL=ollama/llama3andLLM_ROUTER_MODEL=ollama/llama3, then add an Ollama container or point at a local Ollama instance. See the LLM provider switching section.
Environment Variables
All variables are loaded from a .env file via Pydantic Settings. Source: app/config/settings.py
| Variable | Type | Default | Description |
|---|---|---|---|
DATABASE_URL |
str |
postgresql+asyncpg://postgres:postgres@localhost:5432/adiuva |
Async SQLAlchemy connection string |
JWT_SECRET |
str |
change-me-in-production |
HMAC secret for JWT signing |
JWT_ALGORITHM |
str |
HS256 |
JWT signing algorithm |
JWT_ACCESS_TOKEN_EXPIRE_MINUTES |
int |
30 |
Access token time-to-live |
JWT_REFRESH_TOKEN_EXPIRE_DAYS |
int |
30 |
Refresh token time-to-live |
STRIPE_SECRET_KEY |
str |
"" |
Stripe API key (empty = stub mode) |
STRIPE_WEBHOOK_SECRET |
str |
\"\" |
Stripe webhook signature secret |
LLM_MODEL |
str |
gpt-4o |
LiteLLM model identifier for agents (e.g. anthropic/claude-3.5-sonnet, gemini/gemini-pro, ollama/llama3) |
LLM_ROUTER_MODEL |
str |
gpt-4o-mini |
Lighter model used for intent classification / routing |
CORS_ORIGINS |
list[str] |
["app://.", "http://localhost:3000", "http://localhost:5173"] |
Allowed CORS origins |
ENV |
Literal |
dev |
dev or prod — controls /docs visibility and SQL echo |
API Reference
All routes are prefixed with /api/v1. 27 endpoints total (25 REST + 1 WebSocket + 1 health check).
Health
| Method | Path | Auth | Description |
|---|---|---|---|
GET |
/api/v1/health |
No | Returns {"status": "ok", "version": "0.1.0"} |
Auth
| Method | Path | Auth | Description |
|---|---|---|---|
POST |
/api/v1/auth/register |
No | Create account with bcrypt-hashed password, returns AuthTokens |
POST |
/api/v1/auth/login |
No | Validate credentials, returns AuthTokens |
POST |
/api/v1/auth/refresh |
No | Rotate refresh token, returns new AuthTokens |
GET |
/api/v1/auth/me |
JWT | Returns UserProfile for the authenticated user |
Chat
| Method | Path | Auth | Description |
|---|---|---|---|
POST |
/api/v1/chat |
JWT | Route message through the orchestrator; returns ChatResponse or ExecutionPlan depending on execution mode |
POST |
/api/v1/chat/embed |
JWT | Generate a 1536-dim text embedding vector (text-embedding-3-small). Used by Electron for local note search. |
WS |
/api/v1/chat/stream |
JWT (query param ?token=) |
Streaming chat — first frame is a ChatRequest, server yields text chunks, final frame is {"done": true, "response": "...", "actions": [...]}. 30-second heartbeat ping. |
Plans
| Method | Path | Auth | Description |
|---|---|---|---|
GET |
/api/v1/plans/playbook |
JWT | List all cached execution plan playbooks |
GET |
/api/v1/plans/playbook/{plan_id} |
JWT | Retrieve a specific playbook by ID |
Billing
| Method | Path | Auth | Description |
|---|---|---|---|
POST |
/api/v1/billing/checkout |
JWT | Create a Stripe checkout session, returns {"checkout_url": "..."} |
POST |
/api/v1/billing/webhook |
Stripe signature | Handle Stripe events: checkout.session.completed, customer.subscription.updated, customer.subscription.deleted, invoice.payment_failed |
GET |
/api/v1/billing/subscription |
JWT | Get current subscription information |
DELETE |
/api/v1/billing/subscription |
JWT | Cancel subscription and revert to free tier |
Data Model
3 tables managed by Alembic migrations. Source: app/models.py
Tables
| Table | Primary Key | Key Columns | Purpose |
|---|---|---|---|
users |
id (UUID) |
email (unique), password_hash, tier, stripe_customer_id, timestamps |
User accounts |
refresh_tokens |
id (UUID) |
user_id (FK), token_hash (SHA-256, unique), expires_at |
Hashed refresh tokens for rotation |
subscriptions |
id (UUID) |
user_id (FK, unique), stripe_subscription_id, tier, status, current_period_end |
Stripe subscription records |
Enum Types
| Enum | Values |
|---|---|
billing_tier |
free, pro, power, team |
Migrations
| Version | Description |
|---|---|
001_initial_schema |
Creates core auth and billing tables with indexes and foreign key constraints |
AI Agent System
The agent system uses a registry pattern with LangChain tool-calling agents powered by GPT-4o. Source: app/agents/, app/core/agent_registry.py
Architecture
BaseAgent— Abstract base withuser_idandshared_memory.ChatAgent(BaseAgent)— Abstracthandle(query, context)andget_tools()methods, plus a shared_tool_loop(llm, messages, tools, max_iter=5)for iterative tool calling.AgentRegistry— Singleton registry with@registerdecorator,get(name),list_agents(), andcall_agent(name, query, context).
Registered Agents
| Agent | Registry Name | Tools | Description |
|---|---|---|---|
| TaskAgent | task_agent |
8 | Full task and comment CRUD. Status: todo / in_progress / done. Priority: high / medium / low. Tools: list_tasks, create_task, update_task, delete_task, list_tasks_due_today, list_task_comments, add_task_comment, delete_task_comment |
| ProjectAgent | project_agent |
6 | Project lifecycle management. Status: active / archived. Prefers archiving over deletion. Tools: list_projects, list_all_projects, get_project, create_project, update_project, delete_project |
| TimelineAgent | timeline_agent |
4 | Project milestones. Requires project_id for creation. Supports AI-suggestion and approval workflows. Tools: list_timelines, create_timeline, update_timeline, delete_timeline |
| NoteAgent | note_agent |
5 | Markdown note management. Optionally linked to projects. Tools: list_notes, get_note, create_note, update_note, delete_note |
All agents use the model configured by LLM_MODEL (default: GPT-4o) with temperature=0 via LiteLLM. Tools return JSON action descriptors that the Electron client interprets and applies locally.
Switching LLM Providers
The backend uses LiteLLM as a universal LLM gateway. All agents and the orchestrator instantiate models through a centralized factory in app/core/llm.py. To switch providers, change environment variables — no code changes required:
# OpenAI (default)
LLM_MODEL=gpt-4o
LLM_ROUTER_MODEL=gpt-4o-mini
# Anthropic
LLM_MODEL=anthropic/claude-3.5-sonnet
LLM_ROUTER_MODEL=anthropic/claude-3-haiku
# Google Gemini
LLM_MODEL=gemini/gemini-pro
LLM_ROUTER_MODEL=gemini/gemini-flash
# Local Ollama
LLM_MODEL=ollama/llama3
LLM_ROUTER_MODEL=ollama/llama3
# AWS Bedrock
LLM_MODEL=bedrock/anthropic.claude-v2
LLM_ROUTER_MODEL=bedrock/anthropic.claude-instant-v1
See the LiteLLM provider docs for the full list of 100+ supported providers and model naming conventions.
Orchestration & Execution Plans
Source: app/core/orchestrator.py, app/core/execution_plan.py
Orchestrator
classify_intent(message, context, registry)— Uses the router model (LLM_ROUTER_MODEL, default: GPT-4o-mini) to determine which agent should handle a message. Falls back totask_agentwhen classification is ambiguous.route_single(agent_name, message, context)— Routes to a single agent and returns aChatResponse.route_pipeline(agent_names, message, context)— Executes agents sequentially; each receivesprevious_resultsfrom earlier agents. A final LLM synthesis step merges all results.orchestrate(request)— Main entry point. Indirectmode, returns aChatResponse. Inplanmode, returns anExecutionPlan.orchestrate_stream(request)— Streaming variant that yields 50-character text chunks with a final JSON frame.
Execution Plans
PromptTemplateRegistry— Maps template IDs to server-side prompt text. Clients only ever see opaque IDs, never raw prompts.ExecutionPlanBuilder— Fluent builder API:add_step(),add_llm_step(template_id, vars),add_data_step(action, data_from_step). Validates step references onbuild().PlanCache— LRU cache (maxsize 1000) for storing plans as reusable playbooks.
Built-in Templates (6)
tpl_task_agent_default, tpl_timeline_agent_default, tpl_project_agent_default, tpl_note_agent_default, tpl_task_extract_from_project, tpl_note_weekly_summary
Built-in Playbooks (2)
| Playbook | Description |
|---|---|
create_tasks_from_project |
LLM extracts actionable tasks from project context, then creates task records |
generate_weekly_note |
LLM generates a weekly summary, then creates a note record |
Middleware
Middleware executes in this order on each request: TierRateLimit → Sanitizer → CORS → Router
JWT Authentication
Source: app/api/middleware/auth.py
- FastAPI dependency
get_current_uservalidates theBearerJWT and extractsuser_idandemail. - Live tier lookup — The current tier is fetched from the
subscriptionstable on every request (not cached in the JWT), so upgrades and downgrades take immediate effect. - Falls back to
freewhen no subscription row exists. - Raises
401 Unauthorizedon invalid or expired tokens. - Exempt paths:
/api/v1/auth/register,/api/v1/auth/login,/api/v1/billing/webhook
Tier-Based Rate Limiter
Source: app/api/middleware/rate_limit.py
TierRateLimitMiddleware— Sliding-window in-process rate limiter (no Redis dependency).- Per-user 60-second window sized by subscription tier:
| Tier | Requests / Minute |
|---|---|
| Free | 20 |
| Pro | 60 |
| Power | 120 |
| Team | 200 |
- Returns
429 Too Many Requestswith aRetry-Afterheader when the limit is exceeded. - Exempt paths: register, login, webhook, health
Response Sanitizer
Source: app/api/middleware/sanitizer.py
- Runs only on
/api/v1/chatendpoints. - Scans JSON response bodies and replaces leaked prompt IP fragments with
[REDACTED]. - Detects: system prompt openers, agent routing metadata, LangChain tool schemas, internal reasoning markers (
<thinking>,[INST]), and known prompt fingerprints. - Logs sanitization events as
WARNING.
Billing & Tiers
Source: app/billing/stripe_service.py, app/billing/tier_manager.py
Feature Matrix
| Feature | Free | Pro | Power | Team |
|---|---|---|---|---|
| AI Agents | 3 | Unlimited | Unlimited | Unlimited |
| Batch Active | 2 | 10 | Unlimited | Unlimited |
| LLM Providers | 1 | Unlimited | Unlimited | Unlimited |
| Batch Builder | — | — | ✓ | ✓ |
| SSO | — | — | — | ✓ |
| Rate Limit | 20 req/min | 60 req/min | 120 req/min | 200 req/min |
Stripe Integration
- Checkout —
create_checkout_session(user_id, tier)creates a Stripe Checkout session. Returns a stub URL when Stripe is not configured. - Webhooks — Handles
checkout.session.completed,customer.subscription.updated,customer.subscription.deleted, andinvoice.payment_failed. - Subscription management —
get_subscription()returns the current subscription record;cancel_subscription()cancels via the Stripe API and reverts the user to the free tier. - Price IDs:
price_pro_monthly,price_power_monthly,price_team_monthly
Tier Manager
get_tier(user_id)— Returns the user's current billing tier.check_feature(tier, feature)— Boolean feature gate check.require_feature(tier, feature)— Raises HTTP 403 if the feature is not available.
Testing
Running Tests
# Run all tests
pytest
# Run a specific test file
pytest tests/test_auth.py
# Run with verbose output
pytest -v
Test Infrastructure
- Database: Async SQLite in-memory via
aiosqlite+StaticPool— fast, no PostgreSQL needed. - Auth helpers:
make_jwt(tier)andauth_header(tier)generate per-tier test tokens. - Seed data: Auto-creates one
User+Subscriptionper tier (free/pro/power/team) before each test. - FK enforcement: SQLite
PRAGMA foreign_keys=ON. - No external dependencies — all tests run fully offline.
Test Coverage
| File | Coverage |
|---|---|
test_auth.py |
Register, login, token access, refresh, expiration |
test_middleware.py |
Rate limiting by tier, sanitizer prompt leak detection |
Project Structure
adiuva-api/
├── alembic.ini # Alembic configuration
├── docker-compose.yml # Docker Compose (app + PostgreSQL)
├── Dockerfile # Multi-stage production build
├── requirements.txt # Python dependencies
│
├── alembic/ # Database migrations
│ ├── env.py # Alembic environment config
│ ├── script.py.mako # Migration template
│ └── versions/
│ └── 001_initial_schema.py # Tables, indexes, FKs
│
├── app/ # Application source
│ ├── main.py # FastAPI app factory, middleware, routes
│ ├── db.py # Async SQLAlchemy engine & session
│ ├── models.py # SQLAlchemy ORM models
│ ├── schemas.py # Pydantic request/response schemas
│ │
│ ├── config/
│ │ └── settings.py # Pydantic Settings (env vars)
│ │
│ ├── agents/ # LLM-powered domain agents
│ │ ├── task_agent.py # Task & comment CRUD (8 tools)
│ │ ├── project_agent.py # Project lifecycle (6 tools)
│ │ ├── timeline_agent.py # Milestones (4 tools)
│ │ └── note_agent.py # Markdown notes (5 tools)
│ │
│ ├── core/ # Orchestration engine
│ │ ├── agent_registry.py # BaseAgent, ChatAgent, AgentRegistry
│ │ ├── llm.py # LiteLLM factory (get_llm, get_router_llm)
│ │ └── deep_agent.py # Deep agent orchestration
│ │
│ ├── api/ # HTTP layer
│ │ ├── deps.py # Shared FastAPI dependencies
│ │ ├── middleware/
│ │ │ ├── rate_limit.py # Sliding-window tier rate limiter
│ │ │ └── sanitizer.py # Prompt IP leak protection
│ │ └── routes/
│ │ ├── auth.py # Register, login, refresh, me
│ │ ├── chat.py # Chat + embed endpoint
│ │ ├── billing.py # Stripe checkout, webhooks, subscription
│ │ ├── agents.py # Agent catalog, config, runs
│ │ └── device_ws.py # Persistent device WebSocket
│ │
│ └── billing/
│ ├── stripe_service.py # Stripe API wrapper
│ └── tier_manager.py # Feature matrix, rate limits
│
└── tests/ # Test suite
├── conftest.py # Fixtures: DB, auth, seeds
├── test_auth.py
├── test_orchestrator.py
├── test_agents.py
├── test_agent_registry.py
├── test_execution_plan.py
└── test_middleware.py
License
To be determined.