feat: integrate vectordb for note embeddings

- Added `vectordb` as a dependency in `package.json`.
- Implemented `embedText` function in `src/main/ai/embeddings.ts` to generate text embeddings using a GitHub Copilot OAuth token, falling back to a stored OpenAI token.
- Created `vectordb.ts` for managing LanceDB connection and embedding notes with upsert strategy.
- Updated `index.ts` to initialize vector database and migrate existing notes on app ready.
- Modified `router/index.ts` to fire-and-forget embedding calls on note creation and updates.
- Enhanced `progress.txt` with detailed implementation notes and learnings regarding the integration.
This commit is contained in:
Roberto Musso
2026-02-24 21:34:48 +01:00
parent e70982c8b6
commit 2cb2f0e4e8
9 changed files with 750 additions and 27 deletions

@@ -493,3 +493,29 @@
- Short-circuit optimizations that skip LLM nodes are only safe for providers where the SDK has no internal state to initialize (OpenAI, Anthropic)
- If you want to restore the short-circuit as an OpenAI/Anthropic optimization, gate it: `if (TOOL_CALLING_PROVIDERS.has(getActiveProviderName()) && state.chatContext.type === 'project')`
---
## [2026-02-24] - US-022
- What was implemented:
- Installed `vectordb` (LanceDB Node.js binding v0.21.2) as a project dependency
- Created `src/main/ai/embeddings.ts`: reads GitHub Copilot OAuth token from `~/.copilot/config.json` (via `copilot_tokens` map), falls back to stored OpenAI token via `getToken('openai')`. Uses `@langchain/openai` `OpenAIEmbeddings` with `baseURL: 'https://api.githubcopilot.com'` for Copilot path, or standard OpenAI API for fallback. Exposes `embedText(text): Promise<number[]>`
- Created `src/main/db/vectordb.ts`: LanceDB singleton (`initVectorDb()` / `getConn()`), `upsertNoteEmbedding(noteId, projectId, content)` with delete-then-add upsert strategy (first call auto-creates table with schema inferred from first record), `migrateNotesIfNeeded()` that checks table existence on startup and bulk-embeds all SQLite notes sequentially with per-note error isolation
- Modified `src/main/router/index.ts`: imported `upsertNoteEmbedding`, made `notes.create` and `notes.update` async, added fire-and-forget embedding calls with `.catch(console.error)` in both handlers; `notes.update` re-fetches the full note from SQLite after the write to embed current title+content
- Modified `src/main/index.ts`: imported `initVectorDb` + `migrateNotesIfNeeded`, added `initVectorDb().then(() => migrateNotesIfNeeded()).catch(...)` chain in `app.on('ready')`
- Modified `vite.main.config.mts`: added `'vectordb'` to the `external` array so Vite/Rollup doesn't try to bundle the NAPI-RS binary
- Files changed:
- `package.json` (vectordb dependency added)
- `vite.main.config.mts`
- `src/main/index.ts`
- `src/main/router/index.ts`
- `src/main/ai/embeddings.ts` (new)
- `src/main/db/vectordb.ts` (new)
- **Learnings for future iterations:**
- `@github/copilot-sdk` has **no embeddings API**: it is a pure chat/session SDK. The `CopilotClient` type definitions contain zero mention of "embedding". Do not assume that an LLM provider SDK supports embeddings; check its type definitions first
- GitHub Copilot CLI stores OAuth tokens in `~/.copilot/config.json` under `copilot_tokens["{host}:{login}"]`. The token format is `gho_*` (GitHub OAuth). These tokens work with the GitHub Copilot REST API (`https://api.githubcopilot.com`) which is OpenAI-compatible — including embeddings
- `@langchain/openai`'s `OpenAIEmbeddings` accepts a `configuration.baseURL` option that makes it work against any OpenAI-compatible endpoint
- `vectordb` (v0.21.2): deprecated but functional. The new package name is `@lancedb/lancedb`. `vectordb` requires at least one data record for `createTable()` (cannot create an empty table — schema is inferred from the first record). Use delete-then-add for upsert since there's no native upsert API at this version
- When using dynamic `import('@langchain/openai')`, TypeScript cannot infer the exact return type of `embedDocuments()` — it resolves to `{}` instead of `number[][]`. Fix: cast explicitly `as number[][]`
- tRPC mutation handlers support both sync and async functions transparently — making a mutation `async` does not break the renderer-side interface
- `notes.update` allows partial field updates (title or content can be omitted). Always re-fetch the full note from SQLite after the update write to get the correct combined text for embedding
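- `vectordb`'s `table.delete(where)` accepts a raw SQL WHERE clause string. Interpolating a note ID directly is safe only because the IDs are UUID v4 values (characters limited to `[0-9a-f-]`); any ID that is not guaranteed to match that format should be validated before it is placed in the clause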
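
The delete-then-add upsert and the WHERE-clause learning above can be sketched as follows. This is a minimal illustration, not the code in `src/main/db/vectordb.ts`: the `NoteTable` interface, the `id` column name, the row shape, and the helper names `noteWhereClause` and `upsertRow` are assumptions made for the example. `NoteTable` only mirrors the two table methods the notes mention (`delete` with a WHERE string, and `add`).

```typescript
// Hypothetical structural type: only the two table methods this sketch needs.
// It stands in for the table object obtained from `vectordb`.
interface NoteTable {
  delete(where: string): Promise<unknown>;
  add(rows: Record<string, unknown>[]): Promise<unknown>;
}

// Lowercase UUID v4: version nibble is 4, variant nibble is 8, 9, a, or b.
const UUID_V4 =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

// Builds the raw WHERE clause, refusing anything that is not a UUID v4,
// so that only [0-9a-f-] characters are ever interpolated.
// The column name `id` is an assumption for this example.
function noteWhereClause(noteId: string): string {
  if (!UUID_V4.test(noteId)) {
    throw new Error(`Refusing to interpolate non-UUID note id: ${noteId}`);
  }
  return `id = '${noteId}'`;
}

// Delete-then-add upsert: drop any existing row for the note, then insert
// the fresh one. The two steps are not atomic.
async function upsertRow(
  table: NoteTable,
  row: { id: string; projectId: string; vector: number[] },
): Promise<void> {
  await table.delete(noteWhereClause(row.id));
  await table.add([row]);
}
```

Because the delete and the add are separate calls, a failure between them would leave the note without an embedding until the next successful update; that trade-off seems acceptable here since embedding calls are already fire-and-forget.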
---
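
The token lookup described in the US-022 learnings (Copilot token from `copilot_tokens`, OpenAI token as fallback) might look roughly like the sketch below. It assumes the config shape given in the notes, `{ "copilot_tokens": { "{host}:{login}": "gho_..." } }`, and takes the file contents and the stored OpenAI token as plain arguments so it stays independent of the file system and of `getToken`. The function names `pickCopilotToken` and `resolveEmbeddingToken` and the return shape are hypothetical.

```typescript
// Returns the first `gho_*` token found in the `copilot_tokens` map,
// or undefined if the JSON is malformed or holds no such token.
function pickCopilotToken(configJson: string): string | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(configJson);
  } catch {
    return undefined; // unreadable config: let the caller fall back
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;

  const tokens = (parsed as { copilot_tokens?: unknown }).copilot_tokens;
  if (typeof tokens !== "object" || tokens === null) return undefined;

  for (const value of Object.values(tokens as Record<string, unknown>)) {
    if (typeof value === "string" && value.startsWith("gho_")) return value;
  }
  return undefined;
}

type EmbeddingCredential = { provider: "copilot" | "openai"; token: string };

// Prefers the Copilot token; falls back to the stored OpenAI token.
function resolveEmbeddingToken(
  configJson: string | undefined,
  openAiToken: string | undefined,
): EmbeddingCredential | undefined {
  const copilot = configJson ? pickCopilotToken(configJson) : undefined;
  if (copilot) return { provider: "copilot", token: copilot };
  if (openAiToken) return { provider: "openai", token: openAiToken };
  return undefined;
}
```

Returning the provider alongside the token lets the caller decide whether to set the Copilot `baseURL` on the embeddings client or use the standard OpenAI endpoint.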