feat: integrate vectordb for note embeddings
- Added `vectordb` as a dependency in `package.json`.
- Implemented `embedText` function in `src/main/ai/embeddings.ts` to handle text embeddings using GitHub Copilot OAuth token or OpenAI token.
- Created `vectordb.ts` for managing LanceDB connection and embedding notes with upsert strategy.
- Updated `index.ts` to initialize vector database and migrate existing notes on app ready.
- Modified `router/index.ts` to fire-and-forget embedding calls on note creation and updates.
- Enhanced `progress.txt` with detailed implementation notes and learnings regarding the integration.
26
progress.txt
@@ -493,3 +493,29 @@
- Short-circuit optimizations that skip LLM nodes are only safe for providers where the SDK has no internal state to initialize (OpenAI, Anthropic)
- If you want to restore the short-circuit as an OpenAI/Anthropic optimization, gate it: `if (TOOL_CALLING_PROVIDERS.has(getActiveProviderName()) && state.chatContext.type === 'project')`
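
A minimal sketch of the gate described above. Only `TOOL_CALLING_PROVIDERS`, `getActiveProviderName()`, and the `state.chatContext.type === 'project'` check come from these notes; the provider keys `'openai'` and `'anthropic'`, the `ChatState` shape, and the helper name `canShortCircuit` are hypothetical and chosen for illustration. The provider name is passed in as an argument so the check stays a pure function that is easy to test.

```typescript
// Assumed provider keys; the notes only name OpenAI and Anthropic as safe.
const TOOL_CALLING_PROVIDERS: ReadonlySet<string> = new Set(['openai', 'anthropic']);

// Hypothetical state shape: only chatContext.type is taken from the notes.
interface ChatState {
  chatContext: { type: string };
}

// Hypothetical helper. In the app, activeProvider would be the result of
// getActiveProviderName().
function canShortCircuit(activeProvider: string, state: ChatState): boolean {
  return (
    TOOL_CALLING_PROVIDERS.has(activeProvider) &&
    state.chatContext.type === 'project'
  );
}
```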
---
## [2026-02-24] - US-022
- What was implemented:
- Installed `vectordb` (LanceDB Node.js binding v0.21.2) as a project dependency
- Created `src/main/ai/embeddings.ts`: reads the GitHub Copilot OAuth token from `~/.copilot/config.json` (via the `copilot_tokens` map) and falls back to the stored OpenAI token via `getToken('openai')`. Uses `OpenAIEmbeddings` from `@langchain/openai` with `baseURL: 'https://api.githubcopilot.com'` on the Copilot path, or the standard OpenAI API on the fallback path. Exposes `embedText(text): Promise<number[]>`
- Created `src/main/db/vectordb.ts`: LanceDB singleton (`initVectorDb()` / `getConn()`), `upsertNoteEmbedding(noteId, projectId, content)` with delete-then-add upsert strategy (first call auto-creates table with schema inferred from first record), `migrateNotesIfNeeded()` that checks table existence on startup and bulk-embeds all SQLite notes sequentially with per-note error isolation
- Modified `src/main/router/index.ts`: imported `upsertNoteEmbedding`, made `notes.create` and `notes.update` async, added fire-and-forget embedding calls with `.catch(console.error)` in both handlers; `notes.update` re-fetches the full note from SQLite after the write to embed current title+content
- Modified `src/main/index.ts`: imported `initVectorDb` + `migrateNotesIfNeeded`, added `initVectorDb().then(() => migrateNotesIfNeeded()).catch(...)` chain in `app.on('ready')`
- Modified `vite.main.config.mts`: added `'vectordb'` to the `external` array so that Vite/Rollup does not try to bundle the NAPI-RS native binary
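
The `notes.update` behaviour listed above (partial write, re-fetch, then a fire-and-forget embedding call) can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/main/router/index.ts`: the `Note` shape, the in-memory `noteStore` standing in for SQLite, the `updateNote` and `embeddingText` helpers, and the blank-line separator between title and content are all hypothetical. Only the `(noteId, projectId, content)` argument order and the `.catch(console.error)` pattern come from the notes.

```typescript
// Hypothetical note shape and in-memory store standing in for the SQLite notes table.
type Note = { id: string; projectId: string; title: string; content: string };
const noteStore = new Map<string, Note>();

// Argument order mirrors upsertNoteEmbedding(noteId, projectId, content).
type EmbedFn = (noteId: string, projectId: string, content: string) => Promise<void>;

// Assumed text layout; the notes only say "title+content".
function embeddingText(note: Note): string {
  return `${note.title}\n\n${note.content}`;
}

function updateNote(
  id: string,
  patch: { title?: string; content?: string },
  embed: EmbedFn,
): Note {
  const existing = noteStore.get(id);
  if (!existing) throw new Error(`note ${id} not found`);

  // Partial update: keep the stored value for any field that was omitted.
  noteStore.set(id, {
    ...existing,
    title: patch.title ?? existing.title,
    content: patch.content ?? existing.content,
  });

  // Re-fetch after the write so the embedding sees the merged title and content.
  const fresh = noteStore.get(id) as Note;

  // Fire-and-forget: the promise is not awaited, and a rejection is only logged,
  // so an embedding failure never fails the mutation itself.
  void embed(fresh.id, fresh.projectId, embeddingText(fresh)).catch(console.error);

  return fresh;
}
```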
- Files changed:
- `package.json` (vectordb dependency added)
- `vite.main.config.mts`
- `src/main/index.ts`
- `src/main/router/index.ts`
- `src/main/ai/embeddings.ts` (new)
- `src/main/db/vectordb.ts` (new)
- **Learnings for future iterations:**
- `@github/copilot-sdk` has **no embeddings API**: it is a pure chat/session SDK, and the `CopilotClient` type definitions contain no mention of "embedding". Do not assume that a given LLM provider SDK supports embeddings; check its type definitions first
- GitHub Copilot CLI stores OAuth tokens in `~/.copilot/config.json` under `copilot_tokens["{host}:{login}"]`. The token format is `gho_*` (GitHub OAuth). These tokens work with the GitHub Copilot REST API (`https://api.githubcopilot.com`) which is OpenAI-compatible — including embeddings
- `@langchain/openai`'s `OpenAIEmbeddings` accepts a `configuration.baseURL` option that makes it work against any OpenAI-compatible endpoint
- `vectordb` (v0.21.2): deprecated but functional. The new package name is `@lancedb/lancedb`. `vectordb` requires at least one data record for `createTable()` (cannot create an empty table — schema is inferred from the first record). Use delete-then-add for upsert since there's no native upsert API at this version
- When using dynamic `import('@langchain/openai')`, TypeScript cannot infer the exact return type of `embedDocuments()` — it resolves to `{}` instead of `number[][]`. Fix: cast explicitly `as number[][]`
- tRPC mutation handlers support both sync and async functions transparently — making a mutation `async` does not break the renderer-side interface
- `notes.update` allows partial field updates (title or content can be omitted). Always re-fetch the full note from SQLite after the update write to get the correct combined text for embedding
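- `vectordb`'s `table.delete(where)` accepts a raw SQL WHERE clause string. UUID v4 IDs contain only `[0-9a-f-]` characters, so they are safe to interpolate directly, provided the ID is known to be a UUID (generated by the app or validated) rather than arbitrary user input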
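
The delete-then-add upsert and the WHERE-clause interpolation from the learnings above can be sketched as follows. This is an illustration under stated assumptions, not the contents of `src/main/db/vectordb.ts`: the `NoteRow` column names, the `EmbeddingTable` interface (a minimal stand-in exposing only `delete(where)` and `add(rows)`), and the helpers `noteWhereClause` and `upsertRow` are hypothetical. The sketch validates the ID before interpolating it, so the "UUIDs are safe" argument does not depend on every caller passing a well-formed ID. Note that the two steps are not atomic: if `add()` fails after `delete()` succeeds, the note has no embedding until its next write.

```typescript
// Hypothetical row shape; the column names are assumptions, not taken from the notes.
type NoteRow = { id: string; project_id: string; vector: number[] };

// Smallest table surface this sketch needs: a SQL-style delete filter and an append.
interface EmbeddingTable {
  delete(where: string): Promise<void>;
  add(rows: NoteRow[]): Promise<unknown>;
}

// Canonical lowercase UUID v4 layout: 8-4-4-4-12 hex digits,
// version nibble 4, variant nibble 8, 9, a, or b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

// Build the WHERE clause only after confirming the id really is a UUID v4.
function noteWhereClause(noteId: string): string {
  if (!UUID_V4.test(noteId)) {
    throw new Error('refusing to interpolate a non-UUID note id');
  }
  return `id = '${noteId}'`;
}

// Delete-then-add upsert: remove any existing row for this note, then append the new one.
async function upsertRow(table: EmbeddingTable, row: NoteRow): Promise<void> {
  await table.delete(noteWhereClause(row.id));
  await table.add([row]);
}
```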
---