Commit 1d3cca4
feat(knowledge): add embedding model selection and Cohere reranker (#4349)
* feat(knowledge): add embedding model selection and Cohere reranker
* fix(knowledge): split reranker model constants into client-safe module
* fix(knowledge): bill rerank on every successful API call and fix MDX docs literal
* test(knowledge): align embedding tests with provider abstraction changes
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(knowledge): require explicit Azure deployment per OpenAI embedding model
Greptile P1: when AZURE_OPENAI_* was set, every OpenAI embedding model was
routed to the single KB_OPENAI_MODEL_NAME deployment. A KB created with
text-embedding-3-large would be embedded by whatever model that deployment
serves while billing tracked 3-large pricing — and chunks ingested via Azure
versus queried via real OpenAI would land in mismatched vector spaces.
Now require AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_(SMALL|LARGE) per model.
Falls back to KB_OPENAI_MODEL_NAME only for text-embedding-3-small (legacy).
If no deployment is configured for the chosen model, route to direct OpenAI
instead of silently routing to the wrong deployment.
Also fix type predicate in search/route.ts to use KnowledgeBaseAccessResult
so the build passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
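The per-model routing above can be sketched as follows. The env var names (`AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_SMALL/LARGE`, `KB_OPENAI_MODEL_NAME`) and the fallback order come from the commit message; the function shape and return type are assumptions, not the repo's actual code (a later commit in this PR simplifies this routing again):

```typescript
// Resolve which Azure deployment (if any) serves a given OpenAI embedding model.
// Routing to direct OpenAI is preferred over silently using a wrong deployment.
type AzureRoute = { kind: 'azure'; deployment: string } | { kind: 'openai' }

function resolveAzureDeployment(
  modelId: string,
  env: Record<string, string | undefined>
): AzureRoute {
  const perModel: Record<string, string | undefined> = {
    'text-embedding-3-small': env.AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_SMALL,
    'text-embedding-3-large': env.AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_LARGE,
  }
  const explicit = perModel[modelId]
  if (explicit) return { kind: 'azure', deployment: explicit }
  // Legacy fallback applies only to the original default model.
  if (modelId === 'text-embedding-3-small' && env.KB_OPENAI_MODEL_NAME) {
    return { kind: 'azure', deployment: env.KB_OPENAI_MODEL_NAME }
  }
  // No deployment configured for this model: use direct OpenAI instead.
  return { kind: 'openai' }
}
```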
* fix(knowledge): skip platform reranker billing for BYOK Cohere keys
Cursor bugbot found that resolveCohereKey discarded BYOK status, so the
search route always added platform rerankerCost even when the workspace
supplied its own Cohere key.
Now resolveCohereKey returns { apiKey, isBYOK } and rerank() returns
{ results, isBYOK }. The search route checks rerankIsBYOK before adding
rerankerCost or emitting the rerankerCost/rerankerSearchUnits fields,
mirroring how generateEmbeddings handles BYOK billing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
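A minimal sketch of the BYOK-aware flow this commit describes. The `{ apiKey, isBYOK }` shape is stated in the message; the billing arithmetic and key values are illustrative assumptions:

```typescript
// Resolve the Cohere key and remember whether it was workspace-supplied (BYOK).
interface CohereKey { apiKey: string; isBYOK: boolean }

function resolveCohereKey(workspaceKey?: string, platformKey = 'platform-key'): CohereKey {
  if (workspaceKey) return { apiKey: workspaceKey, isBYOK: true }
  return { apiKey: platformKey, isBYOK: false }
}

// Only add platform rerankerCost when the platform key was used.
function billRerank(isBYOK: boolean, searchUnits: number, unitCost = 0.002): number {
  return isBYOK ? 0 : searchUnits * unitCost
}
```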
* fix(knowledge): match search tokenizer to embedding provider; remove dead var
Cursor bugbot:
- Token estimation was hardcoded to 'openai' for every embedding model.
For gemini-embedding-001 the cost was computed against an OpenAI-tokenized
count, producing wrong input.tokens.prompt and (slightly) wrong cost.
Now derive the tokenizer provider from the embedding model's provider.
- rerankApplied was set but never read. Removed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(knowledge): match chunk tokenizer to KB embedding provider
Cursor bugbot: createChunk and updateChunk hardcoded the 'openai' tokenizer
when computing the stored tokenCount. For KBs using gemini-embedding-001 the
count was estimated with the wrong heuristic, leading to inaccurate stored
counts (and any billing derived from them). Now derive the tokenizer from
the KB's embedding model provider, matching the search route.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* refactor(knowledge): centralize tokenizer mapping on EmbeddingModelInfo
Add tokenizerProvider directly to EmbeddingModelInfo so callers read it
from the registry instead of reimplementing the gemini→google / openai→openai
map at each call site. Removes the local helper in chunks/service.ts and
the inline ternary in search/route.ts.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
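The centralized mapping might look like the sketch below: `tokenizerProvider` lives on the registry entry, so call sites stop reimplementing the gemini→google / openai→openai map. Field names beyond those in the commit message are illustrative:

```typescript
// Registry entry carries its own tokenizer provider.
interface EmbeddingModelInfo {
  provider: 'openai' | 'gemini'
  tokenizerProvider: 'openai' | 'google'
  dimensions: number
}

// Partial<Record<...>> makes index lookups possibly-undefined in the type system.
const SUPPORTED_EMBEDDING_MODELS: Partial<Record<string, EmbeddingModelInfo>> = {
  'text-embedding-3-small': { provider: 'openai', tokenizerProvider: 'openai', dimensions: 1536 },
  'gemini-embedding-001': { provider: 'gemini', tokenizerProvider: 'google', dimensions: 1536 },
}

function getTokenizerProvider(modelId: string): 'openai' | 'google' {
  const info = SUPPORTED_EMBEDDING_MODELS[modelId]
  if (!info) throw new Error(`unsupported embedding model: ${modelId}`)
  return info.tokenizerProvider
}
```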
* refactor(knowledge): lock embedding model to KB_EMBEDDING_MODEL env var
Remove the user-facing model picker from the KB create modal and the
embeddingModel field from the create/update API schemas. The active model
is now selected server-side via KB_EMBEDDING_MODEL, which collapses Azure
routing to a single deployment (KB_OPENAI_MODEL_NAME) and drops the
per-model AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_* env vars and
SUPPORTED_EMBEDDING_MODEL_IDS / UI-only label+description registry fields.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(knowledge): use provider tokenizer for chunks and bound rerank indices
- documents/service.ts: replace ceil(len/4) heuristic with estimateTokenCount using the embedding model's tokenizerProvider so token counts match billing
- reranker.ts: filter Cohere rerank results to valid indices before mapping to defend against malformed responses
- utils.test.ts: add embeddingModel to kb fixture so getEmbeddingModelInfo resolves
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(knowledge): use .count from estimateTokenCount return value
estimateTokenCount returns a TokenEstimate object, not a number — access
.count so the integer token count is stored instead of an object.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
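The shape of this bug, in miniature. The stand-in estimator below is an assumption (a crude chars/4 heuristic, not the real tokenizer); the point is the `.count` access:

```typescript
// estimateTokenCount returns an object, not a number.
interface TokenEstimate { count: number; provider: string }

function estimateTokenCount(text: string, provider: string): TokenEstimate {
  // Stand-in heuristic (~4 chars per token) for illustration only.
  return { count: Math.ceil(text.length / 4), provider }
}

// Before: tokenCount: estimateTokenCount(content, p)        // stored an object
// After:  tokenCount: estimateTokenCount(content, p).count  // stores the integer
const tokenCount = estimateTokenCount('hello world!', 'openai').count
```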
* fix(knowledge): only enforce single embedding model when query is present
Tag-only searches don't generate a query embedding, so two KBs with
different embedding models can be filtered together. Gate the guard on
hasQuery so cross-model tag-only queries no longer 400.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
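Sketch of the gated guard (the kb row shape and error handling are assumptions):

```typescript
interface KbRow { id: string; embeddingModel: string }

// Only enforce a single embedding model when a query embedding will be generated.
function checkEmbeddingModels(kbs: KbRow[], hasQuery: boolean): void {
  if (!hasQuery) return // tag-only search: no query embedding, mixing models is fine
  const models = new Set(kbs.map((kb) => kb.embeddingModel))
  if (models.size > 1) {
    throw new Error('knowledge bases use different embedding models') // surfaces as 400
  }
}
```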
* fix(knowledge): use getConfiguredEmbeddingModel in copilot KB creation
Copilot-created KBs were hardcoded to text-embedding-3-small, ignoring
KB_EMBEDDING_MODEL. This caused cross-KB searches mixing copilot- and
API-created KBs to hit the embedding-model-mismatch guard.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(knowledge): make EMBEDDING_DIMENSIONS a literal type
CreateKnowledgeBaseData.embeddingDimension is typed as the literal 1536,
so EMBEDDING_DIMENSIONS needs `as const` to satisfy it after the copilot
path switched to passing the constant.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
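A self-contained sketch of the type fix: without `as const`, the constant widens to `number` and no longer satisfies a field typed as the literal `1536`:

```typescript
// `as const` keeps the literal type 1536 instead of widening to number.
const EMBEDDING_DIMENSIONS = 1536 as const

interface CreateKnowledgeBaseData {
  embeddingDimension: 1536 // literal type, not number
}

// Compiles only because EMBEDDING_DIMENSIONS has type 1536.
const data: CreateKnowledgeBaseData = { embeddingDimension: EMBEDDING_DIMENSIONS }
```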
* fix(knowledge): use per-KB embedding model in v1 search route
The v1 search endpoint was passing undefined to generateSearchEmbedding,
which silently fell back to text-embedding-3-small. KBs created while
KB_EMBEDDING_MODEL=gemini-embedding-001 (or any non-default) would have
their queries embedded with the wrong model. Now resolves the model from
the KB rows like the internal route, with the same multi-model guard.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(knowledge): polish embedding/reranker implementation
- Drop unused supportsCustomDimensions from EmbeddingModelInfo (every
registered model supports it; OpenAI/Azure paths now always send
dimensions: 1536).
- Type SUPPORTED_EMBEDDING_MODELS as Partial<Record<...>> so index lookups
surface as possibly-undefined in the type system instead of relying on
runtime null checks alone.
- Require AZURE_OPENAI_API_VERSION in the Azure routing gate. Missing
api-version no longer slips through as ?api-version=undefined; it now
falls back to direct OpenAI.
- Use the embedding provider's tokenizer (estimateTokenCount) for the
Gemini fallback token estimate instead of len/4, so billing matches
the model's tokenization.
- Drop unreachable 'text-embedding-3-small' fallback in the manual chunk
upload route — accessCheck.knowledgeBase is non-null after the access
guard.
- docs-chunker now reads getConfiguredEmbeddingModel() so Sim's docs
ingestion respects KB_EMBEDDING_MODEL like the user-facing paths.
- Add v1 search route test covering per-KB model resolution and the
cross-KB mixed-model rejection.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(knowledge): resolve type errors and unhandled rejection in search routes
- Use accessCheck.knowledgeBase.embeddingModel directly in chunks response
- Narrow access-check predicate to KnowledgeBaseAccessResult in v1 search
- Move inaccessible-KB 404 check before query embedding promise creation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(knowledge): pass Gemini API key via x-goog-api-key header
URLs end up in server access logs, proxy logs, and APM tools, so embedding
the key as a query param risks accidental exposure. Google explicitly
recommends the header form for the Gemini REST API.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
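Sketch of the header-based request. `x-goog-api-key` is Google's documented header for the Gemini REST API; the URL and request-builder shape here are illustrative:

```typescript
interface GeminiRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

function buildGeminiRequest(apiKey: string, body: unknown): GeminiRequest {
  return {
    url: 'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:batchEmbedContents',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': apiKey, // key never lands in URL-based access/proxy/APM logs
    },
    body: JSON.stringify(body),
  }
}
```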
* fix(knowledge): default Azure deployment name to embedding model name
Restore the prior fallback so existing Azure deployments — which conventionally
name the deployment after the model — continue to route through Azure when
KB_OPENAI_MODEL_NAME is unset. Before this fix, those deployments silently fell
through to direct OpenAI.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(knowledge): cap Gemini batches at 100 items, add singular GEMINI_API_KEY fallback
- Gemini's batchEmbedContents API rejects requests with more than 100
items. The token-based batcher could pack hundreds of short chunks
into a single request, causing 400s. Add maxItemsPerRequest on
ResolvedProvider and split token batches further when set.
- Mirror resolveOpenAIKey by accepting GEMINI_API_KEY (singular) as a
fallback before requiring the rotating GEMINI_API_KEY_1/2/3 keys.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
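The item-cap split can be sketched generically (the `maxItemsPerRequest` name is from the commit; the helper itself is a stand-in for the real token-based batcher):

```typescript
// Split an already token-bounded batch further when the provider caps item count,
// e.g. Gemini's batchEmbedContents limit of 100 items per request.
function splitByMaxItems<T>(batch: T[], maxItemsPerRequest?: number): T[][] {
  if (!maxItemsPerRequest || batch.length <= maxItemsPerRequest) return [batch]
  const out: T[][] = []
  for (let i = 0; i < batch.length; i += maxItemsPerRequest) {
    out.push(batch.slice(i, i + maxItemsPerRequest))
  }
  return out
}
```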
* fix(knowledge): prefer singular Cohere key before rotation
Match resolveOpenAIKey/resolveGeminiKey order: check the singular
COHERE_API_KEY before falling back to rotating keys.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
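The resolution order the commit aligns across providers, sketched generically. The numbered rotating-key names and the first-key pick are assumptions (the real code rotates among them, and the Cohere resolver also carries BYOK status, omitted here):

```typescript
// Prefer the singular env var, then fall back to rotating numbered keys.
function resolveApiKey(env: Record<string, string | undefined>): string {
  if (env.COHERE_API_KEY) return env.COHERE_API_KEY
  const rotating = [env.COHERE_API_KEY_1, env.COHERE_API_KEY_2, env.COHERE_API_KEY_3]
    .filter((k): k is string => Boolean(k))
  if (rotating.length === 0) throw new Error('no Cohere API key configured')
  return rotating[0] // real code rotates; first key used here for determinism
}
```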
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

1 parent eeba7d9 · commit 1d3cca4
29 files changed
Lines changed: 1030 additions & 179 deletions
File tree
- apps
  - docs/content/docs/en/tools
  - sim
    - app/api
      - knowledge
        - [id]
          - documents/[documentId]/chunks
        - search
      - v1/knowledge
        - search
    - blocks/blocks
    - lib
      - chunkers
      - copilot/tools/server/knowledge
      - core/config
      - knowledge
        - chunks
        - documents
        - providers
    - tools
      - knowledge