
Commit 1d3cca4

waleedlatif1 and claude committed
feat(knowledge): add embedding model selection and Cohere reranker (#4349)
* feat(knowledge): add embedding model selection and Cohere reranker

* fix(knowledge): split reranker model constants into client-safe module

* fix(knowledge): bill rerank on every successful API call and fix MDX docs literal

* test(knowledge): align embedding tests with provider abstraction changes

* fix(knowledge): require explicit Azure deployment per OpenAI embedding model

  Greptile P1: when AZURE_OPENAI_* was set, every OpenAI embedding model was routed to the single KB_OPENAI_MODEL_NAME deployment. A KB created with text-embedding-3-large would be embedded by whatever model that deployment serves while billing tracked 3-large pricing, and chunks ingested via Azure versus queried via real OpenAI would land in mismatched vector spaces.

  Now require AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_(SMALL|LARGE) per model, falling back to KB_OPENAI_MODEL_NAME only for text-embedding-3-small (legacy). If no deployment is configured for the chosen model, route to direct OpenAI instead of silently routing to the wrong deployment. Also fix the type predicate in search/route.ts to use KnowledgeBaseAccessResult so the build passes.

* fix(knowledge): skip platform reranker billing for BYOK Cohere keys

  Cursor bugbot found that resolveCohereKey discarded BYOK status, so the search route always added platform rerankerCost even when the workspace supplied its own Cohere key. Now resolveCohereKey returns { apiKey, isBYOK } and rerank() returns { results, isBYOK }. The search route checks rerankIsBYOK before adding rerankerCost or emitting the rerankerCost/rerankerSearchUnits fields, mirroring how generateEmbeddings handles BYOK billing.

* fix(knowledge): match search tokenizer to embedding provider; remove dead var

  Cursor bugbot:
  - Token estimation was hardcoded to 'openai' for every embedding model. For gemini-embedding-001 the cost was computed against an OpenAI-tokenized count, producing a wrong input.tokens.prompt and a (slightly) wrong cost. Now derive the tokenizer provider from the embedding model's provider.
  - rerankApplied was set but never read. Removed.

* fix(knowledge): match chunk tokenizer to KB embedding provider

  Cursor bugbot: createChunk and updateChunk hardcoded the 'openai' tokenizer when computing the stored tokenCount. For KBs using gemini-embedding-001 the count was estimated with the wrong heuristic, leading to inaccurate stored counts (and any billing derived from them). Now derive the tokenizer from the KB's embedding model provider, matching the search route.

* refactor(knowledge): centralize tokenizer mapping on EmbeddingModelInfo

  Add tokenizerProvider directly to EmbeddingModelInfo so callers read it from the registry instead of reimplementing the gemini→google / openai→openai map at each call site. Removes the local helper in chunks/service.ts and the inline ternary in search/route.ts.

* refactor(knowledge): lock embedding model to KB_EMBEDDING_MODEL env var

  Remove the user-facing model picker from the KB create modal and the embeddingModel field from the create/update API schemas. The active model is now selected server-side via KB_EMBEDDING_MODEL, which collapses Azure routing to a single deployment (KB_OPENAI_MODEL_NAME) and drops the per-model AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_* env vars and the SUPPORTED_EMBEDDING_MODEL_IDS / UI-only label+description registry fields.

* fix(knowledge): use provider tokenizer for chunks and bound rerank indices

  - documents/service.ts: replace the ceil(len/4) heuristic with estimateTokenCount using the embedding model's tokenizerProvider so token counts match billing
  - reranker.ts: filter Cohere rerank results to valid indices before mapping, to defend against malformed responses
  - utils.test.ts: add embeddingModel to the kb fixture so getEmbeddingModelInfo resolves

* fix(knowledge): use .count from estimateTokenCount return value

  estimateTokenCount returns a TokenEstimate object, not a number; access .count so the integer token count is stored instead of an object.

* fix(knowledge): only enforce single embedding model when query is present

  Tag-only searches don't generate a query embedding, so two KBs with different embedding models can be filtered together. Gate the guard on hasQuery so cross-model tag-only queries no longer 400.

* fix(knowledge): use getConfiguredEmbeddingModel in copilot KB creation

  Copilot-created KBs were hardcoded to text-embedding-3-small, ignoring KB_EMBEDDING_MODEL. This caused cross-KB searches mixing copilot- and API-created KBs to hit the embedding-model-mismatch guard.

* fix(knowledge): make EMBEDDING_DIMENSIONS a literal type

  CreateKnowledgeBaseData.embeddingDimension is typed as the literal 1536, so EMBEDDING_DIMENSIONS needs `as const` to satisfy it after the copilot path switched to passing the constant.

* fix(knowledge): use per-KB embedding model in v1 search route

  The v1 search endpoint was passing undefined to generateSearchEmbedding, which silently fell back to text-embedding-3-small. KBs created while KB_EMBEDDING_MODEL=gemini-embedding-001 (or any non-default) would have their queries embedded with the wrong model. Now resolves the model from the KB rows like the internal route, with the same multi-model guard.

* chore(knowledge): polish embedding/reranker implementation

  - Drop unused supportsCustomDimensions from EmbeddingModelInfo (every registered model supports it; OpenAI/Azure paths now always send dimensions: 1536).
  - Type SUPPORTED_EMBEDDING_MODELS as Partial<Record<...>> so index lookups surface as possibly-undefined in the type system instead of relying on runtime null checks alone.
  - Require AZURE_OPENAI_API_VERSION in the Azure routing gate. A missing api-version no longer slips through as ?api-version=undefined; it now falls back to direct OpenAI.
  - Use the embedding provider's tokenizer (estimateTokenCount) for the Gemini fallback token estimate instead of len/4, so billing matches the model's tokenization.
  - Drop the unreachable 'text-embedding-3-small' fallback in the manual chunk upload route; accessCheck.knowledgeBase is non-null after the access guard.
  - docs-chunker now reads getConfiguredEmbeddingModel() so Sim's docs ingestion respects KB_EMBEDDING_MODEL like the user-facing paths.
  - Add a v1 search route test covering per-KB model resolution and the cross-KB mixed-model rejection.

* fix(knowledge): resolve type errors and unhandled rejection in search routes

  - Use accessCheck.knowledgeBase.embeddingModel directly in the chunks response
  - Narrow the access-check predicate to KnowledgeBaseAccessResult in v1 search
  - Move the inaccessible-KB 404 check before the query embedding promise creation

* fix(knowledge): pass Gemini API key via x-goog-api-key header

  URLs end up in server access logs, proxy logs, and APM tools, so embedding the key as a query param risks accidental exposure. Google explicitly recommends the header form for the Gemini REST API.

* fix(knowledge): default Azure deployment name to embedding model name

  Restore the prior fallback so existing Azure deployments, which conventionally name the deployment after the model, continue to route through Azure when KB_OPENAI_MODEL_NAME is unset. Before this fix, those deployments silently fell through to direct OpenAI.

* fix(knowledge): cap Gemini batches at 100 items, add singular GEMINI_API_KEY fallback

  - Gemini's batchEmbedContents API rejects requests with more than 100 items. The token-based batcher could pack hundreds of short chunks into a single request, causing 400s. Add maxItemsPerRequest on ResolvedProvider and split token batches further when set.
  - Mirror resolveOpenAIKey by accepting GEMINI_API_KEY (singular) as a fallback before requiring the rotating GEMINI_API_KEY_1/2/3 keys.

* fix(knowledge): prefer singular Cohere key before rotation

  Match the resolveOpenAIKey/resolveGeminiKey order: check the singular COHERE_API_KEY before falling back to rotating keys.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
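The Gemini batch cap described in the commit message can be sketched as a post-pass over the token-based batches. `maxItemsPerRequest` is the name used in the commit; the helper itself and its signature are hypothetical:

```typescript
// Sketch: split token-sized batches further when the provider caps items per
// request (Gemini's batchEmbedContents rejects payloads with more than 100 items).
// Batches with no cap configured pass through unchanged.
function splitByMaxItems<T>(batches: T[][], maxItemsPerRequest?: number): T[][] {
  if (!maxItemsPerRequest) return batches
  const out: T[][] = []
  for (const batch of batches) {
    for (let i = 0; i < batch.length; i += maxItemsPerRequest) {
      out.push(batch.slice(i, i + maxItemsPerRequest))
    }
  }
  return out
}
```

Running the existing token-based batcher first and this splitter second keeps the token budget logic untouched while guaranteeing no request exceeds the provider's item limit.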
1 parent eeba7d9 commit 1d3cca4

29 files changed

Lines changed: 1030 additions & 179 deletions


apps/docs/content/docs/en/tools/knowledge.mdx

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,8 @@ Search for similar content in a knowledge base using vector similarity
 | `properties` | string | No | No description |
 | `tagName` | string | No | No description |
 | `tagValue` | string | No | No description |
+| `rerankerEnabled` | boolean | No | Whether to apply Cohere reranking to vector search results |
+| `rerankerModel` | string | No | Cohere rerank model to use \(one of: rerank-v4.0-pro, rerank-v4.0-fast, rerank-v3.5\) |
 | `tagFilters` | string | No | No description |
 
 #### Output
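A hypothetical request body using the two reranker parameters documented in the diff above (the query text and the surrounding tool-call shape are illustrative, not from the source):

```typescript
// Knowledge search parameters with Cohere reranking enabled.
// Only rerankerEnabled and rerankerModel come from the docs table above;
// the query value is a made-up example.
const searchParams = {
  query: 'refund policy for annual plans',
  rerankerEnabled: true, // apply Cohere reranking to vector search results
  rerankerModel: 'rerank-v3.5', // one of: rerank-v4.0-pro, rerank-v4.0-fast, rerank-v3.5
}
```

Both fields are optional, so existing callers that omit them keep plain vector-similarity results.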

apps/sim/app/api/knowledge/[id]/documents/[documentId]/chunks/route.ts

Lines changed: 7 additions & 2 deletions
@@ -215,7 +215,12 @@ export const POST = withRouteHandler(
 
     let cost = null
     try {
-      cost = calculateCost('text-embedding-3-small', newChunk.tokenCount, 0, false)
+      cost = calculateCost(
+        accessCheck.knowledgeBase.embeddingModel,
+        newChunk.tokenCount,
+        0,
+        false
+      )
     } catch (error) {
       logger.warn(`[${requestId}] Failed to calculate cost for chunk upload`, {
         error: error instanceof Error ? error.message : 'Unknown error',
@@ -240,7 +245,7 @@ export const POST = withRouteHandler(
         completion: 0,
         total: newChunk.tokenCount,
       },
-      model: 'text-embedding-3-small',
+      model: accessCheck.knowledgeBase.embeddingModel,
       pricing: cost.pricing,
     },
   }

apps/sim/app/api/knowledge/[id]/route.ts

Lines changed: 0 additions & 2 deletions
@@ -27,8 +27,6 @@ const logger = createLogger('KnowledgeBaseByIdAPI')
 const UpdateKnowledgeBaseSchema = z.object({
   name: z.string().min(1, 'Name is required').optional(),
   description: z.string().optional(),
-  embeddingModel: z.literal('text-embedding-3-small').optional(),
-  embeddingDimension: z.literal(1536).optional(),
   workspaceId: z.string().nullable().optional(),
   chunkingConfig: z
     .object({

apps/sim/app/api/knowledge/route.ts

Lines changed: 7 additions & 4 deletions
@@ -6,6 +6,7 @@ import { getSession } from '@/lib/auth'
 import { PlatformEvents } from '@/lib/core/telemetry'
 import { generateRequestId } from '@/lib/core/utils/request'
 import { withRouteHandler } from '@/lib/core/utils/with-route-handler'
+import { EMBEDDING_DIMENSIONS, getConfiguredEmbeddingModel } from '@/lib/knowledge/embeddings'
 import {
   createKnowledgeBase,
   getKnowledgeBases,
@@ -20,8 +21,6 @@ const CreateKnowledgeBaseSchema = z.object({
   name: z.string().min(1, 'Name is required'),
   description: z.string().optional(),
   workspaceId: z.string().min(1, 'Workspace ID is required'),
-  embeddingModel: z.literal('text-embedding-3-small').default('text-embedding-3-small'),
-  embeddingDimension: z.literal(1536).default(1536),
   chunkingConfig: z
     .object({
       maxSize: z.number().min(100).max(4000).default(1024),
@@ -118,9 +117,13 @@ export const POST = withRouteHandler(async (req: NextRequest) => {
   try {
     const validatedData = CreateKnowledgeBaseSchema.parse(body)
 
+    const embeddingModel = getConfiguredEmbeddingModel()
+
     const createData = {
       ...validatedData,
       userId: session.user.id,
+      embeddingModel,
+      embeddingDimension: EMBEDDING_DIMENSIONS,
     }
 
     const newKnowledgeBase = await createKnowledgeBase(createData, requestId)
@@ -166,8 +169,8 @@ export const POST = withRouteHandler(async (req: NextRequest) => {
       metadata: {
         name: validatedData.name,
         description: validatedData.description,
-        embeddingModel: validatedData.embeddingModel,
-        embeddingDimension: validatedData.embeddingDimension,
+        embeddingModel,
+        embeddingDimension: EMBEDDING_DIMENSIONS,
         chunkingStrategy: validatedData.chunkingConfig.strategy,
         chunkMaxSize: validatedData.chunkingConfig.maxSize,
         chunkMinSize: validatedData.chunkingConfig.minSize,
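The diff above replaces the client-supplied model with a server-side lookup. A minimal sketch of what `getConfiguredEmbeddingModel` might do, per the commit message (the real implementation lives in `@/lib/knowledge/embeddings`; the supported-model list and validation behavior here are assumptions):

```typescript
// Hypothetical sketch: resolve the active embedding model from the
// KB_EMBEDDING_MODEL env var, falling back to the legacy default.
const DEFAULT_EMBEDDING_MODEL = 'text-embedding-3-small'
const SUPPORTED_MODEL_IDS = [
  'text-embedding-3-small',
  'text-embedding-3-large',
  'gemini-embedding-001',
]

function getConfiguredEmbeddingModelSketch(env: Record<string, string | undefined>): string {
  const configured = env.KB_EMBEDDING_MODEL
  if (!configured) return DEFAULT_EMBEDDING_MODEL
  // Unknown values fall back to the default rather than embedding with
  // an unregistered model.
  return SUPPORTED_MODEL_IDS.includes(configured) ? configured : DEFAULT_EMBEDDING_MODEL
}
```

Callers would pass `process.env`; centralizing the choice here is what lets copilot-created and API-created KBs end up on the same model.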

apps/sim/app/api/knowledge/search/route.test.ts

Lines changed: 14 additions & 1 deletion
@@ -432,6 +432,7 @@ describe('Knowledge Search API Route', () => {
       userId: 'user-123',
       name: 'Test KB',
       deletedAt: null,
+      embeddingModel: 'text-embedding-3-small',
     },
   })
 
@@ -524,6 +525,7 @@ describe('Knowledge Search API Route', () => {
       userId: 'user-123',
       name: 'Test KB',
       deletedAt: null,
+      embeddingModel: 'text-embedding-3-small',
     },
   })
 
@@ -571,6 +573,7 @@ describe('Knowledge Search API Route', () => {
       userId: 'user-123',
       name: 'Test KB',
       deletedAt: null,
+      embeddingModel: 'text-embedding-3-small',
     },
   })
 
@@ -625,6 +628,7 @@ describe('Knowledge Search API Route', () => {
       userId: 'user-123',
       name: 'Test KB',
       deletedAt: null,
+      embeddingModel: 'text-embedding-3-small',
     },
   })
 
@@ -694,6 +698,7 @@ describe('Knowledge Search API Route', () => {
       userId: 'user-123',
       name: 'Test KB',
       deletedAt: null,
+      embeddingModel: 'text-embedding-3-small',
     },
   })
 
@@ -739,6 +744,7 @@ describe('Knowledge Search API Route', () => {
       userId: 'user-123',
       name: 'Test KB',
       deletedAt: null,
+      embeddingModel: 'text-embedding-3-small',
     },
   })
 
@@ -877,6 +883,7 @@ describe('Knowledge Search API Route', () => {
       userId: 'user-123',
       name: 'Test KB',
       deletedAt: null,
+      embeddingModel: 'text-embedding-3-small',
     },
   })
 
@@ -921,11 +928,17 @@ describe('Knowledge Search API Route', () => {
       userId: 'user-123',
       name: 'Test KB',
       deletedAt: null,
+      embeddingModel: 'text-embedding-3-small',
     },
   })
   .mockResolvedValueOnce({
     hasAccess: true,
-    knowledgeBase: { id: 'kb-456', userId: 'user-123', name: 'Test KB 2' },
+    knowledgeBase: {
+      id: 'kb-456',
+      userId: 'user-123',
+      name: 'Test KB 2',
+      embeddingModel: 'text-embedding-3-small',
+    },
   })
 
   mockGetDocumentTagDefinitions.mockResolvedValue(mockTagDefinitions)
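These fixtures feed the multi-model guard described in the commit message: when a query is present, all searched KBs must share one embedding model, while tag-only searches skip the check. A sketch of that guard (the row shape and function name are assumptions, not the route's actual code):

```typescript
// Hypothetical guard: enforce a single embedding model across the searched
// KBs only when a query embedding will be generated.
interface KbRow {
  id: string
  embeddingModel: string
}

function resolveSearchEmbeddingModel(kbs: KbRow[], hasQuery: boolean): string | null {
  // Tag-only search: no query embedding is generated, so mixed-model KBs
  // can still be filtered together.
  if (!hasQuery) return null
  const models = new Set(kbs.map((kb) => kb.embeddingModel))
  if (models.size > 1) {
    // The route would turn this into a 400 response.
    throw new Error('All knowledge bases in a single search must use the same embedding model')
  }
  return kbs[0]?.embeddingModel ?? null
}
```

The mixed-model test above (kb-123 and kb-456 both on text-embedding-3-small) exercises the passing side of this check.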
