feat: Add failure diagnostics to tool-call telemetry (#631)
* feat: add failure diagnostics to tool-call telemetry
Tool failures in Mixpanel were blind spots: every failure was reported only as
FAILED/SOFT_FAIL, with no way to tell a user typo from a server crash, which
made it impossible to prioritize fixes or understand error patterns.
New telemetry fields (non-success only): failure_category (INVALID_INPUT |
AUTH | INTERNAL_ERROR), failure_http_status, failure_detail (truncated error
message), actor_name, and AJV validation diagnostics (keyword, path,
missing/additional property).
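For orientation, the failure fields listed above could be modeled roughly as below. This is only a sketch: the field names `failure_category`, `failure_http_status`, `failure_detail` and `actor_name` come from this message, while the interface name, the `truncateDetail` helper and the 200-character limit are assumptions made for illustration.

```typescript
// Three-value enum named in this commit message.
type FailureCategory = 'INVALID_INPUT' | 'AUTH' | 'INTERNAL_ERROR';

// Hypothetical container type; only the field names are taken from the commit.
interface FailureTelemetrySketch {
    failure_category: FailureCategory;
    failure_http_status?: number;
    failure_detail?: string;
    actor_name?: string;
}

// Assumed limit: the real truncation length is not stated in the commit.
const MAX_FAILURE_DETAIL_LENGTH = 200;

// Cut long error messages so the result never exceeds maxLength characters.
function truncateDetail(message: string, maxLength: number = MAX_FAILURE_DETAIL_LENGTH): string {
    if (message.length <= maxLength) return message;
    return `${message.slice(0, maxLength - 3)}...`;
}

// Assemble the failure fields for a non-success tool call.
function buildFailureTelemetry(
    category: FailureCategory,
    error: { message: string; statusCode?: number },
    actorName?: string,
): FailureTelemetrySketch {
    return {
        failure_category: category,
        failure_http_status: error.statusCode,
        failure_detail: truncateDetail(error.message),
        actor_name: actorName,
    };
}
```

Keeping truncation in one helper means every caller produces a bounded `failure_detail`, whatever limit is eventually chosen.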
Key decisions:
- Three-value enum, not ten — INVALID_INPUT covers both validation and
not-found (user-caused); validation_* fields disambiguate when needed
- Wire protocol: buildMCPResponse carries transient internal* fields from
tool helpers to server dispatch, stripped before reaching MCP clients
- actor-mcp is a known gap (opaque isError, defaults to INTERNAL_ERROR) —
documented as TODO, deferred to future iteration
- Kept server.ts handler structure to minimize structural diff
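The wire-protocol decision above (transient internal* fields carried to server dispatch, then stripped before reaching MCP clients) might look like the following sketch. The response shape, the `stripInternalFields` name and the `internalFailureCategory` key used in the usage note are hypothetical; only the `internal` prefix convention is taken from this message.

```typescript
// Minimal stand-in for a tool response; the real type in the repo may differ.
interface ToolResponseSketch {
    content: { type: 'text'; text: string }[];
    isError?: boolean;
    [key: string]: unknown;
}

// Assumption: transient diagnostic fields all start with this prefix.
const INTERNAL_PREFIX = 'internal';

// Split a response into the part that is safe to send to the MCP client
// and the transient fields that only server-side telemetry should see.
function stripInternalFields(response: ToolResponseSketch): {
    clientResponse: ToolResponseSketch;
    internal: Record<string, unknown>;
} {
    const clientResponse: ToolResponseSketch = { content: response.content };
    const internal: Record<string, unknown> = {};
    for (const key of Object.keys(response)) {
        if (key.startsWith(INTERNAL_PREFIX)) {
            internal[key] = response[key];
        } else {
            clientResponse[key] = response[key];
        }
    }
    return { clientResponse, internal };
}
```

For example, a response carrying `internalFailureCategory: 'INVALID_INPUT'` would yield a `clientResponse` without that key, while the server dispatch code can still read it from `internal`.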
Other changes:
- preparePayment → prepareToolCallContext (name was misleading — handles
all tool calls, not just paid ones); cleanArgs → toolArgs, logArgs →
logSafeArgs, client → apifyClient
- dedent for multi-line error messages in server.ts
- actor_name extracted via extractActorName() — especially needed for
call-actor where the actor slug is only in arguments
- INVALID_INPUT propagated across tool helpers (one-line additions per file)
- 402 Payment Required now tracked with failure_category instead of empty
diagnostics
- New helpers: classifyFailureCategory(), extractValidationDiagnostics()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Review comments
* fix: improve failure category classification in telemetry
* refactor: introduce `ValidationDiagnostics` type for cleaner validation handling
Replaced inline Pick definitions for validation diagnostics across multiple files with a shared `ValidationDiagnostics` type in `types.ts`. Simplifies code and ensures consistency while extracting or handling validation-related diagnostics.
* refactor: simplify failure diagnostics extraction in server handlers
Replaced inline diagnostic extraction logic with `extractToolResponseDiagnostics` utility in `tool_status.ts`. Centralizes diagnostic handling, reduces repetition, and ensures consistent behavior across server responses.
* refactor: use constants for HTTP status codes in failure category classification
Replaced hardcoded HTTP status codes with constants (`HTTP_UNAUTHORIZED`, `HTTP_FORBIDDEN`, `HTTP_NOT_FOUND`) in `tool_status.ts`. Ensures clarity and reduces potential for errors. Added constant definitions to `const.ts`.
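A minimal sketch of status-based classification using the named constants. The constant names come from this message and their values are the standard HTTP codes; the function name and the exact mapping (401/403 to AUTH, 404 to INVALID_INPUT, everything else to INTERNAL_ERROR) are assumptions, and the real `classifyFailureCategory()` may consider more inputs.

```typescript
// Standard HTTP status codes behind the constants named in this commit.
const HTTP_UNAUTHORIZED = 401;
const HTTP_FORBIDDEN = 403;
const HTTP_NOT_FOUND = 404;

type FailureCategory = 'INVALID_INPUT' | 'AUTH' | 'INTERNAL_ERROR';

// Hypothetical helper: map an optional HTTP status to one of the three categories.
function classifyByHttpStatus(status: number | undefined): FailureCategory {
    switch (status) {
        case HTTP_UNAUTHORIZED:
        case HTTP_FORBIDDEN:
            return 'AUTH';
        case HTTP_NOT_FOUND:
            // Not-found is treated as user-caused, per the key decisions above.
            return 'INVALID_INPUT';
        default:
            // Unknown or missing status falls back to the most conservative bucket.
            return 'INTERNAL_ERROR';
    }
}
```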
* feat: add `actor-mcp` handling and enhance validation diagnostics
Added support for extracting `actorId` for `actor-mcp` tools in `extractActorName`. Enhanced validation diagnostics by including error count (`validation_error_count`) and updated related types (`ValidationDiagnostics`, `ToolCallTelemetryProperties`) to ensure consistency. Improved error handling in `extractValidationDiagnostics` to account for multiple validation errors.
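The sketch below illustrates one way to summarize AJV errors together with an error count. `AjvErrorLike` mirrors the `keyword`, `instancePath` and `params` fields of AJV's error objects so the example stays self-contained; `validation_error_count` is named in this message, while the other `validation_*` field names and the function name are assumptions, not the actual `ValidationDiagnostics` definition.

```typescript
// Local mirror of the AJV error fields used here (no ajv import needed).
interface AjvErrorLike {
    keyword: string;
    instancePath: string;
    params: Record<string, unknown>;
    message?: string;
}

// Hypothetical field names, except validation_error_count.
interface ValidationDiagnosticsSketch {
    validation_keyword?: string;
    validation_path?: string;
    validation_missing_property?: string;
    validation_additional_property?: string;
    validation_error_count: number;
}

function asString(value: unknown): string | undefined {
    return typeof value === 'string' ? value : undefined;
}

// Report details of the first error plus the total number of errors.
function summarizeValidationErrors(
    errors: AjvErrorLike[] | null | undefined,
): ValidationDiagnosticsSketch | undefined {
    if (!errors || errors.length === 0) return undefined;
    const first = errors[0];
    if (!first) return undefined;
    return {
        validation_keyword: first.keyword,
        validation_path: first.instancePath,
        // AJV sets these params for the `required` and `additionalProperties` keywords.
        validation_missing_property: asString(first.params['missingProperty']),
        validation_additional_property: asString(first.params['additionalProperty']),
        validation_error_count: errors.length,
    };
}
```

Reporting only the first error keeps the telemetry payload small, and the count still signals when a single call failed validation in several places.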
* refactor: centralize `getToolFullName` and `extractActorName` logic in utilities
Removed redundant implementations of actor name and tool full name extraction. Introduced `getToolFullName` and relocated `extractActorName` to `tools.ts` for consistency and reuse. Updated server handlers to use the new utility, simplifying telemetry and tool resolution logic.
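As a rough illustration of why actor name extraction needs to look at both the tool and its arguments, consider the sketch below. The tool shapes, the `actorFullName` property and the `actor` argument name are hypothetical; this message only states that `call-actor` carries the actor slug in its arguments and that `actor-mcp` tools expose an `actorId`.

```typescript
// Hypothetical tool shapes; the real tool types in the repo may differ.
type ToolSketch =
    | { type: 'actor'; name: string; actorFullName: string }
    | { type: 'actor-mcp'; name: string; actorId: string }
    | { type: 'internal'; name: string };

// Resolve an actor identifier for telemetry, or undefined when none applies.
function extractActorNameSketch(
    tool: ToolSketch,
    args: Record<string, unknown>,
): string | undefined {
    switch (tool.type) {
        case 'actor':
            return tool.actorFullName;
        case 'actor-mcp':
            return tool.actorId;
        case 'internal': {
            // For call-actor the slug lives only in the arguments (assumed key: `actor`).
            const actor = args['actor'];
            if (tool.name === 'call-actor' && typeof actor === 'string') return actor;
            return undefined;
        }
    }
}
```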
* test: add unit tests for `getToolFullName` and `extractActorName` functions
* fix: reformat failure diagnostics assertion in `extractToolResponseDiagnostics` test
* refactor: replace inline failure diagnostics with centralized `telemetry` object
Standardized telemetry reporting across tools. Introduced `telemetry` object to replace inline `toolStatus` and failure diagnostics. Updated utility functions to extract AJV error details and telemetry consistently. Refactored server handlers to align with these changes, improving maintainability and consistency in error handling.
* fix: include actor details in failure diagnostics for enhanced telemetry logging
* refactor: replace `FailureDetails` with `CallDiagnostics` for unified telemetry management
Standardized telemetry tracking across the server by replacing `FailureDetails` with the more comprehensive `CallDiagnostics` type. Updated server handlers, utility functions, and tests to consistently use `CallDiagnostics` for handling failure diagnostics and actor-related fields. Improved maintainability and enhanced consistency in telemetry data logging.
* fix: enhance failure diagnostics with detailed error information and categories
* refactor: clean up CallActorResolvedContext type
* fix: simplify isActorMcpServer assignment in call_actor_common.ts
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>