mcp-devtools includes optional distributed tracing and metrics support via OpenTelemetry (OTEL).
With tracing enabled, you can observe:
Traces (Request-Level Detail):
- Tool execution latency and errors
- HTTP requests to external APIs
- Security framework checks (opt-in)
- W3C Trace Context propagation across HTTP boundaries
Metrics (Aggregated Operational Data):
- Tool invocation counts and error rates
- Latency percentiles (P50/P95/P99)
- Active session tracking
- Cache hit ratios
- Security check overhead (opt-in)
Key Design Principles:
- Disabled by default - Zero overhead when not configured
- Environment-based - Standard OTEL environment variables
- Privacy-first - Automatic sanitisation of sensitive data
- Vendor-neutral - Works with any OTLP-compatible backend
- See ./observability/examples/README.md for example setups with Jaeger and optionally Prometheus + Grafana.
# Enable tracing (required)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 # HTTP
# OR
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 # gRPC
# Service identification (optional)
OTEL_SERVICE_NAME=mcp-devtools
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,service.version=1.0.0
# Sampling (optional, default: always on when enabled)
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1 # 10% sampling
# Protocol selection (optional, default: http/protobuf)
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf # or grpc
# Authentication headers (optional)
OTEL_EXPORTER_OTLP_HEADERS=x-api-key=secret123
# Explicit disable (optional)
OTEL_SDK_DISABLED=false# Disable tracing for specific tools (comma-separated)
MCP_TRACING_DISABLED_TOOLS=filesystem,memory
# Maximum span attribute size (default: 4KB, range: 1KB-64KB)
MCP_TRACING_MAX_ATTRIBUTE_SIZE=4096
# Enable security framework tracing (default: false)
MCP_TRACING_SECURITY_ENABLED=true
# Enable specific metric groups (comma-separated, default: tool,session)
# Available groups: tool, session, cache, security
MCP_METRICS_GROUPS=tool,session,cache,security
# Metric export interval (default: 60s)
OTEL_METRIC_EXPORT_INTERVAL=60Metrics provide aggregated operational insights that complement request-level traces. Whilst traces help diagnose individual requests, metrics enable trend analysis, capacity planning, and alerting.
Metrics vs Traces:
- Traces: "Why did this specific request fail?" → Detailed debugging
- Metrics: "Is my error rate increasing?" → Operational trends
Both use the same OTLP endpoint and are enabled/disabled together.
mcp.tool.calls (Counter)
- Total tool invocations
- Labels:
tool.name,transport,result(success/error) - Use case: Track usage patterns, error rates per tool
mcp.tool.duration (Histogram, milliseconds)
- Tool execution latency distribution
- Labels:
tool.name,transport - Buckets:
[10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000] - Use case: P50/P95/P99 latency analysis, identify slow tools
mcp.tool.errors (Counter)
- Categorised tool errors
- Labels:
tool.name,error.type - Error types:
network,timeout,validation,external_api,internal,security - Use case: Identify common failure patterns
mcp.session.active (UpDownCounter)
- Current concurrent sessions
- Labels:
transport - Use case: Monitor load, capacity planning
mcp.session.duration (Histogram, seconds)
- Session lifetime distribution
- Labels:
transport - Buckets:
[10, 30, 60, 300, 600, 1800, 3600, 7200] - Use case: Understand typical session lengths
mcp.session.tool_count (Histogram)
- Tools executed per session
- Buckets:
[1, 2, 5, 10, 20, 50, 100, 200] - Use case: Analyse workflow complexity
cache.operations (Counter)
- Cache operations
- Labels:
tool.name,operation(get/set/delete),result(hit/miss) - Use case: Cache effectiveness analysis
cache.hit_ratio (Gauge, 0.0-1.0)
- Real-time cache hit ratio
- Labels:
tool.name - Use case: Optimisation opportunities
Enabled when security is included in MCP_METRICS_GROUPS
security.checks (Counter)
- Security framework activity
- Labels:
action(allow/warn/block),source.type(url/file/text)
security.rule.triggers (Counter)
- Rule trigger frequency
- Labels:
rule.name,action
security.check.duration (Histogram, milliseconds)
- Security check overhead
- Labels:
action - Buckets:
[1, 5, 10, 25, 50, 100, 250]
# Tool error rate (requests/second)
sum(rate(mcp_tool_errors[5m])) by (tool_name, error_type)
# P95 latency by tool
histogram_quantile(0.95,
sum(rate(mcp_tool_duration_bucket[5m])) by (tool_name, le)
)
# Cache hit ratio
sum(rate(cache_operations{result="hit"}[5m]))
/
sum(rate(cache_operations{operation="get"}[5m]))
# Active sessions by transport
mcp_session_active
# Top 5 slowest tools (P95)
topk(5,
histogram_quantile(0.95,
sum(rate(mcp_tool_duration_bucket[5m])) by (tool_name, le)
)
)
# Error rate threshold alerting
rate(mcp_tool_errors[5m]) > 0.1
Metrics are opt-in via the MCP_METRICS_GROUPS environment variable:
# Enable all metric groups
MCP_METRICS_GROUPS=tool,session,cache,security
# Enable only tool and session metrics (default if not specified)
MCP_METRICS_GROUPS=tool,session
# Enable only tool metrics
MCP_METRICS_GROUPS=toolDefault behaviour: If MCP_METRICS_GROUPS is not set, tool and session metrics are enabled by default.
Available groups: tool, session, cache, security
Every tool invocation creates a span with:
- Tool name - Which tool was executed
- Arguments - Sanitised arguments (no sensitive data)
- Success/failure - Whether the tool succeeded
- Session ID - For correlating multi-tool workflows
- Duration - How long the tool took
Example span attributes:
mcp.tool.name: internet_search
mcp.tool.arguments: {"query":"golang best practices","count":5}
mcp.tool.result.success: true
mcp.session.id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
All HTTP requests to external APIs create child spans with:
- HTTP method - GET, POST, etc.
- URL - Sanitised URL (credentials removed)
- Status code - Response status
- Response size - Size in bytes
- Target host - Hostname of external service
This applies to all tools making HTTP requests (internet search, package docs, GitHub, AWS docs, etc.).
When MCP_TRACING_SECURITY_ENABLED=true, security content analysis creates spans with:
- Content size - Size of content analysed
- Security action - allow/warn/block
- Risk factors - Which rules matched
HTTP transport automatically:
- Extracts trace context from incoming HTTP request headers (
traceparent,tracestate) - Propagates trace context to all outbound HTTP requests
- Supports Baggage headers for optional metadata
This enables distributed tracing when mcp-devtools is part of a larger OTEL-instrumented system.
mcp-devtools follows standardised attribute naming for ecosystem interoperability:
| Attribute | Type | Description |
|---|---|---|
mcp.tool.name |
string | Tool identifier |
mcp.tool.arguments |
string | Sanitised arguments (JSON) |
mcp.tool.result.success |
boolean | Execution success |
mcp.tool.result.error |
string | Error message if failed |
mcp.session.id |
string | Session identifier |
Standard OTEL HTTP semantic conventions are used:
| Attribute | Type | Description |
|---|---|---|
http.method |
string | HTTP method |
http.url |
string | Sanitised URL |
http.status_code |
int | Response status |
http.response.size |
int | Response size in bytes |
net.peer.name |
string | Target hostname |
| Attribute | Type | Description |
|---|---|---|
security.action |
string | allow/warn/block |
security.content.size |
int | Size of content scanned |
security.risk.factor |
string | First matched risk factor |
mcp-devtools uses W3C Trace Context for session correlation. Related tool calls are automatically grouped through:
-
stdio Transport: Single session span created at startup
- Session span is created when stdio server starts
- Session span context is stored globally
- All tool calls during the process lifetime become child spans of the session span
- All tool spans share the same trace ID (the session's trace ID)
- Each tool call has a unique span ID but inherits the parent trace ID
- Session ID included in all spans for additional filtering
-
HTTP Transport: Trace context propagation
- Client sends
traceparentheader with requests - All tool calls with same trace ID are automatically grouped
- Works with any OTEL-compatible client
- Client sends
Example workflow (AI agent calling multiple tools):
Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736 (single trace for entire session)
Session ID: abc123
Session Span (parent):
└─ mcp.session (stdio transport)
├─ internet_search (200ms) - child span
│ └─ HTTP GET brave.com (180ms)
├─ fetch_url (350ms) - child span
│ └─ HTTP GET example.com (320ms)
└─ think (50ms) - child span
All spans share trace ID: 4bf92f3577b34da6a3ce929d0e0e4736
Each span has unique span ID
In Jaeger UI:
- View by trace ID
4bf92f3577b34da6a3ce929d0e0e4736to see the complete workflow - All tool calls appear as children of the session span
- Timing waterfall shows the execution sequence
- Filter by
mcp.session.id:abc123for additional session filtering - Filter by service="mcp-devtools" to see all activity
In AWS X-Ray:
- Service map shows:
mcp-devtools→ external APIs - Trace view shows complete workflow with timing breakdown
- Group by trace ID for session-level analysis
How it works:
- Initialisation: On startup, any previous session span context is cleared to prevent cross-session contamination
- Session span creation: When stdio transport starts, a session span is created with session metadata
- Immediate span end: The session span is ended immediately after creation
- Force flush:
ForceFlush()is called on the tracer provider to ensure the session span exports to the backend immediately - Span context storage: The session span's context is stored globally for tool spans to reference
- Tool execution: When
StartToolSpan()is called, it retrieves the global session span context - Trace context propagation: The session span context is injected into a W3C Trace Context carrier, then extracted into the tool's context using the text map propagator
- Parent-child relationship: The extracted trace context ensures tool spans inherit the session's trace ID and become proper children
- Tool span export: Tool spans export immediately after ending via the batch processor
- Session cleanup: Global span context is cleared when the session ends
This approach creates a proper parent-child hierarchy:
- Session span exports first via immediate end + force flush
- Tool spans inherit trace context via W3C Trace Context propagation (inject → extract pattern)
- All spans grouped under the same trace ID
- Proper hierarchy shows tool spans nested under session span
- No clock skew warnings because the parent span is already in the backend when children arrive
The key insights:
- Force flush is critical: The OTEL batch processor only exports ended spans asynchronously. Without
ForceFlush(), the session span might not export before tool spans, causing "invalid parent span IDs" warnings. - W3C Trace Context propagation: Using the inject→extract pattern properly establishes parent-child relationships across context boundaries, the same mechanism used for distributed tracing.
mcp-devtools should not include the following in traces:
- API keys, tokens, passwords
- SSH keys, certificates
- User credentials
- OAuth tokens or secrets
Sanitisation applied to:
- URLs: Credentials and sensitive query parameters removed
- Arguments: Known sensitive fields redacted
- File paths: Truncated to avoid exposing directory structures
- Large attributes: Truncated to max size with
truncated=trueflag
Example URL sanitisation:
Input: https://user:pass@api.example.com/search?api_key=secret&q=test
Output: https://api.example.com/search?q=test
For production HTTP deployments with high request rates:
# Sample 1% of traces
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.01
# Parent-based sampling (always sample errors)
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1When disabled (default):
- Zero overhead
- Noop tracer used
- No network calls
When enabled:
- Tool execution: < 1ms per span
- HTTP client: < 2ms per request
- Memory: ~100KB per 1000 spans (buffered)
- Export: Async and batched (non-blocking)
Recommendations:
- Use sampling for high-volume HTTP deployments
- Disable tracing for specific tools if needed via
MCP_TRACING_DISABLED_TOOLS - Monitor OTLP collector health to avoid export retries
Check endpoint is set:
echo $OTEL_EXPORTER_OTLP_ENDPOINTCheck collector is running:
curl http://localhost:4318/v1/traces # Should return 405 Method Not AllowedCheck logs for OTEL errors:
# Look for "OTEL:" prefix in logs
./bin/mcp-devtools stdio 2>&1 | grep OTELCheck if tracing is enabled:
# Should see "OTEL: Initialising tracer" if enabled
./bin/mcp-devtools stdio 2>&1 | grep "Initialising tracer"- Reduce
OTEL_TRACES_SAMPLER_ARGto sample fewer traces - Check OTLP collector is healthy and processing exports
- Reduce
MCP_TRACING_MAX_ATTRIBUTE_SIZEif argument values are very large - Disable tracing for high-volume tools via
MCP_TRACING_DISABLED_TOOLS
Tracing is designed to gracefully degrade:
- Application continues to work normally
- Warning logged: "OTEL: Failed to create exporter, falling back to noop tracer"
- No performance impact beyond initial connection attempt
