
Commit 9be73be

Copilot and pelikhan authored
docs: add context observability section to aw-harness spec
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/44933735-2676-4d52-a7c8-b2d7b7628f90 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
1 parent 355543b commit 9be73be

1 file changed

Lines changed: 100 additions & 6 deletions

specs/aw-harness.md

@@ -239,6 +239,7 @@ A conforming implementation **MUST** exit with code `0` if and only if the agent
 
 - **stdout**: Reserved for structured output (e.g., JSON summaries). A conforming implementation **SHOULD NOT** write diagnostic messages to stdout.
 - **stderr**: All diagnostic messages, JSONL event stream, and debug output **MUST** be written to stderr.
+- **GitHub Actions step summary** (`$GITHUB_STEP_SUMMARY`): The harness **MUST** write a Markdown-formatted execution summary to the file path indicated by the `GITHUB_STEP_SUMMARY` environment variable when that variable is set. The summary **MUST** be valid GitHub-flavored Markdown so that it renders correctly in the GitHub Actions step summary UI.
 
 ---
 
@@ -670,25 +671,109 @@ The following six extensions **MUST** be loaded into the `AgentSession` created
 
 ### 8.6 Extension 6: Observability
 
-**Purpose:** Emits JSONL events to stderr and generates OTel spans.
+**Purpose:** Emits structured event streams to stderr, writes a context provenance file for downstream analysis, renders a Markdown step summary, and reports per-turn token consumption.
 
 **Requirements:**
 
+#### 8.6.1 JSONL Event Stream
+
 - The extension **MUST** subscribe to `agent_start`, `turn_end`, `tool_execution_end`, and `agent_end` events.
 - On each event, the extension **MUST** emit a corresponding JSONL record to stderr.
-- If `observability.otlp.endpoint` is configured in the workflow frontmatter, the extension **MUST** create and close OTel spans for each task.
+- If `observability.otlp.endpoint` is configured in the workflow frontmatter, the extension **MUST** create and close OTel spans for the session.
 - OTel span attributes **MUST** include at minimum: model, token counts, and cost.
 
-> [!NOTE] Non-normative example.
+#### 8.6.2 Context Provenance File
+
+- The extension **MUST** produce a context provenance file at a well-known path (e.g., `/tmp/gh-aw/context-provenance.jsonl`) when the session completes.
+- The file **MUST** contain one JSON record per context entry added to the session, in chronological order. Each record **MUST** include:
+  - `timestamp` (ISO 8601 string): When the entry was added.
+  - `source` (string): The declared origin of the text — one of `"prompt"` (from `prompt.txt`), `"import"` (from an `imports:` file, with `path` sub-field), or `"system"` (from `harness.system`).
+  - `path` (string, **OPTIONAL**): Repository-relative path for `"import"` entries.
+  - `tokens` (number): Estimated token count for this entry at the time it was added.
+  - `cumulative_tokens` (number): Running total of tokens in the context window at the time of this entry.
+  - `role` (string): The message role — `"user"`, `"assistant"`, or `"system"`.
+- The purpose of this file is to allow downstream tools (e.g., `gh aw audit`) to perform deep analysis of context growth, identify which imports consumed the most token budget, and diagnose context-window pressure.
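A minimal sketch of how an implementation might accumulate these records (the `ProvenanceEntry` shape matches the fields above, but the `estimateTokens` chars/4 heuristic and the builder function are illustrative assumptions, not defined by the spec):

```typescript
interface ProvenanceEntry {
  timestamp: string;
  source: "prompt" | "import" | "system";
  path?: string;
  tokens: number;
  cumulative_tokens: number;
  role: "user" | "assistant" | "system";
}

// Rough estimate; a real implementation would use the model's tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function buildProvenanceLog(
  entries: { source: ProvenanceEntry["source"]; path?: string; text: string; role: ProvenanceEntry["role"] }[]
): ProvenanceEntry[] {
  let cumulative = 0;
  return entries.map((e) => {
    const tokens = estimateTokens(e.text);
    cumulative += tokens; // running context-window total
    return {
      timestamp: new Date().toISOString(),
      source: e.source,
      ...(e.path ? { path: e.path } : {}), // path only for "import" entries
      tokens,
      cumulative_tokens: cumulative,
      role: e.role,
    };
  });
}

// One JSON record per line, chronological order.
const toJsonl = (log: ProvenanceEntry[]): string =>
  log.map((r) => JSON.stringify(r)).join("\n") + "\n";
```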
+
+#### 8.6.3 GitHub Actions Step Summary
+
+- When the `GITHUB_STEP_SUMMARY` environment variable is set, the extension **MUST** write a Markdown-formatted execution summary to the file at that path.
+- The summary **MUST** be valid GitHub-flavored Markdown so that it renders correctly in the GitHub Actions step summary UI.
+- The summary **MUST** include at minimum:
+  - A header identifying the workflow and model used.
+  - A table showing per-turn token consumption (input tokens, output tokens, cumulative total, and estimated cost).
+  - A final row with session totals (total tokens, total cost, elapsed time).
+  - A context provenance section listing each `imports:` file with its token contribution.
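The per-turn table can be rendered with plain string building; a sketch under the assumption that per-turn stats arrive as `{ input, output, costUsd }` objects (the `TurnStats` shape and number formatting are illustrative, not normative):

```typescript
interface TurnStats { input: number; output: number; costUsd: number; }

function renderStepSummary(workflow: string, model: string, turns: TurnStats[]): string {
  const fmt = (n: number) => n.toLocaleString("en-US"); // e.g. 5000 -> "5,000"
  let cumulative = 0;
  let cost = 0;
  const rows = turns.map((t, i) => {
    cumulative += t.input + t.output; // running total across turns
    cost += t.costUsd;
    return `| ${i + 1} | ${fmt(t.input)} | ${fmt(t.output)} | ${fmt(cumulative)} | $${cost.toFixed(4)} |`;
  });
  return [
    `## ${workflow} (\`${model}\`)`,
    "",
    "| Turn | Input Tokens | Output Tokens | Cumulative | Est. Cost |",
    "|------|--------------|---------------|------------|-----------|",
    ...rows,
    `| **Total** | | | **${fmt(cumulative)}** | **$${cost.toFixed(4)}** |`,
  ].join("\n");
}
```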
+
+#### 8.6.4 Per-Turn Token Consumption Output
+
+- The extension **MUST** subscribe to `turn_end` events and emit a human-readable token consumption line to stderr after each turn.
+- The line **MUST** report: turn number, input tokens, output tokens, cumulative total tokens, and estimated cumulative cost.
+- The line **MUST** be formatted as valid GitHub-flavored Markdown (e.g., using a `>` blockquote prefix) so that it renders correctly when appended to the step summary.
+
+> [!NOTE] Non-normative examples.
+>
+> **JSONL event (turn_end):**
+> ```json
+> {"event":"turn_end","turn":3,"input_tokens":4200,"output_tokens":850,"cumulative_tokens":15320,"cumulative_cost_usd":0.0412,"model":"claude-sonnet-4.6","ts":"2026-05-02T10:30:00.000Z"}
+> ```
+>
+> **Context provenance record:**
+> ```json
+> {"timestamp":"2026-05-02T10:29:00.000Z","source":"import","path":"skills/reporting/SKILL.md","tokens":1240,"cumulative_tokens":1240,"role":"user"}
+> {"timestamp":"2026-05-02T10:29:00.001Z","source":"prompt","tokens":520,"cumulative_tokens":1760,"role":"user"}
+> ```
+>
+> **Step summary (excerpt):**
+> ```markdown
+> ## AW Harness Run (`claude-sonnet-4.6`)
+>
+> | Turn | Input Tokens | Output Tokens | Cumulative | Est. Cost |
+> |------|--------------|---------------|------------|-----------|
+> | 1 | 1,760 | 420 | 2,180 | $0.0058 |
+> | 2 | 2,180 | 640 | 5,000 | $0.0076 |
+> | **Total** | | | **5,000** | **$0.0076** |
+>
+> ### Context Provenance
+> | Source | Path | Tokens |
+> |--------|------|--------|
+> | import | skills/reporting/SKILL.md | 1,240 |
+> | prompt | _(prompt.txt)_ | 520 |
+> ```
+>
+> **Implementation sketch:**
 >
 > ```typescript
 > export default function(pi: ExtensionAPI) {
+>   let turnCount = 0;
+>   let cumulativeTokens = 0;
+>   let cumulativeCost = 0;
+>   const provenanceLog: ProvenanceEntry[] = [];
+>
 >   pi.on("agent_start", async (event) => {
 >     emitJsonl({ event: "session_start", model: currentModel });
 >     startOtelSpan("aw_session");
+>     recordContextProvenance(provenanceLog); // records imports + prompt entries
 >   });
 >
->   pi.on("turn_end", async (event) => {
+>   pi.on("turn_end", async (event, ctx) => {
+>     turnCount++;
+>     cumulativeTokens += event.inputTokens + event.outputTokens;
+>     cumulativeCost += event.costUsd ?? 0;
+>     emitJsonl({
+>       event: "turn_end",
+>       turn: turnCount,
+>       input_tokens: event.inputTokens,
+>       output_tokens: event.outputTokens,
+>       cumulative_tokens: cumulativeTokens,
+>       cumulative_cost_usd: cumulativeCost,
+>       model: currentModel,
+>       ts: new Date().toISOString(),
+>     });
+>     // Human-readable per-turn line to stderr (markdown blockquote)
+>     process.stderr.write(
+>       `> **Turn ${turnCount}**: ${event.inputTokens} in / ${event.outputTokens} out ` +
+>       `| cumulative ${cumulativeTokens.toLocaleString()} tokens ($${cumulativeCost.toFixed(4)})\n`
+>     );
 >     recordOtelAttributes(event);
 >   });
 >
@@ -697,8 +782,10 @@ The following six extensions **MUST** be loaded into the `AgentSession` created
 >   });
 >
 >   pi.on("agent_end", async (event) => {
->     emitJsonl({ event: "session_end", tokens: event.tokens, cost: event.cost });
+>     emitJsonl({ event: "session_end", tokens: cumulativeTokens, cost: cumulativeCost });
 >     endOtelSpan("aw_session");
+>     await writeContextProvenanceFile(provenanceLog);
+>     await writeStepSummary({ turnCount, cumulativeTokens, cumulativeCost, provenanceLog });
 >   });
 > }
 > ```
@@ -870,7 +957,12 @@ The following ordered work items describe the implementation sequence:
 
 9. **Implement repair extension** — Pi extension that detects broken tool calls via `tool_result` events. Repairs via message truncation or summarize-and-restart.
 
-10. **Implement observability extension** — Pi extension that emits JSONL to stderr on agent/tool events. Generates OTel spans using `observability.otlp` config.
+10. **Implement observability extension** — Pi extension that:
+    - Emits JSONL to stderr on agent/tool events (§8.6.1).
+    - Writes a context provenance file (`/tmp/gh-aw/context-provenance.jsonl`) on `agent_end` recording the source and token cost of every context entry (§8.6.2).
+    - Appends a Markdown execution summary table (per-turn tokens + context provenance) to `$GITHUB_STEP_SUMMARY` when that env var is set (§8.6.3).
+    - Emits a human-readable per-turn token consumption line to stderr after each `turn_end` (§8.6.4).
+    - Generates OTel spans using `observability.otlp` config.
 
 11. **Write tests** — Unit tests for loader, each extension (mock `ExtensionAPI`). Integration tests with `createAgentSession()` + `SessionManager.inMemory()`.
 
@@ -902,6 +994,8 @@ The following ordered work items describe the implementation sequence:
 
 **Telemetry scope.** When `observability.otlp` is configured, OTel spans contain model names, token counts, and cost data. They **SHOULD NOT** contain raw prompt or response text. Implementations **SHOULD** redact sensitive content from span attributes.
 
+**Context provenance file.** The context provenance file (`/tmp/gh-aw/context-provenance.jsonl`) records the source path and token count of every context entry added to the session. It **MUST NOT** include raw prompt or response text; only metadata (source type, path, token counts) is recorded. Workflow authors **SHOULD** evaluate the sensitivity of file paths before enabling downstream analysis tools that read this file.
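The metadata-only rule can be enforced defensively before serialization; a sketch using an allow-list (the field set mirrors §8.6.2, but the guard itself is an illustrative assumption):

```typescript
// Allow-list of metadata fields; anything else (e.g. raw text) is dropped.
const ALLOWED_FIELDS = new Set(["timestamp", "source", "path", "tokens", "cumulative_tokens", "role"]);

function sanitizeProvenanceRecord(record: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(record).filter(([key]) => ALLOWED_FIELDS.has(key))
  );
}
```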
+
 **Model provider data handling.** Prompt content is transmitted to the LLM provider using the credentials AWF injects into the container. Workflow authors are responsible for ensuring that content transmitted to LLM providers complies with applicable data handling policies.
 
 ---
