specs/aw-harness.md
Lines changed: 100 additions & 6 deletions
@@ -239,6 +239,7 @@ A conforming implementation **MUST** exit with code `0` if and only if the agent
- **stdout**: Reserved for structured output (e.g., JSON summaries). A conforming implementation **SHOULD NOT** write diagnostic messages to stdout.
- **stderr**: All diagnostic messages, JSONL event stream, and debug output **MUST** be written to stderr.
+- **GitHub Actions step summary** (`$GITHUB_STEP_SUMMARY`): The harness **MUST** write a Markdown-formatted execution summary to the file path indicated by the `GITHUB_STEP_SUMMARY` environment variable when that variable is set. The summary **MUST** be valid GitHub-flavored Markdown so that it renders correctly in the GitHub Actions step summary UI.

---
@@ -670,25 +671,109 @@ The following six extensions **MUST** be loaded into the `AgentSession` created
### 8.6 Extension 6: Observability

-**Purpose:** Emits JSONL events to stderr and generates OTel spans.
+**Purpose:** Emits structured event streams to stderr, writes a context provenance file for downstream analysis, renders a Markdown step summary, and reports per-turn token consumption.

**Requirements:**

+#### 8.6.1 JSONL Event Stream
+
- The extension **MUST** subscribe to `agent_start`, `turn_end`, `tool_execution_end`, and `agent_end` events.
- On each event, the extension **MUST** emit a corresponding JSONL record to stderr.
-- If `observability.otlp.endpoint` is configured in the workflow frontmatter, the extension **MUST** create and close OTel spans for each task.
+- If `observability.otlp.endpoint` is configured in the workflow frontmatter, the extension **MUST** create and close OTel spans for the session.
- OTel span attributes **MUST** include at minimum: model, token counts, and cost.
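
As a non-normative illustration of the event-to-record mapping in §8.6.1, the following TypeScript sketch serializes one event as a single JSONL line. Only the four event names come from the spec; the `HarnessEvent` shape, its `timestamp` and `data` fields, and the helper name `toJsonlLine` are assumptions made for the example.

```typescript
// Hypothetical event shape: only the four event names come from the spec.
type EventName = "agent_start" | "turn_end" | "tool_execution_end" | "agent_end";

interface HarnessEvent {
  type: EventName;
  timestamp: string; // ISO 8601 (assumed field)
  data: Record<string, unknown>; // event-specific payload (assumed field)
}

// Serialize one event as a single JSONL line. JSON.stringify escapes any
// newline inside string values, so the record always occupies one line.
function toJsonlLine(event: HarnessEvent): string {
  return JSON.stringify(event) + "\n";
}

const line = toJsonlLine({
  type: "turn_end",
  timestamp: "2025-01-01T00:00:00.000Z",
  data: { turn: 1, inputTokens: 1200, outputTokens: 350 },
});
// A Node.js harness would then hand `line` to process.stderr.write.
```

Keeping serialization separate from the write call makes the record format easy to unit-test without capturing stderr.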

-> [!NOTE] Non-normative example.
+#### 8.6.2 Context Provenance File
+
+- The extension **MUST** produce a context provenance file at a well-known path (e.g., `/tmp/gh-aw/context-provenance.jsonl`) when the session completes.
+- The file **MUST** contain one JSON record per context entry added to the session, in chronological order. Each record **MUST** include:
+  - `timestamp` (ISO 8601 string): When the entry was added.
+  - `source` (string): The declared origin of the text — one of `"prompt"` (from `prompt.txt`), `"import"` (from an `imports:` file, with `path` sub-field), or `"system"` (from `harness.system`).
+  - `path` (string, **OPTIONAL**): Repository-relative path for `"import"` entries.
+  - `tokens` (number): Estimated token count for this entry at the time it was added.
+  - `cumulative_tokens` (number): Running total of tokens in the context window at the time of this entry.
+  - `role` (string): The message role — `"user"`, `"assistant"`, or `"system"`.
+- The purpose of this file is to allow downstream tools (e.g., `ghawaudit`) to perform deep analysis of context growth, identify which imports consumed the most token budget, and diagnose context-window pressure.
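
The record layout in §8.6.2 can be sketched in TypeScript as follows. The field names are taken from the list above; the helper names (`buildProvenance`, `toJsonl`), the sample paths, and the sample token counts are hypothetical. The sketch also assumes `cumulative_tokens` is a plain running sum, which holds only if no entry is evicted from the context window.

```typescript
// Field names follow §8.6.2; helper names and sample data are hypothetical.
type Source = "prompt" | "import" | "system";
type Role = "user" | "assistant" | "system";

interface ProvenanceRecord {
  timestamp: string; // ISO 8601
  source: Source;
  path?: string; // only meaningful for "import" entries
  tokens: number;
  cumulative_tokens: number;
  role: Role;
}

// Input entries omit the running total; the builder derives it.
type ContextEntry = Omit<ProvenanceRecord, "cumulative_tokens">;

// Assumption: entries arrive in chronological order and nothing is evicted,
// so the running total is a simple sum.
function buildProvenance(entries: ContextEntry[]): ProvenanceRecord[] {
  let running = 0;
  return entries.map((entry) => {
    running += entry.tokens;
    return { ...entry, cumulative_tokens: running };
  });
}

// One JSON object per line, newline-terminated.
function toJsonl(records: ProvenanceRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}

const records = buildProvenance([
  { timestamp: "2025-01-01T00:00:00Z", source: "system", tokens: 400, role: "system" },
  { timestamp: "2025-01-01T00:00:01Z", source: "import", path: "docs/style.md", tokens: 1500, role: "user" },
  { timestamp: "2025-01-01T00:00:02Z", source: "prompt", tokens: 250, role: "user" },
]);
const fileBody = toJsonl(records);
```

A downstream tool could then rank imports by `tokens` or plot `cumulative_tokens` against `timestamp` to see where context pressure builds.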
+
+#### 8.6.3 GitHub Actions Step Summary
+
+- When the `GITHUB_STEP_SUMMARY` environment variable is set, the extension **MUST** write a Markdown-formatted execution summary to the file at that path.
+- The summary **MUST** be valid GitHub-flavored Markdown so that it renders correctly in the GitHub Actions step summary UI.
+- The summary **MUST** include at minimum:
+  - A header identifying the workflow and model used.
+  - A table showing per-turn token consumption (input tokens, output tokens, cumulative total, and estimated cost).
+  - A final row with session totals (total tokens, total cost, elapsed time).
+  - A context provenance section listing each `imports:` file with its token contribution.
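
One possible rendering of the summary required by §8.6.3 is sketched below in TypeScript. The spec fixes only what the summary must contain; the column titles, the `TurnUsage` and `ImportUsage` shapes, the function name `renderSummary`, and the choice to compute the cumulative total as the sum of per-turn input and output tokens are all assumptions.

```typescript
// Hypothetical input shapes; the spec does not define them.
interface TurnUsage {
  turn: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number; // estimated cost of this turn
}

interface ImportUsage {
  path: string;
  tokens: number;
}

function renderSummary(
  workflow: string,
  model: string,
  turns: TurnUsage[],
  imports: ImportUsage[],
  elapsedSeconds: number,
): string {
  const lines: string[] = [
    `## ${workflow} (${model})`,
    "",
    "| Turn | Input tokens | Output tokens | Cumulative total | Est. cost (USD) |",
    "| --- | ---: | ---: | ---: | ---: |",
  ];
  let totalTokens = 0;
  let totalCost = 0;
  for (const t of turns) {
    totalTokens += t.inputTokens + t.outputTokens;
    totalCost += t.costUsd;
    lines.push(`| ${t.turn} | ${t.inputTokens} | ${t.outputTokens} | ${totalTokens} | ${t.costUsd.toFixed(4)} |`);
  }
  // Final row: session totals plus elapsed time.
  lines.push(`| **Total** (${elapsedSeconds}s elapsed) | | | ${totalTokens} | ${totalCost.toFixed(4)} |`);
  lines.push("", "### Context provenance", "", "| Import | Tokens |", "| --- | ---: |");
  for (const i of imports) {
    // A literal pipe would end the table cell, so escape it.
    const safePath = i.path.replace(/\|/g, "\\|");
    lines.push(`| \`${safePath}\` | ${i.tokens} |`);
  }
  return lines.join("\n") + "\n";
}

const summary = renderSummary(
  "daily-triage",
  "example-model",
  [
    { turn: 1, inputTokens: 1200, outputTokens: 350, costUsd: 0.0123 },
    { turn: 2, inputTokens: 900, outputTokens: 300, costUsd: 0.0098 },
  ],
  [{ path: "docs/style.md", tokens: 1500 }],
  42,
);
// A harness would append `summary` to the file named by GITHUB_STEP_SUMMARY.
```

Building the summary as a pure string keeps the rendering logic testable; only the final append needs the environment variable and file I/O.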
706
+
707
+
#### 8.6.4 Per-Turn Token Consumption Output
708
+
709
+
- The extension **MUST** subscribe to `turn_end` events and emit a human-readable token consumption line to stderr after each turn.
710
+
- The line **MUST** report: turn number, input tokens, output tokens, cumulative total tokens, and estimated cumulative cost.
711
+
- The line **MUST** be formatted as valid GitHub-flavored Markdown (e.g., using a `>` blockquote prefix) so that it renders correctly when appended to the step summary.
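
A per-turn line meeting §8.6.4 might look like the TypeScript sketch below. The blockquote prefix follows the example given in the requirement; the exact wording of the line, the `TurnTotals` shape, and the function name `formatTurnLine` are assumptions.

```typescript
// Hypothetical shape carrying the five values the line must report.
interface TurnTotals {
  turn: number;
  inputTokens: number;
  outputTokens: number;
  cumulativeTokens: number;
  cumulativeCostUsd: number; // estimated cumulative cost
}

// Format one Markdown blockquote line. The wording is illustrative only.
function formatTurnLine(t: TurnTotals): string {
  return (
    `> **Turn ${t.turn}**: ${t.inputTokens} input tokens, ` +
    `${t.outputTokens} output tokens, ${t.cumulativeTokens} total, ` +
    `est. cost $${t.cumulativeCostUsd.toFixed(4)}`
  );
}

const turnLine = formatTurnLine({
  turn: 2,
  inputTokens: 900,
  outputTokens: 300,
  cumulativeTokens: 2750,
  cumulativeCostUsd: 0.0221,
});
```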
@@ -870,7 +957,12 @@ The following ordered work items describe the implementation sequence:
9. **Implement repair extension** — Pi extension that detects broken tool calls via `tool_result` events. Repairs via message truncation or summarize-and-restart.

-10. **Implement observability extension** — Pi extension that emits JSONL to stderr on agent/tool events. Generates OTel spans using `observability.otlp` config.
+10. **Implement observability extension** — Pi extension that:
+    - Emits JSONL to stderr on agent/tool events (§8.6.1).
+    - Writes a context provenance file (`/tmp/gh-aw/context-provenance.jsonl`) on `agent_end` recording the source and token cost of every context entry (§8.6.2).
+    - Appends a Markdown execution summary table (per-turn tokens + context provenance) to `$GITHUB_STEP_SUMMARY` when that env var is set (§8.6.3).
+    - Emits a human-readable per-turn token consumption line to stderr after each `turn_end` (§8.6.4).
+    - Generates OTel spans using `observability.otlp` config.

11. **Write tests** — Unit tests for loader, each extension (mock `ExtensionAPI`). Integration tests with `createAgentSession()` + `SessionManager.inMemory()`.
@@ -902,6 +994,8 @@ The following ordered work items describe the implementation sequence:
**Telemetry scope.** When `observability.otlp` is configured, OTel spans contain model names, token counts, and cost data. They **SHOULD NOT** contain raw prompt or response text. Implementations **SHOULD** redact sensitive content from span attributes.

+**Context provenance file.** The context provenance file (`/tmp/gh-aw/context-provenance.jsonl`) records the source path and token count of every context entry added to the session. It **MUST NOT** include raw prompt or response text; only metadata (source type, path, token counts) is recorded. Workflow authors **SHOULD** evaluate the sensitivity of file paths before enabling downstream analysis tools that read this file.
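
One way to enforce the metadata-only rule is an allow-list applied just before each record is written. The TypeScript sketch below assumes the six field names from §8.6.2 are the complete set of permitted keys; the function name `toMetadataOnly` and the sample `text` field are hypothetical.

```typescript
// Assumption: the fields listed in §8.6.2 are the only ones permitted.
const ALLOWED_KEYS = ["timestamp", "source", "path", "tokens", "cumulative_tokens", "role"];

// Copy only allow-listed keys, so any raw prompt or response text that was
// attached to the entry upstream never reaches the provenance file.
function toMetadataOnly(record: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of ALLOWED_KEYS) {
    if (Object.prototype.hasOwnProperty.call(record, key)) {
      out[key] = record[key];
    }
  }
  return out;
}

const safeRecord = toMetadataOnly({
  source: "prompt",
  tokens: 250,
  role: "user",
  text: "raw prompt text that must not be persisted", // hypothetical extra field
});
```

An allow-list fails closed: a field added to the in-memory entry later is dropped by default instead of leaking into the file.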
+

**Model provider data handling.** Prompt content is transmitted to the LLM provider using the credentials AWF injects into the container. Workflow authors are responsible for ensuring that content transmitted to LLM providers complies with applicable data handling policies.