Commit c17400b

anj-s authored
fix(core): enforce parallel task tracker updates (#24477)
Co-authored-by: anj-s <anjalisridhar@google.com>
1 parent 47bca39 commit c17400b

4 files changed

Lines changed: 56 additions & 10 deletions

evals/tracker.eval.ts

Lines changed: 50 additions & 8 deletions
@@ -62,11 +62,13 @@ describe('tracker_mode', () => {
         'Expected tracker_update_task tool to be called',
       ).toBe(true);

-      const updateCall = toolLogs.find(
+      const updateCalls = toolLogs.filter(
         (log) => log.toolRequest.name === TRACKER_UPDATE_TASK_TOOL_NAME,
       );
-      expect(updateCall).toBeDefined();
-      const updateArgs = JSON.parse(updateCall!.toolRequest.args);
+      expect(updateCalls.length).toBeGreaterThan(0);
+      const updateArgs = JSON.parse(
+        updateCalls[updateCalls.length - 1].toolRequest.args,
+      );
       expect(updateArgs.status).toBe('closed');

       const loginContent = fs.readFileSync(
@@ -128,12 +130,52 @@ describe('tracker_mode', () => {
     prompt:
       'Where is my task tracker storage located? Please provide the absolute path in your response.',
     assert: async (rig, result) => {
-      // The rig sets GEMINI_CLI_HOME to rig.homeDir
-      const homeDir = rig.homeDir!;
-      // The response should contain the dynamic path which includes the home directory
-      // and follows the .gemini/tmp/.../tracker structure.
-      expect(result).toContain(homeDir);
+      // The response should contain the dynamic path which follows the .gemini/tmp/.../tracker structure.
       expect(result).toMatch(/\.gemini\/tmp\/.*\/tracker/);
     },
   });
+
+  evalTest('USUALLY_PASSES', {
+    suiteName: 'default',
+    suiteType: 'behavioral',
+    name: 'should update the tracker in the same turn as the task completion to save turns',
+    params: {
+      settings: { experimental: { taskTracker: true } },
+    },
+    files: FILES,
+    prompt:
+      'We have a bug in src/login.js: the password check is missing. Fix this bug. Then, create a new file src/auth.js that exports a simple verifyToken function. Please organize this into tasks and execute them.',
+    assert: async (rig, result) => {
+      await rig.waitForToolCall(TRACKER_CREATE_TASK_TOOL_NAME);
+      await rig.waitForToolCall(TRACKER_UPDATE_TASK_TOOL_NAME);
+
+      const toolLogs = rig.readToolLogs();
+
+      // Get the prompt ID of the fix for login.js
+      const loginEditCalls = toolLogs.filter(
+        (log) =>
+          (log.toolRequest.name === 'replace' ||
+            log.toolRequest.name === 'write_file') &&
+          log.toolRequest.args.includes('login.js'),
+      );
+
+      expect(loginEditCalls.length).toBeGreaterThan(0);
+      const loginEditPromptId =
+        loginEditCalls[loginEditCalls.length - 1].toolRequest.prompt_id;
+
+      // Verify there is an update to the tracker in the exact same turn
+      const parallelTrackerUpdates = toolLogs.filter(
+        (log) =>
+          log.toolRequest.name === TRACKER_UPDATE_TASK_TOOL_NAME &&
+          log.toolRequest.prompt_id === loginEditPromptId,
+      );
+
+      expect(
+        parallelTrackerUpdates.length,
+        'Expected tracker_update_task to be called in the same turn as the login.js fix',
+      ).toBeGreaterThan(0);
+
+      assertModelHasOutput(result);
+    },
+  });
 });
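
The new eval treats two tool calls as belonging to the same turn when they share a `prompt_id`. The following is a minimal standalone sketch of that check, not the eval rig itself. The `ToolLog` interface, the `trackerUpdatesInEditTurn` helper, and the sample data are hypothetical and only mirror the fields the test reads; the tool name string is assumed to match `TRACKER_UPDATE_TASK_TOOL_NAME`.

```typescript
// Hypothetical shape of a tool log entry, inferred from the fields the eval reads.
interface ToolLog {
  toolRequest: { name: string; args: string; prompt_id: string };
}

// Assumed value of TRACKER_UPDATE_TASK_TOOL_NAME.
const TRACKER_UPDATE = 'tracker_update_task';

// Returns the tracker updates that share a prompt_id with the last edit of `file`.
function trackerUpdatesInEditTurn(logs: ToolLog[], file: string): ToolLog[] {
  const edits = logs.filter(
    (log) =>
      (log.toolRequest.name === 'replace' ||
        log.toolRequest.name === 'write_file') &&
      log.toolRequest.args.includes(file),
  );
  if (edits.length === 0) return [];

  // Use the last edit, as the eval does, in case the file was edited repeatedly.
  const editPromptId = edits[edits.length - 1].toolRequest.prompt_id;
  return logs.filter(
    (log) =>
      log.toolRequest.name === TRACKER_UPDATE &&
      log.toolRequest.prompt_id === editPromptId,
  );
}

// Made-up logs: the login.js fix and a tracker update share turn "p2";
// the auth.js write in turn "p3" has no tracker update alongside it.
const sampleLogs: ToolLog[] = [
  { toolRequest: { name: 'tracker_create_task', args: '{}', prompt_id: 'p1' } },
  { toolRequest: { name: 'replace', args: '{"file_path":"src/login.js"}', prompt_id: 'p2' } },
  { toolRequest: { name: TRACKER_UPDATE, args: '{"status":"closed"}', prompt_id: 'p2' } },
  { toolRequest: { name: 'write_file', args: '{"file_path":"src/auth.js"}', prompt_id: 'p3' } },
  { toolRequest: { name: TRACKER_UPDATE, args: '{"status":"closed"}', prompt_id: 'p4' } },
];

const loginUpdates = trackerUpdatesInEditTurn(sampleLogs, 'login.js');
const authUpdates = trackerUpdatesInEditTurn(sampleLogs, 'auth.js');
```

With this sample data the `login.js` check finds one same-turn update, while the `auth.js` check finds none because its tracker update arrived in a later turn. That second case is the behavior the eval is meant to catch.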

packages/core/src/core/__snapshots__/prompts.test.ts.snap

Lines changed: 2 additions & 0 deletions
@@ -2965,6 +2965,7 @@ You are operating with a persistent file-based task tracking system located at \
 6. **STATE OVER CHAT**: If the user says "I think we finished that," but the tool says it is 'pending', trust the tool--or verify explicitly before updating.
 7. **DEPENDENCY MANAGEMENT**: Respect task topology. Never attempt to execute a task if its dependencies are not marked as 'closed'. If you are blocked, focus only on the leaf nodes of the task graph.
 8. **DETAILED TASKS**: Ensure that the tasks created have highly detailed titles and descriptions. The description MUST provide significantly more specific details and technical context than the title.
+9. **TURN EFFICIENCY**: Update the tracker immediately when a step is completed. Combine \`tracker_update_task\` calls with other tool calls in the same turn to save turns.

 # Operational Guidelines

@@ -3151,6 +3152,7 @@ You are operating with a persistent file-based task tracking system located at \
 6. **STATE OVER CHAT**: If the user says "I think we finished that," but the tool says it is 'pending', trust the tool--or verify explicitly before updating.
 7. **DEPENDENCY MANAGEMENT**: Respect task topology. Never attempt to execute a task if its dependencies are not marked as 'closed'. If you are blocked, focus only on the leaf nodes of the task graph.
 8. **DETAILED TASKS**: Ensure that the tasks created have highly detailed titles and descriptions. The description MUST provide significantly more specific details and technical context than the title.
+9. **TURN EFFICIENCY**: Update the tracker immediately when a step is completed. Combine \`tracker_update_task\` calls with other tool calls in the same turn to save turns.

 # Operational Guidelines

packages/core/src/prompts/snippets.legacy.ts

Lines changed: 2 additions & 1 deletion
@@ -510,7 +510,8 @@ You are operating with a persistent file-based task tracking system located at \
 5. **VERIFICATION**: Before marking a task as complete, verify the work is actually done (e.g., run the test, check the file existence).
 6. **STATE OVER CHAT**: If the user says "I think we finished that," but the tool says it is 'pending', trust the tool--or verify explicitly before updating.
 7. **DEPENDENCY MANAGEMENT**: Respect task topology. Never attempt to execute a task if its dependencies are not marked as 'closed'. If you are blocked, focus only on the leaf nodes of the task graph.
-8. **DETAILED TASKS**: Ensure that the tasks created have highly detailed titles and descriptions. The description MUST provide significantly more specific details and technical context than the title.`.trim();
+8. **DETAILED TASKS**: Ensure that the tasks created have highly detailed titles and descriptions. The description MUST provide significantly more specific details and technical context than the title.
+9. **TURN EFFICIENCY**: Update the tracker immediately when a step is completed. Combine \`${TRACKER_UPDATE_TASK_TOOL_NAME}\` calls with other tool calls in the same turn to save turns.`.trim();
 }

 // --- Leaf Helpers (Strictly strings or simple calls) ---

packages/core/src/prompts/snippets.ts

Lines changed: 2 additions & 1 deletion
@@ -577,7 +577,8 @@ You are operating with a persistent file-based task tracking system located at \
 5. **VERIFICATION**: Before marking a task as complete, verify the work is actually done (e.g., run the test, check the file existence).
 6. **STATE OVER CHAT**: If the user says "I think we finished that," but the tool says it is 'pending', trust the tool--or verify explicitly before updating.
 7. **DEPENDENCY MANAGEMENT**: Respect task topology. Never attempt to execute a task if its dependencies are not marked as 'closed'. If you are blocked, focus only on the leaf nodes of the task graph.
-8. **DETAILED TASKS**: Ensure that the tasks created have highly detailed titles and descriptions. The description MUST provide significantly more specific details and technical context than the title.`.trim();
+8. **DETAILED TASKS**: Ensure that the tasks created have highly detailed titles and descriptions. The description MUST provide significantly more specific details and technical context than the title.
+9. **TURN EFFICIENCY**: Update the tracker immediately when a step is completed. Combine ${trackerUpdate} calls with other tool calls in the same turn to save turns.`.trim();
 }

 export function renderPlanningWorkflow(
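
Both prompt files add the same rule and differ only in how the tool name is interpolated: the legacy snippet wraps the constant in escaped backticks inline, while `snippets.ts` uses a `trackerUpdate` value whose definition is outside this diff. A rough sketch of the rendering follows, assuming `trackerUpdate` holds the backtick-wrapped tool name (the snapshot output suggests this). The names `formatToolName` and `renderTurnEfficiencyRule` are hypothetical and do not appear in the commit.

```typescript
// Hypothetical helper: wraps a tool name in backticks for display in the prompt.
const formatToolName = (name: string): string => `\`${name}\``;

// Hypothetical renderer for the new rule; the wording is copied from the diff.
function renderTurnEfficiencyRule(index: number, toolName: string): string {
  const trackerUpdate = formatToolName(toolName);
  return `${index}. **TURN EFFICIENCY**: Update the tracker immediately when a step is completed. Combine ${trackerUpdate} calls with other tool calls in the same turn to save turns.`;
}

// Tool name as it appears in the updated snapshot.
const rule = renderTurnEfficiencyRule(9, 'tracker_update_task');
console.log(rule);
```

Under that assumption, both variants render to the same line recorded in the updated snapshot.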
