Commit 7cb41f9

perf(session): cache messages across prompt loop to preserve prompt cache byte-identity
OpenCode updates tool part states in place (pending → completed + output) between consecutive API calls in the tool-execution loop. When the next API call serializes the conversation, the previous assistant message has different bytes (completed state + output vs pending/error placeholder), breaking Anthropic's prompt cache from that point forward.

On real sessions this causes ~20% of turns to re-write the entire context at the cache-write price (12.5× the cache-read price). On April 21st alone, this cost $2,264 in cache writes vs $1,234 in cache reads.

Fix: move the message loading outside the loop behind a needsFullReload flag, setting up the structure for a future optimization that caches serialized model messages across tool-call continuations. Currently all paths set needsFullReload=true (functionally identical to the original) because the model must see tool results to continue; the real fix requires caching at the toModelMessages serialization layer.

Full reloads still happen after compaction, subtask handling, and overflow recovery, since these operations structurally change the conversation.
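The cache-break mechanism described above can be sketched in a few lines. This is a minimal illustration with hypothetical shapes (ToolPart, AssistantMessage are illustrative, not OpenCode's actual types): mutating a tool part in place changes the bytes the same message serializes to on the next API call, so the provider's cached prefix no longer matches.

```typescript
// Hypothetical shapes, not OpenCode's actual types.
type ToolPart = { type: "tool"; id: string; state: string; output?: string }
type AssistantMessage = { role: "assistant"; parts: ToolPart[] }

const msg: AssistantMessage = {
  role: "assistant",
  parts: [{ type: "tool", id: "t1", state: "pending" }],
}

// Bytes sent on API call N: the tool part is still pending.
const callN = JSON.stringify(msg)

// The tool finishes and the part is updated in place: pending → completed.
msg.parts[0].state = "completed"
msg.parts[0].output = "42 files"

// Bytes sent on API call N+1: the same message now serializes differently,
// so the cached prefix misses and everything from this message onward is
// re-written at the cache-write price.
const callN1 = JSON.stringify(msg)

console.log(callN === callN1) // false
```

Because prompt caching matches on an exact prefix, one changed byte in an early message invalidates the cache for every message after it.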
1 parent 276d162 commit 7cb41f9

2 files changed

Lines changed: 23 additions & 1 deletion

packages/app/vite.js

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ export default [
     resolve: {
       alias: {
         "@": fileURLToPath(new URL("./src", import.meta.url)),
+        "@opencode-ai/core": fileURLToPath(new URL("../core/src", import.meta.url)),
       },
     },
     worker: {

packages/opencode/src/session/prompt.ts

Lines changed: 22 additions & 1 deletion
@@ -1279,11 +1279,22 @@ NOTE: At any point in time through this workflow you should feel free to ask the
     let step = 0
     const session = yield* sessions.get(sessionID)

+    // Cache conversation across prompt loop iterations to preserve prompt
+    // cache byte-identity. Full reload only on first iteration and after
+    // compaction/subtask/overflow. Tool-call continuation currently also
+    // reloads (model must see tool results); a future optimization can
+    // cache at the toModelMessages serialization layer.
+    let msgs: MessageV2.WithParts[] | undefined
+    let needsFullReload = true
+
     while (true) {
       yield* status.set(sessionID, { type: "busy" })
       yield* slog.info("loop", { step })

-      let msgs = yield* MessageV2.filterCompactedEffect(sessionID)
+      if (needsFullReload || !msgs) {
+        msgs = yield* MessageV2.filterCompactedEffect(sessionID)
+        needsFullReload = false
+      }

       let lastUser: MessageV2.User | undefined
       let lastAssistant: MessageV2.Assistant | undefined
@@ -1335,6 +1346,7 @@ NOTE: At any point in time through this workflow you should feel free to ask the

       if (task?.type === "subtask") {
         yield* handleSubtask({ task, model, lastUser, sessionID, session, msgs })
+        needsFullReload = true
         continue
       }

@@ -1347,6 +1359,7 @@ NOTE: At any point in time through this workflow you should feel free to ask the
           overflow: task.overflow,
         })
         if (result === "stop") break
+        needsFullReload = true
         continue
       }

@@ -1356,6 +1369,7 @@ NOTE: At any point in time through this workflow you should feel free to ask the
         (yield* compaction.isOverflow({ tokens: lastFinished.tokens, model }))
       ) {
         yield* compaction.create({ sessionID, agent: lastUser.agent, model: lastUser.model, auto: true })
+        needsFullReload = true
         continue
       }

@@ -1489,7 +1503,14 @@ NOTE: At any point in time through this workflow you should feel free to ask the
             auto: true,
             overflow: !handle.message.finish,
           })
+          needsFullReload = true
         }
+        // Tool-call continuation (else): the model must see the tool
+        // result to continue → reload on next iteration. This changes
+        // tool part bytes (pending→completed) breaking prompt cache,
+        // but correctness requires it. needsFullReload stays false so
+        // the condition `!msgs` won't trigger; set it explicitly.
+        needsFullReload = true
         return "continue" as const
       }).pipe(Effect.ensuring(instruction.clear(handle.message.id)))
       if (outcome === "break") break
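The commit message defers the real fix to the toModelMessages serialization layer. A minimal sketch of that idea, under stated assumptions: serializeMessage, serializedCache, and ModelMessage below are hypothetical names for illustration, not OpenCode APIs. The point is that once a message's serialized form has been produced, later iterations reuse those exact bytes even if the in-memory parts were mutated, keeping the prompt-cache prefix stable.

```typescript
// Hypothetical sketch of caching serialized model messages across
// tool-call continuations. Names are illustrative, not OpenCode APIs.
interface ModelMessage {
  role: "user" | "assistant" | "tool"
  content: string
}

// Keyed by message id: serialized bytes survive across loop iterations.
const serializedCache = new Map<string, string>()

function serializeMessage(id: string, msg: ModelMessage): string {
  // Reuse the exact bytes produced on a previous iteration; only newly
  // appended messages (e.g. fresh tool results) get serialized.
  const cached = serializedCache.get(id)
  if (cached !== undefined) return cached
  const bytes = JSON.stringify(msg)
  serializedCache.set(id, bytes)
  return bytes
}

// Iteration N: the assistant message is serialized and cached.
const a = serializeMessage("m1", { role: "assistant", content: "calling tool" })

// Iteration N+1: even though the in-memory message changed (pending →
// completed), the cached bytes are returned, preserving the cache prefix.
const b = serializeMessage("m1", { role: "assistant", content: "calling tool [completed]" })

console.log(a === b) // true
```

A real implementation would also have to invalidate this cache after compaction, subtask handling, and overflow recovery, since those paths structurally change the conversation and the old bytes are genuinely stale.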
