
Commit 25724b7

perf(session): cache messages across prompt loop to preserve prompt cache byte-identity
OpenCode updates tool part states in-place (pending → completed + output) between consecutive API calls in the tool-execution loop. When the next API call serializes the conversation, the previous assistant message has different bytes (completed state + output vs. pending/error placeholder), breaking Anthropic's prompt cache from that point forward. On real sessions this causes ~20% of turns to re-write the entire context at the cache-write price (12.5× cache-read). On April 21st alone, this cost $2,264 in cache writes vs. $1,234 in cache reads.

Fix: cache the conversation array across prompt loop iterations. On tool-call continuation steps, only append genuinely NEW messages instead of reloading all messages from the DB. Existing messages retain their original part states (as the API last saw them), preserving byte-identity for the prompt cache.

Full reloads still happen after compaction, subtask handling, and overflow recovery, since these operations structurally change the conversation.
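The cache break described above comes down to serialization: Anthropic's prompt cache matches on the exact byte prefix of the request, so once a tool part flips from pending to completed, every byte from that message onward differs and the cache misses. A minimal sketch of that effect, using hypothetical shapes (not OpenCode's actual types):

```typescript
// Hypothetical tool-part shape, for illustration only.
type ToolPart =
  | { state: "pending" }
  | { state: "completed"; output: string }

// Stand-in for conversation serialization: any byte-level change to an
// already-sent message invalidates the cached prefix after that point.
function serialize(parts: ToolPart[]): string {
  return JSON.stringify(parts)
}

// First API call sees the tool part as pending...
const firstCall = serialize([{ state: "pending" }])
// ...the next call sees the same message mutated in-place to completed.
const secondCall = serialize([{ state: "completed", output: "ok" }])

// Different bytes for the "same" message => prompt cache miss.
const cacheHit = firstCall === secondCall
```

This is why the fix keeps serving the *cached* copy of existing messages rather than the freshly reloaded DB state.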
Commit 25724b7, parent 276d162

2 files changed: 31 additions, 1 deletion


packages/app/vite.js

Lines changed: 1 addition & 0 deletions
```diff
@@ -16,6 +16,7 @@ export default [
   resolve: {
     alias: {
       "@": fileURLToPath(new URL("./src", import.meta.url)),
+      "@opencode-ai/core": fileURLToPath(new URL("../core/src", import.meta.url)),
     },
   },
   worker: {
```
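The added alias points the package name at its in-repo source, so the app resolves `@opencode-ai/core` without a separate build step. A minimal standalone sketch of the same `resolve.alias` pattern (paths here are assumptions for illustration, not the repo's layout):

```typescript
// vite.config.ts — minimal sketch of a workspace source alias.
import { defineConfig } from "vite"
import { fileURLToPath } from "node:url"

export default defineConfig({
  resolve: {
    alias: {
      // Resolve imports of the workspace package straight to its source,
      // so changes in ../core/src are picked up without rebuilding it.
      "@opencode-ai/core": fileURLToPath(new URL("../core/src", import.meta.url)),
    },
  },
})
```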

packages/opencode/src/session/prompt.ts

Lines changed: 30 additions & 1 deletion
```diff
@@ -1279,11 +1279,22 @@ NOTE: At any point in time through this workflow you should feel free to ask the
   let step = 0
   const session = yield* sessions.get(sessionID)

+  // Cache conversation across prompt loop iterations to preserve prompt
+  // cache byte-identity. Full reload only on first iteration and after
+  // compaction/subtask/overflow. Tool-call continuation merges in only
+  // genuinely new messages (see below); a future optimization can cache
+  // at the toModelMessages serialization layer.
+  let msgs: MessageV2.WithParts[] | undefined
+  let needsFullReload = true
+
   while (true) {
     yield* status.set(sessionID, { type: "busy" })
     yield* slog.info("loop", { step })

-    let msgs = yield* MessageV2.filterCompactedEffect(sessionID)
+    if (needsFullReload || !msgs) {
+      msgs = yield* MessageV2.filterCompactedEffect(sessionID)
+      needsFullReload = false
+    }

     let lastUser: MessageV2.User | undefined
     let lastAssistant: MessageV2.Assistant | undefined
@@ -1335,6 +1346,7 @@ NOTE: At any point in time through this workflow you should feel free to ask the

     if (task?.type === "subtask") {
       yield* handleSubtask({ task, model, lastUser, sessionID, session, msgs })
+      needsFullReload = true
       continue
     }

@@ -1347,6 +1359,7 @@ NOTE: At any point in time through this workflow you should feel free to ask the
         overflow: task.overflow,
       })
       if (result === "stop") break
+      needsFullReload = true
       continue
     }

@@ -1356,6 +1369,7 @@ NOTE: At any point in time through this workflow you should feel free to ask the
       (yield* compaction.isOverflow({ tokens: lastFinished.tokens, model }))
     ) {
       yield* compaction.create({ sessionID, agent: lastUser.agent, model: lastUser.model, auto: true })
+      needsFullReload = true
       continue
     }

@@ -1489,6 +1503,21 @@ NOTE: At any point in time through this workflow you should feel free to ask the
         auto: true,
         overflow: !handle.message.finish,
       })
+      needsFullReload = true
+    } else {
+      // Tool-call continuation: merge NEW messages from DB into the
+      // cached array. Existing messages keep their cached bytes
+      // (preserving prompt cache identity even though their tool
+      // parts transitioned pending→completed in the DB). Only
+      // genuinely new messages (the assistant's response with tool
+      // results) are appended.
+      const fresh = yield* MessageV2.filterCompactedEffect(sessionID)
+      const existing = new Map(msgs!.map((m) => [m.info.id, m]))
+      for (const msg of fresh) {
+        if (!existing.has(msg.info.id)) {
+          msgs!.push(msg)
+        }
+      }
     }
     return "continue" as const
   }).pipe(Effect.ensuring(instruction.clear(handle.message.id)))
```
