fix(session): cache messages across prompt loop to preserve prompt cache byte-identity #24842
BYK wants to merge 1 commit into anomalyco:dev
Conversation
Hey! Your PR title doesn't follow our conventions. Please update it to start with one of the accepted prefixes. See CONTRIBUTING.md for details.
The following comment was made by an LLM; it may be inaccurate. Based on my search, here are the potentially related PRs:
The primary PR #24842 appears to be novel in its specific approach of caching messages across prompt loop iterations. The related issues mentioned (#24841, #20110, #20565) suggest this is addressing a known performance problem, but no other open PRs are directly duplicating this work. No duplicate PRs found.
Thanks for updating your PR! It now meets our contributing guidelines. 👍
Force-pushed from d5baeca to 7cb41f9
fix(session): cache messages across prompt loop to preserve prompt cache byte-identity

OpenCode updates tool part states in-place (pending → completed + output) between consecutive API calls in the tool-execution loop. When the next API call serializes the conversation, the previous assistant message has different bytes (completed state + output vs. pending/error placeholder), breaking Anthropic's prompt cache from that point forward. On real sessions this causes ~20% of turns to re-write the entire context at the cache-write price (12.5× cache-read). On April 21st alone, this cost $2,264 in cache writes vs. $1,234 in cache reads.

Fix: cache the conversation array across prompt loop iterations. On tool-call continuation steps, only append genuinely NEW messages instead of reloading all messages from the DB. Existing messages retain their original part states (as the API last saw them), preserving byte-identity for the prompt cache. Full reloads still happen after compaction, subtask handling, and overflow recovery, since those operations structurally change the conversation.
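The append-only continuation step described in the commit message can be sketched as follows. This is a minimal sketch with hypothetical names (`Message`, `appendNewMessages`); the actual OpenCode implementation and its code paths (`filterCompactedEffect`, `toModelMessages`) differ.

```typescript
// Hypothetical message shape; the real OpenCode type carries role,
// parts, timestamps, etc.
interface Message {
  id: string;
}

// On a tool-call continuation step, keep the cached message objects
// exactly as the API last saw them, and append only messages whose IDs
// the cache has not seen yet.
function appendNewMessages(cached: Message[], reloaded: Message[]): Message[] {
  const seen = new Set(cached.map((m) => m.id));
  return [...cached, ...reloaded.filter((m) => !seen.has(m.id))];
}
```

After compaction, subtask handling, or overflow recovery, the cached array would be discarded and the conversation reloaded in full, matching the behavior described above.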
Force-pushed from 7cb41f9 to 25724b7
Issue for this PR
Closes #24841. Related: #20110, #20565, #14743.
Type of change
What does this PR do?
`filterCompactedEffect(sessionID)` reloads ALL messages from the DB at the start of every prompt loop iteration. Between tool-call steps, tool parts transition from `pending` → `completed` with output text. `toModelMessages()` serializes these states differently:

- while the tool is still pending: `state: "output-error", errorText: "[Tool execution was interrupted]"`
- after completion: `state: "output-available", output: <actual text>`

Anthropic's prompt cache matches on byte-identity. The changed bytes at that message position invalidate the cache from there forward: the entire remaining context becomes a cache WRITE at $6.25/MTok (12.5× the cache-read price of $0.50/MTok for Opus).
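The pending → completed byte change can be illustrated with a minimal sketch. The part shapes here are illustrative only, not OpenCode's actual types or the exact output of `toModelMessages()`:

```typescript
// Two serialized states of the same tool part at the same message
// position in the conversation.
type ToolPart =
  | { state: "output-error"; errorText: string }
  | { state: "output-available"; output: string };

const serialize = (part: ToolPart): string => JSON.stringify(part);

// What the first API call saw (tool still pending, placeholder state):
const before = serialize({
  state: "output-error",
  errorText: "[Tool execution was interrupted]",
});

// What the next API call sends after the tool completed in-place
// (the output text is a made-up example):
const after = serialize({
  state: "output-available",
  output: "README.md src/ package.json",
});

// Different bytes at the same prefix position, so a prefix-matching
// prompt cache misses from this message onward.
console.log(before === after); // false
```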
The fix: move `filterCompactedEffect()` above the loop and cache the result. On tool-call continuation, reload from the DB but only append messages with genuinely new IDs; existing messages retain their original serialized state as the API last saw them. Full reloads still happen after compaction, subtask handling, and overflow recovery, since those structurally change the conversation.

I understand why this works: the key insight is that the only message whose tool parts change between API calls is the most recent assistant message (the one whose tool just executed). All prior messages were already `completed` when the previous API call sent them. By not re-reading that one message from the DB, its serialized form stays byte-identical with what Anthropic cached.

Cost data from real sessions (Opus 4.7, 1M context, April 21st):
- Cache hit turn: cache_read=614K, cache_write=1K
- Cache bust turn: cache_read=54K, cache_write=560K (only the system prompt survives)

How did you verify your code works?
Analyzed cache patterns from the OpenCode DB across multiple sessions. Verified the root cause by correlating bust events with tool-call timing — 95% of rapid busts (<60s) are preceded by a tool-bearing message, and the exact cache_read/write pattern matches the pending→completed byte change. The fix preserves the message array identity across tool-call steps while still correctly appending new messages.
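The correlation check described above can be sketched as a simple scan over turn records. The record shape below is hypothetical; the actual analysis ran against the OpenCode DB:

```typescript
// Hypothetical per-turn record for the cache analysis.
interface Turn {
  at: number;           // unix seconds
  cacheRead: number;    // tokens read from cache this turn
  cacheWrite: number;   // tokens written to cache this turn
  hasToolCall: boolean; // did this turn's assistant message carry a tool call?
}

// Treat a turn as a "bust" when cache writes dominate reads, and as
// "rapid" when it follows the previous turn by under 60 seconds.
// Count rapid busts preceded by a tool-bearing turn.
function rapidBustsAfterTools(turns: Turn[]): number {
  let count = 0;
  for (let i = 1; i < turns.length; i++) {
    const bust = turns[i].cacheWrite > turns[i].cacheRead;
    const rapid = turns[i].at - turns[i - 1].at < 60;
    if (bust && rapid && turns[i - 1].hasToolCall) count++;
  }
  return count;
}
```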
Checklist