
Conversation

@Tarquinen
Collaborator

Summary

  • Refactors token calculation in /dcp context to minimize tokenizer estimation
  • Uses API-reported values where possible, calculates residuals for the rest
  • Batches tokenizer calls for efficiency (4 calls instead of N)

Changes

Token calculation strategy (see the sketch after this list):

  • System: firstAssistant.input + cache.read − tokenizer(firstUser)
  • User: tokenizer(all user messages), batched into a single call
  • Tools: tokenizer(tool inputs + outputs) − pruned tokens, batched into 2 calls
  • Assistant: total − system − user − tools (the residual)
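
A minimal sketch of the strategy above. All names here (`Usage`, `Msg`, `countTokens`, `contextBreakdown`) are illustrative stand-ins, not the actual context.ts API, and the message shapes are simplified:

```ts
// Hypothetical, simplified shapes — not the real context.ts types.
interface Usage {
  total: number;               // API-reported total context tokens
  firstAssistantInput: number; // input tokens reported on the first assistant turn
  cacheRead: number;           // API-reported cache-read tokens
}

interface Msg {
  role: "user" | "assistant";
  text: string;
  toolInput?: string;  // tool calls folded into messages for brevity
  toolOutput?: string;
}

declare function countTokens(text: string): number; // stand-in tokenizer

function contextBreakdown(usage: Usage, msgs: Msg[], prunedTokens: number) {
  const users = msgs.filter((m) => m.role === "user").map((m) => m.text);
  const toolInputs = msgs.map((m) => m.toolInput ?? "").join("\n");
  const toolOutputs = msgs.map((m) => m.toolOutput ?? "").join("\n");

  // Exactly 4 tokenizer calls, independent of message count:
  const firstUserTokens = countTokens(users[0] ?? "");   // call 1
  const userTokens = countTokens(users.join("\n"));      // call 2
  const toolTokens =
    countTokens(toolInputs) + countTokens(toolOutputs)   // calls 3 + 4
    - prunedTokens;

  // The first assistant turn's input covers system prompt + first user
  // message (cached portion reported separately), so subtract the
  // estimated first-user share to isolate the system prompt.
  const system = usage.firstAssistantInput + usage.cacheRead - firstUserTokens;

  // Assistant absorbs everything left over, including any reasoning
  // tokens still persisted in context.
  const assistant = usage.total - system - userTokens - toolTokens;

  return { system, user: userTokens, tools: toolTokens, assistant };
}
```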

Why assistant is the residual:
If reasoning tokens persist in context (whether they do varies by model and provider), they semantically belong under "Assistant", since reasoning is assistant-generated content.
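
For example (illustrative numbers only): if the API reports total = 10,000 and the calculation yields system = 2,000, user = 1,500, and tools = 3,000, the assistant bucket is 10,000 − 2,000 − 1,500 − 3,000 = 3,500, which silently includes any reasoning tokens still held in context.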

Removed:

  • reasoning category from breakdown (absorbed into assistant residual)
  • Per-message tokenizer calls (now batched)
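
For contrast, a minimal before/after sketch of the batching change, reusing the hypothetical `countTokens` stand-in from above:

```ts
declare function countTokens(text: string): number; // stand-in tokenizer
declare const userMessages: string[];               // all user message texts

// Before: one tokenizer call per message (N calls).
const userTokensOld = userMessages
  .map((text) => countTokens(text))
  .reduce((a, b) => a + b, 0);

// After: concatenate once, tokenize once (1 call).
const userTokensNew = countTokens(userMessages.join("\n"));
```

Note that joining with a separator adds a few tokens, so the batched count can differ slightly from the per-message sum; for a usage breakdown that tradeoff is acceptable.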

Documentation

Added detailed comment block at top of context.ts explaining the full calculation strategy.

- Use API-reported values instead of tokenizer estimation where possible
- Remove reasoning category (absorbed into assistant as residual)
- Batch tokenizer calls for efficiency (4 calls instead of N)
- System: derived from first assistant input minus first user tokens
- User: tokenized (batched)
- Tools: tokenized (batched; needed for pruning anyway)
- Assistant: calculated as residual (total − system − user − tools)
- Add detailed documentation explaining the calculation strategy
@Tarquinen merged commit 17bc214 into dev on Jan 20, 2026
1 check passed
@Tarquinen deleted the feature/context-token-calculation-optimization branch on January 20, 2026 at 08:46