@hannesrudolph (Collaborator) commented Jan 7, 2026

Fixes non-UTF8 file decoding for native tool-calling reads by preferring VSCode’s document decoding (and falling back to the existing byte-based readers).
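The prefer-then-fall-back flow described here can be sketched as follows. This is a minimal illustration, not the PR's code: `TextReader`, `readPreferringEditorDecoding`, and `MAX_EDITOR_READ_BYTES` are hypothetical names, the two readers are injected so the sketch runs without the `vscode` module, and the 2MB cap is taken from the PR summary.

```typescript
// Hypothetical reader shape; both readers resolve to the decoded file text.
type TextReader = (path: string) => Promise<string>;

// The PR summary says editor-decoded reads are limited to files under 2MB.
const MAX_EDITOR_READ_BYTES = 2 * 1024 * 1024;

async function readPreferringEditorDecoding(
  path: string,
  sizeInBytes: number,
  readViaEditor: TextReader, // assumed to decode using the editor's encoding detection
  readViaBytes: TextReader, // assumed to be the existing byte-based reader
): Promise<string> {
  if (sizeInBytes < MAX_EDITOR_READ_BYTES) {
    try {
      return await readViaEditor(path);
    } catch {
      // Editor decoding unavailable or failed: fall through to the byte reader.
    }
  }
  return readViaBytes(path);
}
```

In an extension, `readViaEditor` could wrap `vscode.workspace.openTextDocument(uri)` followed by `document.getText()`; whether `tryReadTextViaVscode()` does exactly that is not shown in this conversation.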

Closes #3555

·········································································

Test Files 2 passed (2)
Tests 73 passed (73)
Start at 08:28:19
Duration 759ms (transform 262ms, setup 82ms, collect 458ms, tests 189ms, environment 0ms, prepare 62ms)


Important

Enhance ReadFileTool to use VSCode's document decoding for non-UTF8 files and add readTextWithTokenBudget() utility.

  • Behavior:
    • ReadFileTool in ReadFileTool.ts now uses VSCode's document decoding for non-UTF8 files, falling back to byte-based readers if unavailable.
    • Introduces tryReadTextViaVscode() and sliceTextLines() functions for handling text reading and slicing.
    • Limits VSCode text reads to files under 2MB.
  • Utilities:
    • Adds readTextWithTokenBudget() in read-text-with-budget.ts for reading text with a token budget, mirroring readFileWithTokenBudget().
  • Tests:
    • Updates readFileTool.spec.ts to mock VSCode and test new VSCode decoding path.
    • Adds tests for readTextWithTokenBudget() behavior.
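The idea of reading already-decoded text under a token budget can be sketched as below. The real signature and tokenizer of `readTextWithTokenBudget()` are not shown in this conversation, so `takeLinesWithinBudget`, `BudgetedText`, and the four-characters-per-token estimate are all assumptions made for illustration.

```typescript
// Hypothetical result shape; the PR's actual return type may differ.
interface BudgetedText {
  content: string;
  lineCount: number;
  complete: boolean; // false when the budget ran out before the end of the text
}

// Crude stand-in for a real tokenizer: roughly four characters per token.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function takeLinesWithinBudget(text: string, budgetTokens: number): BudgetedText {
  const lines = text.split(/\r?\n/);
  const kept: string[] = [];
  let used = 0;
  for (const line of lines) {
    // Count the newline too, so that empty lines still cost something.
    const cost = estimateTokens(line + "\n");
    if (used + cost > budgetTokens) {
      return { content: kept.join("\n"), lineCount: kept.length, complete: false };
    }
    kept.push(line);
    used += cost;
  }
  return { content: kept.join("\n"), lineCount: kept.length, complete: true };
}
```

Cutting on whole lines keeps any later line numbering aligned with the source file, at the cost of leaving some budget unused.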

This description was created by Ellipsis for a1aae5d.

@dosubot (bot) added the size:L and bug labels Jan 7, 2026

@roomote (bot, Contributor) commented Jan 7, 2026

Review completed. I left 1 inline suggestion to keep native VSCode-decoded reads consistent with the existing out-of-range <line_range> behavior.

  • Native VSCode-decoding path: handle out-of-range <line_range> the same way as readLines() (RangeError) rather than returning an empty slice.

@hannesrudolph added the Issue/PR - Triage label Jan 7, 2026
Comment on lines +499 to +502
const rawRangeText =
    useNative && maybeText !== undefined
        ? sliceTextLines(maybeText, range.start - 1, range.end - 1)
        : await readLines(fullPath, range.end - 1, range.start - 1)

When native reads use VSCode-decoded text, out-of-range <line_range> values (for example, range.start beyond the end of the file) no longer raise an error. readLines() rejects in that case, but sliceTextLines() returns an empty string, so addLineNumbers() can emit a misleading blank line such as "1000 | ".

Suggested change
- const rawRangeText =
-     useNative && maybeText !== undefined
-         ? sliceTextLines(maybeText, range.start - 1, range.end - 1)
-         : await readLines(fullPath, range.end - 1, range.start - 1)
+ if (useNative && maybeText !== undefined && range.start > totalLines) {
+     throw new RangeError(`Line with index ${range.start - 1} does not exist in "${fullPath}". Note that line indexing is zero-based`)
+ }
+ const rawRangeText =
+     useNative && maybeText !== undefined
+         ? sliceTextLines(maybeText, range.start - 1, range.end - 1)
+         : await readLines(fullPath, range.end - 1, range.start - 1)
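
The guard proposed above could also live inside the slicing helper, so that every caller gets the same out-of-range behavior. A minimal sketch, assuming zero-based inclusive indices as in the `sliceTextLines()` call; `sliceLinesOrThrow` and its `label` parameter are hypothetical and not part of the PR.

```typescript
// Returns lines [startIndex, endIndex] (zero-based, inclusive) joined with "\n".
// Throws a RangeError when startIndex is outside the text, instead of
// returning an empty string.
function sliceLinesOrThrow(
  text: string,
  startIndex: number,
  endIndex: number,
  label: string, // used only in the error message, e.g. the file path
): string {
  const lines = text.split(/\r?\n/);
  if (startIndex < 0 || startIndex >= lines.length) {
    throw new RangeError(
      `Line with index ${startIndex} does not exist in "${label}". Note that line indexing is zero-based`,
    );
  }
  // Array.prototype.slice clamps an end index past the last line.
  return lines.slice(startIndex, endIndex + 1).join("\n");
}
```

Note that a trailing newline makes `split` report one extra empty line, so a real implementation would need to decide whether that final empty line counts toward the total.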

Labels

bug: Something isn't working
Issue/PR - Triage: New issue. Needs quick review to confirm validity and assign labels.
size:L: This PR changes 100-499 lines, ignoring generated files.

Projects

Status: Triage

Development

Successfully merging this pull request may close these issues.

When the file encoding is GBK, reading the file will result in garbled characters

2 participants