
Use string cache in UTF8ToString #26232

Open
sbc100 wants to merge 1 commit into emscripten-core:main from sbc100:string_cache

Conversation

@sbc100
Collaborator

@sbc100 sbc100 commented Feb 9, 2026

We can use linker-generated symbols to know the bounds of the rodata region, so we know which strings are safe to cache. Since this is a slight code size regression, we don't enable it for -Oz or -Os.
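
Roughly, the idea looks like the sketch below (not the exact code in this PR): pointers that fall inside the read-only data segment can never have their bytes change, so the decoded string can be memoized per pointer. The `RODATA_START`/`RODATA_END` bounds stand in for whatever symbols the linker actually provides; the real names may differ.

```
// Minimal sketch of the idea, not the exact PR implementation.
var cachedStrings = {};

function UTF8ToString(ptr, maxBytesToRead, ignoreNul) {
  // Only strings in the read-only data segment are guaranteed immutable,
  // so only those pointers are safe to memoize.
  var cacheable = ptr >= RODATA_START && ptr < RODATA_END;
  if (cacheable) {
    var cached = cachedStrings[ptr];
    if (cached !== undefined) return cached;
  }
  var rtn = ptr ? UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul) : '';
  // A real implementation would also need to account for maxBytesToRead and
  // ignoreNul when deciding whether a cached entry is reusable.
  if (cacheable) cachedStrings[ptr] = rtn;
  return rtn;
}
```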

Speeds up this microbenchmark by ~5%:

```
EM_JS_DEPS(deps, "$UTF8ToString");

int main() {
  for (int i = 0; i < 100000; i++) {
    EM_ASM({
      var Str = UTF8ToString($0);
      if (Str != "hello world")
        console.log("OOPS");
    }, "hello world");
  }
  return 0;
}
```

Before:

```
Benchmark 1: node
  Time (mean ± σ):     109.7 ms ±   3.0 ms    [User: 89.0 ms, System: 23.6 ms]
  Range (min … max):   104.4 ms … 116.7 ms    27 runs
```

After:

```
Benchmark 1: node
  Time (mean ± σ):     104.7 ms ±   2.8 ms    [User: 86.9 ms, System: 20.5 ms]
  Range (min … max):   100.4 ms … 111.5 ms    27 runs
```

See #25939 (reply in thread)

@sbc100
Collaborator Author

sbc100 commented Feb 9, 2026

WDYT, is this worth it?

@sbc100 sbc100 requested a review from juj February 9, 2026 22:18
@kripken
Member

kripken commented Feb 10, 2026

Impressive this is so easy to do!

Any idea why it's just a 5% speedup? I'd expect more, even on such a moderately-sized string.

@sbc100
Collaborator Author

sbc100 commented Feb 10, 2026

> Impressive this is so easy to do!
>
> Any idea why it's just a 5% speedup? I'd expect more, even on such a moderately-sized string.

I'm not sure... maybe the JS->wasm boundary is costly enough here to affect the benchmark?

@sbc100 sbc100 force-pushed the string_cache branch 3 times, most recently from bc10559 to 0c5c2c2 Compare February 10, 2026 20:05
```
 var rtn = UTF8Decoder.decode({{{ getUnsharedTextDecoderView('HEAPU8', 'ptr', 'end') }}});
 #else
-return ptr ? UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul) : '';
+var rtn = ptr ? UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul) : '';
```
Collaborator


I wonder if it might be better to place this cache inside UTF8ArrayToString()? That way it would apply a bit wider.

Collaborator Author


I don't think we can do this in UTF8ArrayToString because the array being used might not be the wasm memory.

Collaborator


Yeah, maybe it would be overly complex... all entry points do flow through UTF8ToString already, so sgtm.
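
For context on why the cache lives in UTF8ToString() rather than UTF8ArrayToString(): the latter accepts any byte array, not just the wasm heap, so a cache keyed only on the numeric index could serve stale data. A hypothetical illustration (not code from this PR; `somePtr` is just a placeholder for a pointer into wasm memory):

```
// The same numeric index into two different arrays decodes to different
// strings, so UTF8ArrayToString() by itself has no stable key to cache on.
var external = new Uint8Array([104, 105, 0]);   // the bytes for "hi"
var a = UTF8ArrayToString(external, 0);         // decodes from a plain JS array
var b = UTF8ArrayToString(HEAPU8, somePtr);     // decodes from wasm linear memory
```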

@juj
Collaborator

juj commented Feb 11, 2026

I gave this a test on JSON parsing of a GLTF file, using this test asset: https://github.com/KhronosGroup/glTF-Sample-Assets/blob/main/Models/NodePerformanceTest/README.md

In the baseline before this PR, when I dump the number of times UTF8ToString(x) is called for each string x throughout the parse, I get the following data:

"rgba": 712
"byteLength": 120003
"target": 39993
"buffer": 239979
"extensions": 40005
"source": 30000
"sampler": 10000
"doubleSided": 10000
"pbrMetallicRoughness": 10000
"metallicFactor": 10000
"roughnessFactor": 10000
"baseColorFactor": 10000
"baseColorTexture": 10000
"index": 30000
"texCoord": 10000
"metallicRoughnessTexture": 10000
"normalTexture": 10000
"emissiveTexture": 10000
"occlusionTexture": 10000
"emissiveFactor": 10000
"alphaCutoff": 10000
"primitives": 10000
"name": 40004
"mode": 10000
"indices": 40000
"bufferView": 120000
"componentType": 120000
"count": 120000
"byteOffset": 80000
"attributes": 10000
"POSITION": 70000
"normalized": 30000
"byteStride": 30000
"material": 30000
"NORMAL": 40000
"TANGENT": 10000
"TEXCOORD_0": 40000
"TEXCOORD_1": 10000
"COLOR_0": 10000
"JOINTS_0": 10000
"WEIGHTS_0": 10000
"min": 10000
"max": 10000
"children": 10002
"matrix": 10002
"translation": 10002
"rotation": 10002
"scale": 10002
"camera": 10005
"skin": 10002
"mesh": 40002

These are all keywords from the GLTF JSON standard.

In the baseline, in a release -O3 build, parsing the GLTF file takes 330 milliseconds (avg. of three runs).

In this PR branch, all the above counts are reduced to just one call each, as expected. The total parsing time is then 92 milliseconds (avg. of three runs), i.e. a 72.12% reduction in runtime.

So this optimization has a massive benefit for this usage.

The JSON parsing API I am using here is a very rudimentary one that offloads all JSON parsing from C to JS:

```
typedef int json_t;
bool json_has(json_t object, const char *member);
json_t json_get(json_t object, const char *member);
int json_get_int_or_default(json_t object, const char *member, int default_value);
double json_get_double_or_default(json_t object, const char *member, double default_value);
char *json_get_str(json_t object, const char *member);
int json_array_length(json_t object);
json_t json_get_nth(json_t object, int index);
int json_get_nth_int(json_t object, int index);
double json_get_nth_double(json_t object, int index);
char *json_get_nth_string(json_t object, int index);
```
Thanks @sbc100 for working on this optimization, it really helps here.

