
Use string cache in UTF8ToString #26232

Open
sbc100 wants to merge 1 commit into emscripten-core:main from sbc100:string_cache

Conversation

@sbc100
Collaborator

@sbc100 sbc100 commented Feb 9, 2026

We can use linker-generated symbols to know the bounds of the rodata region, so we know which strings are safe to cache. Since this is a slight code size regression, we don't enable it for -Oz or -Os.
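
Roughly, the idea looks like the sketch below (not the exact code in this PR): pointers that fall inside the read-only data segment can never have their bytes change, so the decoded string can be memoized per pointer. The `RODATA_START`/`RODATA_END` bounds stand in for whatever symbols the linker actually provides; the real names may differ.

```
// Minimal sketch of the idea, not the exact PR implementation.
var cachedStrings = {};

function UTF8ToString(ptr, maxBytesToRead, ignoreNul) {
  // Only strings in the read-only data segment are guaranteed immutable,
  // so only those pointers are safe to memoize.
  var cacheable = ptr >= RODATA_START && ptr < RODATA_END;
  if (cacheable) {
    var cached = cachedStrings[ptr];
    if (cached !== undefined) return cached;
  }
  var rtn = ptr ? UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul) : '';
  // A real implementation would also need to account for maxBytesToRead and
  // ignoreNul when deciding whether a cached entry is reusable.
  if (cacheable) cachedStrings[ptr] = rtn;
  return rtn;
}
```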

Speeds up this microbenchmark by ~5%:

```
EM_JS_DEPS(deps, "$UTF8ToString");

int main() {
  for (int i = 0; i < 100000; i++) {
    EM_ASM({
      var Str = UTF8ToString($0);
      if (Str != "hello world")
        console.log("OOPS");
    }, "hello world");
  }
  return 0;
}
```

Before:

```
Benchmark 1: node
  Time (mean ± σ):     109.7 ms ±   3.0 ms    [User: 89.0 ms, System: 23.6 ms]
  Range (min … max):   104.4 ms … 116.7 ms    27 runs
```

After:

```
Benchmark 1: node
  Time (mean ± σ):     104.7 ms ±   2.8 ms    [User: 86.9 ms, System: 20.5 ms]
  Range (min … max):   100.4 ms … 111.5 ms    27 runs
```

See #25939 (reply in thread)

@sbc100
Collaborator Author

sbc100 commented Feb 9, 2026

WDYT, is this worth it?

@sbc100 sbc100 requested a review from juj February 9, 2026 22:18
@kripken
Member

kripken commented Feb 10, 2026

Impressive this is so easy to do!

Any idea why it's just a 5% speedup? I'd expect more, even on such a moderately-sized string.

@sbc100
Collaborator Author

sbc100 commented Feb 10, 2026

> Impressive this is so easy to do!
>
> Any idea why it's just a 5% speedup? I'd expect more, even on such a moderately-sized string.

I'm not sure... maybe the JS->wasm boundary is costly enough here to affect the benchmark?

@sbc100 sbc100 force-pushed the string_cache branch 3 times, most recently from bc10559 to 0c5c2c2 Compare February 10, 2026 20:05
```
 var rtn = UTF8Decoder.decode({{{ getUnsharedTextDecoderView('HEAPU8', 'ptr', 'end') }}});
 #else
-return ptr ? UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul) : '';
+var rtn = ptr ? UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul) : '';
```
Collaborator


I wonder if it might be better to place this cache inside UTF8ArrayToString()? That way it would apply a bit wider.

Collaborator Author


I don't think we can do this in UTF8ArrayToString because the array being used might not be the wasm memory.

Collaborator


Yeah, maybe it would be overly complex... all entry points do flow through UTF8ToString already, so sgtm.
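
For context on why the cache lives in UTF8ToString() rather than UTF8ArrayToString(): the latter accepts any byte array, not just the wasm heap, so a cache keyed only on the numeric index could serve stale data. A hypothetical illustration (not code from this PR; `somePtr` is just a placeholder for a pointer into wasm memory):

```
// The same numeric index into two different arrays decodes to different
// strings, so UTF8ArrayToString() by itself has no stable key to cache on.
var external = new Uint8Array([104, 105, 0]);   // the bytes for "hi"
var a = UTF8ArrayToString(external, 0);         // decodes from a plain JS array
var b = UTF8ArrayToString(HEAPU8, somePtr);     // decodes from wasm linear memory
```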

@juj
Collaborator

juj commented Feb 11, 2026

I gave this a test on JSON parsing of a GLTF file, using this test asset: https://github.com/KhronosGroup/glTF-Sample-Assets/blob/main/Models/NodePerformanceTest/README.md

In the baseline before this PR, when I dump the number of times UTF8ToString(x) is called for each string x throughout the parse, I get the following data:

"rgba": 712
"byteLength": 120003
"target": 39993
"buffer": 239979
"extensions": 40005
"source": 30000
"sampler": 10000
"doubleSided": 10000
"pbrMetallicRoughness": 10000
"metallicFactor": 10000
"roughnessFactor": 10000
"baseColorFactor": 10000
"baseColorTexture": 10000
"index": 30000
"texCoord": 10000
"metallicRoughnessTexture": 10000
"normalTexture": 10000
"emissiveTexture": 10000
"occlusionTexture": 10000
"emissiveFactor": 10000
"alphaCutoff": 10000
"primitives": 10000
"name": 40004
"mode": 10000
"indices": 40000
"bufferView": 120000
"componentType": 120000
"count": 120000
"byteOffset": 80000
"attributes": 10000
"POSITION": 70000
"normalized": 30000
"byteStride": 30000
"material": 30000
"NORMAL": 40000
"TANGENT": 10000
"TEXCOORD_0": 40000
"TEXCOORD_1": 10000
"COLOR_0": 10000
"JOINTS_0": 10000
"WEIGHTS_0": 10000
"min": 10000
"max": 10000
"children": 10002
"matrix": 10002
"translation": 10002
"rotation": 10002
"scale": 10002
"camera": 10005
"skin": 10002
"mesh": 40002

These are all keywords from the GLTF JSON standard.

In the baseline, in a release -O3 build, parsing the GLTF file takes 330 milliseconds (avg. of three runs).

In this PR branch, all the above counts are reduced to just one call each, as expected. The total parsing time is then 92 milliseconds (avg. of three runs), i.e. a 72.12% reduction in runtime.

So this optimization has a massive benefit for this usage.

The JSON parsing API I am using here is a very rudimentary one that offloads all JSON parsing from C to JS:

```
typedef int json_t;
bool json_has(json_t object, const char *member);
json_t json_get(json_t object, const char *member);
int json_get_int_or_default(json_t object, const char *member, int default_value);
double json_get_double_or_default(json_t object, const char *member, double default_value);
char *json_get_str(json_t object, const char *member);
int json_array_length(json_t object);
json_t json_get_nth(json_t object, int index);
int json_get_nth_int(json_t object, int index);
double json_get_nth_double(json_t object, int index);
char *json_get_nth_string(json_t object, int index);
```
Thanks @sbc100 for working on this optimization, it really helps here.

