Use string cache in UTF8ToString #26232
Conversation
WDYT, is this worth it?

Impressive that this is so easy to do! Any idea why it's just a 5% speedup? I'd expect more, even on such a moderately sized string.

I'm not sure... maybe the JS->wasm boundary is costly enough here to affect the benchmark?
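One way to test that hypothesis (my suggestion; not something tried in the thread) would be a control run of the microbenchmark from the PR description below with the decode removed, so that only the EM_ASM boundary crossings remain. If the empty loop still takes most of the ~105 ms, the decode itself is only a small share of the total:

```
// Hypothetical control benchmark: same number of JS<->wasm crossings as the
// benchmark in the PR description, but no string decoding.
#include <emscripten.h>

int main() {
  for (int i = 0; i < 100000; i++) {
    EM_ASM({
      // Intentionally empty: measures only the EM_ASM call overhead.
    }, "hello world");
  }
  return 0;
}
```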
Force-pushed from bc10559 to 0c5c2c2
```
      var rtn = UTF8Decoder.decode({{{ getUnsharedTextDecoderView('HEAPU8', 'ptr', 'end') }}});
#else
-     return ptr ? UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul) : '';
+     var rtn = ptr ? UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul) : '';
```
I wonder if it might be better to place this cache inside UTF8ArrayToString()? That way it would apply a bit wider.
I don't think we can do this in UTF8ArrayToString because the array being used might not be the wasm memory.
Yeah, maybe it would be overly complex... all entry points do flow through UTF8ToString already, so sgtm.
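For readers following the thread, here is a minimal sketch of the caching idea, assuming hypothetical `RODATA_START`/`RODATA_END` bounds (the PR derives these from linker-generated symbols; the actual names and plumbing differ):

```
// Sketch only: RODATA_START/RODATA_END stand in for the linker-generated
// bounds of the read-only data segment that the real PR uses.
var UTF8ToStringCache = {};

function UTF8ToStringWithCache(ptr, maxBytesToRead, ignoreNul) {
  // Only strings in rodata can never change, so only those addresses are
  // safe to cache. (A full implementation would also have to account for
  // differing maxBytesToRead/ignoreNul arguments.)
  var cacheable = ptr >= RODATA_START && ptr < RODATA_END;
  if (cacheable && ptr in UTF8ToStringCache) {
    return UTF8ToStringCache[ptr];
  }
  var rtn = UTF8ArrayToString(HEAPU8, ptr, maxBytesToRead, ignoreNul);
  if (cacheable) {
    UTF8ToStringCache[ptr] = rtn;
  }
  return rtn;
}
```

The guard compares `ptr` against wasm memory addresses, which also illustrates why the cache belongs in UTF8ToString rather than UTF8ArrayToString: an arbitrary array argument has no such stable addresses.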
I gave this a test on JSON parsing a glTF file, using this test glTF file: https://github.com/KhronosGroup/glTF-Sample-Assets/blob/main/Models/NodePerformanceTest/README.md

In the baseline before this PR, when I dump the number of times each string gets decoded, the most frequently decoded strings are all keywords from the glTF JSON standard. In the baseline release -O3 build, parsing the glTF file takes 330 milliseconds (avg. of three runs).

In this PR branch, all the above counts are reduced to just one call each, as expected. The total parsing time is then 92 milliseconds (avg. of three runs), i.e. -72.12% runtime. So this optimization has a massive benefit for this usage.

The JSON parsing API I am using here is a very rudimentary one that offloads all JSON parsing from C to JS:

```
typedef int json_t;

bool json_has(json_t object, const char *member);
json_t json_get(json_t object, const char *member);
int json_get_int_or_default(json_t object, const char *member, int default_value);
double json_get_double_or_default(json_t object, const char *member, double default_value);
char *json_get_str(json_t object, const char *member);
int json_array_length(json_t object);
json_t json_get_nth(json_t object, int index);
int json_get_nth_int(json_t object, int index);
double json_get_nth_double(json_t object, int index);
char *json_get_nth_string(json_t object, int index);
```

Thanks @sbc100 for working on this optimization, it really helps here.
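Each of these functions takes the member name as a C string literal, which lands in rodata and gets decoded by the JS side on every call, which is exactly the pattern this PR's cache targets. A minimal sketch of how one such binding might look as an Emscripten JS library function (hypothetical; the `$jsonObjects` handle table is an assumption, not code from this thread):

```
// Hypothetical sketch of a json_has() binding. The $jsonObjects handle
// table stands in for whatever bookkeeping the real project uses.
addToLibrary({
  $jsonObjects: [],

  json_has__deps: ['$jsonObjects', '$UTF8ToString'],
  json_has: function(object, member) {
    // `member` points at a C string literal in rodata, so with this PR the
    // UTF8ToString() call below becomes a cache hit after the first call.
    return jsonObjects[object].hasOwnProperty(UTF8ToString(member)) ? 1 : 0;
  },
});
```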
We can use linker-generated symbols to know the bounds of the `rodata` region, so we know which strings are safe to cache. Since this is a slight code size regression, we don't enable it for `-Oz` or `-Os`.

Speeds up this microbenchmark by ~5%:
```
#include <emscripten.h>

EM_JS_DEPS(deps, "$UTF8ToString");

int main() {
  for (int i = 0; i < 100000; i++) {
    EM_ASM({
      var str = UTF8ToString($0);
      if (str != "hello world")
        console.log("OOPS");
    }, "hello world");
  }
  return 0;
}
```
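The timings below are hyperfine output. The exact invocation isn't stated in the PR; a plausible reproduction (an assumption, not taken from the thread) is building with `emcc -O3` and running `hyperfine 'node a.out.js'`.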
Before:

```
Benchmark 1: node
  Time (mean ± σ):     109.7 ms ±   3.0 ms    [User: 89.0 ms, System: 23.6 ms]
  Range (min … max):   104.4 ms … 116.7 ms    27 runs
```

After:

```
Benchmark 1: node
  Time (mean ± σ):     104.7 ms ±   2.8 ms    [User: 86.9 ms, System: 20.5 ms]
  Range (min … max):   100.4 ms … 111.5 ms    27 runs
```
See #25939 (reply in thread)