
Cranelift: update regalloc2 to 0.15.0 to permit more VRegs. #12611

Merged
cfallin merged 2 commits into bytecodealliance:main from cfallin:regalloc2-upgrade-vreg-bounds on Feb 18, 2026

Conversation

@cfallin (Member) commented Feb 17, 2026

This pulls in bytecodealliance/regalloc2#257 to permit more VRegs to be used in a single function body, addressing #12229 and our follow-up discussions about supporting function body sizes up to the Wasm standard's implementation limits.

In addition to the RA2 upgrade, this also includes a bit more explicit limit-checking on the Cranelift side. Note that we don't directly use `regalloc2::VReg`; instead we further bitpack it into `Reg`, which is logically a sum type of `VReg`, `PReg`, and `SpillSlot` (the last is needed to represent stack allocation locations on defs, e.g. on callsites with many returns). `PReg`s are packed into the beginning of the `VReg` index space, while `SpillSlot`s are distinguished by stealing the upper bit of a `u32`. This was previously not a problem given the smaller `VReg` index space, but now we need to check explicitly; hence `Reg::from_virtual_reg_checked` and its use in the lowering vreg allocator. Because the `VReg` encoding packs the class into the bottom two bits and the index into the upper 30, and we steal one more bit at the top, the true limit on VReg count is actually 2^29, or 512M.
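
As a rough illustration of the packing and checked construction described above (the layout, constants, and helper names below are assumptions that mirror this description, not the actual cranelift-codegen definitions):

```rust
// Illustrative sketch only: a `Reg`-like packed register following the
// description above. The real `Reg` and `Reg::from_virtual_reg_checked`
// live in cranelift-codegen and may differ in detail.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RegClass {
    Int = 0,
    Float = 1,
    Vector = 2,
}

/// Packed register: the top bit of the `u32` marks a spill slot; the rest
/// holds a regalloc2-style `(index << 2) | class` encoding, with physical
/// registers occupying the start of the virtual index space.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Reg(u32);

const SPILLSLOT_BIT: u32 = 1 << 31;
// 32 bits minus 1 (spill-slot discriminant) minus 2 (class) = 29 index bits.
const VREG_LIMIT: u32 = 1 << 29; // 512M

impl Reg {
    /// Checked constructor: fail rather than silently collide with the
    /// spill-slot bit when the vreg index is too large.
    fn from_virtual_reg_checked(index: u32, class: RegClass) -> Option<Reg> {
        if index >= VREG_LIMIT {
            return None;
        }
        Some(Reg((index << 2) | class as u32))
    }

    fn from_spillslot(slot: u32) -> Reg {
        debug_assert!(slot < SPILLSLOT_BIT);
        Reg(SPILLSLOT_BIT | slot)
    }

    fn is_spillslot(self) -> bool {
        self.0 & SPILLSLOT_BIT != 0
    }
}

fn main() {
    assert!(Reg::from_virtual_reg_checked(VREG_LIMIT - 1, RegClass::Float).is_some());
    assert!(Reg::from_virtual_reg_checked(VREG_LIMIT, RegClass::Int).is_none());
    assert!(Reg::from_spillslot(7).is_spillslot());
}
```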

Fixes #12229.

@cfallin cfallin requested review from a team as code owners February 17, 2026 22:26
@cfallin cfallin requested review from alexcrichton and removed request for a team February 17, 2026 22:26
@cfallin (Member, Author) commented Feb 17, 2026

The failing test (code_too_large) fails as expected: now that we support the generated function body size, the error it asserts no longer occurs. A few thoughts though:

  • In theory it should be impossible to write this test once we fully hit the goal of "compile any valid Wasm function that wasmparser's implementation-limit checks pass on";
  • But there's still the interesting question of whether there's a "gap" where we start to throw errors as we sweep the code size upward;
  • In attempting to check this by increasing the test's N, I found that our ValueDataPacked encoding gives only 24 bits to value numbers, so I'm working on reclaiming some bits there; in particular, the Type encoding seems pretty inefficient, so I'll try to steal some of those bits back (rough limits sketched below).
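
For concreteness, a tiny sketch of the limits being discussed (the bit widths are taken from the description above and from the ValueDataPacked commit message quoted later; the constants are illustrative, not the real Cranelift definitions):

```rust
// Illustrative constants only, mirroring the bit widths discussed in this PR.
const OLD_VALUE_INDEX_BITS: u32 = 24; // x/y fields in the old ValueDataPacked layout
const VREG_INDEX_BITS: u32 = 29;      // 32 - 1 (spill-slot bit) - 2 (class)

fn main() {
    // Roughly 16.7M SSA values per function under the old value packing...
    println!("old max values per function: {}", 1u64 << OLD_VALUE_INDEX_BITS);
    // ...versus 512M virtual registers after this regalloc2 upgrade.
    println!("max vregs per function:      {}", 1u64 << VREG_INDEX_BITS);
}
```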

@alexcrichton (Member) commented:

I think that test already takes forever to compile/run in debug mode on all platforms, so jettisoning it doesn't seem unreasonable to me, and agreed that in theory it should not be possible to write. Maybe it's sufficient to have a local script to run, or some online corpus of "must compile big modules", or something like that?

Either that or we could create a dedicated test job for "compile wasmtime in release on one fast platform and compile big modules" to make sure everything succeeds.

cfallin added a commit to cfallin/wasmtime that referenced this pull request Feb 18, 2026
This updates the `ValueDataPacked` scheme from the old

```
            (enum tag)    (CLIF type)    (value 1)     (value 2)
///        | tag:2 |  type:14        |    x:24       | y:24          |
```

encoding in a `u64` to a new

```
///        | tag:2 |  type:14        |    x:32       | y:32          |
```

encoding, with a `packed` tag attribute to ensure the struct fits in
10 bytes. This permits the full range of `Value` (a `u32` entity
index) to be encoded, removing the remaining major limit on function
body size after the work in bytecodealliance#12611 to address bytecodealliance#12229.

Curiously, this appears to be a *speedup* in compile time of 3-5% on
bz2 and 3% on spidermonkey-json (Sightglass, 50 data points each). My
best guess as to why is that putting the value fields in their own
`u32`s allows quick access without shifts and masks, and that this
outweighs both the unaligned accesses (a consequence of the 10-byte
size) -- which have no penalty on modern mainstream CPUs anyway -- and
the 25% size inflation of the value-definitions array.
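
A minimal sketch of the new layout as described in this commit message (field names and the exact `repr` are assumptions; the real `ValueDataPacked` lives in cranelift-codegen):

```rust
// Illustrative sketch of the described 10-byte layout; not the real type.

/// A 2-bit tag and 14-bit type share one u16; each value operand gets a
/// full u32. `packed` drops alignment padding, so the whole record is
/// 2 + 4 + 4 = 10 bytes.
#[repr(C, packed)]
#[derive(Clone, Copy)]
struct ValueDataPackedSketch {
    tag_and_ty: u16, // | tag:2 | type:14 |
    x: u32,          // first value field, now a full 32 bits
    y: u32,          // second value field, now a full 32 bits
}

fn main() {
    assert_eq!(core::mem::size_of::<ValueDataPackedSketch>(), 10);
}
```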
@cfallin cfallin requested a review from a team as a code owner February 18, 2026 00:10
@cfallin cfallin requested review from fitzgen and removed request for a team and fitzgen February 18, 2026 00:10
@cfallin (Member, Author) commented Feb 18, 2026

Sure thing -- I'll push the "corpus of big inputs to test" as a follow-up (after #12613 lands as well); I've removed the code_too_large test here.

@cfallin cfallin enabled auto-merge February 18, 2026 00:11
cfallin added a commit to cfallin/wasmtime that referenced this pull request Feb 18, 2026
@cfallin cfallin added this pull request to the merge queue Feb 18, 2026
Merged via the queue into bytecodealliance:main with commit b5d2ff5 Feb 18, 2026
45 checks passed
@cfallin cfallin deleted the regalloc2-upgrade-vreg-bounds branch February 18, 2026 01:04
github-merge-queue bot pushed a commit that referenced this pull request Feb 18, 2026
Development

Successfully merging this pull request may close these issues.

Code for function is too large, High memory usage
