[HW] Add HWVectorization pass #9222

mafeguimaraes · 2025-11-10T17:58:00Z

This patch introduces the HWVectorization pass, which identifies bitwise patterns in hardware modules that can be represented as vectorized operations instead of per-bit logic.
The pass aims to simplify the IR by grouping related scalar bit operations (such as comb.extract and comb.concat) into higher-level vector constructs like comb.reverse, comb.replicate, or direct multi-bit comb.and, comb.or, and comb.xor.

The pass runs on each hw.module. This version supports several key patterns, detected through bit-level dataflow analysis and structural analysis.

This patch was co-authored by @RosaUlisses.

Supported transformations include:

1. Linear concatenations (identity):

Pattern: Bits are extracted in ascending order (identity permutation) and concatenated.
Transformation: The entire comb.concat chain is replaced with the original input vector.

// Before
%0 = comb.extract %in from 0 : (i4) -> i1
%1 = comb.extract %in from 1 : (i4) -> i1
%2 = comb.extract %in from 2 : (i4) -> i1
%3 = comb.extract %in from 3 : (i4) -> i1
%concat = comb.concat %3, %2, %1, %0 : i1, i1, i1, i1
hw.output %concat : i4

// After
hw.output %in : i4

2. Bit reversal:

Pattern: Single bits are extracted and concatenated in reverse order, so that input bit i ends up at output bit N-1-i.
Transformation: The chain is replaced with a single comb.reverse.

// Before
%0 = comb.extract %in from 0 : (i4) -> i1
%1 = comb.extract %in from 1 : (i4) -> i1
%2 = comb.extract %in from 2 : (i4) -> i1
%3 = comb.extract %in from 3 : (i4) -> i1
%rev = comb.concat %0, %1, %2, %3 : i1, i1, i1, i1
hw.output %rev : i4

// After
%0 = comb.reverse %in : i4
hw.output %0 : i4
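
Both permutation patterns above (linear and reversed) come down to inspecting which input bit feeds each output bit. The following is a minimal Python sketch of that classification, not the pass's actual implementation; the function name `classify_permutation` and the list-based encoding are assumptions made for illustration.

```python
def classify_permutation(source_bits):
    """Classify a concat whose operands are single-bit extracts of one input.

    source_bits[i] is the input bit index feeding output bit i (LSB first).
    This encoding is an assumption of the sketch, not the pass's data structure.
    """
    width = len(source_bits)
    if source_bits == list(range(width)):
        return "identity"  # the concat can be replaced by the input itself
    if source_bits == list(range(width - 1, -1, -1)):
        return "reverse"   # the concat can be replaced by comb.reverse
    return "other"


# comb.concat lists its operands MSB first, so `concat %3, %2, %1, %0`
# yields output bits [0, 1, 2, 3] when read LSB first.
identity = classify_permutation([0, 1, 2, 3])
# `concat %0, %1, %2, %3` places input bit 0 in the MSB position.
reverse = classify_permutation([3, 2, 1, 0])
# Any other ordering is neither pattern.
mixed = classify_permutation([1, 0, 2, 3])
```

Reading the operands LSB first keeps the index arithmetic simple; a real implementation would also have to confirm that every operand extracts from the same source value.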

3. Structural patterns (e.g., vectorized mux):

Pattern: Isomorphic, bit-parallel logic cones are detected. For example, a scalarized mux structure that uses a replicated i1 control signal for each bit.
Transformation: The replicated scalar operations are collapsed into equivalent vector-level operations (e.g., comb.replicate, comb.and, comb.xor, comb.or).

// Before (scalarized mux; %a and %b here are single i1 bits)
%sel_inv = comb.xor %sel, %true : i1
%and_a = comb.and %a, %sel : i1
%and_b = comb.and %b, %sel_inv : i1
%mux = comb.or %and_a, %and_b : i1
// ... (repeated for each bit)

// After (vectorized mux)
%true = hw.constant true
%sel_vec = comb.replicate %sel : (i1) -> i4
%a_masked = comb.and %a, %sel_vec : i4
%true_vec = comb.replicate %true : (i1) -> i4
%sel_inv_vec = comb.xor %sel_vec, %true_vec : i4
%b_masked = comb.and %b, %sel_inv_vec : i4
%mux = comb.or %a_masked, %b_masked : i4
hw.output %mux : i4
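
The equivalence between the scalarized and vectorized mux can be checked exhaustively for a 4-bit bus. The Python sketch below models both forms on plain integers; the helper names (`replicate`, `scalar_mux`, `vector_mux`, `forms_agree`) are hypothetical and only mirror the operations shown above.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def replicate(bit):
    """Model comb.replicate of an i1 to WIDTH bits: all ones or all zeros."""
    return MASK if bit else 0


def scalar_mux(a, b, sel):
    """Per-bit form: (a_i & sel) | (b_i & ~sel), reassembled bit by bit."""
    sel_inv = sel ^ 1
    out = 0
    for i in range(WIDTH):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        out |= ((a_i & sel) | (b_i & sel_inv)) << i
    return out


def vector_mux(a, b, sel):
    """Vector form: the select is replicated once and applied to whole words."""
    sel_vec = replicate(sel)
    sel_inv_vec = sel_vec ^ replicate(1)  # xor with all ones inverts the mask
    return (a & sel_vec) | (b & sel_inv_vec)


def forms_agree():
    """Compare both forms for every 4-bit a, b and both select values."""
    return all(
        scalar_mux(a, b, s) == vector_mux(a, b, s)
        for a in range(1 << WIDTH)
        for b in range(1 << WIDTH)
        for s in (0, 1)
    )
```

Calling `forms_agree()` walks all 512 input combinations, which is enough to cover the i4 case used in the example.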

4. Partial vectorization (chunking):

Pattern: The pass identifies contiguous sub-ranges (chunks) that can be vectorized independently, even if the entire bus cannot be.
Transformation: Each vectorizable chunk (e.g., a linear run) is rewritten on its own, the remaining scalar or structural logic is kept as a separate chunk, and the chunks are concatenated back together.

// Before (Mixed linear and structural patterns)
// out[3:1] = in[3:1] (linear)
// out[0]   = in[1] ^ in[0] (structural)
%in_3 = comb.extract %in from 3 : (i4) -> i1
%in_2 = comb.extract %in from 2 : (i4) -> i1
%in_1 = comb.extract %in from 1 : (i4) -> i1
// Logic for bit 0
%in_1_for_0 = comb.extract %in from 1 : (i4) -> i1
%in_0 = comb.extract %in from 0 : (i4) -> i1
%bit_0 = comb.xor %in_1_for_0, %in_0 : i1
// Final concatenation
%concat = comb.concat %in_3, %in_2, %in_1, %bit_0 : i1, i1, i1, i1
hw.output %concat : i4

// After (Partially vectorized)
// Chunk 1: [3:1] (vectorized)
%chunk_1 = comb.extract %in from 1 : (i4) -> i3
// Chunk 0: [0] (scalar logic)
%in_1 = comb.extract %in from 1 : (i4) -> i1
%in_0 = comb.extract %in from 0 : (i4) -> i1
%chunk_0 = comb.xor %in_1, %in_0 : i1
// Re-concat the vectorized chunks
%final = comb.concat %chunk_1, %chunk_0 : i3, i1
hw.output %final : i4
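
One way to picture the chunking step is as a scan that groups output bits into maximal contiguous runs. The Python sketch below is an illustration under assumed data structures (a list of per-bit source indices, with `None` for bits computed by other logic); `split_into_chunks` is a hypothetical name, and the pass may organize this differently.

```python
def split_into_chunks(source_bits):
    """Group output bits (LSB first) into maximal chunks.

    source_bits[i] is the input bit that output bit i copies, or None when
    output bit i is produced by other logic. Returns a list of
    (out_low, width, src_low) tuples; src_low is None for non-linear chunks.
    """
    chunks = []
    i = 0
    n = len(source_bits)
    while i < n:
        start = i
        if source_bits[i] is None:
            # Run of bits that are not plain copies; kept as scalar logic here.
            while i < n and source_bits[i] is None:
                i += 1
            chunks.append((start, i - start, None))
        else:
            # Extend the run while the source index keeps increasing by one.
            i += 1
            while (i < n and source_bits[i] is not None
                   and source_bits[i] == source_bits[i - 1] + 1):
                i += 1
            chunks.append((start, i - start, source_bits[start]))
    return chunks


# The example above: out[0] is an xor, out[3:1] copies in[3:1].
chunks = split_into_chunks([None, 1, 2, 3])
```

For the example, the linear chunk comes out as `(1, 3, 1)`, which corresponds to a 3-bit extract starting at input bit 1, and bit 0 remains a separate scalar chunk.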

Patterns not transformed
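The pass leaves logic with cross-bit dependencies or non-linear control flow unchanged.
For example, in the module below each output bit reads the opposite bit of the concatenated result %6, so the per-bit logic is not bit-parallel: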

// cross-dependency example (should remain unchanged)
hw.module @cross_dependency(in %in : i2, out out : i2) {
  %0 = comb.extract %in from 0 : (i2) -> i1
  %1 = comb.extract %6 from 1 : (i2) -> i1
  %2 = comb.xor %0, %1 : i1
  %3 = comb.extract %in from 1 : (i2) -> i1
  %4 = comb.extract %6 from 0 : (i2) -> i1
  %5 = comb.xor %3, %4 : i1
  %6 = comb.concat %5, %2 : i1, i1
  hw.output %6 : i2
}
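
The cross-bit dependency in the example above can be made explicit by listing, for each output bit, which bit of which signal it reads. The Python sketch below is a simplified check under assumed inputs, not the pass's actual analysis; `is_bit_parallel` and the set-of-pairs encoding are hypothetical, and shared i1 signals such as a mux select are deliberately left out.

```python
def is_bit_parallel(cones):
    """Return True if output bit i only reads bit i of every signal it uses.

    cones[i] is the set of (signal_name, bit_index) pairs read by output
    bit i. This encoding is an assumption made for the sketch.
    """
    return all(
        bit == i
        for i, cone in enumerate(cones)
        for _signal, bit in cone
    )


# From @cross_dependency: out[0] = in[0] ^ %6[1], out[1] = in[1] ^ %6[0].
cross = [{("in", 0), ("%6", 1)}, {("in", 1), ("%6", 0)}]

# For comparison, a bitwise xor of two hypothetical i2 inputs a and b.
parallel = [{("a", 0), ("b", 0)}, {("a", 1), ("b", 1)}]
```

Here `is_bit_parallel(cross)` is false because bit 0 reads bit 1 of %6, while the comparison case passes.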

pronesto · 2025-12-12T00:44:12Z

Hi everyone, just a gentle ping on this PR. It has been open for a while, and I wanted to check whether there is anything we can do on our side to help move the review forward. Many thanks!

uenoku · 2025-11-22T01:18:17Z