@mafeguimaraes
Contributor

This patch introduces the HWVectorization pass, which identifies bitwise patterns in hardware modules that can be represented as vectorized operations instead of per-bit logic.
The pass aims to simplify the IR by grouping related scalar bit operations (such as comb.extract and comb.concat) into higher-level vector constructs like comb.reverse, comb.replicate, or direct multi-bit comb.and, comb.or, and comb.xor.

The pass scans each hw.module and identifies groups of bit-level operations that can be merged into vector-level constructs. This version supports several key patterns, detected through bit-level dataflow analysis and structural analysis.

This patch was co-authored by @RosaUlisses.

Supported transformations include:

1. Linear concatenations (identity):

  • Pattern: Bits are extracted in ascending order (identity permutation) and concatenated.

  • Transformation: The entire comb.concat chain is replaced with the original input vector.

// Before
%0 = comb.extract %in from 0 : (i4) -> i1
%1 = comb.extract %in from 1 : (i4) -> i1
%2 = comb.extract %in from 2 : (i4) -> i1
%3 = comb.extract %in from 3 : (i4) -> i1
%concat = comb.concat %3, %2, %1, %0 : i1, i1, i1, i1
hw.output %concat : i4

// After
hw.output %in : i4
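The identity condition can be modeled outside MLIR as a check on which input bit feeds each output bit. The C++ sketch below is illustrative only: it assumes each output bit is described by the index of the %in bit it was extracted from (LSB first), and isIdentity and fromConcatOperands are hypothetical helpers, not part of the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: srcBits[i] is the bit index of %in that feeds output
// bit i (bit 0 = LSB).
bool isIdentity(const std::vector<unsigned> &srcBits, unsigned inputWidth) {
  // The output must cover the whole input; otherwise it is a slice, not %in.
  if (srcBits.size() != inputWidth)
    return false;
  for (std::size_t i = 0; i < srcBits.size(); ++i)
    if (srcBits[i] != i)
      return false;
  return true;
}

// comb.concat lists its operands MSB first, so reverse them to get an
// LSB-first mapping.
std::vector<unsigned> fromConcatOperands(const std::vector<unsigned> &msbFirst) {
  return std::vector<unsigned>(msbFirst.rbegin(), msbFirst.rend());
}
```

For the example above, the operand order %3, %2, %1, %0 becomes {0, 1, 2, 3} after reversal, which is the identity mapping.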

2. Bit reversal:

  • Pattern: Bits are extracted in descending (reverse) order and concatenated.

  • Transformation: The chain is replaced with a single comb.reverse.

// Before
%0 = comb.extract %in from 0 : (i4) -> i1
%1 = comb.extract %in from 1 : (i4) -> i1
%2 = comb.extract %in from 2 : (i4) -> i1
%3 = comb.extract %in from 3 : (i4) -> i1
%rev = comb.concat %0, %1, %2, %3 : i1, i1, i1, i1
hw.output %rev : i4

// After
%0 = comb.reverse %in : i4
hw.output %0 : i4
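A similar sketch covers reversal, under the same assumption that output bit i is described by the %in bit feeding it. isReverse and reverseBits are hypothetical names; reverseBits only spells out the bit-level meaning the rewrite relies on (output bit i takes input bit width - 1 - i).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: srcBits[i] is the bit of %in that feeds output bit i.
bool isReverse(const std::vector<unsigned> &srcBits, unsigned width) {
  if (srcBits.size() != width)
    return false;
  for (unsigned i = 0; i < width; ++i)
    if (srcBits[i] != width - 1 - i)
      return false;
  return true;
}

// Reference semantics of reversing the low `width` bits of `value`.
uint64_t reverseBits(uint64_t value, unsigned width) {
  assert(width <= 64);
  uint64_t out = 0;
  for (unsigned i = 0; i < width; ++i)
    if ((value >> (width - 1 - i)) & 1)
      out |= uint64_t{1} << i;
  return out;
}
```

In the example, comb.concat %0, %1, %2, %3 places %in[0] in the MSB, so the LSB-first mapping is {3, 2, 1, 0}.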

3. Structural patterns (e.g., vectorized mux):

  • Pattern: Isomorphic, bit-parallel logic cones are detected, for example a scalarized mux structure that uses the same i1 control signal for each bit.

  • Transformation: The replicated scalar operations are collapsed into equivalent vector-level operations (e.g., comb.replicate, comb.and, comb.xor, comb.or).

// Before (scalarized mux)
%sel_inv = comb.xor %sel, %true : i1
%and_a = comb.and %a, %sel : i1
%and_b = comb.and %b, %sel_inv : i1
%mux = comb.or %and_a, %and_b : i1
...
(repeated for each bit)

// After (vectorized mux)
%true = hw.constant true
%sel_vec = comb.replicate %sel : (i1) -> i4
%a_masked = comb.and %a, %sel_vec : i4
%true_vec = comb.replicate %true : (i1) -> i4
%sel_inv_vec = comb.xor %sel_vec, %true_vec : i4
%b_masked = comb.and %b, %sel_inv_vec : i4
%mux = comb.or %a_masked, %b_masked : i4
hw.output %mux : i4
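The rewrite is sound because the per-bit cone and the vector form compute the same function of a, b, and sel. The C++ model below assumes an i4 bus held in the low bits of a uint32_t and implements both forms so they can be compared exhaustively; scalarMux and vectorMux are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kWidth = 4;                      // i4 bus from the example
constexpr uint32_t kMask = (1u << kWidth) - 1;      // all-ones i4 value

// Scalar form: one xor/and/and/or cone per bit, as in the "Before" IR.
uint32_t scalarMux(uint32_t a, uint32_t b, bool sel) {
  uint32_t out = 0;
  for (unsigned i = 0; i < kWidth; ++i) {
    bool ai = (a >> i) & 1, bi = (b >> i) & 1;
    bool selInv = sel ^ true;                       // comb.xor %sel, %true
    bool bit = (ai & sel) | (bi & selInv);          // and, and, or
    out |= uint32_t(bit) << i;
  }
  return out;
}

// Vector form: replicate the select once, then use full-width and/xor/or.
uint32_t vectorMux(uint32_t a, uint32_t b, bool sel) {
  uint32_t selVec = sel ? kMask : 0;                // comb.replicate %sel
  uint32_t trueVec = kMask;                         // comb.replicate %true
  uint32_t selInvVec = selVec ^ trueVec;            // comb.xor
  return ((a & selVec) | (b & selInvVec)) & kMask;  // comb.and / comb.or
}
```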

4. Partial vectorization (chunking):

  • Pattern: The pass identifies contiguous sub-ranges (chunks) that can be vectorized independently, even if the entire bus cannot be.

  • Transformation: The pass vectorizes each chunk it can (e.g., a linear run of bits), keeps the remaining scalar or structural logic as a separate chunk, and then concatenates the chunks back into the full-width result.

// Before (Mixed linear and structural patterns)
// out[3:1] = in[3:1] (linear)
// out[0]   = in[1] ^ in[0] (structural)
%in_3 = comb.extract %in from 3 : (i4) -> i1
%in_2 = comb.extract %in from 2 : (i4) -> i1
%in_1 = comb.extract %in from 1 : (i4) -> i1
// Logic for bit 0
%in_1_for_0 = comb.extract %in from 1 : (i4) -> i1
%in_0 = comb.extract %in from 0 : (i4) -> i1
%bit_0 = comb.xor %in_1_for_0, %in_0 : i1
// Final concatenation
%concat = comb.concat %in_3, %in_2, %in_1, %bit_0 : i1, i1, i1, i1
hw.output %concat : i4

// After (Partially vectorized)
// Chunk 1: [3:1] (vectorized)
%chunk_1 = comb.extract %in from 1 : (i4) -> i3
// Chunk 0: [0] (scalar logic)
%in_1 = comb.extract %in from 1 : (i4) -> i1
%in_0 = comb.extract %in from 0 : (i4) -> i1
%chunk_0 = comb.xor %in_1, %in_0 : i1
// Re-concat the vectorized chunks
%final = comb.concat %chunk_1, %chunk_0 : i3, i1
hw.output %final : i4
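The chunking step can be sketched as a greedy scan over the output bits, assuming each bit is annotated with the input it was copied from, if any. BitOrigin, Chunk, and splitIntoChunks are hypothetical, and the sketch only separates linear runs from everything else; the structural classification of the non-linear chunks is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical per-bit descriptor: a direct copy of input `source` at bit
// `index`, or (source empty) a bit computed by other logic, e.g. an xor.
struct BitOrigin {
  std::optional<int> source;
  unsigned index = 0;
};

struct Chunk {
  unsigned lowBit; // first output bit covered
  unsigned width;  // number of output bits
  bool linear;     // true: one multi-bit comb.extract covers the chunk
};

// Scan LSB first and build maximal linear runs: consecutive output bits
// copied from the same source at consecutive indices. Every other bit
// becomes a one-bit non-linear chunk that keeps its scalar logic.
std::vector<Chunk> splitIntoChunks(const std::vector<BitOrigin> &bits) {
  std::vector<Chunk> chunks;
  std::size_t i = 0;
  while (i < bits.size()) {
    if (!bits[i].source) {
      chunks.push_back({unsigned(i), 1, false});
      ++i;
      continue;
    }
    std::size_t j = i + 1;
    while (j < bits.size() && bits[j].source == bits[i].source &&
           bits[j].index == bits[i].index + (j - i))
      ++j;
    chunks.push_back({unsigned(i), unsigned(j - i), true});
    i = j;
  }
  return chunks;
}
```

With %in as source 0, the example's bits (LSB first: xor, in[1], in[2], in[3]) yield a one-bit non-linear chunk at bit 0 and a three-bit linear chunk starting at bit 1, matching comb.extract %in from 1 : (i4) -> i3.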

Patterns not transformed
The pass does not modify modules with cross-bit dependencies (output bits computed from other bits of the same output value) or non-linear control flow.
For example:

// cross-dependency example (should remain unchanged)
hw.module @cross_dependency(in %in : i2, out out : i2) {
  %0 = comb.extract %in from 0 : (i2) -> i1
  %1 = comb.extract %6 from 1 : (i2) -> i1
  %2 = comb.xor %0, %1 : i1
  %3 = comb.extract %in from 1 : (i2) -> i1
  %4 = comb.extract %6 from 0 : (i2) -> i1
  %5 = comb.xor %3, %4 : i1
  %6 = comb.concat %5, %2 : i1, i1
  hw.output %6 : i2
}
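The description does not say how cross-bit dependencies are detected. One simple way to recognize the cycle in this example is a reachability check: follow operands from the output value and see whether it reaches itself. The sketch assumes a toy graph in which each node lists the nodes it reads; Graph, reaches, and hasSelfDependency are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy graph: node n reads the nodes listed in g[n].
using Graph = std::vector<std::vector<int>>;

// Depth-first search along operands: is `target` reachable from `start`?
bool reaches(const Graph &g, int start, int target, std::vector<bool> &seen) {
  for (int op : g[start]) {
    if (op == target)
      return true;
    if (!seen[op]) {
      seen[op] = true;
      if (reaches(g, op, target, seen))
        return true;
    }
  }
  return false;
}

// True when the output value feeds back into its own bits.
bool hasSelfDependency(const Graph &g, int output) {
  std::vector<bool> seen(g.size(), false);
  return reaches(g, output, output, seen);
}
```

Numbering %in as 0 and %0 through %6 as 1 through 7 gives operands {}, {0}, {7}, {1, 2}, {0}, {7}, {4, 5}, {6, 3}; hasSelfDependency(g, 7) is true because %1 and %4 extract from %6.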

@mafeguimaraes force-pushed the feature/hw-vectorization-pass branch 3 times, most recently from 59a27df to 7dfbad0 on November 10, 2025 19:23
@pronesto

Hi everyone, just a gentle ping on this PR. It has been open for a while, and I wanted to check whether there is anything we can do on our side to help move the review forward. Many thanks!

bit &operator=(const bit &other);
bool operator==(const bit &other) const;

bool left_adjacent(const bit &other);

Member

nit: please use camelBack (as noted in the MLIR style guide: https://mlir.llvm.org/getting_started/DeveloperGuide/#style-guide)


Block &block = module.getBody().front();
auto outputOp = dyn_cast<hw::OutputOp>(block.getTerminator());
if (!outputOp)

Member

This if statement is not necessary, as the hw::OutputOp terminator is guaranteed by the verifier.


bool containsLLHD = false;
module.walk([&](mlir::Operation *op) {
  if (op->getDialect()->getNamespace() == "llhd") {

Member

Why does this give up when the module contains llhd operations?

Comment on lines 816 to 875
} else if (auto andOp = dyn_cast<comb::AndOp>(op)) {
  Value lhs = andOp.getInputs()[0];
  Value rhs = andOp.getInputs()[1];
  if (isa_and_nonnull<hw::ConstantOp>(rhs.getDefiningOp()))
    return findBitSource(lhs, bitIndex, depth + 1);
  if (isa_and_nonnull<hw::ConstantOp>(lhs.getDefiningOp()))
    return findBitSource(rhs, bitIndex, depth + 1);

Member

I'm not sure I'm following the code correctly, but these parts don't seem correct. Isn't it necessary to check the value of the constant? Also, can't we simply treat and/or/xor as the source op here?

Comment on lines 828 to 848
bool vectorizer::cleanup_dead_ops(Block &block) {
  bool overallChanged = false;
  bool changedInIteration = true;
  while (changedInIteration) {
    changedInIteration = false;
    llvm::SmallVector<Operation *, 16> deadOps;
    for (Operation &op : block) {
      if (op.use_empty() && !op.hasTrait<mlir::OpTrait::IsTerminator>()) {
        deadOps.push_back(&op);
      }
    }
    if (!deadOps.empty()) {
      changedInIteration = true;
      overallChanged = true;
      for (Operation *op : deadOps) {
        op->erase();
      }
    }
  }
  return overallChanged;
}

Member

Could you use https://github.com/llvm/llvm-project/blob/db557bee1e2c128e77805deb86c1f364b5c29e70/mlir/lib/Transforms/Utils/RegionUtils.cpp#L495? There are a few issues here around side-effecting ops and O(N^2) fixpoint iterations, so it would be nice to simply use a library function.
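To illustrate the two concerns raised here, the sketch below models worklist-based dead-code elimination on a toy IR: use counts are computed once, an op is erased only when its count reaches zero, and side-effecting ops and terminators are never erased. It is not MLIR's implementation; ToyOp and eraseDeadOps are hypothetical, and the pass itself would simply call the library utility linked above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: op i reads the ops listed in operands.
struct ToyOp {
  std::vector<std::size_t> operands;
  bool hasSideEffects = false;
  bool isTerminator = false;
};

// Returns erased[i] == true for every op removed. Each op enters the
// worklist at most once and each operand edge is decremented at most once,
// so the work is linear in the IR size rather than one full block scan per
// fixpoint iteration.
std::vector<bool> eraseDeadOps(const std::vector<ToyOp> &ops) {
  std::vector<std::size_t> uses(ops.size(), 0);
  for (const ToyOp &op : ops)
    for (std::size_t operand : op.operands)
      ++uses[operand];

  // Unused results alone are not enough: side effects and terminators stay.
  auto removable = [&](std::size_t i) {
    return uses[i] == 0 && !ops[i].hasSideEffects && !ops[i].isTerminator;
  };

  std::vector<bool> erased(ops.size(), false);
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (removable(i))
      worklist.push_back(i);

  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    if (erased[i])
      continue;
    erased[i] = true;
    // Erasing op i drops one use of each operand; newly dead ops are queued.
    for (std::size_t operand : ops[i].operands) {
      --uses[operand];
      if (removable(operand) && !erased[operand])
        worklist.push_back(operand);
    }
  }
  return erased;
}
```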

Comment on lines 188 to 195
llvm::DenseSet<mlir::Value> sources;
for (const auto &[_, bit] : bits) {
  if (!sources.contains(bit.source))
    sources.insert(bit.source);
  if (sources.size() >= 2)
    return false;
}
return true;

Member

super nit: using DenseSet is certainly overkill here, e.g.:

Suggested change
-llvm::DenseSet<mlir::Value> sources;
-for (const auto &[_, bit] : bits) {
-  if (!sources.contains(bit.source))
-    sources.insert(bit.source);
-  if (sources.size() >= 2)
-    return false;
-}
-return true;
+mlir::Value source;
+for (const auto &[_, bit] : bits) {
+  if (source && source != bit.source)
+    return false;
+  source = bit.source;
+}
+return true;

@mafeguimaraes force-pushed the feature/hw-vectorization-pass branch 3 times, most recently from a756a7e to ae4c6b6 on December 16, 2025 11:43
@mafeguimaraes force-pushed the feature/hw-vectorization-pass branch from ae4c6b6 to 1f3df90 on December 16, 2025 11:46
@mafeguimaraes force-pushed the feature/hw-vectorization-pass branch from dcbe132 to 80f2389 on December 16, 2025 12:32
@mafeguimaraes
Contributor Author

Hi @uenoku,

Thank you very much for the review and for pointing out the issues with the previous approach. It was really helpful.

I’ve reworked findBitSource to keep it strictly structural again, and moved all boolean reasoning into a separate helper (isBitConstant). This helper is intentionally limited: it only proves constants through structural traversal and identity propagation (e.g., and(x, 1) and or(x, 0)), and does not attempt general boolean simplification.

The helper is used only to recognize identity masks in and/or, which allows handling the mux-like pattern in test_mux without turning findBitSource into a semantic evaluator.

Please let me know if this direction looks more reasonable to you, or if you’d prefer an even more conservative restriction.

Thanks again for the review!
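The restriction described in this reply can be illustrated with a toy model: a bit of and(x, c) is traced back to x only when the matching bit of c is provably 1, and a bit of or(x, c) only when it is provably 0. Expr, provenConstantBit, and traceBit are hypothetical stand-ins, not the PR's isBitConstant or findBitSource. As far as the reply describes the helper, this sketch mirrors its scope: it folds constants and propagates identities, but does no general boolean simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical toy IR: a named input, a constant, or a bitwise and/or.
// Bitwise ops keep bit positions, so result bit i depends only on bit i.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Expr {
  enum Kind { Input, Const, And, Or } kind;
  std::string name;   // Input only
  uint64_t value = 0; // Const only
  ExprPtr lhs, rhs;   // And/Or only
};

ExprPtr input(std::string n) {
  return std::make_shared<Expr>(Expr{Expr::Input, std::move(n), 0, nullptr, nullptr});
}
ExprPtr constant(uint64_t v) {
  return std::make_shared<Expr>(Expr{Expr::Const, "", v, nullptr, nullptr});
}
ExprPtr bitAnd(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{Expr::And, "", 0, std::move(l), std::move(r)});
}
ExprPtr bitOr(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{Expr::Or, "", 0, std::move(l), std::move(r)});
}

// Proves bit i constant only by folding two constants or by identity
// propagation (and(x, 1) == x, or(x, 0) == x). It does not, for example,
// prove and(x, 0) == 0.
std::optional<bool> provenConstantBit(const Expr &e, unsigned i) {
  assert(i < 64);
  switch (e.kind) {
  case Expr::Const:
    return ((e.value >> i) & 1) != 0;
  case Expr::Input:
    return std::nullopt;
  case Expr::And:
  case Expr::Or: {
    std::optional<bool> l = provenConstantBit(*e.lhs, i);
    std::optional<bool> r = provenConstantBit(*e.rhs, i);
    if (l && r)
      return e.kind == Expr::And ? (*l && *r) : (*l || *r);
    bool identity = e.kind == Expr::And; // 1 for and, 0 for or
    if (r == identity)
      return l;
    if (l == identity)
      return r;
    return std::nullopt;
  }
  }
  return std::nullopt;
}

// Returns the input that bit i is a plain copy of, or nullopt when the
// and/or is not an identity for that bit and must be treated as the source.
std::optional<std::string> traceBit(const Expr &e, unsigned i) {
  switch (e.kind) {
  case Expr::Input:
    return e.name;
  case Expr::Const:
    return std::nullopt;
  case Expr::And:
  case Expr::Or: {
    bool identity = e.kind == Expr::And;
    if (provenConstantBit(*e.rhs, i) == identity)
      return traceBit(*e.lhs, i);
    if (provenConstantBit(*e.lhs, i) == identity)
      return traceBit(*e.rhs, i);
    return std::nullopt;
  }
  }
  return std::nullopt;
}
```

For and(x, 0b0101), bit 0 traces to x, while bit 1 does not, which is exactly the case where ignoring the constant's value would give a wrong source.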
