Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions crates/rustc_codegen_spirv/src/builder/libm_intrinsics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ pub enum LibmCustomIntrinsic {
Tgamma,
Log1p,
NextAfter,
Powi,
Remainder,
RemQuo,
Scalbn,
Expand Down Expand Up @@ -157,6 +158,7 @@ pub const TABLE: &[(&str, LibmIntrinsic)] = &[
),
("pow", LibmIntrinsic::GLOp(GLOp::Pow)),
("powf", LibmIntrinsic::GLOp(GLOp::Pow)),
("powi", LibmIntrinsic::Custom(LibmCustomIntrinsic::Powi)),
(
"remainder",
LibmIntrinsic::Custom(LibmCustomIntrinsic::Remainder),
Expand Down Expand Up @@ -306,6 +308,12 @@ impl Builder<'_, '_> {
LibmIntrinsic::Custom(LibmCustomIntrinsic::NextAfter) => {
self.undef_zombie(result_type, "NextAfter not supported yet")
}
LibmIntrinsic::Custom(LibmCustomIntrinsic::Powi) => {
assert_eq!(args.len(), 2);
// Convert integer exponent to float, then use GLOp::Pow
let float_exp = self.sitofp(args[1], args[0].ty);
self.gl_op(GLOp::Pow, result_type, [args[0], float_exp])
}
LibmIntrinsic::Custom(LibmCustomIntrinsic::Remainder) => {
self.undef_zombie(result_type, "Remainder not supported yet")
}
Expand Down
8 changes: 8 additions & 0 deletions crates/rustc_codegen_spirv/src/codegen_cx/declare.rs
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,14 @@ impl<'tcx> CodegenCx<'tcx> {
}
}

// Check for usage of `num_traits` intrinsics (like Float::powi) that we can optimize
if self.tcx.crate_name(def_id.krate) == self.sym.num_traits && !def_id.is_local() {
let item_name = self.tcx.item_name(def_id);
if let Some(&intrinsic) = self.sym.libm_intrinsics.get(&item_name) {
self.libm_intrinsics.borrow_mut().insert(def_id, intrinsic);
}
}
Comment on lines +169 to +175
Copy link
Contributor

@nazar-pc nazar-pc Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how things work more generally, but why are things like this even necessary? I'd expect the generic machinery of the compiler to optimize it to some canonical form that the target would like the most.

So it shouldn't make any difference if one writes x.powi(2) or x * x or uses a few layers of zero-cost abstractions in the process, should all result in identical SPIR-V, just like it does with LLVM. What am I missing?

Copy link
Collaborator Author

@LegNeato LegNeato Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is we are using num_traits to get the APIs we need off primitives, and num_traits knows nothing about the intrinsics available on the platform. It is using a loop as impl https://github.com/rust-num/num-traits/blob/master/src/pow.rs#L179. Pretty much no compiler would be able to tell that code can be replaced with a cast and a powi, and rustc doesn't. Instead somewhere needs to map the call / the semantics to the underlying fast impl, which is what the backend is supposed to do. It already does it for many ops, we just missed a spot (I guess? This all predates me).

One can argue we shouldn't use num_traits and instead should just hang our own impls off the types, I have no clue why it was chosen vs that route. Probably a make it work first then make it fast tradeoff (as the num_traits impl does work on spirv).

I have a new optimizer where we can control things much more rather than relying on spirv-opt, but this replacement happens before the optimizer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, that definitely looks like a problem. Maybe worth creating an issue to look into getting rid of num_traits? Doing things like looping instead of native exponentiation is definitely going to hurt and there might be more examples like it.

Copy link
Member

@Firestar99 Firestar99 Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't get rid of num_traits for silly rust reasons:
core::f32::powf is marked as #[rustc_allow_incoherent_impl], meaning it doesn't actually "exist" unless some other crate declares it. I'm not too familiar on the details, but it seems like when you have std it exists and if you only have core it's declared but doesn't exist and using it will fail with:

  error[E0599]: no function or associated item named `powf` found for type `f32` in the current scope
     --> crates/restir-shader/src/material/pbr/eval.rs:159:25
      |
  159 |     f0 + (1.0 - f0) * f32::powf(f32::clamp(1.0 - cos_theta, 0.0, 1.0), 5.0)
      |                            ^^^^ function or associated item not found in `f32`
      |
      = help: items from traits can only be used if the trait is in scope

see rust-lang/rust#149347 on this sillyness

The workaround has usually been this use statement:

#[cfg(target_arch = "spirv")]
use spirv_std::num_traits::Float;

num_traits with std feature will call std functions and with libm feature / in #[no_std] environments call the appropriate libm function. For SPIR-V, our spirv-std configures the feature for you and we then intercept those libm calls and replace them with intrinsics.

While researching this, I've noticed this has been discussed in 2021 on the embark repo already, and their conclusion was:

I think this is a fully external issue, then - rust-gpu is working "as intended", faithfully compiling the code that it is given, which doesn't use an intrinsic. The way to fix this, then, would be to change upstream code to be less branchy (e.g. requesting that libm adds a powi intrinsic, and having num-traits call it).

I don't mind if we want to merge this anyway, so it's not a trap to end users @LegNeato.

But num-traits isn't a perfect solution either: Technically, it doesn't actually reimplement all f32 functions. For example, it excludes euclid, for which you need to use the Euclid trait, which takes args by reference instead of by value, so you'll get awful code like this (which initially sparked my investigation into this last summer):

#[cfg(feature = "std")]
let rem = |x: f32| x.rem_euclid(1.);
#[cfg(not(feature = "std"))]
let rem = |x: f32| x.rem_euclid(&1.);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also in favor of replacing num-traits with an in-house solution in the future. It would give Rust-GPU the necessary stability it needs. If num-traits does some change in the future it could silently break Rust-GPU in subtle ways that aren't immediately obvious, which is poison.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Firestar99 created an issue for this, let's just merge for now as the current state is obviously bad and this makes it (slightly) better?


// Check if this is a From trait implementation
if let Some(impl_def_id) = self.tcx.impl_of_method(def_id)
&& let Some(trait_ref) = self.tcx.impl_trait_ref(impl_def_id)
Expand Down
2 changes: 2 additions & 0 deletions crates/rustc_codegen_spirv/src/symbols.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ pub struct Symbols {
pub vector: Symbol,
pub v1: Symbol,
pub libm: Symbol,
pub num_traits: Symbol,
pub entry_point_name: Symbol,
pub spv_khr_vulkan_memory_model: Symbol,

Expand Down Expand Up @@ -416,6 +417,7 @@ impl Symbols {
vector: Symbol::intern("vector"),
v1: Symbol::intern("v1"),
libm: Symbol::intern("libm"),
num_traits: Symbol::intern("num_traits"),
entry_point_name: Symbol::intern("entry_point_name"),
spv_khr_vulkan_memory_model: Symbol::intern("SPV_KHR_vulkan_memory_model"),

Expand Down
13 changes: 13 additions & 0 deletions tests/compiletests/ui/dis/powi.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
// Test that `Float::powi` uses GLSL.std.450 Pow instead of a loop-based implementation.
// See https://github.com/Rust-GPU/rust-gpu/issues/516

// build-pass
// compile-flags: -C llvm-args=--disassemble-entry=main

use spirv_std::num_traits::Float;
use spirv_std::spirv;

#[spirv(fragment)]
pub fn main(input: f32, output: &mut f32) {
*output = input.powi(2);
}
12 changes: 12 additions & 0 deletions tests/compiletests/ui/dis/powi.stderr
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
%1 = OpFunction %2 None %3
%4 = OpLabel
OpLine %5 11 12
%6 = OpLoad %7 %8
OpLine %5 12 20
%9 = OpConvertSToF %7 %10
%11 = OpExtInst %7 %12 26 %6 %9
OpLine %5 12 4
OpStore %13 %11
OpNoLine
OpReturn
OpFunctionEnd
Loading