Open
Labels: core (Internal engine: Shape, Storage, TensorEngine, iterators), enhancement (New feature or request), performance (Performance improvements or optimizations)
Description
Overview
Extend the IL kernel generator's SIMD unary operation support beyond the current Negate/Abs/Sqrt.
Parent issue: #545
Current State
| Operation | SIMD Status | Implementation |
|---|---|---|
| Negate | ✅ SIMD | Vector256.op_UnaryNegation |
| Abs | ✅ SIMD | Vector256.Abs() |
| Sqrt | ✅ SIMD | Vector256.Sqrt() |
| Floor | ❌ Scalar | Math.Floor per-element |
| Ceil | ❌ Scalar | Math.Ceiling per-element |
| Round | ❌ Scalar | Math.Round per-element |
| Exp | ❌ Scalar | Math.Exp per-element |
| Log | ❌ Scalar | Math.Log per-element |
| Sin/Cos/Tan | ❌ Scalar | Math.Sin/Cos/Tan per-element |
SIMD eligibility check: ILKernelGenerator.cs:2086
Vector operation dispatch: ILKernelGenerator.cs:2980-3014
Task List
Tier 1: Quick Wins (Vector256 methods exist in .NET)
- SIMD Floor
  - .NET has Vector256.Floor()
  - Add to CanUseUnarySimd() eligibility
  - Add to EmitUnaryVectorOperation() dispatch
  - Expected: 2× speedup
- SIMD Ceiling
  - .NET has Vector256.Ceiling()
  - Same implementation pattern as Floor
  - Expected: 2× speedup
- SIMD Truncate
  - .NET has Vector256.Truncate()
  - Same implementation pattern
  - Expected: 2× speedup
Tier 2: Medium Effort
- SIMD Round
  - May need Vector256.Round() or composition
  - Check .NET 8+ availability
  - Expected: 1.5-2× speedup
Tier 3: Transcendentals (Complex)
- SIMD Exp/Log (research)
  - No Vector256.Exp() in .NET BCL
  - Options:
    - Polynomial approximation (Remez/minimax)
    - External library (MathNet.Numerics)
    - P/Invoke to Intel SVML
    - Wait for .NET Tensor primitives
  - Expected: 2-4× speedup if implemented
- SIMD Sin/Cos/Tan (research)
  - Same challenge as Exp/Log
  - Range reduction + polynomial approximation
  - More complex due to periodicity
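To make the polynomial-approximation option more concrete, here is a minimal sketch of a vectorized `exp` for `float` built from range reduction plus a polynomial. The class name `SimdExpSketch`, the clamp limits, and the Taylor coefficients are all assumptions chosen for illustration; a Remez/minimax fit would likely give a tighter error bound, and NaN handling is not addressed.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class SimdExpSketch
{
    // Hypothetical helper, not part of the code base: exp(x) for 8 floats at once.
    public static Vector256<float> Exp(Vector256<float> x)
    {
        // 1. Clamp so that 2^n stays a normal float (assumed limits, roughly +/-87.3).
        x = Vector256.Min(Vector256.Max(x, Vector256.Create(-87.3f)), Vector256.Create(87.3f));

        // 2. n = round(x / ln2), computed as floor(x * (1/ln2) + 0.5); r = x - n * ln2.
        var n = Vector256.Floor(x * Vector256.Create(1.44269504088896f) + Vector256.Create(0.5f));
        var r = x - n * Vector256.Create(0.693147180559945f);

        // 3. exp(r) for |r| <= ln2/2 via a degree-6 Taylor polynomial in Horner form.
        //    Placeholder coefficients (1/k!), assumed for illustration only.
        var p = Vector256.Create(1f / 720f);
        p = p * r + Vector256.Create(1f / 120f);
        p = p * r + Vector256.Create(1f / 24f);
        p = p * r + Vector256.Create(1f / 6f);
        p = p * r + Vector256.Create(0.5f);
        p = p * r + Vector256<float>.One;
        p = p * r + Vector256<float>.One;

        // 4. Build 2^n by writing the biased exponent (n + 127) into the float bit pattern.
        var pow2n = Vector256.ShiftLeft(Vector256.ConvertToInt32(n) + Vector256.Create(127), 23).AsSingle();
        return p * pow2n;
    }
}
```

For example, `SimdExpSketch.Exp(Vector256.Create(1f)).GetElement(0)` should land close to `MathF.E`. Using a single-precision `ln2` constant in step 2 loses some accuracy for large |x|; splitting it into high and low parts is a common refinement that this sketch leaves out.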
Implementation Details
Floor/Ceil (Tier 1)
    // In CanUseUnarySimd (line ~2086), add:
    || key.Op == UnaryOp.Floor
    || key.Op == UnaryOp.Ceil

    // In EmitUnaryVectorOperation (line ~2980), add:
    case UnaryOp.Floor:
        var floorMethod = typeof(Vector256).GetMethod("Floor",
            new[] { typeof(Vector256<>).MakeGenericType(GetClrType(type)) });
        il.EmitCall(OpCodes.Call, floorMethod, null);
        break;
    case UnaryOp.Ceil:
        var ceilMethod = typeof(Vector256).GetMethod("Ceiling",
            new[] { typeof(Vector256<>).MakeGenericType(GetClrType(type)) });
        il.EmitCall(OpCodes.Call, ceilMethod, null);
        break;

Transcendentals (Tier 3) - Research Notes
Exp approximation approach:
    // Exp(x) via range reduction + polynomial
    // 1. Clamp x to avoid overflow
    // 2. n = round(x / ln2), r = x - n*ln2
    // 3. exp(r) ≈ polynomial (|r| < ln2/2)
    // 4. result = 2^n * exp(r)
    public static Vector256<float> Exp(Vector256<float> x)
    {
        var ln2 = Vector256.Create(0.693147180559945f);
        var invLn2 = Vector256.Create(1.44269504088896f);
        // ... polynomial coefficients ...
    }

Files to Modify
| File | Changes |
|---|---|
| ILKernelGenerator.cs:2086 | Add Floor/Ceil/Round to eligibility |
| ILKernelGenerator.cs:2980 | Add vector operation dispatch |
| SimdKernels.cs (optional) | C# fallback implementations |
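As a rough idea of what an optional C# fallback could look like, here is a sketch of a Floor kernel over `double` spans. The class and method names are hypothetical and do not reflect the actual contents of SimdKernels.cs; the sketch only assumes the `Vector256` APIs available in .NET 7 and later.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class SimdKernelsSketch
{
    // Hypothetical fallback kernel: dst[i] = floor(src[i]) for double data.
    public static void Floor(ReadOnlySpan<double> src, Span<double> dst)
    {
        if (dst.Length < src.Length)
            throw new ArgumentException("Destination is shorter than source.", nameof(dst));

        int i = 0;
        if (Vector256.IsHardwareAccelerated)
        {
            int width = Vector256<double>.Count; // 4 doubles per 256-bit vector
            for (; i <= src.Length - width; i += width)
            {
                var v = Vector256.Create(src.Slice(i, width));
                Vector256.Floor(v).CopyTo(dst.Slice(i, width));
            }
        }

        // Scalar tail, and the full scalar path when Vector256 is not accelerated.
        for (; i < src.Length; i++)
            dst[i] = Math.Floor(src[i]);
    }
}
```

The same loop shape would apply to Ceiling by swapping `Vector256.Floor`/`Math.Floor` for `Vector256.Ceiling`/`Math.Ceiling`. Note that `Vector256.Floor` is only defined for `float` and `double`, so integer types would need to stay on a separate path.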
Benchmarks
    [Benchmark] public NDArray Floor_10M() => np.floor(_array);
    [Benchmark] public NDArray Ceil_10M() => np.ceil(_array);
    [Benchmark] public NDArray Exp_10M() => np.exp(_array);
    [Benchmark] public NDArray Sin_10M() => np.sin(_array);

NumPy Baseline (10M float64)
| Operation | NumPy Time |
|---|---|
| np.floor | ~8 ms |
| np.ceil | ~8 ms |
| np.exp | ~20 ms |
| np.sin | ~50 ms |
Success Criteria
- Floor/Ceil SIMD: ≥1.5× faster than current scalar
- All existing unary tests pass
- No accuracy regression vs scalar implementation
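One possible way to check the "no accuracy regression" criterion is to compare each vector lane against the scalar reference. The helper below is a hypothetical sketch (the name `UnaryAccuracyCheck` is not from the code base) and it only inspects complete groups of four doubles, so the input length is assumed to be a multiple of `Vector256<double>.Count`.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class UnaryAccuracyCheck
{
    // Counts elements where the vector result differs from the scalar reference.
    public static int CountMismatches(
        double[] input,
        Func<Vector256<double>, Vector256<double>> vectorOp,
        Func<double, double> scalarOp)
    {
        int width = Vector256<double>.Count;
        int mismatches = 0;
        for (int i = 0; i + width <= input.Length; i += width)
        {
            var result = vectorOp(Vector256.Create(input, i));
            for (int lane = 0; lane < width; lane++)
            {
                double expected = scalarOp(input[i + lane]);
                // double.Equals treats NaN as equal to NaN, unlike the == operator.
                if (!expected.Equals(result.GetElement(lane)))
                    mismatches++;
            }
        }
        return mismatches;
    }
}
```

A call such as `UnaryAccuracyCheck.CountMismatches(samples, Vector256.Floor, Math.Floor)` could then be asserted to return zero in a unit test, with `samples` covering negative values, halves, infinities, and NaN.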