
Modernize unmanaged allocation: Marshal.AllocHGlobal → NativeMemory #528

@Nucs


Overview

Replace legacy Marshal.AllocHGlobal/FreeHGlobal with the modern NativeMemory API (.NET 6+) across all unmanaged allocation sites, enabling aligned allocation for future SIMD vectorization and zero-initialized allocation for np.zeros.

Problem

All unmanaged memory allocation in NumSharp goes through Marshal.AllocHGlobal/FreeHGlobal at 5 call sites in 2 files:

| File | Call | Purpose |
|---|---|---|
| UnmanagedMemoryBlock`1.cs:31 | Marshal.AllocHGlobal(new IntPtr(bytes)) | Primary array allocation |
| UnmanagedMemoryBlock`1.cs:995 | Marshal.FreeHGlobal(Address) | Deallocation in Disposer |
| StackedMemoryPool.cs:90 | Marshal.AllocHGlobal(SingleSize) | Pool overflow allocation |
| StackedMemoryPool.cs:169 | Marshal.FreeHGlobal(addr) | Pool cleanup |
| StackedMemoryPool.cs:238 | individualyAllocated.ForEach(Marshal.FreeHGlobal) | Pool disposal |

Marshal.AllocHGlobal wraps LocalAlloc on Windows (Win32 legacy) and malloc on Unix. It gives no alignment guarantee beyond the platform default (typically 8 or 16 bytes), has no zero-initialization option, and returns an IntPtr that has to be cast before it can be used as a T*.
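To make the zero-init and casting points concrete, here is a small sketch of the legacy pattern, assuming a build with unsafe code enabled (AllowUnsafeBlocks). LegacyZeroedSum is a hypothetical helper, not NumSharp code: it allocates with Marshal.AllocHGlobal, converts the IntPtr to byte*, clears the block in a separate pass with Span<byte>.Clear, and sums it back to confirm it is zero.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: legacy allocation, cast the IntPtr to a typed pointer,
// then zero-fill in a separate pass because AllocHGlobal has no zero-init option.
static unsafe long LegacyZeroedSum(int bytes)
{
    IntPtr handle = Marshal.AllocHGlobal(bytes);
    try
    {
        byte* p = (byte*)handle.ToPointer();   // IntPtr -> void* -> byte*
        new Span<byte>(p, bytes).Clear();      // separate zero-fill pass over the block
        long sum = 0;
        for (int i = 0; i < bytes; i++) sum += p[i];
        return sum;                            // 0 once the clear has run
    }
    finally
    {
        Marshal.FreeHGlobal(handle);           // must pair with AllocHGlobal
    }
}

Console.WriteLine(LegacyZeroedSum(4096)); // prints 0
```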

Proposal

Replace with equivalent NativeMemory calls:

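```csharp
using System;
using System.Runtime.InteropServices; // NativeMemory, .NET 6+

// Requires an unsafe context; bytes is the requested size.

// Drop-in replacement:
var ptr = (IntPtr)NativeMemory.Alloc((nuint)bytes);
NativeMemory.Free((void*)ptr);

// Aligned (enables future SIMD); must be freed with AlignedFree, not Free:
var alignedPtr = (IntPtr)NativeMemory.AlignedAlloc((nuint)bytes, alignment: 32);
NativeMemory.AlignedFree((void*)alignedPtr);

// Zero-initialized (optimized np.zeros); freed with plain Free:
var zeroedPtr = (IntPtr)NativeMemory.AllocZeroed((nuint)bytes);
NativeMemory.Free((void*)zeroedPtr);
```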

  • Replace Marshal.AllocHGlobal → NativeMemory.Alloc in UnmanagedMemoryBlock`1.cs
  • Replace Marshal.FreeHGlobal → NativeMemory.Free in Disposer
  • Replace alloc/free in StackedMemoryPool.cs (3 sites)
  • Update AllocationType enum if needed (new variant or replace AllocHGlobal wholesale)
  • Add AllocZeroed fast path for np.zeros / np.zeros_like
  • Add allocation benchmarks to NumSharp.Benchmark
  • Verify all tests pass
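One possible shape for the AllocationType / Disposer change, sketched under assumptions: DisposerSketch and the Native and NativeAligned enum members are hypothetical names (only AllocHGlobal appears in this issue), and the real Disposer in UnmanagedMemoryBlock`1.cs does more than this. The point of the sketch is the invariant the enum has to carry: every block is released by the free function that matches its allocator.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical allocation kinds; only AllocHGlobal is named in the issue.
enum AllocationType
{
    AllocHGlobal,  // Marshal.AllocHGlobal           -> Marshal.FreeHGlobal
    Native,        // NativeMemory.Alloc/AllocZeroed -> NativeMemory.Free
    NativeAligned, // NativeMemory.AlignedAlloc      -> NativeMemory.AlignedFree
}

// Simplified stand-in for NumSharp's Disposer (hypothetical name and shape).
sealed unsafe class DisposerSketch : IDisposable
{
    private readonly void* _address;
    private readonly AllocationType _type;
    private bool _disposed;

    public DisposerSketch(void* address, AllocationType type)
    {
        _address = address;
        _type = type;
    }

    public void* Address => _address;

    public void Dispose()
    {
        if (_disposed) return;   // make a second Dispose a no-op
        _disposed = true;

        // Dispatch on how the block was allocated; alloc/free pairs must not be mixed.
        switch (_type)
        {
            case AllocationType.AllocHGlobal:
                Marshal.FreeHGlobal(new IntPtr(_address));
                break;
            case AllocationType.Native:
                NativeMemory.Free(_address);
                break;
            case AllocationType.NativeAligned:
                NativeMemory.AlignedFree(_address);
                break;
            default:
                throw new ArgumentOutOfRangeException(nameof(_type));
        }
    }
}
```

AllocZeroed blocks are released with NativeMemory.Free, so they can share the plain Native variant; only AlignedAlloc needs its own, because the .NET documentation requires AlignedAlloc blocks to be freed with AlignedFree.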

Evidence

  • NativeMemory.AlignedAlloc allows 32-byte (AVX2) or 64-byte (AVX-512) alignment, groundwork for SIMD vectorization of arithmetic loops (aligned buffers keep vector loads and stores from splitting across cache lines)
  • NativeMemory.AllocZeroed delegates to calloc / OS zero-page mapping — potentially faster than Alloc + manual Unsafe.InitBlock
  • NativeMemory.Alloc returns void* directly, avoiding IntPtr round-trip in a codebase that immediately casts to T*
  • The Disposer class already dispatches on AllocationType enum — clean extension point
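The alignment and zero-init points can be checked directly. This sketch uses hypothetical helpers IsAlignedAlloc and AllocZeroedIsZero and assumes unsafe code is enabled: it requests 32- and 64-byte aligned blocks and tests the returned address, then uses the calloc-style AllocZeroed(elementCount, elementSize) overload and casts the void* result straight to double*, with no IntPtr in between.

```csharp
using System;
using System.Runtime.InteropServices;

// Allocate with the requested alignment and report whether the address honours it.
static unsafe bool IsAlignedAlloc(int bytes, int alignment)
{
    // alignment must be a power of two: 32 = AVX2 vector width, 64 = AVX-512
    void* p = NativeMemory.AlignedAlloc((nuint)bytes, (nuint)alignment);
    try
    {
        return (nuint)p % (nuint)alignment == 0;
    }
    finally
    {
        NativeMemory.AlignedFree(p); // aligned blocks need AlignedFree, not Free
    }
}

// AllocZeroed hands back memory that is already zero; no InitBlock/Clear pass.
static unsafe bool AllocZeroedIsZero(int count)
{
    double* p = (double*)NativeMemory.AllocZeroed((nuint)count, (nuint)sizeof(double));
    try
    {
        for (int i = 0; i < count; i++)
            if (p[i] != 0.0) return false;
        return true;
    }
    finally
    {
        NativeMemory.Free(p); // AllocZeroed pairs with plain Free
    }
}

Console.WriteLine(IsAlignedAlloc(1024, 32)); // True
Console.WriteLine(IsAlignedAlloc(1024, 64)); // True
Console.WriteLine(AllocZeroedIsZero(1024));  // True
```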

Scope / Non-goals

  • In scope: Replace 5 allocation sites, add benchmarks, optional AllocZeroed fast path
  • Not in scope: SIMD vectorization of arithmetic loops (separate effort), changing StackedMemoryPool pooling strategy, NativeMemory.AlignedRealloc

Benchmark / Performance

Must benchmark before merging. The allocation hot path affects every NDArray creation.

| Benchmark | What to measure |
|---|---|
| Allocation throughput | NativeMemory.Alloc vs Marshal.AllocHGlobal at small (<1KB), medium (1KB-1MB), large (>1MB) sizes |
| Aligned overhead | AlignedAlloc(32) vs Alloc for the same sizes |
| Zero-init | AllocZeroed vs Alloc + Unsafe.InitBlock / Span.Clear |
| Pool interaction | StackedMemoryPool.Take/Return with both APIs |
| End-to-end | np.arange(N), np.zeros(N), a + b for representative array sizes |
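Before the proper benchmarks exist, a rough Stopwatch loop can give a first impression of the allocation-throughput and aligned-overhead rows. It is only a sketch: NsPerOp and the RoundTrip helpers are hypothetical, there is a single warm-up call and no statistics, and it times one alloc/free pair per call rather than realistic NDArray lifetimes, so its numbers should not decide the merge.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// One alloc/free pair per call, for each API under comparison.
static void MarshalRoundTrip(int bytes) => Marshal.FreeHGlobal(Marshal.AllocHGlobal(bytes));
static unsafe void NativeRoundTrip(int bytes) => NativeMemory.Free(NativeMemory.Alloc((nuint)bytes));
static unsafe void AlignedRoundTrip(int bytes) =>
    NativeMemory.AlignedFree(NativeMemory.AlignedAlloc((nuint)bytes, (nuint)32));

// Average wall-clock nanoseconds per call after one warm-up call.
static double NsPerOp(int iterations, Action op)
{
    op();
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++) op();
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
}

// Small (<1KB), medium (1KB-1MB) and large (>1MB) sizes from the table.
foreach (int bytes in new[] { 512, 64 * 1024, 4 * 1024 * 1024 })
{
    double m = NsPerOp(1000, () => MarshalRoundTrip(bytes));
    double n = NsPerOp(1000, () => NativeRoundTrip(bytes));
    double a = NsPerOp(1000, () => AlignedRoundTrip(bytes));
    Console.WriteLine($"{bytes,9} B: Marshal {m:F1} ns, Native {n:F1} ns, Aligned(32) {a:F1} ns");
}
```

A dedicated harness (for example BenchmarkDotNet) handles warm-up, iteration counts and GC noise far better, which matters most for the small-size rows where differences are likely to be a few nanoseconds.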

Breaking changes

None — internal implementation detail, no public API changes.

Related issues

Labels: architecture, core, enhancement, refactor
