Some intrinsics for registers of size `N` are introduced in the same generation that introduces registers of size `2N`:
- `_mm_srlv_epi32` (128 bits) was introduced in AVX2, along with `_mm256`
- Plenty of 128- and 256-bit APIs were introduced in AVX512
I'm wondering how to support these in xsimd.
My best understanding is that when compiling with AVX2, `xsimd::make_sized_batch_t<uint8_t, 16>` resolves to a batch on the `xsimd::sse4_2` architecture, so the dispatch mechanism cannot tell from `requires_arch` that the 128-bit AVX2 instruction is available.
One way to work around this is `if constexpr (supported_architectures::contains<avx2>())`, but that seems to duplicate the dispatch mechanism.
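A minimal sketch of that workaround, to make the duplication concrete. Everything in the `workaround_sketch` namespace is a hypothetical stand-in that only mimics the shape of xsimd's tags and arch list; it is not the library's actual code, and the kernel names are invented for illustration.

```cpp
#include <type_traits>

namespace workaround_sketch
{
    // Hypothetical architecture tags (stand-ins, not xsimd's real types).
    struct sse4_2 {};
    struct avx2 {};

    // Hypothetical list of architectures enabled at compile time.
    template <class... Archs>
    struct arch_list
    {
        template <class Arch>
        static constexpr bool contains() noexcept
        {
            return (std::is_same<Arch, Archs>::value || ...);
        }
    };

    // Two pretend build configurations: AVX2 enabled, and SSE4.2 only.
    using with_avx2 = arch_list<avx2, sse4_2>;
    using sse_only = arch_list<sse4_2>;

    enum class kernel
    {
        per_lane_emulation,
        native_variable_shift
    };

    // Kernel selected for a 128-bit batch, i.e. tagged with sse4_2.
    // The tag alone does not say whether AVX2 is enabled, so the kernel
    // queries the supported list itself, on top of the tag dispatch.
    template <class Supported>
    constexpr kernel select_srlv_128(sse4_2) noexcept
    {
        if constexpr (Supported::template contains<avx2>())
            return kernel::native_variable_shift; // would call _mm_srlv_epi32
        else
            return kernel::per_lane_emulation;
    }
}
```

The `sse4_2` tag argument and the `contains<avx2>()` query both encode "which ISA am I on", which is the duplication I'm worried about.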
Another possibility could be to decouple the architecture from the register type.
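Roughly what I have in mind, as a sketch only: the register width and the ISA level become two independent parameters, so a 128-bit batch can still be tagged as AVX2. The names `isa`, `simd_target`, and `has_native_srlv32` are hypothetical, and the linear ordering of ISA levels is a simplifying assumption.

```cpp
#include <cstddef>

namespace decoupled_sketch
{
    // Hypothetical ISA levels, assumed here to be strictly ordered.
    enum class isa
    {
        sse4_2 = 0,
        avx2 = 1,
        avx512 = 2
    };

    // Register width and ISA level are carried separately.
    template <std::size_t Bits, isa Level>
    struct simd_target
    {
        static constexpr std::size_t register_bits = Bits;
        static constexpr isa level = Level;
    };

    // Availability of a native 32-bit variable shift depends only on the
    // ISA level, not on the register width being used.
    template <class Target>
    constexpr bool has_native_srlv32() noexcept
    {
        return Target::level >= isa::avx2;
    }
}
```

With this shape, `simd_target<128, isa::avx2>` describes an SSE-sized register compiled with AVX2 available, which is the case the current arch tags cannot express.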
What do you think @JohanMabille @serge-sans-paille ?