FLOPs per cycle for Sandy Bridge and Haswell and others SSE2 / AVX / AVX2 / AVX-512 Understanding FLOPs per Cycle for Intel Architectures: A Focus on Sandy Bridge and Haswell. In the world of computing, performance metrics are essential for evalua 3 min read 07-10-2024 5
Compiling legacy GCC code with AVX vector warnings Navigating Legacy GCC Code and AVX Vector Warnings: A Practical Guide. The Problem: You're working with legacy C/C++ code compiled with GCC and suddenly you encounte 3 min read 07-10-2024 5
Unpacking nibbles to bytes - Direct instructions / Efficient Way to implement and keep sign Unpacking Nibbles to Bytes: Efficient Implementation and Maintaining Sign. In programming, particularly in data manipulation, there can be a need to convert smaller 2 min read 20-09-2024 15
Per-element atomicity of vector load/store and gather/scatter? Diving Deep into Per-Element Atomicity of Vector Operations on x86. This article delves into the complex world of vector load/store, gather, and scatter instructio 2 min read 06-09-2024 13
Differences between AVX and AVX2 Demystifying AVX and AVX2: A Guide to Understanding the Differences. The Intel Advanced Vector Extensions (AVX) and AVX2 are instruction sets designed to accelera 2 min read 04-09-2024 25
AVX2 consuming bytes whilst producing uints? SIMD Optimization for Grayscale to Premultiplied Alpha Conversion. Converting a grayscale image to a premultiplied alpha image with a specified color presents an 2 min read 29-08-2024 21
Question about AVX-2, x86-64 function calls and Compilers Optimizing Memory Initialization with AVX2: A Deep Dive into Compiler Choices and Performance. This article explores the intricacies of optimizing memory initial 2 min read 28-08-2024 22
AVX shuffle with types other than byte AVX Shuffle with Types Other Than Byte. The AVX instruction set provides a powerful mechanism for shuffling data within vectors. While the intrinsics for mm256 sh 2 min read 27-08-2024 19