Memory Alignment Issues with GCC Vector Extension. Understanding Memory Alignment Issues with GCC Vector Extensions. Vector extensions in GCC like __m128 and __m256 offer significant performance gains by allowing par... 3 min read 07-10-2024 20
Neon on Raspberry Pi 5 to accelerate RGB2GRay, 128bit (Q register) slower than 64bit(D register), why? Understanding the Performance of Neon on Raspberry Pi 5: RGB2Gray Conversion. When working with image processing on the Raspberry Pi 5, particularly when convert... 3 min read 28-09-2024 17
inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch. Understanding Inlining Failures in SIMD Programming: The Case of _mm_mullo_epi32. When working with SIMD (Single Instruction, Multiple Data) operations in C or C++ deve... 2 min read 28-09-2024 17
Why doesn't this SIMD code show better performance? Understanding SIMD Performance: Why Doesn't This Code Show Improvements? Single Instruction, Multiple Data (SIMD) is a parallel processing technique that allows for t... 3 min read 20-09-2024 20
Unpacking nibbles to bytes - Direct instructions/ Efficient Way to implement and keep sign. Unpacking Nibbles to Bytes: Efficient Implementation and Maintaining Sign. In programming, particularly in data manipulation, there can be a need to convert smaller... 2 min read 20-09-2024 24
Setting target for static inline variables. Understanding Static Inline Variables in C/C++. In the world of C and C++ programming, managing variable scope and memory efficiently is crucial. One particular area t... 2 min read 15-09-2024 26
Setting GCC target options (AVX2) for static inline variables with a pragma doesn't work? Understanding GCC Target Options: Why AVX2 Pragma for Static Inline Variables Doesn't Work. When it comes to optimizing C and C++ code, the GNU Compiler Collection... 3 min read 15-09-2024 44
Push XMM register to the stack. Pushing and Popping XMM Registers to the Stack. Many developers working with x86 assembly language encounter the need to store and retrieve values held in XMM re... 2 min read 07-09-2024 19
Does SIMD require a multi-core CPU? SIMD: Not Just for Multi-Core CPUs. SIMD, or Single Instruction, Multiple Data, is a powerful technique for accelerating computationally intensive tasks. But does it... 2 min read 05-09-2024 35
Differences between AVX and AVX2. Demystifying AVX and AVX2: A Guide to Understanding the Differences. The Intel Advanced Vector Extensions (AVX) and AVX2 are instruction sets designed to accelera... 2 min read 04-09-2024 37
AVX2 computing of byte array. Optimizing Byte Array Processing with AVX2: A Deep Dive. This article explores techniques for optimizing byte array processing using AVX2, a powerful SIMD instru... 2 min read 30-08-2024 31
AVX2 MaskLoad/MaskStore of ushorts? AVX2 MaskLoad/MaskStore with UShorts: Understanding the Challenges. This article explores the intricacies of using AVX2's MaskLoad and MaskStore instruction... 2 min read 29-08-2024 26
Why is ARM NEON SIMD Sum is slower than serial sum? Unmasking the Mystery: Why is ARM NEON SIMD Sum Slower than Serial Sum? The world of optimized code can be perplexing, and one such puzzle arises when comparing th... 2 min read 29-08-2024 24
Speed-up byte signature scanning in memory using SIMD. Supercharge Your Byte Signature Scanning with SIMD. Finding specific byte sequences within large blocks of memory is a common task in many applications, from secu... 2 min read 29-08-2024 40
AVX2 consuming bytes whilst producing uints? SIMD Optimization for Grayscale to Premultiplied Alpha Conversion. Converting a grayscale image to a premultiplied alpha image with a specified color presents an... 2 min read 29-08-2024 30
Twice as slow SIMD performance without extra copy. Twice as slow SIMD performance without extra copy. This article explores a puzzling performance disparity observed in SIMD (Single Instruction, Multiple Data) code... 2 min read 28-08-2024 30