SSE2 Color Blending Speeding Up Color Blending with SSE2 Instructions Color blending is a fundamental operation in computer graphics used to create realistic and visually appealin 2 min read 06-10-2024 10
GCC generates slow code when targeting more recent sse version Why Your GCC-Compiled Code is Slow: The SSE Version Conundrum Modern CPUs boast advanced instruction sets like SSE (Streaming SIMD Extensions) to accelerate perfor 2 min read 04-10-2024 12
inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch Understanding Inlining Failures in SIMD Programming: The Case of _mm_mullo_epi32 When working with SIMD (Single Instruction, Multiple Data) operations in C or C++ deve 2 min read 28-09-2024 9
Why does removing instructions from my SSE intrinsic function make it slower? Why Does Removing Instructions from My SSE Intrinsic Function Make It Slower? When optimizing code that employs SIMD (Single Instruction, Multiple Data) capabilitie 3 min read 23-09-2024 19
Push XMM register to the stack Pushing and Popping XMM Registers to the Stack Many developers working with x86 assembly language encounter the need to store and retrieve values held in XMM re 2 min read 07-09-2024 15
Per-element atomicity of vector load/store and gather/scatter? Diving Deep into Per-Element Atomicity of Vector Operations on x86 This article delves into the complex world of vector load/store, gather, and scatter instructio 2 min read 06-09-2024 13
Speed-up byte signature scanning in memory using SIMD Supercharge Your Byte Signature Scanning with SIMD Finding specific byte sequences within large blocks of memory is a common task in many applications from secu 2 min read 29-08-2024 28
How to implement real-time responses in a Flask-based chatbot with OpenAI Assistants API? Implementing Real-Time Responses in a Flask-Based Chatbot with OpenAI Assistants API This article explores how to implement real-time responses in a Flask base 3 min read 28-08-2024 15
Twice as slow SIMD performance without extra copy Twice as slow SIMD performance without extra copy This article explores a puzzling performance disparity observed in SIMD (Single Instruction, Multiple Data) code 2 min read 28-08-2024 22
Why CSAPP say Gcc do not use vcvtss2sd? Why CSAPP says GCC does not use vcvtss2sd The statement in the book Computer Systems: A Programmer's Perspective, 3rd Edition about GCC not using vcvtss2sd for sin 2 min read 27-08-2024 23