AVX intrinsic and matrix multiplication with c language
c
matrix
avx
What makes numpy.sum faster than an optimized C loop?
python
floating-point
c
numpy
avx
x86 Intrinsic: optimize a Matrix multiply of complex floats
c
x86
matrix-multiplication
complex-numbers
avx
x86 Intrinsic : FIR for complex float input
c
convolution
complex-numbers
sse
avx
Count leading zeros in __m256i word
c
x86
simd
avx
intrinsics
Optimize SIMD Version of Range Generation Algorithm
c
simd
x86-64
avx
C simd AVX1 m256 horizontal max min normalisation
c
simd
avx
AVX512: Best way to combine horizontal sum and broadcast
c
intel
avx
avx512
How can I most efficiently convert an __m256i vector containing 32 unsigned 8-bit integers to four __m256 vectors of 32-bit floats?
c
avx2
simd
avx
intrinsics
Which contexts need to be saved in x86-64 with a c function return?
c
x86-64
abi
avx
context-switch
`_mm256_zeroall()` can't initialize register variables
c++
c
assembly
x86
avx
Can Apache web server make use of CPU AVX instructions?
apache
performance
avx
Using integer literal instead of constant variable makes program slower
c
x86
icc
avx
How to formally verify correctness of vectorized C code (or Fortran)
c
vectorization
avx
frama-c
formal-verification
AVX2 intrinsic function __mm256_div_epi32 was not declared in this scope
c++
avx2
avx
Optimizing MatMult with AVX
c++
avx2
matrix-multiplication
avx
Horizontal min on avx2 8 float register and shuffle paired registers alongside
c++
avx2
simd
sse
avx
AVX2 Function To Set a Bit to 1 performs the same as SSE2
c++
optimization
bit-manipulation
sse
avx
Convert DWORD to Unsigned char. AVX
c++
type-conversion
sse
avx
intrinsics
Tensorflow-io import returns error "Illegal instruction (core dumped)"
python
tensorflow
tensorflow2.0
avx