sgemm : AVX Q4_0 and Q8_0 (#6891)

2 years ago

* basic avx implementation
* style
* combine denibble with load
* reduce 256 to 128 (and back!) conversions
* sse load
* Update sgemm.cpp
* oops oops