onnxruntime
7489bfee - Enable AVX NE CONVERT for FP16 to FP32 cast (#21183)

Commit
1 year ago
Enable AVX NE CONVERT for FP16 to FP32 cast (#21183)

### Description
Implements a new cast assembly kernel that uses AVX_NE_CONVERT instructions to accelerate casting from FP16 to FP32. Adds CPUID checks to determine whether the ISA is supported.

### Motivation and Context
Currently, FP16 models executed on systems that lack complete FP16 operator support run every node in single precision. The original FP16 weights therefore have to be cast to FP32 for the model to run properly. This change aims to accelerate that cast by using upconvert instructions, and thereby improve performance.
Author
Erick Muñoz