onnxruntime
530a1fbb - [QNN EP] Add BFloat16 dtype support in QNN EP (#26987)

### Description
- The QNN NPU backend supports the BFloat16 dtype for many operators.
- The QNN EP adds a new session option, "htp_bf16_enable", that lets users request that a Float32 graph be processed in BFloat16 precision.
- When "htp_bf16_enable" is set, the QNN EP lowers the incoming Float32 ORT graph into a BFloat16 QNN graph.
- Partitions that fall back to the ORT CPU EP still receive Float32.
- The lowered QNN graph still accepts Float32 inputs, outputs, and constant initializers; the QNN EP inserts Cast operators to perform the necessary precision switches.

### Motivation and Context
- This enables running accuracy-sensitive Float32 models in BFloat16 precision on the Qualcomm NPU accelerator, improving inference time relative to computing in Float32.

---------

Co-authored-by: Ashwath Shankarnarayan <ashwshan@qti.qualcomm.com>
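A minimal sketch of how the new option might be passed from Python, assuming it is supplied through the QNN EP's provider-options dictionary in the usual way (the exact mechanism, and the "backend_path" value shown, are assumptions for illustration; only "htp_bf16_enable" comes from this PR):

```python
# Hypothetical usage: request BFloat16 lowering when running on the QNN HTP backend.
# The provider-options tuple form follows the standard onnxruntime convention;
# whether "htp_bf16_enable" is a provider option or a session config entry
# should be confirmed against the QNN EP documentation.
qnn_provider_options = {
    "backend_path": "QnnHtp.dll",   # assumed HTP (NPU) backend library name
    "htp_bf16_enable": "1",         # process the Float32 graph in BFloat16 precision
}
providers = [("QNNExecutionProvider", qnn_provider_options)]

# Session creation requires QNN-capable hardware, so it is shown commented out:
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
# Inputs and outputs remain Float32; the EP inserts Casts internally.
```

Note that the model's inputs, outputs, and initializers stay Float32 from the caller's perspective; only the internal QNN graph computes in BFloat16.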