llama.cpp
f9bc66c3 - CANN: Update several operators to support FP16 data format (#16251)

Commit

24 days ago

CANN: Update several operators to support FP16 data format (#16251)

Many Ascend operators internally use FP16 precision for computation. If input data is in FP32, it must first be cast to FP16 before computation, and then cast back to FP32 after computation, which introduces unnecessary cast operations. Moreover, FP16 computation requires significantly less workload compared to FP32, leading to noticeable efficiency improvements.

In this change, `get_rows`, `rms_norm`, and `flash_attn_ext` are extended to support multiple data types. Validation on the Qwen2 0.5b model shows correct accuracy and about 10% performance gain in concurrent scenarios.

Co-authored-by: noemotiovon <757486878@qq.com>
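The cast round-trip described in the commit message can be illustrated with a small, self-contained sketch. This is not the CANN backend code: `Tensor`, `cast`, `rms_norm_f16_kernel`, and `rms_norm` are hypothetical names invented for illustration, and the "FP16 kernel" is emulated by rounding values to half precision with Python's `struct` module while the arithmetic itself runs in ordinary Python floats. It only shows that an operator which computes in FP16 needs two extra casts for FP32 input and none when the input is already FP16.

```python
import math
import struct

def round_to_fp16(x):
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

class Tensor:
    """Hypothetical minimal tensor: a flat list of values plus a dtype tag."""
    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype  # "f32" or "f16"

# Counts how many cast operations were issued (illustration only).
cast_count = {"n": 0}

def cast(t, dtype):
    """Hypothetical cast op; rounds to half precision when targeting f16."""
    cast_count["n"] += 1
    if dtype == "f16":
        return Tensor([round_to_fp16(v) for v in t.values], "f16")
    return Tensor(t.values, dtype)

def rms_norm_f16_kernel(t, eps=1e-6):
    """Stand-in for an operator that accepts and produces FP16 only."""
    assert t.dtype == "f16"
    mean_sq = sum(v * v for v in t.values) / len(t.values)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return Tensor([round_to_fp16(v * scale) for v in t.values], "f16")

def rms_norm(t):
    """Dispatch on dtype: FP16 input goes straight to the kernel,
    FP32 input pays for a cast in and a cast out."""
    if t.dtype == "f16":
        return rms_norm_f16_kernel(t)
    out = rms_norm_f16_kernel(cast(t, "f16"))
    return cast(out, "f32")

data = [0.1, 0.2, 0.3, 0.4]
x32 = Tensor(data, "f32")
x16 = Tensor([round_to_fp16(v) for v in data], "f16")

cast_count["n"] = 0
y32 = rms_norm(x32)
casts_for_f32 = cast_count["n"]

cast_count["n"] = 0
y16 = rms_norm(x16)
casts_for_f16 = cast_count["n"]

print("casts with FP32 input:", casts_for_f32)
print("casts with FP16 input:", casts_for_f16)
print("same values:", y32.values == y16.values)
```

In this sketch both paths produce the same values, because the computation happens at FP16 precision either way; the FP32 path only adds the two casts.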

References

#16251 - CANN: Update several operators to support FP16 data format

Author

hipudding

Parents

a31cf36a
