CANN: refactor mask handling and improve performance in FA (#15561)
* CANN(flash-attn): refactor mask handling and improve performance
1. Refactored the mask computation in Flash Attention, unified the logic without separating prefill and decode.
2. Optimized performance in non-alibi scenarios by reducing one repeat operation.
3. Updated operator management to explicitly mark unsupported cases on 310P devices and when dim is not divisible by 16.
Signed-off-by: noemotiovon <757486878@qq.com>
* [CANN]: fix review
Signed-off-by: noemotiovon <757486878@qq.com>
* [CANN]: Optimization FA BNSD to BSND
Signed-off-by: noemotiovon <757486878@qq.com>
---------
Signed-off-by: noemotiovon <757486878@qq.com>