vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (#24123)
* vulkan: add support for Valve fp16 dot2 extension
* use macro for dot2 path choice
* properly check for the feature
* add dot_product abstraction to reduce preprocessor branching
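
To illustrate the last bullet, here is a minimal host-side sketch in C of how a single `dot_product` helper can hide the path choice behind one macro, so call sites carry no `#if` of their own. It is not the shader code from this change: `USE_DOT2`, `half2_t`, `dot2_acc`, and `mul_add_acc` are hypothetical names, `float` stands in for fp16, and hardware rounding is not modelled. The dot2 path is assumed to compute `a.x*b.x + a.y*b.y + acc` with an fp32 accumulator.

```c
/* Hypothetical build-time switch; the real macro name is not given here. */
#ifndef USE_DOT2
#define USE_DOT2 1
#endif

/* Two packed values; float stands in for fp16 in this host-side sketch. */
typedef struct { float x, y; } half2_t;

/* Assumed arithmetic of the dot2 path: a 2-element dot product added to an
   fp32 accumulator. Rounding behaviour of real hardware is not modelled. */
static inline float dot2_acc(half2_t a, half2_t b, float acc) {
    return a.x * b.x + a.y * b.y + acc;
}

/* Generic fallback: two separate multiply-accumulate steps. */
static inline float mul_add_acc(half2_t a, half2_t b, float acc) {
    acc += a.x * b.x;
    acc += a.y * b.y;
    return acc;
}

/* Single entry point: the preprocessor branch lives here and nowhere else. */
static inline float dot_product(half2_t a, half2_t b, float acc) {
#if USE_DOT2
    return dot2_acc(a, b, acc);
#else
    return mul_add_acc(a, b, acc);
#endif
}
```

With this shape, an inner loop would call `dot_product(a, b, acc)` unconditionally, and switching paths means changing one definition instead of editing every call site.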