vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (#19281)
Write out a 2-bit code per block and avoid loading the mask when it
matches these two common cases.
Apply this optimization when the mask is relatively large (i.e. prompt
processing).