[AMDGPU] Fix matchPERM byte tracker for SRA past operand width (#198708)

Bytes shifted in past the operand's width are 0 for SRL but copies of
the sign bit for SRA. The old code treated both as 0, so v_perm_b32
picked the wrong byte for SRA.

Example:
For a 32-bit x, `ashr x, 24` keeps only x's top byte, which lands in
byte 0 of the result. The upper bytes are copies of x's sign bit, not
bytes of x. The matcher used to map them back to bytes of x, producing
a perm mask that ignored the sign extend.
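
The byte-level difference can be sketched in standalone C++. This is
only an illustration and not the matchPERM code: byteOf, srl and sra
are hypothetical helpers, and a 32-bit operand with a shift amount in
the range 0..31 is assumed.

```cpp
#include <cstdint>

// Hypothetical helper: byte `idx` of a 32-bit value (0 = least significant).
uint8_t byteOf(uint32_t v, unsigned idx) {
  return static_cast<uint8_t>((v >> (8 * idx)) & 0xFFu);
}

// Logical shift right: vacated high bits are filled with 0.
// Assumes amt is in 0..31.
uint32_t srl(uint32_t x, unsigned amt) { return x >> amt; }

// Arithmetic shift right, written out on unsigned values so the fill is
// explicit: vacated high bits are copies of the sign bit.
// Assumes amt is in 0..31.
uint32_t sra(uint32_t x, unsigned amt) {
  uint32_t shifted = x >> amt;
  if (amt != 0 && (x & 0x80000000u))
    shifted |= ~0u << (32 - amt); // set the top `amt` bits
  return shifted;
}
```

With x = 0x80BBCCDD, srl(x, 24) is 0x00000080 and sra(x, 24) is
0xFFFFFF80. Bytes 1..3 of the SRA result are 0xFF, which is not any
byte of x, so they cannot be selected from x by a byte index.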