[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851)
M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is
effectively useless. Represent this by restricting the dest RC of that
instruction to `SReg_32_XM0` which excludes M0.
There is a lot of test changes due to the register class changing, but
most changes are trivial. In some cases, an extra register and
`s_mov_b32` is needed.
Fixes SWDEV-513269