[AMDGPU] Implement VOP3P complex pattern optimization for gisel (#130234)
Look for opportunities to optimize VOP3P instructions by adjusting their
op_sel, op_sel_hi, neg, and neg_hi bits.
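To make the four bits concrete, here is a minimal C++ sketch of how one packed 2 x 16-bit VOP3P source operand might be transformed before the instruction sees it. It assumes lane 0 is bits [15:0] and lane 1 is bits [31:16], that op_sel and op_sel_hi choose which register half feeds each lane, and that neg and neg_hi flip bit 15 (the f16 sign bit) of the selected lane. `Vop3pSrcMods` and `applyMods` are hypothetical names for illustration, not LLVM APIs, and the model ignores instruction-specific details.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the modifier bits on one packed 2 x 16-bit source.
// Lane 0 occupies bits [15:0], lane 1 occupies bits [31:16].
struct Vop3pSrcMods {
  bool opSel;   // true: lane 0 reads the high half of the register
  bool opSelHi; // true: lane 1 reads the high half of the register
  bool neg;     // flip the f16 sign bit (bit 15) of lane 0
  bool negHi;   // flip the f16 sign bit (bit 15) of lane 1
};

constexpr uint16_t lo16(uint32_t v) { return static_cast<uint16_t>(v & 0xFFFFu); }
constexpr uint16_t hi16(uint32_t v) { return static_cast<uint16_t>(v >> 16); }

// Returns the packed value the instruction effectively sees for this operand.
constexpr uint32_t applyMods(uint32_t reg, Vop3pSrcMods m) {
  uint16_t lane0 = m.opSel ? hi16(reg) : lo16(reg);
  uint16_t lane1 = m.opSelHi ? hi16(reg) : lo16(reg);
  if (m.neg) lane0 ^= 0x8000u;
  if (m.negHi) lane1 ^= 0x8000u;
  return (static_cast<uint32_t>(lane1) << 16) | lane0;
}

// The default encoding (op_sel = 0, op_sel_hi = 1, no negation) is the identity.
static_assert(applyMods(0x3C00BC00u, {false, true, false, false}) == 0x3C00BC00u,
              "default modifiers pass the operand through");
```

In this model, any bit-level rewrite of an operand that only swaps, duplicates, or negates its halves can be absorbed into the modifier bits instead of being emitted as a separate instruction; setting both op_sel and op_sel_hi, for example, broadcasts the high half into both lanes.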
Test differences:
1. Fix the op_sel_hi bit for inline constants:
   1. `CodeGen/AMDGPU/packed-fp32.ll`
2. Use the neg bits to remove the XOR with 0x80008000:
   1. `CodeGen/AMDGPU/strict_fsub.f16.ll`
   2. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll`
   3. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll`
   4. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot8.ll`
   5. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot2.ll`
   6. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll`
   7. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot8.ll`
3. Remove the XOR with 0x80008000, and use op_sel and op_sel_hi to remove the alignbit:
   1. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot2.ll`
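The folds in items 2 and 3 can be checked at the bit level with a small C++ sketch. It assumes the same lane layout (lane 0 = bits [15:0]), treats neg/neg_hi as flipping bit 15 of each selected half, and models `v_alignbit_b32` as D = ({S0, S1} >> S2[4:0]) truncated to 32 bits, per the AMD ISA description; treat those semantics as assumptions. All function names here are hypothetical, and the sketch only checks bit patterns, not which cases the GlobalISel patterns actually match.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a packed 2 x 16-bit VOP3P source operand:
// lane 0 = bits [15:0], lane 1 = bits [31:16].
constexpr uint32_t selectAndNegate(uint32_t reg, bool opSel, bool opSelHi,
                                   bool neg, bool negHi) {
  uint32_t lane0 = (opSel ? (reg >> 16) : reg) & 0xFFFFu;
  uint32_t lane1 = (opSelHi ? (reg >> 16) : reg) & 0xFFFFu;
  if (neg) lane0 ^= 0x8000u;   // flip the f16 sign bit of lane 0
  if (negHi) lane1 ^= 0x8000u; // flip the f16 sign bit of lane 1
  return (lane1 << 16) | lane0;
}

// Assumed v_alignbit_b32 semantics: low 32 bits of ({S0, S1} >> S2[4:0]).
constexpr uint32_t alignbit(uint32_t s0, uint32_t s1, uint32_t shift) {
  uint64_t concat = (static_cast<uint64_t>(s0) << 32) | s1;
  return static_cast<uint32_t>(concat >> (shift & 31u));
}

// XOR with 0x80008000 flips both sign bits: same as neg + neg_hi.
constexpr bool xorFoldHolds(uint32_t x) {
  return (x ^ 0x80008000u) == selectAndNegate(x, false, true, true, true);
}

// alignbit(x, x, 16) swaps the halves: same as op_sel = 1, op_sel_hi = 0.
constexpr bool alignbitFoldHolds(uint32_t x) {
  return alignbit(x, x, 16) == selectAndNegate(x, true, false, false, false);
}

// Both rewrites together fold into op_sel = 1, op_sel_hi = 0, neg, neg_hi.
constexpr bool combinedFoldHolds(uint32_t x) {
  return (alignbit(x, x, 16) ^ 0x80008000u) ==
         selectAndNegate(x, true, false, true, true);
}

static_assert(xorFoldHolds(0x3C00BC00u), "xor fold");
static_assert(alignbitFoldHolds(0x3C00BC00u), "alignbit fold");
static_assert(combinedFoldHolds(0x3C00BC00u), "combined fold");
```

Because each rewrite only swaps or negates 16-bit halves, it maps onto the existing modifier bits, which is why the separate XOR and alignbit instructions can disappear from the expected output in these tests.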