[AMDGPU] Use value's DebugLoc for bitcast in performStoreCombine (#186766)
## Description
When `AMDGPUTargetLowering::performStoreCombine` inserts a synthetic
bitcast to convert vector types (e.g. `<1 x float>` → `i32`) for stores,
the bitcast inherits the **store's** SDLoc. When
`DAGCombiner::visitBITCAST` later folds `bitcast(load)` → `load`, the
resulting load loses its original debug location.
## Analysis
The bitcast is **not** present in the initial SelectionDAG — it is
inserted during DAGCombine by
`AMDGPUTargetLowering::performStoreCombine`. This can be observed with
`-debug-only=isel,dagcombine`:
```
Initial selection DAG: no bitcast, load is v1f32 directly used by store
Combining: t17: ch = store ... /tmp/beans.c:6:14
... into: t20: ch = store ... /tmp/beans.c:6:14
Combining: t19: i32 = bitcast [ORD=3] # D:1 t13, /tmp/beans.c:6:14
... into: t21: i32,ch = load ... /tmp/beans.c:6:14
```
In `performStoreCombine` (`AMDGPUISelLowering.cpp`):
```cpp
SDLoc SL(N); // N = store node → SL has store's DebugLoc
...
SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, NewVT, Val);
// bitcast gets store's DebugLoc, not load's
```
When `visitBITCAST` folds `bitcast(load)` → `load`, it uses `SDLoc(N)`
(the bitcast's loc = store's loc), so the resulting load loses its
original debug location.
```
Before (initial DAG):
  t13: v1f32 = load ...        line 2   ; original load
  t14: ch = store t13, ...     line 3   ; store

After performStoreCombine:
  t13: v1f32 = load ...        line 2   ; original load
  t19: i32 = bitcast t13       line 3   ; synthetic bitcast (store's loc!)
  t20: ch = store t19, ...     line 3

After visitBITCAST folds (incorrect):
  t21: i32 = load ...          line 0   ; lost debug location

After visitBITCAST folds (expected):
  t21: i32 = load ...          line 2   ; preserves load's location
```
## Fix
The fix is target-specific and confined to `performStoreCombine` in
`AMDGPUISelLowering.cpp`: use `DAG.getBitcast()` instead of
`DAG.getNode(ISD::BITCAST, SL, ...)`. `getBitcast()` internally uses
`SDLoc(V)` (the value operand's SDLoc), so the synthetic bitcast inherits
the load's DebugLoc instead of the store's:
```cpp
// Before:
SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, NewVT, Val);
if (OtherUses) {
  SDValue CastBack = DAG.getNode(ISD::BITCAST, SL, VT, CastVal);
  // ...
}

// After:
SDValue CastVal = DAG.getBitcast(NewVT, Val);
if (OtherUses) {
  SDValue CastBack = DAG.getBitcast(VT, CastVal);
  // ...
}
```
This is consistent with `performLoadCombine`, where the bitcast also uses
the load's `SDLoc`.
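
To make the location flow concrete, here is a minimal standalone sketch. It is a toy model, not LLVM code: `ToyNode`, `ToyDAG`, `foldBitcastOfLoad`, and the two helper functions are hypothetical names invented for illustration, and a plain line number stands in for a `DebugLoc`. It models only which location the folded load inherits (per the `SDLoc(N)` behaviour described above), not what later passes do with it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an SDNode (hypothetical, not LLVM's class).
struct ToyNode {
  std::string Opcode;
  unsigned Line;          // stand-in for the node's DebugLoc
  const ToyNode *Operand; // single operand, or nullptr
};

// Toy stand-in for SelectionDAG (hypothetical, not LLVM's class).
class ToyDAG {
  std::vector<std::unique_ptr<ToyNode>> Nodes;

public:
  // Analogue of DAG.getNode(Opcode, SL, ...): the caller picks the location.
  const ToyNode *getNode(std::string Opcode, unsigned Line,
                         const ToyNode *Op = nullptr) {
    Nodes.push_back(
        std::make_unique<ToyNode>(ToyNode{std::move(Opcode), Line, Op}));
    return Nodes.back().get();
  }

  // Analogue of DAG.getBitcast(VT, V): the location comes from the operand.
  const ToyNode *getBitcast(const ToyNode *V) {
    return getNode("bitcast", V->Line, V);
  }

  // Models the bitcast(load) -> load fold: the new load takes the
  // location of the bitcast being replaced.
  const ToyNode *foldBitcastOfLoad(const ToyNode *Cast) {
    assert(Cast->Opcode == "bitcast" && Cast->Operand &&
           Cast->Operand->Opcode == "load");
    return getNode("load", Cast->Line);
  }
};

// Old behaviour: the synthetic bitcast is created with the store's location.
unsigned foldedLineWithStoreLoc(unsigned LoadLine, unsigned StoreLine) {
  ToyDAG DAG;
  const ToyNode *Load = DAG.getNode("load", LoadLine);
  const ToyNode *Cast = DAG.getNode("bitcast", StoreLine, Load);
  return DAG.foldBitcastOfLoad(Cast)->Line;
}

// New behaviour: the synthetic bitcast takes its operand's location,
// so the store's location never reaches the folded load.
unsigned foldedLineWithValueLoc(unsigned LoadLine, unsigned StoreLine) {
  (void)StoreLine; // intentionally unused
  ToyDAG DAG;
  const ToyNode *Load = DAG.getNode("load", LoadLine);
  const ToyNode *Cast = DAG.getBitcast(Load);
  return DAG.foldBitcastOfLoad(Cast)->Line;
}
```

In this model, with the load on line 2 and the store on line 3, `foldedLineWithStoreLoc(2, 3)` yields the store's line, while `foldedLineWithValueLoc(2, 3)` yields the load's line. That is the property the change is meant to restore.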