Let Inductor generate a broadcast when loading a single value (#92595)
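For better perf with MLIR-based Triton: load the single value once as a scalar and broadcast it explicitly, instead of loading through a block-shaped index of zeros.
Changes the generated code from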
```
tmp32 = tl.load(seed3 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None)
```
to
```
tmp32_load = tl.load(seed3+(0))
tmp32 = tl.broadcast_to(tmp32_load, [XBLOCK, RBLOCK])
```
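The before/after forms can be sketched as two small string emitters. This is only an illustration of the shape of the change, not Inductor's actual codegen: the function names `emit_scalar_load_old` and `emit_scalar_load_new` and their parameters are hypothetical, and whitespace in the emitted strings may differ from what Inductor prints.

```python
def _shape(block_shape):
    # Render a block shape such as ["XBLOCK", "RBLOCK"] as "[XBLOCK, RBLOCK]".
    return "[" + ", ".join(block_shape) + "]"


def emit_scalar_load_old(name, buf, index, block_shape):
    # Hypothetical "before" form: the constant index is widened with
    # tl.zeros, so the load itself is block-shaped.
    zeros = f"tl.zeros({_shape(block_shape)}, tl.int32)"
    return [f"{name} = tl.load({buf} + ({index} + {zeros}), None)"]


def emit_scalar_load_new(name, buf, index, block_shape):
    # Hypothetical "after" form: one scalar load, then an explicit
    # broadcast of the loaded value to the block shape.
    return [
        f"{name}_load = tl.load({buf} + ({index}))",
        f"{name} = tl.broadcast_to({name}_load, {_shape(block_shape)})",
    ]


if __name__ == "__main__":
    args = ("tmp32", "seed3", "0", ["XBLOCK", "RBLOCK"])
    print("\n".join(emit_scalar_load_old(*args)))
    print("\n".join(emit_scalar_load_new(*args)))
```

Both forms give every element of the `[XBLOCK, RBLOCK]` block the same value; the difference is only whether the block shape comes from the index expression or from an explicit `tl.broadcast_to`.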
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92595
Approved by: https://github.com/Chillee