[inductor] Move `tl.broadcast` call out of codegen.common (#98304)
This changes the generated code only cosmetically (the load and the
broadcast each get their own temporary), but it means Triton's
broadcasting logic no longer leaks into the CSE class.
Before:
```python
tmp5_load = tl.load(in_ptr1 + (0))
tmp5 = tl.broadcast_to(tmp5_load, [XBLOCK])
```
After:
```python
tmp5 = tl.load(in_ptr1 + (0))
tmp6 = tl.broadcast_to(tmp5, [XBLOCK])
```
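For illustration only, here is a minimal sketch of the separation described
above. It is not the inductor implementation: `ToyCSE` and `toy_triton_load`
are hypothetical names, and the code only builds strings, so it runs without
torch or triton. The CSE helper just names expressions, while the
backend-specific helper decides to emit the broadcast as a second expression.

```python
import itertools


class ToyCSE:
    """Hypothetical backend-agnostic CSE: maps an expression string to a name."""

    def __init__(self, prefix="tmp", start=0):
        self.prefix = prefix
        self.counter = itertools.count(start)
        self.cache = {}   # expression text -> variable name
        self.lines = []   # emitted assignment lines, in order

    def generate(self, expr):
        # Reuse the existing variable if this exact expression was seen before.
        if expr not in self.cache:
            name = f"{self.prefix}{next(self.counter)}"
            self.cache[expr] = name
            self.lines.append(f"{name} = {expr}")
        return self.cache[expr]


def toy_triton_load(cse, ptr, index, block="XBLOCK"):
    """Hypothetical backend helper: the broadcast lives here, not in ToyCSE."""
    loaded = cse.generate(f"tl.load({ptr} + ({index}))")
    return cse.generate(f"tl.broadcast_to({loaded}, [{block}])")


# Start numbering at 5 only to mirror the snippet in this message.
cse = ToyCSE(start=5)
result = toy_triton_load(cse, "in_ptr1", "0")
print("\n".join(cse.lines))
```

In this sketch the load and the broadcast are two ordinary CSE entries, so
the CSE helper needs no knowledge of `tl.broadcast_to`.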
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98304
Approved by: https://github.com/ngimel