Improve repeat_interleave with scalar repeat value (#102570)
`repeat_interleave_symint` is currently implemented by guarding on the `SymInt`
and converting it to a tensor that is passed to the Tensor overload. This PR
instead implements it as a copy of an expanded tensor, which requires no guards
and is also much more efficient in eager mode.
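
To illustrate the idea, here is a minimal Python sketch of the "copy of an expanded tensor" approach. It is not the code from this PR: the helper name `repeat_interleave_scalar` is hypothetical, and the sketch assumes a non-negative integer `repeats` and an input with at least one dimension when `dim` is given.

```python
import torch


def repeat_interleave_scalar(x, repeats, dim=None):
    """Hypothetical helper: repeat each element `repeats` times along `dim`."""
    if dim is None:
        # Mirror the default behaviour: operate on the flattened input.
        x = x.flatten()
        dim = 0
    dim = dim % x.dim()  # normalize a negative dim such as -1

    # Add a size-1 axis right after `dim`, then broadcast it to `repeats`.
    # `expand` only creates a view, so no data is copied yet.
    x = x.unsqueeze(dim + 1)
    shape = list(x.shape)
    shape[dim + 1] = repeats
    expanded = x.expand(*shape)

    # Materialize the view with a single copy, then merge the two axes so
    # that each element's repeats are adjacent along `dim`.
    return expanded.clone(memory_format=torch.contiguous_format).flatten(dim, dim + 1)
```

Because the output shape is computed only from `x.shape` and `repeats`, nothing in this sketch needs to inspect the concrete value of `repeats`, which is what lets a symbolic integer pass through without a guard.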
For example, these are timings for `x.repeat_interleave(100, dim=-1)` with `x.shape == (1000, 100)`:
| Device | Time (Master) | Time (This PR) | Speedup |
|--------|---------------|-----------------|---------|
| cpu | 18.8 ms | 3.5 ms | 5.4 |
| cuda | 271 us | 134 us | 2.0 |
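
A measurement of this kind could be reproduced with something like the sketch below. This is an assumption about the setup, not the script used for the table: the helper name `time_repeat_interleave` and its defaults are hypothetical, and absolute numbers will vary with hardware and build.

```python
import torch
from torch.utils import benchmark


def time_repeat_interleave(device="cpu", shape=(1000, 100), repeats=100, min_run_time=1.0):
    """Hypothetical helper: median seconds per call of x.repeat_interleave."""
    x = torch.randn(shape, device=device)
    timer = benchmark.Timer(
        stmt="x.repeat_interleave(repeats, dim=-1)",
        globals={"x": x, "repeats": repeats},
    )
    # blocked_autorange picks the number of runs and returns a Measurement;
    # its `median` attribute is the median time per call in seconds.
    return timer.blocked_autorange(min_run_time=min_run_time).median
```

Calling `time_repeat_interleave("cuda")` on a CUDA-enabled build would time the GPU path; running the same helper on builds before and after this change gives the two columns to compare.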
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102570
Approved by: https://github.com/lezcano