Add equal_to_1 to triton_meta for user-written Triton kernels (#120579)
Summary: Previously, we omitted `equal_to_1` from the `triton_meta` part of the `@user_autotune` decorator. For user-written Triton kernels, this could lead to perf regressions, because the kernel emitted by Inductor codegen was compiled without the `equal_to_1` specialization that the same kernel gets when launched directly (outside of compilation).
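As a rough illustration of what the specialization keys on, here is a minimal sketch. It is not Inductor's actual code: the helper names `equal_to_1_indices` and `sketch_triton_meta` and the dict layout are hypothetical, and the rule that only non-bool integer arguments equal to 1 qualify is an assumption made for this example.

```python
from typing import Any, Dict, Sequence, Tuple


def equal_to_1_indices(args: Sequence[Any]) -> Tuple[int, ...]:
    """Return the positions of integer arguments whose value is exactly 1.

    Hypothetical helper for illustration only.
    """
    return tuple(
        i
        for i, arg in enumerate(args)
        # bool is a subclass of int in Python, so it is excluded explicitly
        # (assumption: booleans are not treated as "equal to 1").
        if isinstance(arg, int) and not isinstance(arg, bool) and arg == 1
    )


def sketch_triton_meta(arg_names: Sequence[str], args: Sequence[Any]) -> Dict[str, Any]:
    """Build an illustrative metadata dict; the layout is NOT Inductor's real one."""
    return {
        "arg_names": list(arg_names),
        "equal_to_1": equal_to_1_indices(args),
    }


# Example: a matmul-like launch where the inner stride of A is 1.
# object() stands in for a tensor pointer argument.
meta = sketch_triton_meta(
    ["a_ptr", "M", "stride_am", "stride_ak"],
    [object(), 512, 512, 1],
)
print(meta["equal_to_1"])
```

In this sketch only `stride_ak` (index 3) is recorded. A unit stride on a contiguous dimension is the typical case where the specialization matters, since a compiler that knows the value is 1 can drop the corresponding multiplication.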
Fixes #120478. Timings for the repro from the issue, on an A100:
Before this PR:
```
Triton matmul: 0.0167 seconds
Triton matmul compiled: 0.0751 seconds
```
After this PR:
```
Triton matmul: 0.0168 seconds
Triton matmul compiled: 0.0072 seconds
```
Test Plan:
```
$ python test/dynamo/test_triton_kernels.py -k test_triton_kernel_equal_to_1_arg
...
----------------------------------------------------------------------
Ran 3 tests in 3.545s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120579
Approved by: https://github.com/oulgen, https://github.com/jansel, https://github.com/chenyang78