[inductor] fix duplicate arg handling in triton templates (#105315)
Fixes #105212
De-duplicate kernel args in the codegen and autotuning of the Triton templates for `torch.mm` and `torch.bmm`.
For details, refer to https://github.com/pytorch/pytorch/issues/105212#issuecomment-1637168866
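
A minimal sketch of the de-duplication idea, for illustration only. It is not the Inductor implementation: the helper `dedup_args` and the buffer names `arg0_1` and `buf0` are hypothetical. It assumes the kernel args are identified by name, and that a repeated name (as might arise when the same tensor is passed for both operands) should appear only once in the kernel signature while each call-site position still maps to the right slot.

```python
from typing import Dict, List, Sequence, Tuple


def dedup_args(names: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: collapse repeated arg names, keeping first-seen order.

    Returns the unique names plus, for each original position, the index of
    its entry in the unique list.
    """
    unique: List[str] = []
    index_of: Dict[str, int] = {}
    positions: List[int] = []
    for name in names:
        if name not in index_of:
            # First time this buffer is seen: give it a new slot.
            index_of[name] = len(unique)
            unique.append(name)
        # Repeated names reuse the slot assigned on first sight.
        positions.append(index_of[name])
    return unique, positions


# Both operands refer to the same (hypothetical) input buffer.
unique, positions = dedup_args(["arg0_1", "arg0_1", "buf0"])
print(unique)     # ['arg0_1', 'buf0']
print(positions)  # [0, 0, 1]
```

Keeping first-seen order means the de-duplicated list stays stable across the codegen and autotuning steps, so both can agree on the same arg layout.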
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105315
Approved by: https://github.com/jansel