Make triton_meta part of the user-defined Triton kernel cache key (#120809)
Tensors with different shapes can generate different triton_meta (for example, different divisibility rules), so triton_meta needs to be part of the cache key.
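
A minimal sketch of the idea, not Inductor's actual code: `divisible_by_16` and `KernelCache` are hypothetical names, and the divisibility check is a simplified stand-in for what triton_meta records. It shows why the cache key has to include the shape-derived metadata and not just the kernel name.

```python
from typing import Dict, Tuple


def divisible_by_16(size_args: Tuple[int, ...]) -> Tuple[int, ...]:
    """Hypothetical stand-in for triton_meta's divisibility hints:
    return the indices of size arguments that are divisible by 16."""
    return tuple(i for i, size in enumerate(size_args) if size % 16 == 0)


class KernelCache:
    """Hypothetical cache of generated kernels, keyed on name AND metadata."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Tuple[int, ...]], str] = {}
        self.compile_count = 0

    def get(self, kernel_name: str, size_args: Tuple[int, ...]) -> str:
        meta = divisible_by_16(size_args)
        # Keying on kernel_name alone would reuse a kernel that was
        # specialized for different divisibility hints.
        key = (kernel_name, meta)
        if key not in self._cache:
            self.compile_count += 1
            self._cache[key] = f"{kernel_name}_{self.compile_count}"
        return self._cache[key]


cache = KernelCache()
a = cache.get("add_kernel", (32, 64))   # both sizes divisible by 16
b = cache.get("add_kernel", (48, 128))  # same hints, so the entry is reused
c = cache.get("add_kernel", (33, 64))   # different hints, so a new entry
```

Here `a` and `b` share one cache entry because their hints match, while `c` gets its own entry.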
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120809
Approved by: https://github.com/chenyang78, https://github.com/jansel