[inductor][easy] avoid duplicate kernel definitions (#105099)
When running BertForMaskedLM with the kernel benchmark enabled, I found that essentially identical kernels are defined once per call site. The cause is that the benchmark harness for those kernels uses a different seed_offset for each invocation, so the generated sources differ even though the kernels themselves are the same. It should be safe to force seed_offset to 0 in the harness so that identical kernel definitions can be deduplicated.
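
A minimal sketch of the effect, not Inductor's actual code. It assumes kernel definitions are deduplicated by hashing their full source text, harness included; `benchmark_harness`, `define_kernels`, and `KERNEL_BODY` are hypothetical names used only for illustration.

```python
import hashlib

# Hypothetical kernel body, identical at every call site.
KERNEL_BODY = "def kernel(in_ptr, out_ptr, seed_offset): ...\n"


def benchmark_harness(seed_offset: int) -> str:
    # Hypothetical stand-in for the generated benchmark harness text.
    return f"def get_args():\n    return ({seed_offset},)\n"


def define_kernels(seed_offsets, force_zero: bool) -> dict:
    """Return the unique kernel definitions, keyed by a hash of their source."""
    definitions = {}
    for offset in seed_offsets:
        harness = benchmark_harness(0 if force_zero else offset)
        source = KERNEL_BODY + harness
        key = hashlib.sha256(source.encode()).hexdigest()
        # Sources that hash to the same key are defined only once.
        definitions.setdefault(key, source)
    return definitions


call_site_offsets = [0, 1, 2]
# Per-call-site offsets make every source distinct: one definition per call site.
print(len(define_kernels(call_site_offsets, force_zero=False)))
# Forcing the offset to 0 makes the sources identical: a single definition.
print(len(define_kernels(call_site_offsets, force_zero=True)))
```

Under these assumptions the only difference between the sources is the harness, so fixing the offset there is enough to collapse the duplicates.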
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105099
Approved by: https://github.com/jansel