[inductor] Add CPU-side profiler event names for templates and foreach kernels (#108449)
This passes in the descriptive kernel name as part of the triton_meta dict that gets passed to the CachingAutotuner, for foreach kernels and templates.
Before:
<img width="684" alt="Screenshot 2023-09-01 at 11 56 02 AM" src="https://github.com/pytorch/pytorch/assets/5067123/c14e13fc-0d9e-425a-a08b-613ef42aa264">
After:
<img width="562" alt="Screenshot 2023-09-01 at 2 13 00 PM" src="https://github.com/pytorch/pytorch/assets/5067123/551bb9a9-865b-401e-b6e0-8ebbe5431565">
This PR also refactors the "magic strings" (KERNEL_NAME and DESCRIPTIVE_KRNL_NAME) into an enum in utils.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108449
Approved by: https://github.com/jansel