[jiterator] De-template launch_jitted_reduce_kernel (#80138)
As with `jitted_gpu_kernel_impl`, this PR:
1. Hoists static variables out of the template and into a non-template parent function
2. Moves template arguments into the `jit::KernelDescriptor` struct
and turns `vt0` into a plain runtime argument
3. Changes the types of pass-through arguments to `void*`
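The general de-templating pattern behind these three steps can be sketched as below. This is an illustrative sketch, not the PyTorch implementation: `KernelDescriptor`, `CompiledKernel`, `get_or_compile`, `launch_reduce`, and `launch_reduce_typed` are hypothetical names, and the JIT compile step is stubbed out. It shows a single non-template function that owns the static cache, runtime fields replacing template parameters (including `vt0`), and `void*` pass-through arguments, so the large function body is compiled once rather than once per instantiation.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a kernel descriptor: values that were
// previously template parameters are now ordinary runtime fields.
struct KernelDescriptor {
  std::string name;    // kernel name
  std::string source;  // kernel source string to JIT-compile
  int nInputs;
  int nOutputs;
};

// Hypothetical handle for a compiled kernel (compilation is stubbed).
struct CompiledKernel {
  std::string key;
};

// Non-template parent function: the static cache and mutex exist exactly
// once in the binary instead of once per template instantiation.
const CompiledKernel& get_or_compile(const KernelDescriptor& desc, int vt0) {
  assert(vt0 > 0 && "vt0 must be positive");
  static std::mutex mtx;
  static std::unordered_map<std::string, CompiledKernel> cache;
  const std::string key = desc.name + "_vt" + std::to_string(vt0);
  std::lock_guard<std::mutex> guard(mtx);
  auto it = cache.find(key);
  if (it == cache.end()) {
    // Stub for the real JIT compile; runs at most once per key.
    it = cache.emplace(key, CompiledKernel{key}).first;
  }
  // unordered_map keeps element references valid across rehashing.
  return it->second;
}

// Type-erased launcher: pass-through arguments are void*, so this body
// does not depend on the caller's types and is compiled only once.
std::string launch_reduce(const KernelDescriptor& desc, int vt0,
                          const void* reduction, const void* config) {
  const CompiledKernel& kernel = get_or_compile(desc, vt0);
  (void)reduction;  // a real launcher would forward these as kernel args
  (void)config;
  return kernel.key;  // stands in for launching the compiled kernel
}

// Thin template shim: only this one-line wrapper is instantiated per type.
template <typename Reduction>
std::string launch_reduce_typed(const KernelDescriptor& desc, int vt0,
                                const Reduction& reduction) {
  return launch_reduce(desc, vt0, &reduction, nullptr);
}
```

The binary-size saving in this pattern comes from the heavy logic living in non-template functions; the per-type cost shrinks to the thin shim that converts typed pointers to `void*`.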
On my build I see a 0.5 MB decrease in binary size for `libtorch_cuda.so`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80138
Approved by: https://github.com/ngimel