Improve DistanceKernel.cu (#83811)
Include device_sqrt.
Replace reduce_agg with BlockReduce.
Select the kernel implementation through impl_fptr instead of error-prone copy-and-paste.
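
The impl_fptr change can be illustrated with a minimal host-side sketch. It is not the code from DistanceKernel.cu: the policy structs (ZeroNorm, OneNorm, TwoNorm, InfNorm, PNorm) and the functions distance_impl and distance are hypothetical names chosen for illustration, and plain C++ stands in for the CUDA kernels. The idea shown is that one function pointer is set once from p and then called at a single site, rather than repeating the full call in every branch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-norm policies (illustrative names, not PyTorch's).
// inc() folds one absolute difference into the accumulator,
// finish() turns the accumulator into the final distance.
struct ZeroNorm {
  static void inc(double& agg, double diff, double) { agg += (diff != 0.0) ? 1.0 : 0.0; }
  static double finish(double agg, double) { return agg; }
};
struct OneNorm {
  static void inc(double& agg, double diff, double) { agg += diff; }
  static double finish(double agg, double) { return agg; }
};
struct TwoNorm {
  static void inc(double& agg, double diff, double) { agg += diff * diff; }
  static double finish(double agg, double) { return std::sqrt(agg); }
};
struct InfNorm {
  static void inc(double& agg, double diff, double) { if (diff > agg) agg = diff; }
  static double finish(double agg, double) { return agg; }
};
struct PNorm {
  static void inc(double& agg, double diff, double p) { agg += std::pow(diff, p); }
  static double finish(double agg, double p) { return std::pow(agg, 1.0 / p); }
};

// One templated implementation shared by all norms.
// Assumes a and b have the same length.
template <typename Norm>
double distance_impl(const std::vector<double>& a, const std::vector<double>& b, double p) {
  double agg = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    Norm::inc(agg, std::abs(a[i] - b[i]), p);
  }
  return Norm::finish(agg, p);
}

// Pick the specialization once, then call through the pointer at one site.
double distance(const std::vector<double>& a, const std::vector<double>& b, double p) {
  auto impl_fptr = distance_impl<PNorm>;  // general case
  if (p == 0.0) {
    impl_fptr = distance_impl<ZeroNorm>;
  } else if (p == 1.0) {
    impl_fptr = distance_impl<OneNorm>;
  } else if (p == 2.0) {
    impl_fptr = distance_impl<TwoNorm>;
  } else if (std::isinf(p)) {
    impl_fptr = distance_impl<InfNorm>;
  }
  return impl_fptr(a, b, p);
}
```

With this shape, a change to the argument list touches one call instead of one call per norm, which is the copy-and-paste hazard the commit refers to.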
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83811
Approved by: https://github.com/ngimel